CN116664265A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116664265A
Authority
CN
China
Prior art keywords
data
target
transaction
transaction data
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310664570.XA
Other languages
Chinese (zh)
Inventor
彭莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202310664570.XA
Publication of CN116664265A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Biomedical Technology (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a data processing method and device, an electronic device, and a storage medium. The method comprises the following steps: when a target transaction is completed, acquiring initial transaction data under each data feature corresponding to the target transaction; if transaction data to be complemented exists under any target data feature, performing noise-adding processing on the initial transaction data under the other data features, where the other data features are the data features for which initial transaction data can be acquired, and the noise-adding process of the initial transaction data corresponds to a diffusion equation; inputting each data feature corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model; and solving the diffusion equation and the score function to obtain the transaction data to be complemented under the target data feature, and integrating the initial transaction data and the transaction data to be complemented into the complete transaction data of the target transaction. The method avoids manual design and improves the accuracy of the complemented data.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.
Background
In a financial transaction system such as a bank, a large amount of transaction data is generated as each transaction proceeds. Owing to technical reasons or human factors, this transaction data often suffers from missing values, and if the missing data is not complemented, downstream results are inaccurate.
In the prior art, data is complemented mainly by means of a data model, but existing data models depend mainly on designs based on manual experience. For example, missing data is complemented from the existing feature data by exploiting the low-rank property among features. However, manual experience often lacks accuracy and adaptivity, and subtle differences between data sets are difficult to characterize accurately.
Disclosure of Invention
Accordingly, an object of the present application is to provide a data processing method, apparatus, electronic device and storage medium, so as to overcome the problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for data processing, where the method includes:
when a target transaction is completed, acquiring initial transaction data under each data characteristic corresponding to the target transaction;
if the transaction data to be complemented exist under any target data characteristic, carrying out noise adding processing on the initial transaction data under other data characteristics; the other data features are data features capable of acquiring initial transaction data, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
inputting each data characteristic corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
and solving the diffusion equation and the scoring function to obtain to-be-completed transaction data under the characteristic of target data, and integrating the initial transaction data and the to-be-completed transaction data into complete transaction data of the target transaction.
In some embodiments of the present application, the noise adding processing for the initial transaction data includes:
and gradually adding Gaussian noise to the initial transaction data under each other data characteristic until characteristic signals are submerged.
In some embodiments of the present application, the Gaussian noise-adding process is determined by the added noise intensity and standard normal distribution noise, and the method obtains the diffusion equation by:
and obtaining the diffusion equation according to the noise intensity added by the initial transaction data under the other data characteristics and the standard normal distribution noise.
In some embodiments of the present application, the target function model is obtained by:
taking transaction data of other transactions in the same field as the target transaction as test data;
inputting the test data into an initial function model to obtain a test result output by the initial function model;
and adjusting the initial function model according to the test result until a cut-off condition is reached, and taking the initial function model at the moment as the target function model.
In some embodiments of the present application, the initial function model is a neural network with a U-Net structure.
In some embodiments of the present application, the obtaining the transaction data to be complemented under the target data feature by solving the diffusion equation and the scoring function includes:
and approximating the scoring function to the diffusion equation until a preset approximation condition is met, and obtaining transaction data to be complemented under the characteristic of the target data.
In some embodiments of the present application, the approximating the scoring function to the diffusion equation until a preset approximation condition is satisfied, to obtain transaction data to be complemented under a target data feature includes:
subtracting the diffusion equation from the score function to obtain a difference function;
and continuously optimizing the difference function to obtain an optimal solution of the difference function, and taking the optimal solution of the difference function as transaction data to be complemented under the characteristic of target data.
In a second aspect, an embodiment of the present application provides an apparatus for data processing, the apparatus including:
the acquisition module is used for acquiring initial transaction data under each data characteristic corresponding to the target transaction when the target transaction is completed;
the noise adding module is used for adding noise to the initial transaction data under other data characteristics if the transaction data to be complemented exist under any target data characteristic; the other data features are data features capable of acquiring initial transaction data, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
the model processing module is used for inputting each data characteristic corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
and the solving module is used for solving the diffusion equation and the scoring function to obtain to-be-completed transaction data under the characteristic of target data, and integrating the initial transaction data and the to-be-completed transaction data into complete transaction data of the target transaction.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor, when executing the computer program, implements the steps of the data processing method described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of data processing described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
when a target transaction is completed, acquiring initial transaction data under each data characteristic corresponding to the target transaction; if the transaction data to be complemented exist under any target data characteristic, carrying out noise adding processing on the initial transaction data under other data characteristics; the other data features are data features capable of acquiring initial transaction data, and the noise adding process of the initial transaction data corresponds to a diffusion equation; inputting each data characteristic corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model; and solving the diffusion equation and the scoring function to obtain to-be-completed transaction data under the characteristic of target data, and integrating the initial transaction data and the to-be-completed transaction data into complete transaction data of the target transaction. The method avoids manual design and improves the accuracy of the data after completion.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for data processing according to an embodiment of the present application;
FIG. 2 shows a schematic diagram of one embodiment of the present application;
FIG. 3 is a schematic diagram of a noise adding process according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a model structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for data processing according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
In a financial transaction system such as a bank, a large amount of transaction data is generated as each transaction proceeds. Owing to technical reasons or human factors, this transaction data often suffers from missing values, and if the missing data is not complemented, downstream results are inaccurate.
In the prior art, data is complemented mainly by means of a data model, but existing data models depend mainly on designs based on manual experience. For example, missing data is complemented from the existing feature data by exploiting the low-rank property among features. However, manual experience often lacks accuracy and adaptivity, and subtle differences between data sets are difficult to characterize accurately.
Based on this, the embodiment of the application provides a data processing method, a data processing device, an electronic device and a storage medium, and the description is given below by way of embodiments.
FIG. 1 shows a flow chart of a method for data processing according to an embodiment of the present application, where the method includes steps S101 to S104. Specifically:
S101, when a target transaction is completed, acquiring initial transaction data under each data characteristic corresponding to the target transaction;
S102, if transaction data to be complemented exist under any target data characteristic, noise adding processing is carried out on the initial transaction data under other data characteristics; the other data features are data features capable of acquiring initial transaction data, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
S103, inputting each data characteristic corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
S104, solving the diffusion equation and the scoring function to obtain to-be-completed transaction data under the characteristic of target data, and integrating the initial transaction data and the to-be-completed transaction data into complete transaction data of the target transaction.
The method avoids manual design and improves the accuracy of the data after completion.
Some embodiments of the application are described in detail below. The following embodiments and features of the embodiments may be combined with each other without conflict.
In banking-related businesses, missing data is common. Missing values cause the system to lose a large amount of useful information, make the system's behavior more uncertain, and confuse the data mining process. In later data processing and analysis tasks, if key data is missing, the entire record can only be discarded. Data utilization is then severely insufficient, which harms the accuracy of data processing and analysis. The embodiment of the application provides a data complement method based on joint distribution modeling, which can accurately complement missing data, provides more and richer complemented data for later data analysis tasks, improves analysis accuracy, and has important practical significance.
In practice, when various tasks are completed by using machine learning and other methods, the problem of data loss is often encountered, and if the total number of samples with a certain characteristic loss is extremely large, noise may be brought when the samples are added as the characteristic, so that the final result is affected. Therefore, it is necessary to study a precise data complement technique.
The existing data model mainly depends on manual experience design. For example, the missing data is complemented by the existing characteristic data by utilizing the low rank property between the characteristics. However, artificial experience often lacks accuracy, adaptivity, and minor differences in different data sets are difficult to accurately characterize. Therefore, the research on data-driven data complement technology becomes a hotspot problem by adaptively mining the relationships among features from big data.
In recent years, diffusion models have been developed significantly in the field of computer vision due to their precise data distribution characterization capabilities. The application focuses on missing data completion, whereas missing feature data can be completed according to existing features, the logic behind which is that there is a correlation between features. The accurate characterization of feature correlation becomes the key to missing feature complementation. If the feature data is considered as a sample of some random variable, its joint distribution is the most accurate representation of the intrinsic correlation. Therefore, the application aims to develop a precise data complement technology based on the precise data distribution characterization capability of the diffusion model.
S101, when a target transaction is completed, acquiring initial transaction data under each data characteristic corresponding to the target transaction.
The data complement method in the embodiment of the application is mainly aimed at the financial fields such as banks, etc., for example, the data in the transaction process. The transaction data is incomplete due to technical problems or human factors in the transaction process. For example, the user does not fill in the age of the transfer person, etc. at the time of the transfer transaction.
When the target transaction is completed, the present application obtains all data generated during the transaction. The data is acquired according to each data feature corresponding to the transaction, where a data feature represents a data dimension. For example, a transaction application form contains the name, gender, age and the like of the applicant; the name, gender and age are all data features in the embodiment of the application. That is, the embodiment of the application determines each data feature corresponding to the target transaction before acquiring the data. In particular, the data features may be stored in advance in a database. The database may store the corresponding data features for each specific transaction, or may associate one set of data features with each type of transaction, and so on. After the data features of the target transaction are determined, the data under each data feature is acquired. For ease of distinction, the acquired data is called initial transaction data; because the data is acquired feature by feature after the data features have been determined, whenever initial transaction data can be acquired it necessarily has a corresponding data feature. In the embodiment of the application, data that cannot be acquired under a data feature is called transaction data to be complemented, that data feature is called a target data feature, and the data features under which initial transaction data can be acquired are called other data features.
All data generated during the target transaction is obtained only when the transaction data to be complemented under the target data features and the initial transaction data under the other data features are combined.
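The split between target data features and other data features can be illustrated with a minimal sketch. The function name `split_features`, the use of `None` to mark a value that could not be acquired, and the sample record are all assumptions made for illustration; they do not come from the application.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_features(
    feature_names: List[str], record: Dict[str, Optional[Any]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Separate acquired values from features whose data is missing.

    Returns (initial transaction data, target data features).
    A missing value is assumed to be absent from the record or None.
    """
    initial: Dict[str, Any] = {}
    targets: List[str] = []
    for name in feature_names:
        value = record.get(name)
        if value is None:
            targets.append(name)      # nothing acquired: target data feature
        else:
            initial[name] = value     # acquired: one of the other data features
    return initial, targets


# Hypothetical transaction whose data features are name, gender and age.
features = ["name", "gender", "age"]
record = {"name": "Zhang San", "gender": "F", "age": None}
initial_data, target_features = split_features(features, record)
```

In this example `age` is the only target data feature, and `name` and `gender` are the other data features that carry initial transaction data.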
S102, if transaction data to be complemented exist under any target data characteristic, noise adding processing is carried out on the initial transaction data under other data characteristics; the other data features are data features capable of acquiring initial transaction data, and the noise adding process of the initial transaction data corresponds to a diffusion equation.
After the initial transaction data under each data feature is acquired, it is detected whether the initial transaction data is contained under each data feature. If the target data characteristics of the transaction data to be complemented exist, the embodiment of the application needs to complement the transaction data to be complemented under the target data characteristics. When the transaction data to be complemented is complemented, the embodiment of the application carries out noise adding processing on the initial transaction data under other data characteristics.
The noise-adding processing of the initial transaction data consists of gradually adding Gaussian noise to the initial transaction data under each of the other data features until the feature signal is submerged. The Gaussian noise-adding process is determined by the added noise intensity and standard normal distribution noise, so the diffusion equation corresponding to the process can be obtained by analyzing it. The application thus establishes a diffusion equation in the course of adding noise to the initial transaction data.
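The step-by-step noise addition can be sketched as follows. This is only an illustration under stated assumptions: the features are taken to be already encoded as floating-point numbers, a single noise intensity per step is shared by all features, and the geometric noise schedule and the name `add_gaussian_noise` are hypothetical.

```python
import random
from typing import List


def add_gaussian_noise(
    x0: List[float], sigmas: List[float], seed: int = 0
) -> List[List[float]]:
    """Return the trajectory X(0), X(1), ..., X(T).

    Each step applies x(t+1) = x(t) + sigma(t) * xi, where xi is a fresh
    standard normal draw per feature (assumed independent across steps).
    """
    rng = random.Random(seed)
    trajectory = [list(x0)]
    for sigma in sigmas:
        previous = trajectory[-1]
        trajectory.append([v + sigma * rng.gauss(0.0, 1.0) for v in previous])
    return trajectory


# Hypothetical schedule: the noise intensity grows geometrically over 20 steps,
# so the accumulated noise eventually dominates the original feature values.
sigmas = [0.1 * (1.5 ** t) for t in range(20)]
path = add_gaussian_noise([1.0, -2.0, 0.5], sigmas)
```

`path[0]` is the clean feature vector and `path[-1]` is the fully noised one; how many steps are needed before the signal counts as submerged depends on the scale of the data and is not fixed here.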
S103, inputting each data characteristic corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model.
After obtaining the diffusion equation of the initial transaction data, the embodiment of the application also needs to obtain the score function of each data feature corresponding to the target transaction. The transaction data to be complemented under the target data feature is then predicted from the score function and the diffusion equation of the initial transaction data under the other data features.
In order to obtain the score function of each data feature corresponding to the target transaction, the embodiment of the application trains a target function model in advance; the score function is obtained as the output of this preset target function model when each data feature corresponding to the target transaction is input into it.
The target function model is trained as follows: other transactions in the same field as the target transaction are selected, and the initial function model is trained with the transaction data of those transactions. Here, the same field refers to transactions that have the same data features as the target transaction. The initial function model is a neural network with a U-Net structure. The initial function model is trained on the transaction data of the other transactions, and the target function model is obtained when the cut-off condition is reached.
S104, solving the diffusion equation and the scoring function to obtain to-be-completed transaction data under the characteristic of target data, and integrating the initial transaction data and the to-be-completed transaction data into complete transaction data of the target transaction.
After the diffusion equation and the score function are obtained, the embodiment of the application can obtain the transaction data to be complemented under the characteristic of the target data by solving the diffusion equation and the score function. When the diffusion equation and the scoring function are used for solving, the embodiment of the application adopts a mode of approaching the scoring function to the diffusion equation. The embodiment of the application considers that when the score function approaches to the diffusion equation and reaches the preset approximation condition, the score function is in the same or approximate state with the diffusion equation, namely the score function can represent the diffusion equation.
Further, the embodiment of the application converts the process of approaching the scoring function to the diffusion equation into the solving process of the optimal solution: and subtracting the diffusion equation from the score function to obtain a difference function, continuously optimizing the difference function to obtain an optimal solution of the difference function, and taking the optimal solution of the difference function as transaction data to be complemented under the characteristic of target data.
After the data of the transaction to be completed is obtained, the data of the transaction to be completed and the initial transaction data are integrated together, so that the complete transaction data of the target transaction can be obtained.
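The final integration step amounts to merging two sets of values keyed by data feature. A minimal sketch, assuming both sets are held as dictionaries and using the hypothetical name `integrate_transaction`:

```python
from typing import Any, Dict, List


def integrate_transaction(
    feature_names: List[str],
    initial: Dict[str, Any],
    completed: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge initial and complemented values into one complete record.

    Acquired (initial) values take precedence; a feature with no value in
    either source is treated as an error rather than silently dropped.
    """
    complete: Dict[str, Any] = {}
    for name in feature_names:
        if name in initial:
            complete[name] = initial[name]
        elif name in completed:
            complete[name] = completed[name]
        else:
            raise KeyError(f"no value available for data feature {name!r}")
    return complete


# Hypothetical values: "age" was missing and has been complemented.
full_record = integrate_transaction(
    ["name", "gender", "age"],
    {"name": "Zhang San", "gender": "F"},
    {"age": 34},
)
```

Iterating over `feature_names` keeps the complete record in the feature order defined for the transaction.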
In an alternative implementation, the embodiment of the present application may be implemented in the manner shown in fig. 2: establishing a diffusion equation, learning joint distribution and generating missing data under conditions. The specific operation flow is as follows:
Establishing a diffusion equation: in statistical learning tasks, directly learning the joint distribution $p(x_1, \ldots, x_M)$ is very difficult. Thanks to the development of diffusion models, the learning of the joint distribution can be transferred to its Gaussian-noise-perturbed data, and the joint distribution can be approximated by a score function. Thus, a noise-adding diffusion process for the data needs to be established first. Let the clean feature data be the initial state $X(0) = [x_1(0), \ldots, x_M(0)] = X = [x_1, \ldots, x_M]$, to which Gaussian noise is gradually added until the feature signal is submerged. As shown in FIG. 3, taking a "dog" as the first feature and a human face as the M-th feature, noise is gradually added until the signal is submerged.
Mathematically, the Gaussian noise-adding diffusion process can be implemented by the following simultaneous equations:
$x_i(t+1) = x_i(t) + \sigma_i(t)\,\xi_i, \quad i = 1, \ldots, M;$
where $\sigma_i(t)$ represents the intensity of the added noise and $\xi_i$ represents independent and identically distributed standard normal noise. Its compact form can be expressed as:
$X(t+1) = X(t) + \Sigma(t)\,\Xi;$
where $\Sigma(t) = \mathrm{diag}(\sigma_1(t), \ldots, \sigma_M(t))$ and $\Xi = [\xi_1, \ldots, \xi_M]$.
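As a worked consequence of the compact form $X(t+1) = X(t) + \Sigma(t)\,\Xi$, the recursion can be unrolled to give the distribution of the noised data directly. Two assumptions are made here that the text does not state explicitly: $\Sigma(t)$ is diagonal with entries $\sigma_i(t)$, and a fresh independent noise draw, written $\xi_i^{(k)}$, is used at each step $k$. Under these assumptions the sum of independent Gaussian terms is again Gaussian, with the variances adding up:

```latex
x_i(t) = x_i(0) + \sum_{k=0}^{t-1} \sigma_i(k)\,\xi_i^{(k)},
\qquad
x_i(t) \mid x_i(0) \sim \mathcal{N}\!\Big(x_i(0),\ \sum_{k=0}^{t-1} \sigma_i(k)^2\Big).
```

The accumulated variance grows with $t$, which is the sense in which the feature signal is eventually submerged by noise.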
Feature joint distribution approximation: according to the Gaussian noise-adding diffusion process, the joint distribution is approximated through its score, i.e. the gradient of the log probability density of the noised data. This approximation can be obtained by training the score function $s(X(t), t)$, i.e. by optimizing a score-matching objective function,
where the score function $s(X(t), t)$ may be represented by a network with a U-Net structure, as shown in FIG. 4. The optimization problem can be solved with the Adam optimizer.
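A common objective for training a score function on Gaussian-perturbed data is the denoising score-matching loss. It is assumed here for illustration and is not confirmed as the objective of this application. For a clean sample perturbed as X(t) = X(0) + std * xi, the regression target is the score of the perturbation kernel, -xi / std. The sketch below estimates that loss by Monte Carlo for an arbitrary candidate score function; the names `dsm_loss`, `oracle_score` and `zero_score` are hypothetical, and no U-Net or Adam step is included.

```python
import random
from typing import Callable, List

ScoreFn = Callable[[List[float], float], List[float]]


def dsm_loss(
    score_fn: ScoreFn,
    data: List[List[float]],
    noise_std: float,
    n_draws: int = 200,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the denoising score-matching loss at one noise level."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for x0 in data:
        for _ in range(n_draws):
            xi = [rng.gauss(0.0, 1.0) for _ in x0]
            xt = [a + noise_std * e for a, e in zip(x0, xi)]
            target = [-e / noise_std for e in xi]  # score of N(x0, std^2 I) at xt
            pred = score_fn(xt, noise_std)
            total += sum((p - q) ** 2 for p, q in zip(pred, target))
            count += 1
    return total / count


def oracle_score(x: List[float], std: float) -> List[float]:
    # Exact score when all clean data sits at the origin: p_t = N(0, std^2 I).
    return [-v / (std * std) for v in x]


def zero_score(x: List[float], std: float) -> List[float]:
    return [0.0 for _ in x]


toy_data = [[0.0, 0.0]]
oracle_loss = dsm_loss(oracle_score, toy_data, noise_std=0.5)
zero_loss = dsm_loss(zero_score, toy_data, noise_std=0.5)
```

On this toy data set the exact score attains a loss of essentially zero while the all-zero score does not, which is the behavior a trained network is meant to approach as the optimizer reduces the loss.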
Conditional sampling to generate missing data: finally, based on the learned score function (joint distribution), sampling is performed conditioned on the existing feature data to generate the missing data.
Let the data $X$ contain $M$ features, i.e. $X = [x_1, \ldots, x_M]$. In reality, part of the data in these features $x_1, \ldots, x_M$ is missing. If the first $s$ features $x_1, \ldots, x_s$ are the missing feature data and the last $M-s$ features $x_{s+1}, \ldots, x_M$ are the existing feature data, the purpose of the application is to realize the following sampling:
$x_1, \ldots, x_s \sim p(x_1, \ldots, x_s \mid x_{s+1}, \ldots, x_M).$
for simplicity, x is abbreviated i≤s :=x 1 ,...,x s And x i>s :=x s+1 ,...,x M . According to the bayesian formula we have:
where the joint distribution term has been approximated by the score function $s(X(t), t)$ in the previous step. Thus, based on the score function $s(X(t), t)$, the following Monte Carlo sampling iteration is performed:
When the iteration has been performed for sufficiently many steps (e.g., $T$ steps), $x_{i \le s}(T)$ can be regarded as a sample from the conditional distribution $p(x_{i \le s} \mid x_{i > s})$, i.e., the completed missing features.
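The idea of sampling the missing features from the joint score while holding the existing features fixed can be illustrated with plain Langevin dynamics. Because the marginal of the existing features does not depend on the missing ones, the gradient of the conditional log-density with respect to the missing features equals the corresponding part of the joint score. The sketch below is a simplified stand-in and not the application's own iteration: it uses a time-independent score, a toy two-feature Gaussian whose score is known in closed form, and hypothetical names (`conditional_langevin`, `joint_score`).

```python
import numpy as np

def conditional_langevin(score_fn, x_init, observed_mask, step, n_steps, rng):
    """Langevin updates on the missing features only; observed ones stay fixed."""
    x = np.array(x_init, dtype=float)
    missing = ~np.asarray(observed_mask)
    for _ in range(n_steps):
        drift = step * score_fn(x)                        # score of the joint density
        noise = np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        x[..., missing] += (drift + noise)[..., missing]  # update missing part only
    return x

# Toy joint distribution: zero-mean Gaussian with correlation rho (known score).
rho = 0.8
precision = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))

def joint_score(x):
    # Gradient of the log-density of N(0, cov) is -precision @ x (row-vector form).
    return -x @ precision

rng = np.random.default_rng(0)
x_init = np.zeros((5000, 2))
x_init[:, 1] = 1.5                           # existing (observed) feature value
observed_mask = np.array([False, True])      # feature 0 is missing
samples = conditional_langevin(joint_score, x_init, observed_mask, 0.01, 1000, rng)
```

For this Gaussian, the exact conditional distribution of the missing feature has mean rho × 1.5 = 1.2 and variance 1 − rho² = 0.36, so the empirical mean and variance of `samples[:, 0]` give a direct check that the sampler targets the conditional distribution.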
Traditional data completion methods rely on hand-designed priors, such as low-rank priors. However, hand-designed priors struggle to characterize the exact relationship between existing data and missing data, which typically leads to inaccurate completion. If the data are regarded as samples of random variables, the joint distribution is one of the most accurate and direct ways to characterize the correlation between different random variables. To our knowledge, the present application is the first to build a joint diffusion model and use it to learn the joint distribution among the data, and finally relies on sampling from the joint distribution to complete the real data. This is a new method for constructing data based on joint distribution modeling and is an original methodological contribution.
For data distribution estimation, conventional methods typically rely on the principle of variational inference, i.e., approximating a complex distribution by a simple distribution (e.g., a Gaussian distribution). However, the distribution of real data is high-dimensional and complex and is often difficult to approximate with a simple distribution, so conventional methods risk failing in practice. On the other hand, general deep learning methods have strong representation learning capability, and their approximation capability often exceeds that of traditional models; however, they typically work as black boxes, lack the necessary interpretability, and carry a risk of unreliability, which makes them difficult to apply to tasks involving sensitive data. The present application proposes a joint distribution approximation method based on a diffusion model, which is supported by stochastic partial differential equation theory. In this method, the deep neural network approximates the gradient of the (log) probability density function through score matching, which gives the method a degree of interpretability, while also exploiting the strong representation learning capability of deep neural networks. In other words, the method combines strong interpretability with the strong learning and representation capability of deep learning.
Fig. 5 shows a schematic structural diagram of an apparatus for data processing according to an embodiment of the present application, where the apparatus includes:
an acquisition module, configured to acquire, when a target transaction is completed, initial transaction data under each data feature corresponding to the target transaction;
a noise adding module, configured to perform noise adding processing on the initial transaction data under the other data features if transaction data to be completed exists under any target data feature, where the other data features are data features under which initial transaction data can be acquired, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
a model processing module, configured to input each data feature corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
a solving module, configured to solve the diffusion equation and the score function to obtain the transaction data to be completed under the target data feature, and to integrate the initial transaction data and the transaction data to be completed into complete transaction data of the target transaction.
The noise adding processing of the initial transaction data includes:
gradually adding Gaussian noise to the initial transaction data under each of the other data features until the feature signal is submerged.
The Gaussian noise-adding process is determined by the added noise intensity and standard normal noise, and the apparatus obtains the diffusion equation as follows:
obtaining the diffusion equation according to the noise intensity added to the initial transaction data under the other data features and the standard normal noise.
The target function model is obtained by the following steps:
taking transaction data of other transactions in the same field as the target transaction as test data;
inputting the test data into an initial function model to obtain a test result output by the initial function model;
adjusting the initial function model according to the test result until a cut-off condition is reached, and taking the initial function model at that point as the target function model.
The initial function model is a neural network with a U-Net structure.
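The "adjust until a cut-off condition is reached" procedure can be sketched as an ordinary training loop. The example below is a deliberately small stand-in under stated assumptions: the initial function model is a one-parameter linear score s(x) = −θx rather than a U-Net, the loss is a denoising score matching loss, plain gradient descent replaces the Adam optimizer, and the cut-off condition is a small gradient or an iteration cap. The name `train_until_cutoff` and all numeric settings are hypothetical.

```python
import numpy as np

def train_until_cutoff(x0, sigma, lr=0.1, tol=1e-8, max_iters=1000, seed=0):
    """Fit s_theta(x) = -theta * x by gradient descent until a cut-off is reached."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(x0.shape)
    x_noisy = x0 + sigma * eps                    # noise-perturbed data
    theta = 0.0                                   # initial function model
    for it in range(max_iters):
        # Residual of the loss: sigma * s_theta(x_noisy) + eps.
        residual = eps - sigma * theta * x_noisy
        grad = np.mean(-2.0 * sigma * x_noisy * residual)
        if abs(grad) < tol:                       # cut-off condition reached
            break
        theta -= lr * grad                        # adjust the model
    return theta, it

x0 = np.random.default_rng(42).standard_normal(20000)   # toy data from other transactions
theta, n_iters = train_until_cutoff(x0, sigma=1.0)
```

For standard normal toy data and σ = 1, the exact score of the perturbed data is −x/2, so the fitted θ should land close to 0.5, and the loop stops well before the iteration cap.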
Obtaining the transaction data to be completed under the target data feature by solving the diffusion equation and the score function includes:
approximating the score function to the diffusion equation until a preset approximation condition is met, to obtain the transaction data to be completed under the target data feature.
Approximating the score function to the diffusion equation until the preset approximation condition is met, to obtain the transaction data to be completed under the target data feature, includes:
subtracting the diffusion equation from the score function to obtain a difference function;
continuously optimizing the difference function to obtain its optimal solution, and taking the optimal solution as the transaction data to be completed under the target data feature.
As shown in fig. 6, an embodiment of the present application provides an electronic device for performing a method for data processing in the present application, where the device includes a memory, a processor, a bus, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the method for data processing when executing the computer program.
In particular, the above-mentioned memory and processor may be general-purpose memory and processor, and are not particularly limited herein, and the above-mentioned data processing method can be executed when the processor runs a computer program stored in the memory.
Corresponding to the method of data processing in the present application, the embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of data processing described above.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, on which a computer program is executed that is capable of performing the above-described method of data processing.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other manners. The system embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions in actual implementation, and e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, system or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data processing, the method comprising:
when a target transaction is completed, acquiring initial transaction data under each data feature corresponding to the target transaction;
if transaction data to be completed exists under any target data feature, performing noise adding processing on the initial transaction data under the other data features, wherein the other data features are data features under which initial transaction data can be acquired, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
inputting each data feature corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
solving the diffusion equation and the score function to obtain the transaction data to be completed under the target data feature, and integrating the initial transaction data and the transaction data to be completed into complete transaction data of the target transaction.
2. The method of claim 1, wherein the noise adding processing of the initial transaction data comprises:
gradually adding Gaussian noise to the initial transaction data under each of the other data features until the feature signal is submerged.
3. The method of claim 2, wherein the Gaussian noise-adding process is determined by the added noise intensity and standard normal noise, the method obtaining the diffusion equation by:
obtaining the diffusion equation according to the noise intensity added to the initial transaction data under the other data features and the standard normal noise.
4. The method according to claim 1, characterized in that the target function model is obtained by:
taking transaction data of other transactions in the same field as the target transaction as test data;
inputting the test data into an initial function model to obtain a test result output by the initial function model;
adjusting the initial function model according to the test result until a cut-off condition is reached, and taking the initial function model at that point as the target function model.
5. The method of claim 4, wherein the initial function model is a neural network with a U-Net structure.
6. The method according to claim 1, wherein obtaining the transaction data to be completed under the target data feature by solving the diffusion equation and the score function comprises:
approximating the score function to the diffusion equation until a preset approximation condition is met, to obtain the transaction data to be completed under the target data feature.
7. The method of claim 6, wherein approximating the score function to the diffusion equation until the preset approximation condition is met, to obtain the transaction data to be completed under the target data feature, comprises:
subtracting the diffusion equation from the score function to obtain a difference function;
continuously optimizing the difference function to obtain its optimal solution, and taking the optimal solution as the transaction data to be completed under the target data feature.
8. An apparatus for data processing, the apparatus comprising:
an acquisition module, configured to acquire, when a target transaction is completed, initial transaction data under each data feature corresponding to the target transaction;
a noise adding module, configured to perform noise adding processing on the initial transaction data under the other data features if transaction data to be completed exists under any target data feature, wherein the other data features are data features under which initial transaction data can be acquired, and the noise adding process of the initial transaction data corresponds to a diffusion equation;
a model processing module, configured to input each data feature corresponding to the target transaction into a preset target function model to obtain a score function output by the target function model;
a solving module, configured to solve the diffusion equation and the score function to obtain the transaction data to be completed under the target data feature, and to integrate the initial transaction data and the transaction data to be completed into complete transaction data of the target transaction.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the method of data processing according to any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when run by a processor, performs the steps of the method of data processing according to any of claims 1 to 7.
CN202310664570.XA 2023-06-06 2023-06-06 Data processing method and device, electronic equipment and storage medium Pending CN116664265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310664570.XA CN116664265A (en) 2023-06-06 2023-06-06 Data processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116664265A 2023-08-29

Family

ID=87727555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310664570.XA Pending CN116664265A (en) 2023-06-06 2023-06-06 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116664265A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117932245A (en) * 2024-03-21 2024-04-26 华南理工大学 Financial data missing value completion method, device and storage medium
CN117932245B (en) * 2024-03-21 2024-06-11 华南理工大学 Financial data missing value completion method, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination