CN115688934A - Data processing method and device - Google Patents


Info

Publication number
CN115688934A
Authority
CN
China
Prior art keywords
data
feature
characteristic
knowledge base
engineering knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210988043.XA
Other languages
Chinese (zh)
Inventor
李策
邬子庄
杨晓然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210988043.XA
Publication of CN115688934A

Abstract

The embodiments of the present application provide a data processing method and a data processing apparatus for a financial business machine learning model. The method comprises: acquiring feature data of a client and a data name representing the meaning of the feature data; updating a feature engineering knowledge base according to the data names of the feature data, wherein the feature engineering knowledge base stores operational relationships among different data names; performing feature data derivation for the client according to the updated feature engineering knowledge base to obtain derived feature data; and screening the derived feature data, and taking the screened data as the input of the financial business machine learning model. With the embodiments of the present application, feature engineering based on business experience can be implemented automatically and intelligently in a short time without depending on expert experience, so that the potential of the data is fully mined and the model effect is improved.

Description

Data processing method and device
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method and apparatus.
Background
Financial services include marketing, risk control, and the like. In the financial industry, it is common to profile customers based on customer information and to predict certain customer behaviors, including but not limited to the financial products a customer may favor and the customer's credit or default risk, in order to carry out financial services such as marketing and risk control.
In the traditional mode, various financial businesses such as marketing, risk control and the like are performed depending on expert rules, business experiences of workers, statistical models and the like. With the development of information technology, financial services are gradually transformed to rely on digital technology, and especially the application of artificial intelligence technology represented by machine learning greatly improves the accuracy and efficiency of customer portrayal and promotes the rapid development of various financial services.
In the case of machine learning techniques, it is necessary to build a machine learning model that receives various items of feature data representing customer information and outputs a prediction of one or more behaviors of the customer, including but not limited to the probability of purchase of a financial product by the customer, the credit or default risk of the customer, and the like.
However, as with the traditional approach, the construction of a machine learning model still cannot do without the intervention of domain experts: feature engineering that incorporates expert knowledge is needed to construct high-value, business-meaningful features from the raw data, so that the data content is fully expressed, the model can discover more patterns, and the model effect is further improved. Domain experts are relatively scarce, however, so feature engineering is difficult to implement and its implementation period is long.
It should be noted that the above background description is only for the convenience of clear and complete description of the technical solutions of the present application and for the understanding of those skilled in the art. Such solutions are not considered to be known to the person skilled in the art merely because they have been set forth in the background section of the present application.
Disclosure of Invention
In order to solve at least one of the above problems, embodiments of the present application provide a data processing method and apparatus for a financial business machine learning model, which implement feature engineering based on business experience automatically and intelligently in a short time without depending on expert experience, fully mine the potential of the data, and improve the model effect.
According to an embodiment of the first aspect of the present application, there is provided a data processing method for a financial business machine learning model, the method including:
a data acquisition step of acquiring feature data of a client and a data name representing the meaning of the feature data;
a feature engineering knowledge base updating step of updating a feature engineering knowledge base according to the data names of the feature data, wherein the feature engineering knowledge base stores operational relationships among different data names;
a feature derivation step of performing feature data derivation for the client according to the updated feature engineering knowledge base to obtain derived feature data; and
a screening step of screening the derived feature data and taking the screened data as the input of the financial business machine learning model.
In one or more embodiments, in the data acquiring step, the data name is determined according to the acquired description information of the feature data, or the feature data is determined as one of a plurality of preset data names according to the meaning of the feature data.
In one or more embodiments, the feature engineering knowledge base stores operational relationships between different data names in the form of a graph database, the graph database including a plurality of entities, each entity representing a data name, and the relationships between the different entities representing operational operators between the corresponding data names.
In one or more embodiments, the graph database stores a plurality of triples, each of the triples including two different entities and a relationship between the two different entities,
in the feature engineering knowledge base updating step, a triple including the feature name of the acquired feature data of the client is generated, whether the generated triple exists in the graph database is judged, and the feature engineering knowledge base is updated according to the judgment result.
In one or more embodiments, in the feature derivation step, the derived feature data is generated by performing an operation on the obtained feature data of the customer according to the triples in the feature engineering knowledge base.
In one or more embodiments, the derived feature data includes one-dimensional derived feature data generated from one triple in the feature engineering knowledge base and multi-dimensional derived feature data generated from at least two triples in the feature engineering knowledge base.
In one or more embodiments, the screening step screens according to a statistical indicator and/or business significance, wherein the statistical indicator identifies an all-zero feature, an all-one feature, a negative feature, or a low-variance feature.
According to an embodiment of a second aspect of the present application, there is provided a data processing apparatus for a financial business machine learning model, the apparatus comprising:
a data acquisition module for acquiring feature data of a client and a data name representing the meaning of the feature data;
a feature engineering knowledge base updating module for updating a feature engineering knowledge base according to the data names of the feature data, wherein the feature engineering knowledge base stores operational relationships among different data names;
a feature derivation module for performing feature data derivation for the client according to the updated feature engineering knowledge base to obtain derived feature data; and
a screening module for screening the derived feature data and taking the screened data as the input of the financial business machine learning model.
According to an embodiment of other aspects of the present application, there is provided a computer device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein the processor implements the data processing method described in the embodiment of the first aspect of the present application when executing the computer program.
According to an embodiment of other aspects of the present application, there is provided a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data processing method described in the embodiment of the first aspect of the present application.
One of the beneficial effects of the embodiment of the application lies in:
The feature engineering knowledge base is updated according to the data names of the feature data of the client, feature data derivation is performed according to the updated feature engineering knowledge base to obtain derived feature data, and the derived feature data is screened. Therefore, feature engineering based on business experience can be implemented automatically and intelligently in a short time without depending on expert experience, so that the potential of the data is fully mined and the model effect is improved.
Specific embodiments of the present application are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the application may be employed. It should be understood that the embodiments of the present application are not so limited in scope. The embodiments of the application include many variations, modifications and equivalents within the spirit and scope of the appended claims.
The feature information described and illustrated with respect to one embodiment may be used in the same or similar manner in one or more other embodiments, in combination with or instead of the feature information in the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the data acquisition step according to an embodiment of the present application;
FIG. 3 is a schematic diagram of feature engineering knowledge base updating according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a feature derivation step of an embodiment of the present application;
FIG. 5 is a schematic diagram of the configuration of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. The described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Example 1
Embodiment 1 of the present application provides a data processing method for a financial business machine learning model. FIG. 1 is a schematic diagram of a data processing method according to an embodiment of the present application. As shown in FIG. 1, the method includes:
step 100, acquiring feature data of a client and a data name representing the meaning of the feature data;
step 200, updating a feature engineering knowledge base according to the data names of the feature data, wherein the feature engineering knowledge base stores operational relationships among different data names;
step 300, performing feature data derivation for the client according to the updated feature engineering knowledge base to obtain derived feature data; and
step 400, screening the derived feature data, and taking the screened data as the input of the financial business machine learning model.
In this way, the feature engineering knowledge base is updated according to the data names of the feature data of the client, feature data derivation is performed according to the updated feature engineering knowledge base to obtain derived feature data, and the derived feature data is screened. Therefore, by maintaining and using the feature engineering knowledge base, the relationships between features that are explicitly or implicitly expressed in the knowledge base are fully utilized, so that feature engineering based on business experience can be implemented automatically and intelligently in a short time without depending on expert experience, the potential of the data is fully mined, and the model effect is improved.
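The four steps above can be pictured as a small pipeline. The following is a minimal sketch only: every function name, the tuple layout of a triple, the naming of derived features, and the placeholder screening rule are assumptions made for illustration, since the application describes the steps but does not define an API.

```python
import math
from typing import Callable

# Hypothetical representation: (data name, relation, data name).
Triple = tuple[str, str, str]

# Hypothetical operator table; a zero denominator yields NaN (an assumption).
OPERATORS: dict[str, Callable[[float, float], float]] = {
    "proportion": lambda a, b: a / b if b else math.nan,
    "sum": lambda a, b: a + b,
}


def update_knowledge_base(kb: set[Triple], candidates: list[Triple]) -> set[Triple]:
    """Step 200: add candidate triples that are not stored yet."""
    return kb | set(candidates)


def derive_features(kb: set[Triple], row: dict[str, float]) -> dict[str, float]:
    """Step 300: apply each triple whose two data names are present in the row."""
    derived = {}
    for head, relation, tail in sorted(kb):
        if head in row and tail in row:
            derived[f"{head}-{relation}-{tail}"] = OPERATORS[relation](row[head], row[tail])
    return derived


def screen_features(derived: dict[str, float]) -> dict[str, float]:
    """Step 400 (placeholder rule): keep only finite derived values."""
    return {name: value for name, value in derived.items() if math.isfinite(value)}


def run_pipeline(row: dict[str, float], kb: set[Triple], candidates: list[Triple]) -> dict[str, float]:
    """Steps 200 to 400 for one client whose feature data (step 100) is `row`."""
    kb = update_knowledge_base(kb, candidates)
    derived = derive_features(kb, row)
    # The raw data and the screened derived data together form the model input.
    return {**row, **screen_features(derived)}
```

For example, calling `run_pipeline` with a row holding an annual income amount and an annual expenditure amount, a knowledge base holding their sum triple, and a candidate proportion triple returns the two raw values plus two derived values.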
In the embodiment of the present application, the financial business may include marketing, risk control, and the like, that is, the data processing method in the embodiment of the present application may be used for marketing of financial products, risk control of banks, for example, control of default risk after customer credit, which is not limited in the present application.
The steps of the above method are described in detail below.
Step 100, acquiring characteristic data of a client and a data name representing the meaning of the characteristic data.
Fig. 2 is a schematic diagram of a data acquisition step according to an embodiment of the present application.
As shown in fig. 2, step 100 may include:
step 101, collecting customer information.
In the embodiment of the present application, relevant customer information needs to be collected for the financial business to be modeled. The customer information comprises raw data that has not undergone feature processing and serves as the data basis for the subsequent feature engineering. In some embodiments, the customer information includes the full set of information involved in the modeling task, i.e., all customer information related to the specific financial business being modeled.
In the embodiment of the present application, the customer information includes basic information of the customer, such as name, gender, nationality, contact address, identification number, occupation, educational background, marital status, etc., and may also include financial information of the customer, including but not limited to deposit information, income information, financial product purchase information, etc.
In the embodiment of the present application, there is no limitation on how to obtain the raw data of the client, for example, the raw data may be input by the client when the client handles related business, or extracted from a database of a bank, and the present application is not limited to this.
It should be noted that the user (client) information in the embodiment of the present application is obtained through legal compliance means, and the user (client) information is obtained, stored, used, processed, and the like through authorization approval of the user (client).
As shown in fig. 2, in one or more embodiments, step 100 further comprises:
step 102, defining the feature data.
In the embodiment of the present application, for various information of the client collected in step 101, processing is required to facilitate subsequent processing.
In one or some embodiments, step 102 may include determining a data name according to the obtained description information of the feature data, where the data name may also be referred to as a feature name, or referred to as an attribute, and/or determining the feature data as one of a plurality of preset data names according to the meaning of the feature data.
For example, when the feature data of the client is obtained, corresponding description information may be obtained, where the description information may include a data name, a data type, and a data value range of the feature data, so that the data name of the feature data of the client can be determined based on the description information in the obtained client information.
For another example, the feature data of the client collected in step 101 may be classified and analyzed one by one, so as to implement feature data definition. That is, according to the financial business related to the model, the corresponding multiple data names are preset, and then according to the meaning of the collected feature data, the data is classified into one of the preset multiple data names, so as to realize the definition of the feature data. Taking a common intelligent marketing scene of the financial industry as an example, the method can be classified according to the business field, and can be generally divided into dimensions such as basic attribute information of customers, asset information of customers, consumption information of customers, income information of customers, credit information of customers and the like, and can synchronously acquire data types, value ranges and other information of each type of data, so that the data can be classified into corresponding types and defined as corresponding data names under the condition of acquiring characteristic data of the customers.
Further, step 102 may include determining a data name according to the description information of the acquired feature data, and determining the feature data as one of a plurality of data names set in advance according to the meaning of the feature data. For example, this processing method can be adopted for a case where the data name of the acquired feature data is not included in the plurality of data names set in advance.
In this embodiment of the application, multiple items of feature data are acquired for the client. Feature data may also be referred to as a feature value and represents a specific value of a feature name (or data name, or attribute); for example, when the feature name is gender or deposit, the corresponding feature data may be female or 5000. The processing of step 102 is performed on each item of feature data to obtain its data name and, further, its data type and data value range.
For example, in an intelligent marketing scenario, the predefined data names may include an annual income amount and an annual expenditure amount, and the corresponding data types may both be defined as continuous.
For client A, when the acquired client information includes, in addition to these two items of feature data, feature data whose data name is the annual high-end consumption amount, the data names for client A may include the annual income amount, the annual expenditure amount, and the annual high-end consumption amount, and the corresponding data types may all be defined as continuous. The definition of the acquired feature data of the client is thereby achieved.
It should be noted that the above data names are merely exemplary. The embodiments of the present application may include other data names, for example by adding data names not included in the above examples, such as customer education level, customer age, and customer occupation, or by deleting one or more of the data names in the above examples. The present application is not limited in this respect, and the data names may be determined according to actual needs.
In addition to continuous feature data, discrete feature data may be included; for example, the customer education level, the customer age, and the customer occupation may be discrete.
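The feature definition of step 102 can be sketched as a small record plus a lookup. This is an illustrative sketch only: the class `FeatureDefinition`, the function `define_feature`, and the keys of the description dictionary are hypothetical names, not taken from the application. It reuses a preset definition when the data name is already known and otherwise falls back to the description information that accompanies the feature data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical record of what step 102 produces for one item of feature data."""
    data_name: str
    data_type: str                       # "continuous" or "discrete"
    value_range: Optional[tuple] = None  # e.g. (0, None) for non-negative amounts


# Preset data names for an intelligent marketing scenario (illustrative values).
PRESETS = {
    "annual income amount": FeatureDefinition("annual income amount", "continuous", (0, None)),
    "annual expenditure amount": FeatureDefinition("annual expenditure amount", "continuous", (0, None)),
}


def define_feature(description: dict, presets: dict) -> FeatureDefinition:
    """Use the preset definition if the data name is known, else build one from the description."""
    name = description["data_name"]
    if name in presets:
        return presets[name]
    return FeatureDefinition(name, description["data_type"], description.get("value_range"))


# Client A brings a data name that is not among the presets.
new_feature = define_feature(
    {"data_name": "annual high-end consumption amount", "data_type": "continuous"},
    PRESETS,
)
```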
Next, in step 200, the feature engineering knowledge base is updated according to the data name of the feature data.
In the embodiment of the application, the feature engineering knowledge base stores the operation relation between different data names.
In one or more embodiment modes, the feature engineering knowledge base stores operational relations between different data names in a graph database mode, the graph database comprises a plurality of entities, each entity represents one data name, and the relations between the different entities represent operational operators between the corresponding data names. However, the present application is not limited to this, and the operational relationship between different data names may be stored in other ways, and the following description will be given by taking a graph database as an example.
In one or more embodiments, the graph database stores a plurality of triples, each triple including two different entities and a relationship between them. The relationships include, but are not limited to, proportion, sum, and the like. A proportion relationship may indicate whether one of the two entities is a part of the other, with a result of yes or no; for example, if payroll income is a part of total income, the proportion relationship is yes. The present application is not limited thereto, however; for example, a proportion relationship may also indicate a ratio, i.e., the ratio of one of the two entities to the other.
That is, the feature engineering knowledge base is constructed and maintained in the form of a knowledge graph and is stored in a graph database. Each data name is an entity in the knowledge graph, and a feature derivation method between two entities is defined as a relationship, including but not limited to proportion and sum. The specific storage form is the entity-relationship-entity triple. The feature engineering knowledge base may also be called a feature engineering prior knowledge base.
Taking the aforementioned intelligent marketing scenario as an example, the feature engineering knowledge base may include relationships between pairs of the three data names annual income amount, annual expenditure amount, and customer marital status, for example "annual expenditure amount-proportion-annual income amount" and "annual expenditure amount-sum-annual income amount", as in the "feature engineering knowledge base before update" shown in FIG. 3.
That is, the feature engineering knowledge base may include only the relationship between two of some names among the predefined data names, but the present application is not limited thereto, and the feature engineering knowledge base may also include the relationship between two of all names among the predefined data names.
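The entity-relationship-entity storage form can be illustrated with an in-memory stand-in for the graph database. This is a sketch under assumptions: the names `Triple` and `FeatureKnowledgeBase` are hypothetical, and a real deployment would use an actual graph database rather than a Python set and dictionary.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One entity-relationship-entity record; the field names are illustrative."""
    head: str      # data name of the first entity
    relation: str  # operator, e.g. "proportion" or "sum"
    tail: str      # data name of the second entity


class FeatureKnowledgeBase:
    """In-memory stand-in for the graph database (an assumption for illustration)."""

    def __init__(self) -> None:
        self.triples: set = set()
        self._outgoing = defaultdict(list)

    def add(self, triple: Triple) -> bool:
        """Store the triple; return False if it was already present."""
        if triple in self.triples:
            return False
        self.triples.add(triple)
        self._outgoing[triple.head].append(triple)
        return True

    def neighbors(self, data_name: str) -> list:
        """Triples that start at the given entity."""
        return list(self._outgoing.get(data_name, []))


kb = FeatureKnowledgeBase()
kb.add(Triple("annual expenditure amount", "proportion", "annual income amount"))
kb.add(Triple("annual expenditure amount", "sum", "annual income amount"))
```

Indexing triples by their first entity keeps the lookup of directly connected entities cheap, which is the access pattern the derivation steps rely on.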
In one or more embodiments, in step 200, a triplet including the feature name of the obtained client's feature data is generated, whether the generated triplet exists in the graph database is determined, and the feature engineering knowledge base is updated according to the determination result.
For example, taking client A as an example, the feature names of the feature data of client A include the annual income amount, the annual expenditure amount, and the annual high-end consumption amount, which give 3 pairwise combinations of feature names. The operational relationships generated for these pairs of feature names are compared with the triples in the feature engineering knowledge base, and if there is no repetition, conflict, or the like, the generated triples are added to the feature engineering knowledge base, thereby completing the update.
For example, two triples, "annual expenditure amount-proportion-annual income amount" and "annual high-end consumption amount-proportion-annual expenditure amount", are generated from the annual income amount, the annual expenditure amount, and the annual high-end consumption amount and compared with the triples in the feature engineering knowledge base. Since the knowledge base includes "annual expenditure amount-proportion-annual income amount" but does not include "annual high-end consumption amount-proportion-annual expenditure amount", the latter is newly added to the knowledge base to complete the update, as in the "updated feature engineering knowledge base" shown in FIG. 3.
It should be noted that the above is only an exemplary illustration; for example, a triple expressing the relationship between the data names annual income amount and annual high-end consumption amount may also be generated according to the feature data of the client and added to the feature engineering knowledge base.
In the embodiment of the application, the updating of the feature engineering knowledge base can be performed on a specific service to be modeled, that is, for a specific service, such as marketing or risk control, the feature engineering knowledge base can be updated according to a feature name related to the service, so that derived features highly related to the service can be derived by using the feature engineering knowledge base, and the accuracy of a machine learning model of the corresponding service is improved.
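The update of step 200 can be sketched as a membership check over candidate triples. This is a minimal sketch with hypothetical function names. How the relation for each pair of feature names is chosen is not specified in the text, so the candidate triples are simply passed in; conflict detection is also left out because the text does not define what counts as a conflict.

```python
from itertools import combinations


def pairwise_names(feature_names: list) -> list:
    """All unordered pairs of a client's feature names (3 names give 3 pairs)."""
    return list(combinations(feature_names, 2))


def update_knowledge_base(kb: set, candidates: list) -> list:
    """Add each candidate triple that is not yet stored and return the ones added."""
    added = []
    for triple in candidates:
        if triple not in kb:  # repetition check only; conflict rules are not modeled
            kb.add(triple)
            added.append(triple)
    return added


# Knowledge base before the update, as in the example for client A.
kb = {
    ("annual expenditure amount", "proportion", "annual income amount"),
    ("annual expenditure amount", "sum", "annual income amount"),
}
candidates = [
    ("annual expenditure amount", "proportion", "annual income amount"),
    ("annual high-end consumption amount", "proportion", "annual expenditure amount"),
]
added = update_knowledge_base(kb, candidates)
```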
Next, in step 300, feature data derivation is performed for the client according to the updated feature engineering knowledge base to obtain derived feature data.
In the case that the feature engineering knowledge base stores triples, in step 300, operations are performed on the acquired feature data of the client according to the triples in the feature engineering knowledge base to generate the derived feature data.
That is to say, the corresponding feature data is extracted according to the entity-relationship-entity triples stored in the feature engineering prior knowledge base, and the operator corresponding to the relationship is applied to generate a new feature, which can be named: feature 1 name-operator name-feature 2 name-derivation information. The transitivity of relationships can also be considered when performing the operations, which makes it possible to compute features that are not manually defined but are implied by the feature engineering prior knowledge base.
The following is an exemplary description.
FIG. 4 is a schematic diagram of a feature derivation step of an embodiment of the present application.
As shown in fig. 4, step 300 includes:
step 301, generating one-dimensional derived feature data according to one triple in the feature engineering knowledge base.
That is, only the relationship between two directly associated entities is considered: the target entity is used as the starting node, the other entities directly connected to it are searched for, and the relationships are extracted for feature derivation. Specifically, suppose the feature engineering prior knowledge base has the predefined triples feature 1-proportion-feature 2, feature 1-sum-feature 3, and feature 2-proportion-feature 3. Then, with feature 1 as the starting node and feature 2 directly associated with feature 1, the proportion operator is used to calculate
$$\frac{\text{feature 1}}{\text{feature 2}}$$
and the new feature is named: feature 1 name-divide-feature 2 name-1 degree derivative. Likewise, with feature 3 directly associated with feature 1, the sum operator is used to calculate feature 1 + feature 3, and the new feature is named: feature 1 name-plus-feature 3 name-1 degree derivative.
For example, taking the above-mentioned intelligent marketing scenario as an example, the updated feature engineering knowledge base includes the three triples "annual expenditure amount-sum-annual income amount", "annual expenditure amount-proportion-annual income amount", and "annual high-end consumption amount-proportion-annual expenditure amount". For client A, three items of 1-degree derived feature data are obtained by using the three triples. In addition, for all other clients, the three triples in the updated feature engineering knowledge base are likewise used for 1-degree feature data derivation.
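The 1-degree derivation can be sketched as one pass over the stored triples. The sketch below is illustrative only: the function name, the numeric values for client A, and the choice to skip a proportion whose denominator is zero are assumptions; the naming pattern follows the "name-divide-name-1 degree derivative" convention described above.

```python
# Word used in the derived feature name for each relation, following the text's examples.
NAME_WORD = {"proportion": "divide", "sum": "plus"}


def derive_one_degree(triples: list, row: dict) -> dict:
    """Apply each triple directly to one client's feature data."""
    derived = {}
    for head, relation, tail in triples:
        if head not in row or tail not in row:
            continue  # the client lacks one of the two features
        if relation == "proportion":
            if row[tail] == 0:
                continue  # assumption: skip undefined ratios
            value = row[head] / row[tail]
        elif relation == "sum":
            value = row[head] + row[tail]
        else:
            continue  # relations other than proportion and sum are not modeled here
        derived[f"{head}-{NAME_WORD[relation]}-{tail}-1 degree derivative"] = value
    return derived


updated_kb = [
    ("annual expenditure amount", "sum", "annual income amount"),
    ("annual expenditure amount", "proportion", "annual income amount"),
    ("annual high-end consumption amount", "proportion", "annual expenditure amount"),
]
# Illustrative values for client A (not from the application).
client_a = {
    "annual income amount": 100000.0,
    "annual expenditure amount": 40000.0,
    "annual high-end consumption amount": 10000.0,
}
one_degree = derive_one_degree(updated_kb, client_a)
```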
In one or more embodiments, as shown in FIG. 4, step 300 may further include:
step 302, generating multi-dimensional derived feature data according to at least two triples in the feature engineering knowledge base.
That is, the relationships between indirectly associated entities are considered. Specifically, the target node is used as the initial node, an intermediate node directly associated with it is searched for first, and the intermediate node is then used as a starting point to search for a final node directly associated with the intermediate node. The relationship between the initial node and the final node is extracted, and feature derivation is performed. In other words, if the feature engineering prior knowledge base has the predefined triples feature 1-proportion-feature 2 and feature 2-proportion-feature 3, and the initial node is feature 1, the intermediate node is feature 2, and the final node is feature 3, then the relationship between feature 1 and feature 3 is proportion, and the proportion operator is used to calculate
$$\frac{\text{feature 1}}{\text{feature 3}}$$
and the new feature is named: feature 1 name-divide-feature 3 name-2 degree derivative. By analogy, derived features of 3 degrees and higher can be calculated.
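One way to see why two chained proportion relationships again yield a proportion is the identity below, where $f_1$, $f_2$, $f_3$ are shorthand introduced here for the values of feature 1, feature 2, and feature 3 (assumed nonzero where they appear as denominators):

```latex
$$
\frac{f_1}{f_3} \;=\; \frac{f_1}{f_2}\cdot\frac{f_2}{f_3},
\qquad f_2 \neq 0,\; f_3 \neq 0 .
$$
```

The 2-degree derived feature is therefore the product of the two 1-degree proportions along the path, and the same argument extends to longer chains.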
For example, in the intelligent marketing scenario described above, the updated feature engineering knowledge base includes three triples: "annual expenditure amount-plus-annual income amount", "annual expenditure amount-proportion-annual income amount", and "annual high-end consumption amount-proportion-annual expenditure amount". In this case, the updated feature engineering knowledge base may be used to derive 2-degree feature data. For example, from the two triples "annual expenditure amount-proportion-annual income amount" and "annual high-end consumption amount-proportion-annual expenditure amount", the following 2-degree feature data can be derived:
$$\frac{\text{annual high-end consumption amount}}{\text{annual income amount}}$$
The data name of this derived feature data may be: annual high-end consumption amount-divide-annual income amount-2 degree derivative.
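One way to see why chaining the two proportion triples relates the first node to the last is that the intermediate quantity cancels. The symbols below are introduced only for this note: H is the annual high-end consumption amount, E the annual expenditure amount, and I the annual income amount.

```latex
\[
\frac{H}{I} \;=\; \frac{H}{E} \cdot \frac{E}{I}, \qquad E \neq 0,\; I \neq 0
\]
```

The 2-degree feature is therefore the product of the two 1-degree proportion features, which is consistent with naming it after the initial and final nodes only.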
Therefore, the derived feature data comprise one-dimensional (1-degree) derived feature data and multidimensional (multi-degree) derived feature data, so that the potential of the data can be fully mined and the effect of the model is improved.
In the embodiment of the present application, step 300 derives a large number of features, but not all derived features are suitable as input to the machine learning model.
Therefore, in the embodiment of the present application, the derived feature data is screened in step 400, and the raw data together with the screened data are used as the input of the financial business machine learning model. In this way, the feature data input to the financial business machine learning model can be selected accurately, and the effect of the machine learning model can be effectively improved.
In the embodiment of the present application, the screening step performs screening according to a statistical index and/or business significance. The statistical index identifies all-zero features, all-one features, negative features, or low-variance features; such features carry little information, and deleting them may be considered. In this way, appropriate feature data can be screened out.
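The statistical screening can be sketched as follows. This is a sketch under assumptions rather than the screening of the application: the variance threshold `min_variance` is a hypothetical parameter, and "negative feature" is interpreted here as a feature whose values are all negative, which the text does not define precisely.

```python
from statistics import pvariance
from typing import Dict, List


def screen_by_statistics(
    columns: Dict[str, List[float]], min_variance: float = 1e-6
) -> Dict[str, List[float]]:
    """Keep only derived features that pass the four statistical checks."""
    kept: Dict[str, List[float]] = {}
    for name, values in columns.items():
        if not values:
            continue  # nothing to evaluate
        all_zero = all(v == 0 for v in values)
        all_one = all(v == 1 for v in values)
        negative = all(v < 0 for v in values)  # assumed meaning of "negative feature"
        low_variance = pvariance(values) < min_variance  # assumed threshold
        if all_zero or all_one or negative or low_variance:
            continue  # little information content, so the feature is dropped
        kept[name] = values
    return kept


# Illustrative columns: one value per client for each derived feature.
columns = {
    "derived a": [0.0, 0.0, 0.0],
    "derived b": [1.0, 1.0, 1.0],
    "derived c": [-1.0, -2.0, -3.0],
    "derived d": [0.5, 0.5000001, 0.5],
    "derived e": [0.1, 0.4, 0.9],
}
kept = screen_by_statistics(columns)
```

All-zero and all-one features also have zero variance, so the low-variance check alone would remove them; the separate checks are kept only to mirror the four categories named in the text.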
In the embodiment of the present application, the screening step may further include business significance screening, that is, checking business rationality according to the business meaning of the new feature or the name of the new feature. In particular, features derived at 2 degrees or higher are checked for business rationality, so as to avoid unreasonable features generated by incorrect information propagation. In this way, appropriate feature data can be screened out, which improves the efficiency and accuracy of the machine learning model and thus its effect.
According to the embodiment of the application, the feature engineering knowledge base is updated according to the data names of the client's feature data, feature data derivation is performed according to the updated feature engineering knowledge base to obtain derived feature data, and the derived feature data is screened. Therefore, without relying on expert experience, feature engineering based on business experience can be carried out in an automated and intelligent manner, so that the potential of the data is fully mined in a short time and the model effect is improved.
Example 2
The embodiment of the present application further provides a data processing apparatus, which is used for a financial business machine learning model and corresponds to the data processing method in embodiment 1. The implementation of the apparatus may therefore refer to the implementation of the data processing method in embodiment 1, and repeated descriptions are omitted.
Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 5, the data processing apparatus 10 includes a data acquisition module 11, a feature engineering knowledge base update module 12, a feature derivation module 13, and a filtering module 14, wherein,
the data acquisition module 11 acquires the feature data of the client and the data name representing the meaning of the feature data;
the feature engineering knowledge base updating module 12 updates the feature engineering knowledge base according to the data names of the feature data, wherein the feature engineering knowledge base stores operation relations among different data names;
the feature derivation module 13 performs feature data derivation on the client according to the updated feature engineering knowledge base to obtain derived feature data;
the screening module 14 screens the derived feature data, and uses the screened data as the input of the financial business machine learning model.
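The cooperation of the four modules can be sketched as a simple pipeline. The class name `DataProcessingApparatus`, the callable signatures, and the stub behaviours below are assumptions made for illustration; the application does not prescribe a concrete programming interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head data name, operator name, tail data name)
Features = Dict[str, float]    # data name -> value for one client


@dataclass
class DataProcessingApparatus:
    acquire: Callable[[], Features]                                         # module 11
    update_knowledge_base: Callable[[Set[Triple], List[str]], Set[Triple]]  # module 12
    derive: Callable[[Features, Set[Triple]], Features]                     # module 13
    screen: Callable[[Features], Features]                                  # module 14

    def run(self, knowledge_base: Set[Triple]) -> Features:
        features = self.acquire()
        knowledge_base = self.update_knowledge_base(knowledge_base, list(features))
        derived = self.derive(features, knowledge_base)
        return self.screen(derived)


# Stub modules with illustrative data, only to show the data flow.
apparatus = DataProcessingApparatus(
    acquire=lambda: {
        "annual income amount": 200000.0,
        "annual expenditure amount": 80000.0,
    },
    update_knowledge_base=lambda kb, names: kb
    | {("annual expenditure amount", "proportion", "annual income amount")},
    derive=lambda feats, kb: {
        f"{h}-{op}-{t}": feats[h] / feats[t]
        for h, op, t in kb
        if op == "proportion" and h in feats and t in feats
    },
    screen=lambda derived: {k: v for k, v in derived.items() if v != 0},
)
model_input = apparatus.run(set())
```

Passing the modules in as callables keeps each one replaceable, which matches the modular description of the apparatus; a real system would substitute a graph database lookup and the full screening logic for the stubs.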
According to the embodiment of the application, the feature engineering knowledge base is updated according to the data names of the client's feature data, feature data derivation is performed according to the updated feature engineering knowledge base to obtain derived feature data, and the derived feature data is screened. Therefore, without relying on expert experience, feature engineering based on business experience can be carried out in an automated and intelligent manner, so that the potential of the data is fully mined in a short time and the model effect is improved.
The embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the data processing method when executing the computer program.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data processing method described above.
It should be noted that, in the technical solution of the present application, the acquisition, storage, usage, processing, etc. of data all conform to the relevant regulations of the national laws and regulations.
Although the present application provides method steps as described in an embodiment or flowchart, additional or fewer steps may be included based on conventional or non-inventive efforts. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of sequences, and does not represent a unique order of performance. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, apparatus (system) or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and the relevant parts of the method embodiment's description may be referred to. In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "upper", "lower", and the like refer to an orientation or positional relationship based on that shown in the drawings, merely for convenience and simplicity in describing the present application; they do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the present application. Unless expressly stated or limited otherwise, the terms "mounted," "coupled," and "connected" are to be interpreted broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements.
The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate. It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present application may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; these modifications and substitutions do not depart from the spirit of the embodiments of the present application, and they should be construed as being included in the scope of the claims and description of the present application.

Claims (10)

1. A data processing method for a financial business machine learning model, the method comprising:
a data acquisition step of acquiring feature data of a client and a data name representing the meaning of the feature data;
a feature engineering knowledge base updating step of updating a feature engineering knowledge base according to the data name of the feature data, wherein the feature engineering knowledge base stores operation relations among different data names;
a feature derivation step of performing feature data derivation on the client according to the updated feature engineering knowledge base to obtain derived feature data; and
a screening step of screening the derived feature data, and taking the screened data as the input of the financial business machine learning model.
2. The method of claim 1,
in the data acquisition step, the data name is determined according to the acquired description information of the feature data, and/or the data name of the feature data is determined to be one of multiple preset data names according to the meaning of the feature data.
3. The method of claim 1,
the feature engineering knowledge base stores operational relations among different data names in a graph database mode, the graph database comprises a plurality of entities, each entity represents one data name, and the relations among the different entities represent operational operators among the corresponding data names.
4. The method of claim 3,
the graph database storing a plurality of triples, each of the triples including two different entities and a relationship between the two different entities,
in the feature engineering knowledge base updating step, a triple including the data name of the obtained feature data of the client is generated, whether the generated triple exists in the graph database is judged, and the feature engineering knowledge base is updated according to the judgment result.
5. The method of claim 4,
in the feature derivation step, the obtained feature data of the client is operated according to the triples in the feature engineering knowledge base to generate the derived feature data.
6. The method of claim 5, wherein the feature derivation step comprises:
generating one-dimensional derived feature data according to one triple in the feature engineering knowledge base; and
generating multi-dimensional derived feature data according to at least two triples in the feature engineering knowledge base.
7. The method of claim 1,
in the screening step, screening is performed according to a statistical index and/or business significance, wherein the statistical index represents an all-zero feature, an all-one feature, a negative feature or a low-variance feature.
8. A data processing apparatus for a financial business machine learning model, the apparatus comprising:
a data acquisition module, used for acquiring feature data of a client and a data name representing the meaning of the feature data;
a feature engineering knowledge base updating module, used for updating a feature engineering knowledge base according to the data name of the feature data, wherein the feature engineering knowledge base stores operation relations among different data names;
a feature derivation module, used for performing feature data derivation on the client according to the updated feature engineering knowledge base to obtain derived feature data; and
a screening module, used for screening the derived feature data and taking the screened data as the input of the financial business machine learning model.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210988043.XA 2022-08-17 2022-08-17 Data processing method and device Pending CN115688934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210988043.XA CN115688934A (en) 2022-08-17 2022-08-17 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210988043.XA CN115688934A (en) 2022-08-17 2022-08-17 Data processing method and device

Publications (1)

Publication Number Publication Date
CN115688934A true CN115688934A (en) 2023-02-03

Family

ID=85061303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210988043.XA Pending CN115688934A (en) 2022-08-17 2022-08-17 Data processing method and device

Country Status (1)

Country Link
CN (1) CN115688934A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination