CN110688433B - Path-based feature generation method and device - Google Patents


Info

Publication number
CN110688433B
Authority
CN
China
Prior art keywords
entity
path
relationship
target entity
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911254655.0A
Other languages
Chinese (zh)
Other versions
CN110688433A (en)
Inventor
卢翠兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Data Services Co ltd
Original Assignee
China Unionpay Data Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Data Services Co ltd
Priority to CN201911254655.0A
Publication of CN110688433A
Application granted
Publication of CN110688433B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a path-based feature generation method and device. The method comprises: acquiring entities and attributes in a data set; generating, according to the entities, a directed relationship set comprising a plurality of first directed relationships; if a first entity and a second entity in a first directed relationship are determined to have the same attribute, generating a virtual class entity according to that attribute, generating a second directed relationship according to the virtual class entity and the first directed relationship, and adding the second directed relationship to the directed relationship set; generating a relationship path set of a target entity according to the directed relationship set and the target entity; and generating a feature set of the target entity according to its relationship path set. The technical solution automatically generates target features of the target entity, saves labor cost, and yields an interpretable feature set.

Description

Path-based feature generation method and device
Technical Field
The embodiments of the invention relate to the technical field of artificial intelligence, and in particular to a path-based feature generation method and device.
Background
Business development in enterprises accumulates large amounts of multidimensional structured data. Compared with rules formulated by experts, machine learning models are generally better at processing such massive data efficiently and effectively in pursuit of a business objective, and mining valid features from the data is crucial to model performance.
In conventional feature extraction, an expert performs feature engineering manually, drawing on the business objective and accumulated practical experience to determine the feature set used for modeling. This approach consumes considerable manpower and time, is strongly affected by the expert's subjectivity, and produces feature sets built to inconsistent standards. Automatic feature engineering, by contrast, generally relies on brute-force combination, and the resulting features are difficult to interpret.
Disclosure of Invention
The embodiments of the invention provide a path-based feature generation method and device for automatically generating target features of a target entity, which saves labor cost and makes the generated feature set interpretable.
The embodiment of the invention provides a path-based feature generation method, which comprises the following steps:
acquiring entities and attributes in a data set; the entities comprise behavior entities and physical entities;
generating, according to the entities in the data set, a directed relationship set comprising a plurality of first directed relationships; a first directed relationship comprises a first entity pointing to a second entity, where the first entity and the second entity are any two entities in the data set that have a many-to-one relationship in the data set;
for any first directed relationship in the directed relationship set, if the first entity and the second entity in the first directed relationship are determined to have the same attribute, generating a virtual class entity according to that attribute; generating a second directed relationship according to the virtual class entity and the first directed relationship, and adding the second directed relationship to the directed relationship set; the second directed relationship comprises the first entity pointing to the virtual class entity and the virtual class entity pointing to the second entity;
generating a relationship path set of the target entity according to the directed relationship set and the target entity;
generating a feature set of the target entity according to the relationship path set of the target entity; the feature set of the target entity is used in model training with the target entity as a sample.
Optionally, the generating a relationship path set of the target entity according to the directed relationship set and the target entity includes:
sequentially combining the directed relations in the directed relation set to generate a plurality of relation paths pointing to the target entity;
and forming the plurality of relationship paths into a relationship path set of the target entity.
Optionally, the sequentially combining the directed relationships in the directed relationship set to generate a plurality of relationship paths pointing to the target entity includes:
if a first relationship path that contains only two entity types and has a path length of 1 is generated, updating the first relationship path, according to the attribute shared by the two entity types, into a second relationship path of path length 2 in which the shared attribute connects the two entity types;
wherein, the path length refers to the number of directed relationships in the relationship path.
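As a rough illustration of this update, the sketch below models a relationship path as a tuple of entity names and rewrites a length-1 path into length-2 paths through each shared attribute. The `virtual:` name prefix, the function name `expand_direct_path` and the toy attribute lists are hypothetical choices for this sketch, not part of the described method.

```python
def expand_direct_path(path, attributes):
    """Rewrite a length-1 path (a, b) into (a, virtual:attr, b) paths.

    path:       tuple of entity names; path length = len(path) - 1 relations.
    attributes: {entity: [attribute names]} (hypothetical toy schema).
    Paths of any other length, or without a shared attribute, are kept as-is.
    """
    if len(path) != 2:                 # only paths with exactly one relation
        return [path]
    a, b = path
    shared = sorted(set(attributes[a]) & set(attributes[b]))
    if not shared:
        return [path]
    # One length-2 path per shared attribute, routed through the virtual entity.
    return [(a, f"virtual:{attr}", b) for attr in shared]


attributes = {
    "transaction": ["amount", "time"],
    "merchant": ["category", "city"],
    "customer": ["gender", "city", "amount"],
}
print(expand_direct_path(("merchant", "customer"), attributes))
# [('merchant', 'virtual:city', 'customer')]
```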
Optionally, the generating a feature set of the target entity according to the relationship path set of the target entity includes:
generating, in increasing order of path length, the feature set of the target entity corresponding to each relationship path in the relationship path set, and recording the attribute corresponding to each feature;
wherein the generating the feature set of the target entity corresponding to the relationship path according to the relationship path includes:
recursively generating sub-relationship paths that point to the remaining entities other than the target entity; for each sub-relationship path, determining temporary features of the remaining entities according to their data in the data set; the remaining entities include real entities and virtual class entities;
and sequentially performing an aggregation operation and a combination operation according to the temporary features of the remaining entities and the relationship path, to generate the feature set of the target entity corresponding to the relationship path.
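The recursive aggregation can be sketched as follows. Assumptions for this sketch: every feature is stored as a `(value, attribute)` pair so that later combination steps can check attribute reachability; aggregation is limited to COUNT, SUM and MEAN; and the names `path_features`, `raw` and `links` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def path_features(path, raw, links):
    """Features of path[-1] (the target) computed along a relationship path.

    path:  entity names from the start of the path to the target,
           e.g. ("transaction", "customer").
    raw:   {entity: {instance_id: {attribute: numeric_value}}}
    links: {(child, parent): {child_id: parent_id}} for each many-to-one relation
    Returns {target_id: {feature_name: (value, attribute)}}.
    """
    if len(path) == 1:
        # Base case: the start entity's own numeric attributes.
        entity = path[0]
        return {iid: {f"{entity}.{attr}": (value, attr)
                      for attr, value in row.items()}
                for iid, row in raw[entity].items()}

    # Recursive case: temporary features of the entity one step before the target.
    child_feats = path_features(path[:-1], raw, links)
    child, parent = path[-2], path[-1]

    members = defaultdict(list)
    for child_id, parent_id in links[(child, parent)].items():
        if child_id in child_feats:
            members[parent_id].append(child_feats[child_id])

    # Aggregation: each output keeps the attribute of the feature it came from.
    out = {}
    for parent_id, rows in members.items():
        feats = {f"COUNT({child})": (len(rows), "count")}
        for name, (_, attr) in rows[0].items():
            values = [r[name][0] for r in rows]
            feats[f"SUM({name})"] = (sum(values), attr)
            feats[f"MEAN({name})"] = (mean(values), attr)
        out[parent_id] = feats
    return out


raw = {"transaction": {"t1": {"amount": 120.0},
                       "t2": {"amount": 80.0},
                       "t3": {"amount": 50.0}}}
links = {("transaction", "customer"): {"t1": "c1", "t2": "c1", "t3": "c2"}}
features = path_features(("transaction", "customer"), raw, links)
```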
Optionally, the performing a combined operation according to the temporary features of the other entities and the relationship path includes:
performing an attribute-reachable combination operation according to the temporary features of the remaining entities and the relationship path; attribute reachability means that two temporary features may be combined when their corresponding attributes are the same, or when one of them has a count or ratio attribute.
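A minimal sketch of the attribute-reachability rule, assuming features are `(name, value, attribute)` triples and that `count` and `ratio` are the attributes allowed to combine with any other attribute. Which arithmetic operations to apply (here difference and ratio for identical attributes, division by the count or ratio feature otherwise) is an illustrative assumption; the text only states when two features may be combined.

```python
NEUTRAL = {"count", "ratio"}   # attributes assumed combinable with any other


def attribute_reachable(attr_a, attr_b):
    """Same attribute, or at least one side is a count/ratio attribute."""
    return attr_a == attr_b or attr_a in NEUTRAL or attr_b in NEUTRAL


def combine(a, b):
    """Combine two features given as (name, value, attribute) triples."""
    (name_a, val_a, attr_a), (name_b, val_b, attr_b) = a, b
    if not attribute_reachable(attr_a, attr_b):
        return []                       # e.g. amount vs. time: not meaningful
    if attr_a == attr_b:
        out = [(f"({name_a} - {name_b})", val_a - val_b, attr_a)]
        if val_b != 0:
            out.append((f"({name_a} / {name_b})", val_a / val_b, "ratio"))
        return out
    # Different attributes: divide the measured feature by the count/ratio one.
    measure, scale = (b, a) if attr_a in NEUTRAL and attr_b not in NEUTRAL else (a, b)
    if scale[1] == 0:
        return []
    return [(f"({measure[0]} / {scale[0]})", measure[1] / scale[1], measure[2])]


print(combine(("SUM(amount)", 200.0, "amount"), ("COUNT(txn)", 2, "count")))
# [('(SUM(amount) / COUNT(txn))', 100.0, 'amount')]
```

Keeping the attribute on every output is what lets later steps re-check reachability, so a feature such as "amount minus time" is never produced.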
Optionally, the remaining entities are virtual class entities, and the virtual class entities point to the target entity;
after the temporary features of the remaining entities are determined, the method includes:
performing an attribute-reachable combination operation according to the temporary features of the virtual class entity and the generated feature set of the target entity corresponding to the relationship path, to determine reference point features of the target entity.
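One possible reading of the reference point feature, sketched under assumptions: the virtual class entity is the customer's city, its temporary feature is the mean transaction amount of merchants in that city, and the target entity's feature is the customer's mean transaction amount. Dividing one by the other (allowed because both carry the `amount` attribute) gives the customer's spending relative to a local reference. The function name and all values are hypothetical.

```python
def reference_point_features(target_feats, virtual_of, virtual_feats):
    """Relate each target instance's feature to its virtual-entity reference.

    target_feats:  {target_id: (name, value, attribute)}
    virtual_of:    {target_id: virtual_id}, e.g. customer -> city
    virtual_feats: {virtual_id: (name, value, attribute)} temporary features
    Simplification: only identical attributes are treated as reachable here.
    """
    out = {}
    for tid, (name, value, attr) in target_feats.items():
        v_name, v_value, v_attr = virtual_feats[virtual_of[tid]]
        if attr != v_attr or v_value == 0:
            continue                      # not attribute-reachable, or undefined
        out[tid] = (f"{name} / {v_name}", value / v_value, "ratio")
    return out


target_feats = {"c1": ("MEAN(amount)", 100.0, "amount"),
                "c2": ("MEAN(amount)", 50.0, "amount")}
virtual_of = {"c1": "city_a", "c2": "city_b"}
virtual_feats = {"city_a": ("CITY_MEAN(amount)", 80.0, "amount"),
                 "city_b": ("CITY_MEAN(amount)", 100.0, "amount")}
refs = reference_point_features(target_feats, virtual_of, virtual_feats)
```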
Optionally, the relationship path set includes a circulatable path; the target entity occurs at least twice in the circulatable path;
generating a feature set of the target entity according to the relationship path set of the target entity, including:
generating temporary features of the remaining entities under the circulatable path according to a first sub-relationship path in the circulatable path; the first sub-relationship path is the sub-relationship path in the circulatable path in which the target entity points to the remaining entities;
generating interaction features of the target entity according to the temporary features of the remaining entities under the circulatable path, in combination with a second sub-relationship path in the circulatable path; the second sub-relationship path is the sub-relationship path in which the remaining entities point to the target entity.
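A sketch of an interaction feature along a circulatable path, assuming the cycle customer → merchant → customer (customer and merchant linked through their transactions) with the customer as the target entity. The first sub-relationship path (customer → merchant) yields a temporary merchant feature, here the number of distinct customers; the second sub-relationship path (merchant → customer) aggregates it back onto each customer as a mean. Both operations and the toy data are assumptions.

```python
from collections import defaultdict
from statistics import mean


def interaction_feature(transactions):
    """Mean number of distinct customers of the merchants each customer used.

    transactions: iterable of (customer_id, merchant_id) pairs (toy data).
    """
    customers_of = defaultdict(set)   # first sub-path: customer -> merchant
    merchants_of = defaultdict(set)   # second sub-path: merchant -> customer
    for customer, merchant in transactions:
        customers_of[merchant].add(customer)
        merchants_of[customer].add(merchant)

    # Temporary feature of each merchant along the first sub-path.
    merchant_reach = {m: len(cs) for m, cs in customers_of.items()}

    # Aggregate the temporary feature back onto the target (customer).
    return {c: mean([merchant_reach[m] for m in ms])
            for c, ms in merchants_of.items()}


pairs = [("c1", "m1"), ("c2", "m1"), ("c1", "m2"), ("c3", "m2"), ("c4", "m2")]
result = interaction_feature(pairs)
```

Here customer `c1` used one merchant with 2 customers and one with 3, so its interaction feature is 2.5.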
Optionally, the generating a feature set of the target entity according to the relationship path set of the target entity includes:
when the features of the target entity in a first time period and its features in a second time period are attribute-reachable, performing a combination operation on them to determine trend features of the target entity.
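Trend features can be sketched as comparisons of the same feature across two periods. This sketch assumes each period's features are `{name: (value, attribute)}` dictionaries for one target instance, treats identical attributes as reachable, and uses a difference and a growth ratio as the combination operations; these choices and the `DIFF`/`GROWTH` names are illustrative.

```python
def trend_features(period1, period2):
    """Compare one target instance's features across two time periods.

    period1, period2: {feature_name: (value, attribute)}
    Only features present in both periods with the same attribute are combined.
    """
    out = {}
    for name, (v1, attr1) in period1.items():
        if name not in period2:
            continue
        v2, attr2 = period2[name]
        if attr1 != attr2:                               # not attribute-reachable
            continue
        out[f"DIFF({name})"] = (v2 - v1, attr1)          # absolute change
        if v1 != 0:
            out[f"GROWTH({name})"] = (v2 / v1, "ratio")  # relative change
    return out


first_period = {"SUM(amount)": (200.0, "amount"), "COUNT(transaction)": (4, "count")}
second_period = {"SUM(amount)": (300.0, "amount"), "COUNT(transaction)": (2, "count")}
trends = trend_features(first_period, second_period)
```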
Optionally, after the generating the feature set of the target entity, the method further includes:
storing the entity set, the directed relationship set, the temporary feature set of the virtual class entities, the feature set of the target entity and the relationship path set of the target entity as stored data; and
reusing the stored data on other data sets; or
reusing the stored data in the feature engineering of other target models.
In this technical solution, the entities and attributes in a data set are acquired and a directed relationship set is generated from the entities. For each first directed relationship in the set, it is determined whether its first entity and second entity share an attribute; if so, a virtual class entity is generated from the shared attribute, and a second directed relationship is generated from the first directed relationship and the virtual class entity. From the directed relationship set, which now contains both first and second directed relationships, and the target entity, a relationship path set consisting of relationship paths pointing to the target entity is generated automatically, and from it a feature set of the target entity is generated for model training with the target entity as a sample. Because the relationship paths pointing to the target entity are generated automatically from the entities and directed relationships and then used to generate features, the features are interpretable, no manual adjustment or combination is needed, and labor cost is saved.
The embodiments of the invention can process multidimensional data tables and extract features from them, so the extracted features are more comprehensive.
In addition, because the scheme connects the first entity and the second entity through a virtual class entity, reference point features of the target entity can be generated from the virtual class entity whenever a generated relationship path contains one; when the generated relationship paths include a circulatable path, interaction features of the target entity are generated from that path; and trend features of the target entity are generated from its features in different time periods.
Because these high-order features, such as reference point, interaction and trend features, are produced only by attribute-reachable combination operations, they are more interpretable.
Correspondingly, an embodiment of the present invention further provides a path-based feature generation apparatus, including:
an acquisition module, a directed relationship generation module, a path generation module and a feature generation module;
the acquisition module is used for acquiring entities and attributes in the data set; the entities comprise behavior entities and physical entities;
the directed relationship generation module is configured to generate, according to the entities in the data set, a directed relationship set comprising a plurality of first directed relationships; a first directed relationship comprises a first entity pointing to a second entity, where the first entity and the second entity are any two entities in the data set that have a many-to-one relationship in the data set;
the directed relationship generation module is further configured to: for any first directed relationship in the directed relationship set, if the first entity and the second entity in the first directed relationship are determined to have the same attribute, generate a virtual class entity according to that attribute; generate a second directed relationship according to the virtual class entity and the first directed relationship, and add the second directed relationship to the directed relationship set; the second directed relationship comprises the first entity pointing to the virtual class entity and the virtual class entity pointing to the second entity;
the path generation module is used for generating a relationship path set of the target entity according to the directed relationship set and the target entity;
the feature generation module is configured to generate a feature set of the target entity according to the relationship path set of the target entity; the feature set of the target entity is used in model training with the target entity as a sample.
Optionally, the path generating module is specifically configured to:
sequentially combining the directed relations in the directed relation set to generate a plurality of relation paths pointing to the target entity;
and forming the plurality of relationship paths into a relationship path set of the target entity.
Optionally, the path generating module is specifically configured to:
if a first relationship path that contains only two entity types and has a path length of 1 is generated, updating the first relationship path, according to the attribute shared by the two entity types, into a second relationship path of path length 2 in which the shared attribute connects the two entity types;
wherein, the path length refers to the number of directed relationships in the relationship path.
Optionally, the feature generation module is specifically configured to:
generating, in increasing order of path length, the feature set of the target entity corresponding to each relationship path in the relationship path set, and recording the attribute corresponding to each feature; the path length refers to the number of directed relationships in the relationship path;
wherein the generating the feature set of the target entity corresponding to the relationship path according to the relationship path includes:
recursively generating sub-relationship paths that point to the remaining entities other than the target entity; for each sub-relationship path, determining temporary features of the remaining entities according to their data in the data set; the remaining entities include real entities and virtual class entities;
and sequentially performing an aggregation operation and a combination operation according to the temporary features of the remaining entities and the relationship path, to generate the feature set of the target entity corresponding to the relationship path.
Optionally, the feature generation module is specifically configured to:
performing an attribute-reachable combination operation according to the temporary features of the remaining entities and the relationship path; attribute reachability means that two temporary features may be combined when their corresponding attributes are the same, or when one of them has a count or ratio attribute.
Optionally, the remaining entities are virtual class entities, and the virtual class entities point to the target entity;
the feature generation module is further configured to:
after the temporary features of the remaining entities are determined, perform an attribute-reachable combination operation according to the temporary features of the virtual class entity and the generated feature set of the target entity corresponding to the relationship path, to determine reference point features of the target entity.
Optionally, the relationship path set includes a circulatable path; the target entity occurs at least twice in the circulatable path;
the feature generation module is specifically configured to:
generating temporary features of the remaining entities under the circulatable path according to a first sub-relationship path in the circulatable path; the first sub-relationship path is the sub-relationship path in the circulatable path in which the target entity points to the remaining entities;
generating interaction features of the target entity according to the temporary features of the remaining entities under the circulatable path, in combination with a second sub-relationship path in the circulatable path; the second sub-relationship path is the sub-relationship path in which the remaining entities point to the target entity.
Optionally, the feature generation module is specifically configured to:
when the features of the target entity in a first time period and its features in a second time period are attribute-reachable, performing a combination operation on them to determine trend features of the target entity.
Optionally, the apparatus further comprises a storage module;
the storage module is configured to, after the feature set of the target entity is generated, store the entity set, the directed relationship set, the temporary feature set of the virtual class entities, the feature set of the target entity and the relationship path set of the target entity as stored data, and to reuse the stored data on other data sets or in the feature engineering of other target models.
Correspondingly, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and a processor for calling the program instructions stored in the memory and executing the above path-based feature generation method according to the obtained program.
Accordingly, an embodiment of the present invention further provides a computer-readable non-volatile storage medium, which includes computer-readable instructions, and when the computer-readable instructions are read and executed by a computer, the computer is caused to execute the above-mentioned path-based feature generation method.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a path-based feature generation method according to an embodiment of the present invention;
Fig. 2(a) is a first directed relationship diagram according to an embodiment of the present invention;
Fig. 2(b) is a second directed relationship diagram according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of determining a feature set of a target entity according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of determining interaction features of a target entity according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of reusing feature engineering according to an embodiment of the present invention;
Fig. 6(a) is a third directed relationship diagram according to an embodiment of the present invention;
Fig. 6(b) is a fourth directed relationship diagram according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a path-based feature generation apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 schematically shows a flow of a path-based feature generation method according to an embodiment of the present invention, where the flow may be executed by a path-based feature generation apparatus.
As shown in fig. 1, the process specifically includes:
Step 101, acquiring entities and attributes in a data set.
The acquired data set includes at least one structured data table. Illustratively, a transaction record table, a merchant table and a customer table are shown in Table 1, Table 2 and Table 3, respectively.
Table 1 Transaction record table
[table image not reproduced]
Table 2 Merchant table
[table image not reproduced]
Table 3 Customer table
[table image not reproduced]
Entities are determined from the primary keys of the data set, that is, from the ID primary key of each data table, and may include behavior class entities and physical class entities. Attributes are determined from the data of the data set other than the primary key data, that is, from each remaining column of each data table.
Taking Table 2 as an example, a physical class entity, namely the merchant entity, is determined from the ID primary key, and the attributes category and city are determined from the columns. From Tables 1 to 3, it can be determined that the physical class includes the merchant entity and the customer entity, and the behavior class includes the transaction entity.
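Because Tables 1 to 3 survive only as images, the column names below are hypothetical. The sketch derives one entity per table from its ID primary key and takes the remaining columns as attributes. Two further assumptions: foreign-key columns are treated as links between entities rather than as attributes, and whether a table is a behavior class or physical class entity is supplied by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """A structured data table; all field names here are illustrative."""
    name: str                   # entity name derived from the ID primary key
    kind: str                   # "behavior" or "physical" (supplied by hand)
    primary_key: str            # ID column that identifies each entity instance
    foreign_keys: dict = field(default_factory=dict)  # column -> referenced table
    columns: list = field(default_factory=list)       # all column names


def extract_entities_and_attributes(tables):
    """Return {entity: (kind, [attributes])}.

    Assumption: attributes are the non-primary-key columns; foreign-key
    columns are treated as links between entities, not as attributes.
    """
    result = {}
    for t in tables:
        skip = {t.primary_key, *t.foreign_keys}
        attrs = [c for c in t.columns if c not in skip]
        result[t.name] = (t.kind, attrs)
    return result


tables = [
    Table("transaction", "behavior", "txn_id",
          {"customer_id": "customer", "merchant_id": "merchant"},
          ["txn_id", "customer_id", "merchant_id", "amount", "time"]),
    Table("merchant", "physical", "merchant_id", {},
          ["merchant_id", "category", "city"]),
    Table("customer", "physical", "customer_id", {},
          ["customer_id", "gender", "city", "amount"]),
]
entities = extract_entities_and_attributes(tables)
print(entities["merchant"])  # ('physical', ['category', 'city'])
```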
In the embodiment of the invention, when attributes are determined, categorical columns such as category and city are also treated as attributes, in addition to columns measured in units such as yuan or number of times. In addition, the acquired data set may contain a parameter table, which is used to determine virtual class entities and does not contain physical class entities.
After determining the entities and attributes in the dataset, the attribute definitions may be determined according to an attribute table as shown in table 4, where the attribute table may be used to determine virtual class entities associated with the entities in the dataset (including behavioral class entities and physical class entities). From table 4, it can be determined that the virtual class entities include time, amount, gender, city, category.
Table 4 Attribute table
[table image not reproduced]
In the embodiment of the present invention, the acquired data set may be preprocessed first; preprocessing includes, but is not limited to, missing-value handling, outlier handling, numerical encoding and one-hot encoding.
Step 102, generating a directed relationship set according to the entities in the data set.
A first directed relationship is determined from the correspondence between entities in the data set. Two entities that correspond to each other can be defined as a first entity and a second entity; when they have a many-to-one relationship in the data set, the first directed relationship is that the first entity points to the second entity. The first entity and the second entity are any two entities in the data set. After a plurality of first directed relationships are determined from the data set, they are grouped into a directed relationship set.
In the embodiment of the present invention, the first entity and the second entity may be two different physical class entities, or a behavior class entity and a physical class entity.
Specifically, there may be the following two cases:
When the first entity and the second entity are two different physical class entities: if a first physical class entity and a second physical class entity have a many-to-one relationship, the first physical class entity points to the second, so the first entity is the first physical class entity and the second entity is the second physical class entity; if they have a one-to-many relationship, the second physical class entity points to the first, so the first entity is the second physical class entity and the second entity is the first physical class entity.
When the first entity and the second entity are a behavior class entity and a physical class entity: if they have a many-to-one relationship, the behavior class entity points to the physical class entity, so the first entity is the behavior class entity and the second entity is the physical class entity; if they have a one-to-many relationship, the physical class entity points to the behavior class entity, so the first entity is the physical class entity and the second entity is the behavior class entity.
It should be noted that if the acquired data set includes a parameter table, virtual class entities may be determined from the parameter table, which adds the following case:
When the first entity and the second entity are a behavior class entity and a virtual class entity: if they have a many-to-one relationship, the behavior class entity points to the virtual class entity, so the first entity is the behavior class entity and the second entity is the virtual class entity; if they have a one-to-many relationship, the virtual class entity points to the behavior class entity, so the first entity is the virtual class entity and the second entity is the behavior class entity.
In the example of Tables 1 to 3, one customer may perform multiple transactions, so the customer entity is a parent node of the transaction entity and the transaction entity points to the customer entity. Likewise, one merchant may take part in multiple transactions, so the merchant entity is a parent node of the transaction entity and the transaction entity points to the merchant entity. Similarly, through the transaction entity, the customer entity has a many-to-one relationship with the merchant entity, so the customer entity points to the merchant entity; and the merchant entity has a many-to-one relationship with the customer entity, so the merchant entity points to the customer entity. The resulting directed relationship graph is shown in Fig. 2(a).
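The many-to-one rule above can be sketched for the transaction example, assuming the transaction table carries hypothetical `customer_id` and `merchant_id` foreign keys: a relation from the transaction entity to a parent entity is recorded when several transaction rows share the same parent key. The customer-to-merchant and merchant-to-customer relations obtained through the transaction entity are not derived in this sketch.

```python
def is_many_to_one(child_rows, fk):
    """True if several child rows share the same parent key value."""
    values = [row[fk] for row in child_rows]
    return len(set(values)) < len(values)


def first_directed_relations(child, child_rows, foreign_keys):
    """Return {(child, parent)} for each foreign key that is many-to-one."""
    return {(child, parent)
            for fk, parent in foreign_keys.items()
            if is_many_to_one(child_rows, fk)}


transactions = [
    {"txn_id": "t1", "customer_id": "c1", "merchant_id": "m1", "amount": 120.0},
    {"txn_id": "t2", "customer_id": "c1", "merchant_id": "m2", "amount": 80.0},
    {"txn_id": "t3", "customer_id": "c2", "merchant_id": "m2", "amount": 50.0},
]
relations = first_directed_relations(
    "transaction", transactions,
    {"customer_id": "customer", "merchant_id": "merchant"})
```

With these rows, customer `c1` and merchant `m2` each appear twice, so both relations point from the transaction entity to its parent.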
103, for any first directed relationship in the directed relationship set, if it is determined that the first entity and the second entity in the first directed relationship have the same attribute, generating a virtual entity according to the same attribute; and generating a second directed relation according to the virtual class entity and the first directed relation, and adding the second directed relation into the directed relation set.
And if the first entity and the second entity are determined to have the same attribute, generating a virtual entity according to the same attribute, connecting the virtual entity with the first entity and the second entity, equivalently, pointing the first entity to the virtual entity and pointing the virtual entity to the second entity, and determining the directed relationship as the second directed relationship. For example, as shown in fig. 2(a), the transaction entity includes a money attribute, which indicates the transaction money of the current transaction behavior, the client entity also includes a money attribute, which indicates the available money of the client, the money attribute may be used as the same attribute of the transaction entity and the client entity, and the money attribute is used as a virtual class entity to associate the behavior class entity and the physical class entity, specifically, the first directed relationship is that the transaction entity points to the client entity, then the second directed relationship generated according to the money attribute and the first directed relationship points to the money attribute for the transaction entity, and the money attribute points to the client entity, and the generated second directed relationship may be as shown in fig. 2 (b).
Multiple second directed relationships, shown in fig. 2(b), may be generated from the first directed relationships shown in fig. 2(a); the dotted lines indicate the generated second directed relationships. In fig. 2(b), the transaction entity and the customer entity are connected through the amount attribute: the transaction entity points to the amount attribute, and the amount attribute points to the customer entity. The customer entity and the merchant entity are connected through the city: the merchant entity points to the city, and the city points to the customer entity. Fig. 2(b) is only an illustrative example; in the embodiment of the present invention, any number of second directed relationships may be generated from the first directed relationships and the virtual class entities.
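A minimal sketch of step 103 follows, assuming directed relationships are stored as `(source, destination)` name pairs and each entity's attributes as a set of strings. The function name `add_virtual_entities` and the attribute sets are hypothetical; the virtual class entity is simply named after the shared attribute, so an attribute that happens to share its name with an entity would collide in this simplified representation.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def add_virtual_entities(edges: Set[Edge], attrs: Dict[str, Set[str]]) -> Set[Edge]:
    """For each first directed relationship A -> B whose entities share an
    attribute X, add the second directed relationship A -> X -> B, where X
    is treated as a virtual class entity named after the attribute."""
    result = set(edges)
    for a, b in edges:
        for shared in attrs.get(a, set()) & attrs.get(b, set()):
            result.add((a, shared))  # first entity points to the virtual entity
            result.add((shared, b))  # virtual entity points to the second entity
    return result


first = {("transaction", "customer"), ("merchant", "customer")}
attrs = {  # hypothetical attribute sets per entity
    "transaction": {"amount", "time"},
    "customer": {"amount", "city", "gender"},
    "merchant": {"city"},
}
second = add_virtual_entities(first, attrs) - first
print(sorted(second))
# [('amount', 'customer'), ('city', 'customer'), ('merchant', 'city'), ('transaction', 'amount')]
```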
In addition, a virtual class entity can also be constructed from attributes related to a physical class entity, from an additional parameter table, or from a model. For example, with the customer as the primary key, if a column of customer ages is added to the transaction entity, age becomes a common attribute of the transaction entity and the customer entity, and the age attribute can be used as a virtual class entity.
Step 104, generate the relationship path set of the target entity according to the directed relationship set and the target entity.
In the embodiment of the invention, the directed relationships in the directed relationship set are chained together in sequence to generate multiple relationship paths pointing to the target entity, and these relationship paths form the relationship path set of the target entity.
The target entity is predefined and may be a single target entity or a multi-target entity. Single target entities include the customer entity and the merchant entity; for example, a model for credit management may use the customer entity as the target entity, denoted (customer). Multi-target entities include the customer-merchant entity and the merchant-customer entity; for example, a model for marketing recommendation may use the merchant-customer entity as the target entity, denoted (merchant, customer).
When generating the relationship paths pointing to a target entity, the target entity is taken as the end point of every path, all possible relationship paths are traversed, and the results are deduplicated to form the relationship path set. For example, taking the customer entity in the directed relationships of fig. 2(b) as the target entity, that is, as the end point, and traversing all possible relationship paths yields the following relationship path set:
Relationship path (1): customer
Relationship path (2): transaction → customer
Relationship path (3): merchant → customer
Relationship path (4): transaction → amount → customer
Relationship path (5): transaction → city → customer
Relationship path (6): transaction → merchant → customer
Relationship path (7): transaction → customer → merchant → customer
……
When generating the relationship path set, a maximum path length can be specified to limit the size of the set: the longer the limit, the larger the generated set, and the shorter the limit, the smaller the set. The path length of a relationship path is the number of directed relationships it contains; for example, relationship path (3) contains one directed relationship, so its path length is 1, and relationship path (4) contains two, so its path length is 2.
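The traversal can be sketched as a backward expansion from the target entity, prepending any entity that points to the current first node, with a maximum path length bounding the search. This is a minimal sketch under the assumption that directed relationships are `(source, destination)` pairs; `relation_paths` and `max_len` are hypothetical names. Entities may repeat, so circulatable paths such as relationship path (7) are produced whenever `max_len` allows, and the set removes duplicates.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def relation_paths(edges: Set[Edge], target: str, max_len: int) -> List[Tuple[str, ...]]:
    """All paths ending at `target` with at most `max_len` directed relationships.
    Path length = number of directed relationships = number of entities - 1."""
    incoming = defaultdict(list)
    for src, dst in edges:
        incoming[dst].append(src)
    paths = set()
    frontier = [(target,)]  # path length 0: the target entity itself
    for _ in range(max_len + 1):
        paths.update(frontier)
        # Extend each path by prepending an entity that points to its first node.
        frontier = [(src,) + p for p in frontier for src in incoming[p[0]]]
    # Increasing path length, then alphabetical, for a deterministic order.
    return sorted(paths, key=lambda p: (len(p), p))


edges = {  # a subset of fig. 2(b), for illustration
    ("transaction", "customer"), ("transaction", "merchant"),
    ("merchant", "customer"), ("customer", "merchant"),
    ("transaction", "amount"), ("amount", "customer"),
}
for p in relation_paths(edges, "customer", max_len=2):
    print(len(p) - 1, " → ".join(p))
```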
It should be noted that if, while the relationship paths are being generated, a first relationship path is produced that contains only two physical class entities and has a path length of 1, it is replaced by a second relationship path of path length 2 that connects the two physical class entities through their common attribute. For relationship path (3) above, the originally generated path is merchant → customer, a first relationship path that contains only the merchant entity and the customer entity (two physical class entities) and has a path length of 1. The two entities can be connected through an attribute they share, such as the city, giving the second relationship path merchant → city → customer, which replaces the original merchant → customer. In other words, when such a first relationship path is generated, the common attribute of the two physical class entities is used as a virtual class entity, and a second relationship path is generated from the virtual class entity and the two physical class entities, with the same direction as the first relationship path: if the first physical class entity points to the second in the first relationship path, then in the second relationship path the first physical class entity points to the virtual class entity and the virtual class entity points to the second physical class entity. The second relationship path then replaces the first.
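A minimal sketch of this replacement rule follows, assuming paths are tuples of entity names, `physical` is the set of physical class entities, and `attrs` maps each entity to its attribute set (all hypothetical names). When two physical class entities share several attributes, this sketch emits one length-2 path per shared attribute; the patent does not say how multiple shared attributes are handled, so that choice is an assumption.

```python
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def refine_direct_paths(paths: List[Path], physical: Set[str],
                        attrs: Dict[str, Set[str]]) -> List[Path]:
    """Replace each length-1 path between two physical class entities with
    length-2 paths routed through their common attributes (one per attribute).
    Paths without a common attribute, or involving other entities, are kept."""
    refined: List[Path] = []
    for p in paths:
        if len(p) == 2 and p[0] in physical and p[1] in physical:
            shared = sorted(attrs.get(p[0], set()) & attrs.get(p[1], set()))
            if shared:
                # Same direction as the original: first -> attribute -> second.
                refined.extend((p[0], s, p[1]) for s in shared)
                continue
        refined.append(p)
    return refined


paths = [("customer",), ("transaction", "customer"), ("merchant", "customer")]
physical = {"customer", "merchant"}  # transaction is a behavior class entity
attrs = {"customer": {"city", "gender"}, "merchant": {"city"}, "transaction": {"amount"}}
print(refine_direct_paths(paths, physical, attrs))
# [('customer',), ('transaction', 'customer'), ('merchant', 'city', 'customer')]
```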
Step 105, generate the feature set of the target entity according to the relationship path set of the target entity.
In order of increasing path length, a feature set of the target entity is generated for each relationship path in the relationship path set, and the attribute corresponding to each feature is recorded.
Generating the feature sets of the target entity path by path, in ascending order of path length, proceeds as follows:
step 201, for the relationship path with a path length of 0, generate the corresponding feature set of the target entity from relationship path (1);
step 202, for the relationship paths with a path length of 1, generate the corresponding feature sets of the target entity from relationship paths (2) and (3), respectively;
step 203, for the relationship paths with a path length of 2, generate the corresponding feature sets of the target entity from relationship paths (4), (5), and (6), respectively;
step 204, for the relationship path with a path length of 3, generate the corresponding feature set of the target entity from relationship path (7);
……
For any relationship path, the feature set of the target entity corresponding to that path may be determined according to the flow shown in fig. 3.
Step 301, recursively generate the sub-relationship paths that point to the remaining entities other than the target entity.
Step 302, for each sub-relationship path, determine the temporary features of the remaining entity according to that entity's data in the dataset.
In relationship path (7), "transaction → customer → merchant → customer", the final customer is the target entity. Working backwards recursively, the path first reduces to the sub-relationship path "transaction → customer → merchant", whose features are generated, and recursing further reduces it to "transaction → customer", whose features are generated as well.
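The recursion in step 301 can be sketched as repeatedly dropping the last entity of the path, so that each prefix ends at a remaining (non-target) entity. This is a minimal sketch assuming paths are tuples of entity names; `sub_relationship_paths` is a hypothetical name. Because temporary features of shorter prefixes feed the longer ones, the prefixes are evaluated in reverse order.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def sub_relationship_paths(path: Path) -> List[Path]:
    """Recursively drop the last entity of `path`. Each prefix ends at a remaining
    (non-target) entity; the result is ordered from longest to shortest."""
    if len(path) <= 2:  # stop once only one directed relationship remains
        return []
    prefix = path[:-1]
    return [prefix] + sub_relationship_paths(prefix)


path7 = ("transaction", "customer", "merchant", "customer")
subs = sub_relationship_paths(path7)
# Temporary features are computed shortest-first, so iterate in reverse.
for sub in reversed(subs):
    print(" → ".join(sub))
```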
In the embodiment of the present invention, the remaining entities are all entities other than the target entity; each may be a physical class entity or a virtual class entity. When a remaining entity is a virtual class entity that points to the target entity, its temporary features can be determined, and an attribute-reachable combination operation is performed on those temporary features and the feature sets of the target entity already generated for other relationship paths, to determine the reference point features of the target entity. Here, attribute-reachable means that two temporary features may be combined when their attributes are the same, or when the attribute of one of them is a count or ratio attribute.
Because the feature sets of the target entity are generated path by path in order of increasing path length, the feature sets for shorter paths already exist when a reference point feature is determined, so the combination operation can be based on the temporary features of the virtual class entity together with those previously generated feature sets.
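The attribute-reachable rule can be sketched as a guard applied before any binary combination. This minimal sketch assumes each feature carries a single attribute tag string and that "count" and "ratio" are the tags for count and ratio attributes; `attribute_reachable` and `combine` are hypothetical names.

```python
import operator

COUNT, RATIO = "count", "ratio"


def attribute_reachable(attr_a: str, attr_b: str) -> bool:
    """Two temporary features may be combined if their attributes are the same,
    or if either of them carries a count or ratio attribute."""
    return attr_a == attr_b or COUNT in (attr_a, attr_b) or RATIO in (attr_a, attr_b)


def combine(a: float, attr_a: str, b: float, attr_b: str, op):
    """Apply a binary operation only when the two features are attribute-reachable."""
    if not attribute_reachable(attr_a, attr_b):
        raise ValueError(f"{attr_a!r} and {attr_b!r} are not attribute-reachable")
    return op(a, b)


# Total amount divided by a transaction count gives an average amount.
print(combine(120.0, "amount", 4, "count", operator.truediv))  # 30.0
```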
For example, for "transaction → city → customer" in relationship path (5), the temporary features of the city (a virtual class entity) are generated first. Some customer features were already generated earlier from "transaction → customer" in relationship path (2). Under relationship path (5), the temporary features of each city are associated with the features of all customers in that city, and aggregation and combination operations can be applied to the city feature "number of transactions in the city in month 2" and the customer feature "the 3 cities where the customer transacts most often" to obtain the average number of transactions in the cities where the customer transacts most often, that is, the transaction volume level. In a specific implementation, the temporary city feature f1 = event[month=2].groupby(city).count(), the number of transactions per city in month 2, is joined to the customer feature f2 = event[month=2].groupby(cust).top3_city, the 3 cities where each customer transacts most often, and aggregated as f2.join(f1, on=city).mean() to obtain the average number of transactions in the cities the customer uses most, i.e., the transaction volume level.
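The f1/f2 example can be sketched in plain Python without a dataframe library. This is a minimal sketch assuming transactions are `(customer, city, month)` tuples; the data and the function names are hypothetical, and the join-then-mean of the text becomes a per-customer lookup of each top city's count followed by an average.

```python
from collections import Counter, defaultdict
from statistics import mean

events = [  # hypothetical transaction rows: (customer, city, month)
    ("c1", "SH", 2), ("c1", "SH", 2), ("c1", "BJ", 2),
    ("c2", "BJ", 2), ("c2", "GZ", 2), ("c3", "SH", 3),
]


def city_txn_count(events, month):
    """f1: number of transactions per city in the month (virtual-entity temporary feature)."""
    return Counter(city for _, city, m in events if m == month)


def customer_top_cities(events, month, n=3):
    """f2: the n cities where each customer transacts most often in the month."""
    per_cust = defaultdict(Counter)
    for cust, city, m in events:
        if m == month:
            per_cust[cust][city] += 1
    return {c: [city for city, _ in cnt.most_common(n)] for c, cnt in per_cust.items()}


def reference_point_feature(events, month=2):
    """Join f1 onto f2 by city and average: the transaction volume level of the
    cities where the customer transacts most often."""
    f1 = city_txn_count(events, month)
    f2 = customer_top_cities(events, month)
    return {cust: mean(f1[city] for city in cities) for cust, cities in f2.items()}


print(reference_point_feature(events))
```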
In addition, reference point features may be determined not only for the target entity but also for non-target entities. The essence of the embodiment of the present invention is as follows: the feature expression of a virtual class entity, that is, its temporary features, is determined from the dataset records of the entities that point to it; using these temporary features as a reference, further processing combines them with the original features of the entity that the virtual class entity points to, yielding processed features of that entity, which are the reference point features. The entity pointed to by the virtual class entity may or may not be the target entity.
Step 303, perform aggregation operations and then combination operations according to the temporary features of the remaining entities and the relationship path, to generate the feature set of the target entity corresponding to that relationship path.
In this step, the combination operations performed on the temporary features of the remaining entities along the relationship path may be required to be attribute-reachable, meaning that two temporary features are combined only when their attributes are the same or the attribute of one of them is a count or ratio attribute. In the embodiment of the invention, high-dimensional features generated under this principle, such as reference point features, interaction features, and trend features, are more interpretable.
For aggregation operations, discrete features support operations such as top N, frequency count, and maximum and minimum values; continuous features support maximum and minimum values, mean, variance, and so on. Both can be combined with conditions such as time windows. Except for counting, an aggregation operation leaves the attribute of the resulting feature unchanged.
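A minimal sketch of these aggregation operations follows, returning each result together with its attribute so that only counting changes the attribute (to "count"). The operation names and the `aggregate` signature are hypothetical; a time window is applied simply by filtering the values before aggregating.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import List, Tuple


def aggregate(values: List, attribute: str, op: str, n: int = 3) -> Tuple[object, str]:
    """Apply one aggregation and return (result, attribute of the result).
    Counting yields a 'count' attribute; every other op keeps the input attribute."""
    if op == "count":          # frequency count
        return len(values), "count"
    if op == "top_n":          # discrete: the n most frequent values
        return [v for v, _ in Counter(values).most_common(n)], attribute
    if op == "max":
        return max(values), attribute
    if op == "min":
        return min(values), attribute
    if op == "mean":           # continuous
        return mean(values), attribute
    if op == "variance":       # continuous (population variance)
        return pvariance(values), attribute
    raise ValueError(f"unsupported aggregation: {op}")


rows = [(50.0, 2), (20.0, 2), (80.0, 2), (10.0, 3)]  # hypothetical (amount, month)
month2 = [amt for amt, m in rows if m == 2]          # time-window condition
print(aggregate(month2, "amount", "mean"))           # (50.0, 'amount')
print(aggregate(month2, "amount", "count"))          # (3, 'count')
```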
For combination operations, linear operations on continuous or count features must satisfy attribute reachability, and a count feature may be combined linearly with any continuous feature. Discrete features of the same type may be combined with logical comparisons (equal or not equal), and discrete attributes of different types may be combined logically (merged). Except for count and ratio results, a combination operation leaves the attribute of the resulting feature unchanged.
It should be noted that, in the embodiment of the present invention, continuous features may also be discretized, for example by equal-width or equal-frequency binning, to obtain corresponding discrete features.
The number of generated features can be limited by defining the set of allowed operations, such as maximum, minimum, mean, variance, and frequency count.
The target variable is predefined, and paths can be expanded or pruned heuristically according to how well each single or combined feature discriminates the target variable. Usable metrics include, but are not limited to, model feature importance, information value (IV), and gain.
Dimensionality reduction can be applied to the resulting feature set by methods including, but not limited to, evaluation against the target variable, selection of salient features, PCA, and hashing.
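Equal-width and equal-frequency binning can be sketched as follows; the function names are hypothetical. The equal-frequency version is a simple rank-based variant, so tied values may land in different bins, which a production implementation might handle differently.

```python
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Split [min, max] into k intervals of equal width; return each value's bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against all values being equal
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Assign bins by rank so that each bin holds roughly len(values) / k items."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


amounts = [1, 1, 1, 2, 10]
print(equal_width_bins(amounts, 2))      # [0, 0, 0, 0, 1]
print(equal_frequency_bins(amounts, 2))  # [0, 0, 0, 1, 1]
```

The outlier 10 stretches the equal-width intervals so that almost everything falls into the first bin, while equal-frequency binning splits the values by rank.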
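As one example of such a metric, information value for a discrete (or binned) feature against a binary target can be sketched with the usual formula IV = Σ (p_nonevent − p_event) · ln(p_nonevent / p_event) over bins. The additive smoothing `eps` is an assumption added so that empty cells do not produce log(0); `information_value` is a hypothetical name.

```python
import math
from collections import defaultdict
from typing import Sequence


def information_value(bins: Sequence, target: Sequence[int], eps: float = 0.5) -> float:
    """IV of a discrete feature w.r.t. a binary target (1 = event, 0 = non-event)."""
    events, non_events = defaultdict(float), defaultdict(float)
    for b, y in zip(bins, target):
        (events if y == 1 else non_events)[b] += 1
    keys = set(bins)
    total_e = sum(events.values()) + eps * len(keys)
    total_n = sum(non_events.values()) + eps * len(keys)
    iv = 0.0
    for k in keys:
        pe = (events[k] + eps) / total_e      # share of events in bin k
        pn = (non_events[k] + eps) / total_n  # share of non-events in bin k
        iv += (pn - pe) * math.log(pn / pe)
    return iv


print(information_value([0, 0, 1, 1], [0, 1, 0, 1]))  # uninformative: 0.0
print(information_value([0, 0, 1, 1], [0, 0, 1, 1]))  # separating: large IV
```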
In one implementation, the relationship path set may include a circulatable path, in which the target entity appears at least twice. When the feature set of the target entity is generated from its relationship path set, interaction features of the target entity may be generated from such circulatable paths, as shown in the flowchart of fig. 4:
step 401, generate the temporary features of the remaining entities on the circulatable path according to the first sub-relationship path of the circulatable path.
Step 402, combine these temporary features with the second sub-relationship path of the circulatable path to generate the interaction features of the target entity.
A circulatable path consists of several sub-relationship paths: the sub-relationship path in which the target entity points to the remaining entities is the first sub-relationship path, and the sub-relationship path in which the remaining entities point to the target entity is the second sub-relationship path.
For example, for "transaction → customer → merchant → customer" in path (7), the average risk of each merchant (the average risk, over a previous period, of the customers who transacted with that merchant) is calculated along the first sub-relationship path "transaction → customer → merchant"; the merchants each customer transacted with are then found along the second sub-relationship path "transaction → merchant → customer", and the average risk over all of the customer's merchants is calculated. Consider credit card cash-out: cardholders who seek to cash out their credit cards use merchants whose main business is cash-out, and the customers of such merchants are likely to be cash-out customers as well. By identifying the merchants used by known cash-out customers, potential cash-out customers can be found; this is an interaction feature of the target entity. Another example of an interaction feature: the merchant's customer flow over the last month is calculated along the first sub-relationship path "transaction → customer → merchant", the customer's number of transactions over the last month at the merchant the customer uses most often is calculated along the second sub-relationship path "transaction → merchant → customer", and dividing one by the other gives a ratio feature that indicates whether the customer is a loyal customer of that merchant.
In another implementation, when two features of the target entity, one for a first time period and one for a second time period, satisfy attribute reachability, a combination operation may be performed on them to determine a trend feature of the target entity.
For example, from "transaction → customer" in path (2), aggregation operations automatically generate a series of features, such as the customer's number of transactions in month 2 (event[month=2].groupby(cust).sum()) and in month 3 (event[month=3].groupby(cust).sum()). Since both features are count attributes, they are attribute-reachable and can be combined; their difference is a trend feature of the customer's transaction count, indicating whether the customer transacted more in month 3 than in month 2.
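The cash-out example can be sketched as two passes over the transactions, one per sub-relationship path. This is a minimal sketch assuming transactions are `(customer, merchant)` pairs and that prior-period customer risk scores are already known; the data and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def merchant_avg_risk(txns, customer_risk):
    """First sub-path (transaction -> customer -> merchant): a merchant's risk is
    the average risk of the customers who transacted with it."""
    custs = defaultdict(set)
    for cust, mer in txns:
        custs[mer].add(cust)
    return {m: mean(customer_risk[c] for c in cs) for m, cs in custs.items()}


def customer_merchant_risk(txns, merchant_risk):
    """Second sub-path (transaction -> merchant -> customer): a customer's interaction
    feature is the average risk of all merchants the customer transacted with."""
    mers = defaultdict(set)
    for cust, mer in txns:
        mers[cust].add(mer)
    return {c: mean(merchant_risk[m] for m in ms) for c, ms in mers.items()}


txns = [("c1", "m1"), ("c2", "m1"), ("c3", "m2"), ("c2", "m2")]
customer_risk = {"c1": 1.0, "c2": 0.0, "c3": 0.0}  # c1 is a known cash-out customer
merchant_risk = merchant_avg_risk(txns, customer_risk)
interaction = customer_merchant_risk(txns, merchant_risk)
print(interaction)
```

Customer c2 has no risk of its own, yet receives a nonzero interaction feature because it shares merchant m1 with the known cash-out customer c1.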
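A minimal sketch of this trend feature follows, assuming transactions are `(customer, month)` pairs and counting transactions per customer; the function names and data are hypothetical. Both monthly counts carry the count attribute, so their difference is attribute-reachable, and a positive value means the customer transacted more in the later month.

```python
from collections import Counter


def monthly_txn_count(events, month):
    """Number of transactions per customer in the given month."""
    return Counter(cust for cust, m in events if m == month)


def trend_feature(events, m1=2, m2=3):
    """Difference between two count features: transactions in m2 minus those in m1."""
    a, b = monthly_txn_count(events, m1), monthly_txn_count(events, m2)
    # Both operands are count attributes, so they are attribute-reachable.
    return {c: b[c] - a[c] for c in set(a) | set(b)}


events = [("c1", 2), ("c1", 2), ("c1", 3), ("c2", 3), ("c2", 3)]
print(trend_feature(events))
```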
Continuing the example above, the features automatically generated for the customer entity from the relationship paths are as follows:
the features generated from relationship path (1) include gender, city, amount, and so on;
the features generated from relationship path (2) include the customer's number of transactions over a recent period, the time of the customer's last transaction, the customer's transaction frequency over a recent period, and so on;
the features generated from relationship path (3) include the number of merchants in the customer's city, and so on;
the features generated from relationship path (4) include the customer's average transaction amount and minimum transaction amount over a period of time, and so on;
the features generated from relationship path (5) include the number of cities in which the customer transacted over a recent period, the city of the customer's most recent transaction, and so on;
the features generated from relationship path (6) include the number of merchants the customer transacted with over a period of time, the number of merchants the customer transacted with recently, and so on;
the features generated from relationship path (7) include the average transaction amount, over a recent period, of the customers of the merchant the customer transacts with most often, the average risk of the merchants the customer transacts with, and so on.
In the embodiment of the present invention, the feature set of the target entity may be used in training models that take the target entity as the sample; after the feature set is generated, the resulting entities and features may also be reused in other datasets or in projects for other target models. As shown in the flowchart of fig. 5, the flow may include:
step 501, store the entity set, the directed relationship set, the temporary feature set of the virtual class entities, the feature set of the target entity, and the relationship path set of the target entity as stored data;
step 502, reuse the stored data in other datasets, or in projects for other target models.
The embodiment of the invention also applies to multi-target entities. Taking the merchant-customer entity as a multi-target entity, the feature set of the target entity can include a feature set with the merchant as the target entity, a feature set with the customer as the target entity, and a feature set with the merchant-customer pair as the multi-target entity. For example, the features of the merchant-customer entity at path length 1 include customer gender, customer city, merchant city, and merchant amount, and the directed relationship between the merchant-customer entity and the transaction entity is that the transaction entity points to the merchant-customer entity, as shown in fig. 6(a). If the merchant-customer entity and the transaction entity share the city attribute, a virtual class entity can be generated from the city, and a new directed relationship is generated from the city and the existing relationship between the merchant-customer entity and the transaction entity: the transaction entity points to the city, and the city points to the merchant-customer entity, as shown in fig. 6(b). From the directed relationships in fig. 6(b), the generated relationship paths include transaction → merchant-customer and transaction → city → merchant-customer, and the generated features of the target entity include the customer's average transaction amount at the merchant, the number of cities in which the customer has transacted with the merchant, and so on.
In the above embodiment, the entities and attributes in the dataset are obtained, and a directed relationship set is generated from the entities. For each first directed relationship in the set, it is determined whether its first entity and second entity share an attribute; if so, a virtual class entity is generated from that attribute, and a second directed relationship is generated from the first directed relationship and the virtual class entity. A relationship path set, consisting of relationship paths pointing to the target entity, is then generated automatically from the target entity and the directed relationship set containing both the first and the second directed relationships, and the feature set of the target entity is generated from it. Thus, for model training with the target entity as the sample, the scheme automatically generates multiple relationship paths pointing to the target entity from the entities and directed relationships and uses them to generate features of the target entity. These features are interpretable and require no manual adjustment or combination, which saves labor cost.
The embodiment of the invention can process data tables of multiple dimensions and extract features from all of them, so the extracted features are more comprehensive.
In addition, because the scheme connects the first entity and the second entity through a virtual class entity, when a relationship path containing a virtual class entity is generated, reference point features of the target entity can be generated from that virtual class entity; when the generated relationship paths include a circulatable path, interaction features of the target entity can be generated from it; and trend features of the target entity can be generated from its features in different time periods.
Because the embodiment of the invention performs attribute-reachable combination operations, the high-dimensional features it generates, such as reference point features, interaction features, and trend features, are more interpretable.
Based on the same inventive concept, fig. 7 exemplarily illustrates a structure of a path-based feature generation apparatus, which may perform a flow of a path-based feature generation method according to an embodiment of the present invention.
The device includes:
an obtaining module 701, a directed relationship generating module 702, a path generating module 703 and a feature generating module 704;
the obtaining module 701 is configured to obtain the entities and attributes in a dataset; the entities include behavior class entities and physical class entities;
the directed relationship generation module 702 is configured to generate, from the entities in the dataset, a directed relationship set including a plurality of first directed relationships; a first directed relationship is a first entity pointing to a second entity, where the first entity and the second entity are any two entities in the dataset that have a many-to-one relationship in the dataset;
the directed relationship generation module 702 is further configured to, for any first directed relationship in the directed relationship set, generate a virtual class entity from the same attribute if it is determined that the first entity and the second entity in the first directed relationship have the same attribute; generate a second directed relationship from the virtual class entity and the first directed relationship, and add the second directed relationship to the directed relationship set; the second directed relationship includes the first entity pointing to the virtual class entity and the virtual class entity pointing to the second entity;
the path generating module 703 is configured to generate a relationship path set of the target entity according to the directed relationship set and the target entity;
the feature generation module 704 is configured to generate a feature set of the target entity according to the relationship path set of the target entity; the feature set of the target entity is used in model training with the target entity as a sample.
Optionally, the path generating module 703 is specifically configured to:
chain the directed relationships in the directed relationship set in sequence to generate a plurality of relationship paths pointing to the target entity;
and form the plurality of relationship paths into the relationship path set of the target entity.
Optionally, the path generating module 703 is specifically configured to:
if it is determined that a first relationship path that contains only two entity types and has a path length of 1 is generated, update the first relationship path, according to the same attribute of the two entity types, to a second relationship path of path length 2 that connects the two entity types through that attribute;
wherein, the path length refers to the number of directed relationships in the relationship path.
Optionally, the feature generation module 704 is specifically configured to:
generating a feature set of the target entity corresponding to the relationship path according to the relationship paths in the relationship path set in an increasing order of path length, and recording attributes corresponding to the features; the path length refers to the number of directed relationships in the relationship path;
wherein the generating the feature set of the target entity corresponding to the relationship path according to the relationship path includes:
recursively generate the sub-relationship paths pointing to the remaining entities other than the target entity; for each sub-relationship path, determine the temporary features of the remaining entities according to their data in the dataset; the remaining entities include physical class entities and virtual class entities;
and perform aggregation operations and then combination operations according to the temporary features of the remaining entities and the relationship path, to generate the feature set of the target entity corresponding to the relationship path.
Optionally, the feature generation module 704 is specifically configured to:
perform attribute-reachable combination operations according to the temporary features of the remaining entities and the relationship path; attribute-reachable means that two temporary features may be combined when their attributes are the same or the attribute of one of them is a count or ratio attribute.
Optionally, the remaining entities are virtual class entities, and the virtual class entities point to the target entity;
the feature generation module 704 is further configured to:
after the temporary features of the remaining entities are determined, perform an attribute-reachable combination operation on the temporary features of the virtual class entity and the feature sets of the target entity already generated for the relationship paths, to determine the reference point features of the target entity.
Optionally, the relationship path set includes a circulatable path, in which the target entity occurs at least twice;
the feature generation module 704 is specifically configured to:
generate temporary features of the remaining entities on the circulatable path according to a first sub-relationship path of the circulatable path; the first sub-relationship path is the sub-relationship path of the circulatable path in which the target entity points to the remaining entities;
generate interaction features of the target entity by combining the temporary features of the remaining entities on the circulatable path with a second sub-relationship path of the circulatable path; the second sub-relationship path is the sub-relationship path in which the remaining entities point to the target entity.
Optionally, the feature generation module 704 is specifically configured to:
perform a combination operation to determine a trend feature of the target entity when the features of the target entity in a first time period and in a second time period satisfy attribute reachability.
Optionally, the apparatus further comprises a storage module 705;
the storage module 705 is configured to, after the feature set of the target entity is generated, store the entity set, the directed relationship set, the temporary feature set of the virtual class entities, the feature set of the target entity, and the relationship path set of the target entity as stored data, and to reuse the stored data in other datasets or in projects for other target models.
Based on the same inventive concept, an embodiment of the present invention further provides a computing device, including:
a memory for storing program instructions;
and a processor, configured to call the program instructions stored in the memory and execute the path-based feature generation method according to the obtained program.
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable non-volatile storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the above path-based feature generation method.
Based on the same technical concept, an embodiment of the present invention provides a server configured to execute the above path-based feature generation method. As shown in fig. 8, the server includes at least one processor 801 and a memory 802 connected to the at least one processor. The embodiment of the present invention does not limit the specific connection medium between the processor 801 and the memory 802; in fig. 8 they are connected through a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In the embodiment of the present invention, the memory 802 stores instructions executable by the at least one processor 801, and the at least one processor 801 may perform the steps of the aforementioned path-based feature generation method by executing the instructions stored in the memory 802.
The processor 801 is the control center of the server. It may connect the various parts of the server through various interfaces and lines, and implements data processing by running or executing the instructions stored in the memory 802 and calling the data stored in the memory 802. Optionally, the processor 801 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles instructions issued by operation and maintenance personnel. It will be appreciated that the modem processor may also not be integrated into the processor 801. In some embodiments, the processor 801 and the memory 802 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor 801 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 802 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 802 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 802 in embodiments of the present invention may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A path-based feature generation method, comprising:
acquiring entities and attributes in a data set, the entities comprising behavior entities and physical entities;
generating, according to the entities in the data set, a directed relationship set comprising a plurality of first directed relationships, wherein a first directed relationship comprises a first entity pointing to a second entity, the first entity and the second entity are any two of the entities in the data set, and the first entity and the second entity are in a many-to-one relationship in the data set;
for any first directed relationship in the directed relationship set, if it is determined that the first entity and the second entity in the first directed relationship have a same attribute, generating a virtual class entity according to the same attribute, generating a second directed relationship according to the virtual class entity and the first directed relationship, and adding the second directed relationship to the directed relationship set, wherein the second directed relationship comprises the first entity pointing to the virtual class entity and the virtual class entity pointing to the second entity;
generating a relationship path set of a target entity according to the directed relationship set and the target entity; and
generating a feature set of the target entity according to the relationship path set of the target entity, wherein the feature set of the target entity is used in model training with the target entity as a sample;
wherein generating the feature set of the target entity according to the relationship path set of the target entity comprises:
generating, for each relationship path in the relationship path set in increasing order of path length, a feature set of the target entity corresponding to that relationship path, and recording the attribute corresponding to each feature, the path length being the number of directed relationships in the relationship path;
wherein generating the feature set of the target entity corresponding to each relationship path comprises:
for any relationship path, recursively generating sub-relationship paths pointing to the remaining entities other than the target entity; for each sub-relationship path, determining temporary features of the remaining entities according to the data of the remaining entities in the data set, the remaining entities comprising real entities and virtual class entities; and
sequentially performing an aggregation operation and a combination operation according to the temporary features of the remaining entities and the relationship path, to generate the feature set of the target entity corresponding to the relationship path.
2. The method of claim 1, wherein generating the relationship path set of the target entity according to the directed relationship set and the target entity comprises:
sequentially combining the directed relationships in the directed relationship set to generate a plurality of relationship paths pointing to the target entity; and
forming the plurality of relationship paths into the relationship path set of the target entity.
3. The method of claim 2, wherein sequentially combining the directed relationships in the directed relationship set to generate the plurality of relationship paths pointing to the target entity comprises:
if it is determined that a first relationship path containing only two entity types and having a path length of 1 is generated, updating the first relationship path, according to an attribute shared by the two entity types, into a second relationship path having a path length of 2 in which the two entity types are connected through that shared attribute;
wherein the path length refers to the number of directed relationships in the relationship path.
4. The method of claim 1, wherein performing the combination operation according to the temporary features of the remaining entities and the relationship path comprises:
performing an attribute-reachable combination operation according to the temporary features of the remaining entities and the relationship path, wherein the attribute-reachable combination operation refers to performing the combination operation on two temporary features when it is determined that the attributes corresponding to the two temporary features are the same, or that one of the two temporary features has a count attribute or a ratio attribute.
5. The method of claim 4, wherein the remaining entities are virtual class entities and the virtual class entities point to the target entity;
after the temporary features of the remaining entities are determined, the method comprises:
performing an attribute-reachable combination operation according to the temporary features of the virtual class entities and the generated feature set of the target entity corresponding to the relationship path, to determine reference-point features of the target entity.
6. The method of claim 1, wherein the relationship path set includes a circulatable path in which the target entity occurs at least twice;
generating the feature set of the target entity according to the relationship path set of the target entity further comprises:
generating temporary features of the remaining entities under the circulatable path according to a first sub-relationship path in the circulatable path, the first sub-relationship path being a sub-relationship path in which the target entity points to the remaining entities in the circulatable path; and
generating interactive features of the target entity according to the temporary features of the remaining entities under the circulatable path, in combination with a second sub-relationship path in the circulatable path, the second sub-relationship path being a sub-relationship path in which the remaining entities point to the target entity.
7. The method of claim 1, wherein generating the feature set of the target entity according to the relationship path set of the target entity further comprises:
when the features of the target entity in a first time period and the features of the target entity in a second time period are attribute-reachable, performing a combination operation on them to determine a trend feature of the target entity.
8. The method of claim 1, further comprising, after generating the feature set of the target entity:
storing the entity set, the directed relationship set, the temporary feature set of the virtual class entities, the feature set of the target entity, and the relationship path set of the target entity as stored data; and
reusing the stored data on other data sets; or
reusing the stored data in the engineering of other target models.
9. A computing device, comprising:
a memory for storing program instructions;
a processor configured to call the program instructions stored in the memory and execute the method of any one of claims 1 to 8 according to the obtained program.
10. A computer-readable non-transitory storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 8.
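The steps of claims 1 and 2 that build the directed relationship set, add virtual class entities and combine relationships into paths ordered by length can be sketched in Python as follows. The schema, the entity names, the "virtual:<attribute>" naming, the max_len cut-off and the restriction to simple (non-circulatable) paths are all assumptions. The many-to-one first relations are supplied as input rather than detected from data, and the replacement rule of claim 3 is not modeled: both the direct edge and the path through the virtual class entity are kept, as claim 1 adds the second directed relationship to the set.

```python
# Hypothetical schema: entity -> attribute names found in the data set.
SCHEMA = {
    "transaction": {"amount", "city", "time"},
    "card": {"city", "open_date"},
    "customer": {"age"},
}
# Hypothetical first directed relations, each many-to-one: transaction -> card -> customer.
FIRST_RELATIONS = [("transaction", "card"), ("card", "customer")]


def add_virtual_class_entities(schema, relations):
    """For every first directed relation whose two entities share an attribute,
    create a virtual class entity for that attribute and add the second
    directed relation first -> virtual -> second (the original edge is kept)."""
    result = list(relations)
    for first, second in relations:
        for attr in sorted(schema[first] & schema[second]):
            virtual = f"virtual:{attr}"
            for edge in ((first, virtual), (virtual, second)):
                if edge not in result:
                    result.append(edge)
    return result


def relation_paths_to(target, relations, max_len=3):
    """Combine directed relations into simple paths that end at `target`,
    sorted by increasing path length (number of directed relations)."""
    incoming = {}
    for src, dst in relations:
        incoming.setdefault(dst, []).append(src)

    paths = []

    def extend(path):
        # `path` is a list of entities whose last element is the target.
        if len(path) > 1:
            paths.append(tuple(path))
        if len(path) - 1 == max_len:
            return
        for src in incoming.get(path[0], []):
            if src not in path:  # simple paths only; circulatable paths are skipped
                extend([src] + path)

    extend([target])
    return sorted(paths, key=lambda p: (len(p), p))


if __name__ == "__main__":
    relations = add_virtual_class_entities(SCHEMA, FIRST_RELATIONS)
    for path in relation_paths_to("customer", relations):
        print(len(path) - 1, " -> ".join(path))
```

Each returned path would then be processed in that order, shortest first, to generate the target entity's features; that feature step is not shown here.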

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911254655.0A CN110688433B (en) 2019-12-10 2019-12-10 Path-based feature generation method and device

Publications (2)

Publication Number Publication Date
CN110688433A CN110688433A (en) 2020-01-14
CN110688433B true CN110688433B (en) 2020-04-21

Family

ID=69117781

Country Status (1)

Country Link
CN (1) CN110688433B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant