CN111460047A

CN111460047A - Method, device and equipment for constructing characteristics based on entity relationship and storage medium

Info

Publication number: CN111460047A
Application number: CN202010156947.7A
Authority: CN
Inventors: 刘利
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2020-07-28

Abstract

The application discloses a feature construction method based on entity relationships, comprising the following steps: acquiring a main table and a plurality of auxiliary tables associated with the main table in a relational database, wherein the main table has a primary key column and a plurality of foreign key columns, each entry in the main table corresponds to an entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table; constructing a directed inter-table relationship graph by taking the main table and the auxiliary tables as nodes and the pairwise association relationships among the main table and the auxiliary tables as edges; traversing the inter-table relationship graph, starting from the node corresponding to the main table, to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables; and performing conversion calculations on the inter-table relationship data based on preset conversion functions to construct the features corresponding to each entity in the main table. The application also discloses a feature construction apparatus, device and storage medium based on entity relationships. Because feature data are collected based on the inter-table relationship graph, the features of the data can be expressed along multiple dimensions as a whole, which improves the modeling success rate.

Description

Method, device and equipment for constructing characteristics based on entity relationship and storage medium

Technical Field

The present application relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for feature construction based on entity relationships.

Background

Feature engineering is the most time-consuming and labor-intensive part of data analysis. Unlike algorithms and models, it is not a deterministic step; it relies on engineering experience and trade-offs, so there is no uniform approach. Feature construction is an important part of feature engineering: new features are constructed from the original data so that features with physical meaning can be discovered. When the raw data is tabular, features are typically created by mixing or combining attributes, or by decomposing or slicing the original features.

Existing feature construction usually performs function calculations on different tables that have association relationships, relying on the forward or backward association relationships among the data tables, in order to construct features.

Disclosure of Invention

The main purpose of the application is to provide a feature construction method, apparatus, device and storage medium based on entity relationships, so as to solve the technical problem that existing feature engineering techniques cannot express the features of data along multiple dimensions as a whole.

In order to achieve the above object, the present application provides a feature construction method based on entity relationships, the method including the following steps:

acquiring a main table and a plurality of auxiliary tables associated with the main table in a relational database, wherein the main table has a primary key column and a plurality of foreign key columns, each entry in the main table corresponds to an entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table;

constructing a directed inter-table relationship graph by taking the main table and the auxiliary tables as nodes and the pairwise association relationships among the main table and the auxiliary tables as edges;

traversing the inter-table relationship graph, starting from the node corresponding to the main table, to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables;

and performing conversion calculations on the inter-table relationship data based on preset conversion functions to construct the features corresponding to each entity in the main table.

Optionally, the edge M of the inter-table relationship graph is defined as follows:

$M: T_{i-1} \xrightarrow{C_i} T_i$

wherein $T_{i-1}$ and $T_i$ are tables in the database, $C_i$ is the key column linking $T_{i-1}$ and $T_i$, and i is a positive integer;

A. when $C_i$ is the primary key of $T_{i-1}$, $T_{i-1}$ and $T_i$ are in a one-to-many association relationship;

B. when $C_i$ is the primary key of both $T_{i-1}$ and $T_i$, $T_{i-1}$ and $T_i$ are in a one-to-one association relationship;

C. when $C_i$ is the primary key of $T_i$, $T_{i-1}$ and $T_i$ are in a many-to-one association relationship;

D. when $C_i$ is the primary key of neither $T_{i-1}$ nor $T_i$, $T_{i-1}$ and $T_i$ are in a many-to-many association relationship.
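Rules A to D can be sketched in a few lines of Python. This is a minimal illustration, not the application's implementation: the `PRIMARY_KEYS` mapping and the function name `relation_type` are hypothetical, only single-column primary keys are handled, and the delay and event tables are assumed to declare no primary key.

```python
from typing import Dict, Set

# Assumed primary keys for the example tables (single-column keys only).
# The delay and event tables are assumed to declare no primary key.
PRIMARY_KEYS: Dict[str, Set[str]] = {
    "main": {"MessageID"},
    "info": {"TrainID"},
    "delay": set(),
    "event": set(),
}


def relation_type(src: str, dst: str, key: str) -> str:
    """Classify the edge src --key--> dst using rules A-D."""
    key_is_src_pk = key in PRIMARY_KEYS[src]
    key_is_dst_pk = key in PRIMARY_KEYS[dst]
    if key_is_src_pk and key_is_dst_pk:
        return "one-to-one"    # rule B, checked first because it also satisfies A and C
    if key_is_src_pk:
        return "one-to-many"   # rule A
    if key_is_dst_pk:
        return "many-to-one"   # rule C
    return "many-to-many"      # rule D


print(relation_type("main", "info", "TrainID"))   # many-to-one
print(relation_type("main", "delay", "TrainID"))  # many-to-many
```

Rule B is tested before rules A and C because a key that is the primary key of both tables would otherwise be classified as one-to-many.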

Optionally, the connection path $P_k$ corresponding to each entity when traversing the inter-table relationship graph is formed by sequentially connecting k edges M of the inter-table relationship graph, and is defined as follows:

$P_k: T_0 \xrightarrow{C_1} T_1 \xrightarrow{C_2} \cdots \xrightarrow{C_k} T_k \mapsto C$

wherein $T_{i-1}$ and $T_i$ are tables in the database, $C_i$ is the key column linking $T_{i-1}$ and $T_i$, i and k are positive integers, i is any positive integer from 2 to (k-1), $T_0$ represents the main table, $T_i$ represents an auxiliary table, and C represents an attribute column of the last auxiliary table $T_k$ in the connection path.

Optionally, traversing the inter-table relationship graph, starting from the node corresponding to the main table, to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables includes:

starting from the node corresponding to the main table, traversing the inter-table relationship graph along the connection path P_i, and generating, for each entity in the main table, a relationship tree corresponding to the connection path to the auxiliary table;

based on the traversal depth of the inter-table relationship graph, performing a grouping operation on the relationship tree corresponding to each entity, so as to collect the inter-table relationship data between each entity in the main table and the auxiliary tables;

wherein the root node of the relationship tree corresponds to an entity in the main table, the leaf nodes of the relationship tree correspond to the attribute column C of the auxiliary table T_k collected by traversing the connection path P_k, and a child node with traversal depth i corresponds to the foreign key column C_i of the auxiliary table T_i collected by traversing the connection path P_i.

Optionally, after the step of performing conversion calculations on the inter-table relationship data based on preset conversion functions to construct the features corresponding to each entity in the main table, the method further includes:

checking whether duplicate features exist among the constructed features;

if duplicate features exist, deleting the duplicate features, and using a chi-square test to check whether each feature is correlated with the target variable;

if no duplicate features exist, using a chi-square test to check whether each feature is correlated with the target variable;

if a correlation exists and the chi-square value of the feature is greater than a preset chi-square threshold, retaining the feature; otherwise, deleting the feature.

Optionally, the conversion function includes at least one of: an averaging function, a maximum function, a minimum function, a sum function, a difference function, and a product function.

Further, in order to achieve the above object, the present application also provides a feature construction apparatus based on entity relationships, the apparatus including:

an acquisition module, configured to acquire a main table and a plurality of auxiliary tables associated with the main table in a relational database, wherein the main table has a primary key column and a plurality of foreign key columns, each entry in the main table corresponds to an entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table;

an inter-table relationship graph construction module, configured to construct a directed inter-table relationship graph by taking the main table and the auxiliary tables as nodes and the pairwise association relationships among the main table and the auxiliary tables as edges;

a traversal module, configured to traverse the inter-table relationship graph, starting from the node corresponding to the main table, to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables;

and a feature construction module, configured to perform conversion calculations on the inter-table relationship data based on preset conversion functions to construct the features corresponding to each entity in the main table.

Optionally, the traversal module is specifically configured to:

Optionally, the entity relationship-based feature construction apparatus further includes:

a feature inspection module, configured to check whether duplicate features exist among the constructed features; if duplicate features exist, delete the duplicate features and use a chi-square test to check whether each feature is correlated with the target variable; if no duplicate features exist, use a chi-square test to check whether each feature is correlated with the target variable; and if a correlation exists and the chi-square value of the feature is greater than a preset chi-square threshold, retain the feature, and otherwise delete the feature.

Further, to achieve the above object, the present application also provides an entity relationship based feature construction device, where the entity relationship based feature construction device includes a memory, a processor, and a feature construction program stored in the memory and executable on the processor, and when executed by the processor, the feature construction program further implements the steps of the entity relationship based feature construction method according to any one of the above.

Further, to achieve the above object, the present application also provides a computer readable storage medium, which stores a feature construction program, and when the feature construction program is executed by a processor, the computer readable storage medium further implements the steps of the entity relationship based feature construction method according to any one of the above items.

The application sorts out the association relationships among the data tables by means of the inter-table relationship graph, then collects the data related to each entity based on that graph, and finally constructs features from the collected data. Because feature data are collected based on the inter-table relationship graph, the features of the data can be expressed along multiple dimensions as a whole, which improves the modeling success rate. In addition, the feature construction process for the relational database requires no human participation and can serve different data sets, which helps improve data modeling efficiency, helps managers make decisions quickly and at low cost, and supports the rapid development of enterprise business.

Drawings

FIG. 1 is a schematic structural diagram of the operating environment of the entity-relationship-based feature construction device according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a first embodiment of the entity-relationship-based feature construction method of the present application;

FIG. 3 is an inter-table relationship graph according to an embodiment of the entity-relationship-based feature construction method of the present application;

FIG. 4 is a detailed flowchart of step S30 in FIG. 2;

FIG. 5 is a schematic diagram of a relationship tree according to an embodiment of the entity-relationship-based feature construction method of the present application;

FIG. 6 is a schematic flowchart of a second embodiment of the entity-relationship-based feature construction method of the present application;

FIG. 7 is a functional module diagram of an embodiment of the entity-relationship-based feature construction apparatus of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The application provides a feature construction device based on entity relationships.

Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the operating environment of the entity-relationship-based feature construction device according to an embodiment of the present application.

As shown in fig. 1, the entity relationship-based feature construction apparatus includes: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display (Display), an input unit such as a Keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the hardware configuration shown in FIG. 1 does not limit the entity-relationship-based feature construction device, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

As shown in FIG. 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a computer program. The operating system is a program that manages and controls the entity-relationship-based feature construction device and its software resources, and supports the running of the feature construction program and other software and/or programs.

In the hardware structure of the entity relationship based feature construction device shown in fig. 1, the network interface 1004 is mainly used for accessing a network; the user interface 1003 is mainly used for detecting a confirmation instruction, an editing instruction, and the like. And the processor 1001 may be configured to invoke the feature construction program stored in the memory 1005 and perform the operations of the following embodiments of the entity relationship based feature construction method.

Based on the hardware structure of the entity-relationship-based feature construction device described above, various embodiments of the entity-relationship-based feature construction method are provided.

Referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of the method for constructing features based on entity relationships according to the present application. In this embodiment, the method for constructing a feature based on an entity relationship includes the following steps:

Step S10, acquiring a main table and a plurality of auxiliary tables associated with the main table in a relational database, wherein the main table has a primary key column and a plurality of foreign key columns, each entry in the main table corresponds to an entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table;

In this embodiment, feature construction is performed on the data tables of a relational database. To make it possible to construct a directed inter-table relationship graph, the data tables targeted by feature construction must include one main table and a plurality of auxiliary tables. The main table has a plurality of foreign key columns, each of its entries corresponds to one entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table.

For example, consider a simple relational database with four tables, shown in Tables 1-4 below.

(1) Main table (main): contains information about train arrival times. The target column is the arrival time. Each entry in the main table is uniquely identified by the MessageID column and corresponds to a message sent when a train arrives at a station. The main table has two foreign keys: StationID and TrainID.

TABLE 1

TrainID	StationID	Arrival time	MessageID
IRE01	Dublin	2017-01-01 10:02:00	1
IRE01	AshTown	2017-01-01 10:12:00	2
IRE01	Maynooth	2017-01-01 10:24:00	3
IRE01	Dublin	2017-01-02 10:03:00	4
IRE01	AshTown	2017-01-02 10:15:00	5
IRE01	Maynooth	2017-01-02 10:27:30	6
IRE02	Dublin	2017-01-01 11:00:00	7
IRE02	Cork	2017-01-01 14:20:00	8

(2) Delay table (delay): contains train delay information. It is similar to the main table, except that the arrival time is expressed as a delay in seconds.

TABLE 2

TrainID	StationID	Delay	TimeStamp
IRE01	Dublin	120	2017-01-01 10:02:00
IRE01	AshTown	60	2017-01-01 10:12:00
IRE01	Maynooth	60	2017-01-01 10:24:00
IRE01	Dublin	180	2017-01-02 10:03:00
IRE01	AshTown	240	2017-01-02 10:15:00
IRE01	Maynooth	240	2017-01-02 10:27:30
IRE02	Dublin	0	2017-01-01 11:00:00
IRE02	Cork	60	2017-01-01 14:20:00

(3) Information table (info): contains detailed information about each train, such as the train class.

TABLE 3

TrainID	Trainclass	Max Speed(km/h)
IRE01	Regional	120
IRE02	Intercity	240

(4) Event table (event): a log of events occurring at the stations where trains are scheduled to arrive.

TABLE 4

StationID	Event	TimeStamp
Dublin	Roadwork	2017-01-01 10:00:00
Dublin	Roadwork	2017-01-01 18:00:00
AshTown	Roadwork	2017-01-01 10:00:00
Dublin	Strike	2017-01-02 9:00:00
AshTown	Strike	2017-01-02 9:00:00

Step S20, constructing a directed inter-table relationship graph by taking the main table and the auxiliary tables as nodes and the pairwise association relationships among the main table and the auxiliary tables as edges;

In this embodiment, the inter-table relationship graph is a graph whose nodes are tables and whose edges are the connections between tables. The graph ties together the relationships among the tables, so that the features of the data can be expressed along multiple dimensions as a whole. It should be noted that the two tables connected by an edge of the inter-table relationship graph may have various association relationships, such as one-to-one, one-to-many, or many-to-many.
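One way to hold such a directed graph in memory is an adjacency list. The sketch below is only an illustration: the edge list is an assumption chosen to fit the four example tables and is not a reproduction of FIG. 3, and `build_graph` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed edges: (source table, destination table, linking key column).
EDGES: List[Tuple[str, str, str]] = [
    ("main", "delay", "TrainID"),
    ("main", "info", "TrainID"),
    ("main", "event", "StationID"),
    ("delay", "event", "StationID"),
]


def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Return an adjacency list: table -> [(neighbour table, key column), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, dst, key in edges:
        graph[src].append((dst, key))
    return dict(graph)


GRAPH = build_graph(EDGES)
print(GRAPH["main"])   # outgoing edges of the main table
```

Tables without outgoing edges (here info and event) simply do not appear as keys, so callers should look them up with `GRAPH.get(name, [])`.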

Step S30, traversing the inter-table relationship graph, starting from the node corresponding to the main table, to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables;

In this embodiment, data may be collected for each entity in the main table along any path that starts from the node corresponding to the main table. Traversing the inter-table relationship graph along different paths to collect data is equivalent to exploring different relationships between the tables. In general, the number of paths grows exponentially with the depth of the graph, so the maximum traversal depth d needs to be limited; preferably, d equals 2.
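The depth-limited traversal can be sketched as a recursive enumeration of all paths that start at the main table and contain at most d edges. The graph literal below is an assumed example, and `enumerate_paths` is a hypothetical name; the depth limit also guarantees termination if the graph contains cycles.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]
Path = List[Tuple[str, str, str]]  # each step is (source table, key column, destination table)

# Assumed adjacency list for the four example tables.
GRAPH: Graph = {
    "main": [("delay", "TrainID"), ("info", "TrainID"), ("event", "StationID")],
    "delay": [("event", "StationID")],
}


def enumerate_paths(graph: Graph, start: str, max_depth: int = 2) -> List[Path]:
    """List every path from `start` that has between 1 and `max_depth` edges."""
    paths: List[Path] = []

    def walk(node: str, prefix: Path) -> None:
        if len(prefix) == max_depth:
            return  # depth limit reached, stop extending this path
        for dst, key in graph.get(node, []):
            path = prefix + [(node, key, dst)]
            paths.append(path)
            walk(dst, path)

    walk(start, [])
    return paths


for p in enumerate_paths(GRAPH, "main", max_depth=2):
    print(" -> ".join([p[0][0]] + [f"[{key}] {dst}" for _, key, dst in p]))
```

With this assumed graph and d = 2 the function returns four paths: three of depth 1 and one of depth 2 (main, delay, event).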

Step S40, performing conversion calculations on the inter-table relationship data based on preset conversion functions to construct the features corresponding to each entity in the main table.

In this embodiment, the inter-table relationship data collected through the above steps is in fact a list of data from the last table on the traversal path. Table data are usually of numeric, categorical, timestamp or text type, so data conversion functions are supported, such as converting categorical data into numerical variables and converting a timestamp into four different features, namely day of week (1-7), day of month (1-28/30/31), month (1-12) and hour (1-24).
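These two conversions can be illustrated with the Python standard library. The function names are hypothetical, and note one deliberate difference from the ranges stated above: Python's `datetime.hour` runs from 0 to 23 rather than 1 to 24, and `isoweekday()` numbers Monday as 1 and Sunday as 7.

```python
from datetime import datetime
from typing import Dict, List


def timestamp_features(ts: str) -> Dict[str, int]:
    """Split a 'YYYY-MM-DD HH:MM:SS' timestamp into four numeric features."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "weekday": dt.isoweekday(),  # 1 (Monday) to 7 (Sunday)
        "day": dt.day,               # day of month
        "month": dt.month,           # 1 to 12
        "hour": dt.hour,             # 0 to 23 in Python
    }


def encode_labels(values: List[str]) -> List[int]:
    """Map each category to an integer code (codes follow sorted label order)."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


print(timestamp_features("2017-01-01 10:02:00"))
print(encode_labels(["Roadwork", "Roadwork", "Roadwork", "Strike", "Strike"]))
```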

In this embodiment, a large number of new features having practical significance can be constructed by performing conversion calculation on the acquired data. For example, based on the data in tables 1-4, by calculating the difference between the arrival time and the delay time, the expected scheduled arrival time can be obtained; by comparing the delay time, the station name with the longest delay and the station name with the shortest delay can be obtained; by comparing the delay time of the same train arriving at the same station on different dates, the optimal travel time can be recommended.
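As a worked instance of the first derived feature, the scheduled arrival time can be recovered by subtracting the delay (in seconds, as in Table 2) from the actual arrival time. The helper name `scheduled_arrival` is hypothetical, and the sketch assumes the two values have already been matched for the same train, station and timestamp.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def scheduled_arrival(arrival: str, delay_seconds: int) -> str:
    """Scheduled arrival time = actual arrival time minus the delay in seconds."""
    actual = datetime.strptime(arrival, FMT)
    return (actual - timedelta(seconds=delay_seconds)).strftime(FMT)


# First row of Tables 1 and 2: IRE01 reached Dublin at 10:02:00 with a 120 s delay.
print(scheduled_arrival("2017-01-01 10:02:00", 120))
```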

Optionally, the conversion function includes at least one of: an averaging function, a maximum function, a minimum function, a sum function, a difference function, and a product function.
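A small registry of such conversion functions might look as follows. This is a sketch under assumptions: the names `TRANSFORMS` and `build_features` are hypothetical, and the "difference function" is read here as the range (maximum minus minimum), which the application does not specify.

```python
from functools import reduce
from operator import mul
from statistics import mean
from typing import Callable, Dict, List

TRANSFORMS: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "range": lambda xs: max(xs) - min(xs),     # assumed reading of "difference"
    "product": lambda xs: reduce(mul, xs, 1),
}


def build_features(values: List[float], prefix: str) -> Dict[str, float]:
    """Apply every registered transform to one collected list of values."""
    return {f"{prefix}_{name}": fn(values) for name, fn in TRANSFORMS.items()}


# Delays (in seconds) of train IRE01 at Dublin, taken from Table 2.
print(build_features([120, 180], "delay"))
```

Each collected list yields one feature per transform, so the number of features grows with both the number of connection paths and the number of registered functions.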

In this embodiment, the association relationships among the data tables are sorted out by means of the inter-table relationship graph, the data related to each entity are then collected based on that graph, and finally features are constructed from the collected data. Because feature data are collected based on the inter-table relationship graph, the features of the data can be expressed along multiple dimensions as a whole, which improves the modeling success rate. In addition, the feature construction process for the relational database requires no human participation and can serve different data sets, which helps improve data modeling efficiency, helps managers make decisions quickly and at low cost, and supports the rapid development of enterprise business.

Further, in an embodiment of the feature construction method based on entity relationships, the inter-table relationship graph takes tables as its nodes and the association relationships between tables as its edges. Taking the relational database described in the above embodiment as an example, the corresponding inter-table relationship graph is shown in FIG. 3.

It should be noted that the inter-table relationship graph consists mainly of nodes, each representing a table, and edges, each representing an association relationship between tables, where the edge M is defined as follows:

$M: T_{i-1} \xrightarrow{C_i} T_i$

In this embodiment, the inter-table relationship graph may be implemented using an adjacency matrix, an edge array, an adjacency list, an orthogonal (cross-linked) list, an adjacency multilist, or the like. The arrow in the edge M indicates the association between the two tables ($T_{i-1}$, $T_i$) at the nodes joined by the edge, where $C_i$ is the key column linking $T_{i-1}$ and $T_i$, together with the direction of the connection.

In the inter-table relationship graph depicted in FIG. 3, the association between the main table and the information table is many-to-one, that is, multiple records in the main table correspond to one record in the information table; the association between the main table and the delay table is many-to-many, that is, multiple records in the main table each correspond to multiple records in the delay table.

In this embodiment, the inter-table relationship graph ties together the relationships among the tables, so that the features of the data can be expressed along multiple dimensions as a whole. It should be noted that the two tables connected by an edge of the inter-table relationship graph may have various association relationships, for example one-to-one or one-to-many.

Further, in a specific embodiment of the method for constructing features based on entity relationships, the connection path $P_k$ corresponding to each entity when traversing the inter-table relationship graph is formed by sequentially connecting k edges M of the inter-table relationship graph, and is defined as follows:

$P_k: T_0 \xrightarrow{C_1} T_1 \xrightarrow{C_2} \cdots \xrightarrow{C_k} T_k \mapsto C$

Referring to fig. 4, fig. 4 is a schematic view of a detailed flow of the step S30 in fig. 2. In this embodiment, the step S30 further includes:

Step S301, starting from the node corresponding to the main table, traversing the inter-table relationship graph along the connection path P_i, and generating, for each entity in the main table, a relationship tree corresponding to the connection path to the auxiliary table;

Step S302, based on the traversal depth of the inter-table relationship graph, performing a grouping operation on the relationship tree corresponding to each entity, so as to collect the inter-table relationship data between each entity in the main table and the auxiliary tables;

Taking the relational database of the above embodiment as an example, the data collected for an entity e can be represented as a relationship tree, as shown in FIG. 5.

In this embodiment, the entity e is the entry in the main table whose TrainID = IRE01, and the traversed connection path runs from the main table through the delay table to the event table, ending at the attribute column Event.

The root of the relationship tree in FIG. 5 corresponds to the entity e, while the leaf nodes of the relationship tree correspond to the attribute column Event in the auxiliary table event, collected by traversing the connection path P₂; the child nodes with traversal depth 1 correspond to the foreign key column StationID in the auxiliary table delay, collected by traversing the connection path P₁; the child nodes with traversal depth 2 correspond to the foreign key column StationID in the auxiliary table event, collected by traversing the connection path P₂. The traversal depth of the inter-table relationship graph is determined by the number of auxiliary tables traversed starting from the main table. For example, if the first auxiliary table has been reached from the main table, the traversal depth is 1; if three different auxiliary tables have been traversed in succession from the main table, the traversal depth is 3.
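The relationship tree for one entity can be sketched with plain dictionaries. The sketch below makes several assumptions that are not confirmed by the text: the main table is joined to the delay table on TrainID, the delay table is joined to the event table on StationID, only the distinct (TrainID, StationID) pairs of Table 2 are kept, and the names `relation_tree` and `group_leaves` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Distinct (TrainID, StationID) pairs from Table 2.
DELAY = [
    {"TrainID": "IRE01", "StationID": "Dublin"},
    {"TrainID": "IRE01", "StationID": "AshTown"},
    {"TrainID": "IRE01", "StationID": "Maynooth"},
    {"TrainID": "IRE02", "StationID": "Dublin"},
    {"TrainID": "IRE02", "StationID": "Cork"},
]

# (StationID, Event) columns of Table 4.
EVENT = [
    {"StationID": "Dublin", "Event": "Roadwork"},
    {"StationID": "Dublin", "Event": "Roadwork"},
    {"StationID": "AshTown", "Event": "Roadwork"},
    {"StationID": "Dublin", "Event": "Strike"},
    {"StationID": "AshTown", "Event": "Strike"},
]


def relation_tree(train_id: str) -> Dict[str, List[str]]:
    """Root: the entity; depth 1: StationID values in delay; leaves: Event values."""
    tree: Dict[str, List[str]] = {}
    for d in DELAY:
        if d["TrainID"] == train_id:                      # main -> delay (assumed key)
            station = d["StationID"]
            tree[station] = [e["Event"] for e in EVENT    # delay -> event on StationID
                             if e["StationID"] == station]
    return tree


def group_leaves(tree: Dict[str, List[str]]) -> Dict[str, int]:
    """Group all leaf values of the tree and count each event type."""
    return dict(Counter(ev for events in tree.values() for ev in events))


tree = relation_tree("IRE01")
print(tree)
print(group_leaves(tree))
```

Grouping at the leaf level collapses the tree into one list per entity, which is the form that a conversion function can then turn into a feature.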

For convenience of description, a dedicated notation is used for the relationship tree of the entity e along the connection path P. In addition, in order to collect data along multiple dimensions, the data collected at different traversal depths are also grouped as they are collected.

Taking FIG. 5 as an example, grouping operations at different traversal depths represent different information about the events affecting the train. For example, each child node with depth 1 in FIG. 5 corresponds to the StationID attribute information associated with TrainID IRE01 in the main table, and the grouping operation at that depth represents the information in the event table that affects the train's delay.

Referring to fig. 6, fig. 6 is a schematic flowchart of a second embodiment of the method for constructing features based on entity relationships according to the present application. Based on the first embodiment, the present embodiment further includes, after the step S40, the following steps:

Step S50, checking whether duplicate features exist among the constructed features;

Step S60, if duplicate features exist, deleting the duplicate features, and using a chi-square test to check whether each feature is correlated with the target variable;

Step S70, if no duplicate features exist, using a chi-square test to check whether each feature is correlated with the target variable;

Step S80, if a correlation exists and the chi-square value of the feature is greater than a preset chi-square threshold, retaining the feature; otherwise, deleting the feature.

In this embodiment, the features constructed along multiple dimensions inevitably include duplicates, that is, features whose physical meanings differ but whose actual data values are identical, so the constructed features need to be further selected. Feature selection eliminates irrelevant or redundant features, which reduces the number of features, improves model accuracy, and shortens running time.
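Duplicate removal can be sketched by comparing the value columns of the features and keeping the first feature seen for each distinct column. The function name and the sample feature names are hypothetical.

```python
from typing import Dict, List, Sequence


def drop_duplicate_features(features: Dict[str, Sequence]) -> Dict[str, Sequence]:
    """Keep the first feature for each distinct column of values."""
    seen = set()
    kept: Dict[str, Sequence] = {}
    for name, values in features.items():
        signature = tuple(values)      # hashable fingerprint of the column
        if signature not in seen:
            seen.add(signature)
            kept[name] = values
    return kept


features = {
    "delay_max": [180, 240, 240],
    "delay_last": [180, 240, 240],     # same values, different meaning
    "delay_min": [120, 60, 60],
}
print(list(drop_duplicate_features(features)))
```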

In this embodiment, feature selection preferably uses the chi-square test, which is a filter method. The chi-square test examines the correlation between a qualitative independent variable and a qualitative dependent variable. Assuming the independent variable takes N values and the dependent variable takes M values, a statistic is constructed from the difference between the observed and expected frequencies of samples for which the independent variable equals i and the dependent variable equals j; this statistic measures the correlation of the independent variable with the dependent variable. If a correlation exists and the chi-square value of the feature is greater than the preset chi-square threshold, the feature is retained; otherwise, it is deleted.
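The statistic described above is the standard Pearson chi-square, the sum over all (i, j) cells of (observed - expected)² / expected. A minimal pure-Python sketch follows; the function names are hypothetical, and the default threshold of 3.841 (the 5% critical value for one degree of freedom) is only an illustrative choice, since the application leaves the preset threshold open.

```python
from collections import Counter
from typing import Hashable, Sequence


def chi_square(feature: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic between two categorical variables."""
    n = len(feature)
    observed = Counter(zip(feature, target))   # joint frequencies
    feature_totals = Counter(feature)          # marginal frequencies
    target_totals = Counter(target)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat


def keep_feature(feature, target, threshold: float = 3.841) -> bool:
    """Retain the feature only if its chi-square value exceeds the threshold."""
    return chi_square(feature, target) > threshold


print(chi_square(["a", "a", "b", "b"], [1, 1, 0, 0]))   # perfectly associated
print(chi_square(["a", "a", "b", "b"], [1, 0, 1, 0]))   # independent
```

In practice the threshold depends on the degrees of freedom, (N - 1)(M - 1), so a single fixed value is appropriate only when all features have the same number of categories.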

The application also provides a device for constructing the characteristics based on the entity relationship.

Referring to fig. 7, fig. 7 is a functional module schematic diagram of an embodiment of the feature construction apparatus based on entity relationship according to the present application. In this embodiment, the apparatus for constructing features based on entity relationships includes:

an obtaining module 10, configured to obtain a main table and a plurality of auxiliary tables associated with the main table in a relational database, where the main table is provided with a primary key column and a plurality of foreign key columns, each entry in the main table corresponds to an entity, and the auxiliary tables are associated with the main table through the foreign keys of the main table;

an inter-table relationship graph building module 20, configured to build a directed inter-table relationship graph by using the main table and the auxiliary tables as nodes and using the association relationship between each pair of tables among the main table and the auxiliary tables as an edge;

a traversal module 30, configured to traverse the inter-table relationship graph with the node corresponding to the main table as a starting point, so as to collect inter-table relationship data between each entity in the main table and the corresponding auxiliary tables;

and a feature construction module 40, configured to perform conversion calculation on the inter-table relationship data based on a preset conversion function, so as to construct features corresponding to each entity in the main table.
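The cooperation of the obtaining, graph building, and traversal modules can be sketched as follows. This is only a minimal illustration under stated assumptions: tables are held in memory as lists of row dictionaries, an edge is represented as a tuple (source table, key column in source, target table, key column in target), and a path is a list of such edges. The table names `loan` and `payment`, the column names, and all function names are hypothetical and are not taken from the application.

```python
from collections import deque

# Hypothetical main table ("loan") and auxiliary table ("payment").
tables = {
    "loan": [{"id": 1, "customer_id": 100}, {"id": 2, "customer_id": 200}],
    "payment": [
        {"customer_id": 100, "amount": 50.0},
        {"customer_id": 100, "amount": 70.0},
        {"customer_id": 200, "amount": 20.0},
    ],
}
# Assumed edge form: (source table, key column in source, target table, key column in target).
edges = [("loan", "customer_id", "payment", "customer_id")]


def build_graph(edges):
    """Build a directed adjacency list with tables as nodes."""
    graph = {}
    for src, fk, dst, key in edges:
        graph.setdefault(src, []).append((fk, dst, key))
    return graph


def traverse(graph, start):
    """Breadth-first walk from the main table; each path is a list of edges."""
    paths = []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        table, path = queue.popleft()
        for fk, dst, key in graph.get(table, []):
            if dst in visited:
                continue
            visited.add(dst)
            new_path = path + [(table, fk, dst, key)]
            paths.append(new_path)
            queue.append((dst, new_path))
    return paths


def related_rows(tables, path, entity):
    """Follow a path edge by edge and return the rows reached from one entity."""
    rows = [entity]
    for src, fk, dst, key in path:
        values = {row[fk] for row in rows}
        rows = [row for row in tables[dst] if row[key] in values]
    return rows


paths = traverse(build_graph(edges), "loan")
relationship_data = {
    entity["id"]: [related_rows(tables, path, entity) for path in paths]
    for entity in tables["loan"]
}
```

The resulting `relationship_data` holds, for each entity in the main table, the auxiliary rows reached along each path; these rows are the input to the conversion functions of the feature construction module.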

Optionally, in a specific embodiment, the edge M of the inter-table relationship graph is defined as follows:

Optionally, in a specific embodiment, the connection path P_k traversed for each entity in the inter-table relationship graph is formed by sequentially connecting k edges M of the inter-table relationship graph, and is defined as follows:

Optionally, in a specific embodiment, the traversal module is specifically configured to:

Optionally, in a specific embodiment, the apparatus for constructing features based on entity relationships further includes:

a feature checking module, configured to check whether duplicate features exist among the constructed features; if duplicate features exist, delete the duplicate features and then use the chi-square test to check whether each feature is correlated with the target variable; if no duplicate features exist, use the chi-square test directly to check whether each feature is correlated with the target variable; and retain a feature if it is correlated and its chi-square value is greater than a preset chi-square threshold, and otherwise delete the feature.

Optionally, in a specific embodiment, the conversion function at least includes: any one or more of an averaging function, a maximum function, a minimum function, a sum function, a difference function, and a product function.
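The listed conversion functions can be sketched as a small registry applied to the numeric values collected for one entity. The application does not define the difference function, so this sketch assumes it means the maximum minus the minimum; the names `CONVERSIONS` and `construct_features` and the feature naming scheme are likewise assumptions.

```python
from math import prod
from statistics import mean

# Registry of conversion functions; each maps a list of numbers to one value.
CONVERSIONS = {
    "mean": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "diff": lambda values: max(values) - min(values),  # assumption: difference = range
    "prod": prod,
}


def construct_features(values, prefix):
    """Apply every conversion function to the values collected for one entity."""
    if not values:
        return {}  # no related rows, so no features for this column
    return {f"{prefix}_{name}": fn(values) for name, fn in CONVERSIONS.items()}


# Hypothetical values collected from an auxiliary table for one entity.
entity_features = construct_features([2.0, 3.0, 5.0], "amount")
```

Each entry of `entity_features` is one constructed feature, for example `amount_sum` or `amount_max`.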

Because the device embodiments are based on the same description as the embodiments of the entity relationship based feature construction method of the present application, the device embodiments are not described again in detail here.

The present application also provides a non-volatile computer-readable storage medium.

In this embodiment, the computer-readable storage medium stores a feature construction program which, when executed by a processor, implements the steps of the entity relationship based feature construction method according to any one of the above embodiments. For the method implemented when the feature construction program is executed by the processor, reference may be made to the embodiments of the entity relationship based feature construction method of the present application, so the description is not repeated here.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM), and includes several instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.

The embodiments of the present application have been described above with reference to the drawings, but the present application is not limited to the above-mentioned embodiments, which are only illustrative and not restrictive. Those skilled in the art can make many changes and modifications without departing from the spirit and scope of the present application and the protection scope of the claims, and all changes and modifications that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A feature construction method based on entity relationship is characterized by comprising the following steps:

2. The method for constructing features based on entity relationship as claimed in claim 1, wherein the edge M of the relationship graph between tables is defined as follows:

3. The method of claim 2, wherein the connection path P_k traversed for each entity in the inter-table relationship graph is formed by sequentially connecting k edges M of the inter-table relationship graph, and is defined as follows:

4. The method for constructing features based on entity relationships according to claim 3, wherein traversing the inter-table relationship graph with the primary table corresponding node as a starting point to collect the inter-table relationship data between each entity in the primary table and the corresponding secondary table comprises:

5. The method for constructing characteristics based on entity relationships according to any one of claims 1 to 4, wherein after the step of performing conversion calculation on the relationship data between tables based on the preset conversion function to construct the characteristics corresponding to each entity in the main table, the method further comprises:

checking whether repeated features exist in the constructed features;

6. The entity relationship based feature construction method according to claim 1, wherein the conversion function at least comprises: any one or more of an averaging function, a maximum function, a minimum function, a sum function, a difference function, and a product function.

7. An entity relationship-based feature construction apparatus, comprising:

8. The entity relationship based feature construction apparatus according to claim 7, wherein the conversion function comprises at least: any one or more of an averaging function, a maximum function, a minimum function, a sum function, a difference function, and a product function.

9. An entity relationship based feature construction device, characterized in that the entity relationship based feature construction device comprises a memory, a processor and a feature construction program stored on the memory and executable on the processor, the feature construction program when executed by the processor further implementing the steps of the entity relationship based feature construction method according to any one of claims 1-6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a feature construction program, which when executed by a processor further implements the steps of the entity relationship based feature construction method according to any one of claims 1 to 6.