WO2022260670A1 - Systems and methods for extracting entity features from a relational database - Google Patents

Systems and methods for extracting entity features from a relational database

Info

Publication number
WO2022260670A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
structures
network structures
column
input
Prior art date
Application number
PCT/US2021/036754
Other languages
French (fr)
Inventor
Shaozhi YE
Wei Dong
Siming Li
Original Assignee
Industrial Artificial Intelligence Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial Artificial Intelligence Inc. filed Critical Industrial Artificial Intelligence Inc.
Priority to PCT/US2021/036754 priority Critical patent/WO2022260670A1/en
Publication of WO2022260670A1 publication Critical patent/WO2022260670A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Definitions

  • This application relates generally to data processing technology including, but not limited to, methods, systems, and non-transitory computer-readable media for extracting entity features from a relational database.
  • Relational databases are widely used nowadays. Massive amounts of data are stored in various kinds of relational databases.
  • a relational database conforms to a database schema. Relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. While traditional data mining focuses more on the extraction of overall information, it does not handle the incomplete or noisy data.
  • the present application describes implementations related to data processing and, more particularly, to system and method of extracting entity features from a relational database.
  • a method is implemented at a computer system for extracting entity features from a relational database.
  • the method includes: extracting the table structures from the relational database; converting the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures; training the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values; and generating the trained neural network structures including the embedding subnets as entity features of the relational database.
  • extracting table structures includes: identifying the tables to be converted; and converting the table column values of the identified tables into a feature value format.
  • converting the table structures to the neural network structures includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
  • the first set of table columns consists of table columns identified by primary keys or foreign keys
  • the second set of table columns consists of table columns not identified by the primary keys or the foreign keys
  • training the neural network structures includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs, for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of the table column values of the respective outputs.
  • training the neural network structures further includes: generating a loss function for each of the table column values of the respective outputs by comparing the difference between the actual column value and the prediction for each of the table column values of the respective outputs.
  • training the neural network structures further includes: training the neural networks, including the respective neural network from each table, in a round-robin schedule.
  • training the neural network structures further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
  • the neural network structures include multi-layer fully-connected networks.
  • some implementations include a computer system including one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods.
  • some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.
  • Figure 1 is an exemplary data processing environment in accordance with some embodiments.
  • Figure 2 is an exemplary neural network for processing data in a machine learning model, in accordance with some implementations of the present disclosure.
  • Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
  • Figure 4 is a flow diagram illustrating an entity feature extracting process according to some implementations of the present disclosure.
  • Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
  • Figure 6 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
  • Figure 1 is an exemplary data processing environment 100 in which one or more network-connected client devices 102 and one or more server systems 104 interact with each other locally or remotely via one or more communication networks 106 in accordance with some embodiments.
  • the server systems 104 are physically remote from, but are communicatively coupled to the one or more client devices 102.
  • a client device 102 (e.g., 102A, 102B) includes a desktop computer.
  • a client device 102 (e.g., 102C) includes a mobile device, e.g., a mobile phone, a tablet computer, or a laptop computer.
  • Each client device 102 can collect data or user inputs, execute user applications, and present outputs on its user interface. The collected data or user inputs can be processed locally at the client device 102 and/or remotely by the server(s) 104.
  • Each client device 102 communicates with another client device 102 or the server systems 104 using the one or more communication networks 106.
  • the communication networks 106 can be one or more networks having one or more types of topologies, including but not limited to the Internet, intranets, local area networks (LANs), cellular networks, Ethernet, telephone networks, Bluetooth personal area networks (PAN), and the like.
  • two or more client devices 102 in a sub-network are coupled via a wired connection, while at least some client devices 102 in the same subnetwork are coupled via a local radio communication network (e.g., ZigBee, Z-Wave, Insteon, Bluetooth, Wi-Fi, and other radio communication networks).
  • a client device 102 establishes a connection to the one or more communication networks 106 either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
  • Each of the server systems 104 includes one or more processors 110 and memory storing instructions for execution by the one or more processors 110.
  • the server system 104 also includes an input/output interface 114 to the client(s).
  • the one or more server systems 104 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 102, and in some embodiments, process the data and user inputs received from the client device(s) 102 when the user applications are executed on the client devices 102.
  • the one or more server systems 104 can enable real-time data communication with the client devices 102 that are remote from each other or from the one or more server systems 104.
  • the server system 104A is configured to store an information database such as a relational database 112.
  • the server system 104B is configured to store a machine learning model 116.
  • the machine learning model and the relational database can be in the same server 104.
  • the machine learning model 116 is one or more neural networks.
  • a machine learning model training method can be implemented at one or more of the server systems 104.
  • Each client device 102 includes one or more processors and memory storing instructions for execution by the one or more processors.
  • the instructions stored on the client device 102 enable implementation of the web browser and user interface application to the relational database 112 or to the machine learning model 116.
  • the web browser and the user interface application are linked to a user account in the data processing environment 100.
  • the browser or user interface application enables a GUI for a user 120 to review information from the relational database 112 or from the machine learning model 116.
  • a user 120 can add notations or make changes to the data or selections on the relational database 112 or on the machine learning model 116.
  • the new user data can be stored on the client device 102 or on the server systems 104.
  • Machine learning techniques are applied in the data processing environment 100 to process data obtained by an application executed at a client device 102 or loaded from another database or files to identify information contained in the data, match the data with other data, categorize the data, or synthesize related data.
  • Data can include text, images, audio, video, etc.
  • machine learning models are created based on one or more neural networks to process the data. These machine learning models are trained with training data before they are applied to process the data.
  • a machine learning model training method is implemented at a client device 102.
  • a machine learning model training method is jointly implemented at the client device 102 and the server system 104.
  • a machine learning model can be held at a client device 102.
  • the client device 102 is configured to automatically and without user intervention, identify, classify or modify the data information from the relational database 112 or from the machine learning model 116.
  • both model training and data processing are implemented locally at each individual client device 102 (e.g., the client device 102C).
  • the client device 102C obtains the training data from the one or more server systems 104 including the database 112 and applies the training data to train the machine learning models. Subsequent to model training, the client device 102C obtains and processes the data using the trained machine learning models locally.
  • both model training and data processing are implemented remotely at a server system 104 (e.g., the server system 104B) associated with a client device 102 (e.g. the client device 102 A).
  • the server 104B obtains the training data from itself, another server 104, or the database 112 and applies the training data to train the machine learning models 116.
  • the client device 102A obtains the data, sends the data to the server 104B (e.g., in an application) for data processing using the trained machine learning models, receives data processing results from the server 104B, and presents the results on a user interface (e.g., associated with the application).
  • the client device 102A itself implements no or little data processing on the data prior to sending it to the server 104B. Additionally, in some embodiments, data processing is implemented locally at a client device 102 (e.g., the client device 102B), while model training is implemented remotely at a server system 104 (e.g., the server 104B) associated with the client device 102B.
  • the trained machine learning models are optionally stored in the server 104B or another database, such as 112. The client device 102B imports the trained machine learning models from the server 104B or database 112, processes the data using the machine learning models, and generates data processing results to be presented on a user interface locally.
  • the machine learning model system 116 includes one or more of a server, a client device, a storage, or a combination thereof.
  • the machine learning model system 116 typically includes one or more processing units (CPUs), one or more network interfaces, memory, and one or more communication buses for interconnecting these components (sometimes called a chipset).
  • the machine learning model system 116 includes one or more input devices that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls.
  • the machine learning model system 116 also includes one or more output devices that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
  • Memory includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices.
  • Memory optionally includes one or more storage devices remotely located from one or more processing units.
  • Memory, or alternatively the non-volatile memory within memory, includes a non-transitory computer readable storage medium.
  • memory, or the non-transitory computer readable storage medium of memory stores programs, modules, and data structures including operating system, input processing module for detecting and processing input data, model training module for receiving training data and establishing a machine learning model for processing data, machine learning module for processing data using machine learning models, etc.
  • Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
  • memory optionally stores a subset of the modules and data structures identified above.
  • memory optionally stores additional modules and data structures not described above.
  • Figure 2 is an exemplary neural network 200 implemented to process data in a machine learning model 116, in accordance with some implementations of the present disclosure.
  • the machine learning model 116 is established based on the neural network 200.
  • a corresponding model-based processing module within the server system 104B applies the machine learning model 116 including the neural network 200 to process data that has been converted to a predefined format.
  • the neural network 200 includes a collection of neuron nodes 220 that are connected by links 212.
  • Each neuron node 220 receives one or more neuron node inputs and applies a propagation function to generate a neuron node output from the one or more neuron node inputs.
  • a weight associated with each link 212 is applied to the neuron node output.
  • the one or more neuron node inputs are combined based on corresponding weights according to the propagation function.
  • the propagation function is a product of a non-linear activation function and a linear weighted combination of the one or more neuron node inputs.
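As a minimal illustration of this propagation function (the function and variable names below are ours, not the patent's), a node output can be computed as a non-linear activation applied to a linear weighted combination of the inputs plus a bias term:

```python
import numpy as np

def node_output(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Sketch of the propagation function described above: a non-linear
    activation applied to a linear weighted combination of the node inputs."""
    z = np.dot(weights, inputs) + bias  # linear weighted combination plus a bias term
    return np.tanh(z)                   # hyperbolic tangent as one possible activation
```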
  • the neural network consists of one or more layers with the neuron nodes 220.
  • the one or more layers include a single layer acting as both an input layer and an output layer.
  • the one or more layers include an input layer 202 for receiving inputs, an output layer 206 for generating outputs, and zero or more hidden layers/latent layers 204 (e.g., 204A and 204B) between the input and output layers 202 and 206.
  • a deep neural network has more than one hidden layer 204 between the input and output layers 202 and 206.
  • each layer is only connected with its immediately preceding and/or immediately following layer.
  • a layer 202 or 204B is a fully connected layer because each neuron node 220 in the layer 202 or 204B is connected to every neuron node 220 in its immediately following layer.
  • one or more neural networks can be utilized by the machine learning model 116.
  • the one or more neural networks include a fully connected neural network, Multi-layer Perceptron, Convolutional Neural Network, Recurrent Neural Networks, Feed Forward Neural Network, Radial Basis Functional Neural Network, Long Short-Term Memory (LSTM), Autoencoders, and Sequence-to-Sequence Models, etc.
  • a convolutional neural network is utilized in a machine learning model 116 to process data.
  • the CNN employs convolution operations and belongs to a class of deep neural networks 200, i.e., a feedforward neural network that only moves data forward from the input layer 202 through the hidden layers to the output layer 206.
  • the one or more hidden layers of the CNN are convolutional layers convolving with a multiplication or dot product.
  • Each neuron node in a convolutional layer receives inputs from a receptive area associated with a previous layer (e.g., six neuron nodes), and the receptive area is smaller than the entire previous layer and may vary based on a location of the convolution layer in the convolutional neural network.
  • the training process is a process for calibrating all of the weights for each layer of the learning model using a training data set which is provided in the input layer 202.
  • the training process typically includes two steps, forward propagation and backward propagation, which are repeated multiple times until a predefined convergence condition is satisfied.
  • in forward propagation, the set of weights for different layers is applied to the input data and intermediate results from the previous layers.
  • in backward propagation, a margin of error of the output (e.g., a loss function) is measured, and the weights are adjusted accordingly to decrease the error.
  • the activation function is optionally linear, rectified linear unit, sigmoid, hyperbolic tangent, or of other types.
  • a network bias term is added to the sum of the weighted outputs from the previous layer before the activation function is applied.
  • the network bias provides a perturbation that helps the neural network 200 avoid overfitting the training data.
  • the result of the training includes the network bias parameter for each layer.
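A minimal training loop matching this description, sketched here in PyTorch (the patent does not name a framework; the layer sizes, learning rate, and convergence threshold are illustrative):

```python
import torch
from torch import nn

model = nn.Sequential(                 # a small fully connected network with bias terms
    nn.Linear(8, 16), nn.ReLU(),       # hidden layer
    nn.Linear(16, 1),                  # output layer
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y_true = torch.randn(32, 8), torch.randn(32, 1)  # stand-in training data
for _ in range(1000):                  # repeat until a predefined convergence condition
    y_pred = model(x)                  # forward propagation
    loss = loss_fn(y_pred, y_true)     # margin of error of the output
    optimizer.zero_grad()
    loss.backward()                    # backward propagation
    optimizer.step()                   # adjust the weights to decrease the error
    if loss.item() < 1e-3:             # illustrative convergence condition
        break
```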
  • relational databases are modeled after entity-relationship models.
  • An entity-relationship model (or ER model) describes entities and relationships between the entities.
  • An ER model is typically implemented in a database.
  • An entity is defined as a physical or a logical thing that can be uniquely defined. For example, an entity may be a physical object such as a product or an event such as a product sale.
  • a relation describes how entities are related or interacted with one another.
  • Entities can be characterized not only by relationships but by attributes (or properties).
  • the attributes include identifiers (IDs) such as primary keys in the database.
  • a product may have attributes including product_id, category1, category2, price, etc.
  • an entity is represented by a database table.
  • Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
  • database items within rectangular boxes are entities, such as Store, Product and Region.
  • An entity such as Product has many attributes within oval shaped boxes including Product Name, Product Color, and Product Style.
  • the relations between the entities are represented by rhombus-shaped boxes, such as “Has” and “Contains”.
  • the entity features or attributes from the relational database are extracted as an alternative representation of the entities for some subsequent processes, including machine learning, clustering, visualization, and anomaly detection, etc.
  • a relational database is converted to a machine learning model mapping.
  • a machine learning model has a neural network structure.
  • a relational database conforms to a database schema, and the relational database has a number of tables. Each table has a number of columns and a number of rows.
  • Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and others), Numeric, or Free text, etc.
  • a machine learning model, such as 116, has inputs represented by matrix X and outputs represented by matrix Y. Both the input X and output Y matrices have the same number of rows. Each of the matrix columns is called a feature.
  • the X and Y matrices have corresponding rows, and each of the rows is an example.
  • the input X can have missing values.
  • matrix Y' is used as ground truth for the output matrix Y at a training stage for the machine learning model, and the matrix Y' has the same shape, i.e., the same rows and columns, as the output matrix Y.
  • a loss function is used to compare the prediction output matrix Y and the ground truth Y'. A smaller loss function usually indicates a better fit between the prediction Y and the ground truth Y'.
  • the loss function of each column (or groups of columns) is evaluated and then summed to produce a total loss.
  • a table schema is converted to a machine learning model format.
  • each table column can potentially be used as an input and/or an output.
  • conditions are specified regarding how to convert the values, and how to handle NULL (missing) values.
  • an encoder of a machine learning model is used.
  • the encoder converts database values to value formats suitable for a machine learning model.
  • database values are converted to only numerical values.
  • strings are converted into numbers.
  • the encoder can work on the input side to convert a value to be an input format of the machine learning model. In some embodiments, the encoder can work on the output side to convert a value to be an output format of the machine learning model. In both cases, missing values are handled based on some predefined conditions.
  • the compatible conversion methods for the encoder are maintained in a system as a library, with a default method of conversion specified for each column type.
  • a relational database is created to store sales forecast information.
  • the database tables to store sales forecast information include: Store, which has columns store_id, city, and state; Product, which has columns product_id, category1, category2, and price, etc.; and Sales, which has columns store_id, product_id, date, and quantity.
  • store_id and product_id are entity IDs, and these IDs are typically the primary keys of the corresponding tables, for example the Store table and the Product table. store_id and product_id are referred to as foreign keys in the Sales table.
  • the entity IDs are used as linkages between multiple tables, and they do not generate a loss function in the machine learning model.
  • the column values for columns city, state, category1, and category2 are string or categorical values, and they are converted by the encoder to categorical values for the machine learning model.
  • the column values for columns price and quantity are numerical values, and they can be directly used in the machine learning model.
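To keep the later sketches concrete, the sales forecast schema above can be written down as plain metadata; the dictionary layout below is our own illustration, not a format defined by the disclosure:

```python
# Illustrative metadata for the sales forecast database described above.
SCHEMA = {
    "Store": {
        "primary_key": "store_id",
        "columns": {"store_id": "entity", "city": "categorical", "state": "categorical"},
    },
    "Product": {
        "primary_key": "product_id",
        "columns": {"product_id": "entity", "category1": "categorical",
                    "category2": "categorical", "price": "numerical"},
    },
    "Sales": {
        "primary_key": None,
        "foreign_keys": {"store_id": "Store", "product_id": "Product"},
        "columns": {"store_id": "entity", "product_id": "entity",
                    "date": "date", "quantity": "numerical"},
    },
}
```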
  • the database specifies whether a column can be NULL.
  • null values in the column are converted by the encoder to a default value (or an average value).
  • a separate column for storing only 0 or 1 is optionally generated to record whether the original value is a null value. For example, if the row value for the separate column is 1, the original value is not null. If the row value for the separate column is 0, the original value is null.
  • when a column holds categorical values, the encoder adds an additional NULL category to the existing categorical values during the conversion.
  • in some embodiments, when NULL is in the output values, a loss function will not be generated.
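A sketch of these null-handling rules, with hypothetical helper names (pandas is used only for convenience; the disclosure does not prescribe an implementation):

```python
import pandas as pd

def encode_numerical_nullable(col: pd.Series) -> pd.DataFrame:
    """Null numerical values become a default (here the column average), plus a
    separate 0/1 column: 1 means the original value is not null, 0 means null."""
    return pd.DataFrame({
        col.name: col.fillna(col.mean()),
        f"{col.name}_notnull": col.notna().astype(int),
    })

def encode_categorical_nullable(col: pd.Series) -> pd.Series:
    """Null categorical values are mapped to an additional NULL category."""
    return col.fillna("NULL").astype("category")
```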
  • entities including entity IDs are converted to embeddings for the machine learning model, for example, a neural network.
  • embeddings refer to vectors of numerical values to represent each example of the table values.
  • the embeddings from the original tables are copied over in the machine learning model.
  • the embeddings can be obtained from training a machine learning model.
  • the embeddings can be copied over to the other columns of the table where the entity ID is the primary key.
  • the Product table embedding includes vector values each represented as (category1, category2, price).
  • the Store table embedding includes vector values each represented as (city, state).
  • the Sales table embedding then includes vector values each represented at least as (store.state, store.city, product.category1, product.category2, product.price, quantity).
  • the encoder automatically detects the categorical columns. For example, when a database has categorical columns, they are directly converted into categorical values. In some instances, categories are represented by numerical values or strings that are not explicitly specified to be categories. In that case, a categorical value detection is triggered, i.e., a column is converted into categorical values when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold. For example, the threshold is 1/3 or 1/4. When such a conversion is triggered, a dictionary of possible values is built (including a “misc” category to handle values that are not seen during the training of the machine learning model). String or numerical values are converted into 0, 1, 2, 3, etc.
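The detection rule just described is simple enough to sketch directly; the function name and the default threshold of 1/3 are illustrative:

```python
def maybe_encode_categorical(values: list, threshold: float = 1/3):
    """Convert a column to categorical when distinct/total < threshold, building a
    dictionary that maps each value to 0, 1, 2, ... plus a 'misc' bucket for
    values not seen during training. Returns None when the column is not categorical."""
    distinct = set(values)
    if len(distinct) / len(values) >= threshold:
        return None  # ratio too high: leave the column as-is
    mapping = {v: i for i, v in enumerate(sorted(distinct, key=str))}
    misc_id = len(mapping)  # extra id for the 'misc' category
    return [mapping.get(v, misc_id) for v in values], mapping
```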
  • An exemplary encoder library is shown in Table 1 below.
  • the column values can be converted into numerical values.
  • the loss function type used for the column type in the machine learning model is Mean squared error (MSE).
  • the column values can be converted into categorical values. In this case, the automatic detection is triggered when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold.
  • the loss function type used for the column type in the machine learning model is cross entropy.
  • a user can override the default behaviors described in Table 1. For example, users can change the conversion type for the machine learning model.
  • a massive amount of data is stored in various kinds of relational databases.
  • methods and systems for extracting the entity features from a relational database are disclosed herein.
  • a relational database conforms to a database schema.
  • the data within a relational database is relatively clean to be imported for deep learning.
  • the data structures are clear and easy to extract from a relational database.
  • relational databases usually hold raw business data and sometimes, the data is incomplete or noisy.
  • traditional data mining focuses more on the extraction of overall information and does not handle the incomplete or noisy data.
  • FIG. 4 is a flow diagram 400 illustrating an entity feature extracting process according to some implementations of the present disclosure.
  • an automatic extraction process 410 is applied to a relational database 420.
  • the relational database 420 includes a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean and/or others), Numeric, or Free text, etc.
  • the relational database 420 is converted to a table structure or schema 422 for preparation of the input to a machine learning model, such as 116.
  • a compiling process 412 is used to convert the table structure or schema 422 to a machine learning model structure.
  • the machine learning model structure includes a neural network 424.
  • each entity in the database has a corresponding table, where the entity is identified by a primary key
  • an embedding subnet is created for each entity.
  • the subnet could be a simple embedding layer in a neural network.
  • the same embedding is used whether the key is referred to as a primary key in its defining table or as a foreign key in other tables.
  • a neural network is created for each table.
  • the primary key and foreign key columns are compiled to be inputs by default for each of the neural networks.
  • Other columns other than the primary keys and foreign keys are compiled to be outputs by default for each of the neural networks.
  • a user can override the default input and output columns for each of the neural networks.
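Using the SCHEMA sketch from earlier, this default split comes down to a few lines; the overrides parameter stands in for the user-specified exceptions and is our own naming:

```python
def split_columns(table: dict, overrides: dict | None = None):
    """Primary and foreign key columns are inputs by default; all other
    columns are outputs by default. Users may override either set."""
    keys = set(table.get("foreign_keys", {}))
    if table.get("primary_key"):
        keys.add(table["primary_key"])
    inputs = sorted(keys)
    outputs = sorted(c for c in table["columns"] if c not in keys)
    if overrides:  # e.g., {"inputs": [...], "outputs": [...]}
        inputs = overrides.get("inputs", inputs)
        outputs = overrides.get("outputs", outputs)
    return inputs, outputs

# For the Sales table above this yields inputs ['product_id', 'store_id']
# and outputs ['date', 'quantity'].
print(split_columns(SCHEMA["Sales"]))
```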
  • Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
  • Figure 5 illustrates a neural network with entity embedding 500.
  • the component parts identified as “Loss” or “Total Loss” shown in Figure 5 are only used in the training process of the neural networks but not in the final machine learning model or the final neural network model with embedding 426 after the training process.
  • a Sales Forecast relational database is compiled into a neural network structure.
  • the Sales Forecast relational database consists of three tables: the Store table, with columns including store_id, feature1, and feature2, etc.; the Date table, with columns including date_id, year, month, day, and weekday, etc.; and the Sales table, with columns including store_id, date_id, and sales.
  • the primary key column for the Store table is store_id, and the primary key column for the Date table is date_id.
  • the foreign key columns are store_id and date_id, which refer to the primary key store_id column in the Store table and the primary key date_id column in the Date table, respectively.
  • the primary key for the entity Store is store_id, and the primary key for the entity Date is date_id.
  • three neural networks, the Store network, the Date network, and the Sales network, are created for the three tables corresponding to the three entities Store, Date, and Sales, respectively.
  • the primary key column store_id is used as the input to the Store neural network.
  • the primary key column date_id is used as the input to the Date neural network.
  • the foreign key store_id and date_id columns are used as the inputs for the Sales neural network.
  • the whole structure 500 is viewed as three neural networks.
  • Each entity embedding is referred to in two tables.
  • features related to the Date entity are referred to in both the Date and Sales network structures.
  • features related to the Store entity are referred to in both the Store and Sales network structures.
  • the entity features will be updated when each table referring to those entity features is trained.
  • there are three types of neural network modules: embedding subnet, transform subnet, and prediction head. Data is input through the embedding subnet, processed through the transform subnet, and predicted/output from the prediction head.
  • the neural network modules are multi-layer fully-connected networks.
  • the neural network modules can include more complicated constructions such as residual blocks and attention.
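One way to realize these three module types, sketched in PyTorch; the embedding sizes, layer widths, and the use of nn.Embedding are our assumptions rather than requirements of the disclosure:

```python
import torch
from torch import nn

store_embedding = nn.Embedding(1000, 16)  # embedding subnet for the Store entity
date_embedding = nn.Embedding(3650, 16)   # embedding subnet for the Date entity

class SalesNetwork(nn.Module):
    """Sales-table network: shared entity embeddings feed a transform subnet,
    which feeds a prediction head for the sales output column."""
    def __init__(self, store_emb: nn.Embedding, date_emb: nn.Embedding):
        super().__init__()
        self.store_emb, self.date_emb = store_emb, date_emb  # shared with other table networks
        self.transform = nn.Sequential(                      # multi-layer fully-connected subnet
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.sales_head = nn.Linear(64, 1)                   # prediction head

    def forward(self, store_id: torch.Tensor, date_id: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.store_emb(store_id), self.date_emb(date_id)], dim=-1)
        return self.sales_head(self.transform(h))
```

Because store_embedding and date_embedding are shared modules, training the Sales network updates the same embedding weights that the Store and Date networks use, which is how each entity's features are refined by every table that refers to them.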
  • each column of the relational database has a corresponding data type, and a nullable specification defined in the database schema.
  • Nullable specification defines whether a column can be empty.
  • a library of input subnets, output subnets, and loss function modules is created.
  • a compatibility table is also maintained for subnet modules and data type compatibility.
  • each data type has a default input subnet, default output subnet and default loss function.
  • for an entity key column, the input subnet is an embedding subnet, the output subnet is linear with softmax activation, and the loss function is cross entropy.
  • for a categorical column, the input subnet is one-hot encoding (with a few categories) or embedding (with many categories), the output subnet is linear with softmax activation, and the loss function is cross entropy.
  • for a numerical column, the input subnet is an identity (passthrough) subnet, the output subnet is linear without activation, and the loss function is mean squared error.
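These defaults can be kept in a small registry, in the spirit of the library and compatibility table mentioned above (the dictionary form and names are our own):

```python
DEFAULT_SUBMODULES = {
    # column type:  (input subnet,             output subnet,            loss function)
    "entity":      ("embedding",              "linear + softmax",       "cross_entropy"),
    "categorical": ("one-hot or embedding",   "linear + softmax",       "cross_entropy"),
    "numerical":   ("identity (passthrough)", "linear (no activation)", "mse"),
}

def default_submodules(column_type: str) -> tuple:
    """Look up the default input subnet, output subnet, and loss for a column type."""
    return DEFAULT_SUBMODULES[column_type]
```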
  • a user can override the default submodule selections.
  • the nullable columns are converted according to some predefined rules. For example, if an input column has null values and the input column type is numerical, the input is then converted into two inputs, the number itself and a separate null (0/1) indicator. In another example, if an output column has null values and the output column type is entity (key for an entity)/categorical, the output is then converted with the existing categorical types and an additional null category. When an output has null values, the null values do not generate loss functions (loss = 0).
  • in some embodiments, the system disclosed herein can operate fully automatically.
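The rule above that null output values generate no loss can be sketched as a mask over the per-row losses (PyTorch again; the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def masked_mse(pred: torch.Tensor, target: torch.Tensor,
               notnull_mask: torch.Tensor) -> torch.Tensor:
    """Per-row squared error, zeroed wherever the ground-truth output is null
    (loss = 0 for null outputs, as described above). notnull_mask holds 1.0
    for rows whose original value is present and 0.0 for null rows."""
    per_row = F.mse_loss(pred, target, reduction="none")
    return (per_row * notnull_mask).sum() / notnull_mask.sum().clamp(min=1)
```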
  • Each input/output column has a default submodule depending on its data type.
  • the system has a default transformation subnet implementation.
  • a user can customize the compiling process 412.
  • the user can override one or more of the parameter settings. For example, the user can specify a particular column to be input or output. The user can select compatible input, output, and/or loss function submodules for each column. The user can implement the subnet.
  • the user’s overrides can be made through a graphical user interface or a configuration file at a client device such as 102.
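A user override might look like the following fragment; the keys and layout are hypothetical, since the disclosure leaves the configuration format open:

```python
# Hypothetical override configuration, supplied through a GUI or a config file.
OVERRIDES = {
    "Sales": {
        "inputs": ["store_id", "product_id", "date"],   # force date to be an input column
        "outputs": ["quantity"],
        "submodules": {"quantity": {"output": "linear", "loss": "mse"}},
    },
}
```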
  • a training step 414 is applied to the neural network 424 to produce an embedding model 426.
  • the embedding model has losses or residuals as described above.
  • the neural networks formed from each table of the relational database are trained in batches. For each iteration of the training, a neural network converted from a table is selected. A set of records of that table is loaded from the database, or from intermediate storage, as a minibatch. The minibatch is then fed to the neural network for training.
  • a round-robin method is implemented for the selection of the tables/neural networks.
  • the neural networks converted from the tables are trained in a sequence.
  • the records within each table are also trained in a sequence.
  • a weighted sampling method is implemented for the selection of the tables/neural networks.
  • Each record in each of the neural networks converted from the tables has a loss computed from the previous training iterations as shown in Figure 5.
  • Each of the neural networks has a respective total loss or a respective summed loss from each of the losses from the records in the respective table/neural network. The losses are converted into probabilities. A higher loss means a higher probability.
  • tables/neural networks and/or records are sampled by probabilities for the training. A higher probability means more weight in the sampling process for the training.
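A sketch of this weighted selection: losses from the previous iteration are normalized into probabilities and then sampled, so a higher loss means a higher chance of being trained next (torch.multinomial is one convenient way to draw the sample):

```python
import torch

def sample_for_training(prev_losses: torch.Tensor) -> int:
    """Convert per-table (or per-record) losses from the previous iterations
    into sampling probabilities; a higher loss yields a higher probability."""
    probs = prev_losses / prev_losses.sum()    # normalize losses into probabilities
    return int(torch.multinomial(probs, num_samples=1))
```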
  • the neural network structures as shown in Figure 5 also have a total loss from each of the losses from the records in all tables/neural networks.
  • the losses and total loss shown in Figure 5 are only used in the training but not used as inferences for the neural networks.
  • a prediction head and a prediction for each of the columns or features is also included in Figure 5.
  • the output columns including Feature 1 and Feature 2 in the Store neural network hold prediction values.
  • the output columns including year and month in the Date neural network hold prediction values.
  • the trained neural network can have many applications.
  • during the training, the neural network generates a fitting/prediction for each output column as shown in Figure 5.
  • the fitting or prediction is what the neural network thinks the value should be, and the fitting or prediction is typically different from the actual value.
  • the difference between the prediction and the actual ground truth value of the output column is measured by the loss function.
  • the neural network generated values can be used as predictions or completion for missing values or null values in a relational database.
  • values with a big loss in the outputs of the neural networks can be used for outlier/error detection for the relational data when the original values in the relational database are not null.
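As a sketch of this use, records whose loss exceeds a threshold can be flagged as potential outliers, skipping rows whose original value was null (the names and threshold are illustrative):

```python
def flag_outliers(per_row_loss: list, notnull_mask: list, threshold: float = 10.0) -> list:
    """Return indices of rows with a large loss as potential outliers/errors,
    but only where the original database value is not null."""
    return [i for i, (loss, present) in enumerate(zip(per_row_loss, notnull_mask))
            if present and loss > threshold]
```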
  • Figure 6 is a flowchart illustrating an exemplary process 600 by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
  • the computing system extracts the table structures from the relational database (610).
  • the computing system then converts the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures (620).
  • the computing system trains the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values (630).
  • the computing system generates the trained neural network structures including the embedding subnets as entity features of the relational database (640).
  • extracting table structures (610) includes: identifying the tables to be converted, and converting the table column values of the identified tables into a feature value format.
  • converting the table structures to the neural network structures (620) includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures, identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
  • the first set of table columns consists of table columns identified by primary keys or foreign keys
  • the second set of table columns consists of table columns not identified by the primary keys or the foreign keys
  • training the neural network structures (630) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs, for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of the table column values of the respective outputs.
  • training the neural network structures (630) further includes: generating a loss function for each of the table column values of the respective outputs by comparing the difference between the actual column value and the prediction for each of the table column values of the respective outputs.
  • training the neural network structures (630) further includes: training the neural networks, including the respective neural network from each table, in a round-robin schedule.
  • training the neural network structures (630) further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
  • the neural network structures include multi-layer fully-connected networks.
  • the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and the type of the input table column is numerical, converting the input value in the input table column into a number and an indicator of whether a respective input value is null.
  • the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is categorical, adding a null category to the existing categories of the input table column.
  • the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an output fitting value in an output table column of the neural network structures is null, setting the loss function for an output value to a default value.
  • the identifications of the first set of table columns as the input table columns and the second set of table columns as the output table columns of the neural network structures are configurable.
  • the identifications of the first and second sets of table columns are from a graphical user interface or a configuration file.
  • Computer- readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
  • computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the implementations described in the present application.
  • a computer program product may include a computer- readable medium.
  • although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
  • a first electrode could be termed a second electrode, and, similarly, a second electrode could be termed a first electrode, without departing from the scope of the implementations.
  • the first electrode and the second electrode are both electrodes, but they are not the same electrode.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computing system performs the method of extracting entity features from a relational database. The method includes: extracting the table structures from the relational database; converting the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures; training the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values; and generating the trained neural network structures including the embedding subnets as entity features of the relational database.

Description

SYSTEMS AND METHODS FOR EXTRACTING ENTITY FEATURES FROM A RELATIONAL DATABASE
TECHNICAL FIELD
[0001] This application relates generally to data processing technology including, but not limited to, methods, systems, and non-transitory computer-readable media for extracting entity features from a relational database.
BACKGROUND
[0002] Relational databases are widely used nowadays. Massive amounts of data are stored in various kinds of relational databases. A relational database conforms to a database schema. Relational databases usually hold raw business data, and sometimes the data is incomplete or noisy. While traditional data mining focuses more on the extraction of overall information, it does not handle the incomplete or noisy data.
[0003] It would be beneficial to develop systems and methods to handle incomplete or noisy data automatically for subsequent data cleaning and modeling.
SUMMARY
[0004] The present application describes implementations related to data processing and, more particularly, to system and method of extracting entity features from a relational database.
[0005] In one aspect, a method is implemented at a computer system for extracting entity features from a relational database. The method includes: extracting the table structures from the relational database; converting the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures; training the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values; and generating the trained neural network structures including the embedding subnets as entity features of the relational database.
[0006] In some embodiments, extracting table structures includes: identifying the tables to be converted; and converting the table column values of the identified tables into a feature value format. [0007] In some embodiments, converting the table structures to the neural network structures includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
[0008] In some embodiments, the first set of table columns consists of table columns identified by primary keys or foreign keys, and the second set of table columns consists of table columns not identified by the primary keys or the foreign keys.
[0009] In some embodiments, training the neural network structures includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs, for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of the table column values of the respective outputs.
[0010] In some embodiments, training the neural network structures further includes: generating a loss function for each of the table column values of the respective outputs by comparing the difference between the actual column value and the prediction for each of the table column values of the respective outputs.
[0011] In some embodiments, training the neural network structures further includes: training the neural networks, including the respective neural network from each table, in a round-robin schedule.
[0012] In some embodiments, training the neural network structures further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
[0013] In some embodiments, the neural network structures include multi-layer fully-connected networks.
[0014] In another aspect, some implementations include a computer system including one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform any of the above methods. [0015] In yet another aspect, some implementations include a non-transitory computer-readable medium, having instructions stored thereon, which when executed by one or more processors cause the processors to perform any of the above methods.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] The accompanying drawings, which are included to provide a further understanding of the implementations and are incorporated herein and constitute a part of the specification, illustrate the described implementations and together with the description serve to explain the underlying principles. Like reference numerals refer to corresponding parts. [0017] Figure 1 is an exemplary data processing environment in accordance with some embodiments.
[0018] Figure 2 is an exemplary neural network for processing data in a machine learning model, in accordance with some implementations of the present disclosure.
[0019] Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure.
[0020] Figure 4 is a flow diagram illustrating an entity feature extracting process according to some implementations of the present disclosure.
[0021] Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure.
[0022] Figure 6 is a flowchart illustrating an exemplary process by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
DETAILED DESCRIPTION
[0023] Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to assist in understanding the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that various alternatives may be used without departing from the scope of claims and the subject matter may be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein can be implemented on many types of electronic devices. [0024] Figure 1 is an exemplary data processing environment 100 in which one or more network-connected client devices 102 and one or more server systems 104 interact with each other locally or remotely via one or more communication networks 106 in accordance with some embodiments.
[0025] In some embodiments, the server systems 104, such as 104A and 104B, are physically remote from, but are communicatively coupled to, the one or more client devices 102. In some embodiments, a client device 102 (e.g., 102A, 102B) includes a desktop computer. In some embodiments, a client device 102 (e.g., 102C) includes a mobile device, e.g., a mobile phone, a tablet computer, or a laptop computer. Each client device 102 can collect data or user inputs, execute user applications, and present outputs on its user interface. The collected data or user inputs can be processed locally at the client device 102 and/or remotely by the server(s) 104. Each client device 102 communicates with another client device 102 or the server systems 104 using the one or more communication networks 106.
The communication networks 106 can be one or more networks having one or more types of topologies, including but not limited to the Internet, intranets, local area networks (LANs), cellular networks, Ethernet, telephone networks, Bluetooth personal area networks (PAN), and the like. In some embodiments, two or more client devices 102 in a sub-network are coupled via a wired connection, while at least some client devices 102 in the same subnetwork are coupled via a local radio communication network (e.g., ZigBee, Z-Wave,
Insteon, Bluetooth, Wi-Fi, and other radio communication networks). In an example, a client device 102 establishes a connection to the one or more communication networks 106 either directly (e.g., using 3G/4G connectivity to a wireless carrier), or through a network interface (e.g., a router, switch, gateway, hub, or an intelligent, dedicated whole-home control node), or through any combination thereof.
[0026] Each of the server systems 104 includes one or more processors 110 and memory storing instructions for execution by the one or more processors 110. The server system 104 also includes an input/output interface 114 to the client(s). The one or more server systems 104 provide system data (e.g., boot files, operating system images, and user applications) to the client devices 102, and in some embodiments, process the data and user inputs received from the client device(s) 102 when the user applications are executed on the client devices 102. The one or more server systems 104 can enable real-time data communication with the client devices 102 that are remote from each other or from the one or more server systems 104. The server system 104A is configured to store an information database such as a relational database 112. The server system 104B is configured to store a machine learning model 116. In some embodiments, the machine learning model and the relational database can be in the same server 104. In some embodiments, the machine learning model 116 is one or more neural networks. A machine learning model training method can be implemented at one or more of the server systems 104.
[0027] Each client device 102 includes one or more processors and memory storing instructions for execution by the one or more processors. The instructions stored on the client device 102 enable implementation of the web browser and user interface application to the relational database 112 or to the machine learning model 116. The web browser and the user interface application are linked to a user account in the data processing environment 100. Specifically, the browser or user interface application enables a GUI for a user 120 to review information from the relational database 112 or from the machine learning model 116. For example, a user 120 can add notations or make changes to the data or selections on the relational database 112 or on the machine learning model 116. The new user data can be stored on the client device 102 or on the server systems 104.
[0028] Machine learning techniques are applied in the data processing environment 100 to process data obtained by an application executed at a client device 102 or loaded from another database or files to identify information contained in the data, match the data with other data, categorize the data, or synthesize related data. Data can include text, images, audio, video, etc. In these machine learning techniques, machine learning models are created based on one or more neural networks to process the data. These machine learning models are trained with training data before they are applied to process the data. In some embodiments, a machine learning model training method is implemented at a client device 102. In some embodiments, a machine learning model training method is jointly implemented at the client device 102 and the server system 104. In some embodiments, a machine learning model can be held at a client device 102. In some embodiments, the client device 102 is configured to automatically and without user intervention, identify, classify, or modify the data information from the relational database 112 or from the machine learning model 116.

[0029] In some embodiments, both model training and data processing are implemented locally at each individual client device 102 (e.g., the client device 102C). The client device 102C obtains the training data from the one or more server systems 104 including the database 112 and applies the training data to train the machine learning models. Subsequently to model training, the client device 102C obtains and processes the data using the trained machine learning models locally. Alternatively, in some embodiments, both model training and data processing are implemented remotely at a server system 104 (e.g., the server system 104B) associated with a client device 102 (e.g., the client device 102A). The server 104B obtains the training data from itself, another server 104, or the database 112 and applies the training data to train the machine learning models 116. The client device 102A obtains the data, sends the data to the server 104B (e.g., in an application) for data processing using the trained machine learning models, receives data processing results from the server 104B, and presents the results on a user interface (e.g., associated with the application). The client device 102A itself implements no or little data processing on the data prior to sending them to the server 104B. Additionally, in some embodiments, data processing is implemented locally at a client device 102 (e.g., the client device 102B), while model training is implemented remotely at a server system 104 (e.g., the server 104B) associated with the client device 102B. The trained machine learning models are optionally stored in the server 104B or another database, such as 112. The client device 102B imports the trained machine learning models from the server 104B or database 112, processes the data using the machine learning models, and generates data processing results to be presented on a user interface locally.
[0030] The machine learning model system 116 includes one or more of a server, a client device, a storage, or a combination thereof. The machine learning model system 116, typically, includes one or more processing units (CPUs), one or more network interfaces, memory, and one or more communication buses for interconnecting these components (sometimes called a chipset). The machine learning model system 116 includes one or more input devices that facilitate user input, such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a gesture capturing camera, or other input buttons or controls. The machine learning model system 116 also includes one or more output devices that enable presentation of user interfaces and display content, including one or more speakers and/or one or more visual displays.
[0031] Memory includes high-speed random access memory, such as DRAM, SRAM,
DDR RAM, or other random access solid state memory devices; and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory, optionally, includes one or more storage devices remotely located from one or more processing units. Memory, or alternatively the non-volatile memory within memory, includes a non-transitory computer readable storage medium. In some embodiments, memory, or the non-transitory computer readable storage medium of memory, stores programs, modules, and data structures including an operating system, an input processing module for detecting and processing input data, a model training module for receiving training data and establishing a machine learning model for processing data, a machine learning module for processing data using machine learning models, etc.
[0032] Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various embodiments. In some embodiments, memory, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory, optionally, stores additional modules and data structures not described above.
[0033] Figure 2 is an exemplary neural network 200 implemented to process data in a machine learning model 116, in accordance with some implementations of the present disclosure. The machine learning model 116 is established based on the neural network 200.
A corresponding model-based processing module within the server system 104B applies the machine learning model 116 including the neural network 200 to process data that has been converted to a predefined format.
[0034] The neural network 200 includes a collection of neuron nodes 220 that are connected by links 212. Each neuron node 220 receives one or more neuron node inputs and applies a propagation function to generate a neuron node output from the one or more neuron node inputs. As the neuron node output is transmitted through one or more links 212 to one or more other neuron nodes 220, a weight associated with each link 212 is applied to the neuron node output. The one or more neuron node inputs are combined based on corresponding weights according to the propagation function. In an example, the propagation function is a product of a non-linear activation function and a linear weighted combination of the one or more neuron node inputs.
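By way of a non-limiting illustration only, the following Python sketch shows one possible form of such a propagation function, with a hyperbolic tangent standing in for the non-linear activation; the function and variable names are illustrative and are not part of the disclosed implementations.

    import numpy as np

    def propagate(inputs, weights, bias=0.0):
        # Linear weighted combination of the one or more neuron node inputs.
        z = np.dot(weights, inputs) + bias
        # Non-linear activation applied to the weighted combination.
        return np.tanh(z)

    # Example: one neuron node combining two inputs.
    print(propagate(np.array([0.5, -1.0]), np.array([0.8, 0.3])))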
[0035] The neural network consists of one or more layers with the neuron nodes 220.
In some embodiments, the one or more layers include a single layer acting as both an input layer and an output layer. In some embodiments, the one or more layers include an input layer 202 for receiving inputs, an output layer 206 for generating outputs, and zero or more hidden layers/latent layers 204 (e.g., 204A and 204B) between the input and output layers 202 and 206. A deep neural network has more than one hidden layer 204 between the input and output layers 202 and 206. In the neural network 200, each layer is only connected with its immediately preceding and/or immediately following layer. In some embodiments, a layer 202 or 204B is a fully connected layer because each neuron node 220 in the layer 202 or 204B is connected to every neuron node 220 in its immediately following layer.
[0036] In some embodiments, one or more neural networks can be utilized by the machine learning model 116. The one or more neural networks include a fully connected neural network, Multi-layer Perceptron, Convolutional Neural Network, Recurrent Neural Network, Feed Forward Neural Network, Radial Basis Functional Neural Network, Long Short-Term Memory (LSTM), Autoencoders, Sequence to Sequence Models, etc.
[0037] In some embodiments, a convolutional neural network (CNN) is utilized in a machine learning model 116 to process data. The CNN employs convolution operations and belongs to a class of deep neural networks 200, i.e., a feedforward neural network that only moves data forward from the input layer 202 through the hidden layers to the output layer 206. The one or more hidden layers of the CNN are convolutional layers convolving with a multiplication or dot product. Each neuron node in a convolutional layer receives inputs from a receptive area associated with a previous layer (e.g., six neuron nodes), and the receptive area is smaller than the entire previous layer and may vary based on a location of the convolution layer in the convolutional neural network.
[0038] The training process is a process for calibrating all of the weights for each layer of the learning model using a training data set which is provided in the input layer 202. The training process typically includes two steps, forward propagation and backward propagation, which are repeated multiple times until a predefined convergence condition is satisfied. In the forward propagation, the set of weights for different layers are applied to the input data and intermediate results from the previous layers. In the backward propagation, a margin of error of the output (e.g., a loss function) is measured, and the weights are adjusted accordingly to decrease the error. The activation function is optionally linear, rectified linear unit, sigmoid, hyperbolic tangent, or of other types. In some embodiments, a network bias term is added to the sum of the weighted outputs from the previous layer before the activation function is applied. The network bias provides a perturbation that helps the neural network 200 avoid overfitting the training data. The result of the training includes the network bias parameter for each layer.

[0039] In some embodiments, relational databases are modeled after entity-relationship models. An entity-relationship model (or ER model) describes entities and relationships between the entities. An ER model is typically implemented in a database. An entity is defined as a physical or a logical thing that can be uniquely defined. For example, an entity may be a physical object such as a product or an event such as a product sale. A relation describes how entities are related to or interact with one another. Entities can be characterized not only by relationships but by attributes (or properties). The attributes include identifiers (IDs) such as primary keys in the database. For example, a product may have attributes including product_id, category1, category2, price, etc. In some embodiments, an entity is represented by a database table. Figure 3 is a diagram illustrating a relational database schema structure according to some implementations of the present disclosure. In Figure 3, database items within rectangular boxes are entities, such as Store, Product, and Region. An entity such as Product has many attributes within oval-shaped boxes, including Product Name, Product Color, and Product Style. The relations between the entities are represented by rhombus-shaped boxes, such as “Has” and “Contains”.
[0040] In some embodiments, the entity features or attributes from the relational database are extracted as an alternative representation of the entities for some subsequent processes, including machine learning, clustering, visualization, anomaly detection, etc.

[0041] In some embodiments, a relational database is converted to a machine learning model mapping. In some embodiments, a machine learning model has a neural network structure. A relational database conforms to a database schema, and the relational database has a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean, and others), Numeric, or Free text, etc.
[0042] In some embodiments, a machine learning model, such as 116, has inputs represented by matrix X and outputs represented by matrix Y. Both the input X and output
Y matrices have the same number of rows. Each of the matrix columns is called a feature.
The X and Y matrices have corresponding rows, and each of the rows is an example. The input X can have missing values.
[0043] In some embodiments, matrix Y’ is used as ground truth for the output matrix
Y at a training stage for the machine learning model, and the matrix Y' has the same shape, i.e., the same rows and columns, as the output matrix Y. In some embodiments, a loss function is used to compare the prediction output matrix Y and the ground truth Y'. A smaller loss function usually indicates a better fitting between the prediction Y and the ground truth Y'. In some examples, because each column (or group of columns) in matrices Y and Y' has special meanings, the loss function of each column (or group of columns) is evaluated and then summed to produce a total loss.
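As a non-limiting sketch of this per-column loss summation, the following Python fragment (using the PyTorch library; the column names and shapes are purely illustrative assumptions) evaluates one loss function per column and sums the results into a total loss.

    import torch
    import torch.nn.functional as F

    def total_loss(pred_cols, truth_cols, loss_fns):
        # Evaluate the loss of each column (or group of columns) and
        # sum the per-column losses into a total loss.
        return sum(loss_fns[name](pred_cols[name], truth_cols[name])
                   for name in pred_cols)

    # Example: a numerical column uses MSE, a categorical column uses
    # cross entropy (column names here are illustrative only).
    pred = {"price": torch.randn(4), "category1": torch.randn(4, 3)}
    truth = {"price": torch.randn(4), "category1": torch.randint(0, 3, (4,))}
    fns = {"price": F.mse_loss, "category1": F.cross_entropy}
    print(total_loss(pred, truth, fns))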
[0044] In some embodiments, a table schema is converted to a machine learning model format. For example, each table column can potentially be used as an input and/or an output. When a column is used as an input, conditions are specified regarding how to convert the values, and how to handle NULL (missing) values. When a column is used as an output, conditions are specified regarding how to convert the value, and what loss functions to use in the corresponding output.
[0045] In some embodiments, for both input and output feature value conversions, an encoder of a machine learning model is used. For example, the encoder converts database values to value formats suitable for a machine learning model. For example, database values are converted to only numerical values. In another example, strings are converted into numbers.
[0046] In some embodiments, the encoder can work on the input side to convert a value to an input format of the machine learning model. In some embodiments, the encoder can work on the output side to convert a value to an output format of the machine learning model. In both cases, missing values are handled based on some predefined conditions.
[0047] In one embodiment, the compatible conversion methods for the encoder are maintained in a system as a library, with a default method of conversion specified for each of the conversion methods. In one example, a relational database is created to store sales forecast information. The database tables to store sales forecast information include: Store, which has columns store_id, city, and state; Product, which has columns product_id, category1, category2, and price, etc.; and Sales, which has columns store_id, product_id, date, and quantity.
[0048] In some embodiments, store_id and product_id are entity IDs, and these IDs are typically the primary keys of the corresponding tables, for example the Store table and the Product table. And store_id and product_id are referred to as foreign keys in the Sales table. The entity IDs are used as linkages between multiple tables and they do not generate a loss function in the machine learning model. The column values for columns city, state, category1, and category2 are string or categorical values, and they are converted by the encoder to categorical values for the machine learning model. The column values for columns price and quantity are numerical values, and they can be directly used in the machine learning model.
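By way of a non-limiting illustration of this encoder behavior, the following Python sketch maps string/categorical columns to integer category codes and passes numerical columns through as numbers; the helper name encode_row and the category dictionaries are illustrative assumptions, not part of the disclosed implementations.

    def encode_row(row, category_maps):
        # Categorical columns (e.g., city, state, category1, category2)
        # are mapped to integer category codes; numerical columns
        # (e.g., price, quantity) are passed through as numbers.
        encoded = {}
        for name, value in row.items():
            if name in category_maps:
                encoded[name] = category_maps[name][value]
            else:
                encoded[name] = float(value)
        return encoded

    # Example using Product table columns from the example above.
    maps = {"category1": {"toy": 0, "book": 1}, "category2": {"new": 0}}
    print(encode_row({"category1": "toy", "category2": "new", "price": 9.99}, maps))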
[0049] In some embodiments, the database specifies whether a column can be NULL.
If the column cannot be NULL, no values should be missing. If a column can be NULL, then a value could be missing (NULL). In a database, a '0' or an empty string is different from NULL.
[0050] In some embodiments, when a column holds numerical values, NULL values in the column are converted by the encoder to a default value (or an average value). In addition, in some instances, a separate column storing only 0 or 1 is optionally generated to record whether the original value is a null value. For example, if the row value for the separate column is 1, the original value is not null. If the row value for the separate column is 0, the original value is null.
[0051] In some embodiments, when a column holds categorical values, the encoder adds an additional NULL category to the existing categorical values during the conversion.

[0052] In some embodiments, when NULL is in the output values, a loss function will not be generated.
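The NULL-handling rules of paragraphs [0050] and [0051] may be sketched in Python as follows; this is a minimal illustration under assumed naming (encode_numerical, encode_categorical), not a definitive implementation.

    def encode_numerical(value, default=0.0):
        # A NULL numerical value becomes a default value plus a separate
        # 0/1 indicator recording whether the original value was non-null.
        if value is None:
            return default, 0
        return float(value), 1

    def encode_categorical(value, categories):
        # A NULL categorical value falls into an additional NULL category
        # appended after the existing categories.
        if value is None:
            return len(categories)
        return categories.index(value)

    print(encode_numerical(None))                  # (0.0, 0)
    print(encode_categorical(None, ["CA", "NY"]))  # 2, the NULL category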
[0053] In some embodiments, entities including entity IDs are converted to embeddings for the machine learning model, for example, a neural network. Here, embeddings refer to vectors of numerical values to represent each example of the table values. When the entity IDs are referred to as the foreign keys in another table, the embeddings from the original tables are copied over in the machine learning model. In some embodiments, the embeddings can be obtained from training a machine learning model.
[0054] In some embodiments, the embeddings can be copied over to the other columns of the table where the entity ID is the primary key. For example, the Product table embedding includes vector values each represented as (category1, category2, price). The Store table embedding includes vector values each represented as (city, state). The Sales table embedding then includes vector values each represented at least as (store.state, store.city, product.category1, product.category2, product.price, quantity).
[0055] In some embodiments, the encoder automatically detects the categorical columns. For example, when a database has categorical columns, they are directly converted into categorical values. In some instances, categories are represented by numerical values or strings that are not explicitly specified to be categories. In that case, a categorical value detection is triggered, i.e., a column is converted into a categorical column when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold. For example, the threshold is 1/3 or 1/4. When such a conversion is triggered, a dictionary of possible values is built (including a “misc” category to handle values that are not seen during the training of the machine learning model). String or numerical values are converted into 0, 1, 2, 3, etc.
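As a non-limiting sketch of this automatic detection and dictionary building, the following Python fragment applies the distinct-to-total ratio test and builds a value dictionary with a "misc" category; the function names and the 1/4 threshold default are illustrative assumptions.

    def detect_categorical(values, threshold=0.25):
        # Trigger categorical conversion when the ratio of distinct
        # values to total values is below the threshold (1/4 here).
        return len(set(values)) / len(values) < threshold

    def build_dictionary(training_values):
        # Dictionary of possible values, converted to 0, 1, 2, ..., plus
        # a "misc" category for values unseen during training.
        codes = {v: i for i, v in enumerate(dict.fromkeys(training_values))}
        codes["misc"] = len(codes)
        return codes

    print(detect_categorical(["CA", "NY", "CA", "CA", "NY", "CA"]))  # True
    print(build_dictionary(["CA", "NY", "CA"]))  # {'CA': 0, 'NY': 1, 'misc': 2}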
An exemplary encoder library is shown in Table 1 below.

Table 1. Exemplary Encoder Library
[Table 1 is presented as an image in the original publication. For each original column type (e.g., integer, string), it lists the supported conversion types (numerical or categorical values) and the corresponding loss function type (e.g., mean squared error for numerical conversions, cross entropy for categorical conversions).]
[0056] According to Table 1, for example, if the column type of the original column is integer, the column values can be converted into numerical values. The loss function type used for the column type in the machine learning model is mean squared error (MSE).

[0057] In another example from Table 1, if the column type of the original column is integer, the column values can be converted into categorical values. In this case, the automatic detection is triggered when the ratio of the number of distinct values versus the total number of values in the same column is smaller than a predetermined threshold. The loss function type used for the column type in the machine learning model is cross entropy.

[0058] In some embodiments, a user can override the default behaviors described in Table 1. For example, users can change the conversion type for the machine learning model.

[0059] Nowadays, a massive amount of data is stored in various kinds of relational databases. In some embodiments, methods and systems for extracting the entity features from a relational database are disclosed herein. As described above, a relational database conforms to a database schema. The data within a relational database is relatively clean to be imported for deep learning. Furthermore, the data structures are clear and easy to extract from a relational database.
[0060] However, relational databases usually hold raw business data and sometimes, the data is incomplete or noisy. The traditional data mining focuses more on the extraction of overall information and does not handle the incomplete or noisy data.
[0061] Figure 4 is a flow diagram 400 illustrating an entity feature extracting process according to some implementations of the present disclosure. In some embodiments, an automatic extraction process 410 is applied to a relational database 420. As described above, the relational database 420 includes a number of tables. Each table has a number of columns and a number of rows. Each column has a name and a type, and the type can include: Primary Key ID, Foreign Key IDs (reference to another table), Date/Datetime, Category (including categorical integers, strings, Boolean, and/or others), Numeric, or Free text, etc. The relational database 420 is converted to a table structure or schema 422 for preparation of the input to a machine learning model, such as 116.
[0062] In some embodiments, a compiling process 412 is used to convert the table structure or schema 422 to a machine learning model structure. In some embodiments, the machine learning model structure includes a neural network 424. In some embodiments, each entity in the database has a corresponding table, where the entity is identified by a primary key.
[0063] In some embodiments, an embedding subnet is created for each entity. The subnet could be a simple embedding layer in a neural network. As described above, the same embedding is used whenever the key is referred to as a primary key in its defining table, or as a foreign key in other tables.
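As a minimal, non-limiting sketch of such per-entity embedding subnets (using the PyTorch library; the entity counts and embedding dimensions are illustrative assumptions), the same embedding module object can be reused wherever the entity key appears.

    import torch.nn as nn

    # One embedding subnet per entity; the same module object is reused
    # wherever the entity key appears, whether as a primary key in its
    # defining table or as a foreign key in another table.
    store_embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16)
    date_embedding = nn.Embedding(num_embeddings=3650, embedding_dim=8)
    entity_embeddings = {"store_id": store_embedding, "date_id": date_embedding}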
[0064] In some embodiments, a neural network is created for each table. In some embodiments, the primary key and foreign key columns are compiled to be inputs by default for each of the neural networks. Columns other than the primary keys and foreign keys are compiled to be outputs by default for each of the neural networks. In some embodiments, a user can override the default input and output columns for each of the neural networks.

[0065] Figure 5 is a block diagram illustrating the neural network structures converted or compiled from the database structures according to some implementations of the present disclosure. In some embodiments, Figure 5 illustrates a neural network with entity embedding 500. In some embodiments, the component parts identified as “Loss” or “Total Loss” shown in Figure 5 are only used in the training process of the neural networks but not in the final machine learning model or the final neural network model with embedding 426 after the training process.
[0066] As illustrated in Figure 5, a Sales Forecast relational database is compiled into a neural network structure. The Sales Forecast relational database consists of three tables: a Store table with columns including store_id, feature1, and feature2, etc.; a Date table with columns including date_id, year, month, day, and weekday, etc.; and a Sales table with columns including store_id, date_id, and sales.
[0067] The primary key column for the Store table is store_id and the primary key column for the Date table is date_id. In the Sales table, the foreign key columns are store_id and date_id, which refer to the primary key store_id column in the Store table and the primary key date_id column in the Date table, respectively. In this example, the primary key for the entity Store is store_id, and the primary key for the entity Date is date_id. Three neural networks: Store network, Sales network, and Date network, are created for the three tables represented by three entities: Store, Date, and Sales, respectively. The primary key column store_id is used as the input to the Store neural network. The primary key column date_id is used as the input to the Date neural network. The foreign key store_id and date_id columns are used as the inputs for the Sales neural network.
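Continuing the earlier embedding sketch, the following non-limiting PyTorch fragment shows one possible form of the Sales network: the shared store_id and date_id embeddings feed a transform subnet and a prediction head for the sales column. The class name, layer sizes, and module names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SalesNetwork(nn.Module):
        # Looks up the shared store_id and date_id embedding subnets,
        # processes them through a transform subnet, and emits a
        # prediction for the sales output column.
        def __init__(self, store_emb, date_emb):
            super().__init__()
            self.store_emb, self.date_emb = store_emb, date_emb
            dim = store_emb.embedding_dim + date_emb.embedding_dim
            self.transform = nn.Sequential(nn.Linear(dim, 32), nn.ReLU())
            self.sales_head = nn.Linear(32, 1)

        def forward(self, store_id, date_id):
            x = torch.cat([self.store_emb(store_id),
                           self.date_emb(date_id)], dim=-1)
            return self.sales_head(self.transform(x))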
[0068] As shown in Figure 5, the whole structure 500 is viewed as three neural networks. Each entity embedding is referred to in two tables. For example, features related to the Date entity are referred to in both of the Date and Sales network structures. Features related to the Store entity are referred to in both of the Store and Sales network structures. The entity features will be updated when each table referring to those entity features is trained.
[0069] In some embodiments, as shown in Figure 5, there are three types of neural network modules: embedding subnet, transform subnet, and prediction head. Data is input from the embedding subnet, processed through the transform subnet, and predicted/output from the prediction head. In some embodiments, the neural network modules are multi-layer fully-connected networks. In some embodiments, the neural network modules can include more complicated constructions such as residual blocks and attention.
[0070] In some embodiments, each column of the relational database has a corresponding data type, and a nullable specification defined in the database schema.
Nullable specification defines whether a column can be empty.
[0071] In some embodiments, a library of input subnets, output subnets, and loss function modules is created. In some embodiments, a compatibility table is also maintained for subnet modules and data type compatibility.
[0072] In some embodiments, each data type has a default input subnet, a default output subnet, and a default loss function. In one example, for keys of the entities, the input subnet is an embedding subnet, the output subnet is linear with softmax activation, and the loss function is cross entropy. In another example, for categorical data types, the input subnet is one-hot encoding (with a few categories) or embedding (with many categories), the output subnet is linear with softmax activation, and the loss function is cross entropy. In yet another example, for the numerical data type, the input subnet is an identity subnet (pass-through subnet), the output subnet is linear without activation, and the loss function is a mean-squared-error function. In some embodiments, a user can override the default submodule selections.
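A non-limiting sketch of such a default-submodule library, keyed by data type, follows in Python (with PyTorch loss functions); the table structure, key names, and string descriptors are illustrative assumptions that mirror the defaults described above.

    import torch.nn.functional as F

    DEFAULT_SUBMODULES = {
        "entity_key": {"input": "embedding subnet",
                       "output": "linear + softmax",
                       "loss": F.cross_entropy},
        "categorical": {"input": "one-hot or embedding",
                        "output": "linear + softmax",
                        "loss": F.cross_entropy},
        "numerical": {"input": "identity (pass-through)",
                      "output": "linear, no activation",
                      "loss": F.mse_loss},
    }

    # A user override of a default selection could look like:
    # DEFAULT_SUBMODULES["numerical"]["loss"] = F.l1_loss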
[0073] During the compiling process such as 412, in some embodiments, the nullable columns are converted according to some predefined rules. For example, if an input column has null values and the input column type is numerical, the input is then converted into two inputs: the number itself and a separate null (0/1) indicator. In another example, if an output column has null values and the output column type is entity (key for entity)/categorical, the output is then converted with the existing categorical types and an additional null category. When an output has null values, the null values do not generate loss functions (loss = 0).

[0074] In some embodiments, the system disclosed herein can operate fully automatically. There is a default mechanism to convert each column as an input or an output, for example, keys of the entities as inputs, all others as outputs. Each input/output column has a default submodule depending on its data type. The system has a default transformation subnet implementation.
[0075] In some embodiments, a user can customize the compiling process 412. The user can override one or more of the parameter settings. For example, the user can specify a particular column to be input or output. The user can select compatible input, output, and/or loss function submodules for each column. The user can implement the subnet. In some embodiments, the user’s overriding can be implemented with a graphical user interface or a configuration file at a client device such as 102.
[0076] In some embodiments, as shown in Figure 4, a training step 414 is applied to the neural network 424 to produce an embedding model 426. In some embodiments, the embedding model has losses or residuals as described above. In some embodiments, the neural networks formed from each table of the relational database are trained in batches. For each iteration of the training, a neural network converted from a table is selected. A set of records of that table is loaded from the database, or intermediate storage, as a minibatch. The minibatch is then fed to the neural network for training.
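A minimal, non-limiting training-loop sketch in Python/PyTorch follows; it assumes the per-table networks expose a compute_loss helper (an assumed API, not from the specification) and that the loaders are iterators yielding (inputs, targets) minibatches. Sequential selection is shown; the selection strategies described next can be substituted.

    def train(networks, loaders, optimizer, iterations):
        # Each iteration selects the neural network converted from one
        # table, loads a minibatch of that table's records, and performs
        # one forward/backward pass.
        for step in range(iterations):
            i = step % len(networks)
            inputs, targets = next(loaders[i])
            # compute_loss is an assumed helper that sums the
            # per-column losses of the selected network's outputs.
            loss = networks[i].compute_loss(networks[i](*inputs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()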
[0077] In some embodiments, different selection strategies are used in the training.
In some examples, a round robin method is implemented for the selection of the tables/neural networks. The neural networks converted from the tables are trained in a sequence. The records within each table are also trained in a sequence.
[0078] In some other examples, a weighted sampling method is implemented for the selection of the tables/neural networks. Each record in each of the neural networks converted from the tables has a loss computed from the previous training iterations as shown in Figure 5. Each of the neural networks has a respective total loss or a respective summed loss from each of the losses from the records in the respective table/neural network. The losses are converted into probabilities. A higher loss means a higher probability. Tables/neural networks and/or records are sampled by the probabilities for the training. A higher probability means more weight in the sampling process for the training.
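One simple way to convert losses into sampling probabilities is direct normalization, sketched below in Python as a non-limiting illustration; the normalization scheme and function name are assumptions, since the specification does not fix a particular conversion.

    import random

    def sample_network(losses):
        # Convert the per-network (or per-record) losses from previous
        # iterations into probabilities; a higher loss yields a higher
        # probability of being sampled for the next training step.
        total = sum(losses)
        probabilities = [loss / total for loss in losses]
        return random.choices(range(len(losses)), weights=probabilities)[0]

    print(sample_network([0.2, 1.5, 0.3]))  # index 1 is sampled most often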
[0079] The neural network structures as shown in Figure 5 also have a total loss from each of the losses from the records in all tables/neural networks. The losses and total loss shown in Figure 5 are only used in the training but not used as inferences for the neural networks.
[0080] A prediction head and a prediction for each of the columns or features are also included in Figure 5. For example, the output columns including Feature 1 and Feature 2 in the Store neural network hold prediction values. The output column Sales in the Sales network holds prediction values. The output columns including year and month in the Date neural network hold prediction values.
[0081] In some embodiments, other common neural network training practices can be adopted in the training 414, including optimizers, learning rate schedules, etc.
[0082] In some embodiments, the trained neural network can have many applications.
During the training, the neural network generates a fitting/prediction for each output column as shown in Figure 5. The fitting or prediction is what the neural network thinks the value should be, and the fitting or prediction is typically different from the actual value. The difference between the prediction and the actual ground truth value of the output column is measured by the loss function.
[0083] In some embodiments, the neural network generated values can be used as predictions or completion for missing values or null values in a relational database.
[0084] In some embodiments, values with a large loss in the outputs of the neural networks can be used for outlier/error detection for the relational data when the original values in the relational database are not null.
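As a non-limiting sketch of such loss-based outlier detection, the following Python fragment flags records whose per-record loss exceeds a threshold; the threshold choice and function name are assumptions for illustration.

    def flag_outliers(rows, losses, threshold):
        # Rows whose original value is not null but whose loss exceeds
        # the threshold are flagged as potential outliers/errors.
        return [row for row, loss in zip(rows, losses) if loss > threshold]

    print(flag_outliers(["row_a", "row_b"], [0.1, 7.2], threshold=1.0))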
[0085] Figure 6 is a flowchart illustrating an exemplary process 600 by which a computing system implements the techniques of extracting an entity feature from a relational database in accordance with some implementations of the present disclosure.
[0086] According to Figure 6, the computing system extracts the table structures from the relational database (610).
[0087] The computing system then converts the table structures to neural network structures, and each entity in the relational database has a corresponding embedding subnet in the neural network structures (620).
[0088] The computing system trains the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values (630).
[0089] The computing system generates the trained neural network structures including the embedding subnets as entity features of the relational database (640).
[0090] In some embodiments, extracting table structures (610) includes: identifying the tables to be converted, and converting the table column values of the identified tables into a feature value format.

[0091] In some embodiments, converting the table structures to the neural network structures (620) includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as the input table columns to the neural network structures; identifying a second set of table columns as the output table columns to the neural network structures; and creating a respective neural network structure for each table.
[0092] In some embodiments, the first set of table columns is table columns identified by primary keys or foreign keys, and the second set of table columns is table columns not identified by the primary keys or the foreign keys.
[0093] In some embodiments, training the neural network structures (630) includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for each table; reducing the difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of the table column values of the respective outputs.
[0094] In some embodiments, training the neural network structures (630) further includes: generating a loss function for the each of the table column values of the respective outputs by comparing the difference between the actual column value and the prediction for the each of the table column values of the respective outputs.
[0095] In some embodiments, training the neural network structures (630) further includes: training the neural networks including the respective neural network from each table in a round robin scheduling.
[0096] In some embodiments, training the neural network structures (630) further includes: converting a respective loss function from the previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
[0097] In some embodiments, the neural network structures include multi-layer fully-connected networks.
[0098] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and the type of the input table column is numerical, converting the input value in the input table column into a number and an indicator of whether a respective input value is null.
[0099] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is categorical, adding a null category to the existing categories of the input table column.
[00100] In some embodiments, the process 600 of extracting entity features from a relational database further includes: in accordance with a determination that an output fitting value in an output table column of the neural network structures is null, setting the loss function for an output value to a default value.
[00101] In some embodiments, the identifications of the first set of table columns as the input table columns and the second set of table columns as the output table columns of the neural network structures are configurable.
[00102] In some embodiments, the identifications of the first and second sets of table columns are from a graphical user interface or a configuration file.
[00103] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the implementations described in the present application. A computer program product may include a computer-readable medium.
[00104] The terminology used in the description of the implementations herein is for the purpose of describing particular implementations only and is not intended to limit the scope of claims. As used in the description of the implementations and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof.
[00105] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first electrode could be termed a second electrode, and, similarly, a second electrode could be termed a first electrode, without departing from the scope of the implementations. The first electrode and the second electrode are both electrodes, but they are not the same electrode.

[00106] The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications, variations, and alternative implementations will be apparent to those of ordinary skill in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others skilled in the art to understand the invention for various implementations and to best utilize the underlying principles and various implementations with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of claims is not to be limited to the specific examples of the implementations disclosed and that modifications and other implementations are intended to be included within the scope of the appended claims.

Claims

What is claimed is:
1. A method of extracting entity features from a relational database comprising: extracting table structures from the relational database; converting the table structures to neural network structures, each entity in the relational database having a corresponding embedding subnet in the neural network structures; training the neural network structures by feeding a first set of table column values as inputs of the neural network structures and comparing outputs of the neural network structures with a second set of table column values; and generating the trained neural network structures including the embedding subnets as entity features of the relational database.
2. The method according to claim 1, wherein extracting table structures includes: identifying tables to be converted; and converting table column values of the identified tables into a feature value format.
3. The method according to claim 1, wherein converting the table structures to the neural network structures includes: creating a respective subnet of the neural network structures for each entity of the table structures, wherein each entity has one or more corresponding tables and is identified by a primary key in the table structures; identifying a first set of table columns as input table columns to the neural network structures; identifying a second set of table columns as output table columns to the neural network structures; and creating a respective neural network structure for each table.
4. The method according to claim 3, wherein the first set of table columns is table columns identified by primary keys or foreign keys, and the second set of table columns is table columns not identified by the primary keys or the foreign keys.
5. The method according to claim 1, wherein training the neural network structures includes: selecting a respective neural network for each table; loading a respective first set of table column values as respective inputs, and a respective second set of table column values as respective fitting outputs for the respective neural network for the each table; reducing a difference between respective outputs of the respective neural network and the respective fitting outputs; and generating a prediction for each of table column values of the respective outputs.
6. The method according to claim 5, wherein training the neural network structures further includes: generating a loss function for the each of the table column values of the respective outputs by comparing a difference between an actual column value and the prediction for the each of the table column values of the respective outputs.
7. The method according to claim 5, wherein training the neural network structures further includes: training neural networks including the respective neural network from the each table in a round robin scheduling.
8. The method according to claim 6, wherein training the neural network structures further includes: converting a respective loss function from previous training iterations to a respective probability, and sampling a respective column value for training based on the respective probability.
9. The method according to claim 1, wherein the neural network structures include multi-layer fully-connected networks.
10. The method according to claim 1, further comprising: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is numerical, converting the input value in the input table column into a number and an indicator whether a respective input value is null.
11. The method according to claim 1, further comprising: in accordance with a determination that an input value in an input table column to the neural network structures is null and a type of the input table column is categorical, adding a null category to existing categories of the input table column.
12. The method according to claim 6, further comprising: in accordance with a determination that an output fitting value in an output table column of the neural network structures is null, setting the loss function for an output value to a default value.
13. The method according to claim 3, wherein the identifications of the first set of table columns as the input table columns and the second set of table columns as the output table columns of the neural network structures are configurable.
14. The method according to claim 13, wherein the identifications are from a graphical user interface or a configuration file.
15. An electronic apparatus comprising: one or more processing units; memory coupled to the one or more processing units; and a plurality of programs stored in the memory that, when executed by the one or more processing units, cause the electronic apparatus to perform the method of claims 1-14.
16. A non-transitory computer readable storage medium storing a plurality of programs for execution by an electronic apparatus having one or more processing units, wherein the plurality of programs, when executed by the one or more processing units, cause the electronic apparatus to perform the method of claims 1-14.