CN110955659A - Method and system for processing data table - Google Patents

Method and system for processing a data table

Info

Publication number
CN110955659A
CN110955659A
Authority
CN
China
Prior art keywords
field
layer
basic sample
data
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911189241.4A
Other languages
Chinese (zh)
Other versions
CN110955659B (en)
Inventor
秦川
周振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN202310273245.0A priority Critical patent/CN116303459A/en
Priority to CN201911189241.4A priority patent/CN110955659B/en
Publication of CN110955659A publication Critical patent/CN110955659A/en
Application granted granted Critical
Publication of CN110955659B publication Critical patent/CN110955659B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and system for processing a data table are provided. The method of processing a data table comprises: acquiring a base sample table and at least one associated data table; and, for each base sample in the base sample table, processing the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample. According to the method and system for processing a data table, data table splicing and feature extraction can be realized using the neural network.

Description

Method and system for processing a data table
Technical Field
The present invention relates generally to the field of data processing, and more particularly, to a method and system for processing a data table.
Background
With the emergence of massive data in various industries, data needs to be processed in more and more scenarios, for example, by data table splicing and feature extraction.
Although neural networks are relatively universal in fields with relatively standardized inputs, such as images, speech, and text, there is still no general scheme for machine learning problems in scenarios with diverse data sets and many data tables. Multi-table splicing and feature extraction usually have to be performed manually, and only then is a neural network used for machine learning model training or prediction based on the extracted features.
Disclosure of Invention
An object of exemplary embodiments of the present invention is to provide a method and system for processing a data table that can implement data table splicing and feature extraction using a neural network with a specific structure.
According to an exemplary embodiment of the present invention, there is provided a method of processing a data table, comprising: acquiring a base sample table and at least one associated data table; and, for each base sample in the base sample table, processing the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
Optionally, the at least one associated data table includes: at least one static table and/or at least one flow table.
Optionally, the fields in the base sample table include: at least one subject ID field, a time field, and a label field; the primary key of the at least one static table is a subject ID field; and the primary key of the at least one flow table is a subject ID field together with a time field.
Optionally, the neural network with the specific structure comprises: a plurality of hidden layers, a first splicing layer, a second splicing layer, and a fusion layer, and, for each base sample in the base sample table, the step of processing the data related to that base sample in the at least one associated data table using the neural network with the specific structure to obtain the feature corresponding to that base sample comprises: for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data record corresponding to that base sample in each static table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one static table is used as an input of the first splicing layer; for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one flow table is used, via the second splicing layer, as the input of a specific hidden layer among the plurality of hidden layers; and taking the vector output by the fusion layer as the feature corresponding to each base sample in the base sample table, wherein the output of the first splicing layer and the output of the specific hidden layer are used as the inputs of the fusion layer, and wherein the input of the specific hidden layer is a vector sequence and its output is a vector of fixed dimension.
Optionally, the hidden layer corresponding to a category-type field in the at least one static table is: a first embedding layer, wherein the output of the first embedding layer is an embedding vector; the hidden layer corresponding to a time field in the at least one static table is: a differencing layer, which takes the difference between the field value of the time field in the static table and the field value of the time field in the base sample table; and the hidden layer corresponding to a category-type field in the at least one flow table is: a second embedding layer, wherein the output of the second embedding layer is an embedding vector.
Optionally, the hidden layer corresponding to a numeric-type field in the at least one static table is: a first fully connected layer; and the hidden layer corresponding to a numeric-type field in the at least one flow table is: a second fully connected layer.
Optionally, the step of inputting the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers comprises: for each base sample in the base sample table and for each flow table, sorting the data records in the flow table whose field value of the at least one subject ID field is the same as that of the base sample by their time field values, and determining, from the sorted data records, N data records whose time field values precede the time field value of the base sample; for each category-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second embedding layer; and, for each numeric-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second fully connected layer.
Optionally, the specific hidden layer is a hidden layer composed of a long short-term memory (LSTM) network.
Optionally, the neural network with the specific structure further comprises: a third fully connected layer and an output layer, wherein the output of the fusion layer is input to the output layer via the third fully connected layer, and wherein the output layer is configured to output the prediction result.
Optionally, the subject ID field in the base sample table comprises a user ID field, and the label field indicates a credit card application score; the at least one static table comprises at least one of: a user information table, an asset table, and a personal credit information table; and the at least one flow table comprises: a bank transaction flow table.
Optionally, the subject ID fields in the base sample table include a user ID field and an item ID field, and the label field indicates whether the user will purchase the item; the at least one static table comprises at least one of: a user information table and a commodity information table; and the at least one flow table comprises: an internet user behavior record table.
According to another exemplary embodiment of the present invention, there is provided a system for processing a data table, comprising: a data table acquisition device adapted to acquire a base sample table and at least one associated data table; and a data table processing device adapted to, for each base sample in the base sample table, process the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
Optionally, the at least one associated data table includes: at least one static table and/or at least one flow table.
Optionally, the fields in the base sample table include: at least one subject ID field, a time field, and a label field; the primary key of the at least one static table is a subject ID field; and the primary key of the at least one flow table is a subject ID field together with a time field.
Optionally, the neural network with the specific structure comprises: a plurality of hidden layers, a first splicing layer, a second splicing layer, and a fusion layer, wherein the data table processing device is adapted to, for each base sample in the base sample table, input the field values of the fields belonging to each field type in the data record corresponding to that base sample in each static table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one static table is used as an input of the first splicing layer; the data table processing device is adapted to, for each base sample in the base sample table, input the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one flow table is used, via the second splicing layer, as the input of a specific hidden layer among the plurality of hidden layers; and the data table processing device is adapted to take the vector output by the fusion layer as the feature corresponding to each base sample in the base sample table, wherein the output of the first splicing layer and the output of the specific hidden layer are used as the inputs of the fusion layer, and wherein the input of the specific hidden layer is a vector sequence and its output is a vector of fixed dimension.
Optionally, the hidden layer corresponding to a category-type field in the at least one static table is: a first embedding layer, wherein the output of the first embedding layer is an embedding vector; the hidden layer corresponding to a time field in the at least one static table is: a differencing layer, which takes the difference between the field value of the time field in the static table and the field value of the time field in the base sample table; and the hidden layer corresponding to a category-type field in the at least one flow table is: a second embedding layer, wherein the output of the second embedding layer is an embedding vector.
Optionally, the hidden layer corresponding to a numeric-type field in the at least one static table is: a first fully connected layer; and the hidden layer corresponding to a numeric-type field in the at least one flow table is: a second fully connected layer.
Optionally, the data table processing device is adapted to, for each base sample in the base sample table and for each flow table, sort the data records in the flow table whose field value of the at least one subject ID field is the same as that of the base sample by their time field values, and determine, from the sorted data records, N data records whose time field values precede the time field value of the base sample; for each category-type field in the flow table, arrange the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and input the obtained time series into the second embedding layer; and, for each numeric-type field in the flow table, arrange the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and input the obtained time series into the second fully connected layer.
Optionally, the specific hidden layer is a hidden layer composed of a long short-term memory (LSTM) network.
Optionally, the neural network with the specific structure further comprises: a third fully connected layer and an output layer, wherein the output of the fusion layer is input to the output layer via the third fully connected layer, and wherein the output layer is configured to output the prediction result.
Optionally, the subject ID field in the base sample table comprises a user ID field, and the label field indicates a credit card application score; the at least one static table comprises at least one of: a user information table, an asset table, and a personal credit information table; and the at least one flow table comprises: a bank transaction flow table.
Optionally, the subject ID fields in the base sample table include a user ID field and an item ID field, and the label field indicates whether the user will purchase the item; the at least one static table comprises at least one of: a user information table and a commodity information table; and the at least one flow table comprises: an internet user behavior record table.
According to another exemplary embodiment of the invention, a system is provided comprising at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to perform the method of processing a data table as described above.
According to another exemplary embodiment of the present invention, a computer-readable storage medium storing instructions is provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to perform the method of processing a data table as described above.
According to the method and system for processing a data table, data table splicing and feature extraction are realized using a neural network with a specific structure. Furthermore, by integrating the data table splicing and feature extraction steps into the structure of the neural network, a machine learning model can be generated automatically, or predictions can be made by the machine learning model, through the neural network on the basis of the specified data tables.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of exemplary embodiments of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments, wherein:
FIG. 1 illustrates a flowchart of a method of processing a data table according to an exemplary embodiment of the present invention;
FIG. 2 illustrates an example of a neural network of a particular structure in accordance with an exemplary embodiment of the present invention;
FIG. 3 illustrates another example of a neural network of a particular structure in accordance with an exemplary embodiment of the present invention;
fig. 4 illustrates a block diagram of a system for processing a data table according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
Fig. 1 illustrates a flowchart of a method of processing a data table according to an exemplary embodiment of the present invention.
Referring to fig. 1, in step S10, a base sample table and at least one associated data table are obtained.
Here, the at least one associated data table is associated with the base sample table; for example, the field values of the primary key of the at least one associated data table include the field values of the subject ID field in the base sample table.
As an example, the fields in the base sample table may include: at least one subject ID field, a time field, and a label field. Here, the label field is the target to be predicted using machine learning techniques, and the time field may indicate the prediction time point of the corresponding base sample. For example, for a regression problem, the field value of the label field may be one or more numerical values; for a classification problem, the field value of the label field may be a class number.
As an example, the at least one associated data table may include: at least one static table and/or at least one flow table.
As an example, the at least one static table may be used to describe attribute information of a subject ID. As an example, the primary key of the at least one static table may be a subject ID field.
As an example, the primary key of the at least one flow table may be a subject ID field together with a time field. For example, a flow table may be a bank transaction flow table or an internet user behavior record table.
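For illustration only, a minimal sketch of what these three kinds of tables might look like is shown below as small pandas DataFrames; all table and column names (user_id, time, label, occupation, amount, and so on) are hypothetical and are not taken from the patent.

```python
# Hypothetical toy tables illustrating the three table roles described above.
# All names and values are illustrative assumptions, not from the patent.
import pandas as pd

# Base sample table: subject ID field(s) + time field + label field.
base_samples = pd.DataFrame({
    "user_id": [1, 2],
    "time":    pd.to_datetime(["2019-06-01", "2019-06-03"]),
    "label":   [1, 0],                       # prediction target
})

# Static table: primary key is the subject ID field (one row per subject).
user_info = pd.DataFrame({
    "user_id":    [1, 2],
    "occupation": ["teacher", "engineer"],   # category-type field
    "age":        [34, 28],                  # numeric-type field
    "reg_date":   pd.to_datetime(["2015-01-10", "2018-07-22"]),  # time field
})

# Flow table: primary key is (subject ID field, time field); many rows per subject.
transactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "time":    pd.to_datetime(["2019-05-20", "2019-05-28", "2019-06-02"]),
    "channel": ["atm", "online", "pos"],     # category-type field
    "amount":  [200.0, 35.5, 80.0],          # numeric-type field
})
```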
In step S20, for each base sample in the base sample table, the neural network with the specific structure is used to process the data associated with that base sample in the at least one associated data table, so as to obtain the feature corresponding to that base sample.
That is, using each base sample in the base sample table as an index, the data associated with that base sample in the at least one associated data table is input to the neural network with the specific structure, and the feature corresponding to that base sample is obtained through the neural network. This achieves an effect consistent with the following operation: splicing the at least one associated data table onto the base sample table and then extracting features from the spliced data table.
As an example, the data associated with each base sample in the at least one associated data table may include: the field values in the data records of the at least one associated data table that correspond to that base sample. For example, the data record in each static table corresponding to a base sample may be: the data record in the static table whose field value of the subject ID field is the same as the field value of the subject ID field of the base sample.
As an example, data from different types of data tables among the at least one associated data table may be input to different hidden layers in the neural network with the specific structure. Likewise, field values of different types of fields in the at least one associated data table may be input to different hidden layers of the network.
As an example, a corresponding machine learning model may subsequently be trained automatically based on machine learning samples composed of the field value of the label field of each base sample and the obtained feature corresponding to that base sample.
As another example, the resulting features corresponding to each base sample may then be input to a trained machine learning model to obtain a corresponding prediction result.
As an example, the method of processing a data table according to an exemplary embodiment of the present invention may be applied to scenarios that require machine learning over data tables, such as recommendation scenarios (e.g., commodity recommendation) or credit card application scenarios.
As an example, when the method of processing a data table according to an exemplary embodiment of the present invention is applied to a credit card application scenario, the subject ID field in the base sample table may include a user ID field, and the label field may indicate a credit card application score; the at least one static table may comprise at least one of: a user information table, an asset table, and a personal credit information table; and the at least one flow table may comprise: a bank transaction flow table.
As another example, when the method of processing a data table according to an exemplary embodiment of the present invention is applied to a commodity recommendation scenario, the subject ID fields in the base sample table may include two subject ID fields, namely a user ID field and an item ID field, and the label field may indicate whether the user will purchase the item; the at least one static table may comprise at least one of: a user information table and a commodity information table; and the at least one flow table may comprise: an internet user behavior record table. The fields in the internet user behavior record table may include: a user ID field, an item ID field, a time field, an action field (indicating an action such as browsing or purchasing), and other attribute fields describing the action (e.g., an amount field).
Fig. 2 shows an example of a neural network of a specific structure according to an exemplary embodiment of the present invention.
As shown in fig. 2, the neural network with the specific structure may include: a plurality of hidden layers, a first splicing layer, a second splicing layer, and a fusion layer. The outputs of the hidden layers corresponding one-to-one to the field types in the at least one static table are used as the input of the first splicing layer; the outputs of the hidden layers corresponding one-to-one to the field types in the at least one flow table are used, via the second splicing layer, as the input of a specific hidden layer among the plurality of hidden layers; and the output of the first splicing layer and the output of the specific hidden layer are the inputs of the fusion layer.
Accordingly, step S20 may include: for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data record corresponding to that base sample in each static table into the hidden layer corresponding to that field type among the plurality of hidden layers; for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers; and taking the vector output by the fusion layer as the feature corresponding to each base sample in the base sample table.
By way of example, the field types may include: a category type, a numeric type, and a time type. Specifically, a category-type field refers to a field used to indicate an unordered category (e.g., gender, occupation, or region), whose field values have no ordering relation; a numeric-type field refers to a field whose values are continuous numbers (e.g., an age value or an amount); and a time field refers to a field used to indicate a time (e.g., the registration date of a bank card, or a birthday).
As an example, the field values of a category-type field in the at least one associated data table may be input to an embedding layer (Embedding) in the neural network with the specific structure, where the output of the embedding layer is an embedding vector. The embedding layer can represent high-dimensional categorical features with fixed-dimension vectors.
Specifically, as an example, the hidden layer corresponding to a category-type field in the at least one static table may be: a first embedding layer; and the hidden layer corresponding to a category-type field in the at least one flow table may be: a second embedding layer.
As an example, the field values of a numeric-type field in the at least one associated data table may be input to a fully connected layer (Dense) in the neural network with the specific structure. The input and output of the fully connected layer may be vectors of a specified size. The field values of a numeric-type field do not need to be assembled into vectors before being input into the fully connected layer, because the data table itself is structured data: if the numeric field values of m rows and n fields need to be input into the fully connected layer, the input is simply a two-dimensional tensor of size m × n.
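As a minimal sketch of the input shape described above (assuming tf.keras; the sizes m, n and the layer width are arbitrary and not from the patent), the numeric field values of m data records with n numeric fields can be fed directly to a fully connected layer as an m × n array:

```python
# Hedged sketch: m records with n numeric fields fed to a Dense layer as-is.
import numpy as np
import tensorflow as tf

m, n = 4, 3                                              # 4 records, 3 numeric fields
numeric_values = np.random.rand(m, n).astype("float32")  # already a 2-D m x n array

dense = tf.keras.layers.Dense(units=8)                   # illustrative output width
out = dense(numeric_values)
print(out.shape)                                         # (4, 8): one vector per record
```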
Specifically, as an example, the hidden layer corresponding to a numeric-type field in the at least one static table may be: a first fully connected layer; and the hidden layer corresponding to a numeric-type field in the at least one flow table may be: a second fully connected layer.
As an example, the hidden layer corresponding to a time field in the at least one static table may be: a differencing layer, which takes the difference between the field value of the time field in the static table and the field value of the time field in the base sample table. For example, for each base sample in the base sample table, the differencing layer may take the difference between the time field value in the data record corresponding to that base sample in each static table and the time field value in that base sample.
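A minimal sketch of such a differencing layer is given below, assuming tf.keras and timestamps already converted to numbers (e.g., days); the sign convention and all names are assumptions rather than details from the patent.

```python
# Hedged sketch of the differencing layer for time fields.
import tensorflow as tf

sample_time = tf.keras.Input(shape=(1,), name="base_sample_time")   # e.g. days since epoch
static_time = tf.keras.Input(shape=(1,), name="static_table_time")

# Elementwise subtraction, e.g. "how long before the prediction time" (sign assumed).
time_diff = tf.keras.layers.Subtract(name="diff_layer")([sample_time, static_time])

diff_model = tf.keras.Model([sample_time, static_time], time_diff)
```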
As an example, the input of the specific hidden layer may be a vector sequence and its output a fixed-dimension vector, so that it can carry timing information.
As an example, the specific hidden layer may be a hidden layer composed of a long short-term memory (LSTM) network. The LSTM structure can convert a vector sequence into a fixed-dimension vector, so that the obtained fixed-dimension vector contains the sequence information. It should be understood that the specific hidden layer may also be formed by other types of neural networks, for example a recurrent neural network (RNN) or a convolutional neural network (CNN), and the present invention is not limited in this respect.
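A minimal sketch of such a specific hidden layer is shown below, assuming tf.keras; the sequence length and vector dimensions are arbitrary illustrative values.

```python
# Hedged sketch: an LSTM maps a vector sequence to a fixed-dimension vector.
import tensorflow as tf

seq_input = tf.keras.Input(shape=(None, 16), name="flow_vector_sequence")  # N x 16 vectors
fixed_vec = tf.keras.layers.LSTM(units=32, name="specific_hidden_layer")(seq_input)
# fixed_vec has shape (batch, 32) regardless of the sequence length N, so the
# sequence (timing) information is carried in a fixed-dimension vector.
```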
As an example, the input of a splicing layer (Concat) in the neural network with the specific structure may be a plurality of vectors, and its output is the new vector formed by concatenating those vectors.
As an example, the fusion layer (Merge) in the neural network with the specific structure may fuse vectors in at least one of the following ways: concatenation, addition, and subtraction. It should be understood that other suitable fusion methods may also be employed.
As an example, for each base sample in the base sample table, the step of inputting the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers may include: for each base sample in the base sample table and for each flow table, sorting the data records in the flow table whose field value of the at least one subject ID field is the same as that of the base sample by their time field values from earliest to latest, and determining, from the sorted data records, the N data records whose time field values precede, and are closest to, the time field value of the base sample; for each category-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second embedding layer; and, for each numeric-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second fully connected layer. It should be understood that these time series are input into the neural network with the specific structure to obtain the corresponding time-sequence features; the record-selection step is sketched below.
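The record-selection step just described can be sketched in pandas as follows, reusing the hypothetical transactions table from the earlier example; the function and column names are assumptions, not from the patent.

```python
# Hedged sketch of selecting, for one base sample, the N most recent flow-table
# records that share its subject ID and precede its time field value.
import pandas as pd

def flow_window(flow_table: pd.DataFrame, subject_id, sample_time, n: int) -> pd.DataFrame:
    records = flow_table[(flow_table["user_id"] == subject_id) &
                         (flow_table["time"] < sample_time)]
    records = records.sort_values("time")   # earliest to latest
    return records.tail(n)                  # the N records closest to sample_time

# window = flow_window(transactions, subject_id=1,
#                      sample_time=pd.Timestamp("2019-06-01"), n=5)
# window["channel"].tolist()  -> time series for the second embedding layer
# window["amount"].tolist()   -> time series for the second fully connected layer
```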
As an example, since the base sample table may include one or more subject ID fields, when the base sample table includes a plurality of subject ID fields, different subject combinations may be defined over those fields. For example, when the base sample table includes a subject ID1 field and a subject ID2 field, the different subject combinations may be: (subject ID1 field), (subject ID2 field), and (subject ID1 field, subject ID2 field). Correspondingly, for each base sample in the base sample table and for each subject combination, the data records in each flow table whose field values for that subject combination are the same as those of the base sample may be sorted by their time field values from earliest to latest, and a corresponding time series may then be obtained from the sorted data records.
As an example, when the at least one associated data table comprises only static tables, the neural network with the specific structure may comprise only: a first embedding layer, a first fully connected layer, a differencing layer, a first splicing layer, and a fusion layer. For example, when the at least one associated data table includes only one static table, the fusion layer may also be omitted.
As another example, when the at least one associated data table comprises only flow tables, the neural network with the specific structure may comprise only: a second embedding layer, a second fully connected layer, a second splicing layer, the specific hidden layer, and a fusion layer. For example, when the at least one associated data table comprises only one flow table, the fusion layer may also be omitted.
Fig. 3 illustrates another example of a neural network of a particular structure according to an exemplary embodiment of the present invention.
As shown in fig. 3, in addition to the plurality of hidden layers, the first splicing layer, the second splicing layer, and the fusion layer described with reference to fig. 2, the neural network with the specific structure may further include: a third fully connected layer and an output layer (Output), wherein the output of the fusion layer is input to the output layer via the third fully connected layer, and wherein the output layer is configured to output the prediction result. For example, for a classification problem, the prediction result may include the probability of belonging to each class; for a regression problem, the prediction result may include a regression value. It should be understood that the third fully connected layer may be replaced by other network structures, and the present invention is not limited in this respect. According to exemplary embodiments of the present invention, the whole process of machine learning training or prediction based on the specified data tables can thus be completed automatically through the neural network.
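For orientation, the following is a compact tf.keras sketch of how a network of this shape could be assembled; it assumes one static table (one category-type, one numeric-type, and one time field) and one flow table (one category-type and one numeric-type field), uses concatenation as the fusion operation, and all layer names, widths, and vocabulary sizes are arbitrary assumptions rather than values from the patent.

```python
# Hedged sketch of the overall structure described with reference to Figs. 2 and 3.
import tensorflow as tf
from tensorflow.keras import layers

N = 10  # number of flow-table records taken per base sample (illustrative)

# --- static-table path ---
cat_s  = tf.keras.Input(shape=(1,), dtype="int32", name="static_category")
num_s  = tf.keras.Input(shape=(1,), name="static_numeric")
time_s = tf.keras.Input(shape=(1,), name="static_time")
time_b = tf.keras.Input(shape=(1,), name="sample_time")

emb_s  = layers.Flatten()(layers.Embedding(1000, 8)(cat_s))     # first embedding layer
fc_s   = layers.Dense(8)(num_s)                                 # first fully connected layer
diff_s = layers.Subtract(name="diff_layer")([time_b, time_s])   # differencing layer
static_vec = layers.Concatenate(name="first_splicing_layer")([emb_s, fc_s, diff_s])

# --- flow-table path ---
cat_f = tf.keras.Input(shape=(N,), dtype="int32", name="flow_category_sequence")
num_f = tf.keras.Input(shape=(N, 1), name="flow_numeric_sequence")

emb_f = layers.Embedding(1000, 8)(cat_f)                        # second embedding layer -> (N, 8)
fc_f  = layers.TimeDistributed(layers.Dense(8))(num_f)          # second fully connected layer -> (N, 8)
flow_seq = layers.Concatenate(name="second_splicing_layer")([emb_f, fc_f])    # (N, 16)
seq_vec  = layers.LSTM(32, name="specific_hidden_layer")(flow_seq)            # fixed-dimension vector

# --- fusion and prediction head (Fig. 3) ---
fused  = layers.Concatenate(name="fusion_layer")([static_vec, seq_vec])
hidden = layers.Dense(64, activation="relu", name="third_fc_layer")(fused)
output = layers.Dense(1, activation="sigmoid", name="output_layer")(hidden)   # e.g. binary label

model = tf.keras.Model([cat_s, num_s, time_s, time_b, cat_f, num_f], output)
```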
Fig. 4 illustrates a block diagram of a system for processing a data table according to an exemplary embodiment of the present invention.
As shown in fig. 4, the system for processing a data table according to an exemplary embodiment of the present invention includes: a data sheet acquisition device 10 and a data sheet processing device 20.
In particular, the data table acquisition means 10 are adapted to acquire a base sample table and at least one associated data table.
As an example, the at least one associated data table may include: at least one static table and/or at least one flow table.
As an example, the fields in the base sample table may include: at least one subject ID field, a time field, and a label field; the primary key of the at least one static table may be a subject ID field; and the primary key of the at least one flow table may be a subject ID field together with a time field.
The data table processing device 20 is adapted to, for each base sample in the base sample table, process the data associated with that base sample in the at least one associated data table using a neural network with a specific structure, so as to obtain a feature corresponding to that base sample.
As an example, the neural network with the specific structure may include: a plurality of hidden layers, a first splicing layer, a second splicing layer, and a fusion layer, wherein the data table processing device 20 may be adapted to, for each base sample in the base sample table, input the field values of the fields belonging to each field type in the data record corresponding to that base sample in each static table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one static table is used as an input of the first splicing layer; the data table processing device 20 may be adapted to, for each base sample in the base sample table, input the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one flow table is input, via the second splicing layer, to a specific hidden layer among the plurality of hidden layers; and the data table processing device 20 may be adapted to take the vector output by the fusion layer as the feature corresponding to each base sample in the base sample table, wherein the output of the first splicing layer and the output of the specific hidden layer are used as the inputs of the fusion layer, and wherein the input of the specific hidden layer is a vector sequence and its output is a vector of fixed dimension.
As an example, the hidden layer corresponding to a category-type field in the at least one static table may be: a first embedding layer, wherein the output of the first embedding layer is an embedding vector; the hidden layer corresponding to a time field in the at least one static table may be: a differencing layer, which takes the difference between the field value of the time field in the static table and the field value of the time field in the base sample table; and the hidden layer corresponding to a category-type field in the at least one flow table may be: a second embedding layer, wherein the output of the second embedding layer is an embedding vector.
As an example, the hidden layer corresponding to a numeric-type field in the at least one static table may be: a first fully connected layer; and the hidden layer corresponding to a numeric-type field in the at least one flow table may be: a second fully connected layer.
As an example, the specific hidden layer may be a hidden layer composed of a long short-term memory (LSTM) network.
As an example, the data table processing device 20 may be adapted to, for each base sample in the base sample table and for each flow table, sort the data records in the flow table whose field value of the at least one subject ID field is the same as that of the base sample by their time field values, and determine, from the sorted data records, N data records whose time field values precede the time field value of the base sample; for each category-type field in the flow table, arrange the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and input the obtained time series into the second embedding layer; and, for each numeric-type field in the flow table, arrange the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and input the obtained time series into the second fully connected layer.
As an example, the neural network with the specific structure may further include: a third fully connected layer and an output layer, wherein the output of the fusion layer is input to the output layer via the third fully connected layer, and wherein the output layer is configured to output the prediction result.
As an example, the subject ID field in the base sample table may include a user ID field, and the label field may indicate a credit card application score; the at least one static table may comprise at least one of: a user information table, an asset table, and a personal credit information table; and the at least one flow table may comprise: a bank transaction flow table.
As an example, the subject ID fields in the base sample table may include a user ID field and an item ID field, and the label field may indicate whether the user will purchase the item; the at least one static table may comprise at least one of: a user information table and a commodity information table; and the at least one flow table may comprise: an internet user behavior record table.
It should be understood that the specific implementation manner of the system for processing a data table according to the exemplary embodiment of the present invention may be implemented with reference to the related specific implementation manner described in conjunction with fig. 1 to 3, and will not be described herein again.
The system for processing a data table according to an exemplary embodiment of the present invention includes devices that can be respectively configured as software, hardware, firmware, or any combination thereof to perform a specific function. These means may correspond, for example, to a dedicated integrated circuit, to pure software code, or to a module combining software and hardware. Further, one or more functions implemented by these apparatuses may also be collectively performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
It is to be understood that the method of processing a data table according to an exemplary embodiment of the present invention may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present invention, there may be provided a computer-readable medium for processing a data table, on which a computer program for executing the following method steps is recorded: acquiring a base sample table and at least one associated data table; and, for each base sample in the base sample table, processing the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
The computer program in the computer-readable medium may be executed in an environment deployed on a computer device such as a client, a host, a proxy device, or a server. It should be noted that the computer program may also be used to perform steps additional to, or more specific processing within, the steps described above; the content of these additional steps and of the further processing has been described with reference to figs. 1 to 3 and is not repeated here.
It should be noted that the system for processing a data table according to an exemplary embodiment of the present invention may rely entirely on the execution of the computer program to realize the corresponding functions; that is, each device corresponds to a step in the functional architecture of the computer program, so that the whole system is invoked through a dedicated software package (e.g., a lib library) to realize the corresponding functions.
On the other hand, each means included in the system for processing a data table according to an exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
For example, exemplary embodiments of the present invention may also be implemented as a computing device comprising a storage component and a processor, the storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform a method of processing a data table.
In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions described above.
The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the instructions (or instruction sets) described above, individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote devices (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described in the method of processing a data table according to an exemplary embodiment of the present invention may be implemented by software, some of the operations may be implemented by hardware, and further, the operations may be implemented by a combination of hardware and software.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations involved in a method of processing a data table according to an exemplary embodiment of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated with boundaries that are not exact.
For example, as described above, a computing device for processing a data table according to an exemplary embodiment of the present invention may include a storage unit and a processor, wherein the storage unit stores a set of computer-executable instructions which, when executed by the processor, perform the following steps: acquiring a base sample table and at least one associated data table; and, for each base sample in the base sample table, processing the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (10)

1. A method of processing a data table, comprising:
acquiring a base sample table and at least one associated data table;
and, for each base sample in the base sample table, processing the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
2. The method of claim 1, wherein,
the at least one associated data table includes: at least one static table and/or at least one flow table.
3. The method of claim 2, wherein,
the fields in the base sample table include: at least one subject ID field, a time field, and a label field;
the primary key of the at least one static table is a subject ID field; and
the primary key of the at least one flow table is a subject ID field together with a time field.
4. The method of claim 2, wherein the neural network with the specific structure comprises: a plurality of hidden layers, a first splicing layer, a second splicing layer, and a fusion layer,
wherein, for each base sample in the base sample table, the step of processing the data related to that base sample in the at least one associated data table using the neural network with the specific structure to obtain the feature corresponding to that base sample comprises:
for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data record corresponding to that base sample in each static table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one static table is used as an input of the first splicing layer;
for each base sample in the base sample table, inputting the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers, wherein the output of the hidden layer corresponding to each field type in the at least one flow table is used, via the second splicing layer, as the input of a specific hidden layer among the plurality of hidden layers; and
taking the vector output by the fusion layer as: the feature corresponding to each base sample in the base sample table, wherein the output of the first splicing layer and the output of the specific hidden layer are the inputs of the fusion layer,
wherein the input of the specific hidden layer is a vector sequence and its output is a vector of fixed dimension.
5. The method of claim 4, wherein,
the hidden layer corresponding to a category-type field in the at least one static table is: a first embedding layer, wherein the output of the first embedding layer is an embedding vector;
the hidden layer corresponding to a time field in the at least one static table is: a differencing layer, which takes the difference between the field value of the time field in the static table and the field value of the time field in the base sample table; and
the hidden layer corresponding to a category-type field in the at least one flow table is: a second embedding layer, wherein the output of the second embedding layer is an embedding vector.
6. The method of claim 5, wherein,
the hidden layer corresponding to a numeric-type field in the at least one static table is: a first fully connected layer; and
the hidden layer corresponding to a numeric-type field in the at least one flow table is: a second fully connected layer.
7. The method of claim 6, wherein the step of inputting, for each base sample in the base sample table, the field values of the fields belonging to each field type in the data records corresponding to that base sample in each flow table into the hidden layer corresponding to that field type among the plurality of hidden layers comprises:
for each base sample in the base sample table and for each flow table, sorting the data records in the flow table whose field value of the at least one subject ID field is the same as that of the base sample by their time field values, and determining, from the sorted data records, N data records whose time field values precede the time field value of the base sample;
for each category-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second embedding layer; and
for each numeric-type field in the flow table, arranging the field values of that field in the N data records into a corresponding time series according to the ordering of the N data records, and inputting the obtained time series into the second fully connected layer.
8. A system for processing a data table, comprising:
a data table acquisition device adapted to acquire a base sample table and at least one associated data table; and
a data table processing device adapted to, for each base sample in the base sample table, process the data related to that base sample in the at least one associated data table using a neural network with a specific structure to obtain a feature corresponding to that base sample.
9. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform a method of processing a data table as claimed in any of claims 1 to 7.
10. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform a method of processing a data table as claimed in any one of claims 1 to 7.
CN201911189241.4A 2019-11-28 2019-11-28 Method and system for processing data table Active CN110955659B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310273245.0A CN116303459A (en) 2019-11-28 2019-11-28 Method and system for processing data table
CN201911189241.4A CN110955659B (en) 2019-11-28 2019-11-28 Method and system for processing data table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911189241.4A CN110955659B (en) 2019-11-28 2019-11-28 Method and system for processing data table

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310273245.0A Division CN116303459A (en) 2019-11-28 2019-11-28 Method and system for processing data table

Publications (2)

Publication Number Publication Date
CN110955659A true CN110955659A (en) 2020-04-03
CN110955659B CN110955659B (en) 2023-02-21

Family

ID=69978760

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310273245.0A Pending CN116303459A (en) 2019-11-28 2019-11-28 Method and system for processing data table
CN201911189241.4A Active CN110955659B (en) 2019-11-28 2019-11-28 Method and system for processing data table

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202310273245.0A Pending CN116303459A (en) 2019-11-28 2019-11-28 Method and system for processing data table

Country Status (1)

Country Link
CN (2) CN116303459A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112035582A (en) * 2020-08-28 2020-12-04 光大科技有限公司 Structured data classification method and device, storage medium and electronic device
CN112115137A (en) * 2020-08-17 2020-12-22 贝壳技术有限公司 Data configuration method and device
CN112347102A (en) * 2020-11-03 2021-02-09 第四范式(北京)技术有限公司 Multi-table splicing method and multi-table splicing device
CN112434032A (en) * 2020-11-17 2021-03-02 北京融七牛信息技术有限公司 Automatic feature generation system and method
CN112927012A (en) * 2021-02-23 2021-06-08 第四范式(北京)技术有限公司 Marketing data processing method and device and marketing model training method and device
WO2022089652A1 (en) * 2020-11-02 2022-05-05 第四范式(北京)技术有限公司 Method and system for processing data tables and automatically training machine learning model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334638A (en) * 2018-03-20 2018-07-27 桂林电子科技大学 Collaborative Filtering method based on shot and long term Memory Neural Networks and interest migration
US20180314926A1 (en) * 2017-04-28 2018-11-01 Intel Corporation Smart memory handling and data management for machine learning networks
US20190095515A1 (en) * 2017-09-25 2019-03-28 International Business Machines Corporation Automatic feature learning from a relational database for predictive modelling
CN109697066A (en) * 2018-12-28 2019-04-30 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
CN110119413A (en) * 2019-04-30 2019-08-13 京东城市(南京)科技有限公司 The method and apparatus of data fusion
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles
CN110275903A (en) * 2019-06-28 2019-09-24 第四范式(北京)技术有限公司 Improve the method and system of the feature formation efficiency of machine learning sample

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180314926A1 (en) * 2017-04-28 2018-11-01 Intel Corporation Smart memory handling and data management for machine learning networks
US20190095515A1 (en) * 2017-09-25 2019-03-28 International Business Machines Corporation Automatic feature learning from a relational database for predictive modelling
CN108334638A (en) * 2018-03-20 2018-07-27 桂林电子科技大学 Collaborative Filtering method based on shot and long term Memory Neural Networks and interest migration
CN109697066A (en) * 2018-12-28 2019-04-30 第四范式(北京)技术有限公司 Realize the method and system of tables of data splicing and automatic training machine learning model
CN110119413A (en) * 2019-04-30 2019-08-13 京东城市(南京)科技有限公司 The method and apparatus of data fusion
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles
CN110275903A (en) * 2019-06-28 2019-09-24 第四范式(北京)技术有限公司 Improve the method and system of the feature formation efficiency of machine learning sample

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
周咏: "基于神经网络的异构医疗数据智能适配" [Intelligent adaptation of heterogeneous medical data based on neural networks], 《信息与电脑(理论版)》 [Information & Computer (Theory Edition)] *
李志义等: "基于表示学习的跨模态检索模型与特征抽取研究综述" [A survey of cross-modal retrieval models and feature extraction based on representation learning], 《情报学报》 [Journal of the China Society for Scientific and Technical Information] *
王可等: "基于Pearson关联度BP神经网络的时间序列预测" [Time series prediction based on a Pearson-correlation BP neural network], 《光学精密工程》 [Optics and Precision Engineering] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115137A (en) * 2020-08-17 2020-12-22 贝壳技术有限公司 Data configuration method and device
CN112035582A (en) * 2020-08-28 2020-12-04 光大科技有限公司 Structured data classification method and device, storage medium and electronic device
WO2022089652A1 (en) * 2020-11-02 2022-05-05 第四范式(北京)技术有限公司 Method and system for processing data tables and automatically training machine learning model
CN112347102A (en) * 2020-11-03 2021-02-09 第四范式(北京)技术有限公司 Multi-table splicing method and multi-table splicing device
CN112347102B (en) * 2020-11-03 2024-05-07 第四范式(北京)技术有限公司 Multi-table splicing method and multi-table splicing device
CN112434032A (en) * 2020-11-17 2021-03-02 北京融七牛信息技术有限公司 Automatic feature generation system and method
CN112434032B (en) * 2020-11-17 2024-04-05 北京融七牛信息技术有限公司 Automatic feature generation system and method
CN112927012A (en) * 2021-02-23 2021-06-08 第四范式(北京)技术有限公司 Marketing data processing method and device and marketing model training method and device

Also Published As

Publication number Publication date
CN116303459A (en) 2023-06-23
CN110955659B (en) 2023-02-21

Similar Documents

Publication Publication Date Title
CN110955659B (en) Method and system for processing data table
Bell Machine learning: hands-on for developers and technical professionals
WO2019192261A1 (en) Payment mode recommendation method and device and equipment
James Pattern recognition
WO2023011382A1 (en) Recommendation method, recommendation model training method, and related product
CN109408829B (en) Method, device, equipment and medium for determining readability of article
CN113449187A (en) Product recommendation method, device and equipment based on double portraits and storage medium
CN108664637A (en) A kind of search method and system
CN113761219A (en) Knowledge graph-based retrieval method and device, electronic equipment and storage medium
CN116601626A (en) Personal knowledge graph construction method and device and related equipment
CN110781925A (en) Software page classification method and device, electronic equipment and storage medium
CN113989476A (en) Object identification method and electronic equipment
CN115018588A (en) Product recommendation method and device, electronic equipment and readable storage medium
US20220366138A1 (en) Rule-based machine learning classifier creation and tracking platform for feedback text analysis
CN116883181B (en) Financial service pushing method based on user portrait, storage medium and server
US20240012809A1 (en) Artificial intelligence system for translation-less similarity analysis in multi-language contexts
CN113468421A (en) Product recommendation method, device, equipment and medium based on vector matching technology
CN115203194A (en) Metadata information generation method, related device, equipment and storage medium
CN117251761A (en) Data object classification method and device, storage medium and electronic device
CN113591881B (en) Intention recognition method and device based on model fusion, electronic equipment and medium
US20220172271A1 (en) Method, device and system for recommending information, and storage medium
CN114281990A (en) Document classification method and device, electronic equipment and medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN113434660A (en) Product recommendation method, device, equipment and storage medium based on multi-domain classification
US11941076B1 (en) Intelligent product sequencing for category trees

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant