CN112035582A - Structured data classification method and device, storage medium and electronic device - Google Patents

Structured data classification method and device, storage medium and electronic device

Info

Publication number
CN112035582A
Authority
CN
China
Prior art keywords
structured data
target
linear combination
data
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010888659.0A
Other languages
Chinese (zh)
Inventor
李刚
毛灿
刘尔凯
丁永建
李璠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Everbright Technology Co ltd
Original Assignee
Everbright Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Everbright Technology Co ltd filed Critical Everbright Technology Co ltd
Priority to CN202010888659.0A priority Critical patent/CN112035582A/en
Publication of CN112035582A publication Critical patent/CN112035582A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for classifying structured data, a storage medium and an electronic device. The method comprises the following steps: inputting target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, wherein the target convolutional neural network model is trained through machine learning using multiple groups of data, each group of data comprises structured data and the linear combination features corresponding to that structured data, and the target structured data comprises a plurality of features; determining, according to the linear combination features, a plurality of probability values that the structured data belongs to different categories; and classifying the structured data according to the plurality of probability values. This technical solution solves the problem in the related art that structured data can be processed only through logistic regression and decision tree algorithms.

Description

Structured data classification method and device, storage medium and electronic device
Technical Field
The invention relates to the field of communication, in particular to a structured data classification method and device, a storage medium and an electronic device.
Background
A large amount of sensitive data related to personal privacy is collected in data platforms and information systems for medical treatment, finance, social networks and the like. Because this sensitive data is structured data, a bank system often needs to classify the sensitive structured data recorded by the bank in order to better protect user information.
In the related art, logistic regression is often used to classify structured data in a bank. After missing-value filling, normalization, and manual generation of some features are carried out on the original features of the structured data, a linear combination of the features is input into a sigmoid function. The output value of the sigmoid function can be understood as the probability of belonging to a certain class in a classification problem, and the index with the maximum probability value is the class of the features.
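The logistic regression baseline described above can be illustrated with a minimal sketch. The feature values, weights, and the 0.5 decision threshold below are hypothetical and chosen only so the arithmetic is easy to follow; they are not taken from the patent.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_probability(features, weights, bias):
    """Return P(class = 1) computed from a linear combination of the features."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical, already normalized features and hand-picked weights.
features = [0.5, -1.0]
weights = [2.0, 1.0]
bias = 0.0

p = logistic_regression_probability(features, weights, bias)
label = 1 if p >= 0.5 else 0  # assumed decision threshold of 0.5
```

Here the linear combination is 2.0 * 0.5 + 1.0 * (-1.0) = 0, so the sigmoid returns exactly 0.5.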
In the related art, decision tree algorithms make effective use of combination features such as age and gender. Since the appearance of the xgboost algorithm, its performance has far exceeded that of other algorithms, and it is currently an essential algorithm for various classification problems.
No effective technical solution has yet been proposed for the problem in the related art that structured data can be processed only through logistic regression and decision tree algorithms.
Disclosure of Invention
The embodiment of the invention provides a method and a device for classifying structured data, a storage medium and an electronic device, which are used for at least solving the problems that the structured data can be processed only through a logistic regression method and a decision tree algorithm and the like in the related technology.
According to an embodiment of the present invention, there is provided a method for classifying structured data, including: inputting target structured data into a target convolutional neural network model to obtain linear combination characteristics of the structured data, wherein the target convolutional neural network model is trained by machine learning by using multiple groups of data, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features; determining a plurality of probability values that the structured data belong to different categories according to the linear combination features; and classifying the structured data according to the plurality of probability values.
Optionally, inputting the target structured data into a target convolutional neural network model to obtain a linear combination feature of the structured data, including: performing convolution on the target structured data through the target convolution neural network model to obtain a convolution result; converting the convolution result into a column vector; and performing dimensionality reduction on the column vector to obtain linear combination features of the structured data.
Optionally, converting the convolution result into a column vector includes: acquiring a unit column matrix; and multiplying the convolution result by the unit column matrix to obtain a characteristic column vector corresponding to the characteristic of the convolution result.
Optionally, the performing dimension reduction processing on the column vector to obtain a linear combination feature of the structured data includes: performing full-connection processing on the column vectors to perform dimensionality reduction processing on the column vectors to obtain dimensionality-reduced one-dimensional column vectors; and taking the one-dimensional column vector as a linear combination feature of the structured data.
Optionally, determining a plurality of probability values that the structured data belong to different categories according to the linear combination features comprises: inputting the linear combination features into a target logistic function to determine a plurality of probability values that the structured data belongs to different categories.
Optionally, classifying the structured data according to the plurality of probability values includes: determining a target category corresponding to a maximum probability value in the plurality of probability values; determining the category of the structured data as the target category.
According to an embodiment of the present invention, there is provided a structured data classification apparatus including: a processing module, configured to input target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, where the target convolutional neural network model is trained through machine learning by using multiple sets of data, and each set of data in the multiple sets of data includes: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features; the determining module is used for determining a plurality of probability values of the structured data belonging to different categories according to the linear combination characteristics; and the classification module is used for classifying the structured data according to the plurality of probability values.
Optionally, the processing module is further configured to perform convolution on the target structured data through the target convolutional neural network model to obtain a convolution result; converting the convolution result into a column vector; and performing dimensionality reduction on the column vector to obtain linear combination features of the structured data.
Optionally, the processing module is further configured to obtain a unit column matrix; and multiplying the convolution result by the unit column matrix to obtain a characteristic column vector corresponding to the characteristic of the convolution result.
Optionally, the processing module is further configured to perform full-connection processing on the column vector to perform dimension reduction processing on the column vector to obtain a dimension-reduced one-dimensional column vector; and taking the one-dimensional column vector as a linear combination feature of the structured data.
Optionally, the determining module is further configured to input the linear combination features into a target logistic function to determine a plurality of probability values that the structured data belong to different categories.
Optionally, the classification module is further configured to determine a target category corresponding to a maximum probability value of the multiple probability values; determining the category of the structured data as the target category.
According to another embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, target structured data is input into a target convolutional neural network model to obtain linear combination characteristics of the structured data, wherein the target convolutional neural network model is trained by machine learning by using multiple groups of data, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features; determining a plurality of probability values that the structured data belong to different categories according to the linear combination features; the structured data are classified according to the probability values, namely the extraction of the linear combination features of the structured data can be rapidly completed through the target convolutional neural network model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of a hardware structure of a computer terminal of a method for classifying structured data according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method of classifying structured data according to an embodiment of the present invention;
FIG. 3 is a block diagram of an SCNN network architecture according to an alternative embodiment of the present invention;
FIG. 4 is a block diagram of a structure of a classification apparatus of structured data according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method provided by the embodiment of the application can be executed in a computer terminal or a similar operation device. Taking the example of the operation on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of a structured data classification method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the classification method of structured data in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the above-mentioned method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
An embodiment of the present invention provides a method for classifying structured data, which is applied to the computer terminal, and fig. 2 is a flowchart of the method for classifying structured data according to the embodiment of the present invention, and as shown in fig. 2, the flowchart includes the following steps:
step S202, inputting target structured data into a target convolutional neural network model to obtain linear combination characteristics of the structured data, wherein the target convolutional neural network model is trained by using multiple groups of data through machine learning, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
step S204, determining a plurality of probability values of the structured data belonging to different categories according to the linear combination characteristics;
step S206, classifying the structured data according to the plurality of probability values.
Through the above steps, inputting the target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, where the target convolutional neural network model is trained through machine learning by using multiple sets of data, and each set of data in the multiple sets of data includes: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features; determining a plurality of probability values that the structured data belong to different categories according to the linear combination features; the structured data are classified according to the probability values, namely the extraction of the linear combination features of the structured data can be rapidly completed through the target convolutional neural network model.
It should be noted that, the target convolutional neural network model may add more processing functions to the structured data according to the requirement to implement more detailed structured data processing, which is not limited in the embodiment of the present invention.
In step S202, there are multiple implementation manners for processing the structured data, and optionally, the target structured data is convolved by the target convolutional neural network model to obtain a convolution result; converting the convolution result into a column vector; and performing dimensionality reduction on the column vector to obtain linear combination features of the structured data.
That is to say, once the target convolutional neural network model has been trained through machine learning, it can be used directly to perform convolution processing on the target structured data. To make the convolution result more accurate, the model may perform convolution processing on the target structured data one or more times. After the convolution result is obtained, it is converted into a column vector and subjected to dimension reduction processing to facilitate subsequent function processing, so that the linear combination features of the structured data can be obtained directly from the target convolutional neural network model and the target structured data.
Optionally, converting the convolution result into a column vector includes: acquiring a unit column matrix; and multiplying the convolution result by the unit column matrix to obtain a feature column vector corresponding to the features of the convolution result. In short, to convert a convolution result that exists as a matrix into a column vector, a unit column matrix with the same order as the matrix of the convolution result is obtained, and the matrix of the convolution result is converted into the column vector by multiplication.
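The multiplication step can be sketched as follows. The text does not define the "unit column matrix" precisely; this sketch assumes it is a column of ones whose length equals the number of columns of the convolution result, so that each entry of the product is the sum of one row. Both that reading and the example matrix are assumptions for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical 2 x 3 convolution result.
conv_result = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]

# Assumed interpretation: a 3 x 1 column of ones.
unit_column = [[1.0] for _ in conv_result[0]]

# The 2 x 1 product is a column vector: [[6.0], [15.0]].
column_vector = matmul(conv_result, unit_column)
```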
Optionally, the performing dimension reduction processing on the column vector to obtain a linear combination feature of the structured data includes: performing full-connection processing on the column vectors to perform dimensionality reduction processing on the column vectors to obtain dimensionality-reduced one-dimensional column vectors; and taking the one-dimensional column vector as a linear combination feature of the structured data.
That is to say, the convolution result that has been converted into a column vector contains a plurality of linear features. To enable the target convolutional neural network model to output the linear combination features of the structured data, the column vector is subjected to full-connection processing, which reduces its dimensionality and combines the linear features; once the column vector has been reduced to a one-dimensional column vector, the linear combination features of the structured data are obtained.
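A minimal sketch of dimension reduction through fully connected layers is given below. The layer sizes (4 to 2 to 1), the weights, and the absence of an activation function are assumptions made so that the result can be checked by hand; with no activation in between, the final value is a linear combination of the input entries.

```python
def fully_connected(vector, weights, biases):
    """Dense layer: output[i] = sum_j weights[i][j] * vector[j] + biases[i]."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


# Hypothetical column vector obtained from the convolution result.
column_vector = [1.0, 2.0, 3.0, 4.0]

# First layer reduces 4 dimensions to 2 (hand-picked weights).
w_hidden = [[0.5, 0.0, 0.5, 0.0],
            [0.0, 1.0, 0.0, -1.0]]
b_hidden = [0.0, 0.0]

# Second layer reduces 2 dimensions to 1.
w_out = [[1.0, 1.0]]
b_out = [0.5]

hidden = fully_connected(column_vector, w_hidden, b_hidden)   # [2.0, -2.0]
linear_combination = fully_connected(hidden, w_out, b_out)    # [0.5]
```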
Optionally, determining a plurality of probability values that the structured data belong to different categories according to the linear combination features comprises: inputting the linear combination features into a target logistic function to determine a plurality of probability values that the structured data belongs to different categories.
That is to say, the linear combination features of the structured data output by the target convolutional neural network model are sent to a target logistic function that can process them to obtain the corresponding probability values, so that the probability values corresponding to each piece of structured data can be determined.
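As an illustration of turning scores into per-category probabilities, the sketch below uses a softmax function and shows that, for two categories, it reduces to the sigmoid of a single linear feature. Choosing softmax as the "target logistic function" is an assumption of this sketch; the function and variable names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function for the two-category case."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


z = 2.0                                  # hypothetical one-dimensional linear feature
two_class = softmax([0.0, z])            # [P(category 0), P(category 1)]
multi_class = softmax([1.0, 2.0, 3.0])   # hypothetical scores for three categories
```

In the two-category case, `two_class[1]` equals `sigmoid(z)` up to floating-point rounding.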
Optionally, classifying the structured data according to the plurality of probability values includes: determining a target category corresponding to a maximum probability value in the plurality of probability values; determining the category of the structured data as the target category.
That is, the structured data may be classified according to its corresponding probability values. The target category may be the level of detail of the structured data, the importance level of the structured data, or any other classification criterion defined according to other requirements.
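Selecting the category with the maximum probability value can be sketched in a few lines. The category names and probability values are hypothetical examples, not values from the patent.

```python
def classify(probabilities, categories):
    """Return the category whose probability value is the largest."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return categories[best_index]


# Hypothetical categories and probability values.
categories = ["low sensitivity", "medium sensitivity", "high sensitivity"]
probabilities = [0.1, 0.2, 0.7]

target_category = classify(probabilities, categories)  # "high sensitivity"
```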
In order to better understand the classification flow of the structured data, the following description is provided with reference to an alternative embodiment, but is not intended to limit the technical solutions of the embodiments of the present invention.
Data generated in the banking industry is often stored in a database in the form of data tables. Data with a table structure is usually modeled with a logistic regression model or a decision tree model, and a neural network is rarely adopted. Convolutional neural networks are widely used in image data processing, such as face recognition and crowd density estimation, but are rarely used on tabular data. In theory, however, a neural network can approximate any continuous function and can therefore also fit tabular data well. If a good network structure can be found, a neural network should be able to outperform the existing methods.
An optional embodiment of the present invention provides an SCNN network structure (a neural network, referred to as SCNN for short) that can process the structured data in a data table to obtain the linear combination features of that structured data. FIG. 3 shows the network architecture used in processing bank structured data.
Optionally, processing with the network structure SCNN according to the optional embodiment of the present invention includes the following steps:
Step S1: 32 one-dimensional convolutions with a step length of 2 are used. To keep the data consistent and identically distributed during convolution processing in the network structure SCNN, batch normalization is performed during convolution processing to obtain a convolution result.
It should be noted that 32 one-dimensional convolutions with a step length of 2 is a preferred configuration; any other number of one-dimensional convolutions with any step length may also be used. A one-dimensional convolution with a step length of 2 combines two adjacent features to extract a feature after a linear operation, and the 32 convolution operations are equivalent to fusing 32 models, which greatly improves the processing efficiency of structured data.
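The way a one-dimensional convolution with a step length of 2 combines adjacent features can be sketched as follows. The kernel size of 2, the three hand-picked kernels (standing in for the 32 learned ones), and the feature values are assumptions for illustration.

```python
def conv1d(features, kernel, bias=0.0, stride=2):
    """1-D convolution: each output is a linear combination of adjacent features."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, features[i:i + k])) + bias
            for i in range(0, len(features) - k + 1, stride)]


# Hypothetical row of six normalized features from a data table.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Three hand-picked kernels of assumed size 2; the embodiment uses 32 learned ones.
kernels = [[1.0, 1.0], [1.0, -1.0], [0.5, 0.5]]

# One feature map per kernel; with stride 2, each output combines one adjacent pair.
feature_maps = [conv1d(features, kernel) for kernel in kernels]
```

With these kernels the first map sums each adjacent pair, the second takes their difference, and the third averages them, which is the kind of linear combination that would otherwise be hand-crafted.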
Step S2: to improve the accuracy of the convolution result, the convolution and normalization are repeated twice, after which dropout with a rate of 0.2 is applied to the obtained convolution result to prevent overfitting.
By the formula: g ═ Ab(n-2), wherein G represents the number of convolution results, a represents the number of convolutions to be performed, and n represents the step size of one convolution, and it can be known that, after convolution normalization processing is performed after two times of processing, 1024 convolution results can be obtained after 32 convolutions are subjected to convolution and normalization twice.
Step S3: the 1024 convolution results are flattened into a column vector, which is reduced to 128 dimensions through a fully connected layer and finally reduced to 1 dimension through another fully connected layer.
Optionally, in order to convert the convolution result existing in the matrix into a column vector, a unit column matrix having the same order as the matrix of the convolution result is obtained, and the matrix of the convolution result is converted into the column vector by multiplication.
Furthermore, the 1-dimensional column vector is input into a sigmoid function or a softmax function for operation and transformation, and then a classification probability result can be obtained.
Through the optional embodiment of the invention, the problem in the related art that structured data can be processed only through logistic regression and decision tree algorithms is solved. The convolutional neural network structure provided by the embodiment of the invention can extract linear combination features from the existing features of structured data, so that manual feature extraction is not needed and the dependence on feature engineering is reduced. Furthermore, the model can iteratively optimize itself through a gradient descent algorithm, which improves the processing efficiency of structured data and solves the problem that decision trees, represented by xgboost, cannot process linear combination features. On a certain data set with the same features, the AUC of the SCNN method provided by the optional embodiment of the invention is about 10 percentage points higher than that of logistic regression.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a structured data classification device is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and descriptions that have already been given are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a structure of a structured data classification apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus including:
(1) a processing module 42, configured to input target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, where the target convolutional neural network model is trained through machine learning by using multiple sets of data, and each set of data in the multiple sets of data includes: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
(2) a determining module 44, configured to determine, according to the linear combination feature, a plurality of probability values that the structured data belong to different categories;
(3) a classification module 46 configured to classify the structured data according to the plurality of probability values.
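The division into three modules can be sketched as three pluggable callables wired together in sequence. This is only an illustration under stated assumptions: the class name `StructuredDataClassifier`, its field names, and the placeholder lambdas are hypothetical and do not come from the patent, and the stand-in callables do not implement a real convolutional model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StructuredDataClassifier:
    """Hypothetical wiring of modules 42, 44 and 46 (names are assumptions)."""
    processing: Callable[[Sequence[float]], List[float]]  # module 42: record -> linear combination features
    determining: Callable[[List[float]], List[float]]     # module 44: features -> per-category values
    classification: Callable[[List[float]], int]          # module 46: values -> category index

    def classify(self, record: Sequence[float]) -> int:
        features = self.processing(record)
        probabilities = self.determining(features)
        return self.classification(probabilities)


# Placeholder callables standing in for the trained model and the logistic function.
apparatus = StructuredDataClassifier(
    processing=lambda record: [sum(record), -sum(record)],
    determining=lambda scores: [1.0 if s == max(scores) else 0.0 for s in scores],
    classification=lambda probs: probs.index(max(probs)),
)
label = apparatus.classify([0.2, 0.3])
```

Keeping each module behind a plain callable mirrors the statement that the modules may be realized in software, hardware, or a mix, since any of the three can be swapped without touching the others.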
With this device, target structured data are input into a target convolutional neural network model to obtain linear combination features of the structured data, where the model is trained through machine learning using multiple sets of data, each set including structured data and the linear combination features corresponding to that structured data, and where the target structured data comprise a plurality of features. A plurality of probability values that the structured data belong to different categories are then determined according to the linear combination features, and the structured data are classified according to those probability values. In other words, the target convolutional neural network model allows the linear combination features of the structured data to be extracted quickly.
It should be noted that further processing functions may be added to the target convolutional neural network model as required, so that the structured data can be processed in finer detail; the embodiments of the present invention do not limit this.
Optionally, the processing module 42 is further configured to perform convolution on the target structured data through the target convolutional neural network model to obtain a convolution result, convert the convolution result into a column vector, and perform dimensionality reduction on the column vector to obtain the linear combination features of the structured data.
That is to say, once the target convolutional neural network model has been trained through machine learning, it can be used to convolve the target structured data directly. To make the convolution result more accurate, the model may convolve the target structured data at least once. After the convolution result is obtained, it is converted into a column vector and subjected to dimensionality reduction to facilitate subsequent function processing. The linear combination features of the structured data can thus be obtained directly from the target convolutional neural network model and the target structured data.
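The convolution step can be illustrated with a small one-dimensional example. This is a minimal sketch, assuming a record is a flat list of numeric features and that the kernels are already known; the function name `conv1d_valid`, the record, and the kernel values are all hypothetical, and the patent does not specify kernel sizes, padding, or activation functions.

```python
from typing import List, Sequence


def conv1d_valid(features: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide `kernel` over `features` without padding.

    Like most CNN libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    if k == 0 or k > len(features):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [
        sum(features[i + j] * kernel[j] for j in range(k))
        for i in range(len(features) - k + 1)
    ]


# Hypothetical record with five numeric features and two hypothetical kernels.
record = [1.0, 2.0, 3.0, 4.0, 5.0]
first_pass = conv1d_valid(record, [1.0, 0.0, -1.0])   # first convolution
second_pass = conv1d_valid(first_pass, [0.5, 0.5])    # convolution applied a second time
```

Applying the function twice corresponds to the remark that the data may be convolved at least once; in a trained model the kernel values would be learned rather than fixed by hand.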
Optionally, the processing module 42 is further configured to obtain a unit column matrix, and to multiply the convolution result by the unit column matrix to obtain a feature column vector corresponding to the features of the convolution result.
In short, to convert a convolution result that exists in matrix form into a column vector, a unit column matrix of the same order as the convolution result matrix is obtained, and the convolution result matrix is multiplied by it to yield the column vector.
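One possible reading of this step is shown below. The sketch assumes that the "unit column matrix" is a column of ones whose length equals the number of columns of the convolution result, so that the product is a column vector of row sums; the patent text does not define the term precisely, and the helper `matmul` and the sample values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (m x n) and an (n x p) matrix."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical 2 x 3 convolution result.
conv_result = [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]]
# Assumed interpretation: a 3 x 1 column of ones.
unit_column = [[1.0] for _ in range(len(conv_result[0]))]
column_vector = matmul(conv_result, unit_column)
```

Under this assumption each entry of `column_vector` aggregates one row of the convolution result, which is why the output has one row per feature map row and a single column.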
Optionally, the processing module 42 is further configured to perform full-connection processing on the column vector so as to reduce its dimensionality and obtain a dimension-reduced one-dimensional column vector, and to take the one-dimensional column vector as the linear combination feature of the structured data.
That is to say, the convolution result, once converted into a column vector, contains a plurality of linear features. So that the target convolutional neural network model outputs linear combination features of the structured data, the column vector undergoes full-connection processing and dimensionality reduction, which combines the coexisting linear features. The linear combination features of the structured data are obtained once the column vector has been reduced to a one-dimensional column vector.
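A fully connected layer reduces dimensionality by forming weighted sums of the input entries, which is why its output is a linear combination of the input features. The sketch below assumes a plain affine layer with no activation; the function name `fully_connected`, the weights, and the biases are hypothetical, and in practice they would be learned during training.

```python
from typing import List, Sequence


def fully_connected(
    vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Return weights @ vector + bias; one output entry per weight row."""
    if len(weights) != len(bias):
        raise ValueError("need one bias per output entry")
    if any(len(row) != len(vector) for row in weights):
        raise ValueError("each weight row must match the input length")
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical 4-entry column vector reduced to 2 entries.
column_vector = [2.0, 4.0, 3.0, 1.0]
weights = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 1.0, -1.0]]
bias = [0.0, -1.0]
linear_combination = fully_connected(column_vector, weights, bias)
```

Here the output length (two) is chosen arbitrarily for illustration; a natural choice in a classifier would be one entry per category.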
Optionally, the determining module 44 is further configured to input the linear combination feature into a target logistic function to determine a plurality of probability values that the structured data belong to different categories.
That is to say, the linear combination feature of the structured data output by the target convolutional neural network model is passed to a target logistic function that maps it to the corresponding probability values, so that the probability values for each item of structured data can be determined.
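For more than two categories, a common choice of logistic function is the softmax (multinomial logistic) function. The patent does not name the exact function, so treating it as softmax is an assumption made for this sketch, and the input scores are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map real-valued scores to probabilities that sum to 1."""
    if not scores:
        raise ValueError("scores must be non-empty")
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical linear combination feature with one score per category.
probabilities = softmax([3.0, 1.0, 0.0])
```

Larger scores receive larger probabilities, and equal scores receive equal probabilities, so the ordering of the linear combination feature is preserved.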
Optionally, the classification module 46 is further configured to determine the target category corresponding to the maximum probability value among the plurality of probability values, and to determine the category of the structured data as the target category.
That is, the structured data may be classified according to their probability values. The target category may reflect the level of detail of the structured data, the importance level of the structured data, or any other classification criterion defined according to other requirements.
It should be noted that the above modules may be implemented by software or hardware. For the latter, the implementation may be, but is not limited to, the following: the modules are all located in the same processor; or the modules are located in different processors in any combination.
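Selecting the target category amounts to taking the index of the largest probability. A minimal sketch follows; the function name `pick_category` and the category labels (importance levels) are hypothetical examples, not values from the patent.

```python
from typing import Sequence


def pick_category(probabilities: Sequence[float], categories: Sequence[str]) -> str:
    """Return the category whose probability value is largest."""
    if not probabilities or len(probabilities) != len(categories):
        raise ValueError("need one probability per category")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return categories[best]


# Hypothetical importance levels used as categories.
levels = ["low", "medium", "high"]
target = pick_category([0.1, 0.7, 0.2], levels)
```

If two categories tie for the maximum, this sketch returns the first one; the patent does not say how ties should be handled.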
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
In an exemplary embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, inputting the target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, wherein the target convolutional neural network model is trained by machine learning by using multiple groups of data, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
S2, determining a plurality of probability values of the structured data belonging to different categories according to the linear combination features;
S3, classifying the structured data according to the plurality of probability values.
An embodiment of the present invention further provides a storage medium including a stored program, wherein the program executes any one of the methods described above.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, inputting the target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, wherein the target convolutional neural network model is trained by machine learning by using multiple groups of data, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
S2, determining a plurality of probability values of the structured data belonging to different categories according to the linear combination features;
S3, classifying the structured data according to the plurality of probability values.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, for specific examples of this embodiment, refer to the examples described in the above embodiments and optional implementations; they are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; the modules or steps may also be fabricated as separate integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for classifying structured data, comprising:
inputting target structured data into a target convolutional neural network model to obtain linear combination characteristics of the structured data, wherein the target convolutional neural network model is trained by machine learning by using multiple groups of data, and each group of data in the multiple groups of data comprises: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
determining a plurality of probability values that the structured data belong to different categories according to the linear combination features;
and classifying the structured data according to the plurality of probability values.
2. The method of claim 1, wherein inputting the target structured data into a target convolutional neural network model to obtain linear combination features of the structured data comprises:
performing convolution on the target structured data through the target convolutional neural network model to obtain a convolution result;
converting the convolution result into a column vector;
and performing dimensionality reduction on the column vector to obtain linear combination features of the structured data.
3. The method of claim 2, wherein converting the convolution result into a column vector comprises:
acquiring a unit column matrix;
and multiplying the convolution result by the unit column matrix to obtain a characteristic column vector corresponding to the characteristic of the convolution result.
4. The method of claim 2, wherein performing dimension reduction on the column vector to obtain linear combination features of the structured data comprises:
performing full-connection processing on the column vectors to perform dimensionality reduction processing on the column vectors to obtain dimensionality-reduced one-dimensional column vectors;
and taking the one-dimensional column vector as a linear combination feature of the structured data.
5. The method of claim 1, wherein determining a plurality of probability values that the structured data belongs to different categories based on the linear combination features comprises:
inputting the linear combination features into an objective logistic function to determine a plurality of probability values that the structured data belongs to different categories.
6. The method of claim 1, wherein classifying the structured data according to the plurality of probability values comprises:
determining a target category corresponding to a maximum probability value in the plurality of probability values;
determining the category of the structured data as the target category.
7. An apparatus for classifying structured data, comprising:
a processing module, configured to input target structured data into a target convolutional neural network model to obtain linear combination features of the structured data, where the target convolutional neural network model is trained through machine learning by using multiple sets of data, and each set of data in the multiple sets of data includes: structured data and linear combination features corresponding to the structured data, wherein the target structured data comprises: a plurality of features;
a determining module, configured to determine a plurality of probability values that the structured data belong to different categories according to the linear combination features;
and a classification module, configured to classify the structured data according to the plurality of probability values.
8. The apparatus of claim 7, wherein the processing module is further configured to convolve the target structured data with the target convolutional neural network model to obtain a convolution result, convert the convolution result into a column vector, and perform a dimension reduction process on the column vector to obtain a linear combination feature of the structured data.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
CN202010888659.0A 2020-08-28 2020-08-28 Structured data classification method and device, storage medium and electronic device Pending CN112035582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010888659.0A CN112035582A (en) 2020-08-28 2020-08-28 Structured data classification method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN112035582A true CN112035582A (en) 2020-12-04

Family

ID=73586925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010888659.0A Pending CN112035582A (en) 2020-08-28 2020-08-28 Structured data classification method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112035582A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256691A (en) * 2018-02-08 2018-07-06 成都智宝大数据科技有限公司 Refund Probabilistic Prediction Model construction method and device
CN108614548A (en) * 2018-04-03 2018-10-02 北京理工大学 A kind of intelligent failure diagnosis method based on multi-modal fusion deep learning
US20180300608A1 (en) * 2017-04-12 2018-10-18 Yodlee, Inc. Neural Networks for Information Extraction From Transaction Data
CN108764314A (en) * 2018-05-17 2018-11-06 北京邮电大学 A kind of structural data sorting technique, device, electronic equipment and storage medium
CN109033169A (en) * 2018-06-21 2018-12-18 东南大学 Mobile traffic classification method based on multistage weight conversion and convolutional neural networks
CN109448855A (en) * 2018-09-17 2019-03-08 大连大学 A kind of diabetes glucose prediction technique based on CNN and Model Fusion
CN109816140A (en) * 2018-12-12 2019-05-28 哈尔滨工业大学(深圳) Forecasting of Stock Prices method, apparatus, equipment and the storage medium influenced based on cross-market
CN110427063A (en) * 2019-08-13 2019-11-08 深圳市睿海智电子科技有限公司 A kind of tomato growth monitoring management platform based on Internet of Things
CN110671092A (en) * 2019-09-26 2020-01-10 北京博达瑞恒科技有限公司 Oil gas productivity detection method and system
CN110955659A (en) * 2019-11-28 2020-04-03 第四范式(北京)技术有限公司 Method and system for processing data table

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘忠雨: "《深入浅出图神经网络GNN原理解析》", vol. 1, 30 April 2020, 机械工程出版社, pages: 29 - 30 *
刘鹏、孙元强等: "《人工智能应用技术基础》", vol. 1, 31 March 2020, 西安电子科技大学出版社, pages: 55 *
刘黎志、彭贝: "Spark 集群中还贷问题的逻辑回归模型研究", 《武汉工程大学学报》, vol. 42, no. 1, pages 113 - 118113 *

Similar Documents

Publication Publication Date Title
CN110378305B (en) Tea disease identification method, equipment, storage medium and device
CN113435509B (en) Small sample scene classification and identification method and system based on meta-learning
CN112862092B (en) Training method, device, equipment and medium for heterogeneous graph convolution network
CN109918498B (en) Problem warehousing method and device
CN112307762A (en) Search result sorting method and device, storage medium and electronic device
CN112785441B (en) Data processing method, device, terminal equipment and storage medium
CN110457704A (en) Determination method, apparatus, storage medium and the electronic device of aiming field
CN111767419B (en) Picture searching method, device, equipment and computer readable storage medium
CN111783830A (en) Retina classification method and device based on OCT, computer equipment and storage medium
CN114358252A (en) Operation execution method and device in target neural network model and storage medium
CN111191065A (en) Homologous image determining method and device
CN112035582A (en) Structured data classification method and device, storage medium and electronic device
CN115905702A (en) Data recommendation method and system based on user demand analysis
CN113868543B (en) Method for sorting recommended objects, method and device for model training and electronic equipment
CN114443843A (en) Industrial safety event type identification method, device, equipment and storage medium
CN115577765A (en) Network model pruning method, electronic device and storage medium
CN107784363B (en) Data processing method, device and system
CN111723872B (en) Pedestrian attribute identification method and device, storage medium and electronic device
CN114461619A (en) Energy internet multi-source data fusion method and device, terminal and storage medium
CN114357219A (en) Mobile-end-oriented instance-level image retrieval method and device
CN113807370A (en) Data processing method, device, equipment, storage medium and computer program product
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN113590720A (en) Data classification method and device, computer equipment and storage medium
CN117807237B (en) Paper classification method, device, equipment and medium based on multivariate data fusion
CN110460399A (en) Waveform image processing method, recognition processor, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination