CN110309127B - Data processing method and device and electronic equipment - Google Patents

Data processing method and device and electronic equipment

Info

Publication number
CN110309127B
Authority
CN
China
Prior art keywords
data
processing
target
processing condition
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910596822.3A
Other languages
Chinese (zh)
Other versions
CN110309127A (en
Inventor
高鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201910596822.3A priority Critical patent/CN110309127B/en
Publication of CN110309127A publication Critical patent/CN110309127A/en
Application granted granted Critical
Publication of CN110309127B publication Critical patent/CN110309127B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method, a data processing device and electronic equipment, wherein the method comprises the following steps: obtaining data characteristics of target data to be processed on at least one data dimension; obtaining object processing conditions corresponding to target data; determining a target object for the target data based on the data features and the object processing conditions; and processing the target data by the target object so that the processing efficiency of the target data meets the object processing condition. According to the method and the device, multi-dimensional analysis can be performed on the target data, so that the target object with the processing efficiency meeting the object processing condition is obtained to process the target data, and the most appropriate migration scheme can be selected for the target data.

Description

Data processing method and device and electronic equipment
Technical Field
The present application relates to the field of data table migration technologies, and in particular, to a data processing method and apparatus, and an electronic device.
Background
With the advent of the big data era, the data volume generated by users is larger and larger, and how to extract and migrate the data becomes a problem which needs to be solved urgently.
At present, there are many technical means for data extraction and migration, such as Sqoop, Talend, and Kettle. When selecting a migration scheme for a data table to be migrated, a suitable scheme usually has to be chosen manually, for example according to the data volume alone. Data tables differ in more than data volume, however, so the selected scheme may not be the most suitable migration scheme.
Therefore, it is highly desirable to improve the selection accuracy of the migration scheme.
Disclosure of Invention
In view of this, the present application provides the following technical solutions:
a method of data processing, comprising:
obtaining data characteristics of target data to be processed on at least one data dimension;
obtaining object processing conditions corresponding to the target data;
determining a target object for the target data based on the data features and the object processing conditions;
and processing the target data by the target object so that the processing efficiency of the target data meets the object processing condition.
Preferably, determining a target object for the target data based on the data feature and the object processing condition includes:
obtaining a classification model corresponding to the object processing condition, wherein the classification model is trained by utilizing a plurality of samples with preset object labels;
inputting the data features into the classification model to output a classification result;
and determining a target object matched with the object processing condition in at least one processing object corresponding to the classification result.
Preferably, the training the classification model by using a plurality of samples with preset object labels includes:
obtaining at least one data sample, wherein the data sample has data characteristics in at least one data dimension, the data sample has a preset object label, and the object label represents that the efficiency of processing the data sample by a processing object corresponding to the object label meets a corresponding object processing condition;
and training a classification model based on a decision tree algorithm based on the data characteristics of the data samples and the object labels thereof.
Preferably, the data dimension includes:
a data table dimension of the target data, wherein the data table dimension comprises one or more of: number of rows, number of columns, data type, and data table source.
Preferably, the processing efficiency of the target data satisfies the object processing condition, and includes:
the processing efficiency of the target data is higher than a target processing efficiency value in the object processing condition.
A data processing apparatus comprising:
an obtaining unit, configured to obtain data characteristics of target data to be processed on at least one data dimension, and to obtain object processing conditions corresponding to the target data;
a determination unit configured to determine a target object for the target data based on the data feature and the object processing condition;
and the processing unit is used for processing the target data by the target object so that the processing efficiency of the target data meets the object processing condition.
An electronic device, comprising:
the memory is used for storing an application program and data generated by the running of the application program;
a processor for executing the application to perform the functions of: obtaining data characteristics of target data to be processed on at least one data dimension; obtaining object processing conditions corresponding to the target data; determining a target object for the target data based on the data features and the object processing conditions; and processing the target data by the target object so that the processing efficiency of the target data meets the object processing condition.
As can be seen from the foregoing technical solutions, an embodiment of the present application provides a data processing method, which determines a target object for target data by obtaining data features of the target data to be processed in at least one data dimension and corresponding object processing conditions, and processes the target data with the target object to achieve that processing efficiency of the target data meets the object processing conditions. Therefore, the target data can be subjected to multi-dimensional analysis, so that the target object with the processing efficiency meeting the object processing condition is obtained to process the target data, and the most appropriate migration scheme can be selected for the target data.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a system architecture diagram of a server cluster disclosed in an embodiment of the present application;
FIG. 2 is a block diagram of a hardware structure of an electronic device disclosed in an embodiment of the present application;
FIG. 3 is a flowchart of a data processing method disclosed in the first embodiment of the present application;
FIG. 4 is a schematic diagram of a data table disclosed in an embodiment of the present application;
FIG. 5 is a flowchart of a data processing method disclosed in the second embodiment of the present application;
FIG. 6 is a flowchart of a data processing method disclosed in the third embodiment of the present application;
FIG. 7 is a schematic diagram of a decision tree classification model disclosed in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing apparatus disclosed in an embodiment of the present application;
FIG. 9 is a flowchart of a data processing method disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The data processing method provided by the embodiment of the application can be applied to a server cluster of cloud computing. Fig. 1 is a system architecture diagram of a server cluster according to an embodiment of the present invention, and referring to fig. 1, data migration may occur between servers, such as data migration from a server 1 to a server 2, or may occur inside the servers, such as data migration from a disk 1 (not shown in fig. 1) to a disk 2 (not shown in fig. 1).
It should be noted that the above description is only one application scenario of data migration, and it should be understood that other electronic devices related to data migration, which are not listed, are within the scope of the embodiments of the present application.
Fig. 2 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure, and referring to fig. 2, the hardware structure of the electronic device may include: a memory 11, a processor 12, a communication interface 13, and a communication bus 14;
in the embodiment of the present application, the number of the memory 11, the processor 12, the communication interface 13 and the communication bus 14 is at least one, and the memory 11, the processor 12 and the communication interface 13 are communicated with each other through the communication bus 14.
The memory 11 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory) or the like, such as at least one disk memory; the memory stores application programs and data generated by the application programs.
The processor 12 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention, or the like; the processor 12 is configured to execute an application program to implement the functions of:
obtaining data characteristics of target data to be processed on at least one data dimension; obtaining object processing conditions corresponding to target data; determining a target object for the target data based on the data features and the object processing conditions; and processing the target data by the target object so that the processing efficiency of the target data meets the object processing condition.
The detailed and extended functions of the above application program are described below.
In a first embodiment of the data processing method disclosed in the present application, as shown in fig. 3, the method includes the following steps:
Step S101: Obtain data characteristics of target data to be processed on at least one data dimension.
In the embodiment of the application, the target data to be processed can be stored in the form of a data table, a document and the like. The data dimension of the target data is different for different storage forms. For example, for a data table, its data dimensions may include the number of rows, columns, data type, and data table source; as another example, for a document, its data dimensions may include the number of characters, the number of paragraphs, the data type, and the source of the document.
For convenience of understanding, the embodiments of the present application take a data table as an example to illustrate the data characteristics:
A data table dimension of the target data may be obtained, the data table dimension including one or more of the number of rows, the number of columns, the data type, and the data table source. FIG. 4 is a schematic diagram of a data table provided in an embodiment of the present application. Referring to FIG. 4, the data table has 20 rows and 13 columns, its data type is "numeric", and its data table source is "local disk C".
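A minimal sketch of how these data-table features might be collected, assuming the table is held in memory as a list of rows. The names `TableFeatures` and `extract_features`, and the rule used to decide the data type, are hypothetical and not part of the disclosed method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TableFeatures:
    """Data features of one data table (hypothetical container)."""
    rows: int
    columns: int
    data_type: str  # "numeric" or "text" in this sketch
    source: str     # e.g. "local disk C"


def extract_features(table: Sequence[Sequence[object]], source: str) -> TableFeatures:
    """Collect row count, column count, data type, and source of a table."""
    rows = len(table)
    columns = len(table[0]) if rows else 0
    # Assumption: a table counts as "numeric" only if every cell is a number.
    numeric = rows > 0 and all(
        isinstance(cell, (int, float)) and not isinstance(cell, bool)
        for row in table
        for cell in row
    )
    return TableFeatures(rows, columns, "numeric" if numeric else "text", source)


# A 20 x 13 numeric table, matching the example values given for FIG. 4.
example = [[r * 13 + c for c in range(13)] for r in range(20)]
features = extract_features(example, source="local disk C")
print(features)
```

The resulting feature record is what the later steps would consume when choosing a migration scheme.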
Step S102: Obtain object processing conditions corresponding to the target data.
In this embodiment, object processing conditions may be set in advance for different data. An object processing condition may focus on efficiency, on accuracy, on reliability, and so forth. The object processing condition corresponding to the target data is then determined from the preset correspondence between data and object processing conditions.
For ease of understanding, different types of object processing conditions are described below:
1) An efficiency condition, that is, a condition set on processing efficiency, which indicates the amount of data migrated per unit time. The larger the amount of data migrated per unit time, the higher the processing efficiency.
2) An accuracy condition, that is, a condition set on accuracy, which represents the proportion of data that survives migration without information loss relative to the data before migration. The greater this proportion, the higher the accuracy.
3) A reliability condition, that is, a condition set on reliability, which indicates the probability of completing data migration. A probability of 0 indicates that data migration cannot be completed, and a probability of 1 indicates that it is certain to be completed; the closer the probability is to 1, the more likely the data migration is to be completed and the higher the reliability.
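The three quantities can be written as simple ratios. The sketch below is illustrative only: the units (bytes, seconds, records, past runs) and the function names are assumptions, since the source defines the quantities but not how they are measured.

```python
def processing_efficiency(bytes_migrated: int, seconds: float) -> float:
    """Amount of data migrated per unit time (bytes per second here)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_migrated / seconds


def accuracy(records_intact: int, records_before: int) -> float:
    """Share of pre-migration records that arrived without information loss."""
    if records_before <= 0:
        raise ValueError("records_before must be positive")
    return records_intact / records_before


def reliability(completed_runs: int, total_runs: int) -> float:
    """Estimated probability of completing a migration, from past runs."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return completed_runs / total_runs


print(processing_efficiency(1000, 4.0))  # bytes per second
print(accuracy(99, 100))
print(reliability(5, 5))
```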
It should be noted that the above description is only an example of the object processing conditions, and it is understood that other types of object processing conditions not listed are also within the protection scope of the embodiments of the present application.
Step S103: based on the data features and the object processing conditions, a target object for the target data is determined.
In the embodiment of the application, for each object processing condition, the objects corresponding to different data characteristics under that condition can be preset, such that processing data with the corresponding object satisfies the condition. The target object is then determined by looking up, under the object processing condition corresponding to the target data, the object that corresponds to the data characteristics of the target data.
For convenience of understanding, the following description continues with an example of an efficiency condition and a data table as follows:
in order to realize that the processing efficiency of the data meets the efficiency condition, objects corresponding to different data characteristics can be preset. Taking the data characteristics of the data table including the number of rows and columns as an example: if the row number is in the first range and the column number is in the second range, the corresponding object is JDBC; if the number of rows is in the third range and the number of columns is in the fourth range, the corresponding object is Sqoop.
It should be noted that the first range and the second range may be the same or different, and the third range and the fourth range may be the same or different, which is not limited in this embodiment and may be set according to actual needs.
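The preset correspondence described above can be sketched as a small lookup. The four ranges below are hypothetical placeholders, since the source leaves the actual bounds to be set according to need; only the JDBC and Sqoop outcomes come from the text.

```python
from typing import Optional

# Hypothetical bounds; the source does not specify them.
FIRST_RANGE = range(0, 10_000)            # row counts mapped to JDBC
SECOND_RANGE = range(0, 50)               # column counts mapped to JDBC
THIRD_RANGE = range(10_000, 10_000_000)   # row counts mapped to Sqoop
FOURTH_RANGE = range(0, 500)              # column counts mapped to Sqoop


def select_object(rows: int, columns: int) -> Optional[str]:
    """Return the preset object for a (rows, columns) pair, or None if no rule applies."""
    if rows in FIRST_RANGE and columns in SECOND_RANGE:
        return "JDBC"
    if rows in THIRD_RANGE and columns in FOURTH_RANGE:
        return "Sqoop"
    return None


print(select_object(20, 13))
print(select_object(1_000_000, 100))
```

Returning `None` when no rule matches is a choice made for this sketch; the source does not say what happens for feature values outside the preset ranges.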
Step S104: Process the target data with the target object so that the processing efficiency of the target data meets the object processing condition.
In the embodiment of the present application, the target data is migrated by the target object determined in step S103. Since the target object is determined based on the data feature and the object processing condition, the target data can satisfy the object processing condition when the target data is migrated with the target object.
For ease of understanding, the following continues with reference to different types of object processing conditions:
1) An efficiency condition, which includes a limit requirement on the processing efficiency of the data, such as the processing efficiency being higher than a specified target processing efficiency value. In this case, when the target data is migrated with the target object determined in step S103, the processing efficiency of the target data is higher than the target processing efficiency value in the efficiency condition.
2) An accuracy condition, which includes a limit requirement on the accuracy of the data, such as the accuracy being higher than a specified target accuracy. In this case, when the target data is migrated with the target object determined in step S103, the accuracy of the target data is higher than the target accuracy in the accuracy condition.
3) A reliability condition, which includes a limit requirement on the reliability of the data, such as the reliability being higher than a specified target reliability. In this case, when the target data is migrated with the target object determined in step S103, the reliability of the target data is higher than the target reliability in the reliability condition.
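All three conditions share one shape: a measured value must exceed a target value. A minimal sketch of that check follows, assuming a hypothetical `ObjectProcessingCondition` record and a dictionary of measured values; neither name appears in the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ObjectProcessingCondition:
    """Hypothetical record: which quantity is constrained, and its target value."""
    kind: str      # "efficiency", "accuracy", or "reliability"
    target: float  # the measured value must be higher than this


def satisfies(condition: ObjectProcessingCondition, measured: Dict[str, float]) -> bool:
    """True if the measured value for the constrained quantity exceeds the target."""
    return measured[condition.kind] > condition.target


# Illustrative measurements for one migration run (made-up numbers).
measured = {"efficiency": 120.0, "accuracy": 0.999, "reliability": 0.95}
print(satisfies(ObjectProcessingCondition("efficiency", 100.0), measured))
print(satisfies(ObjectProcessingCondition("reliability", 0.99), measured))
```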
The data processing method provided by the embodiment of the application can be used for carrying out multi-dimensional analysis on the target data so as to obtain the target object with the processing efficiency meeting the object processing condition to process the target data, and therefore the most appropriate migration scheme can be selected for the target data.
As an implementation manner for determining a target object for target data based on data characteristics and object processing conditions, the second embodiment of the present application discloses a data processing method, as shown in fig. 5, the method includes the following steps:
Step S201: Obtain data characteristics of target data to be processed on at least one data dimension.
Step S202: Obtain object processing conditions corresponding to the target data.
Step S203: Obtain a classification model corresponding to the object processing condition, wherein the classification model is trained by using a plurality of samples with preset object labels.
In the embodiment of the present application, a classification model may be preset for each object processing condition. The classification model is obtained by supervised learning, specifically as follows:
Data with preset object labels is used as training samples, and the classification model to be trained is trained with the goal that its prediction on the data characteristics of each training sample approaches that sample's preset object label; wherein the preset object label represents the most suitable migration scheme for the data under the object processing condition.
For convenience of understanding, the following description continues with the efficiency condition and the data table as an example for the process of training the classification model:
in order to obtain a classification model corresponding to an efficiency condition, a large number of data tables for training are obtained in advance, and the following operations are performed for each data table:
obtaining data characteristics of the data table in at least one data dimension; and processing the data table with the data characteristics by using different migration schemes, selecting the migration scheme which meets the efficiency condition and has the highest processing efficiency from the different migration schemes as a processing object, and calibrating an object label for representing the processing object for the data table.
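The labeling step can be sketched as follows, assuming the processing efficiency of every candidate migration scheme has already been measured for the table. The function name `label_table` and the numbers are made up for illustration.

```python
from typing import Dict, Optional


def label_table(measured_efficiency: Dict[str, float],
                target_efficiency: float) -> Optional[str]:
    """Pick the scheme that meets the efficiency condition with the highest efficiency.

    Returns None when no scheme exceeds the target, in which case the table
    would not receive an object label in this sketch.
    """
    qualifying = {
        scheme: eff
        for scheme, eff in measured_efficiency.items()
        if eff > target_efficiency
    }
    if not qualifying:
        return None
    return max(qualifying, key=qualifying.get)


# Made-up measurements for one training table.
print(label_table({"JDBC": 40.0, "Sqoop": 120.0, "Kettle": 90.0}, 50.0))
```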
The data characteristics of the many data tables carrying object labels are input into the classification model to be trained, which outputs a prediction result for the data characteristics of each data table. A loss function value of the classification model to be trained is calculated from the object label of each data table and the prediction result for its data characteristics, and the parameters of the classification model to be trained are updated with the goal of minimizing the loss function value, yielding the final classification model.
It should be noted that, in the embodiment of the present application, the classification model to be trained may be a model of any machine learning algorithm, such as a neural network algorithm, and further such as a logistic regression algorithm, etc.
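As one concrete instance of this predict, compute loss, update loop, the sketch below trains a two-class logistic regression by plain gradient descent. Everything specific in it is an assumption made for illustration: the six training tables and their labels are made up, the features are the base-10 logarithms of the row and column counts (standardized), and the learning rate and epoch count are arbitrary.

```python
import math
from typing import List, Tuple

# Made-up training tables: ((rows, columns), label) with 0 = JDBC, 1 = Sqoop.
RAW = [((20, 13), 0), ((100, 5), 0), ((1_000, 10), 0),
       ((1_000_000, 30), 1), ((10_000_000, 100), 1), ((100_000_000, 16), 1)]


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Shift each feature to zero mean and scale it to unit standard deviation."""
    dims = range(len(rows[0]))
    means = [sum(r[d] for r in rows) / len(rows) for d in dims]
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / len(rows)) or 1.0
            for d in dims]
    return [[(r[d] - means[d]) / stds[d] for d in dims] for r in rows]


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Predicted probability that the label is 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights: List[float], bias: float, xs, ys) -> float:
    """Mean cross-entropy between object labels and predictions."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(weights, bias, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train(xs, ys, epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Update the parameters by gradient descent to reduce the loss."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(xs, ys):
            err = predict(weights, bias, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(xs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(xs)
    return weights, bias


features = standardize([[math.log10(r), math.log10(c)] for (r, c), _ in RAW])
labels = [y for _, y in RAW]
initial_loss = loss([0.0, 0.0], 0.0, features, labels)
weights, bias = train(features, labels)
final_loss = loss(weights, bias, features, labels)
predicted = [int(predict(weights, bias, x) > 0.5) for x in features]
print(initial_loss, final_loss, predicted)
```

With more than two migration schemes, a multi-class model would be needed; the binary case is used here only to keep the sketch short.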
Step S204: Input the data characteristics into the classification model to output a classification result.
Step S205: Determine, among at least one processing object corresponding to the classification result, a target object matching the object processing condition.
For ease of understanding, the following description continues with the efficiency condition as an example for the process of determining the target object:
The classification result output by the classification model corresponding to the efficiency condition includes the processing efficiency of each processing object corresponding to the object labels used for training. A target object whose processing efficiency meets the efficiency condition is then determined from these processing objects, for example a target object whose processing efficiency is higher than the target processing efficiency in the efficiency condition.
Of course, if a plurality of target objects qualify, the final target object for processing the target data may be determined from them either by random selection or by selecting the one with the highest processing efficiency.
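A sketch of this selection step, assuming the classification result is available as a mapping from processing object to predicted processing efficiency. The function name `choose_target` and its `strategy` parameter are hypothetical.

```python
import random
from typing import Dict, Optional


def choose_target(predicted_efficiency: Dict[str, float],
                  target_efficiency: float,
                  strategy: str = "highest") -> Optional[str]:
    """Pick a target object among those whose efficiency exceeds the target.

    strategy="highest" returns the most efficient candidate;
    strategy="random" returns any qualifying candidate at random.
    """
    candidates = [name for name, eff in predicted_efficiency.items()
                  if eff > target_efficiency]
    if not candidates:
        return None
    if strategy == "random":
        return random.choice(candidates)
    return max(candidates, key=lambda name: predicted_efficiency[name])


# Made-up classification result for one data table.
result = {"JDBC": 30.0, "Sqoop": 80.0, "Kettle": 65.0}
print(choose_target(result, 50.0))
```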
Step S206: Process the target data with the target object so that the processing efficiency of the target data meets the object processing condition.
According to the data processing method provided by the embodiment of the application, the machine learning theory can be utilized to train the classification model aiming at the data characteristics, so that the target object with the processing efficiency meeting the object processing condition is obtained to process the target data, the interference of human factors can be reduced, and the accuracy of selecting the migration scheme is further improved.
As an implementation manner of training a classification model by using a plurality of samples with preset object labels, a third embodiment of the present application discloses a data processing method, as shown in fig. 6, the method includes the following steps:
Step S301: Obtain data characteristics of target data to be processed on at least one data dimension.
Step S302: Obtain object processing conditions corresponding to the target data.
Step S303: Obtain a classification model corresponding to the object processing condition, wherein the process of training the classification model by using a plurality of samples with preset object labels comprises the following steps:
obtaining at least one data sample, wherein the data sample has data characteristics on at least one data dimension and a preset object label, and the object label represents that the efficiency of processing the data sample by the processing object corresponding to the object label meets the corresponding object processing condition; and training a decision-tree-based classification model on the data characteristics of the data samples and their object labels.
FIG. 7 is a schematic diagram of a decision tree classification model according to an embodiment of the present application. Referring to FIG. 7, a root node is first created and the data samples are placed at the root node; an optimal feature is selected, and the data samples are divided into a plurality of sub-data samples according to that feature. If all the sub-data samples can be correctly classified, leaf nodes are constructed and the sub-data samples are assigned to the corresponding leaf nodes. If some sub-data samples cannot be classified correctly, a new optimal feature is selected for them, and splitting and leaf-node construction continue recursively until all sub-data samples are correctly classified or no suitable feature remains.
At this point, every sub-data sample has been assigned to a leaf node, that is, it has an explicit classification, and the decision tree classification model is generated. The decision tree classification model generalizes a set of classification rules from the data samples, with a regularized maximum-likelihood function serving as the loss function to be minimized.
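The recursive construction can be sketched in a few lines. This is a simplified illustration, not the disclosed model: it assumes numeric features split by thresholds, uses Gini impurity as the criterion for the "optimal feature" (the source does not name a criterion), omits regularization and pruning, and trains on made-up (rows, columns) samples.

```python
from collections import Counter
from typing import List, Optional, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (features, object label)


def gini(samples: List[Sample]) -> float:
    """Gini impurity of the labels in a set of samples."""
    counts = Counter(label for _, label in samples)
    return 1.0 - sum((c / len(samples)) ** 2 for c in counts.values())


def best_split(samples: List[Sample]) -> Optional[Tuple[int, float]]:
    """Return (feature index, threshold) with the lowest weighted impurity."""
    best, best_score = None, float("inf")
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [s for s in samples if s[0][f] <= t]
            right = [s for s in samples if s[0][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(samples: List[Sample]):
    """Recursively split until a node is pure or no feature can split it."""
    labels = Counter(label for _, label in samples)
    split = best_split(samples) if len(labels) > 1 else None
    if split is None:
        return labels.most_common(1)[0][0]  # leaf node: majority label
    f, t = split
    left = [s for s in samples if s[0][f] <= t]
    right = [s for s in samples if s[0][f] > t]
    return (f, t, build(left), build(right))


def classify(tree, x: Tuple[float, ...]) -> str:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Made-up training samples: (rows, columns) -> migration scheme.
TRAIN: List[Sample] = [
    ((20, 13), "JDBC"), ((500, 8), "JDBC"), ((5_000, 40), "JDBC"),
    ((2_000_000, 30), "Sqoop"), ((50_000_000, 12), "Sqoop"),
    ((800_000, 300), "Kettle"), ((1_500_000, 250), "Kettle"),
]
tree = build(TRAIN)
print(classify(tree, (1_000, 10)))
```

Because every split threshold lies between two distinct feature values, both child sets are non-empty and the recursion always terminates.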
Step S304: Input the data characteristics into the classification model to output a classification result.
Step S305: Determine, among at least one processing object corresponding to the classification result, a target object matching the object processing condition.
Step S306: Process the target data with the target object so that the processing efficiency of the target data meets the object processing condition.
According to the data processing method provided by the embodiment of the application, the classification model aiming at the data characteristics can be trained by utilizing the decision tree algorithm, so that the target object with the processing efficiency meeting the object processing conditions is obtained to process the target data, the interference of human factors can be reduced, and the accuracy of selecting the migration scheme is further improved.
Corresponding to the above data processing method, the present application also discloses a data processing apparatus, as shown in fig. 8, the apparatus includes:
an obtaining unit 101, configured to obtain data characteristics of target data to be processed in at least one data dimension; obtaining object processing conditions corresponding to the target data;
a determination unit 102 configured to determine a target object for the target data based on the data feature and the object processing condition;
a processing unit 103, configured to process the target data with the target object so that the processing efficiency of the target data satisfies the object processing condition.
Optionally, the data dimension includes:
a data table dimension of the target data, wherein the data table dimension comprises one or more of: number of rows, number of columns, data type, and data table source.
Optionally, the processing efficiency of the target data meets the object processing condition, and includes:
the processing efficiency of the target data is higher than the target processing efficiency value in the object processing condition.
The data processing device provided by the embodiment of the application can perform multi-dimensional analysis on the target data, so that the target object with the processing efficiency meeting the object processing condition is obtained to process the target data, and the most appropriate migration scheme can be selected for the target data.
In another embodiment of the data processing apparatus disclosed in the present application, the determining unit 102 determines a target object for the target data based on the data feature and the object processing condition, including:
obtaining a classification model corresponding to the object processing condition, wherein the classification model is trained by utilizing a plurality of samples with preset object labels; inputting the data characteristics into a classification model to output a classification result; and determining a target object matched with the object processing condition in at least one processing object corresponding to the classification result.
The data processing device provided by the embodiment of the application can train the classification model aiming at the data characteristics by utilizing the machine learning theory, so that the target object with the processing efficiency meeting the object processing condition is obtained to process the target data, the interference of human factors can be reduced, and the selection accuracy of the migration scheme is further improved.
In another embodiment of the data processing apparatus disclosed in the present application, the determining unit 102 trains the classification model using a plurality of samples having preset object labels, including:
obtaining at least one data sample, wherein the data sample has data characteristics on at least one data dimension and a preset object label, and the object label represents that the efficiency of processing the data sample by the processing object corresponding to the object label meets the corresponding object processing condition; and training a decision-tree-based classification model on the data characteristics of the data samples and their object labels.
The data processing device provided by this embodiment of the application can use a decision tree algorithm to train a classification model on the data characteristics, so that a target object whose processing efficiency meets the object processing condition is obtained to process the target data. This reduces the interference of human factors and further improves the accuracy of migration scheme selection.
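As a rough illustration of training a decision-tree classifier on labeled samples, the sketch below implements a small CART-style tree with Gini impurity in plain Python. The application does not specify which decision tree variant is used, so the splitting rule, the depth limit, the two-feature encoding (rows, columns), and the sample labels are all assumptions made for the example. The labels in particular are invented and say nothing about the real relative performance of the named tools.

```python
from collections import Counter
from typing import List, Sequence, Tuple, Union

Sample = Tuple[Sequence[float], str]  # (data characteristics, object label)
# A tree is either a leaf label or (feature index, threshold, left, right).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def train(samples: List[Sample], depth: int = 0, max_depth: int = 3) -> Tree:
    """Recursively split on the (feature, threshold) with lowest weighted Gini."""
    labels = [label for _, label in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(samples[0][0])):
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [l for x, l in samples if x[f] <= t]
            right = [l for x, l in samples if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all feature vectors identical, cannot split
        return majority
    _, f, t = best
    left_s = [s for s in samples if s[0][f] <= t]
    right_s = [s for s in samples if s[0][f] > t]
    return (f, t, train(left_s, depth + 1, max_depth), train(right_s, depth + 1, max_depth))


def predict(tree: Tree, x: Sequence[float]) -> str:
    """Walk the tree until a leaf label is reached."""
    while not isinstance(tree, str):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Hypothetical training samples: (rows, columns) -> object label.
SAMPLES: List[Sample] = [
    ((1_000, 5), "JDBC"),
    ((5_000, 8), "JDBC"),
    ((2_000_000, 10), "Sqoop"),
    ((8_000_000, 40), "Sqoop"),
    ((50_000, 200), "Kettle"),
    ((80_000, 150), "Kettle"),
]

model = train(SAMPLES)
print(predict(model, (3_000_000, 20)))
```

In practice a library implementation of decision trees would normally replace the hand-written `train` function; the plain-Python version is used here only to keep the example self-contained.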
For ease of understanding, a detailed description is given below, taking the selection of a data table migration scheme as an example:
With the advent of the big data era, the data tables generated by users keep growing in volume, and how to extract and migrate these tables has become a problem that urgently needs to be solved. At present there are many technical means for data extraction and migration; the simplest examples include JDBC, Sqoop, Talend, and Kettle. When a migration scheme is selected for a data table to be migrated, an appropriate technique has to be chosen manually. The choice typically relies on a single data dimension, for example only the size of the data volume, while other information about the data table is ignored, so the selected technique is often not the most efficient one.
To address the above problem, an embodiment of the present application provides a data processing method for intelligently selecting a data migration scheme:
Fig. 9 is a flowchart of a data processing method according to an embodiment of the present application. The method includes the following steps:
First, S401: select the data characteristics of a data table that are relevant to data migration (such as number of rows, number of columns, data type, and data source);
Next, S402: for each training data table having the data characteristics (the characteristic values may differ between tables), perform data migration using each of the different migration schemes, select the migration scheme with the highest processing efficiency as the processing object of the data table, and mark the data table with an object label representing that processing object;
Next, S403: train a decision-tree-based classification model using a large number of data tables, each paired with its object label and data characteristics;
Finally, S404: for a target data table to be processed, obtain its data characteristics, input them into the classification model to obtain the optimal migration scheme for the target data table, and migrate the target data table according to that scheme.
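Steps S401 and S402, which produce the labeled training set, can be sketched as below. This is only an illustration under stated assumptions: the table description format, the numeric encodings for data type and source, and the two `fake_*` functions are hypothetical. The `fake_*` functions are invented cost models that stand in for actually running a migration and timing it; they do not describe the behavior of any real tool.

```python
from typing import Any, Callable, Dict, List, Tuple

Table = Dict[str, Any]             # hypothetical description of a data table
Scheme = Callable[[Table], float]  # runs a migration, returns elapsed seconds

# Assumed numeric encodings for categorical data characteristics.
DATA_TYPES = {"numeric": 0, "text": 1, "mixed": 2}
SOURCES = {"mysql": 0, "oracle": 1, "csv": 2}


def extract_features(table: Table) -> Tuple[int, int, int, int]:
    """S401: encode row count, column count, data type and data source."""
    return (
        int(table["rows"]),
        int(table["cols"]),
        DATA_TYPES[table["dtype"]],
        SOURCES[table["source"]],
    )


def label_table(table: Table, schemes: Dict[str, Scheme]) -> str:
    """S402: try every migration scheme and return the name of the fastest."""
    timings = {name: run(table) for name, run in schemes.items()}
    return min(timings, key=timings.get)


def build_training_set(
    tables: List[Table], schemes: Dict[str, Scheme]
) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Pair each table's data characteristics with its object label."""
    return [(extract_features(t), label_table(t, schemes)) for t in tables]


# Invented cost models standing in for real, timed migrations (assumption).
def fake_jdbc(table: Table) -> float:
    return 0.5 + table["rows"] / 10_000      # low startup cost, slow per row


def fake_sqoop(table: Table) -> float:
    return 30.0 + table["rows"] / 200_000    # high startup cost, fast per row


SCHEMES = {"jdbc_like": fake_jdbc, "sqoop_like": fake_sqoop}
TABLES = [
    {"rows": 1_000, "cols": 5, "dtype": "numeric", "source": "mysql"},
    {"rows": 5_000_000, "cols": 20, "dtype": "mixed", "source": "oracle"},
]

training = build_training_set(TABLES, SCHEMES)
print(training)
```

The resulting list of (data characteristics, object label) pairs is the input that S403 would use to train the classification model, and `extract_features` is reused in S404 on the target data table before classification.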
The embodiment of the application has the following advantages:
An optimal migration scheme is obtained by multi-dimensional analysis of the data table; conclusions are drawn from actual data using machine learning, which reduces the interference of human factors; and the cost of making the decision is reduced.
The embodiments in this description are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, the device is described only briefly; for the relevant details, refer to the description of the method.
Those of skill in the art would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method of data processing, comprising:
obtaining data characteristics of target data to be processed on at least one data dimension;
obtaining an object processing condition corresponding to the target data, wherein the object processing condition includes any one of the following: efficiency, accuracy, reliability of data migration;
determining a target object for the target data based on the data features and the object processing conditions;
processing the target data by the target object so that a processing result of the target data meets the object processing condition;
wherein determining a target object for the target data based on the data features and the object processing conditions comprises:
obtaining a classification model corresponding to the object processing condition, wherein the classification model is trained by utilizing a plurality of samples with preset object labels, and the object labels represent that the result of processing the data samples by the processing objects corresponding to the object labels meets the corresponding object processing condition;
inputting the data features into the classification model to output a classification result;
and determining a target object matched with the object processing condition in at least one processing object corresponding to the classification result.
2. The method of claim 1, wherein training the classification model with a plurality of samples having preset object labels comprises:
obtaining at least one data sample, wherein the data sample has data characteristics in at least one data dimension, and the data sample has a preset object label;
and training a classification model based on a decision tree algorithm based on the data characteristics of the data samples and the object labels thereof.
3. The method of claim 1, wherein the data dimension comprises:
a data table dimension of the target data, wherein the data table dimension comprises one or more of: number of rows, number of columns, data type, and data table source.
4. The method of claim 1, wherein the processing result of the target data satisfying the object processing condition comprises:
the processing efficiency of the target data is higher than a target processing efficiency value in the target processing condition.
5. A data processing apparatus comprising:
an obtaining unit configured to obtain data characteristics of target data to be processed on at least one data dimension, and to obtain an object processing condition corresponding to the target data, wherein the object processing condition includes any one of the following: efficiency, accuracy, reliability of data migration;
a determination unit configured to determine a target object for the target data based on the data feature and the object processing condition;
a processing unit configured to process the target data with the target object so that a processing result of the target data satisfies the object processing condition;
wherein the determining unit is specifically configured to:
obtaining a classification model corresponding to the object processing condition, wherein the classification model is trained by utilizing a plurality of samples with preset object labels, and the object labels represent that the result of processing the data samples by the processing objects corresponding to the object labels meets the corresponding object processing condition;
inputting the data features into the classification model to output a classification result;
and determining a target object matched with the object processing condition in at least one processing object corresponding to the classification result.
6. An electronic device, comprising:
a memory for storing an application program and data generated by running the application program;
a processor for executing the application to perform the functions of: obtaining data characteristics of target data to be processed on at least one data dimension; obtaining an object processing condition corresponding to the target data, wherein the object processing condition includes any one of the following: efficiency, accuracy, reliability of data migration; determining a target object for the target data based on the data features and the object processing conditions; processing the target data by the target object so that a processing result of the target data meets the object processing condition;
wherein determining a target object for the target data based on the data features and the object processing conditions comprises:
obtaining a classification model corresponding to the object processing condition, wherein the classification model is trained by utilizing a plurality of samples with preset object labels, and the object labels represent that the result of processing the data samples by the processing objects corresponding to the object labels meets the corresponding object processing condition;
inputting the data features into the classification model to output a classification result;
and determining a target object matched with the object processing condition in at least one processing object corresponding to the classification result.
CN201910596822.3A 2019-07-02 2019-07-02 Data processing method and device and electronic equipment Active CN110309127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910596822.3A CN110309127B (en) 2019-07-02 2019-07-02 Data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910596822.3A CN110309127B (en) 2019-07-02 2019-07-02 Data processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110309127A CN110309127A (en) 2019-10-08
CN110309127B true CN110309127B (en) 2021-07-16

Family

ID=68079006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910596822.3A Active CN110309127B (en) 2019-07-02 2019-07-02 Data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110309127B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081648B (en) * 2010-12-20 2012-07-04 北京航空航天大学 Case library system and method for supporting complex product advanced manufacture
CN106297280A (en) * 2015-05-22 2017-01-04 高德软件有限公司 A kind of information processing method and device
CN107887032A (en) * 2016-09-27 2018-04-06 中国移动通信有限公司研究院 A kind of data processing method and device
CN108960264A (en) * 2017-05-19 2018-12-07 华为技术有限公司 The training method and device of disaggregated model
CN108363738B (en) * 2018-01-19 2022-05-17 上海电气集团股份有限公司 Recommendation method for industrial equipment data analysis algorithm
CN108470071B (en) * 2018-03-29 2022-02-18 联想(北京)有限公司 Data processing method and device
CN109165249B (en) * 2018-08-07 2020-08-04 阿里巴巴集团控股有限公司 Data processing model construction method and device, server and user side
CN109508217B (en) * 2018-10-22 2022-03-08 郑州云海信息技术有限公司 Data processing method, device, equipment and medium
CN109934249A (en) * 2018-12-14 2019-06-25 网易(杭州)网络有限公司 Data processing method, device, medium and calculating equipment

Also Published As

Publication number Publication date
CN110309127A (en) 2019-10-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant