CN112052253A - Data processing method, electronic device and storage medium - Google Patents

Data processing method, electronic device and storage medium

Info

Publication number
CN112052253A
Authority
CN
China
Prior art keywords
data
data processing
metadata
time granularity
framework
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010808517.9A
Other languages
Chinese (zh)
Other versions
CN112052253B (en)
Inventor
何通庆
陈斌
连庆仁
吴琳炜
林鸿其
上官致钊
庄贤荣
Current Assignee
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd
Priority to CN202010808517.9A
Publication of CN112052253A
Application granted
Publication of CN112052253B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5017 Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method, an electronic device and a storage medium. In the invention, a task decomposition class provided by a predefined data processing framework divides a time interval into subintervals according to the time granularity extracted from the data processing instruction, and generates a subtask for each subinterval according to the extracted job type. Each subtask calls a data processing interface provided by the data processing framework to acquire the source data and configuration data to be processed, processes the source data and configuration data through that interface, and finally stores the processed data into a parquet file whose contents can be read on demand. Developers therefore do not need a deep understanding of Spark principles and the underlying technology. Meanwhile, because the parquet file associates the source data with the configuration data, subsequent development does not need to associate the configuration data again, or needs to associate less of it, which greatly simplifies subsequent service processing and effectively improves development efficiency.

Description

Data processing method, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the technical field of big data programming, in particular to a data processing method, electronic equipment and a storage medium.
Background
Apache Spark is a fast, general-purpose engine designed for large-scale distributed in-memory data computation. It is an open-source, general-purpose parallel framework similar to Hadoop MapReduce, developed by the AMP Lab at the University of California, Berkeley. Because the intermediate output of a MapReduce job can be kept in memory, reading and writing HDFS (Hadoop Distributed File System) is no longer necessary, so Spark is better suited to MapReduce algorithms that require iteration, such as data mining and machine learning.
However, because specific services differ, a large number of configuration operations are often required in actual development, making the implementation process complicated and difficult. In addition, because Spark development is complex, developers carrying out big-data Spark development need a deep understanding of Spark principles and underlying technologies, such as broadcast variables (Broadcast) and RDD (Resilient Distributed Dataset) operators, which requires substantial labor cost to train dedicated Spark developers.
Disclosure of Invention
An object of embodiments of the present invention is to provide a data processing method, an electronic device and a storage medium, which aim to reduce labor cost, reduce the amount of code, and improve development efficiency.
In order to solve the above technical problem, an embodiment of the present invention provides a data processing method, including the following steps:
acquiring a data processing instruction, and extracting a job type, a time interval and a time granularity from the data processing instruction;
dividing the time interval into a plurality of subintervals according to the time granularity based on a task decomposition class provided by a predefined data processing framework, and generating a subtask corresponding to each subinterval according to the job type;
calling a data processing interface provided by the data processing framework through the subtask to acquire data to be processed, wherein the data to be processed comprises source data and configuration data;
and processing the source data and the configuration data based on a data processing interface provided by the data processing framework, and saving the processed data as a parquet file in a columnar storage format.
An embodiment of the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data processing method as described above.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements a data processing method as described above.
Compared with the prior art, embodiments of the present invention extract three pieces of information, the job type, the time interval and the time granularity, from the acquired data processing instruction; then, based on a task decomposition class provided by a predefined data processing framework, divide the extracted time interval into a plurality of subintervals according to the extracted time granularity and generate a subtask corresponding to each subinterval according to the extracted job type; further acquire the source data and configuration data to be processed through a data processing interface provided by the predefined data processing framework and process them based on that interface; and finally save the processed data as a parquet file whose content can be read on demand. Developers can thus implement subsequent service processing without deeply understanding Spark principles and the underlying technology, which effectively reduces labor cost. Meanwhile, because the parquet file associates the source data with the configuration data, subsequent development does not need to associate the configuration data again, or needs to associate less of it, which greatly simplifies subsequent service processing and effectively improves development efficiency.
In addition, before the processing the source data and the configuration data based on the data processing interface provided by the data processing framework, the method further includes:
packaging the source data to obtain a Resilient Distributed Dataset (RDD) object that can be queried by SQL statements;
and packaging the configuration data to obtain a simple entity Bean object that can be queried by SQL statements.
In addition, the packaging of the source data to obtain a Resilient Distributed Dataset (RDD) object for SQL statement query includes:
acquiring predefined metadata of the source data according to a preset metadata name;
obtaining the metadata to be packaged according to the metadata and preset filter conditions;
reading the source data specified in the metadata as a string-typed RDD object;
converting the string-typed RDD object into a structured RDD object, using the metadata to be packaged as the filter condition;
converting the metadata to be packaged and the structured RDD object into a Dataset<Row> object;
and packaging the metadata to be packaged, the structured RDD object and the Dataset<Row> object in the same data object to obtain an RDD object for SQL statement query.
In addition, the packaging of the configuration data to obtain a simple entity Bean object for SQL statement query includes:
acquiring predefined metadata of the configuration data according to a preset metadata name;
obtaining the metadata to be packaged according to the metadata and preset filter conditions;
converting the configuration data specified in the metadata into a structured array, using the metadata to be packaged as the filter condition;
and packaging the metadata to be packaged and the structured array in the same data object to obtain a Bean object for SQL statement query.
In addition, the processing of the source data and the configuration data based on the data processing interface provided by the data processing framework, and the saving of the processed data as a parquet file in a columnar storage format, include:
associating the RDD object with the Bean object based on the data processing interface provided by the data processing framework to obtain an associated object in RDD format;
and saving the associated object in RDD format as a parquet file in a columnar storage format.
In addition, before the time interval is divided into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework, and a subtask corresponding to each subinterval is generated according to the job type, the method further includes:
detecting whether the time granularity conforms to a preset time granularity value rule, wherein the value rule specifies that the time granularity is an integer multiple of the generation granularity of the data to be processed;
if so, executing the step of dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework, and generating a subtask corresponding to each subinterval according to the job type;
otherwise, rounding the time granularity;
in which case the dividing of the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework, and the generating of a subtask corresponding to each subinterval according to the job type, include:
dividing the time interval into a plurality of subintervals according to the rounded time granularity based on the task decomposition class provided by the predefined data processing framework, and generating the subtask corresponding to each subinterval according to the job type.
In addition, before the time interval is divided into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework, and a subtask corresponding to each subinterval is generated according to the job type, the method further includes:
detecting whether the end time corresponding to the time interval is greater than the start time corresponding to the time interval;
if so, executing the step of dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework, and generating a subtask corresponding to each subinterval according to the job type;
otherwise, issuing an exception prompt.
In addition, before the data processing interface provided based on the predefined data processing framework is called through the subtask to acquire the data to be processed, the method further includes:
abstracting a task decomposition class and a data processing class based on a Spark framework;
constructing a data processing interface for the data processing class;
and packaging the environment initialization method of the Spark framework, the read-write method of the parquet file provided by Spark SQL, the task decomposition class and the data processing interface to obtain the data processing framework.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals denote similar elements; unless otherwise specified, the figures are not drawn to scale.
Fig. 1 is a detailed flowchart of a data processing method according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart of a data processing method according to a second embodiment of the present invention;
FIG. 3 is a detailed flowchart of a data processing method according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate, however, that numerous technical details are set forth in the embodiments merely to help the reader better understand the present application; the claimed technical solution can still be implemented without these technical details, or with various changes and modifications based on the following embodiments. The embodiments are divided merely for convenience of description and should not be construed as limiting the specific implementation of the present invention; where there is no contradiction, the embodiments may be combined with and refer to one another.
The present embodiment relates to a data processing method applied to an electronic device, such as a personal computer, a tablet computer or a smartphone; the examples are not exhaustive, and the present embodiment is not limited in this respect.
Implementation details of the data processing method of the present embodiment are described below; they are provided only for ease of understanding and are not required for implementing the present solution.
The specific flow of the present embodiment is shown in fig. 1, and specifically includes the following steps:
step 101, acquiring a data processing instruction, and extracting a job type, a time interval and a time granularity from the data processing instruction.
Specifically, in practical applications, the data processing instruction may be triggered by a user, such as a developer, or triggered automatically by a timer when a certain system time is reached; in specific implementations, those skilled in the art may configure this as needed, and the present embodiment is not limited thereto.
Further, the job type extracted from the data processing instruction is the job name of the job to be processed, entered by the user or acquired from a preset area.
The time interval is determined based on the start time startTime and the end time endTime in the data processing instruction.
The time granularity is set according to service requirements, for example 1 hour, 1 day or 1 month.
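The patent does not specify the wire format of the data processing instruction; as a minimal sketch, assuming a dict-shaped instruction with hypothetical field names, extracting the three pieces of information might look like:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ProcessingInstruction:
    """Hypothetical carrier for the three fields the method extracts."""
    job_type: str          # job name of the job to be processed
    start_time: date       # startTime: start of the time interval
    end_time: date         # endTime: end of the time interval
    granularity_days: int  # time granularity, here expressed in days

def extract_fields(raw: dict) -> ProcessingInstruction:
    # Extract job type, time interval and time granularity from a
    # dict-shaped instruction (the wire format is an assumption).
    return ProcessingInstruction(
        job_type=raw["type"],
        start_time=date.fromisoformat(raw["startTime"]),
        end_time=date.fromisoformat(raw["endTime"]),
        granularity_days=int(raw["granularity"]),
    )

instr = extract_fields({
    "type": "A",
    "startTime": "2020-07-01",
    "endTime": "2020-07-31",
    "granularity": "1",
})
print(instr.job_type, instr.start_time, instr.end_time, instr.granularity_days)
```

The field names `type`, `startTime`, `endTime` and `granularity` mirror the identifiers used later in the description but are assumptions about the instruction layout.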
And 102, dividing the time interval into a plurality of subintervals according to the time granularity based on a task decomposition class provided by a predefined data processing framework, and generating a subtask corresponding to each subinterval according to the job type.
It should be appreciated that in order to ensure that step 102 is performed successfully, the data processing framework needs to be packaged before step 102 is performed.
The data processing framework is specifically packaged as follows:
First, a task decomposition class and a data processing class are abstracted based on the Spark framework; for convenience of description, the task decomposition class is defined as the ParquetationJob class and the data processing class as the Parquetation class.
Next, a data processing interface is constructed for the data processing class.
And finally, encapsulating the environment initialization method of the Spark framework, the read-write method of the parquet file provided by Spark SQL, the task decomposition class and the data processing interface to obtain the data processing framework.
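Purely as an illustration (the framework described here is built on Spark, and every name below is a hypothetical stand-in), the relationship between the task decomposition class and the data processing class can be sketched in Python:

```python
from abc import ABC, abstractmethod

class Parquetation(ABC):
    """Data processing class: the data processing interface each concrete
    service class must implement (names are illustrative)."""
    @abstractmethod
    def get_smart_rdd(self, batch_start, batch_end):
        """Fetch and filter the source data for one subinterval."""

class ParquetationJob:
    """Task decomposition class: drives one subtask per subinterval
    through the data processing interface."""
    def __init__(self, processor: Parquetation):
        self.processor = processor

    def run(self, subintervals):
        results = []
        for batch_start, batch_end in subintervals:
            # One subtask per subinterval, executed via the interface.
            results.append(self.processor.get_smart_rdd(batch_start, batch_end))
        return results

class EchoParquetation(Parquetation):
    # Trivial stand-in service used only to exercise the skeleton.
    def get_smart_rdd(self, batch_start, batch_end):
        return (batch_start, batch_end)

job = ParquetationJob(EchoParquetation())
print(job.run([("d1", "d2"), ("d2", "d3")]))
```

The separation mirrors the packaging steps above: the abstract class plays the role of the data processing interface, and the job class owns the task decomposition.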
To facilitate understanding of the ParquetationJob class, it is described below in connection with part of its pseudo code:
Parquetation[] partitions = createParquetation(type); // create a Parquetation object by reflection; type is the job name entered by the user, corresponding to the specific service to be processed
(The remainder of the pseudo code appears only as images in the original document.)
In the first line of the pseudo code, the Parquetation class is created by reflection; type is the job name entered by the user and corresponds to the specific service to be processed; granularity is the time granularity; batchStart and batchEnd are the start and end times of a subinterval.
For ease of understanding, the operations in step 102 above are described below with reference to examples:
it is assumed that the start time startTime of the time interval extracted from the data processing instruction is 2020-07-01, the end time endTime is 2020-07-31, and the time granularity is 1 day.
The time interval 2020-07-01 to 2020-07-31 can be divided into 31 subintervals, i.e. one subinterval per day, based on the ParquetationJob class provided by the packaged data processing framework described above.
For the first subinterval, batchStart = 2020-07-01 and batchEnd = 2020-07-01 + 1 day = 2020-07-02.
Accordingly, for each following subinterval, batchStart is reassigned to the previous subinterval's batchEnd; that is, each subinterval starts where the previous one ended.
Accordingly, the batchEnd of each following subinterval is its (reassigned) batchStart + granularity.
It should be understood that the above is only an example, and the technical solution of the present embodiment is not specifically limited.
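The batchStart/batchEnd bookkeeping above can be sketched as follows (a plain-Python illustration; in the framework this division is performed by the task decomposition class):

```python
from datetime import date, timedelta

def split_interval(start: date, end: date, granularity_days: int):
    """Divide the time interval into subintervals of the given
    granularity: each subinterval's batchStart is the previous
    subinterval's batchEnd."""
    subintervals = []
    batch_start = start
    while batch_start <= end:
        batch_end = batch_start + timedelta(days=granularity_days)
        subintervals.append((batch_start, batch_end))
        batch_start = batch_end  # reassign batchStart for the next subinterval
    return subintervals

subs = split_interval(date(2020, 7, 1), date(2020, 7, 31), 1)
print(len(subs))  # 31 subintervals, one per day
```

With the interval and granularity from the example, the first subinterval is (2020-07-01, 2020-07-02) and 31 subintervals are produced in total.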
In addition, it is worth mentioning that in practical applications the data to be processed is generated with a certain period; that is, its generation granularity may be hours, days, months, and so on. In other words, data within the same generation granularity may be strongly correlated. Therefore, to avoid splitting such data across the subintervals divided by the extracted time granularity, it may be detected, before executing step 102, whether the time granularity conforms to the preset time granularity value rule.
Accordingly, if the detection determines that the extracted time granularity conforms to the preset time granularity value rule, the operation of step 102 is executed; otherwise, the extracted time granularity is processed first.
Specifically, in this embodiment, the time granularity value rule specifies that the time granularity must be an integer multiple of the generation granularity of the data to be processed.
For example, if the generation granularity of the data to be processed is 1 day, the time granularity extracted from the data processing instruction may be 1 day, 2 days or 3 days. If the extracted time granularity is 1.5 days, it does not conform to the time granularity value rule and therefore needs to be processed.
In this embodiment, the processing applied to a time granularity that does not conform to the value rule is specifically a rounding operation.
The rounding operation may round up or down: for example, rounding 1.5 days up yields a time granularity of 2 days, while rounding 1.5 days down yields 1 day. In a specific implementation, those skilled in the art may preset the rounding rule; this embodiment is not limited in this respect.
In addition, in practical applications, when the extracted time granularity does not conform to the value rule, a prompt may instead be shown on the user interface asking the user to re-enter a time granularity that meets the requirement.
Accordingly, after rounding a non-conforming time granularity, the operation executed in step 102 is specifically: dividing the time interval into a plurality of subintervals according to the rounded time granularity, based on the task decomposition class provided by the predefined data processing framework, and generating the subtask corresponding to each subinterval according to the job type.
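The rounding of a non-conforming time granularity can be sketched as follows (an illustrative helper; whether to round up or down is a rule preset by the developer):

```python
import math

def round_granularity(granularity: float, generation_granularity: float,
                      mode: str = "up") -> float:
    """Force the requested time granularity to an integer multiple of
    the generation granularity of the data to be processed."""
    multiple = granularity / generation_granularity
    if multiple == int(multiple):
        return granularity  # already conforms to the value rule
    rounded = math.ceil(multiple) if mode == "up" else max(1, math.floor(multiple))
    return rounded * generation_granularity

# 1.5 days against a 1-day generation granularity:
print(round_granularity(1.5, 1, "up"))    # rounds up to 2 days
print(round_granularity(1.5, 1, "down"))  # rounds down to 1 day
```

The `max(1, ...)` guard keeps a downward rounding from producing a zero granularity; this guard is an added assumption, not stated in the source.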
Further, to avoid as far as possible splitting data within the same generation granularity across the divided subintervals, the user may be told, before triggering data processing, which time granularity value rule the entered time granularity must satisfy.
In addition, to help ensure that step 102 proceeds smoothly, after checking the time granularity it may further be detected whether the end time corresponding to the time interval is greater than the start time corresponding to the time interval.
Accordingly, if so, the operation of step 102 is executed; otherwise, an exception prompt is issued.
Further, in practical applications, selection criteria for the start time and end time of the time interval may also be defined, for example: the end time must be greater than the start time, and start time + time granularity must not exceed the end of the time interval.
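The interval checks just described might be sketched as follows (illustrative only; the exact criteria are the developer's choice):

```python
from datetime import date

def validate_interval(start: date, end: date, granularity_days: int) -> None:
    """Sketch of the selection criteria: the end time must lie after the
    start time, and one granularity step must fit inside the interval."""
    if end <= start:
        raise ValueError("exception: end time must be greater than start time")
    if (end - start).days < granularity_days:
        raise ValueError("exception: granularity exceeds the time interval")

validate_interval(date(2020, 7, 1), date(2020, 7, 31), 1)  # passes silently
```

On failure, the `ValueError` plays the role of the exception prompt described above.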
Step 103, calling a data processing interface provided by the data processing framework through the subtask to acquire data to be processed, wherein the data to be processed comprises source data and configuration data.
Specifically, the main purpose of this embodiment is to associate the source data and the configuration data into one wide table, that is, to store source data and configuration data that were originally kept separately in a single table. Therefore, the data to be processed acquired through the data processing interface provided by the data processing framework must include both the source data and the configuration data related to it.
The source data, i.e. raw data, is data generated in real time, such as information on websites a user has visited or the user's order information.
The configuration data, i.e. the data described in a Conf or Config file, is time-insensitive attribute information that is updated infrequently, such as a user's sex, age and telephone number.
In addition, it is worth mentioning that, to let developers carry out big-data Spark development using only SQL statements, without deeply understanding Spark principles and the underlying technology, and thereby reduce labor cost, the source data and the configuration data may be packaged before step 103 is executed.
Specifically, the source data is packaged into a Resilient Distributed Dataset (RDD) object that can be queried by SQL statements, and the configuration data into a simple entity Bean object that can be queried by SQL statements.
Correspondingly, through the operation of step 103, the obtained source data is specifically an RDD object, and the configuration data is specifically a Bean object.
And 104, processing the source data and the configuration data based on the data processing interface provided by the data processing framework, and saving the processed data as a parquet file in a columnar storage format.
Specifically, when the source data is an RDD object and the configuration data a Bean object, the data processing operation performed in step 104 is specifically to associate the RDD object with the Bean object based on the data processing interface provided by the data processing framework to obtain an associated object in RDD format, and then store the associated object in RDD format as a parquet file in a columnar storage format.
Further, in practical applications, so that the resulting parquet file stores exactly the content the developer requires, before the RDD object and the Bean object are associated they may be filtered according to filter information entered by the developer or preset, after which the filtered content of the RDD object is associated with that of the Bean object.
As can be seen from the above, the acquired data to be processed are the RDD object and the Bean object, and the resulting associated object is in RDD format; therefore the Parquetation class in the data processing framework, and the data processing interface constructed for that class, must be able to process RDD objects and Bean objects.
Note that the RDD object in the present embodiment is different from the conventional RDD in the packaging principle, and is hereinafter referred to as SmartRDD for distinction.
Accordingly, since the Bean object in the present embodiment is different from a conventional Bean in the packaging principle, the Bean object in the present embodiment is hereinafter referred to as SmartBean for distinction.
To facilitate understanding of the Parquetation class and the data processing interface constructed for it, part of its pseudo code is described below:
(The pseudo code appears only as an image in the original document.)
It should be understood that the above shows part of the pseudo code for obtaining a SmartRDD. Because the pseudo code defines not only the interface for obtaining the SmartRDD to be associated, but also the storage location and storage granularity of the SmartRDD, and the time format and time boundaries to be observed when obtaining it, the subsequent service processing module does not need to know where or at what granularity the data is stored, nor handle time-format or time-boundary issues, which greatly simplifies implementation and further improves development efficiency.
Accordingly, part of the pseudo code for obtaining a SmartBean is roughly as follows:
(The pseudo code appears only as an image in the original document.)
In the pseudo code, savePath = getSavePath(type, granularity, batchStart) computes the location where the final parquet file is to be saved.
In addition, getSmartRDD(batchStart, batchEnd) is a function implemented by the developer. Taking service A as an example, and assuming the concrete service class implemented by the developer is AParquetation, the function corresponds to AParquetation.getSmartRDD.
Specifically, the AParquetation class inherits from the Parquetation class and mainly implements operations such as filtering and associating the data to be processed.
(The AParquetation pseudo code appears only as images in the original document.)
From the above description it can be seen that, in a specific implementation, the number of subintervals the time interval is divided into according to the time granularity determines at least how many parquet files are finally obtained.
Still taking a time granularity of 1 day and the time interval 2020-07-01 to 2020-07-31 as an example, at least 31 parquet files are finally obtained.
Furthermore, it should be noted that in practical applications the update frequency of the configuration data is relatively low, so the same configuration data can be shared by the source data within the same time interval. That is, 31 SmartRDDs need to be obtained while only 1 SmartBean is needed; when the 31 parquet files are produced through association, each SmartRDD is associated with the SmartBean to obtain a new SmartRDD, and the resulting 31 new SmartRDDs are then converted into the corresponding 31 parquet files for storage.
In order to facilitate understanding of the data processing method provided in the present embodiment, the following description is made with reference to an example:
Assume that, based on the predefined data processing framework and a data processing instruction, the SmartRDD obtained is as shown in Table 1 and the SmartBean obtained is as shown in Table 2. After the content of Table 1 is associated with that of Table 2 through the data processing interface provided by the data processing framework, the resulting associated object is as shown in Table 3.
TABLE 1 SmartRDD obtained
Name         Consumption amount
Zhang San    300
Li Si        200
TABLE 2 SmartBean obtained
Name         Gender    Native place    Department
Zhang San    Male      Fujian          CIM
Li Si        Female    Shanghai        CIM
TABLE 3 SmartRDD after association
Name         Consumption amount    Gender    Native place    Department
Zhang San    300                   Male      Fujian          CIM
Li Si        200                   Female    Shanghai        CIM
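The association illustrated in Tables 1 to 3 can be sketched as a simple join on the name column (a plain-Python illustration under assumed field names; the patent performs this through the framework's data processing interface on Spark objects):

```python
# Source data (SmartRDD contents, Table 1) and config data (SmartBean contents, Table 2).
source = [{"name": "Zhang San", "amount": 300},
          {"name": "Li Si", "amount": 200}]
config = {"Zhang San": {"gender": "male", "native_place": "Fujian", "department": "CIM"},
          "Li Si": {"gender": "female", "native_place": "Shanghai", "department": "CIM"}}

def associate(source_rows, config_by_name):
    """Left-join each source row with its config record, producing Table 3."""
    return [{**row, **config_by_name.get(row["name"], {})} for row in source_rows]

joined = associate(source, config)
```

Because the join is performed once and stored, later statistics can read the merged table directly instead of re-associating the two tables.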
Because the data processing method in this embodiment associates the source data with the configuration data, only one table needs to be processed when performing data read/write operations and when performing the shuffle operation (which redistributes data elements across partitions), which greatly reduces the consumption of device resources compared with the existing approach of processing the tables separately.
To compare more intuitively the resource consumption of the existing scheme (referred to as the old version) and the present scheme (referred to as the new version) when accessing the same data, the following comparison is made from the IO and shuffle perspectives:
TABLE 4 New version Job resource consumption
Submitted    Duration    Input    Shuffle Read    Shuffle Write
2020-03-11 14:04:44 19s 2.9GB 123.4MB
2020-03-11 14:04:37 3s 65KB
2020-03-11 14:04:37 0.3s 756.3KB 18.5KB
2020-03-11 14:04:35 0.6s 3.7KB 756.2KB
2020-03-11 14:04:36 0.3s 2.5KB 83B
2020-03-11 14:04:36 0.2s 74.9KB 2.5KB
2020-03-11 14:04:35 0.5s 3.7KB 74.9KB
2020-03-11 14:04:36 0.2s 441.6KB 42KB
2020-03-11 14:04:35 1s 3.7KB 441.6KB
2020-03-11 14:04:18 16s 1944.9MB 3.7MB
2020-03-11 14:04:07 4s
TABLE 5 old version Job resource consumption
(Table 5 appears only as an image in the original publication and is not reproduced here.)
It can be seen that the input difference between the new and old versions is huge: the input drops from 41 GB to 2.9 GB, and the shuffle volume is likewise reduced from tens of GB to a few MB.
In addition, by associating the source data and the configuration data and storing them together as a parquet file, although the number of data fields in the merged table increases by roughly half, only one table is occupied and it is finally converted into a parquet file; by comparison, data storage is reduced by about 75%.
Therefore, in the data processing method provided in this embodiment, three pieces of information, namely the job type, the time interval and the time granularity, are extracted from the acquired data processing instruction. Then, based on the task decomposition class provided by a predefined data processing framework, the extracted time interval is divided into a plurality of sub-intervals according to the extracted time granularity, and a subtask corresponding to each sub-interval is generated according to the extracted job type. The source data and configuration data to be processed are acquired through the data processing interface provided by the framework and processed based on that interface, and finally the processed data are stored as a parquet file whose columns can be read on demand, so that subsequent statistics only need to read the relevant columns and can skip the irrelevant ones. Meanwhile, because the parquet file associates the source data and the configuration data together, subsequent development does not need to associate the configuration data again, or needs to do so far less often, which greatly simplifies the processing of subsequent services and effectively improves development efficiency.
In addition, when the associated object is stored as a parquet file, the data can be compressed, which greatly reduces disk and memory usage and reduces IO consumption.
Meanwhile, because the associated object is stored as a parquet file, the data types stored in the file do not need to be converted when business processing is performed on it, which further improves processing speed.
In addition, because the associated data is encapsulated in advance into SmartRDD and SmartBean objects, developers can operate on it directly with SQL statements, so they do not need a deep understanding of the Spark principle and the underlying technology, which effectively reduces the investment of labor cost.
In addition, because the parquet files corresponding to different sub-intervals of the same time interval are related, the data is simplified during processing by being divided into a plurality of sub-intervals, each corresponding to one parquet file; these files can be quickly associated during subsequent statistics, which simplifies development and facilitates subsequent operation and maintenance work.
A second embodiment of the present invention relates to a data processing method. The second embodiment mainly encapsulates the source data, and then obtains the elastic distributed data set RDD object for SQL statement query.
As shown in fig. 2, the data processing method according to the second embodiment includes the steps of:
step 201, obtaining predefined metadata of source data according to a preset metadata name.
Specifically, the metadata in this embodiment is a description of the source data. In practical applications, the metadata mainly includes the path of the source data, the file name, the column names, the column types, column description information, column default values, and the like, which are not listed one by one here.
Furthermore, it should be understood that, in practical applications, in order to quickly locate and acquire the predefined metadata of the source data, each piece of metadata may be assigned a name that uniquely identifies it. When acquiring the predefined metadata of the source data, the preset metadata name is then matched directly against the names of the metadata held in storage, and the metadata whose name matches the input metadata name is screened out.
For convenience of explanation, the present embodiment stores information such as column names, column types, column description information, column default values, and the like included in metadata in the form of a table.
Correspondingly, the name used to identify the uniqueness of the metadata is the table name.
Accordingly, the preset metadata name may be composed of path + time + table name, such as /var/data/2020-05-22/10:00:00.test.
As is apparent from the above description, in this embodiment the table name is the truncated suffix of the metadata name. Thus, when the metadata name is "/var/data/2020-05-22/10:00:00.test", the metadata stored under /var/data/2020-05-22/10:00:00 with the table name "test" can be found based on the path in the metadata name and the truncated suffix.
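Parsing such a metadata name into its path, time and table-name components can be sketched as follows (a hypothetical illustration; the patent does not prescribe a concrete parsing routine):

```python
def parse_metadata_name(metadata_name: str):
    """Split a metadata name of the form path + time + table name,
    e.g. /var/data/2020-05-22/10:00:00.test, into its directory,
    time component, and table name (the truncated suffix)."""
    directory, filename = metadata_name.rsplit("/", 1)
    time_part, table_name = filename.rsplit(".", 1)
    return directory, time_part, table_name

d, t, table = parse_metadata_name("/var/data/2020-05-22/10:00:00.test")
```

The table name obtained this way is then used to look up the matching stored metadata.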
In addition, regarding the column names included in the metadata, in a specific implementation a person skilled in the art may define them in MySQL, for example setting the column names to ID, NAME, AGE and so on.
In addition, in practical applications, the preset metadata name may be stored in a designated storage area of the electronic device that implements the data processing method of this embodiment (for example, a developer stores the name in advance, before data encapsulation), may be stored in another electronic device able to communicate with it, or may be input in real time as needed while the method is being executed; this embodiment does not limit the choice.
To facilitate understanding of the specific form of the metadata, the following is described in connection with examples:
assume that the content recorded in the source data is:
1, Zhang San, 23
2, lie four, 14
3, wangwu, 89
Then, based on the above data to be processed, the predefined metadata is shown in Table 6:
TABLE 6 metadata of Source data
Column name    Type      Nullable    Default value
ID             INT       N           0
NAME           STRING    Y
AGE            INT       Y
Specifically, "ID" in Table 6 corresponds to the identification number of each user in the source data, such as 1, 2 and 3 above; "NAME" corresponds to each user's name, such as Zhang San, Li Si and Wang Wu; and "AGE" corresponds to each user's age, such as 23, 14 and 89. The metadata also specifies the type of each column (i.e., the column type mentioned above), such as the integer type INT for "ID" and "AGE" and the STRING type for "NAME"; whether the column may be null, with "Y" for nullable and "N" for not nullable (i.e., the column description information); and whether a default value (i.e., the column default value) needs to be set.
As shown in Table 6, the default-value portion may record some extended content according to actual service needs, for example the "0" recorded for "ID"; it may also record nothing, as in the default-value portions corresponding to "NAME" and "AGE".
It should be understood that the above is only an example and does not limit the technical solution of this embodiment in any way; in practical applications, a person skilled in the art may make settings as needed.
Step 202, obtaining metadata to be packaged according to the metadata and a preset filtering condition.
Specifically, in this embodiment, the filtering condition is specifically a preset column NAME, such as "ID, NAME".
As can be seen from the metadata example (Table 6) given in step 201, when the received preset filtering condition is "ID, NAME", the information (column name, type, column description information, column default value) corresponding to the two columns named "ID" and "NAME" is screened out of the metadata; that is, the metadata to be encapsulated is the information related to the column names "ID" and "NAME".
For ease of understanding, still taking the metadata given in Table 6 as an example, when the received preset filtering condition is "ID, NAME", the metadata to be encapsulated obtained from the metadata in Table 6 and the preset filtering condition "ID, NAME" is shown in Table 7.
Table 7 metadata to be packaged
Column name    Type      Nullable    Default value
ID             INT       N           0
NAME           STRING    Y
It should be understood that the above is only illustrative and does not limit the technical solution of this embodiment in any way.
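The screening of Table 7 out of Table 6 can be sketched as a simple filter over per-column metadata records (a hypothetical illustration of the step above, not the patent's actual implementation):

```python
# Metadata as in Table 6: one record per column of the source data.
metadata = [
    {"column": "ID",   "type": "INT",    "nullable": "N", "default": "0"},
    {"column": "NAME", "type": "STRING", "nullable": "Y", "default": None},
    {"column": "AGE",  "type": "INT",    "nullable": "Y", "default": None},
]

def filter_metadata(metadata, preset_columns):
    """Keep only the columns named in the preset filtering condition."""
    wanted = set(preset_columns)
    return [m for m in metadata if m["column"] in wanted]

to_encapsulate = filter_metadata(metadata, ["ID", "NAME"])
```

Applying the condition "ID, NAME" leaves exactly the two records shown in Table 7.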
Further, the preset column name may be stored in the electronic device in advance, similar to the preset metadata name in step 201, or may be input in real time according to business needs when implementing the data processing method in the present embodiment.
Step 203, marking the source data specified in the metadata as an elastic distributed data set character string type object.
As can be seen from the above description, the metadata includes a path and a file name of the source data. Therefore, when the source data specified in the metadata is marked as an elastic distributed data set string type object, specifically, a source data file recording the source data is determined according to the source data path recorded in the metadata and the file name of the source data, and then the source data in the source data file is marked as an elastic distributed data set string type object.
To facilitate understanding of the above elastic distributed data set string type object, this embodiment takes the Java programming language as an example: for Java, the elastic distributed data set string type object is specifically a JavaRDD&lt;String&gt; object.
That is to say, in practical application, the specific format corresponding to the elastic distributed data set string type object is named based on different programming languages, which is not limited in this embodiment, and those skilled in the art can set the format as needed.
And 204, converting the elastic distributed data set character string type object into an elastic distributed data set structured type object by taking the metadata to be packaged as a filtering condition.
Still taking the Java programming language as an example, when the elastic distributed dataset string type object is a JavaRDD&lt;String&gt; object, the elastic distributed dataset structured type object obtained by the conversion is likewise for Java; that is, in this embodiment it is specifically a JavaRDD&lt;Row&gt; object.
In addition, as can be seen from the above description, the metadata further includes information about the columns of the source data, such as column names, column types, column description information and column default values. Therefore, when the JavaRDD&lt;String&gt; object is converted into a JavaRDD&lt;Row&gt; object with the metadata to be encapsulated as the filtering condition, the conversion is performed according to the preset column names and the related column information. For example, based on an input column name, all the related information corresponding to that column is found in the JavaRDD&lt;String&gt; object and marked in Row form, thereby obtaining the JavaRDD&lt;Row&gt; object.
It should be understood that, since the JavaRDD&lt;String&gt; and JavaRDD&lt;Row&gt; objects are both common objects, their use and the conversion between them can be implemented by those skilled in the art with reference to the relevant materials, and the description is omitted here.
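The string-to-structured conversion of step 204 can be sketched in plain Python (the patent performs it as JavaRDD&lt;String&gt; to JavaRDD&lt;Row&gt; in Spark; the schema tuples and comma separator below are assumptions for illustration):

```python
schema = [("ID", int, 0), ("NAME", str, None), ("AGE", int, None)]  # (name, type, default)

def to_rows(lines, schema):
    """Convert raw comma-separated source lines into structured rows,
    applying the column types and default values from the schema."""
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        row = {}
        for (name, typ, default), value in zip(schema, fields):
            row[name] = typ(value) if value != "" else default
        rows.append(row)
    return rows

rows = to_rows(["1, Zhang San, 23", "2, Li Si, 14"], schema)
```

Each raw line of the source data from step 201's example becomes one typed row keyed by the column names in the metadata.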
In addition, it should be noted that, in practical applications, before the step 204 is executed, the metadata to be packaged may be converted into a schema format.
Correspondingly, when the elastic distributed dataset string type object (for Java, the JavaRDD&lt;String&gt; object) is converted into the elastic distributed dataset structured type object (for Java, the JavaRDD&lt;Row&gt; object) with the metadata to be encapsulated as the filtering condition, the conversion specifically uses the schema as the filtering condition.
Step 205, converting the metadata to be encapsulated and the elastic distributed data set structured type object into a Dataset < Row > object.
Namely, the converted Dataset < Row > object records the relationship between the metadata to be encapsulated and the elastic distributed data set structured type object.
Step 206, encapsulating the metadata to be encapsulated, the elastic distributed data set structured type object and the Dataset < Row > object in the same data object to obtain an RDD object for SQL statement query.
As can be seen from the above description, since the metadata to be packaged can be converted into schema, the operation in step 206 may specifically be: and encapsulating the schema, the elastic distributed data set structured type object and the Dataset < Row > object in the same data object, thereby obtaining the RDD object for SQL statement query.
In addition, in this embodiment, the JavaRDD&lt;String&gt;, JavaRDD&lt;Row&gt; and Dataset&lt;Row&gt; objects all store references to the corresponding data, not the data itself.
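The encapsulation of step 206 can be sketched as a small wrapper object (a toy stand-in for the SmartRDD; the real object wraps Spark's schema, JavaRDD&lt;Row&gt; and Dataset&lt;Row&gt;, and the `select` method here merely imitates an SQL query):

```python
class SmartRDD:
    """Minimal stand-in for the SmartRDD wrapper: it bundles the schema
    (metadata to be encapsulated), the structured rows, and the table
    name used for SQL-style queries, storing references rather than copies."""
    def __init__(self, schema, rows, table_name):
        self.schema = schema
        self.rows = rows          # a reference, not a copy of the data
        self.table_name = table_name

    def select(self, *columns):
        """A toy projection query, in place of a real SQL engine."""
        return [{c: r[c] for c in columns} for r in self.rows]

rdd = SmartRDD([("ID", "INT"), ("NAME", "STRING")],
               [{"ID": 1, "NAME": "Zhang San"}], "test")
names = rdd.select("NAME")
```

Bundling schema and data in one object is what lets a developer query by column name without touching the underlying Spark machinery.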
It is not difficult to see from the above description that the data processing method provided in this embodiment marks the source data as an elastic distributed data set string type object according to the metadata corresponding to the source data, filters the metadata to be encapsulated out of the metadata according to the preset filtering condition, converts the string type object into an elastic distributed data set structured type object with the metadata to be encapsulated as the filtering condition, converts the metadata to be encapsulated and the structured type object into a Dataset&lt;Row&gt; object, and encapsulates the metadata to be encapsulated, the structured type object and the Dataset&lt;Row&gt; object into a data object in RDD format that can be queried with SQL statements. A developer can therefore carry out big-data Spark development using only SQL statements, without a deep understanding of the Spark principle and underlying technology, further reducing the investment of labor cost.
A third embodiment of the present invention relates to a data processing method. The third embodiment mainly encapsulates the configuration data to obtain a simple entity Bean object for SQL statement query.
As shown in fig. 3, the data processing method according to the third embodiment includes the steps of:
step 301, obtaining predefined metadata of configuration data according to a preset metadata name.
Step 302, obtaining metadata to be encapsulated according to the metadata and a preset filtering condition.
It is to be understood that steps 301 and 302 in this embodiment are substantially the same as steps 201 and 202 in the second embodiment, and are not repeated here.
Step 303, converting the configuration data specified in the metadata into a structured array by using the metadata to be packaged as a filtering condition.
Specifically, in this embodiment the metadata includes the path of the configuration data and the file name of the configuration data. Therefore, when converting the configuration data specified in the metadata into a structured array with the metadata to be encapsulated as the filtering condition, a configuration data file recording the configuration data is first determined according to the path and file name recorded in the metadata; the configuration data recorded in the file is then read line by line, and each line read is split to obtain a string array; finally, the string array is converted into a structured array according to the number of elements to be encapsulated.
The operation of splitting each line of the read configuration data may, in practical applications, be based on a separator input by the user or on the system's default separator; a person skilled in the art may set this as needed, and this embodiment does not limit it.
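The line-by-line read, split and conversion of step 303 can be sketched as follows (a plain-Python illustration; the separator and element types are assumptions, and the patent performs this inside its configuration-data encapsulation):

```python
def parse_config(lines, separator=",", types=(int, str)):
    """Read configuration data line by line, split each line on the
    separator into a string array, then convert it to a structured
    array according to the number and types of elements to encapsulate."""
    structured = []
    for line in lines:
        parts = [p.strip() for p in line.split(separator)]
        structured.append(tuple(t(p) for t, p in zip(types, parts)))
    return structured

config = parse_config(["1, Zhang San", "2, Li Si"])
```

The resulting structured array is what then gets encapsulated, together with the metadata, into the Bean object of step 304.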
Furthermore, it is worth mentioning that, since the predefined metadata in practical applications usually includes column information for a plurality of columns, the columns may, for convenience of management, be stored in a set, so that the metadata to be encapsulated that satisfies the condition can be found from the column names input by the user.
Accordingly, in practical applications, if the screened metadata to be packaged includes column information of a plurality of columns, for convenience of management, the columns in the metadata to be packaged may also be stored in a set.
And 304, encapsulating the metadata to be encapsulated and the structured array in the same data object to obtain a Bean object for SQL statement query.
As can easily be seen from the above description, the data processing method provided in this embodiment encapsulates configuration data with a low update frequency into a SmartBean, and provides a custom interface to manage SmartBeans, thereby enabling their reuse: the same SmartBean can be used by different services, repeated development is avoided, and development efficiency is improved.
In addition, both the SmartBean and the SmartRDD store the metadata to be encapsulated screened out under the same filtering condition, which makes it convenient to associate them through the data processing interface provided by the data processing framework, reduces the consumption of device resources as far as possible without adding Spark jobs, and saves device cost.
The steps of the above methods are divided only for clarity of description; in implementation they may be combined into one step, or a step may be split into multiple steps, and all such variants fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing its core design also falls within the protection scope of this patent.
A fourth embodiment of the present invention relates to a data processing apparatus, as shown in fig. 4, including: an instruction acquisition module 401, a task decomposition module 402, a data acquisition module 403, and a data processing module 404.
The instruction obtaining module 401 is configured to obtain a data processing instruction and extract a job type, a time interval and a time granularity from it; the task decomposition module 402 is configured to divide the time interval into multiple sub-intervals according to the time granularity, based on a task decomposition class provided by a predefined data processing framework, and to generate a subtask corresponding to each sub-interval according to the job type; the data obtaining module 403 is configured to call, through the subtask, a data processing interface provided by the data processing framework and obtain the data to be processed, which includes source data and configuration data; and the data processing module 404 is configured to process the source data and the configuration data based on the data processing interface provided by the data processing framework, and to store the processed data as a parquet file in a column storage format.
In addition, in another example, the data processing apparatus further includes a source data encapsulation module and a configuration data encapsulation module.
Specifically, the source data encapsulation module is configured to encapsulate the source data to obtain an elastic distributed data set RDD object for SQL statement query.
Correspondingly, the configuration data packaging module is used for packaging the configuration data to obtain the simple entity Bean object for SQL statement query.
In addition, in another example, the source data encapsulation module is specifically configured to obtain metadata of predefined source data according to a preset metadata name; obtaining metadata to be packaged according to the metadata and preset filtering conditions; marking the source data specified in the metadata as an elastic distributed data set string type object; converting the elastic distributed data set character string type object into an elastic distributed data set structured type object by taking the metadata to be packaged as a filtering condition; converting the metadata to be packaged and the elastic distributed data set structured type object into a Dataset < Row > object; and encapsulating the metadata to be encapsulated, the elastic distributed data set structured type object and the Dataset < Row > object in the same data object to obtain an RDD object for SQL statement query.
In addition, in another example, the configuration data encapsulation module is specifically configured to obtain metadata of predefined configuration data according to a preset metadata name; obtaining metadata to be packaged according to the metadata and preset filtering conditions; converting the configuration data appointed in the metadata into a structured array by taking the metadata to be packaged as a filtering condition; and encapsulating the metadata to be encapsulated and the structured array in the same data object to obtain a Bean object for SQL statement query.
In addition, in another example, the data processing module 404 is specifically configured to associate the RDD object and the Bean object based on the data processing interface provided by the data processing framework to obtain an associated object in RDD format, and to save that associated object as a parquet file in a column storage format.
Further, in another example, the data processing apparatus further includes a time granularity detection module and a time granularity rounding module.
Specifically, the time granularity detection module is configured to detect whether the time granularity satisfies a preset time granularity value rule, where the rule specifies that the time granularity must be an integer multiple of the generation granularity of the data to be processed.
Correspondingly, if the rule is satisfied, the task decomposition module 402 is triggered to divide the time interval into a plurality of sub-intervals according to the time granularity, based on the task decomposition class provided by the predefined data processing framework, and to generate the subtask corresponding to each sub-interval according to the job type; otherwise, the time granularity rounding module is notified to round the time granularity.
Correspondingly, after the time granularity rounding module rounds the time granularity, the task decomposition module 402 is specifically configured to divide the time interval into a plurality of sub-intervals according to the rounded time granularity based on a task decomposition class provided by a predefined data processing framework, and generate a sub-task corresponding to each sub-interval according to the job type.
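The granularity check and rounding can be sketched as follows (an illustration only; the patent does not specify the rounding direction, so rounding up to the nearest valid multiple is an assumption here):

```python
def normalize_granularity(time_granularity: int, generation_granularity: int) -> int:
    """Enforce the time-granularity value rule: the time granularity must be
    an integer multiple of the granularity at which the data is generated.
    If it is not, round it up to the nearest valid multiple (assumed policy)."""
    if time_granularity % generation_granularity == 0:
        return time_granularity
    return ((time_granularity // generation_granularity) + 1) * generation_granularity

ok = normalize_granularity(60, 5)      # already a multiple: kept as-is
rounded = normalize_granularity(7, 5)  # not a multiple: rounded up to 10
```

Only after this normalization does the task decomposition module divide the time interval into sub-intervals.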
Further, in another example, the data processing apparatus further includes a time interval detection module and an abnormality prompting module.
Specifically, the time interval detection module is configured to detect whether the end time corresponding to the time interval is greater than the start time corresponding to the time interval.
Correspondingly, if the end time is greater, the task decomposition module 402 is triggered to divide the time interval into a plurality of sub-intervals according to the time granularity, based on the task decomposition class provided by the predefined data processing framework, and to generate the subtask corresponding to each sub-interval according to the job type; otherwise, the abnormality prompting module is notified to issue an abnormality prompt.
In addition, in another example, the data processing apparatus further comprises a data processing framework encapsulation module.
Specifically, the data processing framework encapsulation module is configured to abstract a task decomposition class and a data processing class based on a Spark framework; constructing a data processing interface for the data processing class; and packaging the environment initialization method of the Spark framework, the read-write method of the parquet file provided by Spark SQL, the task decomposition class and the data processing interface to obtain the data processing framework.
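The framework encapsulation described above can be sketched as a small skeleton (a toy stand-in: real versions of `init_environment`, the parquet read/write methods, and the handler classes would wrap Spark; all names here are hypothetical):

```python
class DataProcessingFramework:
    """Skeleton of the framework: environment initialization, a
    task-decomposition method, and a data-processing interface that
    delegates to a developer-supplied service class."""
    def init_environment(self):
        self.env = {"initialized": True}   # stands in for Spark session setup

    def decompose(self, start, end, step):
        """Divide the integer interval [start, end] into sub-intervals of width step."""
        return [(i, min(i + step - 1, end)) for i in range(start, end + 1, step)]

    def process(self, subtask, handler):
        """Data-processing interface: hand each subtask to the
        developer's handler (playing the role of an AParquetAction)."""
        return handler(subtask)

fw = DataProcessingFramework()
fw.init_environment()
subtasks = fw.decompose(1, 10, 4)
result = fw.process(subtasks[0], lambda t: f"processed {t}")
```

The developer supplies only the handler; decomposition, environment setup and file IO stay inside the framework.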
It will be appreciated that this embodiment is an apparatus embodiment corresponding to the first, second or third embodiment and that this embodiment may be implemented in conjunction with the first, second or third embodiment. The related technical details mentioned in the first, second, or third embodiment are still valid in this embodiment, and are not repeated here for the sake of reducing repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the first, or second, or third embodiment.
It should be noted that each module referred to in this embodiment is a logical module, and in practical applications, one logical unit may be one physical unit, may be a part of one physical unit, and may be implemented by a combination of multiple physical units. In addition, in order to highlight the innovative part of the present invention, elements that are not so closely related to solving the technical problems proposed by the present invention are not introduced in the present embodiment, but this does not indicate that other elements are not present in the present embodiment.
A fifth embodiment of the present invention relates to an electronic device, as shown in fig. 5, including at least one processor 501; and a memory 502 communicatively coupled to the at least one processor 501; the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501, so that the at least one processor 501 can execute the data processing method described in the first or second embodiment.
The memory 502 and the processor 501 are coupled by a bus, which may include any number of interconnected buses and bridges that couple one or more of the various circuits of the processor 501 and the memory 502 together. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor 501 is transmitted over a wireless medium through an antenna, which further receives the data and transmits the data to the processor 501.
The processor 501 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 502 may be used to store data used by processor 501 in performing operations.
A sixth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program realizes the above-described data processing method embodiments when executed by a processor.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the above embodiments may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments for practicing the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A data processing method, comprising:
acquiring a data processing instruction, and extracting a job type, a time interval and a time granularity from the data processing instruction;
dividing the time interval into a plurality of subintervals according to the time granularity based on a task decomposition class provided by a predefined data processing framework, and generating a subtask corresponding to each subinterval according to the operation type;
calling a data processing interface provided by the data processing framework through the subtask to acquire data to be processed, wherein the data to be processed comprises source data and configuration data;
and processing the source data and the configuration data based on the data processing interface provided by the data processing framework, and saving the processed data as a parquet file in a column-oriented storage format.
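The decomposition step of claim 1 can be illustrated with a short sketch. The following Python is not from the patent: the function name, the subtask dictionary shape, and the use of `datetime`/`timedelta` are illustrative assumptions about how a time interval might be split by granularity into one subtask per subinterval.

```python
from datetime import datetime, timedelta

def decompose(start, end, granularity, job_type):
    """Split [start, end) into subintervals of width `granularity` and
    emit one subtask descriptor per subinterval (illustrative shape)."""
    subtasks = []
    cursor = start
    while cursor < end:
        sub_end = min(cursor + granularity, end)
        subtasks.append({"job_type": job_type, "start": cursor, "end": sub_end})
        cursor = sub_end
    return subtasks

tasks = decompose(datetime(2020, 8, 12, 0, 0),
                  datetime(2020, 8, 12, 3, 0),
                  timedelta(hours=1),
                  "traffic_aggregation")
print(len(tasks))  # 3 subtasks, one per hour
```

Each subtask would then independently call the framework's data processing interface for its own subinterval, which is what makes the per-granularity parallelism of the claim possible.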
2. The data processing method of claim 1, wherein before the processing the source data and the configuration data based on the data processing interface provided by the data processing framework, the method further comprises:
encapsulating the source data to obtain a resilient distributed dataset (RDD) object for SQL statement query;
and encapsulating the configuration data to obtain a simple entity Bean object for SQL statement query.
3. The data processing method according to claim 2, wherein the encapsulating the source data to obtain a resilient distributed dataset (RDD) object for SQL statement query comprises:
acquiring predefined metadata of the source data according to a preset metadata name;
obtaining metadata to be encapsulated according to the metadata and a preset filtering condition;
marking the source data specified in the metadata as a resilient distributed dataset string-type object;
converting the resilient distributed dataset string-type object into a resilient distributed dataset structured-type object by taking the metadata to be encapsulated as a filtering condition;
converting the metadata to be encapsulated and the resilient distributed dataset structured-type object into a Dataset<Row> object;
and encapsulating the metadata to be encapsulated, the resilient distributed dataset structured-type object, and the Dataset<Row> object in the same data object to obtain the RDD object for SQL statement query.
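The claim's pipeline (raw strings → metadata filter → structured records → one queryable wrapper object) relies on Spark's RDD and Dataset<Row> types, which cannot run outside a Spark session. The sketch below emulates only the data flow with plain Python lists and dicts; every name, the field layout, and the filtering condition are invented for illustration and are not the patent's implementation.

```python
# Hypothetical metadata: field name -> position of that field in a raw log line.
metadata = {"timestamp": 0, "domain": 1, "bytes": 2, "debug_flag": 3}
# Assumed preset filtering condition: drop internal fields from the schema.
wanted = {k: v for k, v in metadata.items() if not k.endswith("_flag")}

# Stand-in for the RDD[String] holding the raw source data.
raw_lines = ["1597190400 example.com 512 0",
             "1597190401 example.org 1024 1"]

def to_structured(line):
    """Keep only the fields named in the filtered metadata (stand-in for
    the string-type -> structured-type RDD conversion)."""
    parts = line.split()
    return {name: parts[pos] for name, pos in wanted.items()}

structured = [to_structured(l) for l in raw_lines]   # stand-in for Dataset<Row>
# Bundle schema and rows in one object, mirroring the final encapsulation step.
query_object = {"schema": list(wanted), "rows": structured}
print(query_object["rows"][0]["domain"])
```

In actual Spark code the analogous steps would be `sparkContext.textFile(...)`, a `map` to `Row` objects, and `createDataFrame`, after which the result can be registered for SQL queries.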
4. The data processing method according to claim 2, wherein the encapsulating the configuration data to obtain a simple entity Bean object for SQL statement query comprises:
acquiring predefined metadata of the configuration data according to a preset metadata name;
obtaining metadata to be encapsulated according to the metadata and a preset filtering condition;
converting the configuration data specified in the metadata into a structured array by taking the metadata to be encapsulated as a filtering condition;
and encapsulating the metadata to be encapsulated and the structured array in the same data object to obtain the Bean object for SQL statement query.
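The configuration-data path of claim 4 mirrors claim 3 but produces a simple entity (Bean) rather than an RDD. As a hedged illustration only, the sketch below uses a metadata map of field names to Python types as a stand-in for the predefined configuration metadata; the field names, filter set, and tuple layout are all assumptions.

```python
# Hypothetical configuration metadata: field name -> target type.
config_metadata = {"domain": str, "quota_gb": int, "region_code": str}
preset_filter = {"domain", "quota_gb"}            # assumed filtering condition
to_wrap = {k: t for k, t in config_metadata.items() if k in preset_filter}

# Raw configuration rows as they might arrive, column order as in metadata.
raw_config = [("example.com", "100", "CN-FJ"), ("example.org", "50", "CN-GD")]
fields = list(config_metadata)

# Convert the specified configuration data into a structured, typed array.
structured_array = [
    {name: to_wrap[name](row[fields.index(name)]) for name in to_wrap}
    for row in raw_config
]
# Bundle metadata and rows together: stand-in for the queryable Bean object.
bean = {"metadata": to_wrap, "rows": structured_array}
```

The point of pairing the filtered metadata with the structured array in one object is that a downstream SQL layer can resolve column names and types without re-reading the raw configuration source.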
5. The data processing method according to claim 2, wherein the processing the source data and the configuration data based on the data processing interface provided by the data processing framework and saving the processed data as a parquet file in a column-oriented storage format comprises:
associating the RDD object with the Bean object based on the data processing interface provided by the data processing framework to obtain an associated object in RDD format;
and saving the associated object in RDD format as a parquet file in a column-oriented storage format.
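The association step of claim 5 is, in effect, a join between the encapsulated source rows and the encapsulated configuration rows on a shared key. The sketch below emulates that join in plain Python; the field names and the choice of "domain" as the join key are illustrative assumptions, not the patent's schema.

```python
source_rows = [{"domain": "example.com", "bytes": 512},
               {"domain": "example.org", "bytes": 1024}]
config_rows = [{"domain": "example.com", "customer": "acme"}]

# Inner-join the two record sets on "domain" (the assumed shared key).
by_domain = {c["domain"]: c for c in config_rows}
joined = [{**s, **by_domain[s["domain"]]}
          for s in source_rows if s["domain"] in by_domain]
print(joined)  # one joined record, for example.com only
```

In Spark, the analogous operations are `Dataset.join` on the key column followed by `DataFrameWriter.parquet(path)`, which persists the associated result in the column-oriented parquet format named in the claim.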
6. The data processing method according to claim 1, wherein before the dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework and generating a subtask corresponding to each subinterval according to the job type, the method further comprises:
detecting whether the time granularity conforms to a preset time granularity value rule, wherein the time granularity value rule specifies that the time granularity is an integral multiple of the generation granularity of the data to be processed;
if so, executing the dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework and generating a subtask corresponding to each subinterval according to the job type;
otherwise, rounding the time granularity;
wherein the dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework and generating a subtask corresponding to each subinterval according to the job type comprises:
dividing the time interval into a plurality of subintervals according to the rounded time granularity based on the task decomposition class provided by the predefined data processing framework, and generating the subtask corresponding to each subinterval according to the job type.
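The value rule of claim 6 can be sketched as a small normalization helper. The claim only says the granularity is "rounded" when it is not an integral multiple of the generation granularity; rounding to the nearest non-zero multiple, as below, is one plausible reading and is an assumption, as are the minute-based units.

```python
def normalize_granularity(granularity_min, generation_min):
    """Enforce the value rule: the requested time granularity must be an
    integral multiple of the granularity at which the data to be processed
    is generated; otherwise round it to the nearest such multiple
    (rounding scheme assumed, not specified by the claim)."""
    if granularity_min % generation_min == 0:
        return granularity_min
    multiple = max(1, round(granularity_min / generation_min))
    return multiple * generation_min

print(normalize_granularity(10, 5))   # already a multiple -> 10
print(normalize_granularity(7, 5))    # rounded down to 5
```

Normalizing before decomposition guarantees every subinterval aligns with whole generation periods of the underlying data, so no subtask reads a partial data file.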
7. The data processing method according to claim 6, wherein before the dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework and generating a subtask corresponding to each subinterval according to the job type, the method further comprises:
detecting whether the ending time corresponding to the time interval is greater than the starting time corresponding to the time interval;
if so, executing the dividing the time interval into a plurality of subintervals according to the time granularity based on the task decomposition class provided by the predefined data processing framework and generating a subtask corresponding to each subinterval according to the job type;
otherwise, issuing an exception prompt.
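The precondition in claim 7 amounts to a sanity check that the interval's end comes after its start; only then is the interval decomposed, otherwise an exception prompt is raised. A minimal sketch (function name and exception type are illustrative):

```python
def check_interval(start_ts, end_ts):
    """Accept the interval only when its ending time is later than its
    starting time; otherwise raise, standing in for the claim's
    exception prompt."""
    if end_ts <= start_ts:
        raise ValueError(f"invalid interval: start={start_ts} end={end_ts}")
    return True
```

Performing this check before the check in claim 6 and before decomposition means malformed jobs fail fast, before any Spark resources are allocated.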
8. The data processing method according to any one of claims 1 to 7, wherein before the calling, through the subtask, the data processing interface provided by the data processing framework to acquire the data to be processed, the method further comprises:
abstracting a task decomposition class and a data processing class based on a Spark framework;
constructing a data processing interface for the data processing class;
and encapsulating the environment initialization method of the Spark framework, the read-write methods for parquet files provided by Spark SQL, the task decomposition class, and the data processing interface to obtain the data processing framework.
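The framework assembly of claim 8 (abstract a task decomposition class and a data processing class, expose an interface, bundle them with Spark setup and parquet I/O) can be sketched with abstract base classes. Everything here is illustrative: the class and method names are invented, and the Spark session initialization and parquet read/write are stubbed out since they need a live cluster.

```python
from abc import ABC, abstractmethod

class TaskDecomposer(ABC):
    """Abstracted task decomposition class of the framework."""
    @abstractmethod
    def decompose(self, start, end, granularity, job_type): ...

class DataProcessor(ABC):
    """Data processing interface a concrete job must implement."""
    @abstractmethod
    def load(self, subtask): ...
    @abstractmethod
    def process(self, source, config): ...

class Framework:
    """Bundles the decomposer and the processor; in a real deployment this
    would also own Spark environment initialization and parquet I/O."""
    def __init__(self, decomposer: TaskDecomposer, processor: DataProcessor):
        self.decomposer = decomposer
        self.processor = processor

    def run(self, start, end, granularity, job_type):
        results = []
        for sub in self.decomposer.decompose(start, end, granularity, job_type):
            source, config = self.processor.load(sub)
            results.append(self.processor.process(source, config))
        return results

# Toy concrete implementations to exercise the skeleton.
class SlotDecomposer(TaskDecomposer):
    def decompose(self, start, end, granularity, job_type):
        return [{"job": job_type, "slot": t}
                for t in range(start, end, granularity)]

class SumProcessor(DataProcessor):
    def load(self, subtask):
        return [subtask["slot"]] * 2, {"factor": 10}
    def process(self, source, config):
        return sum(source) * config["factor"]

fw = Framework(SlotDecomposer(), SumProcessor())
out = fw.run(0, 4, 2, "demo")
print(out)
```

The design rationale suggested by the claim is conventional inversion of control: jobs implement only `decompose`/`load`/`process`, while Spark bootstrap and parquet persistence live once in the framework.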
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1 to 8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the data processing method of any one of claims 1 to 8.
CN202010808517.9A 2020-08-12 2020-08-12 Data processing method, electronic device and storage medium Active CN112052253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010808517.9A CN112052253B (en) 2020-08-12 2020-08-12 Data processing method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112052253A true CN112052253A (en) 2020-12-08
CN112052253B CN112052253B (en) 2023-12-01

Family

ID=73602496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010808517.9A Active CN112052253B (en) 2020-08-12 2020-08-12 Data processing method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112052253B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032375A1 (en) * 2015-04-29 2018-02-01 Huawei Technologies Co., Ltd. Data Processing Method and Apparatus
CN109542889A (en) * 2018-10-11 2019-03-29 平安科技(深圳)有限公司 Stream data column storage method, device, equipment and storage medium
CN110147377A (en) * 2019-05-29 2019-08-20 大连大学 General polling algorithm based on secondary index under extensive spatial data environment
CN111104417A (en) * 2019-12-05 2020-05-05 苏宁云计算有限公司 Spark Sql external data source device, implementation method and system
CN111309463A (en) * 2020-02-05 2020-06-19 北京明略软件系统有限公司 Method and device for determining task execution time and readable storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612540A (en) * 2020-12-18 2021-04-06 北京达佳互联信息技术有限公司 Data model configuration method and device, electronic equipment and storage medium
CN112612540B (en) * 2020-12-18 2024-04-09 北京达佳互联信息技术有限公司 Data model configuration method, device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant