CN111241182A - Data processing method and apparatus, storage medium, and electronic apparatus - Google Patents


Info

Publication number
CN111241182A
Authority
CN
China
Prior art keywords
data
target
format
source
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010062392.XA
Other languages
Chinese (zh)
Inventor
祝梦遥
李仓良
杨学毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010062392.XA
Publication of CN111241182A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Abstract

The application provides a data processing method and device, a storage medium and an electronic device, wherein the method comprises the following steps: reading source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service in the plurality of services; converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship; analyzing the intermediate data, and extracting target field information of a target field of the intermediate data; assembling the target field information into target data in a target data format; and storing the target data into a target data table, wherein the target data table is used for storing the data in the target data format. By the method and the device, the problem that the service development efficiency is low due to complex operation and easy error in a data extraction mode in the related technology is solved, data conversion operation is simplified, and the service development efficiency is improved.

Description

Data processing method and apparatus, storage medium, and electronic apparatus
Technical Field
The present application relates to the field of computers, and in particular, to a data processing method and apparatus, a storage medium, and an electronic apparatus.
Background
At present, on the basis of big data and cloud computing, a large number of businesses adopt databases such as HBase to store massive amounts of data. HBase storage generally holds data in the binary PB format, and each Column stores a large amount of information, so an actual service party usually has to read HBase in batches. If the whole HBase data is read directly and then extracted and parsed, considerable resources are wasted and efficiency is reduced, and development and maintenance costs are generally high.
Therefore, in practical application, the data part (possibly only a small part of the whole HBase data) used by the service party is extracted into Hive in advance, which is convenient for the service party to use.
However, the process of extracting HBase data to Hive is implemented by hard-coding against the PB format; that is, each time a field in Hive needs to be added or modified, a series of steps such as modifying code, compiling, and packaging is required, which is tedious and error-prone and results in low service development efficiency.
Therefore, the data extraction method in the related art has the problem of low business development efficiency caused by complex operation and easy error.
Disclosure of Invention
The embodiment of the application provides a data processing method and device, a storage medium and an electronic device, and aims to at least solve the problem that the service development efficiency is low due to complex operation and easy error existing in a data extraction mode in the related technology.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: reading source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service in the plurality of services; converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship; analyzing the intermediate data, and extracting target field information of a target field of the intermediate data; assembling the target field information into target data in a target data format; and storing the target data into a target data table, wherein the target data table is used for storing the data in the target data format.
Optionally, before reading the source data in the source data format from the source database, the method further includes: reading configuration information, wherein the configuration information comprises: path information for representing a path of the target field in the intermediate data format; and constructing a parser corresponding to the intermediate data format according to the path information, wherein the parser is used for extracting the target field information from the intermediate data.
Optionally, the configuration information further includes data table information for indicating the target data table, and after reading the configuration information, the method further includes: and constructing a target data table of a target data table mode according to the data table information, wherein the target data table mode corresponds to the target data format.
Optionally, analyzing the intermediate data, and extracting target field information of a target field of the intermediate data includes: and analyzing the intermediate data by using an analyzer, and extracting target field information of a target field of the intermediate data according to a JSON path, wherein the intermediate data format is a JSON format, and the JSON path is a path of the target field corresponding to the target service in the JSON format.
According to another aspect of the embodiments of the present application, there is provided a data processing apparatus including: a first reading unit, configured to read source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service in the plurality of services; a conversion unit, configured to convert the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship; an extraction unit, configured to parse the intermediate data and extract target field information of a target field of the intermediate data; an assembling unit, configured to assemble the target field information into target data in a target data format; and a storage unit, configured to store the target data into a target data table, wherein the target data table is used for storing data in the target data format.
Optionally, the apparatus further comprises: a second reading unit, configured to read configuration information before reading the source data in the source data format from the source database, where the configuration information includes: path information for representing a path of the target field in the intermediate data format; and the first construction unit is used for constructing a parser corresponding to the intermediate data format according to the path information, wherein the parser is used for extracting the target field information from the intermediate data.
Optionally, the configuration information further includes data table information used for representing a target data table, and the apparatus further includes: and the second construction unit is used for constructing a target data table of the target data table mode according to the data table information after the configuration information is read, wherein the target data table mode corresponds to the target data format.
Optionally, the extraction unit is configured to: parse the intermediate data by using a parser, and extract target field information of a target field of the intermediate data according to a JSON path, wherein the intermediate data format is a JSON format, and the JSON path is a path of the target field corresponding to the target service in the JSON format.
In the embodiments of the present application, data conversion is performed by way of an intermediate data format with a hierarchical relationship: source data in a source data format is read from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service in the plurality of services; the source data is converted into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship; the intermediate data is parsed, and target field information of a target field of the intermediate data is extracted; the target field information is assembled into target data in a target data format; and the target data is stored in a target data table, wherein the target data table is used for storing data in the target data format. Because an intermediate data format that can be conveniently converted to and from other data formats (such as JSON or XML) is adopted, the data conversion operation can be simplified and errors in the conversion operation can be avoided. This achieves the technical effects of reducing service development and maintenance costs and improving service development efficiency, and solves the problem of low service development efficiency caused by complex and error-prone operation in the data extraction methods of the related art.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a block diagram of an alternative server hardware configuration according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative data processing method according to an embodiment of the present application;
FIG. 5 is a flow diagram of another alternative data processing method according to an embodiment of the present application;
FIG. 6 is a block diagram of an alternative data processing apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical terms referred to in the embodiments of the present application are explained below as follows:
HBase: Hadoop Database, a distributed database;
HDFS: Hadoop Distributed File System;
Hive: a Hadoop-based data warehouse tool;
JSON: JavaScript Object Notation;
PB format: Protobuf (Protocol Buffers) format.
According to an aspect of an embodiment of the present application, there is provided a data processing method. Alternatively, the method may be performed in a server or a similar computing device. Taking running on a server as an example, FIG. 1 is a block diagram of a hardware structure of an optional server according to an embodiment of the present application. As shown in FIG. 1, the server 10 may include one or more processors 102 (only one is shown in FIG. 1), wherein the processors 102 may include, but are not limited to, a processing device such as an MCU (Microcontroller Unit) or an FPGA (Field Programmable Gate Array), and a memory 104 for storing data. Optionally, the server may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration, and is not intended to limit the structure of the server. For example, the server 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the data processing method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to server 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 10. In one example, the transmission device 106 includes a NIC (Network Interface Controller) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be an RF (Radio Frequency) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a data processing method operating on the server is provided, and fig. 2 is a flowchart of an alternative data processing method according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
Step S202, reading source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service of the plurality of services;
Step S204, converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship;
Step S206, parsing the intermediate data, and extracting target field information of a target field of the intermediate data;
Step S208, assembling the target field information into target data in a target data format;
Step S210, saving the target data into a target data table, where the target data table is used to save the data in the target data format.
Alternatively, the executing subject of the above steps may be a server, etc., but is not limited thereto, and other devices capable of performing data processing may be used to execute the method in the embodiment of the present application.
In this embodiment, data conversion is performed by way of an intermediate data format with a hierarchical relationship. Because the intermediate data format (e.g., JSON or XML) can be conveniently converted to and from other data formats, the problem of low service development efficiency caused by complex and error-prone operation in the data extraction methods of the related art is solved, the data conversion operation is simplified, errors in the conversion operation are avoided, service development and maintenance costs are reduced, and service development efficiency is improved.
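As a rough illustration of steps S202 to S210, the following Python sketch chains the five steps over in-memory stand-ins. Everything named here is hypothetical and not taken from the application: the source "database" is a list of (service, bytes) pairs rather than HBase, the source bytes are already JSON text rather than PB, the paths use a simple dotted convention, and the target "table" is a Python list.

```python
import json

def read_source(source_db, target_service):
    """S202: read only the records belonging to the target service."""
    return [raw for service, raw in source_db if service == target_service]

def to_intermediate(raw):
    """S204: convert source bytes into hierarchical intermediate data."""
    return json.loads(raw.decode("utf-8"))

def extract(intermediate, paths):
    """S206: pull each target field out by a dotted path such as 'album.id'."""
    info = {}
    for column, path in paths.items():
        node = intermediate
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        info[column] = node
    return info

def assemble(info, columns):
    """S208: order the extracted values to match the target format."""
    return tuple(info[c] for c in columns)

def run_pipeline(source_db, target_service, paths, target_table):
    columns = list(paths)
    for raw in read_source(source_db, target_service):
        row = assemble(extract(to_intermediate(raw), paths), columns)
        target_table.append(row)  # S210: save into the target data table
    return target_table

# Hypothetical data: two services share one source store.
source_db = [
    ("video", b'{"album": {"id": 7, "title": "Demo"}}'),
    ("music", b'{"track": {"id": 9}}'),
]
paths = {"album_id": "album.id", "album_title": "album.title"}
rows = run_pipeline(source_db, "video", paths, [])
```

Only the record of the "video" service reaches the target list, and adding a column means adding one entry to `paths` rather than changing code.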
The data processing method in the embodiment of the present application is described below with reference to fig. 2.
In step S202, source data in a source data format is read from a source database, where the source database stores service data of multiple services, and the source data is service data corresponding to a target service in the multiple services.
The data processing method in the embodiment of the application can be applied to scenarios requiring data conversion, for example the following: the source database stores service data of a plurality of services; the data required by a service party (the source data, that is, the service data corresponding to a target service in the plurality of services) is extracted from the source database into a target data table, and the service party reads the data directly from the target data table when it needs to use the data. For example, the output fields can be specified by configuration, so that data from any data source that can be expressed as JSON can be extracted to any output data source.
For example, data (meta data such as video album) can be extracted from HBase in configurable manner into Hive table.
As an alternative embodiment, before reading the source data in the source data format from the source database, the configuration information may be read, where the configuration information includes: path information for representing a path of the target field in the intermediate data format; and constructing a parser corresponding to the intermediate data format according to the path information, wherein the parser is used for extracting the target field information from the intermediate data.
The user can store the path information in the configuration center. The path information represents the path of the target field in the intermediate data format, and determines which information in the intermediate data is extracted.
The configuration center (or other devices capable of reading data from the configuration center) may read the configuration information, obtain path information in the configuration information, and construct a parser corresponding to the intermediate data format according to the path information.
The parser can parse the data in the intermediate data format, so as to extract the data information in the target field in the data. For intermediate data, the parser may be used to extract target field information from the intermediate data.
For example, the user may store the JsonPath path of the target field (which represents the path of the target field in Json format) into the configuration center. The configuration center may generate the parser from the JsonPath path.
The path information has configurable characteristics. For different target data formats, different path information may be configured to define field information for extracting different fields in the intermediate data format, and different target data formats may differ in the number of extracted fields, order of fields, and the like.
Through this embodiment, constructing the parser from the configured path information can improve the accuracy and efficiency of intermediate data parsing; the configurable configuration information can adapt to conversion among different data formats, which improves the extensibility and replaceability of data format conversion.
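The construction of a parser from configured path information can be sketched as follows. This is a minimal illustration that supports only a small subset of JsonPath syntax (`$`, `.name`, and `[index]`); the function names `compile_path` and `build_parser` and the sample paths are assumptions made for the example, and a real deployment would more likely rely on an existing JsonPath library.

```python
import re

# One step of the supported subset: ".name" or "[index]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")

def compile_path(path):
    """Turn a path like '$.album.tags[0]' into a list of keys and indexes."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$': " + path)
    steps, pos = [], 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError("unsupported path syntax: " + path)
        key, index = match.groups()
        steps.append(key if key is not None else int(index))
        pos = match.end()
    return steps

def build_parser(path_info):
    """Build a parser (a plain function) from configured path information."""
    compiled = {field: compile_path(p) for field, p in path_info.items()}

    def parser(intermediate):
        result = {}
        for field, steps in compiled.items():
            node = intermediate
            for step in steps:
                try:
                    node = node[step]
                except (KeyError, IndexError, TypeError):
                    node = None  # a missing field is reported as None
                    break
            result[field] = node
        return result

    return parser

# Hypothetical configuration: two target fields and their paths.
parser = build_parser({"title": "$.album.title", "first_tag": "$.album.tags[0]"})
fields = parser({"album": {"title": "Demo", "tags": ["drama", "2020"]}})
```

Because the paths are compiled once when the parser is built, changing the extracted fields only requires changing the configured `path_info`.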
As an optional embodiment, the configuration information further includes data table information, and after the configuration information is read, a target data table of a target data table mode may be constructed according to the data table information, where the target data table mode corresponds to the target data format.
In addition to the path information, the data table information may also be stored in the configuration center. The data table information may be used to represent a target data table, for example, the address information of the target data table, the name of the target data table, and the Schema of the target data table. The Schema of a data table refers to the structure information of the data table, for example, the attribute represented by each column in the data table.
The configuration center (or other devices capable of reading data from the configuration center) can read the data table information in the configuration information, and construct a target data table of the target data table mode (Schema) according to the data table information. The target data table mode corresponds to the target data format, and data in the target data format can be stored in the target data table according to the correspondence between the target data format and the target data table mode.
For example, the user may also store information of the output data table in the configuration center. The configuration center can construct the Schema of the output table according to the information of the output data table.
According to the embodiment, the target data table of the target data table mode is constructed in advance according to the data table information in the configuration information, and the accuracy and the efficiency of target data storage can be improved.
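One way to picture constructing the target data table from data table information is to render a Hive `CREATE TABLE` statement from it. The sketch below only builds the statement text; the layout of `table_info`, the table name, the column types, and the location are all hypothetical values chosen for illustration.

```python
def build_create_table(table_info):
    """Render a Hive CREATE TABLE statement from data table information."""
    columns = ",\n  ".join(
        "`{}` {}".format(name, col_type) for name, col_type in table_info["columns"]
    )
    return (
        "CREATE TABLE IF NOT EXISTS {name} (\n  {columns}\n)\n"
        "PARTITIONED BY (`{partition}` STRING)\n"
        "LOCATION '{location}'"
    ).format(
        name=table_info["name"],
        columns=columns,
        partition=table_info["partition"],
        location=table_info["location"],
    )

# Hypothetical data table information as it might be read from configuration.
table_info = {
    "name": "album_snapshot",
    "columns": [("album_id", "BIGINT"), ("album_title", "STRING")],
    "partition": "dt",
    "location": "/warehouse/album_snapshot",
}
ddl = build_create_table(table_info)
```

The column order in `table_info["columns"]` is the order that assembled rows would later need to follow.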
In addition to the path information (and the data table information), the user may also store data source information in the configuration center. The data source information describes the source database, for example, the address information of the source database, the name of a data table in the source database, and the Schema of that data table.
For example, the source data is stored in HBase in PB format, and the intermediate data is data in Json format. The user can store the data source information, the information of the output data table (data table information) and the JsonPath path corresponding to each field in the configuration center. The server can construct a Parser, read the configuration into memory, and construct the Schema of the output table. The Json format is a data format with a hierarchical relationship in which different fields correspond to different paths. The JsonPath is the path of a target field corresponding to a target service in the Json format; that is, the JsonPath specifies the position of the target field in the Json format. There may be one or more target fields, and correspondingly, the JsonPath may be a combination of one or more field paths.
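The three kinds of configuration described above (data source information, output data table information, and a JsonPath per field) could be held in a single document. The following sketch assumes a JSON configuration with hypothetical key names (`source`, `output_table`, `fields`) and hypothetical table and column names; the application does not specify a configuration layout.

```python
import json

# Hypothetical configuration document; every key and value is illustrative.
CONFIG_TEXT = """
{
  "source": {"type": "hbase", "table": "content_meta", "format": "PB"},
  "output_table": {"name": "album_snapshot", "partition": "dt"},
  "fields": [
    {"column": "album_id", "type": "BIGINT", "json_path": "$.album.id"},
    {"column": "album_title", "type": "STRING", "json_path": "$.album.title"}
  ]
}
"""

def load_config(text):
    """Read the configuration and split it into the parts used later."""
    config = json.loads(text)
    for section in ("source", "output_table", "fields"):
        if section not in config:
            raise ValueError("missing config section: " + section)
    # Path information drives the parser; the schema drives the output table.
    path_info = {f["column"]: f["json_path"] for f in config["fields"]}
    schema = [(f["column"], f["type"]) for f in config["fields"]]
    return config["source"], config["output_table"], path_info, schema

source, output_table, path_info, schema = load_config(CONFIG_TEXT)
```

Keeping the JsonPath next to the column definition means that adding a field to the output table is a single configuration change.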
The source data may be stored in a source database in a source data format. The source database stores service data of a plurality of services. The service data of different services can be distinguished by different service identifications, and the service data of different service parties can be distinguished by different service party identifications.
After receiving the data request of the service party or before receiving the data request of the service party according to the data requirement of the service party, the data part used by the service party is pre-extracted into a target data table (e.g. Hive), which is convenient for the service party to use.
When data extraction is performed, source data in a source data format may be read from a source database. The source data may be all service data of a certain service party (target service party), may be service data of a certain service (target service) of a service party, and may also be service data of a certain service across a plurality of service parties. Accordingly, the source data in the source data format may be read from the source database according to at least one of the service identifier and the service party identifier. The read source data may be data having the same service identifier and/or the same service party identifier.
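Selecting source data by service identifier and/or service party identifier can be sketched as a simple filter. The row layout (`service_id`, `party_id`, `payload`) and the function name are assumptions for illustration; an actual HBase read would express the same condition through its own scan or row-key mechanism.

```python
def read_source_data(rows, service_id=None, party_id=None):
    """Return payloads whose identifiers match every identifier that is given."""
    if service_id is None and party_id is None:
        raise ValueError("at least one identifier is required")
    selected = []
    for row in rows:
        if service_id is not None and row["service_id"] != service_id:
            continue
        if party_id is not None and row["party_id"] != party_id:
            continue
        selected.append(row["payload"])
    return selected

# Hypothetical rows: two service parties, two services.
rows = [
    {"service_id": "album", "party_id": "search", "payload": b"a1"},
    {"service_id": "album", "party_id": "recommend", "payload": b"a2"},
    {"service_id": "episode", "party_id": "search", "payload": b"e1"},
]
all_of_party = read_source_data(rows, party_id="search")
one_service_of_party = read_source_data(rows, service_id="album", party_id="search")
one_service_all_parties = read_source_data(rows, service_id="album")
```

The three calls correspond to the three cases in the paragraph above.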
As an alternative embodiment, reading the source data in the source data format from the source database includes: reading source data in a PB format from an HBase database, wherein the source database is the HBase database, and the source data is in the PB format.
The source data may be saved to HBase in PB format, which may be conveniently converted to an intermediate data format (e.g., Json format).
With the present embodiment, saving the source data into HBase in the PB format can improve the efficiency of data saving and the efficiency of data conversion.
In step S204, the source data is converted into intermediate data in an intermediate data format, wherein the intermediate data format is a data format having a hierarchical relationship.
After the source data is acquired, the acquired source data may be converted into intermediate data in an intermediate data format. The conversion method may differ for different source data formats and target data formats.
As an alternative embodiment, converting the source data into intermediate data in the intermediate data format includes: and converting the source data in the PB format into intermediate data in the JSON format, wherein the source data is in the PB format, and the intermediate data is in the JSON format.
The PB-format source data can be converted into JSON-format intermediate data, the JSON format can be mutually converted with various data formats, the method has the characteristics of simple conversion operation and high conversion speed, and the PB-format data and the JSON-format data can be quickly converted.
For example, complex raw data (raw PB data) is read into memory and converted into JSON format. The conversion from PB format to JSON format uses an existing method, for example, a method implemented by Google (com.
With the present embodiment, by converting PB-format data into JSON-format data, the efficiency of data conversion can be improved.
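To make the PB-to-JSON step concrete without depending on any library, the sketch below hand-decodes a tiny subset of the Protobuf wire format (varint fields and length-delimited fields) and serializes the result as JSON. It is an illustration only: the schema, field numbers, and field names are hypothetical, repeated fields and the fixed-width wire types are not handled, and a production pipeline would use the Protobuf library's own JSON conversion instead.

```python
import json

def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7

def pb_to_dict(buf, schema):
    """Decode PB bytes into a dict using a {field number: (name, kind)} schema."""
    record, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited: string or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("unsupported wire type: %d" % wire_type)
        if field_number not in schema:
            continue              # fields outside the schema are skipped
        name, kind = schema[field_number]
        if isinstance(kind, dict):
            value = pb_to_dict(value, kind)   # nested message
        elif kind == "string":
            value = value.decode("utf-8")
        record[name] = value
    return record

# Hypothetical schema: id (varint), title (string), meta (nested message).
ALBUM_SCHEMA = {
    1: ("id", "int"),
    2: ("title", "string"),
    3: ("meta", {1: ("year", "int")}),
}
raw = (bytes([0x08, 0x96, 0x01, 0x12, 0x04]) + b"Demo"
       + bytes([0x1A, 0x03, 0x08, 0xE4, 0x0F]))
intermediate_json = json.dumps(pb_to_dict(raw, ALBUM_SCHEMA), sort_keys=True)
```

The nested `meta` message becomes a nested JSON object, which is the hierarchical relationship that the later path-based extraction relies on.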
In step S206, the intermediate data is analyzed, and the target field information of the target field of the intermediate data is extracted.
For the intermediate data in the intermediate data format, a corresponding parser can be used for data parsing, and field information of a specific field is extracted from the intermediate data, so as to assemble the target data format.
As an alternative embodiment, parsing the intermediate data, and extracting the target field information of the target field of the intermediate data includes: and analyzing the intermediate data by using an analyzer, and extracting target field information of a target field of the intermediate data according to a JSON path, wherein the intermediate data format is a JSON format, and the JSON path is a path of the target field corresponding to the target service in the JSON format.
The data can be parsed by using the Parser: Json-format data from the data source is extracted and converted into a data format matching the target table (target data table) by using a JsonPath path or a custom parsing method.
In addition, the Parser can support not only JsonPath conversion but also custom parsing methods; that is, a Java method is written and then called through Java reflection, so that a personalized parsing scheme with complex logic can be implemented.
For example, the output Hive table, and the JsonPath path or custom function corresponding to each field in the Hive table, may be configured, and the configured JsonPath path or custom function may be used to parse the intermediate data in the Json format.
Through the embodiment, the Json format intermediate data is subjected to data analysis through JsonPath, and the data analysis efficiency can be improved.
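The combination of path-based extraction and custom parsing methods can be sketched as follows. The text describes calling a Java method through reflection; the example uses Python's `getattr` lookup by name as a loose analogue, which is a substitution made for this illustration. The rule layout, the `CustomParsers` class, and the `join_tags` method are hypothetical.

```python
class CustomParsers:
    """Hypothetical holder for user-written parsing methods."""

    @staticmethod
    def join_tags(record):
        # Custom logic that a plain path cannot express: flatten a list.
        return "|".join(record.get("album", {}).get("tags", []))

def resolve_extractor(rule):
    """Return a function for a rule that names either a path or a custom method."""
    if "custom" in rule:
        # Look the method up by its configured name at run time.
        return getattr(CustomParsers, rule["custom"])
    path = rule["json_path"]
    if not path.startswith("$."):
        raise ValueError("unsupported path: " + path)
    keys = path[2:].split(".")

    def by_path(record):
        node = record
        for key in keys:
            node = node.get(key) if isinstance(node, dict) else None
        return node

    return by_path

# Hypothetical per-field rules, as they might be configured for a Hive table.
rules = {
    "title": {"json_path": "$.album.title"},
    "tags": {"custom": "join_tags"},
}
extractors = {column: resolve_extractor(rule) for column, rule in rules.items()}
record = {"album": {"title": "Demo", "tags": ["drama", "2020"]}}
parsed = {column: fn(record) for column, fn in extractors.items()}
```

Both kinds of rule resolve to the same shape of function, so the rest of the pipeline does not need to know which kind was configured.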
In step S208, the target field information is assembled into target data in a target data format.
After the field part of the service requirement in the source data (HBase data) is dynamically parsed, the parsed field information may be subjected to data assembly according to a target data format, so as to obtain target data in the target data format.
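Assembly into the target data format can be illustrated with a delimited text row. The sketch assumes that the target Hive table uses Hive's default text layout, where fields are separated by the Ctrl-A character and NULL is written as `\N`; the column names are hypothetical, and a table stored in another format would need a different assembler.

```python
FIELD_DELIMITER = "\x01"  # default field separator of Hive text tables (Ctrl-A)
NULL_MARKER = "\\N"       # default text representation of NULL in Hive

def assemble_row(field_info, columns):
    """Order extracted values by the table's columns and join them into a line."""
    cells = []
    for column in columns:
        value = field_info.get(column)
        cells.append(NULL_MARKER if value is None else str(value))
    return FIELD_DELIMITER.join(cells)

# The extracted fields arrive in arbitrary order; the schema fixes the order.
line = assemble_row(
    {"album_title": "Demo", "album_id": 7},
    ["album_id", "album_title", "year"],
)
```

A field that was not found during parsing (`year` here) is written as NULL rather than shifting the remaining columns.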
In step S210, the target data is saved into a target data table, wherein the target data table is used for saving data in a target data format.
The assembled target data may be saved to a target data table, where the target data table may be a Hive table.
As an alternative embodiment, saving the target data into the target data table includes: and outputting the target data to a distributed file system directory corresponding to the Hive table, and constructing a corresponding date partition, wherein the target data table is the Hive table.
Target data in the target data format can be output to the HDFS directory corresponding to the Hive table, and a corresponding date partition is constructed. (Most Hive tables are stored in HDFS, which is a storage medium for Hive; Hive data can also be stored in Amazon S3 and the like.) The Hive table is partitioned by date, with one snapshot taken every day, so that historical data can be conveniently traced.
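The date-partitioned output can be sketched with a local directory standing in for the HDFS directory of the Hive table. Hive lays partitions out as `key=value` subdirectories, and a partition can be registered with an `ALTER TABLE ... ADD PARTITION` statement. The partition column name `dt`, the file name `part-00000`, the table name, and the date are hypothetical choices for this example.

```python
import datetime
import os
import tempfile

def write_partition(table_dir, snapshot_date, lines):
    """Write rows under <table_dir>/dt=<date>/ and return the data file path."""
    partition_dir = os.path.join(table_dir, "dt=" + snapshot_date.isoformat())
    os.makedirs(partition_dir, exist_ok=True)
    data_file = os.path.join(partition_dir, "part-00000")
    with open(data_file, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    return data_file

def add_partition_sql(table_name, snapshot_date):
    """Render the statement that registers the date partition with Hive."""
    return "ALTER TABLE {} ADD IF NOT EXISTS PARTITION (dt='{}')".format(
        table_name, snapshot_date.isoformat()
    )

# A temporary local directory stands in for the table's HDFS directory.
table_dir = tempfile.mkdtemp()
day = datetime.date(2020, 1, 1)
path = write_partition(table_dir, day, ["7\x01Demo"])
sql = add_partition_sql("album_snapshot", day)
```

Writing each day's snapshot into its own partition directory leaves earlier days untouched, which is what makes the historical data traceable.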
After the target data is stored in the target data table, the requested data can be extracted from the target data table according to the request of the service party, and the requested data is sent to the service party, so that the data requirement of the service party is ensured, and the use experience of the service party is improved.
Through this embodiment, the target data is stored in the Hive table, so that the data processing efficiency can be improved and the use experience of the service party is ensured.
The following describes a data processing method in the embodiment of the present application with reference to an alternative example. The data processing method in this example may be applied in a content data warehouse system. By the data processing method in this example. And the huge data stored in the HBase is converted into Hive table data which is low in cost and easy to use according to the specific requirements of a service party, so that the development and maintenance cost is greatly reduced.
In this example, a plurality of technical solutions (PB/JSON path/Hive) are combined together to be used, and a format of PB format- > JSON format- (JSON path extraction conversion) > adapted Hive is implemented to replace an original relatively inefficient manner (writing code implements PB- > Hive), and meanwhile, the technical solutions (PB/JSON path/Hive) are all extensible and replaceable.
With reference to fig. 3, 4 and 5, the data processing method in the present example may include the following steps:
Step S502, the user stores the data source information, the output data table, and the JsonPath path corresponding to each field in the configuration center.
Step S504, a parser (Parser) is constructed, the configuration is read and loaded into memory, and the schema of the output table is constructed.
Step S506, the complex raw data is read into memory and converted into the JSON format.
Step S508, the task is submitted to an executor and computed under a big data computing framework.
According to the submitted task, the Parser parses the data: using the JsonPath path or a custom parsing method, the JSON-format data from the data source is extracted and converted into the data format matching the target table. The data is then output to the HDFS directory corresponding to the Hive table, and the corresponding date partition is constructed.
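The configuration-driven flow of steps S502 to S508 can be sketched as follows. Everything named here is hypothetical (the `CONFIG` layout, the `Parser` class, the `count_tags` custom function, the sample records), the PB-to-JSON conversion of step S506 is assumed to have already produced the JSON strings, and a plain list comprehension stands in for the big data computing framework.

```python
import json

# Hypothetical configuration, standing in for what is stored in the
# configuration center in step S502.
CONFIG = {
    "source": {"table": "content_meta"},
    "output_table": "album_snapshot",
    "fields": [
        {"name": "album_id", "type": "bigint", "path": "$.album.id"},
        {"name": "title", "type": "string", "path": "$.album.title"},
        {"name": "tag_count", "type": "int", "custom": "count_tags"},
    ],
}

# Custom parsing methods that a field may reference instead of a path.
CUSTOM_FUNCTIONS = {
    "count_tags": lambda doc: len(doc.get("album", {}).get("tags", [])),
}


def get_by_path(doc, path):
    """Follow a dotted path such as '$.album.id'; None when a key is absent."""
    node = doc
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


class Parser:
    """Built once from the configuration (step S504), then applied per record."""

    def __init__(self, config):
        # Schema of the output table: ordered (column name, column type) pairs.
        self.schema = [(f["name"], f["type"]) for f in config["fields"]]
        self._extractors = []
        for f in config["fields"]:
            if "path" in f:
                # Bind the path as a default argument so each lambda keeps its own.
                self._extractors.append(lambda doc, p=f["path"]: get_by_path(doc, p))
            else:
                self._extractors.append(CUSTOM_FUNCTIONS[f["custom"]])

    def parse(self, json_text):
        doc = json.loads(json_text)
        return tuple(extractor(doc) for extractor in self._extractors)


parser = Parser(CONFIG)

# JSON strings as they might look after the conversion in step S506.
records = [
    '{"album": {"id": 1, "title": "A", "tags": ["x", "y"]}}',
    '{"album": {"id": 2, "title": "B"}}',
]

# Stand-in for the distributed computation of step S508.
rows = [parser.parse(record) for record in records]
print(parser.schema)
print(rows)
```

In this sketch the only service-specific part is `CONFIG`; the parser and the processing loop stay the same when a different service needs a different set of fields.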
In this embodiment, HBase data stored in the PB format is connected to the data output to the Hive table in a configurable manner by way of the JSON format, which greatly reduces coding development cost and maintenance cost.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is provided a data processing apparatus for implementing the data processing method in the above embodiments. Optionally, the apparatus is used to implement the above embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 6 is a block diagram of an optional data processing apparatus according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes:
(1) a first reading unit 62, configured to read source data in a source data format from a source database, where the source database stores service data of multiple services, and the source data is service data corresponding to a target service in the multiple services;
(2) a conversion unit 64, connected to the first reading unit 62 and configured to convert the source data into intermediate data in an intermediate data format, where the intermediate data format is a data format with a hierarchical relationship;
(3) an extraction unit 66, connected to the conversion unit 64 and configured to parse the intermediate data and extract target field information of a target field of the intermediate data;
(4) an assembling unit 68, connected to the extraction unit 66 and configured to assemble the target field information into target data in a target data format;
(5) a saving unit 610, connected to the assembling unit 68 and configured to save the target data into a target data table, where the target data table is used for saving data in the target data format.
Optionally, the first reading unit 62 may be used to perform step S202 in the above embodiment, the conversion unit 64 may be used to perform step S204, the extraction unit 66 may be used to perform step S206, the assembling unit 68 may be used to perform step S208, and the saving unit 610 may be used to perform step S210.
By the embodiment, the mode of performing data conversion by using the intermediate data format with the hierarchical relationship is adopted, and the intermediate data format (for example, the JSON format) which can be conveniently converted with other data formats is adopted, so that the problem that the service development efficiency is very low due to complex operation and easy error in the data extraction mode in the related technology is solved, the data conversion operation is simplified, the error in the conversion operation is avoided, the service development cost and the maintenance cost are reduced, and the service development efficiency is improved.
As an alternative embodiment, the apparatus further comprises:
(1) a second reading unit, configured to read configuration information before reading the source data in the source data format from the source database, where the configuration information includes: path information for representing a path of the target field in the intermediate data format;
(2) a first constructing unit, configured to construct, according to the path information, a parser corresponding to the intermediate data format, where the parser is configured to extract the target field information from the intermediate data.
As an optional embodiment, the configuration information further includes data table information used for representing the target data table, and the apparatus further includes:
(1) a second constructing unit, configured to construct the target data table of a target data table mode according to the data table information after the configuration information is read, where the target data table mode corresponds to the target data format.
As an alternative embodiment, the extraction unit 66 includes:
(1) an extraction module, configured to parse the intermediate data by using the parser and extract the target field information of the target field of the intermediate data according to a JSON path, where the intermediate data format is the JSON format, and the JSON path is the path, in the JSON format, of the target field corresponding to the target service.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
According to yet another aspect of embodiments herein, there is provided a computer-readable storage medium. Optionally, the storage medium has a computer program stored therein, where the computer program is configured to execute the steps in any one of the methods provided in the embodiments of the present application when the computer program is executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, reading source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service of the plurality of services;
S2, converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship;
S3, parsing the intermediate data, and extracting the target field information of the target field of the intermediate data;
S4, assembling the target field information into target data in a target data format;
S5, saving the target data into a target data table, wherein the target data table is used for saving data in the target data format.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store computer programs, such as a USB flash drive, a ROM (Read-Only Memory), a RAM (Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
According to still another aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor (which may be the processor 102 in fig. 1) and a memory (which may be the memory 104 in fig. 1) having a computer program stored therein, the processor being configured to execute the computer program to perform the steps of any of the above methods provided in embodiments of the present application.
Optionally, the electronic apparatus may further include a transmission device (the transmission device may be the transmission device 106 in fig. 1) and an input/output device (the input/output device may be the input/output device 108 in fig. 1), wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, reading source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service of the plurality of services;
S2, converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship;
S3, parsing the intermediate data, and extracting the target field information of the target field of the intermediate data;
S4, assembling the target field information into target data in a target data format;
S5, saving the target data into a target data table, wherein the target data table is used for saving data in the target data format.
Optionally, for an optional example in this embodiment, reference may be made to the examples described in the above embodiment and optional implementation, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that described herein; alternatively, the modules or steps may be fabricated separately as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A data processing method, comprising:
reading source data in a source data format from a source database, wherein service data of a plurality of services are stored in the source database, and the source data are service data corresponding to a target service in the plurality of services;
converting the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship;
analyzing the intermediate data, and extracting target field information of a target field of the intermediate data;
assembling the target field information into target data in a target data format;
and storing the target data into a target data table, wherein the target data table is used for storing the data in the target data format.
2. The method of claim 1, wherein prior to reading the source data in the source data format from the source database, the method further comprises:
reading configuration information, wherein the configuration information comprises: path information for representing a path of the target field in the intermediate data format;
and constructing a parser corresponding to the intermediate data format according to the path information, wherein the parser is used for extracting the target field information from the intermediate data.
3. The method of claim 2, wherein the configuration information further includes data table information for representing the target data table, and after the reading the configuration information, the method further comprises:
and constructing the target data table of a target data table mode according to the data table information, wherein the target data table mode corresponds to the target data format.
4. The method according to claim 2 or 3, wherein the parsing the intermediate data, and the extracting the target field information of the target field of the intermediate data comprises:
parsing the intermediate data by using the parser, and extracting target field information of a target field of the intermediate data according to a JSON path, wherein the intermediate data format is a JSON format, and the JSON path is the path, in the JSON format, of the target field corresponding to the target service.
5. A data processing apparatus, comprising:
a first reading unit, configured to read source data in a source data format from a source database, wherein the source database stores service data of a plurality of services, and the source data is service data corresponding to a target service in the plurality of services;
a conversion unit, configured to convert the source data into intermediate data in an intermediate data format, wherein the intermediate data format is a data format with a hierarchical relationship;
an extraction unit, configured to parse the intermediate data and extract target field information of a target field of the intermediate data;
an assembling unit, configured to assemble the target field information into target data in a target data format;
and a saving unit, configured to save the target data into a target data table, wherein the target data table is used for saving data in the target data format.
6. The apparatus of claim 5, further comprising:
a second reading unit, configured to read configuration information before reading the source data in the source data format from the source database, where the configuration information includes: path information for representing a path of the target field in the intermediate data format;
a first constructing unit, configured to construct, according to the path information, a parser corresponding to the intermediate data format, where the parser is configured to extract the target field information from the intermediate data.
7. The apparatus of claim 6, wherein the configuration information further comprises data table information representing the target data table, the apparatus further comprising:
a second constructing unit, configured to construct the target data table of a target data table mode according to the data table information after the configuration information is read, where the target data table mode corresponds to the target data format.
8. The apparatus according to claim 6 or 7, wherein the extraction unit comprises:
an extraction module, configured to parse the intermediate data by using the parser and extract target field information of a target field of the intermediate data according to a JSON path, wherein the intermediate data format is a JSON format, and the JSON path is the path, in the JSON format, of the target field corresponding to the target service.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 4 when executed.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 4 by means of the computer program.
CN202010062392.XA 2020-01-19 2020-01-19 Data processing method and apparatus, storage medium, and electronic apparatus Pending CN111241182A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010062392.XA CN111241182A (en) 2020-01-19 2020-01-19 Data processing method and apparatus, storage medium, and electronic apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010062392.XA CN111241182A (en) 2020-01-19 2020-01-19 Data processing method and apparatus, storage medium, and electronic apparatus

Publications (1)

Publication Number Publication Date
CN111241182A true CN111241182A (en) 2020-06-05

Family

ID=70878138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010062392.XA Pending CN111241182A (en) 2020-01-19 2020-01-19 Data processing method and apparatus, storage medium, and electronic apparatus

Country Status (1)

Country Link
CN (1) CN111241182A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205117A (en) * 2015-09-09 2015-12-30 郑州悉知信息科技股份有限公司 Data table migrating method and device
CN106649788A (en) * 2016-12-28 2017-05-10 深圳启润德管理咨询有限公司 Database data transmission method and device
CN108681590A (en) * 2018-05-15 2018-10-19 普信恒业科技发展(北京)有限公司 Incremental data processing method and processing device, computer equipment, computer storage media
CN108763546A (en) * 2018-05-31 2018-11-06 北京五八信息技术有限公司 A kind of conversion method of data format, device, storage medium and terminal
CN110275913A (en) * 2019-04-25 2019-09-24 深圳壹账通智能科技有限公司 Data furnishing method, device and storage medium and electronic device
US20190318020A1 (en) * 2018-04-16 2019-10-17 Bank Of America Corporation Platform-independent intelligent data transformer

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131291A (en) * 2020-09-11 2020-12-25 重庆誉存大数据科技有限公司 JSON data-based structured analysis method, device, equipment and storage medium
CN112131291B (en) * 2020-09-11 2023-12-15 重庆誉存大数据科技有限公司 Structured analysis method, device and equipment based on JSON data and storage medium
CN112506948A (en) * 2020-12-03 2021-03-16 中国人寿保险股份有限公司 Index query method of service information and related equipment
CN112733199A (en) * 2020-12-28 2021-04-30 北京极豪科技有限公司 Data processing method and device, electronic equipment and readable storage medium
CN112965962A (en) * 2021-02-03 2021-06-15 北京中煤时代科技发展有限公司 Industry website data conversion method and device and industry website
CN112860777A (en) * 2021-03-22 2021-05-28 深圳市腾讯信息技术有限公司 Data processing method, device and equipment
CN112860777B (en) * 2021-03-22 2024-03-15 深圳市腾讯信息技术有限公司 Data processing method, device and equipment
CN113065029A (en) * 2021-03-24 2021-07-02 北京达佳互联信息技术有限公司 Data stream processing method and device
CN113065029B (en) * 2021-03-24 2024-02-06 北京达佳互联信息技术有限公司 Data stream processing method and device
CN113656445A (en) * 2021-08-26 2021-11-16 五八同城信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN117668090A (en) * 2024-02-01 2024-03-08 安徽容知日新科技股份有限公司 Data exchange method, data exchange device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination