CN113704320A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113704320A
Authority
CN
China
Prior art keywords
data
data source
mode
serialization
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110909460.6A
Other languages
Chinese (zh)
Other versions
CN113704320B (en)
Inventor
樊庆
曹舰航
贾思超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110909460.6A (patent CN113704320B)
Publication of CN113704320A
Application granted
Publication of CN113704320B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data processing method, a data processing device, electronic equipment and a storage medium, and belongs to the technical field of computers. The data processing method includes: receiving a mode registration request for a first data source, the mode registration request comprising a data source identification corresponding to the first data source; acquiring a data mode corresponding to the first data source according to the mode registration request; describing the data mode according to a preset description form to generate first mode information; and storing the first mode information to a mode information table according to the data source identification so as to register the first data source, wherein the first mode information is used to process data associated with the first data source. The data processing method, device, electronic equipment and storage medium at least solve the problem that existing data management and processing for heterogeneous data sources is costly.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
In a broad sense, heterogeneous data sources are a plurality of data sources that differ in data structure, access mode and form. When data is synchronized between two heterogeneous data sources with different data modes, for example when data in data source A is synchronized to data source B, the manager of data source B, that is, the data consumer of data source A, needs to acquire the data mode of data source A. The data output by data source A can then be parsed according to that data mode, so that the data consumer can always understand the data synchronized from data source A.
In the prior art, because different data sources have different data modes, a data consumer that wants to process their output uniformly at a later stage must first apply a separate data standardization step to the output of each data source, which makes data management and processing costly.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a data processing method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem that existing data management and processing for heterogeneous data sources is costly.
The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a data processing method, which may include:
receiving a mode registration request for a first data source; the mode registration request comprises a data source identification corresponding to the first data source;
acquiring a data mode corresponding to a first data source according to the mode registration request;
describing a data mode according to a preset description form to generate first mode information;
storing first mode information to a mode information table according to the data source identification so as to register the first data source; wherein the first mode information is used to process data associated with the first data source.
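The four steps of the first aspect can be sketched as a small in-memory registry. This is only an illustration under stated assumptions, not the implementation of the disclosure: `SchemaRegistry`, `FieldSpec`, `describe`, `fetch_schema` and the request key `data_source_id` are hypothetical names, and the "preset description form" is assumed here to be a tuple of (name, type, optional) records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

RawField = Tuple[str, str, bool]  # (name, type, optional), an assumed raw shape


@dataclass(frozen=True)
class FieldSpec:
    """One field in the assumed preset description form."""
    name: str
    type: str
    optional: bool = False


ModeInfo = Tuple[FieldSpec, ...]


def describe(raw_fields: Iterable[RawField]) -> ModeInfo:
    """Describe a raw data mode in the preset form."""
    return tuple(FieldSpec(n, t, opt) for n, t, opt in raw_fields)


class SchemaRegistry:
    def __init__(self) -> None:
        # Mode information table, keyed by data source identification.
        self.mode_info_table: Dict[str, ModeInfo] = {}

    def register(self, request: dict,
                 fetch_schema: Callable[[dict], Iterable[RawField]]) -> ModeInfo:
        source_id = request["data_source_id"]        # receive the registration request
        raw = fetch_schema(request)                  # acquire the data mode
        mode_info = describe(raw)                    # describe it in the preset form
        self.mode_info_table[source_id] = mode_info  # store it, which registers the source
        return mode_info


registry = SchemaRegistry()
info = registry.register(
    {"data_source_id": "orders_db"},
    lambda req: [("id", "long", False), ("note", "string", True)],
)
```

Keying the table by the data source identification is what lets later requests (mode acquisition, data query, mode update) find the stored mode information with nothing but that identification.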
In one embodiment, the mode registration request includes a type of data, and describing the data mode according to the preset description form to generate the first mode information includes:
under the condition that the type of the data is serialized data, describing a serialization mode of a first data source according to a preset description form, and generating first mode information;
and under the condition that the type of the data is non-serialized data, describing a table structure of a first data source according to a preset description form, and generating first mode information.
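The branch on the type of data can be sketched as follows. The input shapes are assumptions made for illustration: the serialization mode is taken to be an Avro-style record dictionary, the table structure a list of column dictionaries with hypothetical keys `name`, `sql_type` and `nullable`, and `SQL_TO_COMMON` is an invented type mapping. Both branches produce the same unified description.

```python
# Hypothetical mapping from SQL column types to common type names.
SQL_TO_COMMON = {"BIGINT": "long", "INT": "int", "VARCHAR": "string", "DOUBLE": "double"}


def describe_serialization_mode(record_schema: dict) -> list:
    """Describe an Avro-style record schema in the unified form."""
    fields = []
    for f in record_schema["fields"]:
        t = f["type"]
        # Assumption: an optional field is a union of "null" and exactly one other type.
        optional = isinstance(t, list) and "null" in t
        if optional:
            t = [x for x in t if x != "null"][0]
        fields.append({"name": f["name"], "type": t, "optional": optional})
    return fields


def describe_table_structure(columns: list) -> list:
    """Describe a table structure in the unified form."""
    return [
        {"name": c["name"], "type": SQL_TO_COMMON[c["sql_type"]], "optional": c["nullable"]}
        for c in columns
    ]


def describe_data_mode(data_type: str, data_mode) -> list:
    """Dispatch on the type of data carried in the registration request."""
    if data_type == "serialized":
        return describe_serialization_mode(data_mode)
    if data_type == "non_serialized":
        return describe_table_structure(data_mode)
    raise ValueError(f"unknown type of data: {data_type}")


from_record = describe_data_mode("serialized", {
    "type": "record", "name": "orders",
    "fields": [{"name": "id", "type": "long"},
               {"name": "note", "type": ["null", "string"]}],
})
from_table = describe_data_mode("non_serialized", [
    {"name": "id", "sql_type": "BIGINT", "nullable": False},
    {"name": "note", "sql_type": "VARCHAR", "nullable": True},
])
```

In this example the two heterogeneous inputs yield identical first mode information, which is the property that lets a consumer treat them uniformly.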
Based on this, in one embodiment, in the case that the type of the data is serialized data, the schema registration request further includes a serialization manner and an associated data source identifier corresponding to the first data source;
acquiring the data mode corresponding to the first data source according to the mode registration request includes:
determining a second data source related to the first data source according to the related data source identification, wherein the data mode of the second data source is different from that of the first data source;
acquiring second mode information corresponding to the second data source under the condition that the second data source is a registered data source; the second mode information is mode information generated by describing a data mode corresponding to the second data source according to a preset description form;
and generating the serialization mode corresponding to the first data source from the second mode information, using the serialization manner.
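As a sketch of this derivation, suppose the first data source carries serialized records whose content originates from an already registered second data source. The code below builds an Avro-style record schema from the second mode information. The request keys, the function names and the choice of "avro" as the only supported serialization manner are assumptions for illustration.

```python
def generate_serialization_mode(second_mode_info: list, manner: str, record_name: str) -> dict:
    """Build a serialization mode from unified mode information."""
    if manner != "avro":
        raise NotImplementedError(f"serialization manner not covered by this sketch: {manner}")
    fields = []
    for f in second_mode_info:
        if f["optional"]:
            # Optional fields become a union with "null" and default to null.
            fields.append({"name": f["name"], "type": ["null", f["type"]], "default": None})
        else:
            fields.append({"name": f["name"], "type": f["type"]})
    return {"type": "record", "name": record_name, "fields": fields}


def acquire_data_mode(request: dict, mode_info_table: dict) -> dict:
    """Resolve the associated (second) data source and derive the first source's mode."""
    second_id = request["associated_data_source_id"]
    if second_id not in mode_info_table:
        # The derivation only applies when the second data source is registered.
        raise LookupError(f"associated data source is not registered: {second_id}")
    return generate_serialization_mode(
        mode_info_table[second_id],
        request["serialization_manner"],
        request["data_source_id"],
    )


mode_info_table = {
    "orders_db": [
        {"name": "id", "type": "long", "optional": False},
        {"name": "note", "type": "string", "optional": True},
    ]
}
serialization_mode = acquire_data_mode(
    {
        "data_source_id": "orders_topic",
        "associated_data_source_id": "orders_db",
        "serialization_manner": "avro",
    },
    mode_info_table,
)
```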
In one embodiment, in a case that the type of the data is serialized data, after acquiring a data pattern corresponding to the first data source according to the pattern registration request, the method further includes:
and storing the serialization mode corresponding to the first data source into a serialization mode table according to the data source identification.
Based on this, in one embodiment, after storing the serialization pattern corresponding to the first data source into the serialization pattern table according to the data source identifier, the method further includes:
receiving a mode acquisition request of a requester for a first data source; the mode acquisition request comprises a data source identifier corresponding to a first data source;
in response to the mode acquisition request, acquiring a serialization mode corresponding to the data source identification from the serialization mode table;
and sending the serialization mode to the requester, wherein the serialization mode can be used to process data associated with the first data source.
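A requester's use of the returned serialization mode can be sketched as below. The example assumes, purely for illustration, that messages are JSON-encoded and that the serialization mode is an Avro-style record dictionary; a real deployment might use a binary encoding instead. `serialization_mode_table`, `handle_mode_acquisition` and `parse_message` are hypothetical names.

```python
import json

# Serialization mode table, keyed by data source identification.
serialization_mode_table = {
    "orders_topic": {
        "type": "record",
        "name": "orders_topic",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "note", "type": ["null", "string"], "default": None},
        ],
    }
}


def handle_mode_acquisition(request: dict) -> dict:
    """Look up the serialization mode for the requested data source."""
    return serialization_mode_table[request["data_source_id"]]


def parse_message(payload: bytes, mode: dict) -> dict:
    """Parse one JSON message, filling defaults and rejecting missing required fields."""
    raw = json.loads(payload)
    record = {}
    for f in mode["fields"]:
        if f["name"] in raw:
            record[f["name"]] = raw[f["name"]]
        elif "default" in f:
            record[f["name"]] = f["default"]
        else:
            raise ValueError(f"missing required field: {f['name']}")
    return record


mode = handle_mode_acquisition({"data_source_id": "orders_topic"})
record = parse_message(b'{"id": 7}', mode)
```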
In addition, in one embodiment, after storing the first mode information to the mode information table according to the data source identifier, the method further includes:
receiving a data query request of a requester for a first data source; the data query request comprises a data source identifier corresponding to a first data source;
responding to a data query request, and acquiring first mode information corresponding to the data source identification from a mode information table;
and sending the first mode information to the requester, wherein the first mode information can be used to query the data structure of the first data source.
In one embodiment, after storing the first mode information in the mode information table according to the data source identifier, the method further includes:
receiving a schema update request for a first data source;
responding to the mode updating request, describing the updated data mode of the first data source according to a preset description form, and generating third mode information;
performing a compatibility check on the mode update according to the first mode information and the third mode information;
under the condition that the mode updating is determined to be compatible according to the checking result, updating the stored mode information corresponding to the first data source according to the third mode information;
and refusing to update the stored mode information corresponding to the first data source under the condition that the mode update is determined to be incompatible according to the checking result.
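The accept-or-reject flow of a mode update can be sketched as follows. The mode information is assumed to be a dictionary from field name to type, and the compatibility check is passed in as a function. `keeps_existing_fields` is only a stand-in policy chosen for this example, not the check defined by the disclosure.

```python
from typing import Callable, Dict

ModeInfo = Dict[str, str]  # field name -> type, an assumed description form


def handle_mode_update(mode_info_table: Dict[str, ModeInfo], source_id: str,
                       updated_mode: ModeInfo,
                       check: Callable[[ModeInfo, ModeInfo], bool]) -> bool:
    """Apply a mode update only if the compatibility check passes."""
    first_info = mode_info_table[source_id]  # stored (old) mode information
    third_info = dict(updated_mode)          # description of the updated data mode
    if not check(first_info, third_info):
        return False                         # incompatible: stored information is untouched
    mode_info_table[source_id] = third_info  # compatible: replace the stored information
    return True


def keeps_existing_fields(old: ModeInfo, new: ModeInfo) -> bool:
    """Stand-in policy: every old field must survive with the same type."""
    return all(new.get(name) == t for name, t in old.items())


table = {"orders_db": {"id": "long"}}
accepted = handle_mode_update(table, "orders_db", {"id": "long", "note": "string"},
                              keeps_existing_fields)
rejected = handle_mode_update(table, "orders_db", {"note": "string"},
                              keeps_existing_fields)
```

Rejecting the update before touching the table means downstream consumers never observe mode information that failed the check.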
Based thereon, in one embodiment, the schema registration request further includes a type of compatibility check corresponding to the first data source;
performing the compatibility check on the mode update according to the first mode information and the third mode information includes:
acquiring the type of compatibility check corresponding to the first data source;
comparing the first mode information with the third mode information to obtain updated information;
and according to the type of the compatibility check, performing the compatibility check on the mode update according to the update information.
In one embodiment, the types of compatibility checks include forward compatible, backward compatible, or fully compatible;
performing the compatibility check on the mode update according to the type of the compatibility check and the update information includes:
in the case that the type of the compatibility check is forward compatibility, determining that the mode update is compatible if the update information is adding fields or deleting optional fields, and determining that the mode update is incompatible otherwise;
in the case that the type of the compatibility check is backward compatibility, determining that the mode update is compatible if the update information is adding optional fields or deleting fields, and determining that the mode update is incompatible otherwise;
in the case that the type of the compatibility check is full compatibility, determining that the mode update is compatible if the update information is adding or deleting optional fields, and determining that the mode update is incompatible otherwise.
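The three rules above can be expressed compactly. In this sketch the mode information is assumed to map each field name to a `(type, optional)` pair, and any change to an existing field (for example a type change) is treated as incompatible, since it is neither an addition nor a deletion. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, bool]       # (type, optional)
ModeInfo = Dict[str, Field]    # field name -> (type, optional)


def diff(old: ModeInfo, new: ModeInfo) -> Tuple[List[Field], List[Field], List[str]]:
    """Compare old and new mode information to obtain the update information."""
    added = [f for name, f in new.items() if name not in old]
    removed = [f for name, f in old.items() if name not in new]
    changed = [name for name in old if name in new and old[name] != new[name]]
    return added, removed, changed


def is_compatible(kind: str, old: ModeInfo, new: ModeInfo) -> bool:
    added, removed, changed = diff(old, new)
    if changed:
        return False  # assumption: modifying an existing field is never compatible
    if kind == "forward":
        # Any field may be added; only optional fields may be deleted.
        return all(optional for _, optional in removed)
    if kind == "backward":
        # Only optional fields may be added; any field may be deleted.
        return all(optional for _, optional in added)
    if kind == "full":
        # Only optional fields may be added or deleted.
        return all(optional for _, optional in added + removed)
    raise ValueError(f"unknown compatibility type: {kind}")


old = {"id": ("long", False), "note": ("string", True)}
add_required = {**old, "price": ("double", False)}
add_optional = {**old, "tag": ("string", True)}
drop_required = {"note": ("string", True)}
```

For example, adding the required field `price` passes only the forward check, adding the optional field `tag` passes all three, and dropping the required field `id` passes only the backward check.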
In addition, in one embodiment, after updating the stored mode information corresponding to the first data source according to the third mode information, the method further includes:
determining a third data source associated with the first data source according to the associated data source identifier corresponding to the first data source; the third data source is a registered data source used for storing target data, and the target data is data taken out from the first data source;
acquiring registration information of a third data source;
generating a table structure update statement corresponding to the third data source according to the first mode information and the third mode information in the case where it is determined, according to the registration information, that the third data source stores non-serialized data;
and sending a table structure updating statement to the third data source, wherein the table structure updating statement is used for indicating to update the table structure of the third data source.
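Generating the table structure update statement can be sketched as a diff between the first and third mode information that emits `ALTER TABLE` statements. The registration keys `data_type` and `table_name` and the function names are assumptions, and the mode information is assumed to map column names directly to SQL types. Identifiers are interpolated without quoting here, which a real system would need to validate.

```python
from typing import Dict, List


def table_update_statements(table_name: str, first_info: Dict[str, str],
                            third_info: Dict[str, str]) -> List[str]:
    """Emit one ALTER TABLE statement per added or removed column."""
    statements = []
    for name, sql_type in third_info.items():
        if name not in first_info:
            statements.append(f"ALTER TABLE {table_name} ADD COLUMN {name} {sql_type}")
    for name in first_info:
        if name not in third_info:
            statements.append(f"ALTER TABLE {table_name} DROP COLUMN {name}")
    return statements


def propagate_update(registration_info: dict, first_info: Dict[str, str],
                     third_info: Dict[str, str]) -> List[str]:
    """Only a third data source that stores non-serialized data gets table updates."""
    if registration_info["data_type"] != "non_serialized":
        return []
    return table_update_statements(registration_info["table_name"], first_info, third_info)


first_info = {"id": "BIGINT", "note": "VARCHAR(255)"}
third_info = {"id": "BIGINT", "note": "VARCHAR(255)", "price": "DOUBLE"}
statements = propagate_update(
    {"data_type": "non_serialized", "table_name": "orders_copy"},
    first_info,
    third_info,
)
```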
According to a second aspect of embodiments of the present disclosure, there is provided a data processing apparatus, which may include:
a registration request receiving module configured to perform receiving a mode registration request for a first data source; the mode registration request comprises a data source identification corresponding to the first data source;
the data mode acquisition module is configured to execute acquisition of a data mode corresponding to the first data source according to the mode registration request;
the data pattern description module is configured to execute description of a data pattern according to a preset description form and generate first pattern information;
the mode information registration module is configured to store first mode information into a mode information table according to the data source identification so as to realize registration of the first data source; wherein the first mode information is used to process data associated with the first data source.
In one embodiment, the schema registration request includes a type of data, and the data schema description module includes:
the first description submodule is configured to describe the serialization mode of the first data source according to the preset description form and generate the first mode information under the condition that the type of the data is serialized data;
and the second description submodule is configured to perform description on a table structure of the first data source according to a preset description form and generate first mode information under the condition that the type of the data is non-serialized data.
Based on this, in one embodiment, in the case that the type of the data is serialized data, the schema registration request further includes a serialization manner and an associated data source identifier corresponding to the first data source;
a data pattern acquisition module comprising:
the data source determining submodule is configured to determine a second data source associated with the first data source according to the associated data source identification, and the data mode of the second data source is different from that of the first data source;
the mode acquisition submodule is configured to acquire second mode information corresponding to the second data source under the condition that the second data source is a registered data source; the second mode information is mode information generated by describing a data mode corresponding to the second data source according to a preset description form;
and the pattern generation submodule is configured to execute the generation of the serialization pattern corresponding to the first data source according to the second pattern information in the serialization manner.
In one embodiment, in the case that the type of data is serialized data, the apparatus further includes:
and the serialization pattern storage module is configured to store the serialization pattern corresponding to the first data source into the serialization pattern table according to the data source identification after acquiring the data pattern corresponding to the first data source according to the pattern registration request.
Based on this, in one of the embodiments, the apparatus further comprises:
the acquisition request receiving module is configured to receive a mode acquisition request of a requester for the first data source after the serialization mode corresponding to the first data source is stored into the serialization mode table according to the data source identification; the mode acquisition request comprises a data source identifier corresponding to the first data source;
a serialization pattern acquisition module configured to execute acquiring a serialization pattern corresponding to the data source identification from the serialization pattern table in response to the pattern acquisition request;
a serialization pattern transmission module configured to perform transmission of a serialization pattern to a requestor, the serialization pattern usable for processing data associated with a first data source.
Additionally, in one embodiment, the apparatus further comprises:
the query request receiving module is configured to receive a data query request of a requester for a first data source after storing first mode information into a mode information table according to the data source identification; the data query request comprises a data source identifier corresponding to a first data source;
the mode information acquisition module is configured to execute acquisition of first mode information corresponding to the data source identification from a mode information table in response to a data query request;
a mode information sending module configured to perform sending first mode information to the requester, the first mode information being usable to query a data structure of the first data source.
On the basis of the above embodiments, in one embodiment, the apparatus further includes:
an update request receiving module configured to perform receiving a mode update request for a first data source after storing first mode information to a mode information table according to a data source identification;
the mode updating description module is configured to execute description on the updated data mode of the first data source according to a preset description form in response to a mode updating request, and generate third mode information;
the updating compatibility checking module is configured to execute compatibility checking on the current mode updating according to the first mode information and the third mode information;
the mode updating and registering module is configured to update the stored mode information corresponding to the first data source according to the third mode information under the condition that the mode updating is determined to be compatible according to the checking result;
and the mode updating rejection module is configured to reject to update the stored mode information corresponding to the first data source under the condition that the mode updating is determined to be incompatible according to the checking result.
Based thereon, in one embodiment, the schema registration request further includes a type of compatibility check corresponding to the first data source;
an update compatibility check module comprising:
a type acquisition submodule configured to perform acquisition of a type of compatibility check corresponding to the first data source;
the information acquisition submodule is configured to compare the first mode information with the third mode information and acquire updated information;
and the compatibility check submodule is configured to execute compatibility check on the current mode update according to the type of the compatibility check and the update information.
Based on this, in one embodiment, the types of compatibility checks include forward compatible, backward compatible, or fully compatible;
a compatibility check submodule, comprising:
the first checking unit is configured to execute, under the condition that the type of the compatibility check is forward compatibility, if the update information is to add a field or delete an optional field, determining that the current mode update is compatible; if the updating information is not to add fields or delete optional fields, determining that the mode updating is incompatible;
the second checking unit is configured to execute, under the condition that the type of the compatibility check is backward compatible, if the update information is to add an optional field or delete a field, determining that the current mode update is compatible; if the updating information is not to add optional fields or delete fields, determining that the mode updating is not compatible;
the third checking unit is configured to execute, under the condition that the type of the compatibility check is fully compatible, if the update information is to add or delete the optional field, determining that the current mode update is compatible; and if the updating information is not to add or delete the optional field, determining that the mode updating is not compatible.
Additionally, in one embodiment, the apparatus further comprises:
the data source determination module is configured to determine a third data source associated with the first data source according to the associated data source identifier corresponding to the first data source after updating the stored mode information corresponding to the first data source according to the third mode information; the third data source is a registered data source used for storing target data, and the target data is data taken out from the first data source;
a registration information acquisition module configured to perform acquisition of registration information of a third data source;
an update statement generation module configured to execute, in a case where it is determined that non-serialized data is stored in the third data source according to the registration information, generating a table structure update statement corresponding to the third data source according to the first pattern information and the third pattern information;
and the updating statement sending module is configured to send the table structure updating statement to the third data source, wherein the table structure updating statement is used for indicating that the table structure of the third data source is updated.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, which may include:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement a data processing method as shown in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of a data processing apparatus, cause the data processing apparatus to implement the data processing method as shown in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, such that the device performs the data processing method shown in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
The embodiments of the disclosure provide a mode registration service for a plurality of heterogeneous data sources. When a first data source among the heterogeneous data sources is registered, the data mode corresponding to the first data source is described according to the preset description form to generate the first mode information, and the first mode information is then stored in the mode information table to register the first data source. Because the first data source, like the other heterogeneous data sources, generates common mode information in the unified description form at registration, a data consumer of the first data source can process the data of the first data source and of the other heterogeneous data sources through the common mode information, without applying a separate data standardization step to the output of each data source. The embodiments of the disclosure thus reduce the cost of data management and processing for heterogeneous data sources.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is an architecture diagram illustrating one type of data processing according to an exemplary embodiment;
FIG. 2 is a diagram illustrating a data synchronization scenario based on heterogeneous data sources in accordance with an illustrative embodiment;
FIG. 3 is a schematic diagram illustrating another data synchronization scenario based on heterogeneous data sources in accordance with an illustrative embodiment;
FIG. 4 is a flow chart illustrating a method of data processing in accordance with an exemplary embodiment;
FIG. 5 is a flow diagram illustrating another data processing method in accordance with an exemplary embodiment;
FIG. 6 is a flow chart illustrating yet another method of data processing in accordance with an exemplary embodiment;
FIG. 7 is a flow chart illustrating yet another method of data processing in accordance with an exemplary embodiment;
FIG. 8 is a schematic diagram illustrating an application scenario of a data processing method according to an exemplary embodiment;
FIG. 9 is a schematic diagram illustrating an application scenario of another data processing method in accordance with an illustrative embodiment;
FIG. 10 is a block diagram illustrating the structure of a data processing apparatus according to an exemplary embodiment;
FIG. 11 is a block diagram illustrating a configuration of an electronic device according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The data processing method provided by the present disclosure may be applied to the architecture shown in fig. 1, and is specifically described in detail with reference to fig. 1.
FIG. 1 is an architectural diagram illustrating one type of data processing according to an exemplary embodiment.
As shown in fig. 1, the architecture diagram may include at least a first electronic device 10 including a data source and a second electronic device 11 including a pattern registration service. The second electronic device 11 may establish a connection with at least one first electronic device 10 through a network and perform information interaction, or a plurality of first electronic devices 10 may establish a connection through a network and perform data transmission. The first electronic device 10 may be a device with a communication function, such as a mobile phone, a tablet computer, an all-in-one machine, or a device simulated by a virtual machine or a simulator, or a device with a storage and computing function, such as a cloud server or a server cluster, and the second electronic device 11 may be a device with a storage and computing function, such as a cloud server or a server cluster.
Here, a plurality of heterogeneous data sources having different data patterns may exist in different first electronic devices 10, respectively, and each data source may perform pattern registration in a second electronic device 11 including a pattern registration service through a network. The upstream data source and the downstream data source can be distinguished according to the transmission direction of data between the data sources. For example, if the data source a sends data to the data source B, the data source a is an upstream data source with respect to the data source B, and the data source B is a downstream data source with respect to the data source a; if the data source B sends data to the data source C, the data source B is an upstream data source relative to the data source C, and the data source C is a downstream data source relative to the data source B.
When data synchronization is performed among a plurality of heterogeneous data sources, a downstream data source needs to acquire the data mode of an upstream data source in order to parse the data transmitted by the upstream data source, so that the downstream data source can always understand that data. In the prior art, because heterogeneous data sources have different data modes, a downstream data source has to apply a separate data standardization step to the data synchronized from each data source before the data can be processed uniformly, which results in higher data management and processing costs.
In view of the above problems, the embodiment of the present disclosure provides a pattern registration service for heterogeneous data sources, and describes the data patterns of each registered data source in a unified manner to generate common pattern information, so that when data associated with a registered data source needs to be processed, for example, when data synchronization is performed between data sources, for a data source downstream of the registered data source, complex data standardization steps may not be required to be performed, and data synchronized from multiple upstream heterogeneous data sources may be processed using the common pattern information, so that the costs for data management and processing of the heterogeneous data sources may be reduced.
In addition, the data query service can be provided by using the general mode information. In this way, the querier can query the data structure contained in the data source, such as the field name and the data type included in the data source, through the general schema information corresponding to the data source without parsing the data in the data source.
Meanwhile, after an upstream data source updates its data mode, the updated data mode can be described in the same uniform description form to generate new mode information, and a compatibility check is performed between the new mode information and the old mode information. The mode update registration of the upstream data source is allowed only when the two are determined to be compatible, so that the downstream data source can parse all data from the upstream data source and changes that break compatibility are prevented, thereby providing a safe mode evolution strategy.
Based on the above architecture, the embodiments of the present disclosure may be applied to a data synchronization scenario based on heterogeneous data sources.
One situation is to synchronize data of a data source of serialized data to a data source of non-serialized data. Illustratively, as shown in fig. 2, a data source 21 of serialized data, a data source 22 of non-serialized data, and a schema registration service 23 are included, wherein data stored in the data source 21 of serialized data can be synchronized to the data source 22 of non-serialized data. In this case, at least the data source 21 of the serialized data may be pattern-registered in the pattern registration service 23, so that the serialized pattern registered by the data source 21 of the serialized data is acquired from the pattern registration service 23 before the data is stored in the data source 22 of the non-serialized data, and the serialized data acquired from the data source 21 of the serialized data is deserialized according to the serialized pattern, and the deserialized data is acquired and stored in the data source 22 of the non-serialized data.
In addition, when the data source 21 of the serialized data is registered, the serialized mode can be described according to a preset description form to obtain corresponding mode information, and if data query needs to be subsequently performed on the data source 21 of the serialized data, the corresponding mode information can be obtained from the mode registration service 23, so that the data structure of the data source 21 of the serialized data is queried without analyzing the data according to the mode information.
Alternatively, the data in a data source of non-serialized data is synchronized to a data source of serialized data. Illustratively, as shown in fig. 3, a data source 31 of non-serialized data, a data source 32 of serialized data, and a schema registration service 33 are included, wherein data stored in the data source 31 of non-serialized data can be synchronized to the data source 32 of serialized data. In this case, both the data source 31 of the non-serialized data and the data source 32 of the serialized data may be pattern-registered in the pattern registration service 33. The table structure definition of the data source 31 of the non-serialized data is described in a preset description form to generate corresponding pattern information. Then, in the process of registering the data source 32 of the serialized data, a serialization pattern is automatically generated from the pattern information corresponding to the data source 31 of the non-serialized data, in accordance with the serialization manner corresponding to the data source 32 of the serialized data. Thus, when the data in the data source 31 of the non-serialized data is synchronized to the data source 32 of the serialized data, the data taken out of the data source 31 is serialized in accordance with the serialization pattern, and the resulting serialized data is synchronized to the data source 32 of the serialized data.
Furthermore, if a data query needs to be subsequently performed on the data source 31 of non-serialized data or the data source 32 of serialized data, the corresponding schema information can be obtained from the schema registration service 33, so as to query the data structure of the data source 31 of non-serialized data or the data source 32 of serialized data according to the schema information.
Therefore, on one hand, the embodiment of the disclosure can address the challenges that heterogeneous data sources pose for data management: the mode registration service provides a common service, and different data sources can use common mode information to describe their respective data modes, which greatly reduces the cost of data management and processing. For example, all applications may use uniform "web click event" mode information. The upstream application need only generate events in a default format, and downstream data processing no longer requires complex standardization steps. On the other hand, the embodiment of the disclosure can also help data discovery and improve data query efficiency, that is, a data researcher or developer can check the data structure in the data source through the data query function without depending on a data producer.
According to the above architecture and application scenarios, the data processing method provided by the embodiment of the present disclosure is described in detail below with reference to fig. 4 to 9, and the data processing method can be executed by the second electronic device 11 including the mode registration service shown in fig. 1.
FIG. 4 is a flow chart illustrating a method of data processing according to an exemplary embodiment.
As shown in fig. 4, the data processing method may specifically include the following steps 410-440.
Step 410, receiving a mode registration request for a first data source; wherein the pattern registration request includes a data source identification corresponding to the first data source.
Step 420, obtaining a data pattern corresponding to the first data source according to the pattern registration request.
Step 430, describing the data mode according to a preset description form, and generating first mode information.
In one embodiment, the preset description form may be used to describe a schema format of the data schema, for example, to describe field information included in the data schema.
Step 440, storing the first mode information to a mode information table according to the data source identifier, so as to register the first data source; wherein the first mode information is used to process data associated with the first data source.
Therefore, the embodiment of the present disclosure provides a mode registration service for a plurality of heterogeneous data sources. In the process of registering a first data source among the plurality of heterogeneous data sources, the data mode corresponding to the first data source is described according to the preset description form to generate the first mode information, and the first mode information is stored in the mode information table to complete the registration of the first data source. Because the first data source, like the other heterogeneous data sources, generates common mode information in the uniform description form when it registers, a data consumer of the first data source can process the data of the first data source and the data of the other heterogeneous data sources through the common mode information, without applying different data standardization steps to unify the data output by different data sources. The disclosed embodiments thus reduce the cost of data management and processing for heterogeneous data sources.
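By way of a non-limiting illustration, steps 410 to 440 can be sketched in Python as follows. The class `SchemaRegistry`, the function `describe_as_record`, the request keys `source_id` and `data_mode`, and the Avro-like record layout are all hypothetical names assumed for this sketch; the disclosure does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SchemaRegistry:
    """Hypothetical in-memory stand-in for the mode registration service."""
    # describe() turns a raw data mode into the preset description form.
    describe: Callable[[dict], dict]
    # Mode information table, keyed by data source identifier.
    mode_info_table: Dict[str, dict] = field(default_factory=dict)

    def register(self, request: dict) -> dict:
        # Step 410: the request carries the data source identifier.
        source_id = request["source_id"]
        # Step 420: in this sketch the data mode is carried in the request itself.
        data_mode = request["data_mode"]
        # Step 430: describe the data mode in the preset description form.
        mode_info = self.describe(data_mode)
        # Step 440: store the result under the data source identifier.
        self.mode_info_table[source_id] = mode_info
        return mode_info


def describe_as_record(data_mode: dict) -> dict:
    # Assumed description form: an Avro-like record with named, typed fields.
    return {
        "type": "record",
        "name": data_mode["name"],
        "fields": [{"name": n, "type": t} for n, t in data_mode["columns"]],
    }


registry = SchemaRegistry(describe=describe_as_record)
info = registry.register({
    "source_id": "mysql.contacts",
    "data_mode": {"name": "contacts",
                  "columns": [("name", "string"), ("phone", "string")]},
})
print(info)
```

Passing the description function in as a parameter keeps the registration flow independent of how a particular kind of data source is described.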
The above steps are described in detail below, specifically as follows.
Referring to step 410, the embodiment of the present disclosure is mainly directed to a scenario where data is transmitted between multiple heterogeneous data sources with different data modes. For a data source of serialized data, the data mode may include a serialization mode, which may be used to serialize or deserialize the data, and different serialization manners correspond to different serialization modes. Here, the serialization manner includes, but is not limited to, protobuf, avro, json, etc. For a data source of non-serialized data, the data mode may include a table structure, such as field names and field types.
Specifically, the first data source may be any one of a plurality of heterogeneous data sources, and the first data source may be a data source storing serialized data or a data source storing non-serialized data. For example, a producer of data stored in a first data source may send a pattern registration request for the first data source to a second electronic device in which the pattern registration service is located, and thus, the second electronic device may receive the pattern registration request for the first data source.
For example, in a case that serialized data is stored in the first data source, the received pattern registration request may carry information such as a data source type, a data source identifier, and a serialization manner of the first data source; when the first data source stores non-serialized data, the received pattern registration request may carry information such as the data source type, data source identifier, and table structure of the first data source. Wherein, the data source of the serialized data can be redis, kafka and the like, and the data source of the non-serialized data can be mysql, hive and the like.
In one embodiment, the schema registration request may further include an associated data source identification corresponding to the first data source, the associated data source identification being usable to retrieve an associated data source corresponding to the first data source. Specifically, when the second electronic device 11 in fig. 1 receives a mode registration request for the first data source, it may determine the associated data source according to the associated data source identifier included in the mode registration request; in a case where the associated data source is registered, it may obtain the mode information of the associated data source and generate the first mode information of the first data source according to that mode information.
Referring to step 420 and step 430, for data sources storing different types of data, the data patterns may be obtained in different manners, and the manner of generating the first pattern information may also be different. Here, the data pattern may be acquired from information carried in the pattern registration request, or the data pattern may be acquired from another address based on address information carried in the pattern registration request, or the data pattern may be generated based on pattern information of another data source that is registered in association with the data pattern. In addition, the preset description form may be described by using avro uniformly, for example, and the correspondingly generated first mode information may include information such as column name, type, alias, and the like, for describing the data mode of the first data source.
In an optional implementation manner, the mode registration request may include a type of data, that is, the type of the data stored in the first data source, which is either serialized data or non-serialized data. Step 430 may specifically include:
under the condition that the type of the data is serialized data, describing a serialization mode of a first data source according to a preset description form, and generating first mode information;
and under the condition that the type of the data is non-serialized data, describing a table structure of a first data source according to a preset description form, and generating first mode information.
Here, the serialization schema and the table structure may be used to describe the data itself, for example, to indicate that the data in the first data source includes Zhang and the corresponding telephone number, Li and the corresponding telephone number, and so on. The first schema information may in turn be used to describe the serialization schema or the table structure, for example, to indicate that the first data source includes fields such as name and telephone, together with information such as the field type and alias corresponding to each field.
In this way, the advantage of describing the serialization mode or the table structure of the first data source by using the preset description form is that the serialization mode or the table structure is described by using a uniform format, so that even if a downstream data source aims at a plurality of upstream heterogeneous data sources with different data modes, the downstream data source can understand the data of different data sources by using common mode information, that is, the data of the plurality of upstream heterogeneous data sources including the first data source can be uniformly processed without complex standardization steps, and the cost of data management and processing is reduced.
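As an illustration of describing a table structure in a uniform form, the following sketch converts a MySQL-style column list into an Avro record schema expressed as a plain dictionary. The mapping `SQL_TO_AVRO`, the function `table_to_avro`, and the column dictionary layout are assumptions made for this sketch; only the Avro primitive type names and the `aliases` field attribute come from the Avro specification.

```python
import json

# Assumed mapping from MySQL-style column types to Avro primitive types.
SQL_TO_AVRO = {
    "varchar": "string", "text": "string", "int": "int", "bigint": "long",
    "float": "float", "double": "double", "boolean": "boolean", "blob": "bytes",
}


def table_to_avro(table_name, columns):
    """Describe a table structure as an Avro record schema (a plain dict)."""
    fields = []
    for col in columns:
        # Strip a length suffix such as "varchar(64)" before the lookup.
        base_type = col["type"].split("(")[0].lower()
        avro_field = {"name": col["name"], "type": SQL_TO_AVRO[base_type]}
        if col.get("alias"):
            # Avro fields may carry a list of alternate names.
            avro_field["aliases"] = [col["alias"]]
        fields.append(avro_field)
    return {"type": "record", "name": table_name, "fields": fields}


contacts = table_to_avro("contacts", [
    {"name": "name", "type": "VARCHAR(64)", "alias": "full_name"},
    {"name": "phone", "type": "varchar(32)"},
    {"name": "age", "type": "int"},
])
print(json.dumps(contacts, indent=2))
```

The resulting dictionary carries the column name, type, and alias, which is the kind of information the first mode information is described as containing.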
In addition, in the case that the type of the data in the first data source is serialized data, the serialization mode of the first data source can be obtained through the following three ways: one mode is that the registration applicant inputs the serialization mode through text during registration and sends the serialization mode to the second electronic equipment corresponding to the mode registration service together with the mode registration request; one way is to pull the serialized mode from the other storage system to which the address information points, such as the git system, through the address information carried in the mode registration request; yet another approach is to generate a serialization schema based on schema information of other registered heterogeneous data sources.
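The three acquisition ways just listed can be sketched as a simple dispatch on what the registration request carries. The request keys (`schema_text`, `schema_address`, `associated_source_id`, `serialization_manner`) and the function `resolve_serialization_mode` are hypothetical; the storage system and the generator are passed in as callables so that the sketch needs no network access.

```python
def resolve_serialization_mode(request, fetch, generate):
    """Pick one of the three acquisition ways based on what the request carries."""
    if "schema_text" in request:
        # Way 1: the applicant typed the serialization mode into the request.
        return request["schema_text"]
    if "schema_address" in request:
        # Way 2: pull it from another storage system (for example a git repository).
        return fetch(request["schema_address"])
    if "associated_source_id" in request:
        # Way 3: derive it from the mode information of a registered data source.
        return generate(request["associated_source_id"],
                        request["serialization_manner"])
    raise ValueError("request carries no way to obtain a serialization mode")


# Stub callables stand in for the storage system and the generator.
fake_store = {"git://example/contacts.proto": "message Contact { string name = 1; }"}
mode = resolve_serialization_mode(
    {"source_id": "kafka.contacts", "schema_address": "git://example/contacts.proto"},
    fetch=fake_store.__getitem__,
    generate=lambda source_id, manner: f"<{manner} mode derived from {source_id}>",
)
print(mode)
```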
For the last manner, in a case that the type of the data in the first data source is serialized data, in an optional implementation, the mode registration request may further include a serialization manner and an associated data source identifier corresponding to the first data source. Correspondingly, the step 420 may specifically include:
determining a second data source related to the first data source according to the related data source identification, wherein the data mode of the second data source is different from that of the first data source;
acquiring second mode information corresponding to the second data source under the condition that the second data source is a registered data source; the second mode information is mode information generated by describing a data mode corresponding to the second data source according to a preset description form;
and generating, according to the serialization manner, a serialization mode corresponding to the first data source from the second mode information.
Here, the associated data source identifier may be a data source identifier corresponding to a data source associated with the first data source, where the data source associated with the first data source, that is, the second data source, may be an upstream data source or a downstream data source of the first data source. Additionally, the second data source may store a different type of data than the first data source, for example, it may be a data source of non-serialized data.
For example, the second data source may be registered before the first data source: second mode information corresponding to the second data source is generated first, and then a corresponding serialization mode is generated according to the second mode information and the serialization manner corresponding to the first data source. Alternatively, the first data source may be registered first. In that case, after a registration request for the second data source is received, the first data source associated with the second data source may be determined according to the associated data source identifier included in that registration request. Then, after the data mode corresponding to the second data source is described according to the preset description form and the second mode information is generated, and in a case where the first data source is determined to be a registered data source, the serialization manner corresponding to the first data source is obtained, and the serialization mode corresponding to the first data source is generated from the second mode information according to that serialization manner.
Different data sources of serialized data may use different serialization manners; specifically, the serialization manners may include protobuf, avro, json, and the like.
In a specific example, an identifier of a data source related to the first data source, input by a user, is first received, for example, an identifier corresponding to an upstream data source or a downstream data source of the first data source. It is then determined, according to the identifier, whether the second data source corresponding to the identifier is a registered data source. If the second data source is registered and the serialization manner corresponding to the first data source is json, a serialization mode of the json type may be generated according to the second mode information corresponding to the second data source. Here, the second mode information is obtained by describing the data mode of the second data source in a general description form, so it can serve as a bridge for converting between the different data modes of the first data source and the second data source, and thus for generating the data mode required by the first data source. Specifically, when the serialization manner corresponding to the first data source is json and the second mode information includes fields such as name and telephone together with the types corresponding to those fields, a serialization mode of the json type can be generated according to the second mode information, the serialization mode including information such as a specific name and the telephone corresponding to that name.
The advantage of this approach is that the serialization mode corresponding to the first data source is generated automatically from the mode information of other registered heterogeneous data sources, so developers do not need to write a specific serialization mode by hand during data synchronization between heterogeneous data sources. Once the second data source has been registered and general second mode information has been generated, the serialization mode corresponding to the first data source can be generated automatically from the second mode information when the first data source associated with the second data source is registered.
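A minimal sketch of such automatic generation is given below, under the assumption that the second mode information is an Avro-like record and that the "json type" serialization mode is expressed as a JSON Schema object. Both the assumption and the names `AVRO_TO_JSON_SCHEMA` and `json_schema_from_mode_info` are illustrative; the disclosure does not specify the concrete json format.

```python
# Assumed mapping from Avro primitive types to JSON Schema types.
AVRO_TO_JSON_SCHEMA = {
    "string": "string", "int": "integer", "long": "integer",
    "float": "number", "double": "number", "boolean": "boolean",
}


def json_schema_from_mode_info(mode_info):
    """Derive a JSON Schema object from Avro-like second mode information."""
    properties = {
        f["name"]: {"type": AVRO_TO_JSON_SCHEMA[f["type"]]}
        for f in mode_info["fields"]
    }
    return {
        "title": mode_info["name"],
        "type": "object",
        "properties": properties,
        # Every described field is treated as required in this sketch.
        "required": [f["name"] for f in mode_info["fields"]],
    }


# Second mode information of an already registered table-based data source.
second_mode_info = {
    "type": "record",
    "name": "contacts",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "phone", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
json_mode = json_schema_from_mode_info(second_mode_info)
print(json_mode)
```

Supporting another serialization manner would amount to adding another generator that reads the same mode information, which is the bridging role the general description form plays.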
When the type of the data in the first data source is non-serialized data, the pattern registration request may further include a table structure and an associated data source identifier corresponding to the first data source, where the table structure may serve as a data pattern corresponding to the first data source and be used to generate first pattern information, and the associated data source identifier may be used to update or register the data pattern of the data source corresponding to the associated data source identifier after the data pattern of the first data source is updated or registered.
In addition, in another optional implementation manner, in a case that the type of the data in the first data source is serialized data, after step 420, the data processing method provided in the embodiment of the present disclosure may further include:
and storing the serialization mode corresponding to the first data source into a serialization mode table according to the data source identification.
For example, the data source identification and the serialization pattern may be stored in a serialization pattern table, so that the serialization pattern corresponding to a data source identification can be queried from the serialization pattern table according to that identification. Specifically, for a serialization pattern of the protobuf type, the proto file needs to be compiled with the protoc tool to generate binary descriptor content before storage. In one embodiment, storing the data source identification and the serialization pattern correspondingly may mean storing the correspondence between them; or storing the data source identification and the serialization pattern in one table in which they correspond to each other; or storing the data source identification and the serialization pattern in two separate tables while maintaining the correspondence between them.
Of course, only one pattern table may be set in the pattern registration service, and correspondingly, the first pattern information, the serialization pattern and the data source identifier may also be stored in the same pattern table, so that the serialization pattern and the first pattern information corresponding to the data source identifier may be queried from the pattern table according to the data source identifier, which is not limited herein.
In addition, the data mode corresponding to the data source of the non-serialized data is defined by a table structure, so that when the database is registered, the generated mode information and the data source identification can be correspondingly stored in a preset mode information table, and the mode information corresponding to the data source identification can be inquired from the mode information table according to the data source identification.
Therefore, the serialization mode corresponding to the first data source stored with the serialization data is stored in the serialization mode table, a data producer or a data consumer of the first data source can conveniently inquire and obtain the serialization mode corresponding to the first data source, the data producer can conveniently carry out serialization processing on the data to be stored in the first data source, and the data consumer can conveniently carry out deserialization processing on the data taken out from the first data source.
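One possible layout of the serialization mode table is sketched below with an in-memory SQLite database from the Python standard library. The table name, the column names, and the helper functions are assumptions for illustration; the body is stored as a BLOB so that binary content, such as a compiled protobuf descriptor, could be held as well as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per data source: identifier, serialization manner, and the mode body.
conn.execute(
    "CREATE TABLE serialization_mode ("
    " source_id TEXT PRIMARY KEY,"
    " manner TEXT NOT NULL,"
    " body BLOB NOT NULL)"
)


def save_serialization_mode(source_id, manner, body):
    # INSERT OR REPLACE overwrites the row when the identifier already exists.
    conn.execute(
        "INSERT OR REPLACE INTO serialization_mode VALUES (?, ?, ?)",
        (source_id, manner, body),
    )


def load_serialization_mode(source_id):
    # Returns a (manner, body) tuple, or None for an unregistered identifier.
    return conn.execute(
        "SELECT manner, body FROM serialization_mode WHERE source_id = ?",
        (source_id,),
    ).fetchone()


save_serialization_mode("kafka.contacts", "json", b'{"type": "object"}')
print(load_serialization_mode("kafka.contacts"))
```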
Referring to step 440, the data associated with the first data source may be, for example, data obtained from the first data source or data that needs to be synchronized to the first data source. In addition, the pattern registration service includes a pattern information table, and the generated pattern information may be stored in the pattern information table in a process of pattern registration of the data source, thereby completing a registration process of the data source.
Based on this, in a case that the serialization data is stored in the first data source, in a possible embodiment, as shown in fig. 5, after storing the serialization pattern corresponding to the first data source into the serialization pattern table according to the data source identifier, the data processing method provided in the embodiment of the present disclosure may further include: step 4501 to step 4503.
Step 4501, receiving a mode acquisition request of a requester for a first data source; the mode acquisition request comprises a data source identification corresponding to the first data source.
Here, the requester may be a downstream data source of the first data source, that is, a data consumer, and for example, when the requester acquires data from the first data source and needs to perform deserialization processing on serialized data output from the first data source, the requester may send a pattern acquisition request for acquiring a serialization pattern corresponding to the first data source to the second electronic device where the pattern registration service is located; in addition, the requester may also be an upstream data source of the first data source, that is, the data producer, for example, when the requester synchronizes data to the first data source and needs to perform serialization processing on the data synchronized from the database to the first data source, the requester may send a pattern acquisition request for acquiring a serialization pattern corresponding to the first data source to the second electronic device where the pattern registration service is located; moreover, the requester may also be a user who needs to view the data content of the first data source, for example, in a data query scenario, when detailed content of data in the first data source needs to be queried, a pattern acquisition request for acquiring a corresponding serialization pattern of the first data source may be sent to a second electronic device where the pattern registration service is located.
Step 4502, in response to the pattern retrieval request, retrieves a serialization pattern corresponding to the data source identification from the serialization pattern table.
Here, since the data source identification corresponding to the first data source is stored in correspondence with the serialization pattern in the serialization pattern table, the serialization pattern stored in correspondence with the data source identification may be acquired by looking up the serialization pattern table.
Step 4503, a serialization schema is sent to the requestor that can be used to process data associated with the first data source.
Here, the data associated with the first data source may be, for example, data acquired from the first data source, or data that needs to be synchronized to the first data source.
For example, the serialization schema can be sent to the requester so that the requester deserializes the serialized data obtained from the first data source or serializes the data synchronized from the database to the first data source according to the serialization schema.
Of course, a data synchronization service may also be set on the second electronic device where the mode registration service is located. The data synchronization service acquires the serialized data from the first data source, deserializes the data according to the serialization mode, and then sends the deserialized data to the requester; or it acquires the data to be synchronized from the database, serializes the data according to the serialization mode, and then sends the serialized data to the requester.
In this way, by performing the pattern registration on the first data source, when the requester needs to process the data associated with the first data source, the serialization pattern corresponding to the data source identifier can be directly obtained from the serialization pattern table, and then the data is subjected to serialization or deserialization according to the serialization pattern, so that data standardization processing is not needed, the data processing process is simplified, and the data management and processing cost is reduced.
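Steps 4501 to 4503, followed by deserialization on the requester side, can be sketched as follows. The table contents, the request key `source_id`, and the functions `get_serialization_mode` and `deserialize` are hypothetical, and the serialization mode is simplified to a mapping from field name to expected Python type for a json-serialized data source.

```python
import json

# Hypothetical serialization mode table held by the mode registration service.
serialization_mode_table = {
    "kafka.contacts": {"manner": "json", "fields": {"name": str, "phone": str}},
}


def get_serialization_mode(request):
    # Steps 4501-4503: look up the mode by data source identifier and return it.
    return serialization_mode_table[request["source_id"]]


def deserialize(raw, mode):
    """Requester side: decode serialized bytes and check them against the mode."""
    record = json.loads(raw.decode("utf-8"))
    for field_name, field_type in mode["fields"].items():
        if not isinstance(record.get(field_name), field_type):
            raise ValueError(f"field {field_name!r} missing or of the wrong type")
    return record


mode = get_serialization_mode({"source_id": "kafka.contacts"})
raw = json.dumps({"name": "Zhang", "phone": "555-0100"}).encode("utf-8")
print(deserialize(raw, mode))
```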
In another possible embodiment, as shown in fig. 6, in addition to steps 410 to 440, after step 440, the data processing method provided in the embodiment of the present disclosure may further include: step 4601 to step 4603.
Step 4601, receiving a data query request of a requester for a first data source; the data query request comprises a data source identification corresponding to the first data source.
Step 4602, in response to the data query request, obtain the first mode information corresponding to the data source identifier from the mode information table.
Step 4603, send first mode information to the requestor, the first mode information being capable of being used to query the data structure of the first data source.
Here, while the second electronic device provides the mode registration service, a data query service may also be provided to provide corresponding mode information when the user needs to query data in the registered data source.
For example, when a user queries a data structure in a first data source, a data query request may be sent to a device where a data query service is located, and after receiving the request, the device searches for corresponding first mode information in a corresponding mode information table according to a carried data source identifier, and sends the corresponding first mode information to a requester, so that the requester queries the data structure of the first data source according to the first mode information. The data structure may include, among other things, column names, types, aliases, etc.
Of course, after the first mode information is acquired, the data structure of the first data source may be determined according to the first mode information through the data query service, and the data structure may be sent to the requester.
Therefore, the mode information can reflect the data structure of the data source, developers can check the data structure in the data source through the data query function without depending on a data producer through the data query service, data discovery is facilitated, and the efficiency of data collaborative development is improved.
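The data query of steps 4601 to 4603 can be illustrated with a short sketch that reads the data structure from stored mode information without touching the data itself. The table contents and the function `describe_structure` are assumptions, and the mode information is again assumed to be an Avro-like record.

```python
# Hypothetical mode information table, keyed by data source identifier.
mode_info_table = {
    "mysql.contacts": {
        "type": "record",
        "name": "contacts",
        "fields": [
            {"name": "name", "type": "string", "aliases": ["full_name"]},
            {"name": "phone", "type": "string"},
        ],
    },
}


def describe_structure(table, source_id):
    """Steps 4601-4603: return (column name, type, aliases) for each field."""
    mode_info = table[source_id]
    return [
        (f["name"], f["type"], f.get("aliases", []))
        for f in mode_info["fields"]
    ]


for column in describe_structure(mode_info_table, "mysql.contacts"):
    print(column)
```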
In addition, on the basis of the foregoing embodiments, in a possible embodiment, as shown in fig. 7, in addition to the foregoing steps 410 to 440, after step 440, the data processing method provided in the embodiment of the present disclosure may further include: step 4701 to step 4706.
Step 4701, receiving a schema update request for a first data source;
step 4702, responding to the mode updating request, describing the updated data mode of the first data source according to a preset description form, and generating third mode information;
step 4703, according to the first mode information and the third mode information, performing compatibility check on the current mode update;
step 4704, determining whether the mode updating is compatible according to the checking result, if yes, executing step 4705, and if not, executing step 4706;
step 4705, updating the stored mode information corresponding to the first data source according to the third mode information;
step 4706, refusing to update the stored schema information corresponding to the first data source.
For example, when the data mode of the first data source changes, a mode update registration may be performed for the first data source: a new data mode is entered and an update button is clicked, whereupon a mode update request for the first data source is sent to the second electronic device where the mode registration service is located. After receiving the mode update request, the second electronic device may describe the updated data mode of the first data source according to the preset description form and generate third mode information. A compatibility check, such as a forward compatibility, backward compatibility, or full compatibility check, is then performed on the first mode information and the third mode information. If an incompatible modification is detected, the mode evolution is not allowed, that is, the update of the stored mode information corresponding to the first data source is rejected. Otherwise, the stored mode information corresponding to the first data source is updated, for example by updating the corresponding mode tables: the updated serialization mode is written to the corresponding serialization mode table, and the third mode information is written to the corresponding mode information table.
In this way, downstream data consumers can be assured that upstream producers will not send them data that cannot be processed. In this case, the shared schema table may ensure that the user can deserialize all data from upstream writes, preventing changes that disrupt compatibility, thereby making the data pipeline more robust.
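The accept-or-reject flow of steps 4701 to 4706 can be sketched as below. This is an illustration under stated assumptions: mode information is modeled as a plain dictionary from field name to type, the compatibility check is passed in as a callable, and `additions_only` is a hypothetical stand-in check rather than one of the check types defined in the disclosure.

```python
def handle_mode_update(mode_info_table, data_source_id, updated_mode, is_compatible):
    """Check a mode update and either store it or reject it.

    Returns True when the stored mode information was updated.
    """
    first_mode_info = mode_info_table[data_source_id]
    # Step 4702: describe the updated data mode in the preset form
    # (assumed here to be a simple field-name -> type dictionary).
    third_mode_info = dict(updated_mode)
    # Steps 4703 and 4704: run the compatibility check and branch on the result.
    if is_compatible(first_mode_info, third_mode_info):
        # Step 4705: replace the stored mode information.
        mode_info_table[data_source_id] = third_mode_info
        return True
    # Step 4706: refuse the update and leave the stored information untouched.
    return False


def additions_only(first, third):
    """Hypothetical stand-in check: accept updates that only add fields."""
    return all(name in third and third[name] == t for name, t in first.items())


table = {"kafka.users": {"name": "string", "telephone": "string"}}
accepted = handle_mode_update(
    table,
    "kafka.users",
    {"name": "string", "telephone": "string", "age": "int"},
    additions_only,
)
rejected = handle_mode_update(table, "kafka.users", {"name": "string"}, additions_only)
```

Because a rejected update never touches the table, consumers reading the stored mode information only ever see versions that passed the check.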
Based on this, in an optional implementation, the mode registration request may further include a type of compatibility check corresponding to the first data source, and accordingly, step 4703 may specifically include:
acquiring the type of compatibility check corresponding to the first data source;
comparing the first mode information with the third mode information to obtain updated information;
and performing the compatibility check on the current mode update according to the type of the compatibility check and the update information.
Here, different types of compatibility check may be set according to actual requirements when different data sources are registered, and thus, when a subsequent data source needs to change its data mode, compatibility check may be performed according to the corresponding type of compatibility check.
It should be noted that the execution order of the first and second of the above three steps is not limited. Specifically, the update information of the third mode information relative to the first mode information may be determined by comparing the data structures corresponding to the first mode information and the third mode information, where the update information may include, for example, adding a field, deleting a field, and the like. For example, if the third mode information includes the three fields "name", "telephone" and "age", and the first mode information includes the two fields "name" and "telephone", the comparison shows that the update information is an added field.
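The comparison in the example above can be sketched as a field-level difference. The dictionary representation of mode information and the function name `diff_fields` are assumptions made for illustration.

```python
def diff_fields(first_mode_info, third_mode_info):
    """Compare two field-name -> type mappings and report added and deleted fields."""
    added = [name for name in third_mode_info if name not in first_mode_info]
    deleted = [name for name in first_mode_info if name not in third_mode_info]
    return {"added": added, "deleted": deleted}


first = {"name": "string", "telephone": "string"}
third = {"name": "string", "telephone": "string", "age": "int"}
update_info = diff_fields(first, third)
```

For the fields in the example, `update_info` reports `age` as the only added field and no deleted fields.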
Because the same update information can yield different results under different types of compatibility check, a different check strategy can be adopted for each type to determine whether the mode update is compatible.
Therefore, compatibility check is carried out on the mode updating according to the types of the compatibility check of different data sources and the updating information, so that the data sources can carry out compatible mode updating according to actual requirements, and the mode updating requirements of the data sources under different scenes are met.
The type of the compatibility check may include forward compatibility, backward compatibility, or full compatibility, and accordingly, the step of performing the compatibility check on the current mode update according to the type of the compatibility check and the update information may specifically include:
in the case where the type of the compatibility check is forward compatibility, if the update information is adding a field or deleting an optional field, determining that the current mode update is compatible; otherwise, determining that the current mode update is incompatible;
in the case where the type of the compatibility check is backward compatibility, if the update information is adding an optional field or deleting a field, determining that the current mode update is compatible; otherwise, determining that the current mode update is incompatible;
in the case where the type of the compatibility check is full compatibility, if the update information is adding an optional field or deleting an optional field, determining that the current mode update is compatible; otherwise, determining that the current mode update is incompatible.
Here, the optional field may be a field using a defined default value when there is no value.
Exemplarily, in the case where the type of the compatibility check is forward compatibility, it is judged whether the third mode information, compared with the first mode information, adds a field or deletes an optional field; if so, the current mode update is determined to be compatible, and if not, it is determined to be incompatible. In the case where the type of the compatibility check is backward compatibility, it is judged whether the third mode information, compared with the first mode information, adds an optional field or deletes a field; if so, the current mode update is determined to be compatible, and if not, it is determined to be incompatible. In the case where the type of the compatibility check is full compatibility, it is judged whether the third mode information, compared with the first mode information, adds an optional field or deletes an optional field; if so, the current mode update is determined to be compatible, and if not, it is determined to be incompatible.
Therefore, compatibility check aiming at different compatibility check types can be realized, and the diversity of the compatibility check is ensured, so that each data source can be provided with different compatibility check mechanisms according to actual requirements, and the individualized requirement of the data source updating process is met.
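The three check strategies can be sketched as a single function. This is one possible reading, not the disclosed implementation: `FieldSpec`, the string names of the check types, and the dictionary layout are hypothetical. Two further assumptions are made where the text is silent: a change to the type or optionality of an existing field is treated as incompatible, and an update that changes nothing is treated as compatible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    type: str
    optional: bool = False  # True when a default value is defined for the field


def is_update_compatible(check_type, first, third):
    """Decide whether moving from mode `first` to mode `third` is compatible.

    `first` and `third` map field names to FieldSpec values.
    """
    added = [third[n] for n in third if n not in first]
    deleted = [first[n] for n in first if n not in third]
    # Assumption: modifying an existing field is neither an add nor a delete,
    # so it falls under the "otherwise incompatible" branch.
    if any(first[n] != third[n] for n in first if n in third):
        return False
    if check_type == "forward":
        # Any field may be added; only optional fields may be deleted.
        return all(f.optional for f in deleted)
    if check_type == "backward":
        # Only optional fields may be added; any field may be deleted.
        return all(f.optional for f in added)
    if check_type == "full":
        # Both added and deleted fields must be optional.
        return all(f.optional for f in added + deleted)
    raise ValueError(f"unknown compatibility check type: {check_type}")
```

Adding a required `age` field, for instance, passes the forward check but fails the backward and full checks, while adding it as an optional field passes all three.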
In addition, in an optional implementation manner, after the stored mode information corresponding to the first data source is updated according to the third mode information, the data processing method related to the foregoing may further include:
determining a third data source associated with the first data source according to the associated data source identifier corresponding to the first data source; the third data source is a registered data source used for storing target data, and the target data is data taken out from the first data source;
acquiring registration information of a third data source;
generating a table structure update statement corresponding to the third data source according to the first mode information and the third mode information, in the case where it is determined according to the registration information that non-serialized data is stored in the third data source;
and sending a table structure updating statement to the third data source, wherein the table structure updating statement is used for indicating to update the table structure of the third data source.
Here, the third data source may be a downstream data source associated with the first data source. The registration information may be information stored when the third data source is registered, including but not limited to a data source type, a data source identification, table structure description information, a serialization manner, a serialization mode, and the like.
Specifically, whether serialized data or non-serialized data is stored in the data source can be determined according to the type of the data source, the table structure description, or the serialization manner. For example, if the data source type in the registration information is redis, kafka, or the like, or the registration information includes a serialization manner, it may be determined that the data source stores serialized data; if the table structure description is included in the registration information, it may be determined that non-serialized data is stored in the data source.
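The decision rule described above can be sketched as follows. The key names of the registration record (`data_source_type`, `serialization_manner`, `table_structure`) and the function name are hypothetical; only the rule itself follows the text.

```python
# Data source types that the text gives as examples of serialized storage.
SERIALIZED_SOURCE_TYPES = {"redis", "kafka"}


def stores_serialized_data(registration_info):
    """Return True for serialized data, False for non-serialized data."""
    if registration_info.get("data_source_type") in SERIALIZED_SOURCE_TYPES:
        return True
    if registration_info.get("serialization_manner"):
        return True
    if registration_info.get("table_structure"):
        return False
    # Assumption: a record with none of these hints cannot be classified.
    raise ValueError("registration information does not identify the data kind")
```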
In addition, in a specific example, the table structure update statement may be an SQL statement, and accordingly the third data source may be a MySQL database. In this way, among heterogeneous data sources, when the data mode of an upstream data source changes, the mode update is registered for the upstream data source once compatibility has been confirmed, and when the downstream data source stores non-serialized data, the table structure update statement corresponding to the downstream data source is generated according to the newly generated third mode information, so that the downstream table structure follows the upstream change.
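Generating the table structure update statements for a MySQL downstream can be sketched as below. The type mapping `TYPE_MAP`, the table name, and the function name are assumptions for illustration; the mapping from mode information types to column types in a real deployment would depend on the preset description form.

```python
# Hypothetical mapping from mode information types to MySQL column types.
TYPE_MAP = {
    "string": "VARCHAR(255)",
    "int": "INT",
    "long": "BIGINT",
    "double": "DOUBLE",
    "boolean": "TINYINT(1)",
}


def table_update_statements(table, first_mode_info, third_mode_info):
    """Build ALTER TABLE statements that move a table from the first mode to the third.

    Table and column names are assumed to come from trusted registration
    data; no identifier escaping is performed in this sketch.
    """
    statements = []
    for name, field_type in third_mode_info.items():
        if name not in first_mode_info:
            statements.append(
                f"ALTER TABLE `{table}` ADD COLUMN `{name}` {TYPE_MAP[field_type]};"
            )
    for name in first_mode_info:
        if name not in third_mode_info:
            statements.append(f"ALTER TABLE `{table}` DROP COLUMN `{name}`;")
    return statements


statements = table_update_statements(
    "users",
    {"name": "string", "telephone": "string"},
    {"name": "string", "telephone": "string", "age": "int"},
)
```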
In order to better understand the data processing method provided by the embodiments of the present disclosure, the following description is specifically provided in conjunction with several practical application scenarios.
As shown in fig. 8, after the data producer serializes the produced data according to the desired serialization scheme, the data producer sends the serialized data to kafka, and may register the data pattern of the data source kafka. Specifically, the data source kafka sends a mode registration request to a second electronic device where the mode registration service is located, after receiving the mode registration request, the second electronic device may first determine whether the data source kafka is a registered data source, and if the data source kafka is not a registered data source, store a serialization mode used when serializing data to a corresponding serialization mode table through the mode registration service, and simultaneously generate a general mode (i.e., mode information) according to the serialization mode and store the general mode to the corresponding general mode table (i.e., mode information table); if the data source is registered, the mode registration request is rejected.
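The registration flow for a source of serialized data can be sketched as follows. The class `ModeRegistry`, the identifier `kafka.users`, and the record-style serialization mode (a dictionary with a `fields` list, loosely modeled on an Avro record schema) are assumptions for illustration.

```python
class ModeRegistry:
    """Keeps a serialization mode table and a general mode (mode information) table."""

    def __init__(self):
        self.serialization_mode_table = {}
        self.mode_info_table = {}

    def register(self, data_source_id, serialization_mode):
        """Register a data source; return False if it is already registered."""
        if data_source_id in self.mode_info_table:
            return False  # registered data source: reject the request
        # Store the serialization mode used by the producer.
        self.serialization_mode_table[data_source_id] = serialization_mode
        # Derive the general mode as a field-name -> type mapping.
        self.mode_info_table[data_source_id] = {
            field["name"]: field["type"] for field in serialization_mode["fields"]
        }
        return True


registry = ModeRegistry()
user_mode = {
    "type": "record",
    "name": "user",
    "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}],
}
first_attempt = registry.register("kafka.users", user_mode)
second_attempt = registry.register("kafka.users", user_mode)
```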
When a data consumer needs to synchronize the serialized data in kafka to a downstream database hive, the data consumer can send a pattern acquisition request to the second electronic device. The pattern registration service in the second electronic device can acquire the corresponding serialization pattern from the serialization pattern table according to the data source identifier of kafka carried in the request and send it to the data consumer. The data consumer can then use the serialization pattern to deserialize the serialized data acquired from kafka, obtain non-serialized data, and store the non-serialized data in the database hive.
In addition, as shown in fig. 8, in the data query scenario, the data querier may send a data query request to the second electronic device, and the data query service in the second electronic device may obtain the corresponding common schema from the common schema table according to the data source identifier of the kafka carried in the request, and send the corresponding common schema to the data querier, so that the data querier queries which fields and field types are included in the kafka according to the common schema without parsing the data. Of course, the data query service in the second electronic device may obtain the corresponding serialization schema from the serialization schema table according to the data source identifier of the kafka carried in the request, and send the serialization schema to the data querier, so that the data querier can use the serialization schema to parse the data in the kafka.
As shown in fig. 9, in the case of synchronizing data stored in the database hive to the message queue redis, the database hive may be registered after the data producer produces the data and stores it in the database hive. Specifically, the data source hive sends a mode registration request to the second electronic device where the mode registration service is located. After receiving the mode registration request, the second electronic device may first determine whether the data source hive is a registered data source. If it is not a registered data source, a common mode is generated according to the table structure definition through the mode registration service and stored in the corresponding common mode table; if the data source is already registered, the mode registration request is rejected.
When a data consumer needs to synchronize data in a database hive to a downstream message queue redis, the data consumer may first send a pattern registration request to a second electronic device where a pattern registration service is located, where the request may carry a data source identifier of the redis and a data source identifier of the hive associated with the redis, obtain, by the pattern registration service, a general pattern corresponding to the database hive according to the data source identifier, further generate, according to the general pattern, a serialization pattern corresponding to a serialization manner of the redis, and store the serialization pattern in a serialization pattern table.
Then, the data consumer sends a pattern acquisition request to the second electronic device, the pattern registration service in the second electronic device can acquire a corresponding serialization pattern from the serialization pattern table according to the data source identification of the redis carried in the request, and sends the corresponding serialization pattern to the data consumer, and the data consumer can obtain the serialization data by using the serialization pattern to perform serialization processing on the data read from the hive, and synchronize the serialization data into the redis.
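The way a consumer might derive a serialization pattern from the general pattern and use it on both sides of the transfer can be sketched as below. The positional JSON encoding is an assumed serialization manner chosen only to keep the example short, and all function names are hypothetical. Because the payload carries values without field names, it cannot be interpreted without the serialization pattern, which is why the pattern is fetched from the registry.

```python
import json


def to_serialization_mode(general_mode, record_name):
    """Generate a record-style serialization pattern from a field-name -> type mapping."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": n, "type": t} for n, t in general_mode.items()],
    }


def serialize_row(row, serialization_mode):
    """Encode a row as a positional JSON array, ordered by the pattern's fields."""
    names = [f["name"] for f in serialization_mode["fields"]]
    return json.dumps([row[n] for n in names]).encode("utf-8")


def deserialize_row(payload, serialization_mode):
    """Rebuild a row from a positional payload using the same pattern."""
    names = [f["name"] for f in serialization_mode["fields"]]
    return dict(zip(names, json.loads(payload.decode("utf-8"))))


mode = to_serialization_mode({"name": "string", "age": "int"}, "user")
payload = serialize_row({"age": 30, "name": "Ann"}, mode)
restored = deserialize_row(payload, mode)
```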
It should be noted that the application scenarios described in the embodiment of the present disclosure are intended to illustrate the technical solutions of the embodiment of the present disclosure more clearly, and do not constitute a limitation on the technical solutions provided in the embodiment of the present disclosure. As a person having ordinary skill in the art will appreciate, the technical solutions provided in the embodiment of the present disclosure are also applicable as new application scenarios emerge, for example, data transmission between data sources of the same type but different data modes, or between any data sources of the same data mode.
Based on the same inventive concept, the present disclosure also provides a data processing apparatus. This is explained in detail with reference to fig. 10.
Fig. 10 is a schematic diagram illustrating a structure of a data processing apparatus according to an exemplary embodiment.
As shown in fig. 10, the data processing apparatus 100 may specifically include:
a registration request receiving module 1001 configured to perform receiving a mode registration request for a first data source; the mode registration request comprises a data source identification corresponding to the first data source;
a data pattern obtaining module 1002 configured to perform obtaining a data pattern corresponding to the first data source according to the pattern registration request;
a data pattern description module 1003 configured to perform description of a data pattern according to a preset description form, and generate first pattern information;
a mode information registration module 1004 configured to perform storing the first mode information to a mode information table according to the data source identification to realize registration of the first data source; wherein the first mode information is used to process data associated with the first data source.
The data processing apparatus 100 is described in detail below, specifically as follows:
in one embodiment, the mode registration request includes a type of data, and the data mode description module 1003 may specifically include:
the first description submodule is configured to describe the serialization mode of the first data source according to a preset description form and generate first mode information, in the case where the type of the data is serialized data;
and the second description submodule is configured to perform description on a table structure of the first data source according to a preset description form and generate first mode information under the condition that the type of the data is non-serialized data.
Based on this, in one embodiment, in the case that the type of the data is serialized data, the schema registration request further includes a serialization manner and an associated data source identifier corresponding to the first data source;
the data pattern obtaining module 1002 may specifically include:
the data source determining submodule is configured to determine a second data source associated with the first data source according to the associated data source identification, and the data mode of the second data source is different from that of the first data source;
the mode acquisition submodule is configured to acquire second mode information corresponding to the second data source under the condition that the second data source is a registered data source; the second mode information is mode information generated by describing a data mode corresponding to the second data source according to a preset description form;
and the pattern generation submodule is configured to generate, according to the serialization manner, the serialization pattern corresponding to the first data source from the second pattern information.
In one embodiment, in the case that the type of the data is serialized data, the data processing apparatus 100 referred to above may further include:
and the serialization pattern storage module is configured to store the serialization pattern corresponding to the first data source into the serialization pattern table according to the data source identification after acquiring the data pattern corresponding to the first data source according to the pattern registration request.
Based on this, in one embodiment, the data processing apparatus 100 mentioned above may further include:
the acquisition request receiving module is configured to receive a mode acquisition request of a requester for the first data source after the serialization mode corresponding to the first data source is stored into the serialization mode table according to the data source identification; the mode acquisition request comprises a data source identifier corresponding to the first data source;
a serialization pattern acquisition module configured to execute acquiring a serialization pattern corresponding to the data source identification from the serialization pattern table in response to the pattern acquisition request;
a serialization pattern transmission module configured to perform transmission of a serialization pattern to a requestor, the serialization pattern usable for processing data associated with a first data source.
In addition, in one embodiment, the data processing apparatus 100 mentioned above may further include:
the query request receiving module is configured to receive a data query request of a requester for a first data source after storing first mode information into a mode information table according to the data source identification; the data query request comprises a data source identifier corresponding to a first data source;
the mode information acquisition module is configured to execute acquisition of first mode information corresponding to the data source identification from a mode information table in response to a data query request;
a mode information sending module configured to perform sending first mode information to the requester, the first mode information being usable to query a data structure of the first data source.
On the basis of the foregoing embodiments, in one embodiment, the data processing apparatus 100 further includes:
an update request receiving module configured to perform receiving a mode update request for a first data source after storing first mode information to a mode information table according to a data source identification;
the mode updating description module is configured to execute description on the updated data mode of the first data source according to a preset description form in response to a mode updating request, and generate third mode information;
the updating compatibility checking module is configured to execute compatibility checking on the current mode updating according to the first mode information and the third mode information;
the mode updating and registering module is configured to update the stored mode information corresponding to the first data source according to the third mode information under the condition that the mode updating is determined to be compatible according to the checking result;
and the mode updating rejection module is configured to reject to update the stored mode information corresponding to the first data source under the condition that the mode updating is determined to be incompatible according to the checking result.
Based on this, in one embodiment, the schema registration request may also include a type of compatibility check corresponding to the first data source;
the update compatibility check module may specifically include:
a type acquisition submodule configured to perform acquisition of a type of compatibility check corresponding to the first data source;
the information acquisition submodule is configured to compare the first mode information with the third mode information and acquire updated information;
and the compatibility check submodule is configured to execute compatibility check on the current mode update according to the type of the compatibility check and the update information.
Based on this, in one embodiment, the types of compatibility checks include forward compatible, backward compatible, or fully compatible;
the compatibility check sub-module may specifically include:
the first checking unit is configured to, in the case where the type of the compatibility check is forward compatibility, determine that the current mode update is compatible if the update information is adding a field or deleting an optional field, and otherwise determine that the current mode update is incompatible;
the second checking unit is configured to, in the case where the type of the compatibility check is backward compatibility, determine that the current mode update is compatible if the update information is adding an optional field or deleting a field, and otherwise determine that the current mode update is incompatible;
the third checking unit is configured to, in the case where the type of the compatibility check is full compatibility, determine that the current mode update is compatible if the update information is adding an optional field or deleting an optional field, and otherwise determine that the current mode update is incompatible.
In addition, in one embodiment, the data processing apparatus 100 mentioned above may further include:
the data source determination module is configured to determine a third data source associated with the first data source according to the associated data source identifier corresponding to the first data source after updating the stored mode information corresponding to the first data source according to the third mode information; the third data source is a registered data source used for storing target data, and the target data is data taken out from the first data source;
a registration information acquisition module configured to perform acquisition of registration information of a third data source;
an update statement generation module configured to execute, in a case where it is determined that non-serialized data is stored in the third data source according to the registration information, generating a table structure update statement corresponding to the third data source according to the first pattern information and the third pattern information;
and the updating statement sending module is configured to send the table structure updating statement to the third data source, wherein the table structure updating statement is used for indicating that the table structure of the third data source is updated.
Therefore, a mode registration service is provided for a plurality of heterogeneous data sources. In the process of registering a first data source among the plurality of heterogeneous data sources, the data mode corresponding to the first data source is described according to the preset description form to generate first mode information, and the first mode information is stored in the mode information table to realize the registration of the first data source. Because the first data source, like the other heterogeneous data sources, generates common mode information in a uniform description form during registration, a data consumer of the first data source can process the data of the first data source and the data of the other heterogeneous data sources through the common mode information, without applying different data standardization steps to unify the data output by different data sources. The embodiments of the present disclosure thus reduce the cost of data management and processing for heterogeneous data sources.
Based on the same inventive concept, the embodiment of the present disclosure further provides an electronic device, which is specifically described in detail with reference to fig. 11. Fig. 11 is a block diagram illustrating a configuration of an electronic device according to an example embodiment.
Fig. 11 shows a structural diagram of an exemplary hardware architecture of an electronic device 110 capable of implementing the data processing method and the data processing apparatus according to the embodiment of the present disclosure.
The electronic device 110 may include a processor 1101 and a memory 1102 in which computer program instructions are stored.
Specifically, the processor 1101 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present disclosure.
Memory 1102 may include mass storage for information or instructions. By way of example, and not limitation, memory 1102 may include a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 1102 may include removable or non-removable (or fixed) media, where appropriate. Memory 1102 may be internal or external to the electronic device 110, where appropriate. In a particular embodiment, the memory 1102 is a non-volatile solid-state memory. In a particular embodiment, the memory 1102 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 1101 performs the following steps by reading and executing computer program instructions stored in the memory 1102:
receiving a mode registration request for a first data source, where the mode registration request comprises a data source identification corresponding to the first data source; acquiring a data mode corresponding to the first data source according to the mode registration request; describing the data mode according to a preset description form to generate first mode information; and storing the first mode information to a mode information table according to the data source identification so as to register the first data source; wherein the first mode information is used to process data associated with the first data source.
In one embodiment, the processor 1101 specifically performs: in the case where the type of the data is serialized data, describing the serialization pattern of the first data source according to the preset description form to generate the first pattern information; and in the case where the type of the data is non-serialized data, describing the table structure of the first data source according to the preset description form to generate the first mode information.
In one embodiment, the processor 1101 further specifically performs: determining a second data source associated with the first data source according to the associated data source identifier, where the data pattern of the second data source is different from that of the first data source; acquiring second mode information corresponding to the second data source in the case where the second data source is a registered data source, the second mode information being mode information generated by describing the data mode corresponding to the second data source according to the preset description form; and generating, according to the serialization manner, the serialization mode corresponding to the first data source from the second mode information.
In one embodiment, the processor 1101 further specifically executes storing the serialization pattern corresponding to the first data source into the serialization pattern table according to the data source identifier.
Additionally, in one embodiment, the processor 1101 further performs: receiving a schema acquisition request of a requester for the first data source, where the schema acquisition request comprises a data source identifier corresponding to the first data source; in response to the schema acquisition request, acquiring the serialization schema corresponding to the data source identifier from the serialization schema table; and sending the serialization schema to the requester, where the serialization schema can be used to process data associated with the first data source.
In one embodiment, the processor 1101 specifically further performs receiving a data query request of a requester for a first data source; the data query request comprises a data source identifier corresponding to a first data source; responding to a data query request, and acquiring first mode information corresponding to the data source identification from a mode information table; first schema information is sent to the requestor, the first schema information capable of being used to query a data structure of the first data source.
Based on this, in one embodiment, the processor 1101 further performs: receiving a mode update request for the first data source; in response to the mode update request, describing the updated data mode of the first data source according to the preset description form to generate third mode information; performing a compatibility check on the mode update according to the first mode information and the third mode information; in the case where the check result indicates that the mode update is compatible, updating the stored mode information corresponding to the first data source according to the third mode information; and in the case where the check result indicates that the mode update is incompatible, refusing to update the stored mode information corresponding to the first data source.
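The accept-or-reject flow of a mode update can be sketched as follows. This is an illustrative sketch only: the dictionary-based mode information table, the function name `handle_mode_update`, and the pluggable `is_compatible` callback are assumptions, and the toy check used in the example is deliberately simplistic.

```python
def handle_mode_update(mode_info_table, source_id, third_mode_info, is_compatible):
    """Apply a mode update only if the compatibility check passes.

    Returns True when the stored mode information was updated, and False
    when the update was refused.
    """
    first_mode_info = mode_info_table[source_id]
    if is_compatible(first_mode_info, third_mode_info):
        # Compatible: replace the stored mode information with the new one.
        mode_info_table[source_id] = third_mode_info
        return True
    # Incompatible: leave the stored mode information untouched.
    return False


def no_field_removed(old, new):
    """Toy check used only for this example: every old field name must survive."""
    old_names = {field["name"] for field in old["fields"]}
    new_names = {field["name"] for field in new["fields"]}
    return old_names <= new_names


mode_info_table = {"mysql.orders": {"fields": [{"name": "id"}]}}
accepted = handle_mode_update(
    mode_info_table,
    "mysql.orders",
    {"fields": [{"name": "id"}, {"name": "note"}]},
    no_field_removed,
)
rejected = handle_mode_update(
    mode_info_table, "mysql.orders", {"fields": [{"name": "note"}]}, no_field_removed
)
```

Passing the check in as a function keeps the update flow independent of which compatibility type is configured for the data source.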
In one embodiment, the processor 1101 further performs: obtaining the type of compatibility check corresponding to the first data source; comparing the first mode information with the third mode information to obtain update information; and performing the compatibility check on the mode update according to the update information and the type of the compatibility check.
In one embodiment, the processor 1101 further performs the following. In the case where the type of the compatibility check is forward compatibility, the mode update is determined to be compatible if the update information is adding a field or deleting an optional field, and incompatible otherwise. In the case where the type of the compatibility check is backward compatibility, the mode update is determined to be compatible if the update information is adding an optional field or deleting a field, and incompatible otherwise. In the case where the type of the compatibility check is full compatibility, the mode update is determined to be compatible if the update information is adding or deleting an optional field, and incompatible otherwise.
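The three rules above translate fairly directly into code. The sketch below assumes mode information shaped as a list of fields with `name`, `type`, and `optional` keys, and it assumes that an update with no changes is compatible and that any in-place modification of an existing field is incompatible; these choices, like the function names, are illustrative rather than taken from the disclosure.

```python
def diff_fields(first_mode_info, third_mode_info):
    """Compare two pieces of mode information and return the update information
    as a list of (action, optional) pairs."""
    old = {field["name"]: field for field in first_mode_info["fields"]}
    new = {field["name"]: field for field in third_mode_info["fields"]}
    changes = []
    for name in sorted(new.keys() - old.keys()):
        changes.append(("add", new[name]["optional"]))
    for name in sorted(old.keys() - new.keys()):
        changes.append(("delete", old[name]["optional"]))
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            # Assumption: changing an existing field is neither an add nor a delete.
            changes.append(("modify", new[name]["optional"]))
    return changes


def is_update_compatible(check_type, changes):
    """Apply the forward / backward / full compatibility rules to every change."""
    for action, optional in changes:
        if check_type == "forward":
            ok = action == "add" or (action == "delete" and optional)
        elif check_type == "backward":
            ok = action == "delete" or (action == "add" and optional)
        elif check_type == "full":
            ok = action in ("add", "delete") and optional
        else:
            raise ValueError(f"unknown compatibility check type: {check_type!r}")
        if not ok:
            return False
    return True


first = {"fields": [{"name": "id", "type": "long", "optional": False}]}
third = {"fields": [{"name": "id", "type": "long", "optional": False},
                    {"name": "note", "type": "string", "optional": True}]}
update_info = diff_fields(first, third)
```

Adding the optional field `note` passes all three check types in this sketch, whereas adding a required field would pass only the forward check.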
In one embodiment, the processor 1101 further performs: determining a third data source associated with the first data source according to the associated data source identifier corresponding to the first data source, where the third data source is a registered data source used for storing target data, and the target data is data taken out of the first data source; acquiring registration information of the third data source; in the case where it is determined according to the registration information that the third data source stores non-serialized data, generating a table structure update statement corresponding to the third data source according to the first mode information and the third mode information; and sending the table structure update statement to the third data source, the table structure update statement being used to instruct the third data source to update its table structure.
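One way to picture the generation of a table structure update statement is the sketch below, which emits `ALTER TABLE` statements for added and removed fields. The registration key `data_type`, the type mapping `SQL_TYPES`, the table name, and the SQL dialect are all assumptions for illustration; a real system would also validate or quote identifiers before building SQL text.

```python
# Hypothetical mapping from description-form types to SQL column types.
SQL_TYPES = {"long": "BIGINT", "string": "TEXT", "double": "DOUBLE", "boolean": "BOOLEAN"}


def table_update_statements(registration_info, first_mode_info, third_mode_info):
    """Generate ALTER TABLE statements that bring the third data source's table
    in line with the updated mode of the first data source."""
    if registration_info["data_type"] != "non_serialized":
        # Only data sources storing non-serialized (tabular) data have a table structure.
        return []

    table = registration_info["table_name"]
    old_names = {field["name"] for field in first_mode_info["fields"]}
    new_names = {field["name"] for field in third_mode_info["fields"]}

    statements = []
    for field in third_mode_info["fields"]:
        if field["name"] not in old_names:
            nullability = "NULL" if field["optional"] else "NOT NULL"
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {field['name']} "
                f"{SQL_TYPES[field['type']]} {nullability}"
            )
    for field in first_mode_info["fields"]:
        if field["name"] not in new_names:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {field['name']}")
    return statements


first = {"fields": [{"name": "id", "type": "long", "optional": False},
                    {"name": "memo", "type": "string", "optional": True}]}
third = {"fields": [{"name": "id", "type": "long", "optional": False},
                    {"name": "note", "type": "string", "optional": True}]}
statements = table_update_statements(
    {"data_type": "non_serialized", "table_name": "orders_sink"}, first, third
)
```

The statements would then be sent to the third data source; for a third data source that stores serialized data the function returns an empty list and nothing is sent.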
In one example, the electronic device 110 can also include a transceiver 1103 and a bus 1104. As shown in fig. 11, the processor 1101, the memory 1102 and the transceiver 1103 are connected via a bus 1104 to complete the communication therebetween.
Bus 1104 includes hardware, software, or both. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 1104 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The embodiment of the disclosure also provides a computer storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are used for implementing the data processing method described in the embodiment of the disclosure.
In some possible embodiments, various aspects of the methods provided by the present disclosure may also be implemented in the form of a program product including program code. When the program product is run on a computer device, the program code causes the computer device to perform the steps of the methods according to the various exemplary embodiments of the present disclosure described above in this specification; for example, the computer device may perform the data processing method described in the embodiments of the present disclosure.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to the present disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable information processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable information processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable information processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable information processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (10)

1. A data processing method, comprising:
receiving a mode registration request for a first data source; wherein the mode registration request includes a data source identification corresponding to the first data source;
acquiring a data mode corresponding to the first data source according to the mode registration request;
describing the data mode according to a preset description form to generate first mode information;
storing the first mode information to a mode information table according to the data source identification so as to realize the registration of the first data source; wherein the first mode information is used to process data associated with the first data source.
2. The method according to claim 1, wherein the mode registration request includes a type of data, and the describing the data mode according to a preset description form to generate first mode information comprises:
under the condition that the type of the data is serialized data, describing a serialization mode of the first data source according to the preset description form, and generating first mode information;
and under the condition that the type of the data is non-serialized data, describing a table structure of the first data source according to the preset description form, and generating first mode information.
3. The method according to claim 2, wherein, in the case that the type of the data is serialized data, the mode registration request further includes a serialization manner and an associated data source identifier corresponding to the first data source;
the acquiring of the data mode corresponding to the first data source according to the mode registration request includes:
determining a second data source associated with the first data source according to the associated data source identifier, wherein the data mode of the second data source is different from that of the first data source;
acquiring second mode information corresponding to the second data source under the condition that the second data source is a registered data source; the second mode information is mode information generated by describing a data mode corresponding to the second data source according to the preset description form;
and generating, in the serialization manner, a serialization mode corresponding to the first data source according to the second mode information.
4. The method according to claim 2, wherein in a case that the type of the data is serialized data, after acquiring the data mode corresponding to the first data source according to the mode registration request, the method further comprises:
and storing the serialization mode corresponding to the first data source into a serialization mode table according to the data source identification.
5. The method of claim 4, wherein after storing the serialization mode corresponding to the first data source into a serialization mode table according to the data source identification, the method further comprises:
receiving a mode acquisition request of a requester for the first data source; the mode acquisition request comprises a data source identifier corresponding to the first data source;
in response to the mode acquisition request, acquiring a serialization mode corresponding to the data source identification from the serialization mode table;
sending the serialization mode to the requester, the serialization mode being usable for processing data associated with the first data source.
6. The method of claim 1, wherein after storing the first mode information to a mode information table according to the data source identification, the method further comprises:
receiving a data query request of a requester for the first data source; the data query request comprises a data source identifier corresponding to the first data source;
responding to the data query request, and acquiring first mode information corresponding to the data source identification from the mode information table;
sending the first mode information to the requester, the first mode information being usable to query a data structure of the first data source.
7. A data processing apparatus, comprising:
a registration request receiving module configured to receive a mode registration request for a first data source; wherein the mode registration request includes a data source identification corresponding to the first data source;
a data mode acquisition module configured to acquire a data mode corresponding to the first data source according to the mode registration request;
a data mode description module configured to describe the data mode according to a preset description form and generate first mode information;
a mode information registration module configured to store the first mode information into a mode information table according to the data source identification so as to realize registration of the first data source; wherein the first mode information is used to process data associated with the first data source.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a data processing apparatus, cause the data processing apparatus to implement the data processing method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the data processing method of any one of claims 1 to 6 when executed by a processor.
CN202110909460.6A 2021-08-09 2021-08-09 Data processing method, device, electronic equipment and storage medium Active CN113704320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909460.6A CN113704320B (en) 2021-08-09 2021-08-09 Data processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110909460.6A CN113704320B (en) 2021-08-09 2021-08-09 Data processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113704320A true CN113704320A (en) 2021-11-26
CN113704320B CN113704320B (en) 2024-01-02

Family

ID=78651959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909460.6A Active CN113704320B (en) 2021-08-09 2021-08-09 Data processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113704320B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112182036A (en) * 2020-09-15 2021-01-05 中信银行股份有限公司 Data sending and writing method and device, electronic equipment and readable storage medium
CN112671734A (en) * 2020-12-16 2021-04-16 中国平安人寿保险股份有限公司 Message processing method facing multiple data sources and related equipment thereof
CN112883088A (en) * 2019-11-29 2021-06-01 贵州白山云科技股份有限公司 Data processing method, device, equipment and storage medium
CN113127522A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Data processing method, device, system and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883088A (en) * 2019-11-29 2021-06-01 贵州白山云科技股份有限公司 Data processing method, device, equipment and storage medium
CN113127522A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 Data processing method, device, system and storage medium
CN112182036A (en) * 2020-09-15 2021-01-05 中信银行股份有限公司 Data sending and writing method and device, electronic equipment and readable storage medium
CN112671734A (en) * 2020-12-16 2021-04-16 中国平安人寿保险股份有限公司 Message processing method facing multiple data sources and related equipment thereof

Also Published As

Publication number Publication date
CN113704320B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
CN108984388B (en) Method and terminal equipment for generating automatic test case
CN108696381B (en) Protocol configuration method and device
CN111581291A (en) Data processing method and device, electronic equipment and readable medium
CN112104709A (en) Intelligent contract processing method, device, medium and electronic equipment
EP3869434A1 (en) Blockchain-based data processing method and apparatus, device, and medium
CN114356921A (en) Data processing method, device, server and storage medium
CN108255967B (en) Method and device for calling storage process, storage medium and terminal
CN113886485A (en) Data processing method, device, electronic equipment, system and storage medium
CN110688305B (en) Test environment synchronization method, device, medium and electronic equipment
CN114064712A (en) Data access method and device, electronic equipment and computer readable storage medium
CN114242210A (en) Medical image data management method, device, equipment and storage medium
CN113094415B (en) Data extraction method, data extraction device, computer readable medium and electronic equipment
CN112699183A (en) Data processing method, system, readable storage medium and computer equipment
CN113704320B (en) Data processing method, device, electronic equipment and storage medium
CN110020166B (en) Data analysis method and related equipment
CN116737535A (en) Interface test method, device, computer equipment and storage medium
CN116204428A (en) Test case generation method and device
CN113886221B (en) Test script generation method and device, storage medium and electronic equipment
CN114398152A (en) Interface simulation service calling method and device
CN114675871A (en) Resource updating method and device, electronic equipment, server and storage medium
CN105607942B (en) A kind of method and apparatus that ballot determines
US11861408B2 (en) Hardware accelerator service discovery
CN113076273B (en) Component access method, device, electronic equipment, storage medium and program product
CN110347960B (en) Data transfer processing method, device, equipment and storage medium
CN117992425A (en) Database operation execution method, apparatus, electronic device, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant