CN110515893A - Data storage method, apparatus, device, and computer-readable storage medium - Google Patents

Info

Publication number
CN110515893A
Authority
CN
China
Prior art keywords
data
hive
file
protobuf
hive table
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910684136.1A
Other languages
Chinese (zh)
Other versions
CN110515893B (en)
Inventor
潘利杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN201910684136.1A
Publication of CN110515893A
Application granted
Publication of CN110515893B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Abstract

The embodiments of the present invention disclose a data storage method, apparatus, device, and computer-readable storage medium. The method includes: creating a Hive table for reading the Protobuf serialized files held in underlying storage; generating the corresponding programming-language file from the description file of the Protobuf serialized data and sending it to the file package to be loaded; configuring the fields used for parsing data in the Hive table according to the parsing mode of a data parser built in advance; and, based on the data parser, automatically parsing and reading the Protobuf serialized data through the Hive table. The data parser matches the table schema of the Hive table against the structure of a configuration file and generates the Object object set of the Hive table structure and the Hive result object set. The application solves the problem of parsing Protobuf files stored at the bottom layer of a Hive data warehouse with minimal development effort, which helps improve the data storage security of the Hive data warehouse.

Description

Data storage method, apparatus, device, and computer-readable storage medium
Technical field
The embodiments of the present invention relate to the field of storage technology, and in particular to a data storage method, apparatus, device, and computer-readable storage medium.
Background
With the rapid development of big data and cloud computing, today's information society produces massive amounts of data. The data volumes of companies in the Internet and telecommunications industries in particular are growing at an astonishing rate, which places higher demands on storage. Almost the entire industry now stores big data in compressed form to reduce the space that files occupy, but reducing file storage requires more computation. How to balance computation against storage, so that resources reach the balance the industry needs, is a problem that those skilled in the art need to address.
Hive data warehouses have many advantages, such as scalable computing capacity, high data fault tolerance, data security, inheritance of all the advantages of HDFS, low cost, and ease of use. Hive has therefore been widely adopted as the primary data warehouse in scenarios that require an offline data warehouse.
However, Hive also has limitations: the expressive power of its HSQL is limited, the MapReduce jobs it generates are not intelligent enough, and its tuning granularity is coarse. The data storage formats that Hive data warehouses currently support are textfile, sequencefile, rcfile, orcfile, and parquet, but all of these are insecure storage types. An intruder or unauthorized user who obtains part of the data in one of these formats can view everything in that part, so leaks are easy and users cannot store data securely.
Summary of the invention
The embodiments of the present disclosure provide a data storage method, apparatus, device, and computer-readable storage medium that solve the problem of parsing Protobuf files stored at the bottom layer of a Hive data warehouse with minimal development effort, which helps improve the data storage security of the Hive data warehouse.
To solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides a data storage method, comprising:
creating a Hive table, the Hive table being used to read data held in underlying storage in the Protobuf structured data storage format;
generating the corresponding programming-language file from the description file of the Protobuf serialized data, and sending it to the file package to be loaded;
configuring the fields used for parsing data in the Hive table according to the parsing mode of a data parser built in advance;
based on the data parser, automatically parsing and reading the Protobuf serialized data through the Hive table;
wherein the data parser is used to match the table schema of the Hive table against the structure of a configuration file, and to generate the Object object set of the Hive table structure and the Hive result object set.
Optionally, when applied in a Java language environment, the process of building the data parser comprises:
providing a pattern-matching file for reading the configuration file and associating it with the Hive table structure, so as to match the table schema of the Hive table against the structure of the configuration file;
providing an object-conversion file that implements the conversion logic from Java objects to Object objects based on a subclass proxy;
providing a nested-traversal file that uses nested objects to traverse the Java objects generated from the Protobuf serialized data.
Optionally, automatically parsing and reading the Protobuf serialized data through the Hive table based on the data parser comprises:
overriding the initialization function of the Hive data warehouse to assemble the pattern-matching file and the object-conversion file and generate the Object object set of the Hive table structure;
overriding the data parsing function of the Hive data warehouse to assemble the nested-traversal file and generate the Hive result object set;
generating an executable file package, setting in the configuration file the location of the Java file generated from the Protobuf structure definition file, and loading the executable file package into the Hive environment variables, so that the Hive table specifies the entry point for parsing and reading the Protobuf serialized data;
executing a hiveSQL query statement to read the Protobuf serialized data.
Optionally, automatically parsing and reading the Protobuf serialized data through the Hive table based on the data parser further comprises:
determining whether a message indicating that the data was parsed and read successfully is received within a preset time;
if not, raising a data read error alarm, and feeding back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
In another aspect, an embodiment of the present invention provides a data storage apparatus, comprising:
a Hive table creation module, configured to create a Hive table, the Hive table being used to read data held in underlying storage in the Protobuf structured data storage format;
a data conversion module, configured to generate the corresponding programming-language file from the description file of the Protobuf serialized data and send it to the file package to be loaded;
a parsing mode configuration module, configured to configure the fields used for parsing data in the Hive table according to the parsing mode of a data parser built in advance, wherein the data parser is used to match the table schema of the Hive table against the structure of a configuration file, and to generate the Object object set of the Hive table structure and the Hive result object set;
an automated data parsing and reading module, configured to automatically parse and read the Protobuf serialized data through the Hive table based on the data parser.
Optionally, the data parser includes a pattern matcher, an object converter, and a nested traverser;
the pattern matcher is used to match the table schema of the Hive table against the structure of the configuration file;
the object converter is used to implement the conversion logic from Java objects to Object objects based on a subclass proxy;
the nested traverser is used to traverse, using nested objects, the Java objects generated from the Protobuf serialized data.
Optionally, the automated data parsing and reading module includes:
an automation object generation submodule, configured to override the initialization function of the Hive data warehouse to assemble the pattern-matching file and the object-conversion file and generate the Object object set of the Hive table structure;
a Hive result object set generation submodule, configured to override the data parsing function of the Hive data warehouse to assemble the nested-traversal file and generate the Hive result object set;
a parsing mode specification submodule, configured to generate an executable file package, set in the configuration file the location of the Java file generated from the Protobuf structure definition file, and load the executable file package into the Hive environment variables, so that the Hive table specifies the entry point for parsing and reading the Protobuf serialized data;
a reading submodule, configured to execute a hiveSQL query statement to read the Protobuf serialized data.
Optionally, the apparatus further includes an alarm module, configured to raise a data read error alarm if no message indicating that the data was parsed and read successfully is received within a preset time, and to feed back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
An embodiment of the present invention also provides a data storage device, comprising a processor that, when executing a computer program stored in a memory, implements the steps of the data storage method described in any of the above.
Finally, an embodiment of the present invention provides a computer-readable storage medium on which a data storage program is stored; when executed by a processor, the data storage program implements the steps of the data storage method described in any of the above.
The advantages of technical solution provided by the present application, is, based on building data parser in advance, need to only configure hive table Data parsing field can realize hive data warehouse automation parse and read bottom storage Protobuf serializing File, since the data stored with Protobuf structured data storage format not only have data compression function, and it is highly-safe, Improve the data storage safety of hive data warehouse;In addition, need to only modify certain parsing fields, it is not necessary to modify original programs Code realizes the parsing problem that the storage of hive data warehouse bottom Protobuf file is solved with least exploitation amount, for Downstream program need to only change the variation that configuration file is suitable for data structure when the data of upper layer transport have structure change, very greatly Reducing enterprise in degree is that the application layer applications program that data structure changes and generates changes bring risk.
In addition, the embodiment of the present invention provides corresponding realization device, equipment and computer also directed to date storage method Readable storage medium storing program for executing, further such that the method has more practicability, described device, equipment and computer readable storage medium Have the advantages that corresponding.
It should be understood that the above general description and the following detailed description are merely exemplary, this can not be limited It is open.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the related art more clearly, the drawings needed for describing the embodiments or the related art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data storage method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data storage method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of one specific embodiment of the data storage apparatus provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of another specific embodiment of the data storage apparatus provided by an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the scope of protection of the invention.
The terms "first", "second", "third", "fourth", and so on in the description, the claims, and the drawings of this application are used to distinguish different objects, not to describe a particular order. The terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units and may include steps or units that are not listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting implementations of the application are described in detail below.
Referring first to Fig. 1, a schematic flowchart of a data storage method provided by an embodiment of the present invention, the embodiment may include the following:
S101: Create a Hive table; the Hive table is used to read data held in underlying storage in the Protobuf structured data storage format.
In this application, any existing technique may be used to create the Hive table; for the specific creation process, refer to the related art, which is not repeated here. The Hive table created in S101 contains the same functional modules as a Hive table in the prior art.
The Protobuf structured data storage format offers both security and data compression. It is a language-neutral, platform-neutral, extensible method for serializing structured data, and a portable and efficient structured data storage format that can be used to serialize structured data. Storing data in the Protobuf structured data storage format can improve the security of the data stored in the Hive data warehouse. Note that the purpose of creating the Hive table in this step is to specify the parsing mode and the parsing entry point for data in the Protobuf structured data storage format.
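As a non-limiting illustration of how a table can name its parsing entry point, the following Java sketch assembles a Hive DDL statement that binds a table to a custom SerDe class. The table name, column names, the SerDe class `com.example.serde.ProtobufSerDe`, the property key `protobuf.message.class`, the SEQUENCEFILE container, and the location are all hypothetical assumptions chosen for the example; none of them comes from the original text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Builds a Hive DDL string that binds a table to a custom SerDe (all names are hypothetical). */
final class ProtobufTableDdl {

    static String createTable(String table, Map<String, String> columns,
                              String serdeClass, String messageClass, String location) {
        // Render the column list in insertion order, e.g. (`user_id` BIGINT, `name` STRING).
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        columns.forEach((name, type) -> cols.add("`" + name + "` " + type));
        return "CREATE EXTERNAL TABLE " + table + " " + cols
                // ROW FORMAT SERDE names the class Hive calls to parse each record.
                + " ROW FORMAT SERDE '" + serdeClass + "'"
                // Assumed property: tells the hypothetical SerDe which generated class to use.
                + " WITH SERDEPROPERTIES ('protobuf.message.class'='" + messageClass + "')"
                // Assumed container format for the serialized records.
                + " STORED AS SEQUENCEFILE"
                + " LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("user_id", "BIGINT");
        columns.put("name", "STRING");
        System.out.println(createTable("user_events", columns,
                "com.example.serde.ProtobufSerDe", "com.example.proto.UserEvent",
                "/warehouse/user_events"));
    }
}
```

The resulting statement would be submitted through whatever Hive client is in use; the sketch only shows where the parsing class is declared.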
S102: Generate the corresponding programming-language file from the description file of the Protobuf serialized data, and send it to the file package to be loaded.
In this embodiment, the data held at the bottom layer of the Hive data warehouse is in Protobuf format, and data in the Protobuf structured data storage format exists in serialized form when stored or transmitted. In other words, what the embodiment reads is the Protobuf serialized files stored at the bottom layer of the Hive data warehouse. Parsing the data during reading requires the programming-language file generated from the description file of the Protobuf serialized data; when running in a Java language environment, the corresponding Java class can be generated from that description file.
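One common way to generate a Java class from a Protobuf description file is the `protoc` compiler with its `--proto_path` and `--java_out` options. The sketch below, offered only as an illustration, builds that command line in Java; the directory and file names are hypothetical, and the original text does not state which tool is used.

```java
import java.nio.file.Path;
import java.util.List;

/** Assembles a protoc invocation that generates Java sources from a .proto file. */
final class ProtocCommand {

    static List<String> javaCodegen(Path protoDir, Path protoFile, Path javaOutDir) {
        return List.of(
                "protoc",
                "--proto_path=" + protoDir,   // directory searched for .proto imports
                "--java_out=" + javaOutDir,   // directory that receives the generated .java files
                protoFile.toString());        // the description file to compile
    }

    public static void main(String[] args) {
        List<String> cmd = javaCodegen(
                Path.of("proto"), Path.of("proto/user_event.proto"), Path.of("build/generated"));
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires protoc on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

The generated sources would then be compiled and packaged together with the parser so that Hive can load them.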
S103: Configure the fields used for parsing data in the Hive table according to the parsing mode of the data parser built in advance.
In this application, the data parser matches the table schema of the Hive table against the structure of the configuration file, and generates the Object object set of the Hive table structure and the Hive result object set. The configuration file is the system's configuration file; the Object object set of the Hive table structure supports the subsequent automated operations; and the Hive result object set is the data structure, operable by the executing party, into which the Protobuf serialized data is converted. With this data parser, only certain fields in the Hive table need to be configured to the parsing class in order to parse Protobuf data, so that the data in underlying storage can be queried directly with hiveSQL. Fields can be added conveniently without modifying program code. When the structure of data transmitted from upstream changes, downstream programs only need to change the configuration file to adapt, which greatly reduces the risk from application-layer program changes caused by data structure changes.
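The matching of table schema against configuration structure can be pictured with a minimal sketch. It assumes, purely for illustration, that the configuration maps each Hive column name to a dotted Protobuf field path; the class name `SchemaMatcher` and that mapping format are hypothetical and not taken from the original text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Matches Hive columns against a column-to-field mapping read from a configuration file. */
final class SchemaMatcher {

    /**
     * Returns, in table column order, the configured Protobuf field path for each Hive column.
     * Throws if any column has no configured field, so a mismatch is caught before parsing.
     */
    static Map<String, String> match(List<String> hiveColumns, Map<String, String> configMapping) {
        Map<String, String> resolved = new LinkedHashMap<>();
        List<String> missing = new ArrayList<>();
        for (String column : hiveColumns) {
            String fieldPath = configMapping.get(column);
            if (fieldPath == null) {
                missing.add(column);
            } else {
                resolved.put(column, fieldPath);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("No Protobuf field configured for columns: " + missing);
        }
        return resolved;
    }
}
```

Under this assumption, adding a column means adding one entry to the configuration and one field to the table definition, which is the kind of change the paragraph above describes as requiring no program modification.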
S104: Based on the data parser, automatically parse and read the Protobuf serialized data through the Hive table.
The application may use the initialization method of SerDe, Hive's built-in serialization and parsing facility, to load the data parser, and then read and parse the Protobuf serialized files by executing hiveSQL query statements against the Hive table created in S101.
In the technical solution provided by this embodiment of the present invention, with the data parser built in advance, configuring the data-parsing fields of the Hive table is enough for the Hive data warehouse to automatically parse and read the Protobuf serialized files in underlying storage. Because data stored in the Protobuf structured data storage format is not only highly secure but also compressed, the data storage security of the Hive data warehouse is improved. In addition, the original program code does not need to be modified, so the parsing of Protobuf files stored at the bottom layer of the Hive data warehouse is solved with minimal development effort. When the structure of data transmitted from upstream changes, downstream programs only need to change the configuration file to adapt to the new data structure, which greatly reduces the risk that application-layer program changes caused by data structure changes bring to the enterprise.
As a preferred implementation, when applied in a Java language environment, the data parser may include a pattern matcher that matches the configuration file against the Hive table schema, an object converter implemented with Java CGlib that contains the conversion logic from Java objects to Object objects, and a nested traverser that uses nested objects to traverse the Java objects generated by Protobuf. Accordingly, the process of building the data parser may include the following.
A class can first be defined that extends the abstract class AbstractSerDe and implements the initialize, deserialize, and getObjectInspector methods. AbstractSerDe is an abstract class built into Hive that declares the abstract methods its subclasses must implement; Hive's own parsers extend it as well. initialize is an abstract method defined in AbstractSerDe and is mainly responsible for the parser's initialization and preparation work, such as loading environment variables and table information. deserialize is an abstract method defined in AbstractSerDe and is mainly responsible for implementing the parsing of data files during deserialization. getObjectInspector is an abstract method defined in AbstractSerDe and is mainly responsible for returning the data of the result object after serialization processing.
A pattern-matching file is provided for reading the configuration file and associating it with the Hive table structure, so that the table schema of the Hive table is matched against the structure of the configuration file. In practice, this can be done by defining a class that reads the configuration file and associates it with the Hive table structure; that class is the pattern-matching file.
An object-conversion file is provided that implements the conversion logic from Java objects to Object objects based on a subclass proxy. In practice, a class can be defined that uses Java CGlib to implement the conversion logic from Java objects to Object objects; that class is the object-conversion file. Java CGlib is one way to implement dynamic proxies in Java, also called subclass proxying: it extends the functionality of a target object by constructing a subclass object in memory.
A nested-traversal file is provided that uses nested objects to traverse the Java objects generated from the Protobuf serialized data. In practice, a class can be defined as a nested traversal tool that uses nested objects to traverse the Java objects generated by Protobuf; that class is the nested-traversal file.
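The conversion from a typed Java object to a row of plain Object values can be sketched as follows. Note the substitution: this illustration uses plain `java.lang.reflect` getter lookup instead of the CGlib subclass proxy named above, so that it runs without third-party libraries. The `UserEvent` class is a hand-written, hypothetical stand-in for a class generated from a Protobuf description file.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Converts a typed Java object into a row of Object values via reflection (not CGlib). */
final class ObjectConverter {

    /** Reads each named field through its public getter, in the given order. */
    static List<Object> toRow(Object message, List<String> fieldNames) {
        List<Object> row = new ArrayList<>(fieldNames.size());
        for (String field : fieldNames) {
            // Field "userId" is read through the getter "getUserId".
            String getter = "get" + Character.toUpperCase(field.charAt(0)) + field.substring(1);
            try {
                Method method = message.getClass().getMethod(getter);
                row.add(method.invoke(message));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read field '" + field + "'", e);
            }
        }
        return row;
    }

    /** Hypothetical stand-in for a generated message class. */
    public static final class UserEvent {
        private final long userId;
        private final String name;

        public UserEvent(long userId, String name) {
            this.userId = userId;
            this.name = name;
        }

        public long getUserId() { return userId; }

        public String getName() { return name; }
    }
}
```

A proxy-based converter would serve the same purpose; reflection is used here only because it keeps the example self-contained.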
Based on the data parser built above, automatically parsing and reading the Protobuf serialized data through the Hive table in S104 may specifically include:
Based on the parsing mode, the initialization function of the Hive data warehouse can be overridden to assemble the pattern-matching file and the object-conversion file and generate the Object object set of the Hive table structure. The initialization function here can be the initialize method of Hive's AbstractSerDe.
Based on the parsing mode, the data parsing function of the Hive data warehouse can be overridden to assemble the nested-traversal file and generate the Hive result object set. The data parsing function here can be the deserialize method of Hive's AbstractSerDe.
An executable file package is generated, the location of the Java file generated from the Protobuf structure definition file is set in the configuration file, and the executable file package is loaded into the Hive environment variables so that the Hive table specifies the entry point for parsing and reading the Protobuf serialized data.
A hiveSQL query statement is executed to read the Protobuf serialized data.
The data stored at the bottom layer of the Hive data warehouse may fail to be parsed and read because of network problems, parser faults, or other causes. To locate the cause of a fault as early as possible and repair it in time, one implementation optionally, referring to Fig. 2 and building on the above embodiment, may further include:
S105: Determine whether a message indicating that the data was parsed and read successfully is received within a preset time; if not, execute S106.
S106: Raise a data read error alarm, and feed back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
In this embodiment, a self-feedback step is provided. If no feedback indicating successful parsing and reading is received within a preset period (for example, 10 s) starting from the moment the system begins to parse and read the stored data, a fault has occurred during parsing and reading. The system promptly captures the log file information produced during that period and feeds it back. This prevents later log files from overwriting the earlier fault-related information, helps staff find the bug in the log files accurately and promptly, lets faults be repaired efficiently, and improves the overall performance of the system.
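Steps S105 and S106 can be sketched with the standard `java.util.concurrent` API. In this illustration, which rests on assumptions not found in the original text, the parse-and-read job is represented by a `Future`, a completed future counts as the success message, and the recent log lines are passed in by the caller; the names `ParseWatchdog` and `Outcome` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Waits a preset time for a parse-and-read job and reports an alarm with log lines on failure. */
final class ParseWatchdog {

    /** Result of the wait: success flag plus the log lines fed back when the alarm fires. */
    static final class Outcome {
        final boolean success;
        final List<String> alarmLog; // empty when the job succeeded in time

        Outcome(boolean success, List<String> alarmLog) {
            this.success = success;
            this.alarmLog = alarmLog;
        }
    }

    static Outcome await(Future<?> parseJob, long timeout, TimeUnit unit, List<String> recentLog) {
        try {
            parseJob.get(timeout, unit);                 // S105: wait for the success signal
            return new Outcome(true, List.of());
        } catch (TimeoutException | ExecutionException e) {
            parseJob.cancel(true);                       // stop the stuck or failed job
            return new Outcome(false, List.copyOf(recentLog)); // S106: alarm plus captured log
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // preserve the interrupt for the caller
            parseJob.cancel(true);
            return new Outcome(false, List.copyOf(recentLog));
        }
    }
}
```

Copying the log lines at the moment the alarm fires reflects the point made above: the fault-related information is captured before later log output can overwrite it.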
The embodiments of the present invention also provide a corresponding apparatus for the data storage method, which makes the method more practical. The data storage apparatus provided by an embodiment of the present invention is introduced below; the data storage apparatus described below and the data storage method described above may be cross-referenced.
Referring to Fig. 3, a structural diagram of one specific embodiment of the data storage apparatus provided by an embodiment of the present invention, the apparatus may include:
a Hive table creation module 301, configured to create a Hive table, the Hive table being used to read data held in underlying storage in the Protobuf structured data storage format.
a data conversion module 302, configured to generate the corresponding programming-language file from the description file of the Protobuf serialized data and send it to the file package to be loaded.
a parsing mode configuration module 303, configured to configure the fields used for parsing data in the Hive table according to the parsing mode of a data parser built in advance; the data parser is used to match the table schema of the Hive table against the structure of the configuration file, and to generate the Object object set of the Hive table structure and the Hive result object set.
an automated data parsing and reading module 304, configured to automatically parse and read the Protobuf serialized data through the Hive table based on the data parser.
As a preferred implementation of this embodiment, the data parser may include a pattern matcher, an object converter, and a nested traverser;
the pattern matcher is used to match the table schema of the Hive table against the structure of the configuration file;
the object converter is used to implement the conversion logic from Java objects to Object objects based on a subclass proxy;
the nested traverser is used to traverse, using nested objects, the Java objects generated from the Protobuf serialized data.
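The role of the nested traverser can be illustrated with a short recursive walk. As an assumption made only for this sketch, a decoded message is modelled as nested `Map` values rather than generated Protobuf classes, and nested fields are flattened into dotted paths; the class name `NestedTraverser` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Walks a nested message (modelled as nested maps) and flattens it into dotted field paths. */
final class NestedTraverser {

    static Map<String, Object> flatten(Map<String, ?> message) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", message, out);
        return out;
    }

    private static void walk(String prefix, Map<String, ?> node, Map<String, Object> out) {
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // A nested message: recurse so its fields appear as "parent.child".
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                walk(path, child, out);
            } else {
                // A leaf value: record it under its full path.
                out.put(path, value);
            }
        }
    }
}
```

A dotted path such as `address.city` is one simple way to address a field inside a nested message when filling a flat table row.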
Optionally, in some implementations of this embodiment, referring to Fig. 4, the apparatus may further include an alarm module 305, configured to raise a data read error alarm if no message indicating that the data was parsed and read successfully is received within a preset time, and to feed back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
In other implementations, the automated data parsing and reading module may specifically include:
an automation object generation submodule, configured to override the initialization function of the Hive data warehouse to assemble the pattern-matching file and the object-conversion file and generate the Object object set of the Hive table structure;
a Hive result object set generation submodule, configured to override the data parsing function of the Hive data warehouse to assemble the nested-traversal file and generate the Hive result object set;
a parsing mode specification submodule, configured to generate an executable file package, set in the configuration file the location of the Java file generated from the Protobuf structure definition file, and load the executable file package into the Hive environment variables, so that the Hive table specifies the entry point for parsing and reading the Protobuf serialized data;
a reading submodule, configured to execute a hiveSQL query statement to read the Protobuf serialized data.
The function of each functional module of data storage device described in the embodiment of the present invention can be according in above method embodiment Method specific implementation, specific implementation process is referred to the associated description of above method embodiment, and details are not described herein again.
From the foregoing, it will be observed that the embodiment of the present invention, which is realized, solves hive data warehouse bottom Protobuf with least exploitation amount The parsing problem of file storage is conducive to the data storage safety for promoting hive data warehouse.
The embodiment of the invention also provides a kind of data storage devices, specifically can include:
Memory, for storing computer program;
Processor realizes the step of date storage method described in any one embodiment as above for executing computer program Suddenly.
The function of each functional module of data storage device described in the embodiment of the present invention can be according in above method embodiment Method specific implementation, specific implementation process is referred to the associated description of above method embodiment, and details are not described herein again.
From the foregoing, it will be observed that the embodiment of the present invention, which is realized, solves hive data warehouse bottom Protobuf with least exploitation amount The parsing problem of file storage is conducive to the data storage safety for promoting hive data warehouse.
An embodiment of the invention further provides a computer-readable storage medium storing a data storage program which, when executed by a processor, implements the steps of the data storage method according to any one of the above embodiments.
The functions of the functional modules of the computer-readable storage medium described in this embodiment of the invention can be implemented according to the method in the above method embodiment. For the specific implementation process, refer to the related description of the method embodiment; details are not repeated here.
As can be seen from the above, this embodiment of the invention solves, with minimal development effort, the problem of parsing Protobuf files stored at the bottom layer of the Hive data warehouse, which helps improve the data storage security of the Hive data warehouse.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The data storage method, device, equipment, and computer-readable storage medium provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the invention.

Claims (10)

1. A data storage method, characterized by comprising:
creating a Hive table, the Hive table being used to read data stored at the bottom layer in the Protobuf structured data storage format;
generating a corresponding programming language file from the description file of the Protobuf serialized data, and sending it to a file package to be loaded;
configuring, according to the parsing mode corresponding to a pre-built data parser, the field in the Hive table that is used for parsing data;
automatically parsing and reading the Protobuf serialized data by using the Hive table, based on the data parser;
wherein the data parser is used to match the Hive table with a configuration file in terms of table schema and configuration structure, and to generate an Object object set of the Hive table structure and a Hive result object set.
2. The data storage method according to claim 1, characterized in that the method is applied in a Java language environment, and the building process of the data parser comprises:
setting a pattern matching file used to read the configuration file and associate it with the Hive table structure, so as to match the Hive table with the configuration file in terms of table schema and configuration structure;
setting an object conversion file used to implement, based on subclass proxies, the conversion logic from Java objects to Object objects;
setting a nested traversal file used to traverse the Protobuf serialized data with nested objects and generate Java objects.
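As a rough illustration only, the three parser parts named in claim 2 can be sketched in plain Java. This is a sketch under stated assumptions, not the claimed implementation: nested `Map` values stand in for Protobuf-generated Java objects, dotted paths such as `address.city` are an assumed naming convention for nested fields, and the names `ParserSketch`, `matchSchema`, `traverse`, and `toRow` are hypothetical. The subclass-proxy mechanism of the object conversion file is not modeled; `toRow` only shows the end result of producing generic `Object` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParserSketch {

    // Pattern matching: keep the table columns that also appear in the configured
    // message structure, preserving table order.
    static List<String> matchSchema(List<String> tableColumns, List<String> configuredFields) {
        List<String> matched = new ArrayList<>();
        for (String column : tableColumns) {
            if (configuredFields.contains(column)) {
                matched.add(column);
            }
        }
        return matched;
    }

    // Nested traversal: walk a nested message depth-first and flatten it to dotted paths.
    static Map<String, Object> traverse(Map<String, Object> message) {
        Map<String, Object> flat = new LinkedHashMap<>();
        walk("", message, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void walk(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                walk(path, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(path, entry.getValue());
            }
        }
    }

    // Object conversion: emit one row as generic Object values in matched-column order.
    static Object[] toRow(List<String> columns, Map<String, Object> flat) {
        Object[] row = new Object[columns.size()];
        for (int i = 0; i < columns.size(); i++) {
            row[i] = flat.get(columns.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Jinan");
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 7L);
        user.put("address", address);

        List<String> columns = matchSchema(
                Arrays.asList("id", "address.city", "unknown"),
                Arrays.asList("id", "name", "address.city"));
        System.out.println(columns + " -> " + Arrays.toString(toRow(columns, traverse(user))));
    }
}
```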
3. The data storage method according to claim 2, characterized in that automatically parsing and reading the Protobuf serialized data by using the Hive table, based on the data parser, comprises:
overriding the initialization function of the Hive data warehouse so as to assemble the pattern matching file and the object conversion file and generate the Object object set of the Hive table structure;
overriding the data parsing function of the Hive data warehouse so as to assemble the nested traversal file and generate the Hive result object set;
generating an executable file package, setting in the configuration file the location of the Java file generated from the Protobuf structure definition file, loading the executable file package into the Hive environment variables, and using the Hive table to specify the entry point for parsing and reading the Protobuf serialized data;
executing a Hive SQL query statement to read the Protobuf serialized data.
4. The data storage method according to any one of claims 1 to 3, characterized in that automatically parsing and reading the Protobuf serialized data by using the Hive table, based on the data parser, comprises:
judging whether information indicating that the data was parsed and read successfully is received within a preset time;
if not, raising a data read error alarm, and feeding back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
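The timeout check in claim 4 could be approximated as follows. This is a minimal sketch assuming the read is run as a task with a deadline; `ReadWatchdog`, `readWithDeadline`, the `logFile` path, and the alarm callback are all hypothetical names, and the patent does not specify how the success information or the alarm is delivered.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

final class ReadWatchdog {

    // Runs the read task and returns its result, or returns null after raising an
    // alarm that points at the (hypothetical) log file of the parsing process.
    static <T> T readWithDeadline(Callable<T> read, long timeoutMillis,
                                  String logFile, Consumer<String> alarm) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> pending = pool.submit(read);
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true); // stop waiting for a read that missed the deadline
            alarm.accept("data read error: no success within " + timeoutMillis
                    + " ms, see log " + logFile);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            alarm.accept("data read error: interrupted, see log " + logFile);
            return null;
        } catch (ExecutionException e) {
            alarm.accept("data read error: " + e.getCause() + ", see log " + logFile);
            return null;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String rows = readWithDeadline(() -> "3 rows", 1000, "/tmp/parse.log", System.err::println);
        System.out.println(rows);
    }
}
```

A failed read (an exception rather than a timeout) is also reported through the same alarm here; that is a design choice of the sketch, not something the claim requires.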
5. A data storage device, characterized by comprising:
a Hive table creation module, configured to create a Hive table, the Hive table being used to read data stored at the bottom layer in the Protobuf structured data storage format;
a data conversion module, configured to generate a corresponding programming language file from the description file of the Protobuf serialized data, and send it to a file package to be loaded;
a parsing mode configuration module, configured to configure, according to the parsing mode corresponding to a pre-built data parser, the field in the Hive table that is used for parsing data, the data parser being used to match the Hive table with a configuration file in terms of table schema and configuration structure, and to generate an Object object set of the Hive table structure and a Hive result object set;
an automated data parsing and reading module, configured to automatically parse and read the Protobuf serialized data by using the Hive table, based on the data parser.
6. The data storage device according to claim 5, characterized in that the data parser comprises a pattern matcher, an object converter, and a nested traverser;
the pattern matcher is used to match the Hive table with the configuration file in terms of table schema and configuration structure;
the object converter is used to implement, based on subclass proxies, the conversion logic from Java objects to Object objects;
the nested traverser is used to traverse the Protobuf serialized data with nested objects and generate Java objects.
7. The data storage device according to claim 6, characterized in that the automated data parsing and reading module comprises:
an automated object generation submodule, configured to override the initialization function of the Hive data warehouse so as to assemble the pattern matching file and the object conversion file and generate the Object object set of the Hive table structure;
a Hive result object set generation submodule, configured to override the data parsing function of the Hive data warehouse so as to assemble the nested traversal file and generate the Hive result object set;
a parsing mode specification submodule, configured to generate an executable file package, set in the configuration file the location of the Java file generated from the Protobuf structure definition file, load the executable file package into the Hive environment variables, and use the Hive table to specify the entry point for parsing and reading the Protobuf serialized data;
a reading submodule, configured to execute a Hive SQL query statement to read the Protobuf serialized data.
8. The data storage device according to any one of claims 5 to 7, characterized by further comprising an alarm module, configured to, if information indicating that the data was parsed and read successfully is not received within a preset time, raise a data read error alarm and feed back the log file information of the process of automatically parsing and reading the Protobuf serialized data.
9. Data storage equipment, characterized by comprising a processor, the processor being configured to implement the steps of the data storage method according to any one of claims 1 to 4 when executing a computer program stored in a memory.
10. A computer-readable storage medium, characterized in that a data storage program is stored on the computer-readable storage medium, and the data storage program, when executed by a processor, implements the steps of the data storage method according to any one of claims 1 to 4.
CN201910684136.1A 2019-07-26 2019-07-26 Data storage method, device, equipment and computer readable storage medium Active CN110515893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910684136.1A CN110515893B (en) 2019-07-26 2019-07-26 Data storage method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910684136.1A CN110515893B (en) 2019-07-26 2019-07-26 Data storage method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110515893A (en) 2019-11-29
CN110515893B (en) 2022-12-09

Family

ID=68624195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910684136.1A Active CN110515893B (en) 2019-07-26 2019-07-26 Data storage method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110515893B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111935065A (en) * 2020-05-30 2020-11-13 中国兵器科学研究院 Data communication method based on multi-window system and related device
CN112637288A (en) * 2020-12-11 2021-04-09 上海哔哩哔哩科技有限公司 Streaming data distribution method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104199879A (en) * 2014-08-21 2014-12-10 广州华多网络科技有限公司 Data processing method and device
CN105760534A (en) * 2016-03-10 2016-07-13 上海晶赞科技发展有限公司 User-defined serializable data structure, hadoop cluster, server and application method thereof
CN106570018A (en) * 2015-10-10 2017-04-19 阿里巴巴集团控股有限公司 Serialization method and apparatus, deserialization method and apparatus, serialization and deserialization system, and electronic device
CN107992624A (en) * 2017-12-22 2018-05-04 百度在线网络技术(北京)有限公司 Parse method, apparatus, storage medium and the terminal device of serialized data
US20190026335A1 (en) * 2017-07-23 2019-01-24 AtScale, Inc. Query engine selection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
震秦: "hive-protobuf-serde", 《HTTPS://GITEE.COM/ZHZHENQIN/HIVE-PROTOBUF-SERDE》 *

Also Published As

Publication number Publication date
CN110515893B (en) 2022-12-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant