CN107861723A - Mass data processing method and its system - Google Patents

Publication number
CN107861723A
CN107861723A CN201711009275.1A CN201711009275A CN107861723A CN 107861723 A CN107861723 A CN 107861723A CN 201711009275 A CN201711009275 A CN 201711009275A CN 107861723 A CN107861723 A CN 107861723A
Authority
CN
China
Prior art keywords
message
protocol buffer
mass data
file
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711009275.1A
Other languages
Chinese (zh)
Inventor
官辉
顾正
范长春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huayi Technology Co Ltd
Original Assignee
Shenzhen Huayi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huayi Technology Co Ltd
Priority to CN201711009275.1A
Publication of CN107861723A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/42 Syntactic analysis
    • G06F8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present invention relates to a mass data processing method and system. The method comprises: configuring a Protocol Buffer running environment; obtaining mass data and building a model of Protocol Buffer message objects; and applying the model to a specific platform to obtain a mass data processing result. By using Protocol Buffer, the data to be stored is described and compiled into code files corresponding to the application platform, the code files are imported into the platform project, and serialization is performed to form the mass data processing result. Compilation uses a cli object and a back-end code generator to complete the data format conversion quickly. Serialization uses shift-based encoding, which reduces the size of the message itself and is simple to compute. Mass data is thus processed efficiently, the file format is simple, files are small, no large amount of parsing code is needed, and the server-side and client-side code is easy to maintain and compatible.

Description

Mass data processing method and its system
Technical field
The present invention relates to data processing methods, and more specifically to a mass data processing method and system.
Background art
In cross-network data transmission and applications, information usually needs to be converted into a binary file so that the data can be transmitted and used. Three data interchange formats are currently popular: XML, JSON and YAML. XML is the most popular data interchange format in current programming and has the advantages of being cross-platform and cross-language.
JSON (JavaScript Object Notation) is a lightweight data interchange format based on a subset of JavaScript. It is easy for people to read and write and easy for machines to parse and generate. Compared with XML or HTML fragments, JSON offers better simplicity and flexibility. In the JavaScript domain JSON is on home ground, so it has a clear advantage over XML there and is well suited to interaction between a server and JavaScript. Like XML, JSON is a plain-text data format. Because JSON was designed for JavaScript, its data format is very simple: JSON can carry a simple String, Number or Boolean, an array, or a complex Object. JSON data processing works as follows: the client sends data to the server, and the server responds to the user request by encapsulating the server-side data in JSON and sending it to the Web page. In Java development, the server usually encapsulates the data obtained from the back end into JSON data.
YAML is an intuitive data serialization format that can be recognized by computers. In other words, YAML is a data description language similar to XML but with a much simpler syntax. In YAML, structure is expressed by indentation, consecutive items are marked with a minus sign, and the key/value pairs of a map are separated by a colon. YAML also has an abbreviated syntax for describing several rows of data with the same structure: arrays are enclosed in "[]" and hashes in "{}".
However, each of these three data interchange formats has problems. XML files are large and the format is complex, so transmission consumes bandwidth. Both the server and the client need a large amount of code to parse XML, which makes the code on both sides complex and hard to maintain. Different client browsers parse XML inconsistently, so much code has to be written repeatedly, and parsing XML costs the server and the client considerable resources and time. JSON has no namespaces, so identically named pieces of information from different contexts can be mixed with each other. YAML has poor compatibility.
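The size difference between the text formats discussed above can be illustrated with a small sketch. It uses only the Python standard library; the record and its field names (`id`, `name`, `online`) are hypothetical and do not come from the patent. It is meant only to show that the same record takes more bytes as XML than as compact JSON.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for illustration.
record = {"id": 300, "name": "sensor-a", "online": True}


def to_json_bytes(rec):
    """Encode the record as compact JSON (no extra whitespace)."""
    return json.dumps(rec, separators=(",", ":")).encode("utf-8")


def to_xml_bytes(rec):
    """Encode the record as a flat XML element with one child per field."""
    root = ET.Element("record")
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")


json_bytes = to_json_bytes(record)
xml_bytes = to_xml_bytes(record)
print(len(json_bytes), len(xml_bytes))
```

Both encodings repeat the field names in every record, which is the overhead that a binary format with numbered fields avoids.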
Therefore, it is necessary to design a mass data processing method based on Protocol Buffer that processes mass data efficiently, with a simple file format and small files, without a large amount of parsing code, and with server-side and client-side code that is easy to maintain and compatible.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a mass data processing method and system.
To achieve this object, the present invention adopts the following technical solution: a mass data processing method, the method comprising:
Configuring a Protocol Buffer running environment;
Obtaining mass data and building a model of Protocol Buffer message objects;
Applying the model to a specific platform to obtain a mass data processing result.
In a further technical solution, the step of configuring the Protocol Buffer running environment comprises the following specific steps:
Downloading Protocol Buffer;
Installing Homebrew;
Installing Protocol Buffer.
In a further technical solution, the step of obtaining mass data and building the model of Protocol Buffer message objects comprises the following specific steps:
Obtaining the data that needs to be stored from the mass data;
Describing the data that needs to be stored according to the Protocol Buffer syntax to form a file of special format;
Obtaining the Protocol Buffer compiler and compiling the file of special format to form a code file, the code file forming the model of Protocol Buffer message objects.
In a further technical solution, the step of obtaining the Protocol Buffer compiler, compiling the file of special format to form a code file, and forming the model of Protocol Buffer message objects from the code file comprises the following specific steps:
Obtaining the main function;
Generating a CommandLineInterface object in the main function;
Registering the back-end code generator object of the new language with the CommandLineInterface object to form the compiler;
Calling the compiler to analyze the file of special format and obtain a syntax tree;
Traversing the syntax tree and generating the corresponding code to form the code file, the code file forming the model of Protocol Buffer message objects.
In a further technical solution, the step of applying the model to a specific platform to obtain the mass data processing result comprises the following specific steps:
Importing the generated code file into the project;
Adding the Protocol Buffer dependency version in Gradle;
Building a message builder through the inner Builder class of the Protocol Buffer message class;
Setting the values of the message fields through the message builder;
Creating the message class object through the message builder;
Obtaining a serialized message or a deserialized message according to the message class object and the values of the message fields, thereby forming the mass data processing result.
The present invention also provides a mass data processing system, comprising an environment configuration unit, a model construction unit and an application unit;
The environment configuration unit is used for configuring the Protocol Buffer running environment;
The model construction unit is used for obtaining mass data and building the model of Protocol Buffer message objects;
The application unit is used for applying the model to a specific platform to obtain the mass data processing result.
In a further technical solution, the environment configuration unit comprises a download module, a first installation module and a second installation module;
The download module is used for downloading Protocol Buffer;
The first installation module is used for installing Homebrew;
The second installation module is used for installing Protocol Buffer.
In a further technical solution, the model construction unit comprises a data acquisition module, a description module and a compiling module;
The data acquisition module is used for obtaining the data that needs to be stored from the mass data;
The description module is used for describing the data that needs to be stored according to the Protocol Buffer syntax to form a file of special format;
The compiling module is used for obtaining the Protocol Buffer compiler and compiling the file of special format to form a code file, the code file forming the model of Protocol Buffer message objects.
In a further technical solution, the compiling module comprises a function acquisition submodule, an object generation submodule, a compiler formation submodule, a syntax tree acquisition submodule and a code file formation submodule;
The function acquisition submodule is used for obtaining the main function;
The object generation submodule is used for generating a CommandLineInterface object in the main function;
The compiler formation submodule is used for registering the back-end code generator object of the new language with the CommandLineInterface object to form the compiler;
The syntax tree acquisition submodule is used for calling the compiler to analyze the file of special format and obtain a syntax tree;
The code file formation submodule is used for traversing the syntax tree and generating the corresponding code to form the code file, the code file forming the model of Protocol Buffer message objects.
In a further technical solution, the application unit comprises an import module, an adding module, a builder construction module, a setting module, an object creation module and a result formation module;
The import module is used for importing the generated code file into the project;
The adding module is used for adding the Protocol Buffer dependency version in Gradle;
The builder construction module is used for building a message builder through the inner Builder class of the Protocol Buffer message class;
The setting module is used for setting the values of the message fields through the message builder;
The object creation module is used for creating the message class object through the message builder;
The result formation module is used for obtaining a serialized message or a deserialized message according to the message class object and the values of the message fields, thereby forming the mass data processing result.
Compared with the prior art, the present invention has the following advantages. The mass data processing method of the present invention uses Protocol Buffer to describe the data to be stored and compile it into code files corresponding to the application platform, which are imported into the platform project and used for serialization to form the mass data processing result. Compilation uses a cli object and a back-end code generator to complete the data format conversion quickly. Serialization uses shift-based encoding, which reduces the size of the message itself and is simple to compute. Mass data is thus processed efficiently, the file format is simple, files are small, no large amount of parsing code is needed, and the server-side and client-side code is easy to maintain and compatible.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is a flow chart of the mass data processing method provided by a specific embodiment of the present invention;
Fig. 2 is a flow chart of configuring the Protocol Buffer running environment provided by a specific embodiment of the present invention;
Fig. 3 is a flow chart of building the model of Protocol Buffer message objects provided by a specific embodiment of the present invention;
Fig. 4 is a flow chart of forming the model of Protocol Buffer message objects provided by a specific embodiment of the present invention;
Fig. 5 is a flow chart of obtaining the mass data processing result provided by a specific embodiment of the present invention;
Fig. 6 is a structural block diagram of the mass data processing system provided by a specific embodiment of the present invention;
Fig. 7 is a schematic diagram of the relationship between several Compiler classes provided by a specific embodiment of the present invention;
Fig. 8 is a structural block diagram of the syntax tree provided by a specific embodiment of the present invention;
Fig. 9 is a schematic diagram of the serialization process provided by a specific embodiment of the present invention;
Fig. 10 is a schematic diagram of the message buffer provided by a specific embodiment of the present invention.
Detailed description of the embodiments
To make the technical content of the present invention easier to understand, the technical solution is further introduced and explained below with reference to specific embodiments, but the invention is not limited thereto.
In the specific embodiment shown in Figs. 1 to 10, the mass data processing method provided by this embodiment can be used in the format conversion, storage and analysis of mass data. It processes mass data efficiently, the file format is simple, files are small, no large amount of parsing code is needed, and the server-side and client-side code is easy to maintain and compatible.
As shown in Fig. 1, this embodiment provides a mass data processing method, which includes:
S1, configuring a Protocol Buffer running environment;
S2, obtaining mass data and building a model of Protocol Buffer message objects;
S3, applying the model to a specific platform to obtain a mass data processing result.
The mass data processing method is based on Protocol Buffer. Protocol Buffer (PB for short) is a data interchange format that is independent of language and platform. Implementations are provided for several languages, namely Java, C#, C++, Go and Python, and each implementation includes a compiler and library files for the corresponding language. Because it is a binary format, data exchange is much faster than with XML. It can be used for data communication between distributed applications or for data exchange in heterogeneous environments. As a binary data transmission format with excellent efficiency and compatibility, it can be used in many fields such as network transmission, configuration files and data storage.
For step S1 above, the step of configuring the Protocol Buffer running environment comprises the following specific steps:
S11, downloading Protocol Buffer;
S12, installing Homebrew;
S13, installing Protocol Buffer.
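Once steps S11 to S13 are done, the result can be checked from a script. The following is a minimal sketch, assuming the compiler executable is named `protoc` and is on the PATH, which is the usual outcome of installing the `protobuf` package with Homebrew; the helper names `find_protoc` and `protoc_version` are hypothetical.

```python
import shutil
import subprocess


def find_protoc():
    """Return the path of the protoc executable, or None if it is not installed."""
    return shutil.which("protoc")


def protoc_version(path):
    """Ask the compiler for its version string via `protoc --version`."""
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip()


path = find_protoc()
if path is None:
    # Assumption: on macOS the compiler is typically installed through Homebrew.
    print("protoc not found; the running environment is not configured yet")
else:
    print(protoc_version(path))
```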
For step S2 above, the step of obtaining mass data and building the model of Protocol Buffer message objects comprises the following specific steps:
S21, obtaining the data that needs to be stored from the mass data;
S22, describing the data that needs to be stored according to the Protocol Buffer syntax to form a file of special format;
S23, obtaining the Protocol Buffer compiler and compiling the file of special format to form a code file, the code file forming the model of Protocol Buffer message objects.
For steps S21 to S23 above, Protobuf provides the google::protobuf::compiler package to perform dynamic compilation. The main class is Importer, defined in importer.h. An Importer object contains three main objects, among them the MultiFileErrorCollector class, which handles errors, and the SourceTree class, which defines the source directory of the .proto files. The file of special format mentioned in step S22 is a .proto file.
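Steps S22 and S23 can be sketched as follows. The message definition (`SensorReading` and its fields) and the file name `sensor.proto` are hypothetical examples, not taken from the patent. The script writes the .proto file and only builds the `protoc` command line, using the standard `--proto_path` and `--java_out` options, without running it.

```python
import tempfile
from pathlib import Path

# Hypothetical .proto description of the data to be stored (step S22).
PROTO_SOURCE = '''syntax = "proto3";

message SensorReading {
  int32 id = 1;
  string name = 2;
  bool online = 3;
}
'''


def write_proto(directory, filename="sensor.proto"):
    """Write the .proto description into the given directory and return its path."""
    path = Path(directory) / filename
    path.write_text(PROTO_SOURCE, encoding="utf-8")
    return path


def protoc_command(proto_path, out_dir, language="java"):
    """Build (but do not run) the compiler invocation for step S23."""
    return [
        "protoc",
        f"--proto_path={proto_path.parent}",
        f"--{language}_out={out_dir}",
        str(proto_path),
    ]


with tempfile.TemporaryDirectory() as tmp:
    proto_file = write_proto(tmp)
    command = protoc_command(proto_file, tmp)
    print(" ".join(command))
```

Running the printed command on a machine where `protoc` is installed would produce the code file that forms the message object model.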
Further, for step S23 above, the step of obtaining the Protocol Buffer compiler, compiling the file of special format to form a code file, and forming the model of Protocol Buffer message objects from the code file comprises the following specific steps:
S231, obtaining the main function;
S232, generating a CommandLineInterface object in the main function;
S233, registering the back-end code generator object of the new language with the CommandLineInterface object to form the compiler;
S234, calling the compiler to analyze the file of special format and obtain a syntax tree;
S235, traversing the syntax tree and generating the corresponding code to form the code file, the code file forming the model of Protocol Buffer message objects.
When compiling the file of special format, an Importer object needs to be constructed. Its constructor takes two parameters: the first is a SourceTree object, which specifies the source directory in which the .proto files are stored; the second is an error collector object, which has an AddError method for handling syntax errors encountered while parsing the .proto file. To compile a .proto file dynamically, it is only necessary to call the Import method of the Importer object.
Fig. 7 shows the relationship between several Compiler classes, which represent a .proto file, the messages defined in it, and the fields in those messages. Specifically, class FileDescriptor represents a compiled .proto file; class Descriptor corresponds to one message in that file; and class FieldDescriptor describes one specific field in a message.
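The file, message and field hierarchy described above can be pictured with a small stand-in model. This is only an illustrative sketch: the classes `FileInfo`, `MessageInfo` and `FieldInfo` are hypothetical and merely play the roles of FileDescriptor, Descriptor and FieldDescriptor; they are not the Protocol Buffer library classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldInfo:
    """Plays the role of FieldDescriptor: one field of a message."""
    name: str
    number: int
    type_name: str


@dataclass
class MessageInfo:
    """Plays the role of Descriptor: one message defined in the file."""
    name: str
    fields: List[FieldInfo] = field(default_factory=list)


@dataclass
class FileInfo:
    """Plays the role of FileDescriptor: one compiled .proto file."""
    name: str
    messages: List[MessageInfo] = field(default_factory=list)


def describe(file_info):
    """Walk file -> messages -> fields and return one text line per node."""
    lines = [f"file {file_info.name}"]
    for message in file_info.messages:
        lines.append(f"  message {message.name}")
        for fld in message.fields:
            lines.append(f"    {fld.type_name} {fld.name} = {fld.number}")
    return lines


sensor_file = FileInfo(
    "sensor.proto",
    [MessageInfo("SensorReading", [FieldInfo("id", 1, "int32"),
                                   FieldInfo("name", 2, "string")])],
)
for line in describe(sensor_file):
    print(line)
```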
The Protocol Buffer compiler protoc supports three programming languages: C++, Java and Python. Class CommandLineInterface encapsulates the front end of the protoc compiler, including command-line argument parsing and compilation of proto files. What has to be supplied is a derived class of class CodeGenerator, which performs back-end work such as code generation. To obtain the compiler, as in steps S231 to S233 above, a CommandLineInterface object cli is created in the main function, and its RegisterGenerator method is called to register the back-end code generator object yourG of the new language with the cli object. Calling the Run() method of cli then produces a compiler that performs the compilation. The compiler accepts the same command-line arguments; the cli object performs lexical and syntactic analysis of the .proto file supplied by the user and finally generates a syntax tree. The root node of the syntax tree is a FileDescriptor object, which is passed as an input parameter to the Generate() method of yourG. Inside this method, the syntax tree is traversed and the corresponding code is generated, so that the data format conversion is completed quickly.
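The register-then-run flow of steps S231 to S235 can be sketched in a few lines. This is a toy model under stated assumptions, not the protoc implementation: the Python classes below only borrow the names CommandLineInterface and CodeGenerator, the `--toy_out` flag and `ToyClassGenerator` are hypothetical, and the syntax tree is supplied as a ready-made dictionary because lexical and syntactic analysis is omitted.

```python
class CodeGenerator:
    """Back-end interface: turns a syntax tree into source text."""

    def generate(self, file_node):
        raise NotImplementedError


class ToyClassGenerator(CodeGenerator):
    """Hypothetical back end that emits one class skeleton per message."""

    def generate(self, file_node):
        lines = []
        for message in file_node["messages"]:  # traverse the syntax tree
            lines.append(f"class {message['name']}:")
            for fld in message["fields"]:
                lines.append(f"    {fld['name']} = None  # field {fld['number']}")
        return "\n".join(lines)


class CommandLineInterface:
    """Front end: keeps the registered back ends and dispatches to them."""

    def __init__(self):
        self._generators = {}

    def register_generator(self, flag, generator):
        self._generators[flag] = generator

    def run(self, flag, syntax_tree):
        # A real front end would parse the .proto file here; this sketch
        # receives the syntax tree directly.
        return self._generators[flag].generate(syntax_tree)


SYNTAX_TREE = {
    "name": "sensor.proto",
    "messages": [
        {"name": "SensorReading",
         "fields": [{"name": "id", "number": 1}, {"name": "name", "number": 2}]},
    ],
}

cli = CommandLineInterface()
cli.register_generator("--toy_out", ToyClassGenerator())
generated = cli.run("--toy_out", SYNTAX_TREE)
print(generated)
```

Keeping the front end and the back end behind a registration call is what allows a new target language to be added without touching the parsing code.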
Further, for step S3 above, the step of applying the model to a specific platform to obtain the mass data processing result comprises the following specific steps:
S31, importing the generated code file into the project;
S32, adding the Protocol Buffer dependency version in Gradle;
S33, building a message builder through the inner Builder class of the Protocol Buffer message class;
S34, setting the values of the message fields through the message builder;
S35, creating the message class object through the message builder;
S36, obtaining a serialized message or a deserialized message according to the message class object and the values of the message fields, thereby forming the mass data processing result.
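Steps S33 to S36 follow the builder pattern. The sketch below imitates that flow with hand-written classes; `SensorReading`, `SensorReadingBuilder` and their methods are hypothetical, and the byte layout used in `to_bytes` is a simple stand-in rather than the Protocol Buffer wire format. In a real project these classes would be the code generated by the compiler.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReading:
    """Hypothetical immutable message class."""
    id: int
    name: str

    @staticmethod
    def new_builder():
        return SensorReadingBuilder()

    def to_bytes(self):
        # Stand-in encoding, NOT the Protocol Buffer wire format:
        # 4-byte little-endian id followed by the UTF-8 name.
        return struct.pack("<I", self.id) + self.name.encode("utf-8")

    @staticmethod
    def from_bytes(data):
        (id_value,) = struct.unpack("<I", data[:4])
        return SensorReading(id_value, data[4:].decode("utf-8"))


class SensorReadingBuilder:
    """Hypothetical message builder (S33)."""

    def __init__(self):
        self._id = 0
        self._name = ""

    def set_id(self, value):      # S34: set a field value
        self._id = value
        return self

    def set_name(self, value):    # S34: set a field value
        self._name = value
        return self

    def build(self):              # S35: create the message object
        return SensorReading(self._id, self._name)


message = SensorReading.new_builder().set_id(300).set_name("sensor-a").build()
payload = message.to_bytes()                  # S36, sender side
restored = SensorReading.from_bytes(payload)  # S36, recipient side
print(restored)
```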
Steps S31 to S36 import the generated code file into the project of the specific platform and perform serialization. Because different platforms have different interfaces, the Protocol Buffer dependency version needs to be added in Gradle so that Protocol Buffer integrates better with the platform interface and can serve the platform. Steps S33 to S36 describe how serialization is performed. The binary message produced by Protocol Buffer serialization is very compact, thanks to the Varint encoding that Protocol Buffer uses. Varint is a compact way of representing numbers: it uses one or more bytes per number, and the smaller the value, the fewer bytes are used, which reduces the number of bytes needed and makes the file smaller. For example, an int32 number normally needs 4 bytes, but with Varint a small int32 number can be represented in 1 byte. In most cases, therefore, numeric information takes fewer bytes with Varint. The most significant bit of each Varint byte has a special meaning: if it is 1, the following byte is also part of the number; if it is 0, the number ends with this byte. The other 7 bits represent the number. A number less than 128 can therefore be represented in one byte, while a number of 128 or more, such as 300, takes two bytes: 1010 1100 0000 0010. Fig. 9 shows how Google Protocol Buffer parses these two bytes. The positions of the two bytes are swapped once before the final calculation, because Google Protocol Buffer uses little-endian byte order. After serialization a message becomes a binary data stream whose data is a series of Key-Value pairs, as shown in Fig. 10. With this Key-Value structure, no separator is needed to split different fields. For an optional field, if the field is not present in the message, it is simply absent from the final message buffer. These characteristics all help to reduce the size of the message itself and thus the size of the file.
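The Varint rule described above can be written out directly. The following sketch implements it for non-negative integers and also builds the key of a Key-Value pair, which in the Protocol Buffer wire format is the field number shifted left by 3 bits combined with a wire type. The function names are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Varint, least significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7                  # shift them out
        if value:
            out.append(low7 | 0x80)  # high bit 1: more bytes follow
        else:
            out.append(low7)         # high bit 0: last byte
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one Varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_key(field_number, wire_type):
    """Key of a Key-Value pair: field number and wire type packed into a Varint."""
    return encode_varint((field_number << 3) | wire_type)


print(encode_varint(300).hex())                          # ac02
print((encode_key(1, 0) + encode_varint(300)).hex())     # 08ac02
print(decode_varint(b"\xac\x02"))                        # (300, 2)
```

The bytes `ac 02` are the bit pattern 1010 1100 0000 0010 given in the text: the low 7 bits of 300 come first, which is why the two groups are swapped back when the value is reassembled.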
For step S36 above, the sender obtains the serialized message and the recipient obtains the deserialized message.
In addition, Protocol Buffer reads a binary sequence into the corresponding C++ structure type according to the specified format, so packing can be performed to form the serialized message for sending, and this is very fast.
The Protocol Buffer unpacking process is as follows. Taking the Reader in code listing 3 as an example, the program first calls the ParseFromIstream method of msg1, which parses the binary data stream read from the file and assigns the parsed data to the corresponding data members of the helloworld class. The whole parsing process is carried out jointly by the framework code of Protocol Buffer itself and the code generated by the Protocol Buffer compiler. Protocol Buffer provides the base classes Message and MessageLite as a general framework, while classes such as CodedInputStream and WireFormatLite provide the decoding of binary data. Protocol Buffer decoding can be completed with a few simple mathematical operations, without complex lexical and syntactic analysis, so it is efficient and the server-side and client-side code is easy to maintain.
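The claim that decoding needs only a few simple operations can be illustrated with a short parser for a Key-Value byte stream. This is a minimal sketch that handles only two wire types of the Protocol Buffer wire format, Varint (0) and length-delimited (2); the function names and the sample bytes are hypothetical, and generated code would additionally map field numbers to typed members.

```python
def read_varint(data, pos):
    """Decode one Varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(data):
    """Split a serialized message into {field number: raw value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number = key >> 3      # upper bits of the key
        wire_type = key & 0x07       # lowest 3 bits of the key
        if wire_type == 0:           # Varint value
            value, pos = read_varint(data, pos)
        elif wire_type == 2:         # length-delimited value (string, bytes)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields[field_number] = value
    return fields


# Field 1 = 300 as a Varint, field 2 = b"abc" as a length-delimited value.
stream = b"\x08\xac\x02\x12\x03abc"
print(parse_fields(stream))
```

Because every value is preceded by its key and, where needed, its length, the parser never has to search for separators between fields.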
The mass data processing method above uses Protocol Buffer to describe the data to be stored and compile it into code files corresponding to the application platform, which are imported into the platform project and used for serialization to form the mass data processing result. Compilation uses a cli object and a back-end code generator to complete the data format conversion quickly. Serialization uses shift-based encoding, which reduces the size of the message itself and is simple to compute. Mass data is thus processed efficiently, the file format is simple, files are small, no large amount of parsing code is needed, and the server-side and client-side code is easy to maintain and compatible.
As shown in Fig. 6, this embodiment also provides a mass data processing system, which includes an environment configuration unit 1, a model construction unit 2 and an application unit 3.
The environment configuration unit 1 is used for configuring the Protocol Buffer running environment.
The model construction unit 2 is used for obtaining mass data and building the model of Protocol Buffer message objects.
The application unit 3 is used for applying the model to a specific platform to obtain the mass data processing result.
Protocol Buffer (PB for short) is a data interchange format that is independent of language and platform. Implementations are provided for several languages, namely Java, C#, C++, Go and Python, and each implementation includes a compiler and library files for the corresponding language. Because it is a binary format, data exchange is much faster than with XML. It can be used for data communication between distributed applications or for data exchange in heterogeneous environments. As a binary data transmission format with excellent efficiency and compatibility, it can be used in many fields such as network transmission, configuration files and data storage.
The environment configuration unit 1 includes a download module, a first installation module and a second installation module.
The download module is used for downloading Protocol Buffer.
The first installation module is used for installing Homebrew.
The second installation module is used for installing Protocol Buffer.
Further, the model construction unit 2 includes a data acquisition module, a description module and a compiling module.
The data acquisition module is used for obtaining the data that needs to be stored from the mass data.
The description module is used for describing the data that needs to be stored according to the Protocol Buffer syntax to form a file of special format.
The compiling module is used for obtaining the Protocol Buffer compiler and compiling the file of special format to form a code file, the code file forming the model of Protocol Buffer message objects.
Protobuf provides the google::protobuf::compiler package to perform dynamic compilation. The main class is Importer, defined in importer.h. An Importer object contains three main objects, among them the MultiFileErrorCollector class, which handles errors, and the SourceTree class, which defines the source directory of the .proto files. The file of special format mentioned for the description module is a .proto file.
In addition, the compiling module includes a function acquisition submodule, an object generation submodule, a compiler formation submodule, a syntax tree acquisition submodule and a code file formation submodule.
The function acquisition submodule is used for obtaining the main function.
The object generation submodule is used for generating a CommandLineInterface object in the main function.
The compiler formation submodule is used for registering the back-end code generator object of the new language with the CommandLineInterface object to form the compiler.
The syntax tree acquisition submodule is used for calling the compiler to analyze the file of special format and obtain a syntax tree.
The code file formation submodule is used for traversing the syntax tree and generating the corresponding code to form the code file, the code file forming the model of Protocol Buffer message objects.
When compiling the file of special format, an Importer object needs to be constructed. Its constructor takes two parameters: the first is a SourceTree object, which specifies the source directory in which the .proto files are stored; the second is an error collector object, which has an AddError method for handling syntax errors encountered while parsing the .proto file. To compile a .proto file dynamically, it is only necessary to call the Import method of the Importer object.
Fig. 7 shows the relationship between several Compiler classes, which represent a .proto file, the messages defined in it, and the fields in those messages. Specifically, class FileDescriptor represents a compiled .proto file; class Descriptor corresponds to one message in that file; and class FieldDescriptor describes one specific field in a message.
The Protocol Buffer compiler protoc supports three programming languages: C++, Java and Python. Class CommandLineInterface encapsulates the front end of the protoc compiler, including command-line argument parsing and compilation of proto files. What has to be supplied is a derived class of class CodeGenerator, which performs back-end work such as code generation. To obtain the compiler, as handled by the function acquisition submodule, the object generation submodule and the compiler formation submodule above, a CommandLineInterface object cli is created in the main function, and its RegisterGenerator method is called to register the back-end code generator object yourG of the new language with the cli object. Calling the Run() method of cli then produces a compiler that performs the compilation. The compiler accepts the same command-line arguments; the cli object performs lexical and syntactic analysis of the .proto file supplied by the user and finally generates a syntax tree. The root node of the syntax tree is a FileDescriptor object, which is passed as an input parameter to the Generate() method of yourG. Inside this method, the syntax tree is traversed and the corresponding code is generated, so that the data format conversion is completed quickly.
Include import modul, add module, constructor structure module, setup module, object for above-mentioned applying unit 3 Creation module and result form module.
Import modul, for the code file of generation to be directed into project.
Add module, for the dependence version in Gradle addition Protocol Buffer.
Constructor builds module, disappears for the inside Builder classes structure in the message class by Protocol Buffer Cease constructor.
Setup module, for setting the value of message field by message constructing device.
Object Creation module, for creating message class object by message constructing device.
As a result module is formed, for the value according to message class object and message field, obtains serialized message or inverted sequence Rowization message, form mass data processing result.
The generated code file is imported into the project of the specific platform and used for serialization. Because the interfaces of different platforms differ, the Protocol Buffer dependency version must be added in Gradle so that Protocol Buffer integrates better with the platform's interfaces and can serve the platform. Steps S33 to S36 above describe how serialization is carried out.
The binary message produced by Protocol Buffer serialization is very compact, thanks to the Varint encoding that Protocol Buffer uses. Varint is a compact way of representing numbers: it uses one or more bytes per number, and the smaller the value, the fewer bytes it uses, which reduces the number of bytes needed to represent numbers and makes the file smaller. For example, a number of type int32 generally needs 4 bytes, but with Varint a small int32 value can be represented in 1 byte, so in most cases Varint represents numeric information in fewer bytes. The most significant bit of each Varint byte has a special meaning: if it is 1, the following byte is also part of the number; if it is 0, the number ends with this byte. The other 7 bits carry the number itself. A number less than 128 can therefore be represented in one byte, while a number of 128 or more needs more than one; 300, for example, is represented in two bytes: 1010 1100 0000 0010. Figure 9 shows how Google Protocol Buffer parses these two bytes: the positions of the two bytes are swapped before the final calculation, because Google Protocol Buffer uses little-endian byte order.
After serialization, a message becomes a binary data stream consisting of a series of key-value pairs, as shown in Figure 10. This key-value structure lets different fields be told apart without separators. For an optional field, if the field is not present in the message, it simply does not appear in the final message buffer. These properties all help reduce the size of the message itself and thus of the file.
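The Varint rule and the key-value layout can be checked with a few lines of Python. This is a minimal sketch, not the Protocol Buffer library: the function names are invented here, and the key layout `(field_number << 3) | wire_type` is taken from the public Protocol Buffers encoding documentation, not from the text above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Varint, low 7-bit group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit 1: more bytes follow
        else:
            out.append(low7)         # high bit 0: last byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one key-value pair whose value is a Varint (wire type 0)."""
    key = (field_number << 3) | 0    # assumption: key layout per the encoding docs
    return encode_varint(key) + encode_varint(value)


small = encode_varint(1)
three_hundred = encode_varint(300)
pair = encode_varint_field(1, 150)

print(small.hex())          # 01
print(three_hundred.hex())  # ac02, i.e. 1010 1100 0000 0010
print(pair.hex())           # 089601
```

The two bytes produced for 300 match the bit pattern quoted above, and values below 128 come out as a single byte.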
For the result forming module above, the sender obtains a serialized message, while the recipient obtains a deserialized message.
In addition, Protocol Buffer can perform packing simply by reading a binary sequence into the corresponding C++ structure type according to the specified format; the resulting serialized message is then sent, and this is very fast.
The Protocol Buffer unpacking process is as follows. Taking the Reader in code listing 3 as an example, the program first calls the ParseFromIstream method of msg1. This method parses the binary data stream read from the file and assigns the parsed data to the corresponding data members of the helloworld class. The whole parsing process is completed jointly by the framework code of Protocol Buffer itself and the code generated by the Protocol Buffer compiler: Protocol Buffer provides the message base classes as a general framework, and the CodedInputStream class, the WireFormatLite class, and others provide the decoding functions for binary data. Protocol Buffer decoding can be completed with a few simple arithmetic operations and needs no complicated lexical or syntactic analysis, so it is efficient, and the server-side and client-side code is easy to maintain.
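To illustrate why decoding needs only a few shifts and masks, the following sketch walks a binary stream of key-value pairs and returns the field values by field number. It is not the CodedInputStream implementation: the function names are invented, only wire types 0 (Varint) and 2 (length-delimited) are handled, and the key layout is assumed from the public Protocol Buffers encoding documentation.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next_position) for the Varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit 0 marks the last byte
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict:
    """Split a stream of key-value pairs into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                # Varint value
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:              # length-delimited bytes
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


# Field 1 = 101 as a Varint, field 2 = b"hello" as length-delimited bytes.
stream = b"\x08\x65\x12\x05hello"
parsed = parse_fields(stream)
print(parsed)
```

No tokenizer or grammar is involved: each key tells the parser how many bytes the value occupies, so the stream is consumed in a single pass.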
The mass data processing system described above uses Protocol Buffer to process the data that needs to be stored, converts it into a code file corresponding to the application platform, embeds the code file into the platform's project, and performs serialization to form the mass data processing result. The compilation process uses the cli object and the back-end code generator to complete data format conversion quickly. The serialization process uses shift operations, which reduces the size of the message itself and keeps the computation simple, so mass data is processed efficiently. The file format is simple and the files are small; no large amount of parsing code is needed; and the server-side and client-side code is easy to maintain and has good compatibility.
The above merely illustrates the technical content of the present invention by way of embodiments so that readers can understand it more easily; it does not mean that the embodiments of the present invention are limited thereto. Any technical extension or re-creation made according to the present invention falls under the protection of the present invention. The protection scope of the present invention is defined by the claims.

Claims (10)

1. A mass data processing method, characterized in that the method comprises:
configuring a Protocol Buffer running environment;
acquiring mass data and building a model of a Protocol Buffer message object;
applying the model to a specific platform to obtain a mass data processing result.
2. The mass data processing method according to claim 1, characterized in that the step of configuring the Protocol Buffer running environment comprises the following specific steps:
downloading Protocol Buffer;
installing Homebrew;
installing Protocol Buffer.
3. The mass data processing method according to claim 1, characterized in that the step of acquiring mass data and building the model of the Protocol Buffer message object comprises the following specific steps:
acquiring the data that needs to be stored from the mass data;
describing the data that needs to be stored according to the Protocol Buffer syntax to form a file in a specific format;
acquiring the Protocol Buffer compiler, compiling the file in the specific format to form a code file, and forming the model of the Protocol Buffer message object from the code file.
4. The mass data processing method according to any one of claims 1 to 3, characterized in that the step of acquiring the Protocol Buffer compiler, compiling the file in the specific format to form a code file, and forming the model of the Protocol Buffer message object from the code file comprises the following specific steps:
obtaining the main function;
generating a CommandLineInterface object according to the main function;
registering the object of the new language's back-end code generator with the CommandLineInterface object to form the compiler;
calling the compiler to analyze the file in the specific format and obtain a syntax tree;
traversing the syntax tree, generating the corresponding code to form a code file, and forming the model of the Protocol Buffer message object from the code file.
5. The mass data processing method according to claim 4, characterized in that the step of applying the model to a specific platform to obtain a mass data processing result comprises the following specific steps:
importing the generated code file into the project;
adding the Protocol Buffer dependency version in Gradle;
building a message builder through the inner Builder class of the Protocol Buffer message class;
setting the values of the message fields through the message builder;
creating a message class object through the message builder;
obtaining a serialized message or a deserialized message according to the message class object and the values of the message fields, forming the mass data processing result.
6. A mass data processing system, characterized by comprising an environment configuration unit, a model construction unit, and an applying unit;
the environment configuration unit is used to configure a Protocol Buffer running environment;
the model construction unit is used to acquire mass data and build a model of a Protocol Buffer message object;
the applying unit is used to apply the model to a specific platform to obtain a mass data processing result.
7. The mass data processing system according to claim 6, characterized in that the environment configuration unit comprises a download module, a first installation module, and a second installation module;
the download module is used to download Protocol Buffer;
the first installation module is used to install Homebrew;
the second installation module is used to install Protocol Buffer.
8. The mass data processing system according to claim 6, characterized in that the model construction unit comprises a data acquisition module, a describing module, and a compiling module;
the data acquisition module is used to acquire the data that needs to be stored from the mass data;
the describing module is used to describe the data that needs to be stored according to the Protocol Buffer syntax to form a file in a specific format;
the compiling module is used to acquire the Protocol Buffer compiler, compile the file in the specific format to form a code file, and form the model of the Protocol Buffer message object from the code file.
9. The mass data processing system according to any one of claims 6 to 8, characterized in that the compiling module comprises a function acquisition submodule, an object generation submodule, a compiler forming submodule, a syntax tree acquisition submodule, and a code file forming submodule;
the function acquisition submodule is used to obtain the main function;
the object generation submodule is used to generate a CommandLineInterface object according to the main function;
the compiler forming submodule is used to register the object of the new language's back-end code generator with the CommandLineInterface object to form the compiler;
the syntax tree acquisition submodule is used to call the compiler to analyze the file in the specific format and obtain a syntax tree;
the code file forming submodule is used to traverse the syntax tree, generate the corresponding code to form a code file, and form the model of the Protocol Buffer message object from the code file.
10. The mass data processing system according to claim 9, characterized in that the applying unit comprises an import module, an add module, a constructor building module, a setup module, an object creation module, and a result forming module;
the import module is used to import the generated code file into the project;
the add module is used to add the Protocol Buffer dependency version in Gradle;
the constructor building module is used to build a message builder through the inner Builder class of the Protocol Buffer message class;
the setup module is used to set the values of the message fields through the message builder;
the object creation module is used to create a message class object through the message builder;
the result forming module is used to obtain a serialized message or a deserialized message according to the message class object and the values of the message fields, forming the mass data processing result.
CN201711009275.1A 2017-10-25 2017-10-25 Mass data processing method and its system Pending CN107861723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711009275.1A CN107861723A (en) 2017-10-25 2017-10-25 Mass data processing method and its system

Publications (1)

Publication Number Publication Date
CN107861723A true CN107861723A (en) 2018-03-30

Family

ID=61696667

Country Status (1)

Country Link
CN (1) CN107861723A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699656A (en) * 2013-12-27 2014-04-02 同济大学 GPU-based mass-multimedia-data-oriented MapReduce platform
US8924435B1 (en) * 2011-03-23 2014-12-30 Google Inc. Transferring values among fields in data structures
CN106528667A (en) * 2016-10-24 2017-03-22 南京中新赛克科技有限责任公司 Low-power-consumption mass data full-text retrieval system frame capable of carrying out read-write separation
CN106557564A (en) * 2016-11-17 2017-04-05 北京锐安科技有限公司 A kind of object data analysis method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CARSON_HO: "序列化:这是一份很有诚意的 Protocol Buffer 语法详解", 《HTTPS://BLOG.CSDN.NET/CARSON_HO/ARTICLE/DETAILS/70267574》 *
刘明: "Google Protocol Buffer 的使用和原理", 《HTTPS://WWW.IBM.COM/DEVELOPERWORKS/CN/LINUX/L-CN-GPB/INDEX.HTML》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377289A (en) * 2019-07-01 2019-10-25 北京字节跳动网络技术有限公司 A kind of data analysis method, device, medium and electronic equipment
CN110673856A (en) * 2019-09-30 2020-01-10 新华三大数据技术有限公司 Data processing method and device and machine-readable storage medium
CN110673856B (en) * 2019-09-30 2023-03-28 新华三大数据技术有限公司 Data processing method and device and machine-readable storage medium
CN111949254A (en) * 2020-08-07 2020-11-17 北京字节跳动网络技术有限公司 Method, apparatus, computer device and storage medium for generating unified AST
CN112631598A (en) * 2020-09-09 2021-04-09 南京烽火星空通信发展有限公司 Method for rapidly analyzing Protobuf format data
CN113434147A (en) * 2021-06-25 2021-09-24 北京达佳互联信息技术有限公司 ProtoBuf protocol-based message analysis method and device
CN113434147B (en) * 2021-06-25 2024-05-14 北京达佳互联信息技术有限公司 Method and device for analyzing message based on ProtoBuf protocol
CN113726864A (en) * 2021-08-24 2021-11-30 中国信息通信研究院 Data transmission system and method for intelligent sound box and big data platform system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180330