CN115276889A - Decoding processing method, decoding processing device, computer equipment and storage medium - Google Patents

Info

Publication number
CN115276889A
Authority
CN
China
Prior art keywords
field
data
decoding
data stream
fields
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110477916.6A
Other languages
Chinese (zh)
Inventor
段庆龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110477916.6A priority Critical patent/CN115276889A/en
Publication of CN115276889A publication Critical patent/CN115276889A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0006Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission format
    • H04L1/0007Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission format by modifying the frame length
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/24Testing correct operation
    • H04L1/245Testing correct operation by using the properties of transmission codes
    • H04L1/246Testing correct operation by using the properties of transmission codes two-level transmission codes, e.g. binary
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/12Applying verification of the received information
    • H04L63/123Applying verification of the received information received data contents, e.g. message integrity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application provides a decoding processing method, a decoding processing device, computer equipment and a storage medium. The method can comprise the following steps: acquiring an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data; splitting the coded data stream to obtain N fields, wherein N is a positive integer; respectively analyzing the data characteristics contained in each field of the N fields to determine the data type of each field; determining a decoding method corresponding to each field according to the data type of each field; and decoding each field according to the decoding method corresponding to each field to restore the original data. By the scheme, the efficiency of decoding processing and the decoding success rate can be improved.

Description

Decoding processing method, decoding processing device, computer equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, in particular to the field of encoding and decoding technologies, and more specifically to a decoding processing method, a decoding processing apparatus, a computer device, and a computer-readable storage medium.
Background
The data encoding refers to a process of encoding the original data by using an encoding method to obtain an encoded data stream corresponding to the original data. At present, in the decoding process of an encoded data stream, a corresponding protocol file, for example, a protocol file used in an encoding method, is usually searched manually, and then a decoding method corresponding to the encoded data stream is determined according to the protocol file, so as to restore corresponding original data. The decoding processing mode which depends on manual searching of protocol files is too low in processing efficiency.
Disclosure of Invention
The embodiment of the application provides a decoding method, a decoding device, computer equipment and a storage medium, which can improve the efficiency of decoding processing and the success rate of decoding.
In one aspect, an embodiment of the present application provides a decoding processing method, where the method includes:
acquiring an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data;
splitting the coded data stream to obtain N fields, wherein N is a positive integer;
respectively analyzing the data characteristics contained in each field of the N fields to determine the data type of each field;
determining a decoding method corresponding to each field according to the data type of each field;
and decoding each field according to the decoding method corresponding to each field to restore the original data.
In one aspect, an embodiment of the present application provides a decoding processing apparatus, where the apparatus includes:
an acquisition unit, configured to acquire an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data;
the processing unit is used for splitting the coded data stream to obtain N fields, wherein N is a positive integer;
the processing unit is further configured to analyze data characteristics included in each of the N fields, and determine a data type of each field;
the determining unit is used for determining a decoding method corresponding to each field according to the data type of each field;
the processing unit is further configured to perform decoding processing on each field according to a decoding method corresponding to each field, and restore the original data.
In one aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the decoding processing method described above.
In one aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is read by a processor of a computer device and executed, the computer program causes the computer device to execute the decoding processing method.
In one aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the decoding processing method.
In the embodiment of the application, an encoded data stream is obtained, wherein the encoded data stream is a binary data stream obtained by encoding original data; splitting the coded data stream to obtain N fields, wherein N is a positive integer; respectively analyzing the data characteristics contained in each field of the N fields, determining the data type of each field, and determining the decoding method corresponding to each field according to the data type of each field; and then, decoding each field according to the decoding method corresponding to each field to restore the original data. Therefore, the method and the device can automatically analyze and process the characteristics of the coded data stream, can determine the decoding method corresponding to each field based on the reverse analysis of the data characteristics of each field in the binary data stream, and further can restore the original data based on the decoding method of each field; through the process of the automatic reverse analysis decoding processing, manual searching is not needed, the efficiency of the decoding processing can be effectively improved, in addition, the process of the automatic reverse analysis decoding processing does not depend on protocol files, and the decoding success rate can be effectively improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a blockchain network according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a block structure according to an embodiment of the present disclosure;
fig. 3 is a schematic architecture diagram of a decoding processing system according to an embodiment of the present application;
fig. 4 is a flowchart illustrating a decoding processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an encoded data stream provided in an embodiment of the present application;
fig. 6 is a schematic flowchart of a splitting processing method according to an embodiment of the present application;
fig. 7 is a flowchart illustrating a field decoding method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a decoding processing apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The following describes technical terms related to the embodiments of the present application:
1. Key technical terms:
protobuf: short for Protocol Buffers, a data encoding and decoding standard;
PB: abbreviation of protobuf;
T-V: Tag-Value, i.e., key-value;
T-L-V: Tag-Length-Value, i.e., key-length-value;
protoc: a data protocol file defined by the PB coding standard specification;
forward decoding: the process of looking up the protocol file of the PB and calling the decoding algorithm recorded in the protocol file to perform decoding processing and obtain the original data;
reverse decoding: the process of determining a decoding method by analyzing the encoded data stream and restoring the original data by using the decoding method obtained by that analysis;
splitting: dividing the encoded data stream into a plurality of fields;
field decoding: calling a decoding method on the field data to obtain the original data;
serialization: the process of converting structured data into a specified format according to a given coding specification;
deserialization: the process of parsing data that has been converted into the specified format back into the original structured data;
message: a composite data format defined in the PB;
wire_type: a wrapper (wire) type defined by the PB that indicates the data type of a field; its specific values may include 0, 1, 2 and 5;
field_number: the field number, which indicates the sequence number of the field in the PB protocol and is unique;
tag: calculated as (field_number << 3) | wire_type;
varint codec: an encoding and decoding algorithm that can improve the compression efficiency of data.
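As an illustration of the tag and varint terms above, the following is a minimal sketch, assuming the standard base-128 varint layout used by Protocol Buffers. The function names (encode_varint, decode_varint, make_tag, split_tag) are hypothetical and do not come from the application.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Serialize a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def make_tag(field_number: int, wire_type: int) -> int:
    """tag = (field_number << 3) | wire_type"""
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> Tuple[int, int]:
    """Inverse of make_tag: return (field_number, wire_type)."""
    return tag >> 3, tag & 0x07
```

For example, field number 2 with wire_type 2 yields the tag value 18, and split_tag(18) recovers the pair (2, 2).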
2. Blockchain:
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, where each data block contains the information of a batch of network transactions, and this information is used for verifying the validity of the information (anti-counterfeiting) and for generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
Because a large amount of data computing and data storage services are involved in the block chain, and a large amount of computer operation cost is required for the data computing and data storage services, the original data and the encoded data stream related by the application can be realized by a cloud storage technology in the cloud technology. That is, the block chain is stored on the "cloud" through the cloud storage technology, when the original data and the encoded data stream need to be stored in the block chain, the data can be uploaded to the block chain on the "cloud" through the cloud storage technology, and when the data needs to be read, the data can also be read from the block chain on the "cloud" at any time, so that the storage requirement on the terminal device can be reduced, and the application range of the block chain is expanded.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a blockchain network according to an embodiment of the present disclosure. The blockchain network 100 (also referred to as a distributed system 100) is formed by a plurality of nodes (computing devices of any form in an access network, such as servers and user terminals) and clients. A Peer-to-Peer (P2P) network is formed between the nodes, where the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join to become a node, and a node includes a hardware layer, an intermediate layer, an operating system layer, and an application layer.
Referring to the functions of each node in the blockchain network shown in fig. 1, the functions involved may include:
(1) Routing, a basic function that a node has, is used to support communication between nodes.
(2) The application is used for being deployed in a block chain, realizing specific services according to actual service requirements, recording data related to the realization function to form recording data, carrying a digital signature in the recording data to represent a source of task data, and sending the recording data to other nodes in the block chain network, so that the other nodes add the recording data to a temporary block when the source and integrity of the recording data are verified successfully.
For example, the services implemented by the application include:
(1) the wallet is used for providing functions of carrying out transactions of electronic money, and comprises the steps of initiating the transactions (namely sending transaction records of the current transactions to other nodes in the block chain network), and storing the record data of the transactions into a temporary block of the block chain as a response for confirming that the transactions are valid after the other nodes successfully verify; of course, the wallet also supports the querying of the remaining electronic money in the electronic money address;
(2) the shared account book is used for providing functions of operations such as storage, query and modification of account data, record data of the operations on the account data are sent to other nodes in the block chain network, the record data are stored in the temporary block as a response for acknowledging that the account data are valid after the other nodes verify the validity, and confirmation can be sent to the node initiating the operations.
(3) Intelligent contracts, computerized agreements that can execute the terms of a contract, implemented by code deployed on a shared ledger for execution when certain conditions are met, for completing automated transactions according to actual business requirement code; that is, an intelligent contract is a piece of executable program running on a blockchain that triggers its automatic execution when a transaction is made on the blockchain. For example, inquiring the logistics state of the goods purchased by the buyer, and transferring the electronic money of the buyer to the address of the merchant after the buyer signs the goods; of course, intelligent contracts are not limited to executing contracts for trading, but may also execute contracts that process received information.
(4) The Block chain (Blockchain) comprises a series of blocks (blocks) which are mutually connected according to the generated chronological order, once a new Block is added into the Block chain, the new Block cannot be removed any more, and the blocks record the record data submitted by the nodes in the Block chain network.
The blockchain is essentially a decentralized database and is the underlying technology of bitcoin. It is a series of data blocks generated and associated by cryptographic methods; the data blocks are linked by a hash algorithm, and each subsequent block contains the hash value of the previous block. Blockchain technology is widely applied to scenarios such as digital assets, intelligent contracts, logistics tracking and product protection.
Referring to fig. 2, fig. 2 is a schematic diagram of a block structure according to an embodiment of the present disclosure. As shown in fig. 2, each Block Structure includes a hash value of the transaction record stored in the Block (hash value of the Block) and a hash value of the previous Block, and the blocks are connected by the hash values to form a Block chain. The block may also include information such as a time stamp at the time of block generation. The blockchain is essentially a decentralized database, which is a string of data blocks associated by cryptographic methods, each data block containing relevant information for verifying the validity of the information (anti-counterfeiting) and generating the next block.
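The hash linkage between blocks described above can be sketched as follows. This is only an illustrative model: the use of SHA-256, the Block fields and the helper names are assumptions made for the example, not details given in the application.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    prev_hash: str     # hash value of the previous block
    records: tuple     # record data stored in this block
    timestamp: float   # time stamp at block generation

    def block_hash(self) -> str:
        # Hash over the block contents (SHA-256 is an assumed choice).
        payload = repr((self.prev_hash, self.records, self.timestamp))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], records: tuple, timestamp: float) -> None:
    """Link a new block to the hash of the current last block."""
    prev = chain[-1].block_hash() if chain else "0" * 64
    chain.append(Block(prev, records, timestamp))


def chain_is_valid(chain: List[Block]) -> bool:
    """Check that every block stores the hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].block_hash()
               for i in range(1, len(chain)))
```

Replacing the contents of an earlier block changes its hash, so the stored prev_hash of the following block no longer matches and chain_is_valid returns False.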
The decoding processing method can be combined with a block chain technology, for example, original data, an encoded data stream and the like can be uploaded to a block chain for storage, and data on the block chain can be guaranteed not to be easily tampered. Or, the reverse decoding processing flow of the encoded data stream can be executed on the block chain, so that the fairness of the decoding processing flow can be guaranteed, meanwhile, the decoding processing flow can have traceability, and the safety of the decoding processing flow is improved.
In the embodiment of the application, after any original data is encoded by an encoding algorithm to obtain a binary encoded data stream (for example, a PB algorithm is used to encode the original data to obtain the encoded data stream), the decoding processing scheme provided by the application can be adopted. The characteristics of the encoded data stream are then reverse-analyzed automatically, without manual searching and without relying on protocol files, and the original data is restored by decoding processing, so that the decoding efficiency and the decoding success rate are effectively improved.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating an architecture of a decoding processing system according to an embodiment of the present disclosure. The architecture diagram of the decoding processing system comprises: server 340 and a computer device cluster, wherein the computer device cluster may include: computer device 310, computer device 320, computer device 330, and the like. The cluster of computer devices and the server 340 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
The server 340 shown in fig. 3 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform.
The computer device 310, the computer device 320, the computer device 330, and the like shown in fig. 3 may each be a mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a vehicle, an in-vehicle device, a roadside device, an aircraft, or a wearable device such as a smart watch, a smart bracelet or a pedometer, or may be another intelligent device having a decoding processing function.
In a possible implementation manner, taking the computer device 310 as an example, the computer device 310 obtains an encoded data stream, where the encoded data stream may be a binary data stream obtained by encoding the original data by the computer device 310. The computer device 310 sends the encoded data stream to the server 340, and after the server 340 receives the encoded data stream, the encoded data stream can be split to obtain N fields, where N is a positive integer; then, the server 340 analyzes the data characteristics contained in each of the N fields, and determines the data type of each field; moreover, the server 340 determines a decoding method corresponding to each field according to the data type of each field; finally, the server 340 decodes each field according to the decoding method corresponding to each field, and restores the original data. The server 340 may then send the restored raw data to the computer device 310.
Of course, the steps of splitting the encoded data stream to obtain N fields, analyzing the data characteristics contained in each of the N fields to determine the data type of each field, determining the decoding method corresponding to each field according to the data type of each field, and decoding each field according to the decoding method corresponding to each field need not be performed by the server 340; they may also be performed by the computer device 310 or by any other computer device in the computer device cluster. Likewise, encoding the original data to obtain the encoded data stream need not be performed by the computer device 310, and may instead be performed by the server 340.
In a possible implementation manner, the decoding processing system provided in the embodiment of the present application may be deployed at a node of a blockchain; for example, the server 340 and each computer device included in the computer device cluster may be regarded as node devices of the blockchain that jointly form a blockchain network. Therefore, in the present application, the decoding process for the encoded data stream may be performed on the blockchain, and the encoding process for the original data may also be performed on the blockchain. In this way, the fairness and impartiality of the data decoding processing flow can be guaranteed, the decoding processing flow is traceable, and the safety of the decoding processing flow is improved.
It is to be understood that the system architecture diagram described in the embodiment of the present application is for more clearly illustrating the technical solution of the embodiment of the present application, and does not constitute a limitation to the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The following describes a decoding processing method according to an embodiment of the present application in detail with reference to the accompanying drawings. Referring to fig. 4, fig. 4 shows a decoding processing method provided in an embodiment of the present application. The decoding processing method may be applied to a computer device, which may be, for example, an in-vehicle device, a smart phone, a tablet computer, or a smart wearable device; the method may also be performed cooperatively by the computer device and a server, and the computer device may itself be the server. As shown in fig. 4, the decoding processing method may include steps S410 to S450. Wherein:
S410: acquiring an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding the original data.
The process of obtaining an encoded data stream may comprise: reading the encoded data stream and detecting whether the reading of the encoded data stream has ended. The detection mode may include running a data end instruction to detect whether the reading of the encoded data stream has ended; the data end instruction may include, but is not limited to, an isEnd command. If it is detected that the reading of the encoded data stream has not ended, the encoded data stream continues to be read until the reading ends.
In a possible implementation manner, after the encoded data stream is obtained, it is further determined whether the encoded data stream is valid. The determination may be made by checking whether the data included in the encoded data stream is empty: if the data is not empty, the encoded data stream is determined to be valid; if the data is empty, the acquired encoded data stream is invalid and needs to be acquired again. The encoded data stream is a binary data stream obtained by encoding the original data using an encoding algorithm.
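The read-until-end loop and the emptiness check described above can be sketched as follows. The sketch assumes a file-like source whose read call returns empty bytes at the end of the data, which plays the role of the data end check; the function names and the chunk size are hypothetical.

```python
import io
from typing import BinaryIO


def read_encoded_stream(source: BinaryIO, chunk_size: int = 4096) -> bytes:
    """Read from the source until the end of the data is detected."""
    chunks = []
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # empty read: the end of the stream was reached
            break
        chunks.append(chunk)   # not ended yet: keep reading
    return b"".join(chunks)


def is_valid_stream(data: bytes) -> bool:
    """An encoded data stream is treated as valid when it is not empty."""
    return len(data) > 0


data = read_encoded_stream(io.BytesIO(b"\x08\x96\x01"), chunk_size=2)
```

A caller would re-acquire the stream when is_valid_stream returns False.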
In the present application, a coding algorithm may be selected and used according to actual needs, and the following embodiments of the present application are described by taking the coding algorithm as a Protocol Buffers (PB for short) Protocol algorithm as an example; the Protobuf protocol is a light and efficient structured data storage format and can be used for structured data serialization. The PB protocol code may encode the original data into a data encoding format of a T-V (Tag-Value) structure or a data encoding format of a T-L-V (Tag-Length-Value) structure. The PB protocol algorithm is adopted to encode the original data, a string of binary data streams consisting of 0 and 1 can be obtained, and the binary data streams are encoded data streams.
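To make the two encoding formats concrete, here is a minimal sketch that hand-encodes one T-V field and one T-L-V field. It assumes the standard Protocol Buffers wire layout, with wire_type 0 for varint values and wire_type 2 for length-delimited values; the helper names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Serialize a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_tv_field(field_number: int, value: int) -> bytes:
    """Tag-Value: tag varint followed directly by the varint value."""
    tag = (field_number << 3) | 0   # wire_type 0
    return encode_varint(tag) + encode_varint(value)


def encode_tlv_field(field_number: int, payload: bytes) -> bytes:
    """Tag-Length-Value: tag varint, payload length varint, payload."""
    tag = (field_number << 3) | 2   # wire_type 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


tv = encode_tv_field(1, 150)            # bytes 08 96 01
tlv = encode_tlv_field(2, b"testing")   # bytes 12 07 followed by "testing"
```

Concatenating tv and tlv gives an encoded data stream with two fields.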
S420: splitting the encoded data stream to obtain N fields, wherein N is a positive integer.
In the present application, each of the obtained N fields is a field of a minimum unit, which means that each of the obtained fields is a field that cannot be further split. That is, splitting the encoded data stream according to the present application means that the splitting process of the encoded data stream is completed until each field cannot be further split.
In a possible implementation manner, the encoded data stream contains N identifiers T_i, where i is a positive integer and i is less than or equal to N. The identifier T_i uniquely identifies the i-th field; for example, in PB protocol encoding, the identifier T_i is the Tag of the i-th field. The process of splitting the encoded data stream in this application may include the following steps. First, the identifier T_i is read from the encoded data stream, and variable length decoding is performed on the read identifier T_i. Variable length decoding is the inverse operation of variable length coding: variable length coding serializes an integer into one or more bytes, and variable length decoding deserializes one or more bytes back into an integer. In the embodiment of the present application, the variable length coding is Varint coding and the variable length decoding is Varint decoding. Then, an identifier separation formula is applied to the variable-length-decoded identifier T_i to separate out the data type sub-identifier and the field number sub-identifier contained in T_i. The identifier separation formula is based on the way each identifier T_i is calculated; for example, in PB protocol coding, the calculation formula may be: T_i = (field_number << 3) | wire_type. Then, a matching splitting mode is determined according to the value of the data type sub-identifier contained in T_i, and the encoded data stream is split according to the matching splitting mode to obtain one or more fields. Finally, let i = i + 1, and the above steps are performed iteratively until N fields are obtained.
The data type sub-identifier may be a unique identifier of the data type of each field and is used to indicate the data type of the field; for example, the data type sub-identifier may be indicated by wire_type. The field number sub-identifier may be a unique identifier of each field in the encoded data stream and is used to indicate the number of the field in the encoded data stream; for example, the field number sub-identifier may be indicated by field_number.
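The splitting loop described above can be sketched as follows, assuming the standard Protocol Buffers wire types (0 varint, 1 fixed 64-bit, 2 length-delimited, 5 fixed 32-bit). The Field tuple and the function names are hypothetical, and the sketch splits only one level: it does not recurse into a length-delimited payload that may itself contain further fields.

```python
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    field_number: int   # field number sub-identifier
    wire_type: int      # data type sub-identifier
    payload: bytes      # raw, still-encoded field data


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Varint-decode the integer at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_stream(buf: bytes) -> List[Field]:
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)           # read identifier T_i
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                         # varint value
            start = pos
            _, pos = read_varint(buf, pos)
            payload = buf[start:pos]
        elif wire_type == 1:                       # fixed 8 bytes
            payload, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                       # length-delimited
            length, pos = read_varint(buf, pos)
            payload, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                       # fixed 4 bytes
            payload, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire_type %d" % wire_type)
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append(Field(field_number, wire_type, payload))
    return fields
```

Each loop iteration corresponds to one value of i: the tag is decoded, separated into its two sub-identifiers, and the wire_type selects how many bytes belong to the field.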
S430: analyzing the data characteristics contained in each of the N fields respectively to determine the data type of each field.
In the present application, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data. The identifier T_i of the i-th field serves as the unique identification of the i-th field; the field data of the i-th field is the data of the i-th field itself, and may include, but is not limited to, one or more of an integer or a character string. The data type of the i-th field can be divided into a first encoding format type and a second encoding format type. For example, for an encoded data stream based on the PB protocol, the first encoding format type may be the key-value (Tag-Value) encoding format type: if the data type of the i-th field is the first encoding format type, the identifier T_i of the i-th field is stored as the key and the field data of the i-th field is stored as the Value. The second encoding format type may be the key-length-value (Tag-Length-Value) encoding format type: if the data type of the i-th field is the second encoding format type, the identifier T_i of the i-th field is stored as the key, the field data of the i-th field is stored as the Value, and the length of the data in the Value is stored as the Length. The second encoding format type may be further subdivided, for example into a character string type, a byte type, and so on.
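One possible way to assign a data type to a split field is sketched below. The text above does not state how a character string is told apart from raw bytes, so the UTF-8 and printability test used here is purely an assumption made for illustration, as are the function name and the type labels. The fixed-width wire types 1 and 5 are left out of this sketch.

```python
def classify_field(wire_type: int, payload: bytes) -> str:
    """Return a data type label for one field (labels are hypothetical)."""
    if wire_type == 0:
        return "tag-value"            # first encoding format type
    if wire_type == 2:                # second encoding format type
        try:
            text = payload.decode("utf-8")
        except UnicodeDecodeError:
            return "tlv-bytes"        # not valid text: treat as byte type
        # Assumed heuristic: valid, printable UTF-8 is treated as a string.
        return "tlv-string" if text.isprintable() else "tlv-bytes"
    raise ValueError("wire_type %d is not covered by this sketch" % wire_type)
```

A heuristic of this kind can misclassify short byte sequences that happen to be printable text, which is one reason it should be read as a sketch rather than a rule.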
S440: and determining a decoding method corresponding to each field according to the data type of each field.
The decoding methods corresponding to different data types are different, and the decoding method corresponding to each field can be determined according to the data type of each field after the data characteristics of each field in the N fields are analyzed and the data type of each field is determined. In the N fields, the decoding methods may be the same or different between different fields.
In one possible implementation manner, as can be seen from the above description, the data types of the fields may include a first encoding format type and a second encoding format type; the first encoding format type may be the key-value (Tag-Value) encoding format type, and the second encoding format type may be the key-length-value (Tag-Length-Value) encoding format type. The decoding method of each field is determined according to the data type of the field. Therefore, the flow of determining the decoding method for an arbitrary field (taking the i-th field as an example) may include:
If the data type of the i-th field is the first encoding format type, the decoding method corresponding to the i-th field is determined to be the variable length decoding method.
If the data type of the i-th field is the second encoding format type and, more specifically, the character string type, the decoding method corresponding to the i-th field is determined to be the character string decoding method.
If the data type of the i-th field is the second encoding format type and, more specifically, the byte type, it is determined that the i-th field does not need to be decoded.
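The three rules above amount to a small lookup from data type to decoding method. The following Python sketch shows one way this could be expressed; the enum DataType, its member names, and the returned method labels are all hypothetical and are not taken from the application.

```python
from enum import Enum


class DataType(Enum):
    TAG_VALUE = "first encoding format type (Tag-Value)"
    STRING = "second encoding format type, character string type"
    BYTES = "second encoding format type, byte type"


def choose_decoding_method(data_type: DataType) -> str:
    """Return an illustrative label for the decoding method of a field."""
    if data_type is DataType.TAG_VALUE:
        return "variable_length"  # variable length (Varint) decoding
    if data_type is DataType.STRING:
        return "string"           # character string decoding, e.g. UTF-8
    if data_type is DataType.BYTES:
        return "none"             # field data is output without decoding
    raise ValueError(f"unknown data type: {data_type!r}")


for data_type in DataType:
    print(data_type.name, "->", choose_decoding_method(data_type))
```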
S450: Decode each field according to the decoding method corresponding to each field to restore the original data.
In this application, each field is decoded according to the decoding method corresponding to each field, and the decoding process may be to decode each field sequentially according to the decoding method of each field, or to decode N fields in parallel according to the decoding method of each field in N fields, which is not limited in this application.
In a possible implementation manner, each field is decoded according to the decoding method corresponding to the field to obtain a decoding result for the field. For example, among the N fields, if the decoding method of the i-th field is the variable length decoding method, variable length decoding is performed on the i-th field to obtain a variable length decoding result; if the decoding method of the i-th field is the character string decoding method, character string decoding is performed on the i-th field, for example by using a UTF-8 (Universal Character Set/Unicode Transformation Format, 8-bit) decoding algorithm, to obtain a character string decoding result; if the i-th field does not need to be decoded, the field data in the i-th field is output directly as the decoding result. The decoding results of the fields are then spliced to obtain the original data corresponding to the encoded data stream. Since the N fields have a definite order in the encoded data stream, the decoding results are spliced in that order: the decoding result of the 2nd field is spliced after the decoding result of the 1st field, the decoding result of the 3rd field is spliced after the decoding result of the 2nd field, and so on.
In the embodiment of the application, an encoded data stream is obtained, wherein the encoded data stream is a binary data stream obtained by encoding original data; splitting the coded data stream to obtain N fields, wherein N is a positive integer; respectively analyzing the data characteristics contained in each field of the N fields, determining the data type of each field, and determining the decoding method corresponding to each field according to the data type of each field; and then, decoding each field according to the decoding method corresponding to each field to restore the original data. Therefore, the method can automatically analyze and process the characteristics of the coded data stream, can determine the decoding method corresponding to each field based on the reverse analysis of the data characteristics of each field in the binary data stream, and further can restore the original data based on the decoding method of each field; through the process of the automatic reverse analysis decoding processing, manual searching is not needed, the efficiency of the decoding processing can be effectively improved, in addition, the process of the automatic reverse analysis decoding processing does not depend on protocol files, and the decoding success rate can be effectively improved.
The following description will be given by taking an example in which an encoding algorithm is a Protocol Buffers (PB) Protocol algorithm, that is, an encoded data stream is a binary encoded data stream obtained by encoding original data by using a PB Protocol algorithm.
In a possible implementation manner, please refer to fig. 5, and fig. 5 is a schematic structural diagram of an encoded data stream provided in an embodiment of the present application. As shown in fig. 5, an encoded data stream obtained by encoding a message (message) through the PB protocol algorithm includes a plurality of fields (fields), and the encoding format of the fields may be a Tag-Length-Value format or a Tag-Value format.
Each field contains a Tag, which is the unique identifier of the field and may also be referred to as a key. A Tag is composed of two parts, field_number and wire_type: field_number serves as the field number sub-identifier and represents the number of the field in the encoded data stream; wire_type serves as the data type sub-identifier and indicates the data type of the field data in the field. The Tag may be calculated as (field_number << 3) | wire_type, and the Tag may be encoded by Varints coding (variable length coding). The relationship between the value of wire_type and the encoding format of the field is shown in Table 1 below:
Table 1: Mapping relation between the data type of a field and the encoding format of the field
[Table 1 image not reproduced]
As can be seen from Table 1 above, when wire_type = 0, 1, or 5, the encoding format of a field is the Tag-Value format; when wire_type = 2, the encoding format is the Tag-Length-Value format.
Next, the rules of Varints coding (variable length coding) and the corresponding examples are introduced:
1) The most significant bit (msb) of each byte is a flag bit. If the flag bit is 1, further bytes follow this byte; if the flag bit is 0, this byte is the last byte. Here, 1 byte = 8 bits.
2) The lower 7 bits of each byte are used to store the value.
3) The Varints method uses little-endian order: the least significant 7-bit group is stored first, so when the value is recovered, the content of later bytes is placed in front of the content of earlier bytes.
For example, taking the number 1: its binary representation is 0000 0001; the most significant bit is 0, indicating that no more bytes follow, and the remaining 7 bits are the value bits, "000 0001", which is 1. As another example, taking the number 300: its representation under the Varints coding rule is 1010 1100 0000 0010. The first byte is 1010 1100; its most significant bit is 1, indicating that more bytes follow, and its content is the last 7 bits, i.e. 010 1100. The second byte is 0000 0010; its most significant bit is 0, indicating that no more bytes follow, and its content is the last 7 bits, i.e. 000 0010. Finally, since variable length coding is little-endian, the content of the second byte is placed in front of the content of the first byte, and the actual value is 000 0010 010 1100 = 1 0010 1100 = 300.
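The three rules and the two worked values can be checked with a short Python sketch. The function names encode_varint and decode_varint are hypothetical; the sketch assumes non-negative integers only.

```python
def encode_varint(n: int) -> bytes:
    """Serialize a non-negative integer into one or more Varint bytes."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F              # rule 2: 7 value bits per byte
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # rule 1: msb 1, more bytes follow
        else:
            out.append(low7)         # rule 1: msb 0, last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Deserialize Varint bytes back into an integer."""
    value = 0
    for i, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * i)  # rule 3: first byte is least significant
        if byte & 0x80 == 0:
            return value
    raise ValueError("truncated Varint: no byte with msb 0 found")


encoded = encode_varint(300)
print(" ".join(f"{b:08b}" for b in encoded))  # 10101100 00000010
print(decode_varint(encoded))                 # 300
```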
Based on the above description, after the encoded data stream obtained by encoding based on the PB protocol is obtained, the encoded data stream may be split. The encoded data stream may include N identifiers T_i, where i is a positive integer and i is less than or equal to N; each identifier T_i is a Tag. The process of splitting the encoded data stream in this application may include the following steps:
(1) Read the identifier T_i from the encoded data stream and perform variable length decoding on the read identifier T_i. Because each Tag in the encoded data stream is obtained by variable length coding (Varints coding), variable length decoding is first performed on the read identifier T_i.
(2) Separate the data type sub-identifier and the field number sub-identifier contained in the identifier T_i from the variable-length-decoded identifier T_i by an identifier separation formula. The identifier separation formula may be the calculation formula of the identifier T_i; for example, in PB protocol encoding, the calculation formula may be: (field_number << 3) | wire_type. The data type sub-identifier indicates the data type of the field, and in PB encoding it is represented by wire_type. The field number sub-identifier indicates the number of the field in the encoded data stream, and in PB encoding it is represented by field_number.
(3) Determine a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, and split the encoded data stream according to the matching splitting mode to obtain one or more fields.
(4) Let i = i + 1 and iteratively perform the above steps until N fields are obtained.
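Steps (1) to (4) can be sketched as a single loop in Python. This is an illustrative sketch rather than the implementation of the application: the names split_stream, _varint_end and _varint_value are hypothetical, and the Length of a Tag-Length-Value field is assumed to be Varint-encoded, as in the public Protocol Buffers wire format.

```python
from typing import List, Tuple

Field = Tuple[int, int, bytes]  # (field_number, wire_type, raw field data)


def _varint_end(buf: bytes, pos: int) -> int:
    """Return the index just past the Varint that starts at buf[pos]."""
    while pos < len(buf) and buf[pos] & 0x80:
        pos += 1
    return pos + 1


def _varint_value(raw: bytes) -> int:
    value = 0
    for i, byte in enumerate(raw):
        value |= (byte & 0x7F) << (7 * i)
    return value


def split_stream(buf: bytes) -> List[Field]:
    fields: List[Field] = []
    pos = 0
    while pos < len(buf):
        # Step (1): read the identifier and Varint-decode it.
        end = _varint_end(buf, pos)
        tag = _varint_value(buf[pos:end])
        # Step (2): separate the two sub-identifiers.
        field_number, wire_type = tag >> 3, tag & 0x07
        pos = end
        # Step (3): choose the splitting mode from wire_type.
        if wire_type == 0:    # variable length
            end = _varint_end(buf, pos)
        elif wire_type == 1:  # fixed 64 bits
            end = pos + 8
        elif wire_type == 5:  # fixed 32 bits
            end = pos + 4
        elif wire_type == 2:  # Length, then that many bytes
            len_end = _varint_end(buf, pos)
            length = _varint_value(buf[pos:len_end])
            pos = len_end
            end = pos + length
        else:
            raise ValueError(f"unsupported wire_type {wire_type}")
        if end > len(buf):
            raise ValueError("encoded data stream is truncated")
        fields.append((field_number, wire_type, buf[pos:end]))
        pos = end             # Step (4): continue with the next field.
    return fields


stream = bytes([0b00001000, 0b00001010, 0b00010101, 0, 0, 0, 0b01000000])
for field in split_stream(stream):
    print(field)
```

The loop stops when the stream is exhausted, so N does not need to be known in advance; it is simply the number of fields collected.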
For example, when a series of serialized binary data (an encoded data stream) is received, an identifier T_i (i.e., a Tag) can be read first, and variable length decoding is performed on the read identifier T_i. Using the calculation formula of the identifier T_i, wire_type (specifically, the last 3 bits of the identifier T_i) is separated; by looking up the value of wire_type in Table 1 above, the encoding format of the field can be determined. field_number is separated at the same time, and the number of the field in the encoded data stream can be determined from its value. The Value that follows is then read correctly according to wire_type. The next field is then read, and these steps are performed iteratively until N fields that cannot be split further are obtained.
For example, consider a message in which the num1 field is assigned 10 and the num2 field is assigned 1073741824, and whose encoded data stream obtained by the protobuf encoding algorithm is: 00001000 00001010 00010101 00000000 00000000 00000000 01000000. Based on the above description of the protobuf coding rules, the encoded data stream may be analyzed as follows:
As described above, the Tag is obtained by variable length coding. Therefore, the first byte (00001000) is read first; since its msb is 0, this single byte is the Tag of the first field. The last 3 bits (000) indicate that wire_type = 0 for this field, and the splitting mode corresponding to wire_type = 0 may be the variable length splitting mode; the encoded data stream is then split according to the variable length splitting mode to obtain the first field.
Splitting the encoded data stream in the variable length splitting mode may include, for example: reading the next byte (00001010) after the identifier (00001000) from the encoded data stream. The msb of this byte is 0, so the byte has the end flag bit, and the field data of the first field is determined to be 00001010. The identifier and the field data of the first field together form a complete field, which completes the splitting of one field.
Next, the next byte (00010101) in the encoded data stream is read as the Tag of the second field. The last 3 bits (101) indicate that wire_type = 5 for this field, and the splitting mode corresponding to wire_type = 5 may be the second length splitting mode; the encoded data stream is then split according to the second length splitting mode to obtain the second field.
Splitting the encoded data stream in the second length splitting mode may include, for example: sequentially reading four bytes (00000000 00000000 00000000 01000000) after the identifier (00010101) from the encoded data stream, and determining the 4 bytes read as the field data of the second field. The identifier and the field data of the second field together form a complete field, which completes the splitting of one field. The above steps are then repeated for the third field, and so on, until the encoded data stream has been split into N fields.
After the field splitting is completed, the data characteristics contained in each of the N fields are analyzed to determine the data type of each field. Any of the N fields is denoted as the i-th field, which contains an identifier T_i and field data. The data type of the i-th field can be divided into a first encoding format type and a second encoding format type. For example, for an encoded data stream based on the PB protocol, the first encoding format type may be the key-value (Tag-Value) encoding format type; if the data type of the i-th field is the first encoding format type, the identifier T_i of the i-th field is stored as the key (Tag) and the field data of the i-th field is stored as the Value. The second encoding format type may be the key-length-value (Tag-Length-Value) encoding format type; if the data type of the i-th field is the second encoding format type, the identifier T_i of the i-th field is stored as the key (Tag), the field data of the i-th field is stored as the Value, and the length of the field data in the Value is stored as the Length. The second encoding format type may be further subdivided, for example into a character string type, a byte type, and so on.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is any unit value in a first value range, the data type of the i-th field is determined to be the first encoding format type. If the data type of the i-th field is the first encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-value manner. In the embodiment of the present application, as can be seen from the foregoing, the value of the data type sub-identifier (wire_type) may be any one of 0, 1, 2, and 5. The first value range may include 0, 1, and 5, each of which is a unit value within the first value range. For example, 0 may be the first unit value in the first value range, 1 may be the second unit value, and 5 may be the third unit value.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is a second value, the data type of the i-th field is determined to be the second encoding format type. If the data type of the i-th field is the second encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-length-value manner, where the length indicates the length of the field data in the i-th field. In the embodiment of the present application, as can be seen from the foregoing, the value of the data type sub-identifier (wire_type) may be any one of 0, 1, 2, and 5; the second value may be 2.
Further, when the value of the data type sub-identifier contained in the identifier T_i is the second value, the data characteristics of the field data in the i-th field are analyzed further: if the field data in the i-th field has character string features, the data type of the i-th field is determined to be the character string type. The character string features may include UTF-8 encoding features. UTF-8 is a variable length character encoding for Unicode. It can represent any character in the Unicode standard, and its single-byte codes remain compatible with ASCII (American Standard Code for Information Interchange), so software originally written to process ASCII characters can continue to be used with little or no modification.
If the field data in the i-th field does not have character string features, the value of the field number sub-identifier contained in the identifier T_i is used to analyze whether the i-th field is adjacent to other fields; if there is no such adjacency, the data type of the i-th field is determined to be the byte type.
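The type analysis above can be approximated by the following Python sketch. The names looks_like_utf8 and classify_field are hypothetical. Two simplifications are assumed and should be read as such: the "character string feature" is reduced to "the bytes form valid UTF-8", and the adjacency analysis based on the field number sub-identifier is not modelled, so every non-string field with wire_type = 2 is reported as the byte type.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Simplified character string feature: the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def classify_field(wire_type: int, field_data: bytes) -> str:
    """Return an illustrative data type label for one split field."""
    if wire_type in (0, 1, 5):  # first value range: Tag-Value
        return "tag-value"
    if wire_type == 2:          # second value: Tag-Length-Value
        # Assumption: the adjacency analysis is skipped in this sketch.
        return "string" if looks_like_utf8(field_data) else "bytes"
    raise ValueError(f"unsupported wire_type {wire_type}")


print(classify_field(0, b"\x0a"))
print(classify_field(2, "hello".encode("utf-8")))
print(classify_field(2, b"\xff\xfe\x00"))
```

Valid UTF-8 is a necessary rather than sufficient sign of text, since short binary payloads can also decode cleanly; a practical analyzer would likely add further heuristics.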
In summary, each field is decoded according to the decoding method corresponding to each field, and the original data is restored. For example, if the decoding method corresponding to the field 1 is variable length decoding, the field 1 is decoded by the variable length decoding method to obtain original unit data 1; if the decoding method corresponding to the field 2 is character string decoding, the field 2 is decoded by the character string decoding method to obtain original unit data 2; the field 3 is decoded in such a way that decoding is not required, and the field data of the field 3 is directly output as the original unit data 3. And by parity of reasoning, obtaining N original unit data corresponding to the N fields respectively, and splicing according to the sequence of each field in the encoded data stream to obtain the original data corresponding to the encoded data stream.
Next, the decoding processing method of the present application will be further explained with a specific example.
For example, consider a message in which the first field is assigned a value of 10 and the second field is assigned a value of 64. The binary data stream (i.e., the encoded data stream) serialized by the PB protocol algorithm can be expressed as: 00001000 00001010 00010101 01000000 00000000 00000000 00000000. Based on the method above, the encoded data stream can be analyzed to restore the original data, and the process is as follows:
First, the Tag of the first field is read. The first byte (00001000) is read from the encoded data stream; since its msb is 0, this single byte is the Tag of the first field. The data type sub-identifier wire_type and the field number sub-identifier field_number are separated by the identifier separation method: the last 3 bits (000) of the first byte represent the wire_type of the first field, and the remaining 5 bits (00001) represent the field_number. Thus, for the first field, wire_type = 0 and field_number = 1 (the number of the first field is 1). wire_type = 0 indicates that the splitting mode of the first field is the variable length splitting mode, so the field data of the first field read from the encoded data stream in the variable length splitting mode is 00001010.
As can be seen from wire_type = 0, the data type of the first field is the key-value (Tag-Value) encoding format type, so the decoding method corresponding to the first field is the variable length decoding method. Variable length decoding is therefore performed on 00001010 (the msb must be removed when decoding): since there is only one byte, removing the msb gives 0001010, which is unchanged by the byte reordering, and binary 0001010 represents the number 10.
By analyzing the encoded data stream with this method, the original data 10 of the first field in the message is restored; the data obtained after the decoding process matches the original data exactly.
Then, reading the Tag of the second field, and referring to the decoding process flow of the first field, so as to restore the original data of the second field in the message. And finally, splicing the original data of the first field and the original data of the second field to obtain the original data of the message.
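The worked example can be reproduced with a compact Python sketch. The function name decode_simple_message is hypothetical, and the sketch covers only what this example needs: single-byte Tags, wire_type 0, and wire_type 5. Reading the 4-byte field data as a little-endian unsigned integer is an assumption taken from the public Protocol Buffers wire format; it is consistent with the values 64 and 1073741824 given for the two example streams in the text.

```python
from typing import List


def decode_simple_message(stream: bytes) -> List[int]:
    """Decode a stream of single-byte-Tag fields with wire_type 0 or 5."""
    values: List[int] = []
    pos = 0
    while pos < len(stream):
        tag = stream[pos]  # msb is 0 for the Tags in this example
        pos += 1
        wire_type = tag & 0x07
        if wire_type == 0:
            # Variable length decoding: 7 value bits per byte, low group first.
            value, shift = 0, 0
            while True:
                byte = stream[pos]
                pos += 1
                value |= (byte & 0x7F) << shift
                shift += 7
                if byte & 0x80 == 0:
                    break
        elif wire_type == 5:
            # Assumption: 32-bit field data is a little-endian unsigned integer.
            value = int.from_bytes(stream[pos:pos + 4], "little")
            pos += 4
        else:
            raise ValueError(f"wire_type {wire_type} is outside this example")
        values.append(value)
    return values


stream = bytes([0b00001000, 0b00001010, 0b00010101, 0b01000000, 0, 0, 0])
print(decode_simple_message(stream))  # [10, 64]
```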
In a possible implementation manner, after each field is decoded according to the decoding method corresponding to the field, a decoding log may be generated and output according to the decoding result of each field. The decoding log may include: the encoded data stream, the decoding method corresponding to each field, and the decoding process of each field. With this scheme, by monitoring the response packets of the protocol, the data content of the encoded data stream can be analyzed automatically and the corresponding decoding log can be output. The log can be used to analyze and locate problems in the decoding process, and avoids situations in which a problem cannot be located because key logs are missing due to human factors.
In the embodiment of the application, an encoded data stream is obtained, wherein the encoded data stream is a binary data stream obtained by encoding original data; splitting the coded data stream to obtain N fields, wherein N is a positive integer; respectively analyzing the data characteristics contained in each field of the N fields, determining the data type of each field, and determining the decoding method corresponding to each field according to the data type of each field; and then, decoding each field according to the decoding method corresponding to each field to restore the original data. Therefore, the method and the device can automatically analyze and process the characteristics of the coded data stream, can determine the decoding method corresponding to each field based on the reverse analysis of the data characteristics of each field in the binary data stream, and further can restore the original data based on the decoding method of each field; through the process of automatic reverse analysis decoding processing, manual searching is not needed, the efficiency of decoding processing can be effectively improved, in addition, the process of automatic reverse analysis decoding processing does not depend on protocol files, and the decoding success rate can be effectively improved.
Based on the above analysis, the following describes the splitting process flow of the encoded data stream in detail. Referring to fig. 6, fig. 6 is a flowchart illustrating a splitting processing method according to an embodiment of the present application, where the splitting processing method may be applied to a computer device, and the embodiment of fig. 6 may be a specific embodiment of step S420 in the embodiment of fig. 4. As shown in fig. 6, the splitting processing method may include steps S610 to S670. Wherein:
S610: Determine whether the encoded data stream is valid.
In a possible implementation manner, after obtaining the encoded data stream, the computer device may first determine the validity of the encoded data stream. Determining whether the encoded data stream is valid may include: determining whether the encoded data stream is empty, whether the encoded data stream has a length, and the like. In this way, the validity of the acquired encoded data stream can be established, which provides a guarantee for the subsequent data analysis.
If the encoded data stream is determined to be valid, the process continues to step S620. If the encoded data stream is determined to be invalid, the encoded data stream may be reacquired until the acquired encoded data stream is valid.
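A minimal Python sketch of the validity check in S610 follows. The names is_valid_stream and acquire_valid_stream are hypothetical, and the max_attempts bound is an added assumption so that the sketch cannot loop forever; the text itself only states that the stream may be reacquired until it is valid.

```python
from typing import Callable, Optional


def is_valid_stream(stream: Optional[bytes]) -> bool:
    """A stream is treated as valid if it is not empty and has a length."""
    return stream is not None and len(stream) > 0


def acquire_valid_stream(
    acquire: Callable[[], Optional[bytes]], max_attempts: int = 3
) -> bytes:
    """Call acquire() repeatedly until it returns a valid stream."""
    for _ in range(max_attempts):
        stream = acquire()
        if is_valid_stream(stream):
            return stream  # continue with S620
    raise ValueError("no valid encoded data stream was acquired")


# Illustrative source that yields two invalid streams before a valid one.
attempts = iter([None, b"", bytes([0b00001000, 0b00001010])])
print(acquire_valid_stream(lambda: next(attempts)))
```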
S620: Read an identifier.
In a possible implementation manner, after the encoded data stream is obtained, based on the foregoing analysis of the encoded data stream, each field in the encoded data stream is stored in the Tag-Value or Tag-Length-Value data format, and the Tag of each field is the unique identifier of the field. The encoded data stream may include N identifiers T_i, where i is a positive integer and i is less than or equal to N. Any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data. For example, an identifier read by the computer device from the encoded data stream may be the identifier T_i of the i-th field, i.e. the key of the i-th field.
S630: Extract the data type sub-identifier.
In one possible implementation, since the Tag (identifier T_i) of each field is obtained by variable length coding, the extracted identifier T_i is first decoded by variable length decoding. For example, suppose the identifier T_i obtained for a certain field is 0001 0010. According to the calculation formula of the identifier T_i, (field_number << 3) | wire_type, the data type sub-identifier (e.g., wire_type) and the field number sub-identifier (e.g., field_number) contained in the identifier T_i can be separated from the variable-length-decoded identifier T_i: the last three bits of 0001 0010 (010) are the data type sub-identifier, and the first five bits (00010) are the field number sub-identifier.
S640: the next byte is read from the encoded data stream.
It can be known from the foregoing analysis that each data type sub identifier corresponds to a splitting manner, that is, a corresponding splitting manner can be determined according to a value of the data type sub identifier. Therefore, according to the value of the data type sub-identifier extracted in S630, the corresponding splitting manner of the field can be determined, and according to the splitting manner, how to read the field data of the field in the encoded data stream can be determined.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is the first unit value in the first value range, the splitting mode matching the first unit value is determined to be the variable length splitting mode. The first value range may include 0, 1, and 5, each of which is a unit value within the first value range; for example, 0 may be the first unit value. That is, if the extracted data type sub-identifier of the field has a value of 0, the splitting mode matching 0 is determined to be the variable length splitting mode, and the encoded data stream is split according to the variable length splitting mode to obtain a field.
Further, the process of splitting the encoded data stream according to the variable length splitting mode to obtain a field may be: starting from the position after the identifier T_i in the encoded data stream, read the next byte, and then determine whether that byte has the end flag bit. If the byte read does not have the end flag bit, continue reading the following bytes in sequence until a byte with the end flag bit is read; if the byte read has the end flag bit, take the byte read as the field data of the field.
Each byte consists of 8 bits, and whether a byte has the end flag bit is judged from the most significant bit (msb) of the byte. Since the encoded data stream is a binary data stream, the most significant bit of a byte is either 0 or 1. If the most significant bit of the byte is 0, the byte has the end flag bit; if the most significant bit of the byte is 1, the byte does not have the end flag bit.
Therefore, if the byte read does not have the end flag bit, the following bytes are read in sequence until a byte with the end flag bit is read; all the bytes read are then combined, in reading order, into the field data of the i-th field. For example, in the encoded data stream 0001 0000 1000 1000 0000 0001 ..., the first byte 0001 0000 is the identifier of a field. According to the above analysis of the field identifier, the value of the data type sub-identifier of the field (the last three bits, 000) is 0, so the byte following the identifier is read from the encoded data stream: 1000 1000. Since the most significant bit of this byte is 1, the next byte is read from the encoded data stream: 0000 0001. Since the most significant bit of this byte is 0, the byte has the end flag bit, the reading ends, and the two bytes read are combined into the field data of the field: 1000 1000 0000 0001. The field data of the field is also referred to as the Value of the field. In this way, a complete field can be read from the encoded data stream.
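The variable length splitting mode in S640 can be sketched in Python as follows. The function name read_variable_length_field is hypothetical; it returns the raw field data without decoding it, which matches the splitting step described above.

```python
from typing import Tuple


def read_variable_length_field(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read bytes from buf[pos] up to and including the first byte with msb 0.

    Returns (field_data, next_pos).
    """
    start = pos
    while True:
        if pos >= len(buf):
            raise ValueError("stream ended before a byte with the end flag bit")
        byte = buf[pos]
        pos += 1
        if byte & 0x80 == 0:  # msb 0: this byte has the end flag bit
            return buf[start:pos], pos


# Example from the text: identifier 0001 0000, then 1000 1000 0000 0001.
stream = bytes([0b00010000, 0b10001000, 0b00000001])
field_data, next_pos = read_variable_length_field(stream, 1)
print(" ".join(f"{b:08b}" for b in field_data))  # 10001000 00000001
```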
S650: 64 bits of data are read from the encoded data stream.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is the second unit value in the first value range, the splitting mode matching the second unit value is determined to be the first length splitting mode. The first value range may include 0, 1, and 5, each of which is a unit value within the first value range; for example, 1 may be the second unit value. That is, if the extracted data type sub-identifier of the field has a value of 1, the splitting mode matching 1 is determined to be the first length splitting mode, and the encoded data stream is split according to the first length splitting mode to obtain a field.
Further, the process of splitting the encoded data stream according to the first length splitting mode to obtain a field may be: starting from the position after the identifier T_i in the encoded data stream, sequentially read bytes of the first length, and determine the bytes of the first length that were read as the field data of the i-th field. When the value of the data type sub-identifier is 1, the first length may be 64 bits (8 bytes), so 64 bits of data are read sequentially starting from the position after the identifier T_i of the field in the encoded data stream, and the 64 bits of data read are determined as the field data of the field. The field data of the field is also referred to as the Value of the field. In this way, a complete field can be read from the encoded data stream.
For example, in the encoded data stream 0001 0001 1000 1000 0000 0001 0000 0001 1000 1000 0000 0001 0000 0001 0000 0001 1000 1000 ..., the first byte 0001 0001 is the identifier of a field. According to the above analysis of the field identifier, the value of the data type sub-identifier of the field (the last three bits, 001) is 1, so the 8 bytes (64 bits) following the identifier are read from the encoded data stream: 1000 1000 0000 0001 0000 0001 1000 1000 0000 0001 0000 0001 0000 0001 1000 1000. The 8 bytes read are taken as the field data of the field.
S660: 32 bits of data are read from the encoded data stream.
In one possible implementation, if T is identifiediAnd if the value of the contained data type sub-identifier is a third unit numerical value in the first numerical value range, determining that the splitting mode matched with the third unit numerical value is a second length splitting mode. Wherein the first range of values may include 0,1, 5, wherein 0,1, 5 are each a unit value within the first unit of values. For example, 5 may be a third unit value within the first range of values. That is, if the extracted data type sub-identifier of the field has a value of 5, it can be determined that the splitting mode matching 5 is the second lengthAnd splitting the encoded data stream according to a second length splitting mode to obtain a field.
Further, the process of parsing the encoded data stream according to the second length splitting manner to obtain a field may be: from the coded data stream at the indication TiSequentially reading bytes of a second length from a later position; and determining the read bytes with the second length as field data of the ith field. Wherein, in case that the value of the data type sub-identifier is 5, the second length may be 32 bits/bit (4 bytes), so that the identifier T located in the field in the encoded data stream may beiThe next position starts to sequentially read the data of 4 bytes, and the read data of 4 bytes is determined as the field data of the field. The field data of this field is also referred to as a Value (Value) of this field. In this way, a complete field can be read from the encoded data stream.
For example, consider the encoded data stream: 00010101 1000 000000010000 00010000 0001 1000.. The first byte, 00010101, is the identifier of a field. According to the above analysis of the identifier, the value of its data type sub-identifier (the last three bits, 101) is 5, so the 4 bytes (32 bits) following the identifier are read from the encoded data stream: 1000 1000 000000010000 0001 1000 1000. The 4 bytes read are taken as the field data of the field.
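The two fixed-length splitting modes (8 bytes for a data type sub-identifier of 1, 4 bytes for 5) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FIXED_LENGTHS` and `read_fixed` and the sample bytes are assumptions introduced here.

```python
from typing import Tuple

# Hypothetical table: data type sub-identifier -> payload size in bytes.
FIXED_LENGTHS = {1: 8, 5: 4}

def read_fixed(stream: bytes, pos: int, wire_type: int) -> Tuple[bytes, int]:
    """Read a fixed-length payload that starts at pos (just after the identifier).

    Returns the field data and the position of the next identifier.
    """
    size = FIXED_LENGTHS[wire_type]
    end = pos + size
    if end > len(stream):
        raise ValueError("stream ends before the fixed-length field is complete")
    return stream[pos:end], end

# Identifier 0x15 = 0001 0101 (last three bits 101 -> 5), then four payload bytes.
sample = bytes([0x15, 0x54, 0x00, 0x00, 0x00, 0x08])
data, next_pos = read_fixed(sample, 1, 5)
```

Returning the next position alongside the data lets the caller continue with the following identifier without re-scanning the stream.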
S670: the Length of the Tag-Length-Value is calculated.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is a second value, the splitting mode matched with the second value is determined to be a specified length splitting mode. The second value may be 2. That is, if the extracted data type sub-identifier of the field has a value of 2, the splitting mode matching 2 is the specified length splitting mode, and the encoded data stream is split according to the specified length splitting mode to obtain a field.
Further, based on the above analysis (as shown in Table 1), if the value of the data type sub-identifier of a field is 2 (wire_type = 2), the data encoding format/storage format of the field can be determined to be Tag-Length-Value. Parsing the encoded data stream according to the specified length splitting mode to obtain one or more fields may include: first, obtaining the specified length from the position after the identifier T_i; then, sequentially reading bytes of the specified length from the encoded data stream, starting at the position after the length; and determining the read bytes of the specified length as the field data of the i-th field.
For example, consider the encoded data stream: 0001 00100000 0011 000000010000 0001 1000 000000010000 00010000 0001 1000.. The first byte, 0001 0010, is the identifier of a field. According to the above analysis of the identifier, the value of its data type sub-identifier (the last three bits, 010) is 2, so the byte following the identifier gives the length of the field data: 0000 0011, i.e., 3 (the specified length). Thus, starting from the second byte after the identifier T_i, 3 bytes are read sequentially from the encoded data stream: 000000010000 0001 1000; and the 3 bytes read are taken as the field data of the field.
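The specified length splitting mode can be sketched as below. The sketch assumes, as in the example above, that the length fits in the single byte that follows the identifier; the function name `read_length_delimited` and the sample bytes are illustrative assumptions.

```python
from typing import Tuple

def read_length_delimited(stream: bytes, pos: int) -> Tuple[bytes, int]:
    """Read a Tag-Length-Value payload; pos points at the length byte.

    Assumption: the length is stored in exactly one byte.
    Returns the field data and the position of the next identifier.
    """
    if pos >= len(stream):
        raise ValueError("missing length byte")
    length = stream[pos]
    start = pos + 1
    end = start + length
    if end > len(stream):
        raise ValueError("stream ends before the specified length is read")
    return stream[start:end], end

# 0x12 = 0001 0010 (last three bits 010 -> 2), 0x03 = length, then 3 payload bytes.
sample = bytes([0x12, 0x03, 0x01, 0x01, 0x18, 0x08])
data, next_pos = read_length_delimited(sample, 1)
```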
In a possible implementation manner, in the process of parsing the encoded data stream according to the specified length splitting manner, if the field data of the i-th field includes multiple identifiers, and values of field number sub-identifiers included in the multiple identifiers have an adjacent relationship, it indicates that the i-th field is a message, and needs to be further split, so that the i-th field is recursively split to obtain P fields, where P is a positive integer.
Analyzing whether the values of the field number sub-identifiers have an adjacent relationship may include: determining whether the field number sub-identifiers of the identifiers increase in the order in which they appear in the encoded data stream. For example, if identifier 1 contains field number sub-identifier 1, identifier 2 contains field number sub-identifier 2, and identifier 3 contains field number sub-identifier 3, the 3 field number sub-identifiers have an adjacent relationship; for another example, if identifier 1 contains field number sub-identifier 3, identifier 2 contains field number sub-identifier 2, and identifier 3 contains field number sub-identifier 1, the 3 field number sub-identifiers do not have an adjacent relationship. Note that the values of the field number sub-identifiers need not satisfy a gradient increasing relationship; an increasing relationship is sufficient. A gradient increasing relationship means that the difference between any two consecutive values is the same. For example, 1, 2, 3, 4, 5 is gradient increasing, whereas 1, 2, 3, 5, 8 is increasing but not gradient increasing. In this application, an increasing relationship between the field number sub-identifiers is enough to establish the adjacent relationship; a gradient increasing relationship is not additionally required, although field number sub-identifiers that are gradient increasing are of course also adjacent.
For example, suppose the read field data contains a plurality of identifiers, including identifier 1: 0010 0010 and identifier 2: 0010 1010. The value of the field number sub-identifier contained in identifier 1 (the first five bits, 00100) is 4, and the value of the field number sub-identifier contained in identifier 2 (the first five bits, 00101) is 5. The values of the field number sub-identifiers of the two identifiers (4 and 5) have an adjacent relationship, so the field data can continue to be split until it cannot be split further. For the splitting process of the field data, reference may be made to the splitting process of the encoded data stream in this application, which is not repeated here.
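The adjacency test described above reduces to checking that the field numbers are strictly increasing in stream order. A minimal sketch follows; the name `has_adjacent_relationship` is hypothetical, and requiring at least two identifiers is an assumption drawn from the phrase "multiple identifiers".

```python
from typing import Sequence

def has_adjacent_relationship(field_numbers: Sequence[int]) -> bool:
    """True if there are several field numbers and each is larger than the previous one.

    Equal steps (a "gradient" increase) are not required, only an increase.
    """
    if len(field_numbers) < 2:
        return False
    return all(a < b for a, b in zip(field_numbers, field_numbers[1:]))

# Identifiers 0010 0010 and 0010 1010: the first five bits give field numbers 4 and 5.
numbers = [0x22 >> 3, 0x2A >> 3]
adjacent = has_adjacent_relationship(numbers)
```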
In the embodiment of the application, the data type sub-identifier and the field number sub-identifier included in each identifier can be determined by reading and analyzing each identifier, then the splitting mode of the field is determined according to the value of the data type sub-identifier included in each identifier, and splitting of the coded data stream to obtain one or more fields can be completed according to the splitting mode of the field. And finally, completing the splitting of the coded data stream until each field can not be split any more, so as to obtain N fields. The method and the device can improve the splitting efficiency because the data characteristics in the coded data stream are automatically analyzed to complete the splitting of the field, and the corresponding protocol file does not need to be manually searched to split the field.
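Putting the splitting modes together, the top-level loop over an encoded data stream might look like the sketch below. It is a simplified illustration under stated assumptions: every identifier and every length occupies a single byte, the "ending flag bit" of a variable length byte is a most significant bit of 0, and the function name `split_stream` is invented here. Recursive splitting of nested messages is left out.

```python
from typing import List, Tuple

def split_stream(stream: bytes) -> List[Tuple[int, int, bytes]]:
    """Split a stream into (field_number, wire_type, field_data) triples."""
    fields = []
    pos = 0
    while pos < len(stream):
        tag = stream[pos]                      # assumed single-byte identifier
        field_number, wire_type = tag >> 3, tag & 0x07
        pos += 1
        if wire_type == 0:                     # variable length: read until MSB is 0
            end = pos
            while end < len(stream) and stream[end] & 0x80:
                end += 1
            end += 1
        elif wire_type == 1:                   # first length: 8 bytes
            end = pos + 8
        elif wire_type == 5:                   # second length: 4 bytes
            end = pos + 4
        elif wire_type == 2:                   # specified length: length byte first
            if pos >= len(stream):
                raise ValueError("missing length byte")
            length = stream[pos]
            pos += 1
            end = pos + length
        else:
            raise ValueError("unsupported data type sub-identifier: %d" % wire_type)
        if end > len(stream):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, stream[pos:end]))
        pos = end
    return fields

sample = bytes([0x08, 0x96, 0x01,               # field 1, variable length
                0x12, 0x03, 0x61, 0x62, 0x63,   # field 2, specified length 3
                0x1D, 0x54, 0x00, 0x00, 0x00])  # field 3, 4 bytes
fields = split_stream(sample)
```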
Based on the N fields obtained after the encoded data stream is split in the embodiment of fig. 6, next, the data characteristics that are contained in each of the N fields are analyzed, so as to determine the decoding method corresponding to each field, and each field is decoded according to the decoding method corresponding to each field. Referring to fig. 7, fig. 7 is a flowchart illustrating a field decoding method according to an embodiment of the present disclosure. Wherein, the field decoding method may be applied to a computer device, as shown in fig. 7, the field decoding method may include steps S710 to S770. Wherein:
S710: It is determined whether the field is valid.
In a possible implementation manner, after the computer device obtains the field, the computer device may first perform validity determination on the field, and determining whether the field is valid may include: and judging whether the field is empty or not, whether the field has the length or not, and the like. In this way, the validity of the acquired field can be determined, thereby providing a guarantee for the subsequent data analysis.
If the field is determined to be valid, the process continues to step S720. If the field is determined to be invalid, the field may be reacquired until the acquired field is valid.
S720: Extract the data type sub-identifier.
In one possible implementation, because the Tag of a field (the identifier T_i) is obtained by variable length coding, the extracted identifier T_i is first decoded in the variable length manner. For example, suppose the identifier T_i of a field is 0001 0010. The identifier is calculated by the formula (field_number << 3) | wire_type, so the data type sub-identifier (e.g., wire_type) and the field number sub-identifier (e.g., field_number) contained in the identifier T_i can be separated from the variable-length-decoded identifier T_i. That is, the last three bits of 0001 0010 are the data type sub-identifier, and the first five bits of 0001 0010 are the field number sub-identifier.
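The separation of an identifier into its two sub-identifiers can be sketched with a shift and a mask. The function name `split_identifier` is an assumption made for illustration; the bit layout follows the formula given above.

```python
from typing import Tuple

def split_identifier(identifier: int) -> Tuple[int, int]:
    """Return (field_number, wire_type) for an already variable-length-decoded identifier."""
    wire_type = identifier & 0x07      # last three bits
    field_number = identifier >> 3     # remaining high bits
    return field_number, wire_type

# 0001 0010 -> first five bits 00010 (2), last three bits 010 (2)
field_number, wire_type = split_identifier(0b00010010)
```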
S730: Variable length decoding.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is any unit value in the first value range, the decoding method corresponding to the i-th field can be determined to be a variable length decoding method. The first value range may include 0, 1, and 5, each of which is a unit value within the first value range. For example, 0 may be the first unit value within the first value range. That is, if the extracted data type sub-identifier of the field has a value of 0, 1, or 5, the decoding method corresponding to the i-th field can be determined to be the variable length decoding method.
For example, based on the foregoing description of the variable length coding rule, if wire_type = 0, the variable length decoding rule mainly includes: removing the most significant bit of each byte, combining the remaining 7-bit groups in reverse order, and restoring the combined binary data to the original data corresponding to the field. For example, suppose wire_type = 0 for a field and the field data of the field is 1000 1000 0000 0001. Removing the most significant bit of each byte gives 0001000 0000001, and combining the groups in reverse order gives the new binary data 0000001 0001000, so the original data restored for this field is 136. For another example, if the field data of a field is 0000 1000, removing the most significant bit gives 0001000; since there is only one byte, the binary data after reverse-order combination is still 0001000, so the original data restored for this field is 8.
The reverse order means that data arranged from a first direction to a second direction is rearranged from the second direction to the first direction. For example, if some data is arranged from left to right as [1 2 3 4 5], its reverse order is [5 4 3 2 1].
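The rule for wire_type = 0 (drop the most significant bit of each byte, then combine the 7-bit groups in reverse order) can be sketched as follows. The function name `decode_varint` is an illustrative assumption.

```python
def decode_varint(field_data: bytes) -> int:
    """Decode variable length field data into an unsigned integer."""
    groups = [b & 0x7F for b in field_data]   # remove the most significant bit
    value = 0
    for group in reversed(groups):            # last byte holds the highest bits
        value = (value << 7) | group
    return value

two_bytes = decode_varint(bytes([0b10001000, 0b00000001]))
one_byte = decode_varint(bytes([0b00001000]))
```

For these inputs the results are 136 and 8, matching the worked example in the text.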
In one possible implementation, if wire_type = 1, the field data of the field contains 8 bytes of binary data. Because the field data is read with a fixed length in this case, the most significant bit of each byte does not need to be removed during decoding; the 8 bytes only need to be combined in reverse order and converted into the corresponding original data. For example, suppose wire_type = 1 for a field and the field data of the field is: 00000000 00000000 00000000 01000000 00000000 00000000 00000000 00000000. Combining these 8 bytes in reverse order gives: 00000000 00000000 00000000 00000000 01000000 00000000 00000000 00000000, so the original data restored for this field is 1073741824.
Similarly, if wire_type = 5, the field data of the field contains 4 bytes of binary data. Because the field data is read with a fixed length in this case, the most significant bit of each byte does not need to be removed during decoding; the 4 bytes only need to be combined in reverse order and converted into the corresponding original data. For example, suppose wire_type = 5 for a field and the field data of the field is: 01010100 00000000 00000000 00000000. Combining these 4 bytes in reverse order gives: 00000000 00000000 00000000 01010100, so the original data restored for this field is 84.
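Combining the bytes in reverse order is the same as interpreting them as a little-endian integer, which Python's built-in `int.from_bytes` does directly. The sketch below assumes unsigned values; the function name `decode_fixed` is hypothetical.

```python
def decode_fixed(field_data: bytes) -> int:
    """Decode 4-byte or 8-byte fixed-length field data as an unsigned integer."""
    if len(field_data) not in (4, 8):
        raise ValueError("fixed-length field data must be 4 or 8 bytes")
    # "little" means the first byte is the least significant one,
    # i.e. the bytes are combined in reverse order.
    return int.from_bytes(field_data, "little")

value64 = decode_fixed(bytes([0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00]))
value32 = decode_fixed(bytes([0x54, 0x00, 0x00, 0x00]))
```

The two results are 1073741824 and 84, the values given in the examples above.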
S740: Determine whether a sign feature exists.
After the field data of the field is variable-length decoded, it can further be determined whether the decoded field data contains a sign feature. If it does, the decoded field data continues to be decoded using an integer compression decoding method to obtain the original data of the field. If it does not, the variable-length-decoded field data is taken as the original data of the field. In this application, the sign feature may include a minus sign.
First, Varints coding of negative numbers is described. When encoding negative numbers, there is a large difference between the signed int types (sint32 and sint64) and the "standard" int types (int32 and int64). If int32 or int64 is used as the type of a negative number, the resulting varint is always ten bytes long; that is, even negative numbers such as -1 and -2 occupy a relatively large number of bytes, because the value is in effect treated as a very large unsigned integer. If one of the signed types (sint32 or sint64) is used, the generated varint uses the more efficient ZigZag coding (integer compression coding). ZigZag coding maps signed numbers to unsigned numbers so that numbers with a small absolute value (e.g., -1) also have a small varint-encoded value. It does this by "zigzagging" back and forth through the positive and negative integers, encoding -1 as 1, 1 as 2, -2 as 3, and so on, as shown in Table 2 below:
Table 2: Integer compression (ZigZag) coding
Original signed value    Encoded as
0                        0
-1                       1
1                        2
-2                       3
2                        4
-3                       5
...                      ...
2147483647               4294967294
-2147483648              4294967295
In one possible implementation, if a field has the sign feature, the field may be a negative number. Based on the above analysis of variable length coding of negative numbers, ZigZag decoding is used when decoding a negative number: after the unsigned number is decoded, it is mapped back to the original negative number according to the mapping relationship. For example, suppose a signed-type value val = -2. The sender maps it to 3, performs Varints coding on the number 3, and stores or sends the result. After receiving the data, the receiver performs Varints decoding to obtain the number 3, and then maps 3 back to -2.
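The mapping in Table 2 can be written with two bit operations. The sketch below assumes 32-bit signed values for the encoder; the function names `zigzag_encode32` and `zigzag_decode` are illustrative and not taken from the source.

```python
def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF

def zigzag_decode(encoded: int) -> int:
    """Map an unsigned integer back to the signed value it encodes."""
    return (encoded >> 1) ^ -(encoded & 1)

# Round trip for the example in the text: -2 is sent as 3 and mapped back to -2.
sent = zigzag_encode32(-2)
received = zigzag_decode(sent)
```

The lowest bit of the encoded value carries the sign, which is why small negative numbers stay small after encoding.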
S750: Analyze the data characteristics of the field data.
In one possible implementation, if the value of the data type sub-identifier contained in the identifier T_i is the second value, the data characteristics of the field data in the field (the i-th field) need to be analyzed further. The second value may be 2; that is, if the extracted value of the data type sub-identifier of the field is 2, the data characteristics of the field data in the field need to be analyzed further to determine the decoding method corresponding to the field.
S760: Determine whether the field data has character string features.
After analyzing the data characteristics of the field data, it can be determined whether the data characteristics of the field data in the field have character string characteristics. If the data characteristics of the field data in the field have character string characteristics, the decoding method corresponding to the field is a character string decoding method. If the data feature of the field data in this field does not have the character string feature, step S770 is executed.
In the present application, the string features may include UTF-8 features, and the UTF-8 features may include ASCII characters (symbols such as a, b, c). For example, the data characteristics of field data of a certain field are: 1a 03 08 96 01, the data characteristic of this field is a character string characteristic. In one possible implementation manner, if the data feature of the field data in the i-th field has a character string feature, the data type of the i-th field is determined to be a character string type, and therefore, the decoding method corresponding to the field is determined to be a character string decoding method.
S770: Determine whether an adjacent relationship exists.
In one possible implementation, if the data characteristics of the field data in the i-th field do not have the character string feature, whether the i-th field has an adjacent relationship with other fields needs to be analyzed according to the value of the field number sub-identifier contained in the identifier T_i. Further, if the field does not have an adjacent relationship with other fields, the data type of the i-th field is determined to be a byte type; data of the byte type does not need to be decoded, so the decoding method corresponding to the field is determined to be decoding-free. If the field has an adjacent relationship with other fields, the field data of the field needs to be split further. For the splitting process, reference may be made to the splitting process of the encoded data stream in this application, which is not repeated here.
Analyzing whether a field has an adjacent relationship with other fields may include: determining whether the field number sub-identifiers corresponding to the fields increase in the order in which they appear in the encoded data stream. For example, if the field number sub-identifier of field 1 is 1, that of field 2 is 2, and that of field 3 is 3, then field 1, field 2, and field 3 have an adjacent relationship; for another example, if the field number sub-identifier of field 1 is 3, that of field 2 is 2, and that of field 3 is 1, then field 1 does not have an adjacent relationship with field 2 and field 3. Note that the field number sub-identifiers need not satisfy a gradient increasing relationship; an increasing relationship is sufficient. A gradient increasing relationship means that the difference between any two consecutive numbers is the same. For example, 1, 2, 3, 4, 5 is gradient increasing, whereas 1, 2, 3, 5, 8 is increasing but not gradient increasing. In this application, an increasing relationship between the field number sub-identifiers is enough to establish the adjacent relationship between the fields; a gradient increasing relationship is not additionally required, although fields whose field number sub-identifiers are gradient increasing are of course also adjacent.
To sum up, in the present application, for each field, the data type sub-identifier of each field is extracted, and the decoding method of each field is determined according to the value of the data type sub-identifier. Then, based on the determined decoding method, the corresponding fields are decoded according to the specific rules of the decoding method, so that the original data corresponding to each field can be obtained. Therefore, the data characteristics contained in each field are automatically analyzed, the decoding of each field is completed, and the corresponding protocol file does not need to be manually searched to decode the field, so that the decoding processing efficiency can be improved, the automatic reverse analysis decoding processing process does not depend on the protocol file, and the decoding success rate can be effectively improved.
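The decision flow of steps S730 to S770 can be summarized as a small dispatch function. This is only a sketch: the sign, character string, and adjacency checks are passed in as booleans because the text does not fix how each check is computed, and the function name and returned labels are assumptions.

```python
def choose_decoding_method(wire_type: int,
                           has_sign: bool = False,
                           is_string: bool = False,
                           has_adjacency: bool = False) -> str:
    """Pick a decoding method from the data type sub-identifier and the analysed features."""
    if wire_type in (0, 1, 5):                 # first value range
        if has_sign:
            return "variable length decoding + integer compression decoding"
        return "variable length decoding"
    if wire_type == 2:                         # second value: Tag-Length-Value
        if is_string:
            return "character string decoding"
        if has_adjacency:
            return "recursive splitting"
        return "decoding-free (byte type)"
    raise ValueError("unsupported data type sub-identifier: %d" % wire_type)

method = choose_decoding_method(2, is_string=True)
```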
Referring to fig. 8, fig. 8 is a schematic structural diagram of a decoding processing apparatus according to an embodiment of the present disclosure. The decoding processing apparatus 800 may be applied to a computer device in the method embodiments corresponding to fig. 4 to fig. 7. The decoding processing apparatus 800 may be a computer program (including program code) running in a lightweight node, for example, the decoding processing apparatus 800 is an application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The decoding processing apparatus 800 may include:
an obtaining unit 810, configured to obtain an encoded data stream, where the encoded data stream is a binary data stream obtained by encoding original data;
a processing unit 820, configured to split an encoded data stream to obtain N fields, where N is a positive integer;
the processing unit 820 is further configured to analyze data features included in each of the N fields, and determine a data type of each field;
a determining unit 830, configured to determine, according to the data type of each field, a decoding method corresponding to each field;
the processing unit 820 is further configured to perform decoding processing on each field according to the decoding method corresponding to each field, and restore the original data.
In a possible implementation, the encoded data stream includes N identifiers T_i, where i is a positive integer and i is not more than N;
the processing unit 820 splits the encoded data stream to obtain N fields, including:
reading an identifier T_i from the encoded data stream, and performing variable length decoding on the read identifier T_i;
separating, by an identifier separation formula, the data type sub-identifier and the field number sub-identifier contained in the identifier T_i from the variable-length-decoded identifier T_i;
determining a matched splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, and splitting the encoded data stream according to the matched splitting mode to obtain one or more fields;
letting i = i + 1 and iteratively performing the above steps until N fields are obtained.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 determines the matched splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, including:
if the value of the data type sub-identifier contained in the identifier T_i is a first unit value in a first value range, determining that the splitting mode matched with the first unit value is a variable length splitting mode;
the processing unit 820 parses the encoded data stream according to the matched splitting mode to obtain one or more fields, including:
sequentially reading the next byte from the encoded data stream, starting at the position after the identifier T_i;
if the read byte does not have the ending flag bit, continuing to read the next byte in sequence until a byte with the ending flag bit is read;
and combining all the read bytes, in reading order, into the field data of the i-th field.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 determines the matched splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, including:
if the value of the data type sub-identifier contained in the identifier T_i is a second unit value in the first value range, determining that the splitting mode matched with the second unit value is a first length splitting mode;
the processing unit 820 parses the encoded data stream according to the matched splitting mode to obtain one or more fields, including:
sequentially reading bytes of a first length from the encoded data stream, starting at the position after the identifier T_i;
and determining the read bytes of the first length as the field data of the i-th field.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 determines the matched splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, including:
if the value of the data type sub-identifier contained in the identifier T_i is a third unit value in the first value range, determining that the splitting mode matched with the third unit value is a second length splitting mode;
the processing unit 820 parses the encoded data stream according to the matched splitting mode to obtain one or more fields, including:
sequentially reading bytes of a second length from the encoded data stream, starting at the position after the identifier T_i;
and determining the read bytes of the second length as the field data of the i-th field.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 determines the matched splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, including:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the splitting mode matched with the second value is a specified length splitting mode;
the processing unit 820 parses the encoded data stream according to the matched splitting mode to obtain one or more fields, including:
reading the specified length from the encoded data stream at the position after the identifier T_i;
sequentially reading bytes of the specified length from the encoded data stream according to the specified length;
and determining the read bytes of the specified length as the field data of the i-th field.
In a possible implementation manner, the processing unit 820 parses the encoded data stream according to the matching splitting manner to obtain one or more fields, and further includes:
if the field data of the ith field contains a plurality of identifications and the values of the field number sub-identifications contained in the identifications have an adjacent relation, carrying out recursive splitting on the ith field to obtain P fields, wherein P is a positive integer.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 analyzes the data features contained in each of the N fields and determines the data type of each field, including:
if the value of the data type sub-identifier contained in the identifier T_i is any unit value in a first value range, determining that the data type of the i-th field is a first encoding format type; wherein, if the data type of the i-th field is the first encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-value manner;
the determining unit 830 determines a decoding method corresponding to each field according to the data type of each field, including: determining that the decoding method corresponding to the i-th field is a variable length decoding method.
In one possible implementation, any one of the N fields is denoted as the i-th field, and the i-th field contains an identifier T_i and field data;
the processing unit 820 analyzes the data features contained in each of the N fields and determines the data type of each field, including:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the data type of the i-th field is a second encoding format type; wherein, if the data type of the i-th field is the second encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-length-value manner, and the length indicates the length of the field data in the i-th field;
analyzing the data characteristics of the field data in the i-th field;
if the data characteristics of the field data in the i-th field have the character string feature, determining that the data type of the i-th field is a character string type;
the determining unit 830 determines a decoding method corresponding to each field according to the data type of each field, including: determining that the decoding method corresponding to the i-th field is a character string decoding method.
In a possible implementation, analyzing the data features contained in each of the N fields and determining the data type of each field further includes:
if the data characteristics of the field data in the i-th field do not have the character string feature, analyzing, according to the value of the field number sub-identifier contained in the identifier T_i, whether the i-th field has an adjacent relationship with other fields;
if there is no adjacent relationship, determining that the data type of the i-th field is a byte type;
the determining unit 830 determines a decoding method corresponding to each field according to the data type of each field, including: determining that the decoding method corresponding to the i-th field is decoding-free.
In a possible implementation manner, the processing unit 820 performs decoding processing on each field according to a decoding method corresponding to each field, and restores original data, including:
decoding each field according to a decoding method corresponding to each field to obtain a decoding result of each field;
and splicing the decoding results of each field to obtain the original data.
In one possible implementation, the processing unit 820 is further configured to perform the following operations:
and outputting a decoding log, wherein the decoding log comprises the encoded data stream, a decoding method corresponding to each field and a decoding processing flow of each field.
In the embodiment of the application, an encoded data stream is obtained, wherein the encoded data stream is a binary data stream obtained by encoding original data; splitting the coded data stream to obtain N fields, wherein N is a positive integer; respectively analyzing the data characteristics contained in each field of the N fields, determining the data type of each field, and determining the decoding method corresponding to each field according to the data type of each field; and then decoding each field according to the decoding method corresponding to each field to restore the original data. Therefore, the method and the device can automatically analyze and process the characteristics of the coded data stream, can determine the decoding method corresponding to each field based on the reverse analysis of the data characteristics of each field in the binary data stream, and further can restore the original data based on the decoding method of each field; through the process of the automatic reverse analysis decoding processing, manual searching is not needed, the efficiency of the decoding processing can be effectively improved, in addition, the process of the automatic reverse analysis decoding processing does not depend on protocol files, and the decoding success rate can be effectively improved.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 900 is configured to perform the steps in the method embodiments corresponding to fig. 4 to 7, and the computer device 900 includes: one or more processors 910; one or more input devices 920, one or more output devices 930, and memory 940. The processor 910, the input device 920, the output device 930, and the memory 940 are connected by a bus 950. The memory 940 is used for storing a computer program comprising program instructions, and the processor 910 is used for executing the program instructions stored in the memory 940 to perform the following operations:
acquiring an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data;
splitting the encoded data stream to obtain N fields, wherein N is a positive integer;
analyzing the data characteristics contained in each of the N fields to determine the data type of each field;
determining a decoding method corresponding to each field according to the data type of each field;
and decoding each field according to the decoding method corresponding to each field to restore the original data.
In a possible implementation, the encoded data stream includes N identifiers T_i, where i is a positive integer and i is not greater than N;
the processor 910 splitting the encoded data stream to obtain N fields includes:
reading an identifier T_i from the encoded data stream, and performing variable-length decoding on the read identifier T_i;
separating, by an identifier separation formula, the data type sub-identifier and the field number sub-identifier contained in the identifier T_i from the variable-length-decoded identifier T_i;
determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, and splitting the encoded data stream according to the matching splitting mode to obtain one or more fields;
letting i = i + 1 and iteratively performing the above steps until the N fields are obtained.
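As a non-limiting illustration, reading an identifier and separating its two sub-identifiers may be sketched in Python as follows. The sketch assumes a base-128 variable-length encoding in which the high bit of each byte marks continuation, and assumes that the identifier separation formula takes the low three bits as the data type sub-identifier and the remaining bits as the field number sub-identifier. Neither assumption is stated in this section, and the function names are hypothetical.

```python
from typing import Tuple


def read_varint(stream: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(stream):
            raise ValueError("truncated variable-length value")
        byte = stream[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte & 0x80 == 0:             # assumed: cleared high bit ends the value
            return value, pos
        shift += 7


def separate_identifier(identifier: int) -> Tuple[int, int]:
    """Assumed separation formula: return (data_type, field_number)."""
    return identifier & 0x07, identifier >> 3


identifier, next_pos = read_varint(b"\x08\x96\x01", 0)
data_type, field_number = separate_identifier(identifier)
```

Under these assumptions, the byte 0x08 yields data type 0 and field number 1, and splitting continues from `next_pos`.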
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a first unit value in a first value range, determining that the splitting mode matching the first unit value is a variable-length splitting mode;
the processor 910 splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading the next byte from the encoded data stream, starting at the position following the identifier T_i;
if the byte read does not carry an end flag bit, continuing to read the next byte in sequence until a byte carrying the end flag bit is read;
and combining all the read bytes, in reading order, into the field data of the i-th field.
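The variable-length splitting mode may be sketched as follows, purely for illustration. The sketch assumes that the end flag bit is the most significant bit of a byte and that a cleared bit marks the last byte of the field data; this section does not specify the position or polarity of the flag, and the function name is hypothetical.

```python
from typing import Tuple


def split_variable_length(stream: bytes, pos: int) -> Tuple[bytes, int]:
    """Collect bytes from pos up to and including the byte carrying the end flag.

    Returns (field_data, next_pos).
    """
    start = pos
    while True:
        if pos >= len(stream):
            raise ValueError("stream ended before an end flag bit was found")
        byte = stream[pos]
        pos += 1
        if byte & 0x80 == 0:  # assumed end flag: high bit cleared
            break
    return stream[start:pos], pos


# The identifier occupies offset 0; the field data starts at offset 1.
field_data, next_pos = split_variable_length(b"\x08\x96\x01\x12", 1)
```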
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second unit value in the first value range, determining that the splitting mode matching the second unit value is a first-length splitting mode;
the processor 910 splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading bytes of a first length from the encoded data stream, starting at the position following the identifier T_i;
and determining the read bytes of the first length as the field data of the i-th field.
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a third unit value in the first value range, determining that the splitting mode matching the third unit value is a second-length splitting mode;
the processor 910 splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading bytes of a second length from the encoded data stream, starting at the position following the identifier T_i;
and determining the read bytes of the second length as the field data of the i-th field.
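The first-length and second-length splitting modes differ only in the number of bytes read, so a single helper can illustrate both. In the sketch below the values 8 and 4 for the first and second lengths are assumptions chosen for the example (this section does not state them), and the names are hypothetical.

```python
from typing import Tuple

FIRST_LENGTH = 8   # assumed value of the "first length", in bytes
SECOND_LENGTH = 4  # assumed value of the "second length", in bytes


def split_fixed_length(stream: bytes, pos: int, length: int) -> Tuple[bytes, int]:
    """Read exactly `length` bytes starting at pos; return (field_data, next_pos)."""
    end = pos + length
    if end > len(stream):
        raise ValueError("stream is shorter than the fixed field length")
    return stream[pos:end], end


# The identifier occupies offset 0; four bytes of field data follow it.
field_data, next_pos = split_fixed_length(b"\x0d\x01\x02\x03\x04\xff", 1, SECOND_LENGTH)
```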
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the splitting mode matching the second value is a specified-length splitting mode;
the processor 910 splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
reading a specified length from the encoded data stream at the position following the identifier T_i;
sequentially reading bytes of the specified length from the encoded data stream according to the specified length;
and determining the read bytes of the specified length as the field data of the i-th field.
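The specified-length splitting mode may be illustrated as follows. The sketch assumes that the specified length is itself stored as a base-128 variable-length integer immediately after the identifier; this section says only that the length is read from that position, so the encoding of the length and the function name are assumptions.

```python
from typing import Tuple


def split_specified_length(stream: bytes, pos: int) -> Tuple[bytes, int]:
    """Read a length prefix at pos, then that many bytes; return (field_data, next_pos)."""
    length = 0
    shift = 0
    while True:  # decode the length prefix (assumed to be a base-128 varint)
        if pos >= len(stream):
            raise ValueError("truncated length prefix")
        byte = stream[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            break
        shift += 7
    end = pos + length
    if end > len(stream):
        raise ValueError("specified length exceeds the remaining stream")
    return stream[pos:end], end


# Identifier 0x12 at offset 0, length 3 at offset 1, then the bytes "abc".
field_data, next_pos = split_specified_length(b"\x12\x03abc\x08", 1)
```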
In a possible implementation, the processor 910 splitting the encoded data stream according to the matching splitting mode to obtain one or more fields further includes:
if the field data of the i-th field contains a plurality of identifiers and the values of the field number sub-identifiers contained in those identifiers have an adjacency relationship, recursively splitting the i-th field to obtain P fields, where P is a positive integer.
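The recursive-splitting check may be sketched as follows. This is an illustration under several assumptions that this section does not state: data type values 0, 1, 5 and 2 are taken to select the variable-length, 8-byte, 4-byte and specified-length modes respectively; the low three bits of an identifier are taken as the data type; and "adjacency" is interpreted as each field number being equal to, or one greater than, the previous one. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Field = Tuple[int, int, bytes]  # (field_number, data_type, field_data)


def _varint(buf: bytes, pos: int) -> Tuple[int, int]:
    value = shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
    raise ValueError("truncated variable-length value")


def try_split(payload: bytes) -> Optional[List[Field]]:
    """Try to split payload into fields; return None if it does not parse cleanly."""
    fields: List[Field] = []
    pos = 0
    try:
        while pos < len(payload):
            identifier, pos = _varint(payload, pos)
            data_type, number = identifier & 0x07, identifier >> 3
            if data_type == 0:            # variable-length mode
                start = pos
                _, pos = _varint(payload, pos)
                end = pos
            elif data_type in (1, 5):     # fixed-length modes (assumed 8 and 4 bytes)
                start, end = pos, pos + (8 if data_type == 1 else 4)
            elif data_type == 2:          # specified-length mode
                length, start = _varint(payload, pos)
                end = start + length
            else:
                return None
            if end > len(payload):
                return None
            fields.append((number, data_type, payload[start:end]))
            pos = end
    except ValueError:
        return None
    return fields


def split_recursively(payload: bytes) -> Optional[List[Field]]:
    """Return the nested fields only if several identifiers with adjacent numbers are found."""
    fields = try_split(payload)
    if not fields or len(fields) < 2:
        return None
    numbers = [number for number, _, _ in fields]
    adjacent = all(0 <= b - a <= 1 for a, b in zip(numbers, numbers[1:]))
    return fields if adjacent else None
```

A payload that fails the check is left unsplit, so that its data type can be decided by other characteristics of the field data.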
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 analyzing the data characteristics contained in each of the N fields to determine the data type of each field includes:
if the value of the data type sub-identifier contained in the identifier T_i is any unit value in a first value range, determining that the data type of the i-th field is a first encoding format type; where, if the data type of the i-th field is the first encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-value manner;
the processor 910 determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is a variable-length decoding method.
In one possible implementation, any one of the N fields is denoted as an i-th field, and the i-th field contains an identifier T_i and field data;
the processor 910 analyzing the data characteristics contained in each of the N fields to determine the data type of each field includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the data type of the i-th field is a second encoding format type; where, if the data type of the i-th field is the second encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-length-value manner, the length indicating the length of the field data in the i-th field;
analyzing the data characteristics of the field data in the i-th field;
if the data characteristics of the field data in the i-th field have character string characteristics, determining that the data type of the i-th field is a character string type;
the processor 910 determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is a character string decoding method.
In a possible implementation, analyzing the data characteristics contained in each of the N fields to determine the data type of each field further includes:
if the data characteristics of the field data in the i-th field do not have character string characteristics, analyzing, according to the value of the field number sub-identifier contained in the identifier T_i, whether the i-th field has an adjacency relationship with other fields;
if there is no adjacency relationship, determining that the data type of the i-th field is a byte type;
the processor 910 determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is no decoding.
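The type decision for a field stored in the key-length-value manner may be illustrated as follows. The test for "character string characteristics" used here (valid UTF-8 consisting of printable characters and common whitespace) is an assumption, since this section does not define those characteristics. The "nested" outcome for a field that does have an adjacency relationship is likewise an inference from the recursive-splitting implementation rather than something stated here. All names are hypothetical.

```python
def has_string_characteristics(data: bytes) -> bool:
    """Assumed heuristic: non-empty, valid UTF-8, printable or whitespace characters only."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return bool(text) and all(ch.isprintable() or ch in "\t\r\n" for ch in text)


def classify_field(data: bytes, has_adjacency: bool) -> str:
    """Return 'string', 'nested' or 'bytes' for a key-length-value field."""
    if has_string_characteristics(data):
        return "string"   # decoded with a character string decoding method
    if has_adjacency:
        return "nested"   # assumed: candidate for recursive splitting
    return "bytes"        # byte type: no decoding is applied


kind = classify_field(b"hello", has_adjacency=False)
```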
In a possible implementation, the processor 910 decoding each field according to the decoding method corresponding to each field to restore the original data includes:
decoding each field according to the decoding method corresponding to each field to obtain a decoding result of each field;
and splicing the decoding results of the fields to obtain the original data.
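Per-field decoding followed by splicing may be sketched as follows. The method names "varint", "string" and "bytes", and the representation of the restored data as an ordered list of (field number, value) pairs, are assumptions made for the illustration; this section does not prescribe the form of the spliced result.

```python
from typing import Callable, Dict, List, Tuple


def decode_varint(data: bytes) -> int:
    """Variable-length decoding: 7 payload bits per byte, least significant group first."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
    return value


# Hypothetical table mapping a decoding method name to its decoder.
DECODERS: Dict[str, Callable[[bytes], object]] = {
    "varint": decode_varint,
    "string": lambda data: data.decode("utf-8"),
    "bytes": lambda data: data,  # byte type: returned as-is, no decoding
}


def restore_original(fields: List[Tuple[int, str, bytes]]) -> List[Tuple[int, object]]:
    """Decode each (field_number, method, field_data) and splice results in field order."""
    return [(number, DECODERS[method](data)) for number, method, data in fields]


restored = restore_original([
    (1, "varint", b"\x96\x01"),
    (2, "string", b"hi"),
    (3, "bytes", b"\x00\xff"),
])
```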
In one possible implementation, the processor 910 is further configured to perform the following operations:
outputting a decoding log, wherein the decoding log includes the encoded data stream, the decoding method corresponding to each field, and the decoding processing flow of each field.
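A decoding log with the three kinds of content listed above may be assembled as in the following sketch. The line layout, the hexadecimal rendering of the stream, and the function name are assumptions for the illustration; this section does not define a log format.

```python
from typing import List, Tuple

# (field_number, decoding_method, raw_field_data, decoding_result)
Step = Tuple[int, str, bytes, object]


def build_decoding_log(stream: bytes, steps: List[Step]) -> str:
    """Render the encoded stream, then one line per field with its method and result."""
    lines = ["encoded stream: " + stream.hex()]
    for number, method, raw, result in steps:
        lines.append(
            f"field {number}: method={method} raw={raw.hex()} result={result!r}"
        )
    return "\n".join(lines)


log_text = build_decoding_log(b"\x08\x01", [(1, "varint", b"\x01", 1)])
```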
It should further be noted that an embodiment of the present application also provides a computer storage medium. The computer storage medium stores the aforementioned computer program executed by the decoding processing apparatus 800; the computer program includes program instructions, and when a processor executes the program instructions, the method in the embodiments corresponding to FIG. 4 to FIG. 7 can be performed, so details are not repeated here. The beneficial effects of the same method are likewise not described again. For technical details not disclosed in the computer storage medium embodiments of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computer device, on multiple computer devices at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across multiple sites and interconnected by the communication network may form a blockchain system.
According to an aspect of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method in the embodiments corresponding to FIG. 4 to FIG. 7; details are therefore not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and should not be taken as limiting its scope; the scope of the present application is defined by the appended claims.

Claims (15)

1. A method of decoding processing, the method comprising:
acquiring an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data;
splitting the encoded data stream to obtain N fields, wherein N is a positive integer;
analyzing the data characteristics contained in each field of the N fields respectively to determine the data type of each field;
determining a decoding method corresponding to each field according to the data type of each field;
and decoding each field according to the decoding method corresponding to each field, and restoring the original data.
2. The method of claim 1, wherein the encoded data stream includes N identifiers T_i, where i is a positive integer and i is not greater than N;
the splitting the encoded data stream to obtain N fields includes:
reading an identifier T_i from the encoded data stream, and performing variable-length decoding on the read identifier T_i;
separating, by an identifier separation formula, the data type sub-identifier and the field number sub-identifier contained in the identifier T_i from the variable-length-decoded identifier T_i;
determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i, and splitting the encoded data stream according to the matching splitting mode to obtain one or more fields;
letting i = i + 1 and iteratively performing the above steps until the N fields are obtained.
3. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a first unit value in a first value range, determining that the splitting mode matching the first unit value is a variable-length splitting mode;
the splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading the next byte from the encoded data stream, starting at the position following the identifier T_i;
if the byte read does not carry an end flag bit, continuing to read the next byte in sequence until a byte carrying the end flag bit is read;
and combining all the read bytes, in reading order, into the field data of the i-th field.
4. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second unit value in the first value range, determining that the splitting mode matching the second unit value is a first-length splitting mode;
the splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading bytes of a first length from the encoded data stream, starting at the position following the identifier T_i;
and determining the read bytes of the first length as the field data of the i-th field.
5. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a third unit value in the first value range, determining that the splitting mode matching the third unit value is a second-length splitting mode;
the splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
sequentially reading bytes of a second length from the encoded data stream, starting at the position following the identifier T_i;
and determining the read bytes of the second length as the field data of the i-th field.
6. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the determining a matching splitting mode according to the value of the data type sub-identifier contained in the identifier T_i includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the splitting mode matching the second value is a specified-length splitting mode;
the splitting the encoded data stream according to the matching splitting mode to obtain one or more fields includes:
reading a specified length from the encoded data stream at the position following the identifier T_i;
sequentially reading bytes of the specified length from the encoded data stream according to the specified length;
and determining the read bytes of the specified length as the field data of the i-th field.
7. The method of claim 6, wherein the splitting the encoded data stream according to the matching splitting mode to obtain one or more fields further includes:
if the field data of the i-th field contains a plurality of identifiers and the values of the field number sub-identifiers contained in those identifiers have an adjacency relationship, recursively splitting the i-th field to obtain P fields, wherein P is a positive integer.
8. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the analyzing the data characteristics contained in each of the N fields to determine the data type of each field includes:
if the value of the data type sub-identifier contained in the identifier T_i is any unit value in a first value range, determining that the data type of the i-th field is a first encoding format type; wherein, if the data type of the i-th field is the first encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-value manner;
the determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is a variable-length decoding method.
9. The method of claim 1 or 2, wherein any one of the N fields is denoted as an i-th field, the i-th field containing an identifier T_i and field data;
the analyzing the data characteristics contained in each of the N fields to determine the data type of each field includes:
if the value of the data type sub-identifier contained in the identifier T_i is a second value, determining that the data type of the i-th field is a second encoding format type; wherein, if the data type of the i-th field is the second encoding format type, the identifier T_i and the field data in the i-th field are stored in a key-length-value manner, the length indicating the length of the field data in the i-th field;
analyzing the data characteristics of the field data in the i-th field;
if the data characteristics of the field data in the i-th field have character string characteristics, determining that the data type of the i-th field is a character string type;
the determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is a character string decoding method.
10. The method of claim 8, wherein the analyzing the data characteristics contained in each of the N fields to determine the data type of each field further includes:
if the data characteristics of the field data in the i-th field do not have character string characteristics, analyzing, according to the value of the field number sub-identifier contained in the identifier T_i, whether the i-th field has an adjacency relationship with other fields;
if there is no adjacency relationship, determining that the data type of the i-th field is a byte type;
the determining a decoding method corresponding to each field according to the data type of each field includes: determining that the decoding method corresponding to the i-th field is no decoding.
11. The method of claim 1, wherein the decoding each field according to the decoding method corresponding to each field to restore the original data comprises:
decoding each field according to the decoding method corresponding to each field to obtain the decoding result of each field;
and splicing the decoding results of each field to obtain the original data.
12. The method of claim 1, wherein the method further comprises:
outputting a decoding log, wherein the decoding log includes the encoded data stream, the decoding method corresponding to each field, and the decoding processing flow of each field.
13. A decoding processing apparatus, the apparatus comprising:
an acquisition unit, configured to acquire an encoded data stream, wherein the encoded data stream is a binary data stream obtained by encoding original data;
a processing unit, configured to split the encoded data stream to obtain N fields, wherein N is a positive integer;
the processing unit being further configured to analyze the data characteristics contained in each of the N fields to determine the data type of each field;
a determining unit, configured to determine, according to the data type of each field, a decoding method corresponding to each field;
the processing unit being further configured to decode each field according to the decoding method corresponding to each field to restore the original data.
14. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method of any one of claims 1 to 12.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 12.
CN202110477916.6A 2021-04-29 2021-04-29 Decoding processing method, decoding processing device, computer equipment and storage medium Pending CN115276889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110477916.6A CN115276889A (en) 2021-04-29 2021-04-29 Decoding processing method, decoding processing device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110477916.6A CN115276889A (en) 2021-04-29 2021-04-29 Decoding processing method, decoding processing device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115276889A true CN115276889A (en) 2022-11-01

Family

ID=83745979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110477916.6A Pending CN115276889A (en) 2021-04-29 2021-04-29 Decoding processing method, decoding processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115276889A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116915368A (en) * 2023-09-14 2023-10-20 深圳华云信息系统科技股份有限公司 Encoding and decoding method and device for data stream conforming to futures transaction data exchange protocol

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116915368A (en) * 2023-09-14 2023-10-20 深圳华云信息系统科技股份有限公司 Encoding and decoding method and device for data stream conforming to futures transaction data exchange protocol
CN116915368B (en) * 2023-09-14 2024-03-29 深圳华云信息系统科技股份有限公司 Encoding and decoding method and device for data stream conforming to futures transaction data exchange protocol

Similar Documents

Publication Publication Date Title
CN108961052B (en) Verification method, storage method, device, equipment and medium of block chain data
JP4456554B2 (en) Data compression method and compressed data transmission method
CN110445860B (en) Message sending method, device, terminal equipment and storage medium
CN109167750B (en) Data packet transmission method and device, electronic equipment and storage medium
CN108846753B (en) Method and apparatus for processing data
CN108733317B (en) Data storage method and device
CN110597814B (en) Structured data serialization and deserialization method and device
CN111262876B (en) Data processing method, device and equipment based on block chain and storage medium
CN110851748A (en) Short link generation method, server, storage medium and computer equipment
US11070231B2 (en) Reducing storage of blockchain metadata via dictionary-style compression
CN104980489A (en) Secure collection synchronization using matched network names
US10394763B2 (en) Method and device for generating pileup file from compressed genomic data
CN113761219A (en) Knowledge graph-based retrieval method and device, electronic equipment and storage medium
CN111629063A (en) Block chain based distributed file downloading method and electronic equipment
CN104281970B (en) Message treatment method, message processing apparatus and server platform
CN115276889A (en) Decoding processing method, decoding processing device, computer equipment and storage medium
CN115409507A (en) Block processing method, block processing device, computer equipment and storage medium
CN115033549A (en) File link storage method and device based on block chain
CN114065269A (en) Method for generating and analyzing bindless heterogeneous token and storage medium
CN110096624B (en) Encoding and decoding method and device, computer equipment and storage medium
CN114092577A (en) Image data processing method, image data processing device, computer equipment and storage medium
CN113204683A (en) Information reconstruction method and device, storage medium and electronic equipment
CN112559546A (en) Database synchronization method and device, computer equipment and readable storage medium
CN112732789A (en) Searchable encryption method based on block chain and electronic equipment
CN115002100B (en) File transmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination