CN113254723A

CN113254723A - Big data storage method and device under cloud network architecture

Info

Publication number: CN113254723A
Application number: CN202110597410.9A
Authority: CN
Inventors: 齐维潇
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-08-13

Abstract

The application discloses a big data storage method under a cloud network architecture, comprising the following steps: a plurality of edge nodes collect data uploaded by a plurality of sensors and send the data to a format parser; the data is parsed to obtain a plurality of data formats; blank normalized data having a tree structure is generated; the repeated fields of the data formats are obtained, a repeated field set is established from them, and the repeated fields in the set are added in sequence to a plurality of first child nodes of the normalized data; the differentiated fields of the data formats are obtained through format parsing, a differentiated field set is established from them, the differentiated fields in the set are added in sequence to a plurality of second child nodes of the normalized data, and an initial format identifier is established for each differentiated field; and the normalized data containing the repeated fields and the differentiated fields is sent to a cloud server, which stores it.

Description

Big data storage method and device under cloud network architecture

Technical Field

The present application relates to the field of information technologies, and in particular, to a big data storage method and apparatus under a cloud network architecture.

Background

Cloud computing is another revolutionary change in information technology, following the major shift from mainframe computers to the client/server (C/S) model in the 1980s. On 9 August 2006, Google CEO Eric Schmidt first proposed the concept of cloud computing at the Search Engine Strategies conference (SES San Jose 2006). Cloud computing is the product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization and load balancing. It aims to organize and integrate shared software/hardware resources and information through network-based computing and to provide them on demand to computers and other systems.

A cloud computing network architecture can be divided into four main layers. The first is the presentation (display) layer, which in most data center cloud architectures presents the content and services users want in a user-friendly way. The second is the middleware layer, which links the layers above and below it: on top of the resources provided by the underlying infrastructure layer, it offers services such as caching and REST services, which can support the presentation layer or be called directly by users. The third is the infrastructure layer, which provides the computing, storage and other resources required by the middleware layer or by users. The fourth is the management layer, which spans the other three layers horizontally and provides management and maintenance technologies for them.

In the prior art, cloud computing processes and stores very large volumes of data, which makes storing and using multi-source heterogeneous data difficult. Different data sources use different data formats, and in real service scenarios there is no relatively uniform, normalized data format suited to the scenario. As a result, data storage methods vary widely, big data is hard to manage uniformly, storage formats are inconsistent, data extraction is inefficient, and the range of applications is limited.

Disclosure of Invention

The embodiments of the invention provide a big data storage method under a cloud network architecture, intended to solve the problems of non-uniform big data storage formats and low big data extraction efficiency that arise in the prior art from the lack of a normalized data format for a specific practical scenario.

The embodiment of the invention provides a big data storage method under a cloud network architecture, which comprises the following steps:

a plurality of edge nodes collect data uploaded by a plurality of sensors and send the data to a format parser;

the format parser parses the data to obtain a plurality of data formats;

the format parser generates blank normalized data, the normalized data having a tree structure;

the format parser obtains the repeated fields of the plurality of data formats, establishes a repeated field set from them, and adds the repeated fields in the set, in sequence, to a plurality of first child nodes of the normalized data;

the format parser obtains the differentiated fields of the plurality of data formats, establishes a differentiated field set from them, adds the differentiated fields in the set, in sequence, to a plurality of second child nodes of the normalized data, and establishes an initial format identifier for each differentiated field;

and the format parser sends the normalized data containing the repeated fields and the differentiated fields to a cloud server, so that the cloud server stores the normalized data.

Optionally, the normalized data has a first parent node, a second parent node and a third parent node;

the subordinate nodes of the first parent node are a plurality of first child nodes, which are filled in sequence with the repeated fields in the repeated field set;

the subordinate nodes of the second parent node are a plurality of second child nodes, which are filled in sequence with the differentiated fields in the differentiated field set;

the subordinate nodes of the third parent node are a plurality of third child nodes, which are reserved child nodes.

Optionally, the subordinate node of the first child node is a first command symbol node, which includes a first optional command symbol node and a first mandatory command symbol node, and the method further includes:

the format parser parses the repeated field to obtain a first optional command symbol and a first mandatory command symbol, adds the first optional command symbol to the first optional command symbol node, and adds the first mandatory command symbol to the first mandatory command symbol node;

and/or,

the subordinate node of the second child node is a second command symbol node, which includes a second optional command symbol node and a second mandatory command symbol node, and the method further includes:

the format parser parses the differentiated field to obtain a second optional command symbol and a second mandatory command symbol, adds the second optional command symbol to the second optional command symbol node, and adds the second mandatory command symbol to the second mandatory command symbol node.

Optionally, the subordinate node of the first command symbol node and/or the second command symbol node is an expansion module level node, which is used for node expansion when the node data is saturated.

Optionally, after the cloud server stores the normalized data, the method further includes:

if the cloud server sends a command to one of the edge nodes, a decoupling operation is performed on the normalized data to generate a data format consistent with that edge node's control signaling format, and a control instruction is sent to the edge node in that format.

Optionally, the performing a decoupling operation on the normalized data includes:

disassembling the repeated fields and the differentiated fields in the normalized data; based on the ID of the target edge node, obtaining the specific field corresponding to that ID from the differentiated fields; and combining the repeated fields, the specific field and the control command symbol into a control command in XML format.

Optionally, the data types include structured data, semi-structured data and unstructured data, and the collecting, by the plurality of edge nodes, of data uploaded by the plurality of sensors includes:

the edge nodes are ranked by priority according to their distribution locations and storage spaces to generate a high-priority cluster, a medium-priority cluster and a low-priority cluster; the edge nodes of the high-priority cluster collect the unstructured data, those of the medium-priority cluster collect the semi-structured data, and those of the low-priority cluster collect the structured data.

Optionally, the repeated field is a common field in the multiple data formats, and the differentiated field is a specific field in a certain data format.

Optionally, the cloud server stores the normalized data, including:

the cloud server stores the normalized data in a plurality of Docker containers.

The embodiments of the invention further provide an apparatus comprising a memory and a processor, wherein the memory stores computer-executable instructions and the processor implements the above method when executing the computer-executable instructions stored in the memory.

With the method and apparatus provided by the embodiments of the invention, a tree-structured normalized data format is defined: a specific set of data formats is parsed, their repeated and differentiated fields are extracted, and these fields are added to leaf nodes to form an extensible normalized data format that is convenient for unified storage. When data needs to be extracted, the normalized data is decoupled to regenerate a specific data format. This improves applicability to specific data scenarios, meets the requirement for standardized management of data storage, and improves the efficiency of data storage and extraction.

Drawings

In order to more clearly illustrate the technical solution of the embodiment of the present invention, the drawings used in the description of the embodiment will be briefly introduced below.

FIG. 1 is a flow diagram of a big data storage method under a cloud network architecture in one embodiment;

FIG. 2 is a schematic diagram illustrating normalized storage of data under a cloud network architecture according to an embodiment;

FIG. 3 is a diagram illustrating a normalized data format in one embodiment;

FIG. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

Fig. 1 is a flowchart of a big data storage method under a cloud network architecture according to an embodiment of the present invention, and as shown in fig. 1, the method provided in the embodiment of the present invention specifically includes:

S101, a plurality of edge nodes collect data uploaded by a plurality of sensors and send the data to a format parser;

An edge node sinks storage and computing functions down toward the data sources so that they can meet the requirements of concurrent big data acquisition and storage. The edge node acts as a data relay station: large amounts of raw data must be denoised, archived and aggregated there, so that the raw data is consolidated into valid data that the cloud server can use and process at large commercial scale.

In the embodiments of the invention, the sensors may be any sensors with different data formats; typical examples include cameras, temperature and humidity sensors, GPS receivers and various IoT sensors. By format, data can be divided into structured, semi-structured and unstructured data. Structured data is the most common: it is data that has a schema, the structure being the schema, and most applications are built on it. Unstructured data generally refers to data that cannot be structured, such as pictures, files and videos. Semi-structured data has structure, but is hard to describe with a fixed schema because its description is non-standard or flexible; data represented in XML or JSON is typically semi-structured.

The edge nodes can also be ranked by priority according to their distribution locations and storage spaces to generate a high-priority cluster, a medium-priority cluster and a low-priority cluster, where low, medium and high can be defined by set thresholds: the edge nodes of the high-priority cluster are closer to their sensors, have more storage space and have lower resource occupancy; the medium-priority cluster comes next; and the low-priority cluster has the highest resource occupancy. Unstructured data is generally large in volume and requires low latency and stability, so it is collected by the edge nodes of the high-priority cluster; semi-structured data is preferentially collected by the edge nodes of the medium-priority cluster; and structured data is preferentially collected by the edge nodes of the low-priority cluster, whose resources are highly saturated or whose storage space is limited.
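The cluster assignment described above can be sketched as follows. This is a minimal illustration, not the patent's procedure: the text only says the levels are defined by set thresholds, so the `EdgeNode` fields and the threshold values (1 km, 500 GB, 50 % and 80 % occupancy) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeNode:
    node_id: str
    distance_km: float      # distance to the sensors it serves (hypothetical unit)
    free_storage_gb: float  # available storage space
    occupancy: float        # resource occupancy rate, 0.0 to 1.0


def cluster_of(node: EdgeNode) -> str:
    """Assign an edge node to the high, medium or low priority cluster.

    All thresholds below are illustrative assumptions.
    """
    if node.distance_km <= 1.0 and node.free_storage_gb >= 500 and node.occupancy < 0.5:
        return "high"    # close, roomy, lightly loaded
    if node.occupancy < 0.8:
        return "medium"
    return "low"         # highest resource occupancy


# Data type each cluster preferentially collects, as described in the text.
DATA_TYPE_FOR_CLUSTER = {
    "high": "unstructured",
    "medium": "semi-structured",
    "low": "structured",
}


def assign_collection(nodes):
    """Map each edge node ID to the data type it should preferentially collect."""
    return {n.node_id: DATA_TYPE_FOR_CLUSTER[cluster_of(n)] for n in nodes}
```

For example, a nearby, lightly loaded node `EdgeNode("A", 0.5, 800, 0.2)` lands in the high-priority cluster and is assigned unstructured data such as video.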

Because of the diversity of data formats, it is not easy to establish a relatively uniform and clear data format standard for a specific scenario (one composed of specific kinds of sensors and a specific number of edge computing nodes). Therefore, in the embodiments of the invention, the different data formats are analyzed in advance and a generalized data format is provided to support data storage and services for that specific scenario.

In the embodiments of the invention, a format parser is provided. It may perform format parsing as part of the cloud server, or it may be an independent hardware carrier dedicated to converting data of arbitrary formats into normalized data.

Fig. 2 is a schematic diagram of normalized storage of data under a cloud network architecture provided by the embodiment of the present invention. As shown in Fig. 2, a plurality of edge nodes A-E collect data from a plurality of sensors and send it to a format parser; after parsing and processing the data, the format parser generates normalized data and sends it to the cloud server, which stores it.

S102, the format parser parses the data to obtain a plurality of data formats;

After obtaining different types of data, the format parser parses them and obtains a plurality of different data formats, for example MP3, MP4, RMVB, XML, JSON and TXT. Different data formats generally have their own data messages. A typical data message consists of a header, a payload and a trailer: the header is a set of an ID and various command symbols, the payload is the valid data, and the trailer may contain various command symbols (for example, an end symbol) and the message ID. Therefore, a data format of a given type can be generated by specifying the header and trailer of that type of data.

S103, the format parser generates blank normalized data, the normalized data having a tree structure;

After parsing the data formats, the format parser first creates blank, tree-structured normalized data to serve as the normalized data format; the various command symbols are then filled into it to produce the actual standard normalized data.

S104, the format parser obtains the repeated fields of the plurality of data formats, establishes a repeated field set from them, and adds the repeated fields in the set, in sequence, to a plurality of first child nodes of the normalized data;

The format parser collects the fields that repeat across the data formats into a repeated field set. For example, for data formats A, B and C, the field "01C" may appear as the 1st command symbol of format A, the 3rd command symbol of format B and the 7th command symbol of format C, with a different meaning in each format. After the common repeated field "01C" is extracted, it therefore needs an explanation: each repeated field in the set is given an annotation that states which command symbol of which data format it appears in and what it means there.

After the annotations are added, the repeated fields are added in sequence to a plurality of first child nodes. The first child nodes are located one or more levels below the root node and can be expanded repeatedly to accommodate the repeated field set, as shown in Table 1 below.

TABLE 1

First child node A	First child node B	First child node C	First child node D	...
Repeated field 1	Repeated field 2	Repeated field 3	Repeated field 4	...

The number of first child nodes can match the number of repeated fields and is easy to expand.

Note that a repeated field may be common to all of the data formats or to only some of them (any two or more).
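Building the annotated repeated field set can be sketched as below. This assumes, hypothetically, that each data format has already been reduced to an ordered list of field strings; the annotation is kept as (format name, 1-based command symbol position), and the human-readable meaning would be attached separately. The function name and example values are illustrative only, and the positions differ from the 1st/3rd/7th example in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def repeated_field_set(formats: Dict[str, List[str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Collect fields shared by at least two formats, annotated with their origin.

    Each annotation records (format name, 1-based command symbol position).
    """
    occurrences: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for fmt_name, fields in formats.items():
        for pos, fld in enumerate(fields, start=1):
            occurrences[fld].append((fmt_name, pos))
    # A field is "repeated" if it occurs in two or more distinct formats.
    return {
        fld: where
        for fld, where in occurrences.items()
        if len({name for name, _ in where}) >= 2
    }
```

For example, with formats A = ["01C", "114A"], B = ["12B", "01C", "2E"] and C = ["45D", "2E", "01C"], the set contains "01C" annotated as [("A", 1), ("B", 2), ("C", 3)] and "2E" annotated as [("B", 3), ("C", 2)]; the order of the resulting dictionary gives the order in which first child nodes are filled.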

S105, the format parser obtains the differentiated fields of the plurality of data formats, establishes a differentiated field set from them, adds the differentiated fields in the set, in sequence, to a plurality of second child nodes of the normalized data, and establishes an initial format identifier for each differentiated field;

Unlike S104, this step collects the differentiated fields into a differentiated field set. For example, for data formats A, B and C, "114A" is a field specific to format A (formats B and C do not have it), "12B" is specific to format B, and "45D" is specific to format C. To facilitate later format description and use, each differentiated field also needs an annotation or initial identifier (that is, which command symbol of which data format it originally occupied). After the annotation is added, the fields are added in sequence to a plurality of second child nodes.

Thus, the repeated fields correspond to the intersection of the formats' field sets, and the differentiated fields to the remaining, non-intersecting fields. Note that in a specific scenario, the more data formats there are, the fewer specific fields there are, i.e. the fewer fields are completely different from those of every other format. Any data format is then equal to a selection of repeated fields plus a selection of non-repeated (specific) fields.
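The split into repeated and differentiated fields, and the claim that every field of a format falls into exactly one of the two, can be checked with a short sketch. As before, the assumptions are that formats are given as ordered field lists and that the initial format identifier is (format name, 1-based position of first occurrence); the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_fields(formats: Dict[str, List[str]]) -> Tuple[Set[str], Dict[str, Tuple[str, int]]]:
    """Partition all fields into repeated and differentiated fields.

    Repeated: occurs in two or more formats (the intersection side).
    Differentiated: occurs in exactly one format; mapped to its initial
    format identifier (format name, 1-based command symbol position).
    """
    owners: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for fmt_name, fields in formats.items():
        for pos, fld in enumerate(fields, start=1):
            owners[fld].append((fmt_name, pos))
    repeated = {f for f, where in owners.items() if len({n for n, _ in where}) >= 2}
    differentiated = {f: where[0] for f, where in owners.items() if f not in repeated}
    return repeated, differentiated
```

Using the example fields from the text with made-up positions, A = ["01C", "114A"], B = ["12B", "01C", "2E"] and C = ["45D", "2E", "01C"], the repeated set is {"01C", "2E"} and the differentiated fields are "114A" from ("A", 2), "12B" from ("B", 1) and "45D" from ("C", 1). Each format is then recovered as its repeated fields plus its own specific fields.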

In the embodiments of the invention, the normalized data format is as shown in Fig. 3: the normalized data has a first parent node leaf1, a second parent node leaf2 and a third parent node leaf3;

the subordinate nodes of the first parent node are a plurality of first child nodes $\mathrm{child1} = \{\mathrm{child1}_1, \mathrm{child1}_2, \ldots, \mathrm{child1}_n\}$, which are filled in sequence with the repeated fields in the repeated field set;

the subordinate nodes of the second parent node are a plurality of second child nodes $\mathrm{child2} = \{\mathrm{child2}_1, \mathrm{child2}_2, \ldots, \mathrm{child2}_n\}$, which are filled in sequence with the differentiated fields in the differentiated field set;

the subordinate nodes of the third parent node are a plurality of third child nodes $\mathrm{child3} = \{\mathrm{child3}_1, \mathrm{child3}_2, \ldots, \mathrm{child3}_n\}$. The third child nodes are reserved child nodes, filled with an ANYXML field by default, so that they can serve as a fallback for the first or second child nodes when those are close to saturation.
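Filling the three parent nodes can be sketched with plain dictionaries. This is an illustration under assumptions: the child naming scheme `child1_i` mirrors the subscripts in the text, while the dictionary layout and the default of two reserved children are hypothetical.

```python
from typing import Dict, List


def build_normalized_data(repeated: List[str], differentiated: List[str],
                          reserved: int = 2) -> Dict[str, List[Dict[str, str]]]:
    """Fill the three parent nodes of the normalized data.

    leaf1 -> child1_1..child1_n: repeated fields, in order
    leaf2 -> child2_1..child2_n: differentiated fields, in order
    leaf3 -> child3_1..child3_n: reserved children holding "ANYXML" by default
    The number of reserved children is a hypothetical parameter.
    """
    return {
        "leaf1": [{"name": f"child1_{i}", "field": f} for i, f in enumerate(repeated, start=1)],
        "leaf2": [{"name": f"child2_{i}", "field": f} for i, f in enumerate(differentiated, start=1)],
        "leaf3": [{"name": f"child3_{i}", "field": "ANYXML"} for i in range(1, reserved + 1)],
    }
```

Calling `build_normalized_data(["01C", "2E"], ["114A", "12B", "45D"])` yields two first child nodes, three second child nodes and two reserved ANYXML placeholders under `leaf3`.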

In the embodiments of the invention, the subordinate node of the first child node is a first command symbol node ctrl1, which comprises a first optional command symbol node opt1 and a first mandatory command symbol node mand1, i.e. $\mathrm{ctrl1} = \{\mathrm{opt1}, \mathrm{mand1}\}$. An optional command symbol is one that can be deleted from the data format; it is typically a reserved or padding symbol with no actual meaning and may be kept or deleted. A mandatory command symbol must be kept: without it, the instruction is incomplete or has no actual physical meaning.

Therefore, the format parser parses (disassembles) the repeated field to obtain its first optional command symbol and first mandatory command symbol, adds the first optional command symbol to the first optional command symbol node, and adds the first mandatory command symbol to the first mandatory command symbol node. For example, if "01C" is a repeated field in which "01" is a mandatory command symbol and "C" is an optional command symbol, "01C" is disassembled into the mandatory command symbol "01", which is put into the first mandatory command symbol node, and the optional command symbol "C", which is put into the first optional command symbol node.
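The disassembly of a field into optional and mandatory command symbols can be sketched as follows. The text does not say how the parser recognizes optional symbols, so this sketch assumes, hypothetically, that the field is already tokenized and that the set of optional (reserved or padding) symbols is known in advance and passed in.

```python
from typing import Dict, Iterable, List, Set


def split_command_symbols(tokens: Iterable[str], optional_symbols: Set[str]) -> Dict[str, List[str]]:
    """Split a tokenized field into a command symbol node {"opt": ..., "mand": ...}.

    optional_symbols is a hypothetical lookup of deletable symbols;
    every other token is treated as mandatory. Token order is preserved.
    """
    ctrl: Dict[str, List[str]] = {"opt": [], "mand": []}
    for tok in tokens:
        ctrl["opt" if tok in optional_symbols else "mand"].append(tok)
    return ctrl
```

For the "01C" example, `split_command_symbols(["01", "C"], {"C"})` returns `{"opt": ["C"], "mand": ["01"]}`. The same function applies unchanged to differentiated fields and the second command symbol node.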

And/or,

the subordinate node of the second child node is a second command symbol node ctrl2, which includes a second optional command symbol node opt2 and a second mandatory command symbol node mand2, i.e. $\mathrm{ctrl2} = \{\mathrm{opt2}, \mathrm{mand2}\}$. The format parser parses the differentiated field to obtain its second optional command symbol and second mandatory command symbol, adds the second optional command symbol to the second optional command symbol node, and adds the second mandatory command symbol to the second mandatory command symbol node.

The lower nodes of the first command symbol node and/or the second command symbol node are expansion module level nodes, and the expansion module level nodes are used for node expansion when node data is saturated.
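One way to picture an expansion module level node is a fixed-capacity node that, once saturated, opens a child expansion node and continues filling there. This is only a sketch of that idea: the capacity value, the class name `CapacityNode` and the ".ext" naming are assumptions, since the text does not define when a node counts as saturated.

```python
from typing import List, Optional


class CapacityNode:
    """A command symbol node with a fixed (hypothetical) capacity.

    When the node is saturated, further symbols go into an expansion
    node attached beneath it, which can itself expand in turn.
    """

    def __init__(self, name: str, capacity: int = 4) -> None:
        self.name = name
        self.capacity = capacity
        self.symbols: List[str] = []
        self.expansion: Optional["CapacityNode"] = None

    def add(self, symbol: str) -> None:
        if len(self.symbols) < self.capacity:
            self.symbols.append(symbol)
            return
        if self.expansion is None:  # saturated: open an expansion node
            self.expansion = CapacityNode(self.name + ".ext", self.capacity)
        self.expansion.add(symbol)

    def all_symbols(self) -> List[str]:
        """Symbols of this node followed by those of its expansion chain."""
        rest = self.expansion.all_symbols() if self.expansion else []
        return self.symbols + rest
```

With a capacity of 2, adding five symbols leaves two in the original node, two in its first expansion node and one in a second expansion node, while `all_symbols()` still returns them in insertion order.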

S106, the format parser sends the normalized data containing the repeated fields and the differentiated fields to the cloud server, so that the cloud server stores the normalized data.

After the cloud server receives the normalized data, because all normalized data share a consistent format, the cloud server does not need different databases for different data formats; it can store the data with the most direct and efficient method, a queue-based store, which improves storage efficiency and the utilization of storage resources.
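The point that a uniform format allows a single queue instead of per-format databases can be sketched as follows. This is a toy under stated assumptions: an in-memory `collections.deque` stands in for the cloud server's storage, records are assumed to be dictionaries with exactly the keys `leaf1`, `leaf2` and `leaf3`, and the class name is hypothetical. Persistence, replication and sharding are out of scope.

```python
import json
from collections import deque
from typing import Deque


class NormalizedStore:
    """A single FIFO queue holding serialized normalized records.

    Because every record has the same leaf1/leaf2/leaf3 shape, one
    queue suffices; the deque is an in-memory stand-in for real storage.
    """

    REQUIRED_KEYS = {"leaf1", "leaf2", "leaf3"}

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()

    def put(self, normalized: dict) -> None:
        """Validate the shape, serialize to JSON and append to the queue."""
        if set(normalized) != self.REQUIRED_KEYS:
            raise ValueError("not normalized data: expected leaf1/leaf2/leaf3")
        self._queue.append(json.dumps(normalized, sort_keys=True))

    def get(self) -> dict:
        """Pop and deserialize the oldest record."""
        return json.loads(self._queue.popleft())

    def __len__(self) -> int:
        return len(self._queue)
```

Records from any source format go through the same `put` call; anything that is not in the normalized shape is rejected rather than routed to a format-specific store.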

In addition, the cloud server may store the normalized data in a plurality of Docker containers.

After the cloud server stores the normalized data, if the cloud server sends a command to one of the edge nodes, it performs a decoupling operation on the normalized data to generate a data format consistent with that edge node's control signaling format, and sends a control instruction to the edge node in that format.

Specifically, the decoupling operation may be: disassembling the repeated fields and the differentiated fields in the normalized data; based on the ID of the target edge node, obtaining the specific field corresponding to that ID from the differentiated fields; and combining the repeated fields, the specific field and the control command symbol into a control command in XML format.
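The decoupling step can be sketched with Python's standard `xml.etree.ElementTree`. No XML schema is given in the text, so the element names (`control`, `repeated`, `specific`, `command`), the `node` attribute, the keying of differentiated fields by edge node ID and the command value "START" are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def decouple_to_xml(repeated: List[str], differentiated: Dict[str, List[str]],
                    node_id: str, command: str) -> str:
    """Build an XML control command for one edge node.

    repeated: fields shared by all targets.
    differentiated: hypothetical mapping from edge node ID to its specific fields.
    command: the control command symbol to send.
    """
    root = ET.Element("control", attrib={"node": node_id})
    for fld in repeated:
        ET.SubElement(root, "repeated").text = fld
    # Only the specific fields belonging to the target node are included.
    for fld in differentiated.get(node_id, []):
        ET.SubElement(root, "specific").text = fld
    ET.SubElement(root, "command").text = command
    return ET.tostring(root, encoding="unicode")
```

For example, `decouple_to_xml(["01C"], {"A": ["114A"], "B": ["12B"]}, "A", "START")` produces `<control node="A"><repeated>01C</repeated><specific>114A</specific><command>START</command></control>`; the specific field "12B" of node B is left out.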

With the method provided by the embodiments of the invention, a tree-structured normalized data format is defined: a specific set of data formats is parsed, their repeated and differentiated fields are extracted, and these fields are added to leaf nodes to form an extensible normalized data format that is convenient for unified storage. When data needs to be extracted, the normalized data is decoupled to regenerate a specific data format. This improves applicability to specific data scenarios, meets the requirements for standardized management of data storage, and improves the efficiency of data storage and extraction.

Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-executable instructions for performing the method in the foregoing embodiments.

The embodiment of the invention also provides a device which comprises a memory and a processor, wherein the memory is stored with computer executable instructions, and the processor realizes the method when running the computer executable instructions on the memory.

FIG. 4 is a diagram illustrating the hardware components of the apparatus according to one embodiment. It will be appreciated that Fig. 4 only shows a simplified design of the device. In practical applications, the apparatus may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers and memories, and all apparatuses that can implement the big data storage method of the embodiments of the present application fall within the protection scope of the present application.

The memory includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or a compact disc read-only memory (CD-ROM), and is used for storing instructions and data.

The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.

The processor may include one or more processors, for example, one or more Central Processing Units (CPUs), and in the case of one CPU, the CPU may be a single-core CPU or a multi-core CPU. The processor may also include one or more special purpose processors, which may include GPUs, FPGAs, etc., for accelerated processing.

The memory is used to store program codes and data of the network device.

The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.

In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).

The above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A big data storage method under a cloud network architecture is characterized by comprising the following steps:

2. The method of claim 1, wherein the normalized data has a first parent node, a second parent node, and a third parent node,

3. The method of claim 2, wherein the subordinate node of the first child node is a first command symbol node including a first optional command symbol node and a first mandatory command symbol node, the method further comprising:

and/or,

4. The method according to claim 3, wherein the subordinate nodes of the first and/or second command symbol nodes are expansion module level nodes used for node expansion when node data is saturated.

5. The method of any of claims 1-4, wherein after the cloud server stores the normalized data, the method further comprises:

6. The method of claim 5, wherein said performing a decoupling operation on said normalized data comprises:

7. The method of claim 1, wherein the data types include structured data, semi-structured data, and unstructured data, and the plurality of edge nodes collect data uploaded by a plurality of sensors, including:

8. The method of claim 1, wherein the repeated field is a common field in the plurality of data formats, and wherein the differentiated field is a specific field in a certain data format.

9. The method of claim 1, wherein the cloud server stores the normalized data, comprising:

10. An apparatus comprising a memory having computer-executable instructions stored thereon and a processor that, when executing the computer-executable instructions on the memory, implements the method of any of claims 1 to 9.