CN114943287A

CN114943287A - Computer big data acquisition and processing system, method, equipment and medium

Info

Publication number: CN114943287A
Application number: CN202210550505.XA
Authority: CN
Inventors: Liu Manjun (刘满君)
Original assignee: Eastern Liaoning University
Current assignee: Eastern Liaoning University
Priority date: 2022-05-20
Filing date: 2022-05-20
Publication date: 2022-08-26

Abstract

The invention relates to a computer big data acquisition and processing system, method, equipment and medium. The system comprises a data acquisition device and a processing device. The data acquisition device is connected to a plurality of computers, acquires the big data generated by those computers, and transmits it to the processing device. The processing device has a classification module that classifies the big data according to its format attribute to form classified data. During acquisition and processing, the big data generated by different computers is first classified in the processing device according to its format attribute. The classified data is then divided into a plurality of data blocks by the segmentation logic, for example into blocks that each occupy the same amount of memory, taken in the order in which the data was received. The different data blocks are then randomly distributed to heterogeneous storage servers.

Description

Computer big data acquisition and processing system, method, equipment and medium

Technical Field

The invention relates to the technical field of big data acquisition and processing, and in particular to a computer big data acquisition and processing system, method, equipment and medium.

Background

In the field of electronic commerce, big data often has to be acquired from a plurality of computers for back-end analysis. With existing approaches, once the data has been collected, processed and stored, it is vulnerable to attack and leakage whenever it is accessed.

Disclosure of Invention

In view of the above, the present invention provides a computer big data acquisition and processing system, method, device and medium to solve the problems described in the Background.

In order to achieve the purpose, the invention provides the following technical scheme:

A computer big data acquisition and processing system, comprising:

a data acquisition device, connected to a plurality of computers and configured to acquire the big data generated by the plurality of computers and transmit it to the processing device;

a processing device, comprising:

a classification module, configured to classify the big data based on its format attribute to form classified data;

a segmentation module, configured to segment the classified data into a plurality of data blocks;

a tag module, configured to write association codes into the attribute values of the data blocks in sequence, following the segmentation logic of the segmentation module, to form associated data blocks, so that the associated data blocks can be reversely fused back into the classified data according to the association codes;

a random configuration module, configured to randomly assign a first storage path to each associated data block, store the associated data block in the storage node corresponding to that first storage path, record the first storage path of each associated data block, write the association code of the associated data block corresponding to each first storage path into a mapping table, and store the mapping table in a first configuration unit;

a storage module, provided with a plurality of different storage servers, each of which corresponds to a storage node; the second storage path of each storage node is recorded and stored in a second configuration unit.
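As an illustration only, the segmentation module and tag module listed above can be sketched in Python. The sketch assumes that the classified data of one category is a single byte string, that the segmentation logic is "fixed-size blocks in receipt order", and that the association code is a zero-based sequence number. The function name `segment_and_tag` is hypothetical, and the last block may be shorter than `block_size`, since the description does not say how a remainder is handled.

```python
def segment_and_tag(classified_data: bytes, block_size: int) -> list[tuple[int, bytes]]:
    """Hypothetical segmentation + tag step.

    Splits one category's classified data into blocks of `block_size` bytes in
    receipt order and pairs each block with a sequential association code.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    associated_blocks = []
    for code, start in enumerate(range(0, len(classified_data), block_size)):
        # The association code records the block's position in the segmentation order.
        associated_blocks.append((code, classified_data[start:start + block_size]))
    return associated_blocks


blocks = segment_and_tag(b"abcdefghij", block_size=4)
print(blocks)  # [(0, b'abcd'), (1, b'efgh'), (2, b'ij')]
```

Because each code is simply the block's index, the original order can later be restored from the codes alone, without any link stored between neighboring blocks.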

Furthermore, each storage node is provided with a detection unit configured to obtain the storage state of the storage node. When the capacity occupied by the associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
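A minimal sketch of the detection unit's check follows. It assumes that capacity is measured as the total payload size in bytes, that the transfer command moves every block held by the node (the text does not say whether only the excess is moved), and that the heterogeneous storage servers can be modeled as a dictionary keyed by second storage path. `StorageNode` and `detect_and_transfer` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node holding (association_code, payload) blocks."""
    capacity_limit: int          # the "set upper limit", assumed to be in bytes
    second_storage_path: str     # address of the node's heterogeneous storage server
    blocks: list = field(default_factory=list)

    def occupied(self) -> int:
        # Storage state reported by the detection unit: bytes currently held.
        return sum(len(payload) for _, payload in self.blocks)


def detect_and_transfer(node: StorageNode, servers: dict) -> bool:
    """Trigger the transfer command if the node exceeds its upper limit.

    Returns True when a transfer happened, False otherwise.
    """
    if node.occupied() <= node.capacity_limit:
        return False
    # Move every block to the server addressed by the node's second storage path.
    servers.setdefault(node.second_storage_path, []).extend(node.blocks)
    node.blocks.clear()
    return True
```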

Further, the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit. When the detection unit detects that the occupied capacity of the first node unit exceeds the set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit. When the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails. The contents of the first and second node units are then merged by a multi-way merge sort, after which the transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
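The push-down and merge behavior can be sketched as follows. The sketch assumes that each node unit is a list of `(association_code, payload)` tuples kept sorted by association code, and reads "multi-path merging and sorting" as a k-way merge, done here with Python's `heapq.merge` (with two node units it reduces to a two-way merge). `handle_overflow` is a hypothetical name, and the server is modeled as a plain list.

```python
import heapq


def _occupied(unit):
    # Capacity used by a node unit: total payload bytes (assumed unit of measure).
    return sum(len(payload) for _, payload in unit)


def _by_code(block):
    return block[0]


def handle_overflow(first, second, limit, server):
    """Apply the push-down / merge-and-transfer rule to one storage node.

    first, second: node units as lists of (association_code, payload) tuples,
    each sorted by association code; second is the sub-library of first.
    server: list standing in for the heterogeneous storage server.
    """
    if _occupied(first) <= limit:
        return "none"
    if _occupied(second) <= limit:
        # Push-down: move all blocks of the first unit into its sub-library,
        # merging so that the second unit stays sorted.
        second[:] = list(heapq.merge(second, first, key=_by_code))
        first.clear()
        return "push-down"
    # Both units are over the limit, so push-down fails: merge the sorted
    # units (a k-way merge with k = 2 here) and transfer the result.
    server.extend(heapq.merge(first, second, key=_by_code))
    first.clear()
    second.clear()
    return "merged-and-transferred"
```

Keeping each node unit sorted is what allows a streaming merge instead of a full re-sort: `heapq.merge` only compares the current heads of its input runs, and the same call accepts any number of runs if a node has more than two units.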

The invention also provides a computer big data acquisition and processing method, which comprises the following steps:

When the computers generate big data, the data acquisition device acquires the big data generated by the different computers and transmits it to the processing device. The processing device classifies the big data with the classification module based on its format attribute to form classified data, divides the classified data into a plurality of data blocks according to the segmentation logic, and writes association codes into the attribute values of the data blocks in sequence, following the same segmentation logic, to form associated data blocks. Once the associated data blocks are formed, the random configuration module randomly assigns a first storage path to each associated data block, and the associated data blocks are stored in the storage nodes corresponding to their first storage paths. The detection unit provided in each storage node obtains the storage state of that node, and when the capacity occupied by the associated data blocks in the storage node exceeds the set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.

Further, when the associated data blocks are stored in the storage nodes corresponding to their first storage paths, the first storage path of each associated data block is recorded, and the association code of the associated data block corresponding to each first storage path is written into the mapping table, which is stored in the first configuration unit.
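A minimal sketch of the random configuration module and the mapping table follows. It assumes that a first storage path is simply the name of a storage node, that the mapping table held by the first configuration unit is a dictionary from association code to first storage path, and that association codes are unique within the data being placed. `configure_paths` is a hypothetical name; a seeded `random.Random` is passed in only to make the example reproducible.

```python
import random


def configure_paths(blocks, storage_nodes, rng=None):
    """Randomly place (association_code, payload) blocks on storage nodes.

    Returns (placement, mapping_table): placement maps each storage node to
    the blocks stored there; mapping_table maps association code -> first
    storage path and stands in for the first configuration unit.
    """
    if rng is None:
        rng = random.Random()
    placement = {node: [] for node in storage_nodes}
    mapping_table = {}
    for code, payload in blocks:
        first_storage_path = rng.choice(storage_nodes)  # random configuration
        placement[first_storage_path].append((code, payload))
        mapping_table[code] = first_storage_path        # record the path
    return placement, mapping_table


placement, mapping_table = configure_paths(
    [(0, b"abcd"), (1, b"efgh"), (2, b"ij")],
    ["node-1", "node-2", "node-3"],
    rng=random.Random(7),
)
```

Consecutive blocks usually land on different nodes, and only the mapping table records where each one went.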

Further, the associated data blocks are reversely fused (reassembled) according to their association codes to restore the corresponding classified data.
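Reverse fusion can be sketched as sorting the associated data blocks by association code and concatenating their payloads, assuming the sequential, zero-based codes used at segmentation time. The check for missing or duplicate codes is an added assumption about how an incomplete set of blocks should be handled; `reverse_fuse` is a hypothetical name.

```python
def reverse_fuse(associated_blocks):
    """Rebuild classified data from (association_code, payload) blocks."""
    ordered = sorted(associated_blocks, key=lambda block: block[0])
    codes = [code for code, _ in ordered]
    # Refuse to reassemble if any block is missing or duplicated.
    if codes != list(range(len(codes))):
        raise ValueError("association codes are missing or duplicated")
    return b"".join(payload for _, payload in ordered)


print(reverse_fuse([(2, b"ij"), (0, b"abcd"), (1, b"efgh")]))  # b'abcdefghij'
```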

The invention also provides a device applied to the computer big data acquisition and processing system. The device has a retrieval program that, through a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index. The classified data is formed by reversely fusing the associated data blocks according to their association codes, based on the configuration files.

Further, the configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.
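One way to picture the retrieval program is sketched below. It assumes that the two configuration files are JSON objects (first configuration unit: association code to first storage path; second configuration unit: storage node to second storage path), that the blocks have already been transferred to the heterogeneous servers, and that the retrieval index simply yields the association codes to fetch. `load_configuration`, `retrieve` and the dictionary model of the servers are all hypothetical.

```python
import json


def load_configuration(first_unit_json, second_unit_json):
    """Loading program: parse both configuration files (assumed to be JSON)."""
    # JSON object keys are strings, so association codes are converted back to int.
    first_paths = {int(code): path for code, path in json.loads(first_unit_json).items()}
    second_paths = json.loads(second_unit_json)
    return first_paths, second_paths


def retrieve(codes, first_paths, second_paths, servers):
    """Retrieval program: fetch the blocks for `codes` and fuse them in code order."""
    parts = []
    for code in sorted(codes):
        node = first_paths[code]               # first storage path: node that held the block
        server = servers[second_paths[node]]   # second storage path: that node's server
        parts.append(server[code])
    return b"".join(parts)


first_paths, second_paths = load_configuration(
    '{"0": "node-1", "1": "node-2"}',
    '{"node-1": "server-A", "node-2": "server-B"}',
)
servers = {"server-A": {0: b"abc"}, "server-B": {1: b"def"}}
print(retrieve([1, 0], first_paths, second_paths, servers))  # b'abcdef'
```

Without both files, `retrieve` has no way to resolve a code to a server, which corresponds to the statement in the description that the associated blocks cannot be obtained unless the configuration files are available.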

The invention also provides a medium comprising the above device, on which the retrieval program and the loading program are recorded.

During data acquisition and processing, the big data generated by different computers is first classified in the processing device according to its format attribute to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic, for example into blocks that each occupy the same amount of memory, taken in the order in which the data was received. The different data blocks are then randomly distributed to the heterogeneous (that is, different) storage servers. Because the data is stored separately, there is no direct link between different data blocks: even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the partial data is unusable.
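Classification by format attribute, using the text/image/video example above, might look like the following sketch. The extension table is a hypothetical stand-in for the format attribute (a real system might inspect MIME types or file headers instead), unknown formats fall into an assumed `"other"` category, and each category's payloads are concatenated in receipt order so that the result can be passed to a fixed-size segmentation step.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical table mapping file extensions to format attributes.
FORMAT_BY_EXTENSION = {
    ".txt": "text", ".csv": "text", ".json": "text",
    ".jpg": "image", ".png": "image",
    ".mp4": "video", ".avi": "video",
}


def classify(records):
    """Group (filename, payload) records, given in receipt order, by format.

    Returns {format attribute: payloads concatenated in receipt order}.
    """
    classified = defaultdict(bytearray)
    for filename, payload in records:
        suffix = PurePath(filename).suffix.lower()
        category = FORMAT_BY_EXTENSION.get(suffix, "other")
        classified[category] += payload
    return {category: bytes(data) for category, data in classified.items()}


classified = classify([
    ("a.txt", b"hello "), ("b.JPG", b"\xff\xd8"), ("c.csv", b"world"), ("d.mp4", b"\x00\x01"),
])
print(classified["text"])  # b'hello world'
```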

Meanwhile, the invention embeds a retrieval tool or retrieval program in the device. Through the loading program, the retrieval program calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on the established retrieval index; the classified data is formed by reversely fusing the associated data blocks according to their association codes, based on the configuration files. Consequently, without the configuration files, the consecutive, associated data blocks cannot be obtained.

Drawings

FIG. 1 is a schematic diagram of the framework of the present invention;

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings, FIG. 1 and FIG. 2.

Embodiment 1. The present invention provides a computer big data acquisition and processing system, comprising:

a processing device, comprising:

a segmentation module to segment the classified data into a plurality of data blocks;

The specific working principle is as follows. When the computers generate big data, the data acquisition device acquires the big data generated by the different computers and transmits it to the processing device. The processing device classifies the big data with the classification module based on its format attribute to form classified data, divides the classified data into a plurality of data blocks according to the segmentation logic, and writes association codes into the attribute values of the data blocks in sequence, following the same segmentation logic, to form associated data blocks. Once the associated data blocks are formed, the random configuration module randomly assigns a first storage path to each associated data block, and the associated data blocks are stored in the storage nodes corresponding to their first storage paths. The detection unit provided in each storage node obtains the storage state of that node, and when the capacity occupied by the associated data blocks in the storage node exceeds the set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.

During data acquisition and processing, the big data generated by different computers is first classified in the processing device according to its format attribute to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic, for example into blocks that each occupy the same amount of memory, taken in the order in which the data was received. The different data blocks are then randomly distributed to the heterogeneous (that is, different) storage servers. Because the data is stored separately, there is no direct link between different data blocks: even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the partial data is unusable.

In the foregoing, each storage node has a detection unit configured to obtain its storage state; when the capacity occupied by the associated data blocks in the storage node exceeds the set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.

In the foregoing, the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit. When the detection unit detects that the occupied capacity of the first node unit exceeds the set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit. When the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails; the contents of the first and second node units are then merged by a multi-way merge sort, after which the transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path. The storage node may also include more than two node units, in which case the node units are connected as a single classification tree.

Embodiment 2. The present invention further provides a computer big data acquisition and processing method, comprising the following steps:

When the associated data blocks are stored in the storage nodes corresponding to their first storage paths, the first storage path of each associated data block is recorded, and the association codes of the associated data blocks corresponding to the first storage paths are written into a mapping table and stored in a first configuration unit.

The associated data blocks are reversely fused according to their association codes to form the corresponding classified data.

During data acquisition and processing, the big data generated by different computers is first classified in the processing device according to its format attribute to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic, for example into blocks that each occupy the same amount of memory, taken in the order in which the data was received. The different data blocks are then randomly distributed to the heterogeneous (that is, different) storage servers. Because the data is stored separately, there is no direct link between different data blocks: even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the partial data is unusable.

Embodiment 3. The present invention further provides a device applied to the computer big data acquisition and processing system. The device has a retrieval program that, through a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index. The classified data is formed by reversely fusing the associated data blocks according to their association codes, based on the configuration files. The configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.

The invention is characterized in that a retrieval tool or retrieval program is built into the device. Through the loading program, the retrieval program calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on the established retrieval index; the classified data is formed by reversely fusing the associated data blocks according to their association codes, based on the configuration files. Consequently, without the configuration files, the consecutive, associated data blocks cannot be obtained.

Embodiment 4. The present invention further provides a medium comprising the above device, on which the retrieval program and the loading program are recorded. The invention is characterized in that a retrieval tool or retrieval program is built into the device and, through the loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, so that without the configuration files the consecutive, associated data blocks cannot be obtained.

The principles and embodiments of the present invention are explained herein using specific examples, which are intended only to help in understanding the method of the invention and its core concepts. The foregoing is only a preferred embodiment of the present invention. It should be noted that, because written expression is limited while the possible specific structures are objectively unlimited, those skilled in the art may make a number of improvements, refinements or changes without departing from the principle of the present invention, and the technical features described above may be combined in a suitable manner; such modifications, variations, combinations or adaptations, which apply the spirit of the invention within the scope defined by the claims, may be directed to other uses and embodiments.

Claims

1. A computer big data acquisition and processing system, characterized by comprising:

a processing device, comprising:

a random configuration module, configured to randomly assign a first storage path to each associated data block, store the associated data block in the storage node corresponding to the first storage path, record the first storage path of each associated data block, write the association code of the associated data block corresponding to each first storage path into a mapping table, and store the mapping table in a first configuration unit;

2. The computer big data acquisition and processing system according to claim 1, wherein the storage node has a detection unit configured to obtain the storage state of the storage node, and when the capacity occupied by the associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.

3. The computer big data acquisition and processing system according to claim 1 or 2, wherein the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit; when the detection unit detects that the occupied capacity of the first node unit exceeds a set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit; when the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails, the contents of the first node unit and the second node unit are merged by a multi-way merge sort, and the transfer command is then triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.

4. A computer big data acquisition and processing method, characterized by comprising the following steps:

when the computers generate big data, a data acquisition device acquires the big data generated by the different computers and transmits it to a processing device; the processing device classifies the big data with a classification module based on its format attribute to form classified data; the classified data is divided into a plurality of data blocks according to a segmentation logic; association codes are written into the attribute values of the data blocks in sequence according to the segmentation logic to form associated data blocks; after the associated data blocks are formed, a random configuration module randomly assigns a first storage path to each associated data block; the associated data blocks are stored in the storage nodes corresponding to the first storage paths; each storage node obtains its storage state through a detection unit provided in it; and when the capacity occupied by the associated data blocks in a storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in that storage node are stored in the corresponding heterogeneous storage server according to a second storage path.

5. The computer big data acquisition and processing method according to claim 4, wherein when the associated data blocks are stored in the storage nodes corresponding to the first storage paths, the first storage path of each associated data block is recorded, and the association codes of the associated data blocks corresponding to the first storage paths are written into a mapping table and stored in a first configuration unit.

6. The computer big data acquisition and processing method according to claim 4, wherein the associated data blocks are reversely fused according to their association codes to form the corresponding classified data.

7. A device applied to the computer big data acquisition and processing system according to any one of claims 1 to 3, wherein the device has a retrieval program that, through a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index, the classified data being formed by reversely fusing the associated data blocks according to their association codes based on the configuration files.

8. The device according to claim 7, wherein the configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.

9. A medium comprising the device according to claim 7 or 8, wherein the retrieval program and the loading program are recorded on the medium.