CN114943287A - Computer big data acquisition and processing system, method, equipment and medium - Google Patents

Computer big data acquisition and processing system, method, equipment and medium

Info

Publication number
CN114943287A
Authority
CN
China
Prior art keywords
storage
data
node
data blocks
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210550505.XA
Other languages
Chinese (zh)
Inventor
刘满君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eastern Liaoning University
Original Assignee
Eastern Liaoning University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eastern Liaoning University
Priority to CN202210550505.XA
Publication of CN114943287A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a computer big data acquisition and processing system, method, equipment and medium. The system comprises a data acquisition device, which is connected to a plurality of computers, acquires the big data generated by those computers and transmits it to a processing device. The processing device has a classification module, which classifies the big data based on its format attributes to form classified data. During data acquisition and processing, big data generated by different computers is first classified in the processing device according to its format attributes. After classification, the classified data is divided into a plurality of data blocks by a segmentation logic; for example, it is divided, in the order in which the data was received, into data blocks of equal size. The data blocks are then randomly distributed to heterogeneous storage servers.

Description

Computer big data acquisition and processing system, method, equipment and medium
Technical Field
The invention relates to the technical field of big data acquisition and processing, in particular to a computer big data acquisition and processing system, method, equipment and medium.
Background
In the field of electronic commerce, big data of several kinds often has to be acquired from a plurality of computers for back-end analysis. With existing approaches, once the data has been collected, processed and stored, it is easily attacked and leaked when it is accessed.
Disclosure of Invention
In view of the above, the present invention provides a system, a method, a device and a medium for computer big data collection and processing, so as to solve the problems mentioned in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
a computer big data acquisition and processing system, comprising:
a data acquisition device, which is connected to a plurality of computers, acquires the big data generated by those computers and transmits it to a processing device;
the processing device, which has:
a classification module, which classifies the big data based on its format attributes to form classified data;
a segmentation module, which divides the classified data into a plurality of data blocks;
a tag module, which writes association codes into the attribute values of the data blocks in sequence, following the segmentation logic of the segmentation module, to form associated data blocks, so that the associated data blocks can be reverse-fused into the classified data according to the association codes;
a random configuration module, which randomly configures a first storage path for each associated data block, stores the associated data block in the storage node corresponding to that first storage path, records the first storage path of each associated data block, and writes the association code of each associated data block, together with its first storage path, into a mapping table stored in a first configuration unit;
a storage module, which has a plurality of heterogeneous storage servers, each corresponding to one storage node, and which records a second storage path for each storage node and stores it in a second configuration unit.
Furthermore, each storage node has a detection unit that obtains the storage state of the storage node; when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
Further, the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit. When the detection unit detects that the occupied capacity of the first node unit exceeds the set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit. When the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails; the first node unit and the second node unit are then merged using a multi-way merge sort, after which the transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
The invention also provides a computer big data acquisition and processing method, which comprises the following steps:
when a computer generates big data, the data acquisition device acquires the big data generated by the different computers and transmits it to the processing device; the processing device classifies the big data with the classification module, based on the format attributes of the big data, to form classified data; the classified data is then divided into a plurality of data blocks according to the segmentation logic, and association codes are written in sequence into the attribute values of the data blocks according to the segmentation logic to form associated data blocks; once the associated data blocks are formed, the random configuration module randomly configures a first storage path for each associated data block and stores it in the storage node corresponding to that first storage path; each storage node obtains its storage state through its detection unit, and when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
Further, when the associated data blocks are stored in the storage nodes corresponding to their first storage paths, the first storage path of each associated data block is recorded, and the association code of each associated data block, together with its first storage path, is written into a mapping table stored in the first configuration unit.
Further, the associated data blocks are reverse-fused according to the association codes to form the corresponding classified data.
The invention also provides a device applied to the above computer big data acquisition and processing system. The device has a retrieval program which, by means of a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index; the classified data is formed by reverse-fusing the associated data blocks according to the association codes, based on the configuration files.
Further, the configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.
The invention also provides a medium comprising the above device, the medium having the retrieval program and the loading program recorded on it.
During data acquisition and processing, big data generated by different computers is first classified in the processing device according to its format attributes to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic; for example, it is divided, in the order in which the data was received, into data blocks of equal size. The data blocks are then randomly distributed to heterogeneous (which may be understood as different) storage servers. Because the blocks are stored separately, there is no direct link between different data blocks; even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the data cannot be used.
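The classification and splitting steps described in this paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation disclosed by the invention: the extension-to-category table `FORMAT_CATEGORIES`, the function names `classify_by_format` and `split_into_blocks`, and the 4-byte block size are all hypothetical and chosen for the example.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical mapping from file extension to format category (an assumption).
FORMAT_CATEGORIES = {
    ".txt": "text", ".csv": "text",
    ".png": "image", ".jpg": "image",
    ".mp4": "video", ".avi": "video",
}


def classify_by_format(records):
    """Group (filename, payload) pairs by format category, keeping arrival order."""
    classified = defaultdict(list)
    for filename, payload in records:
        suffix = PurePath(filename).suffix.lower()
        category = FORMAT_CATEGORIES.get(suffix, "other")
        classified[category].append(payload)
    return dict(classified)


def split_into_blocks(payloads, block_size):
    """Join payloads in arrival order and cut the result into equal-size blocks.

    The final block may be shorter when the data does not divide evenly.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    stream = b"".join(payloads)
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


# Records are listed in the order they were received.
records = [("a.txt", b"hello "), ("b.png", b"\x89PNG"), ("c.txt", b"world")]
classified = classify_by_format(records)
text_blocks = split_into_blocks(classified["text"], block_size=4)
```

In this sketch the two text payloads are concatenated in receiving order before being cut, so a block boundary can fall inside what was originally a single file, which is what makes an isolated block hard to interpret on its own.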
Meanwhile, the invention builds a retrieval tool or retrieval program into the device. By means of the loading program, the retrieval program calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index; the classified data is formed by reverse-fusing the associated data blocks according to the association codes, based on the configuration files. Thus, if the configuration files cannot be obtained, consecutive, associated data blocks cannot be obtained.
Drawings
FIG. 1 is a schematic diagram of the framework of the present invention;
FIG. 2 is a flow chart of the method of the present invention.
Detailed Description
The present invention is described in detail below with reference to FIG. 1 and FIG. 2.
Embodiment 1: the present invention provides a computer big data acquisition and processing system, comprising:
a data acquisition device, which is connected to a plurality of computers, acquires the big data generated by those computers and transmits it to a processing device;
the processing device, which has:
a classification module, which classifies the big data based on its format attributes to form classified data;
a segmentation module, which divides the classified data into a plurality of data blocks;
a tag module, which writes association codes into the attribute values of the data blocks in sequence, following the segmentation logic of the segmentation module, to form associated data blocks, so that the associated data blocks can be reverse-fused into the classified data according to the association codes;
a random configuration module, which randomly configures a first storage path for each associated data block, stores the associated data block in the storage node corresponding to that first storage path, records the first storage path of each associated data block, and writes the association code of each associated data block, together with its first storage path, into a mapping table stored in a first configuration unit;
a storage module, which has a plurality of heterogeneous storage servers, each corresponding to one storage node, and which records a second storage path for each storage node and stores it in a second configuration unit.
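The roles of the tag module and the random configuration module can be illustrated with a short sketch. Everything named here is an assumption made for the example: the `category:index:total` layout of the association code, the dictionary-based blocks, the node path strings, and the functions `tag_blocks` and `assign_random_paths`. The invention does not specify any of these details.

```python
import random


def tag_blocks(category, blocks):
    """Attach an association code recording category, position and block count."""
    total = len(blocks)
    return [
        {"code": f"{category}:{index}:{total}", "payload": block}
        for index, block in enumerate(blocks)
    ]


def assign_random_paths(tagged_blocks, node_paths, rng=None):
    """Pick a storage node path at random for each block and record the choice."""
    if rng is None:
        rng = random.Random()
    storage = {path: [] for path in node_paths}
    mapping_table = {}  # association code -> first storage path
    for block in tagged_blocks:
        path = rng.choice(node_paths)
        storage[path].append(block)
        mapping_table[block["code"]] = path
    return storage, mapping_table


# Hypothetical first storage paths, one per storage node.
node_paths = ["node-a/blocks", "node-b/blocks", "node-c/blocks"]
tagged = tag_blocks("text", [b"hell", b"o wo", b"rld"])
storage, mapping_table = assign_random_paths(tagged, node_paths, random.Random(0))
```

The `mapping_table` plays the part of the content held in the first configuration unit: without it, nothing stored on a node says where the neighbouring blocks went. The seeded `random.Random(0)` is used only to make the example repeatable; Python's documentation advises the `secrets` module rather than `random` where unpredictability matters for security.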
The specific working principle is as follows: when a computer generates big data, the data acquisition device acquires the big data generated by the different computers and transmits it to the processing device; the processing device classifies the big data with the classification module, based on the format attributes of the big data, to form classified data; the classified data is then divided into a plurality of data blocks according to the segmentation logic, and association codes are written in sequence into the attribute values of the data blocks according to the segmentation logic to form associated data blocks; once the associated data blocks are formed, the random configuration module randomly configures a first storage path for each associated data block and stores it in the storage node corresponding to that first storage path; each storage node obtains its storage state through its detection unit, and when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
During data acquisition and processing, big data generated by different computers is first classified in the processing device according to its format attributes to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic; for example, it is divided, in the order in which the data was received, into data blocks of equal size. The data blocks are then randomly distributed to heterogeneous (which may be understood as different) storage servers. Because the blocks are stored separately, there is no direct link between different data blocks; even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the data cannot be used.
In the foregoing, each storage node has a detection unit that obtains the storage state of the storage node; when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
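As a rough illustration of the detection unit and transfer command, the following sketch models a storage node whose capacity check runs after every write. The class name `StorageNode`, its methods, the byte-based capacity measure, and the path string `server-1/archive` are hypothetical; the description does not say how capacity is measured or when the check is performed.

```python
class StorageNode:
    """Toy storage node with a capacity limit (illustrative names throughout)."""

    def __init__(self, second_path, capacity_limit):
        self.second_path = second_path        # path of the backing storage server
        self.capacity_limit = capacity_limit  # set upper limit, in bytes
        self.blocks = []

    def used(self):
        """Capacity currently occupied by associated data blocks."""
        return sum(len(block["payload"]) for block in self.blocks)

    def store(self, block, servers):
        """Store a block, then let the detection check decide on a transfer."""
        self.blocks.append(block)
        if self.used() > self.capacity_limit:
            self.transfer(servers)

    def transfer(self, servers):
        """Move every block to the server reached through the second path."""
        servers.setdefault(self.second_path, []).extend(self.blocks)
        self.blocks = []


servers = {}
node = StorageNode("server-1/archive", capacity_limit=8)
for i, payload in enumerate([b"hell", b"o wo", b"rld"]):
    node.store({"code": f"text:{i}:3", "payload": payload}, servers)
```

With an 8-byte limit, the first two 4-byte blocks stay on the node; the third pushes the occupied capacity to 11 bytes, so all three blocks are moved to the backing server.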
In the foregoing, the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit. When the detection unit detects that the occupied capacity of the first node unit exceeds the set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit. When the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails; the first node unit and the second node unit are then merged using a multi-way merge sort, after which the transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path. Of course, the storage node may also include a plurality of further node units, which are connected in a single classification tree.
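The push-down and merge behaviour can be sketched as below. This is one possible reading of the paragraph, with several assumptions: each node unit is modelled as a list of `(sequence, payload)` pairs kept sorted by sequence number, the sort key is that sequence number (the description does not name a key), and the function names `occupied` and `rebalance` are hypothetical. The multi-way merge uses Python's standard `heapq.merge`.

```python
import heapq


def occupied(unit):
    """Bytes held by a node unit, modelled as a list of (sequence, payload) pairs."""
    return sum(len(payload) for _, payload in unit)


def rebalance(first, second, limit, server):
    """Apply the push-down / merge-and-transfer rule and return the action taken.

    Both units are assumed to be kept sorted by sequence number.
    """
    if occupied(first) <= limit:
        return "none"
    if occupied(second) <= limit:
        # Push-down: move every block from the first unit into the second.
        second[:] = list(heapq.merge(second, first))
        first.clear()
        return "push-down"
    # Both units exceed the limit, so the push-down fails: merge the sorted
    # runs with a multi-way merge and transfer the result to the server.
    server.extend(heapq.merge(first, second))
    first.clear()
    second.clear()
    return "transfer"


first = [(0, b"aaaa"), (3, b"dddd")]
second = [(1, b"bbbb")]
server = []
action1 = rebalance(first, second, limit=6, server=server)

# New blocks arrive in the first unit after the push-down.
first.extend([(2, b"cccc"), (4, b"eeee")])
action2 = rebalance(first, second, limit=6, server=server)
```

In the first call only the first unit is over the limit, so its blocks are pushed down. In the second call both units are over the limit, so the two sorted runs are merged and transferred, and the server receives the blocks in sequence order.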
Embodiment 2: the present invention further provides a computer big data acquisition and processing method, comprising the following steps:
when a computer generates big data, the data acquisition device acquires the big data generated by the different computers and transmits it to the processing device; the processing device classifies the big data with the classification module, based on the format attributes of the big data, to form classified data; the classified data is then divided into a plurality of data blocks according to the segmentation logic, and association codes are written in sequence into the attribute values of the data blocks according to the segmentation logic to form associated data blocks; once the associated data blocks are formed, the random configuration module randomly configures a first storage path for each associated data block and stores it in the storage node corresponding to that first storage path; each storage node obtains its storage state through its detection unit, and when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
When the associated data blocks are stored in the storage nodes corresponding to their first storage paths, the first storage path of each associated data block is recorded, and the association code of each associated data block, together with its first storage path, is written into a mapping table stored in the first configuration unit.
The associated data blocks are reverse-fused according to the association codes to form the corresponding classified data.
During data acquisition and processing, big data generated by different computers is first classified in the processing device according to its format attributes to form classified data, for example into text file data, image file data and video file data. After classification, the classified data is divided into a plurality of data blocks by the segmentation logic; for example, it is divided, in the order in which the data was received, into data blocks of equal size. The data blocks are then randomly distributed to heterogeneous (which may be understood as different) storage servers. Because the blocks are stored separately, there is no direct link between different data blocks; even if part of the data is obtained from one heterogeneous storage server, the related or consecutive data blocks cannot be obtained, so the data cannot be used.
Embodiment 3: the present invention further provides a device applied to the above computer big data acquisition and processing system. The device has a retrieval program which, by means of a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index; the classified data is formed by reverse-fusing the associated data blocks according to the association codes, based on the configuration files. The configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.
A retrieval tool or retrieval program is built into the device. By means of the loading program, the retrieval program calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index; the classified data is formed by reverse-fusing the associated data blocks according to the association codes, based on the configuration files. Thus, if the configuration files cannot be obtained, consecutive, associated data blocks cannot be obtained.
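A minimal sketch of such a retrieval step is given below, assuming the two configuration units are serialized as JSON and the association code has the hypothetical layout `category:index:total` (with no colon inside the category name). The function `reverse_fuse`, the dictionary-based nodes and servers, and all path names are illustrative assumptions rather than details taken from the invention.

```python
import json


def reverse_fuse(category, mapping_table, node_servers, nodes, servers):
    """Rebuild one category's classified data from its scattered blocks.

    mapping_table: association code -> first storage path (first config unit)
    node_servers:  first storage path -> second storage path (second config unit)
    """
    parts = {}
    total = None
    for code, first_path in mapping_table.items():
        cat, index, count = code.split(":")
        if cat != category:
            continue
        total = int(count)
        payload = nodes.get(first_path, {}).get(code)
        if payload is None:
            # Not on the node any more: follow the second path to the server.
            payload = servers[node_servers[first_path]][code]
        parts[int(index)] = payload
    if total is None or len(parts) != total:
        raise LookupError(f"incomplete block set for category {category!r}")
    return b"".join(parts[i] for i in range(total))


# Configuration files as the loading program might read them (assumed JSON).
first_unit = '{"text:0:3": "node-a", "text:1:3": "node-b", "text:2:3": "node-a"}'
second_unit = '{"node-a": "server-1", "node-b": "server-2"}'
mapping_table = json.loads(first_unit)
node_servers = json.loads(second_unit)

# Two blocks are still on their nodes; one was transferred to a server.
nodes = {"node-a": {"text:0:3": b"hell"}, "node-b": {"text:1:3": b"o wo"}}
servers = {"server-1": {"text:2:3": b"rld"}, "server-2": {}}

data = reverse_fuse("text", mapping_table, node_servers, nodes, servers)
```

The sketch shows why both configuration files are needed: the mapping table locates each block's node, and the second storage path is required for any block that has since been transferred to a storage server.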
Embodiment 4: the present invention further provides a medium comprising the above device, the medium having the retrieval program and the loading program recorded on it. Because the retrieval program built into the device calls, by means of the loading program, the configuration files stored in the first configuration unit and the second configuration unit, consecutive, associated data blocks cannot be obtained if the configuration files cannot be obtained.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method of the invention and its core concepts. The foregoing is only a preferred embodiment of the present invention. It should be noted that, because written expression is limited while the number of possible specific structures is objectively unlimited, those skilled in the art may make a plurality of modifications, refinements or changes without departing from the principle of the present invention, and may combine the technical features described above in a suitable manner; such modifications, variations, combinations or adaptations that use the spirit of the invention, within the scope defined by the claims, may be applied to other uses and embodiments.

Claims (9)

1. A computer big data acquisition and processing system, characterized by comprising:
a data acquisition device, which is connected to a plurality of computers, acquires the big data generated by those computers and transmits it to a processing device;
the processing device, which has:
a classification module, which classifies the big data based on its format attributes to form classified data;
a segmentation module, which divides the classified data into a plurality of data blocks;
a tag module, which writes association codes into the attribute values of the data blocks in sequence, following the segmentation logic of the segmentation module, to form associated data blocks, so that the associated data blocks can be reverse-fused into the classified data according to the association codes;
a random configuration module, which randomly configures a first storage path for each associated data block, stores the associated data block in the storage node corresponding to that first storage path, records the first storage path of each associated data block, and writes the association code of each associated data block, together with its first storage path, into a mapping table stored in a first configuration unit;
a storage module, which has a plurality of heterogeneous storage servers, each corresponding to one storage node, and which records a second storage path for each storage node and stores it in a second configuration unit.
2. The computer big data acquisition and processing system according to claim 1, wherein each storage node has a detection unit that obtains the storage state of the storage node, and when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
3. The computer big data acquisition and processing system according to claim 1 or 2, wherein the storage node has at least a first node unit and a second node unit, the second node unit being configured as a sub-library of the first node unit; when the detection unit detects that the occupied capacity of the first node unit exceeds the set upper limit, a push-down command is triggered to transfer all associated data blocks in the first node unit to the second node unit; when the occupied capacities of both the first node unit and the second node unit exceed the set upper limit, the push-down fails, the first node unit and the second node unit are merged using a multi-way merge sort, and the transfer command is then triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to the second storage path.
4. A computer big data acquisition and processing method, characterized by comprising the following steps:
when a computer generates big data, a data acquisition device acquires the big data generated by the different computers and transmits it to a processing device; the processing device classifies the big data with a classification module, based on the format attributes of the big data, to form classified data; the classified data is then divided into a plurality of data blocks according to a segmentation logic, and association codes are written in sequence into the attribute values of the data blocks according to the segmentation logic to form associated data blocks; once the associated data blocks are formed, a random configuration module randomly configures a first storage path for each associated data block and stores it in the storage node corresponding to that first storage path; each storage node obtains its storage state through a detection unit arranged in the storage node, and when the capacity occupied by associated data blocks in the storage node exceeds a set upper limit, a transfer command is triggered so that the associated data blocks in the storage node are stored in the corresponding heterogeneous storage server according to a second storage path.
5. The computer big data acquisition and processing method according to claim 4, wherein when the associated data blocks are stored in the storage nodes corresponding to their first storage paths, the first storage path of each associated data block is recorded, and the association code of each associated data block, together with its first storage path, is written into a mapping table stored in a first configuration unit.
6. The computer big data acquisition and processing method according to claim 4, wherein the associated data blocks are reverse-fused according to the association codes to form the corresponding classified data.
7. A device applied in the computer big data acquisition and processing system according to any one of claims 1 to 3, wherein the device has a retrieval program which, by means of a loading program, calls the configuration files stored in the first configuration unit and the second configuration unit, and acquires at least one data block or at least one item of classified data based on an established retrieval index; the classified data is formed by reverse-fusing the associated data blocks according to the association codes, based on the configuration files.
8. The device of claim 7, wherein the configuration files comprise the first storage path of each associated data block and the second storage path of each storage node.
9. A medium comprising the device of any one of claims 7 to 8, wherein the medium has the retrieval program and the loading program recorded on it.
CN202210550505.XA 2022-05-20 2022-05-20 Computer big data acquisition and processing system, method, equipment and medium Pending CN114943287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210550505.XA CN114943287A (en) 2022-05-20 2022-05-20 Computer big data acquisition and processing system, method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210550505.XA CN114943287A (en) 2022-05-20 2022-05-20 Computer big data acquisition and processing system, method, equipment and medium

Publications (1)

Publication Number Publication Date
CN114943287A true CN114943287A (en) 2022-08-26

Family

ID=82909207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210550505.XA Pending CN114943287A (en) 2022-05-20 2022-05-20 Computer big data acquisition and processing system, method, equipment and medium

Country Status (1)

Country Link
CN (1) CN114943287A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964754A (en) * 2023-03-16 2023-04-14 长城数字能源(西安)科技有限公司 Big data secure storage method and device
CN116070251A (en) * 2023-04-03 2023-05-05 国网冀北电力有限公司 Data processing system and method of data security monitoring platform
CN116225571A (en) * 2023-03-16 2023-06-06 长城数字能源(西安)科技有限公司 Data acquisition system, storage system and exchange method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964754A (en) * 2023-03-16 2023-04-14 长城数字能源(西安)科技有限公司 Big data secure storage method and device
CN116225571A (en) * 2023-03-16 2023-06-06 长城数字能源(西安)科技有限公司 Data acquisition system, storage system and exchange method
CN116225571B (en) * 2023-03-16 2023-09-29 长城数字能源(西安)科技有限公司 Data acquisition system, storage system and exchange method
CN116070251A (en) * 2023-04-03 2023-05-05 国网冀北电力有限公司 Data processing system and method of data security monitoring platform

Similar Documents

Publication Publication Date Title
CN114943287A (en) Computer big data acquisition and processing system, method, equipment and medium
US7933938B2 (en) File storage system, file storing method and file searching method therein
KR20200020347A (en) Method and device of searching index for sensor tag data
CN109271545B (en) Feature retrieval method and device, storage medium and computer equipment
CN110928851A (en) Method, device and equipment for processing log information and storage medium
CN105528454A (en) Log treatment method and distributed cluster computing device
CN113094374A (en) Distributed storage and retrieval method and device and computer equipment
CN105574148A (en) Digital slide storage system and digital slide browsing method
CN101833511A (en) Data management method, device and system
CN113448946B (en) Data migration method and device and electronic equipment
CN114816728A (en) Elastic expansion method and system for cloud environment MongoDB database cluster instance node
US20040107204A1 (en) File management apparatus
CN109739854A (en) A kind of date storage method and device
Alam et al. Intellibvr-intelligent large-scale video retrieval for objects and events utilizing distributed deep-learning and semantic approaches
AL-Msie'deen et al. Detecting commonality and variability in use-case diagram variants
CN102906740B (en) The method and system of packed data record and process packed data record
CN114564458B (en) Method, device, equipment and storage medium for synchronizing data among clusters
CN108228101B (en) Method and system for managing data
CN106844480B (en) A kind of cleaning comparison storage method
CN110389939A (en) A kind of Internet of Things storage system based on NoSQL and distributed file system
CN110221778A (en) Processing method, system, storage medium and the electronic equipment of hotel's data
CN114936269A (en) Document searching platform, searching method, device, electronic equipment and storage medium
JP6273969B2 (en) Data processing apparatus, information processing apparatus, method, and program
CN102622284A (en) Data asynchronous replication method directing to mass storage system
CN113115069A (en) Video storage method and system of automobile data recorder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination