CN114546268B - Data information storage system and method under big data scene - Google Patents


Info

Publication number: CN114546268B
Authority: CN (China)
Prior art keywords: data, storage, unit, stored, change
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202210137567.8A
Other languages: Chinese (zh)
Other versions: CN114546268A (en)
Inventor: 贾浩 (Jia Hao)
Current Assignee: Huaibei Pengshun Information Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Huaibei Pengshun Information Technology Co ltd
Events:
Application filed by Huaibei Pengshun Information Technology Co ltd
Priority to CN202210137567.8A
Publication of CN114546268A
Application granted
Publication of CN114546268B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
              • G06F3/0601 Interfaces specially adapted for storage systems
                • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
                  • G06F3/061 Improving I/O performance
                    • G06F3/0611 Improving I/O performance in relation to response time
                • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
                  • G06F3/0638 Organizing or formatting or addressing of data
                    • G06F3/064 Management of blocks
                    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
                  • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
                    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data information storage system and method for a big data scenario. The system comprises a data information acquisition module, a database, a data storage planning module, a data dynamic monitoring module and an exception handling module. The data information acquisition module acquires information on the storage space, storage addresses and stored data types; the database stores and manages the received data; the data storage planning module analyzes the logical relationships among the stored data, plans the data storage mode and stores the data; the data dynamic monitoring module monitors dynamic changes of the stored data in real time; and the exception handling module traces data changes and analyzes whether a storage address needs to be reserved for the data, and if so, plans where the reserved storage address is located. Because data no longer has to be fetched from the head of the chain, stored data can be fetched faster, and data connected to a deleted node can be fetched faster before the broken link is repaired.

Description

Data information storage system and method under big data scene
Technical Field
The invention relates to the technical field of data storage, and in particular to a data information storage system and method for a big data scenario.
Background
Data storage refers to storing the temporary files generated while a data stream is processed, or the information that needs to be looked up during processing. With the rapid development of technology, ever more data is generated, and data storage has gradually expanded from a single sequential storage mode to other storage modes so that storage space can be used as fully as possible to store big data.
Existing storage modes have several disadvantages. First, in a sequential storage structure a group of data occupies contiguous storage addresses, so a region of memory large enough to hold the data must be found in advance; the data cannot be separated within the storage space, and modifying the data is inconvenient. Second, although a chained storage structure remedies some of the defects of the sequential storage structure, existing chained storage structures are too scattered to be debugged uniformly. Finally, data in a chained storage structure must be fetched starting from the head, which lengthens the time needed to fetch stored data and makes some emergency situations impossible to handle.
Therefore, a data information storage system and method for a big data scenario are needed to solve the problems above.
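To illustrate why fetching "from the head" is slow, here is a minimal Python sketch of a singly linked (chained) structure. `Node`, `build_chain` and `fetch` are illustrative names, not part of the source; the point is only that reaching position `index` costs `index` pointer hops, whereas a sequential structure addresses the element directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One element of a singly linked (chained) storage structure."""
    value: str
    next: Optional["Node"] = None


def build_chain(values):
    """Link the values in order and return the head node."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def fetch(head, index):
    """Walk the chain from the head to position `index`.

    Returns the stored value and the number of pointer hops taken, which
    grows linearly with the position of the element in the chain.
    """
    if index < 0:
        raise IndexError(index)
    node, hops = head, 0
    while node is not None and hops < index:
        node, hops = node.next, hops + 1
    if node is None:
        raise IndexError(index)
    return node.value, hops


chain = build_chain(["A1", "A2", "A3", "A4"])
print(fetch(chain, 3))  # ('A4', 3)
```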
Disclosure of Invention
The invention aims to provide a data information storage system and method for a big data scenario that solve the problems described in the background.
To solve these technical problems, the invention provides the following technical solution: a data information storage system for a big data scenario, characterized in that the system comprises a data information acquisition module, a database, a data storage planning module, a data dynamic monitoring module and an exception handling module;
the data information acquisition module acquires storage space, storage address and stored data type information and transmits all acquired information to the database;
the database stores and manages the received data, which is accessed by the data storage planning module;
the data storage planning module analyzes the logical relationships among the stored data, plans the data storage mode and stores the data;
the data dynamic monitoring module monitors dynamic changes of the stored data in real time and sends an early warning signal to the exception handling module when the data changes;
the exception handling module traces data changes and analyzes whether a storage address needs to be reserved for the data, and if so, plans the position of the reserved storage address.
Further, the data information acquisition module comprises a storage space acquisition unit, a storage address acquisition unit and a stored data acquisition unit. The storage space acquisition unit acquires the space capacity available for data storage; the storage address acquisition unit acquires the storage addresses allocated by default in the storage space; and the stored data acquisition unit acquires the types of data to be stored.
Further, the data storage planning module comprises a logical relationship analysis unit, an overall link planning unit, a data block judgment unit and a block storage control unit. The logical relationship analysis unit retrieves the collected types of data to be stored, analyzes the logical relationships among them and transmits the result to the overall link planning unit; the overall link planning unit sets pointers according to those logical relationships, constructs a chained storage structure and stores the data in chained form; the data block judgment unit judges, from how the data is called, whether block storage is needed, and if so, the block storage control unit stores the corresponding data in blocks.
Further, the data dynamic monitoring module comprises a dynamic change monitoring unit, a data call monitoring unit and a change early warning unit. The dynamic change monitoring unit monitors changes to the chain-stored data in real time, and when stored data is deleted it uses the change early warning unit to send an early warning signal to the exception handling module. The data call monitoring unit monitors calls to the chain-stored data in real time: if, after stored data has been deleted and before the storage link it interrupted has been reconnected, the data pointed to by the deleted data's pointer is called, an early warning signal is sent to the exception handling module through the change early warning unit.
Further, the exception handling module comprises a data change tracing unit and an address reservation analysis unit. After receiving an early warning signal of a data change, the data change tracing unit queries the reason for the change in the chain-stored data. After receiving an early warning signal that the data pointed to by deleted data has been called, the address reservation analysis unit analyzes whether a reserved storage address should be set for that pointed-to data, and if so, plans where the reserved storage address is located.
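The monitoring behaviour described above can be sketched as a small event-driven class. This is a hypothetical illustration only: `ChainMonitor` and its methods are not from the source, storage units are plain strings, and warnings are collected in a list rather than sent to an exception handling module.

```python
class ChainMonitor:
    """Hypothetical monitor for a chained storage structure.

    It records a warning when a storage unit is deleted, then counts how often
    the unit that the deleted data pointed to is accessed before the broken
    link is repaired.
    """

    def __init__(self):
        self.pending = {}   # successor unit -> accesses since its predecessor was deleted
        self.warnings = []  # (kind, unit) tuples destined for the exception handler

    def on_delete(self, unit, successor=None):
        # A deletion always raises a warning; if the deleted data had a
        # pointer, start counting accesses to the unit it pointed to.
        self.warnings.append(("deleted", unit))
        if successor is not None:
            self.pending[successor] = 0

    def on_access(self, unit):
        # Only accesses to successors of deleted data, before relinking, count.
        if unit in self.pending:
            self.pending[unit] += 1
            self.warnings.append(("successor_accessed", unit))

    def on_relink(self, successor):
        # Once the link is repaired, stop tracking and return the final count N.
        return self.pending.pop(successor, 0)


monitor = ChainMonitor()
monitor.on_delete("a1", successor="a2")
monitor.on_access("a2")
monitor.on_access("a2")
monitor.on_access("a3")  # not a successor of deleted data, so ignored
print(monitor.on_relink("a2"))  # 2
```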
A data information storage method for a big data scenario, characterized in that the method comprises the following steps:
S1: collecting storage space, storage address and stored data information;
S2: analyzing the logical relationships among the stored data and constructing a chained storage structure;
S3: judging whether the data needs to be stored in blocks: if not, storing the data according to the original chained storage structure; if so, splitting the chained storage structure and storing the corresponding data in blocks;
S4: monitoring changes to the data and calls to the data in real time, and issuing an early warning when a data change is abnormal;
S5: tracing the data change and reserving a storage address.
Further, in steps S1-S2: the storage space acquisition unit acquires the space capacity to be stored, W; the storage address acquisition unit acquires the set of storage units allocated by default in the storage space, a = {a_1, a_2, …, a_n}, where n is the number of default storage units; and the stored data acquisition unit acquires the set of data to be stored, A = {A_1, A_2, …, A_m}, comprising m types of data whose capacities form the set W' = {W_1', W_2', …, W_m'}. The logical relationship analysis unit then analyzes the logical relationships among the data to be stored: the total number of data calls is K; for any two data A_i and A_j, the number of times they are called simultaneously is counted as k_i, and the logical relation coefficient Q_i between them is calculated as:
$$Q_i = \mathrm{sim}(A_i, A_j)\,\frac{k_i}{K}$$
where sim(A_i, A_j) is the correlation coefficient between data A_i and A_j, with range (0, 1). This yields the set of logical relation coefficients of the data set to be stored, Q = {Q_1, Q_2, …, Q_{m(m-1)/2}}. A logical relation coefficient threshold Q' is set, and each Q_i is compared with Q': if Q_i ≤ Q', the logical relation coefficient between the two data does not exceed the threshold; if Q_i > Q', it exceeds the threshold, and a pointer is placed between the two data to connect them, building up the chained storage structure. How data is called reflects, to some extent, the logical relationships among the data, so weighting the correlation coefficient by the call frequency makes the judgment of those relationships more reasonable. It determines whether a pointer is needed between two data and builds the chained storage structure in a targeted way, which reduces the scattering of the chained storage structure.
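A minimal Python sketch of step S2, under stated assumptions: the formula Q_i = sim(A_i, A_j) · k_i / K is taken as given above (it reproduces the values in Embodiment one), the correlation coefficients are supplied by the caller, and the threshold 0.1 is a hypothetical stand-in for Q'. The function names are illustrative.

```python
def logical_relation_coefficients(sim, co_calls, total_calls):
    """Compute Q_i = sim(A_i, A_j) * k_i / K for each pair of data items.

    sim:         pair -> correlation coefficient sim(A_i, A_j), in (0, 1)
    co_calls:    pair -> k_i, how often the two items were called together
    total_calls: K, the total number of data calls
    """
    return {pair: sim[pair] * co_calls[pair] / total_calls for pair in sim}


def pointer_pairs(q, threshold):
    """Pairs whose coefficient exceeds the threshold (Q_i > Q') get a pointer."""
    return [pair for pair, value in q.items() if value > threshold]


# Figures from Embodiment one; 0.1 is a hypothetical value for Q'.
sim = {("A1", "A2"): 0.5, ("A1", "A3"): 0.8, ("A2", "A3"): 0.2}
co_calls = {("A1", "A2"): 6, ("A1", "A3"): 3, ("A2", "A3"): 1}
q = logical_relation_coefficients(sim, co_calls, total_calls=10)
print(pointer_pairs(q, threshold=0.1))  # [('A1', 'A2'), ('A1', 'A3')]
```

Keying the coefficients by pair keeps the link between each Q_i and the two data it would connect, which is what the overall link planning unit needs to place pointers.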
Further, in step S3: the data block judgment unit judges whether block storage is needed. The M pairs of data whose logical relation coefficients exceed the threshold are counted, giving M pointers, and the storage capacity of each pointer, w, is collected. For any two data A_i and A_{i+1} connected by a pointer, the numbers of times K_i' and K_{i+1}' that each is called are obtained, together with the number of times K' that they are called simultaneously. The remaining storage space capacity W_remain after constructing the chained storage structure is then calculated [formula not reproduced], where W_i' is the capacity of any one of the m data and W is the space capacity to be stored. W_remain is compared with W: if W_remain > W/3, the remaining storage space exceeds 1/3 of the space to be stored and block storage is not performed; if W_remain ≤ W/3, the remaining storage space does not exceed 1/3 of the space to be stored and block storage is judged necessary. In that case, the necessary coefficient F_i for storing data A_i and A_{i+1} in blocks is calculated as:
$$F_i = \frac{K_i' + K_{i+1}' - K'}{K}$$
where K is the total number of times the stored data is called. This yields the set of necessary coefficients for block storage of the pointer-connected data, F = {F_1, F_2, …, F_M}. A necessary coefficient threshold F' is set, and each F_i is compared with F': if F_i > F', the necessary coefficient for storing A_i and A_{i+1} in blocks exceeds the threshold, so A_i and A_{i+1} are stored in blocks and the pointer between them is deleted. If the chained storage structure were kept whole, data would have to be fetched from the head, which slows the fetching of stored data. By analyzing, on the basis of the remaining storage space capacity, how the pointer-connected data is actually called, the method decides whether data should be stored in blocks; once stored in blocks, data no longer has to be fetched from the head, so stored data can be fetched faster.
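A minimal Python sketch of the step S3 decision, under stated assumptions: W_remain is passed in directly because its formula is not reproduced here, F_i = (K_i' + K_{i+1}' - K') / K is taken as given above, 1 GB is taken as 1024 MB, and the threshold 0.4 is a hypothetical stand-in for F'. Function names are illustrative.

```python
def needs_block_storage(w_remain, w_total):
    """Block storage is considered only when W_remain <= W / 3."""
    return w_remain <= w_total / 3


def necessary_coefficient(calls_a, calls_b, co_calls, total_calls):
    """F_i = (K_i' + K_{i+1}' - K') / K for two pointer-connected data."""
    return (calls_a + calls_b - co_calls) / total_calls


def pointers_to_split(pointer_stats, total_calls, threshold):
    """Return the linked pairs whose F_i exceeds the threshold F'.

    pointer_stats: pair -> (K_i', K_{i+1}', K') for each existing pointer.
    """
    return [
        pair
        for pair, (calls_a, calls_b, co_calls) in pointer_stats.items()
        if necessary_coefficient(calls_a, calls_b, co_calls, total_calls) > threshold
    ]


# Embodiment one figures: W = 1024 MB (assumed conversion), W_remain = 256 MB.
if needs_block_storage(w_remain=256, w_total=1024):
    stats = {("A1", "A2"): (7, 1, 6), ("A1", "A3"): (7, 2, 3)}
    print(pointers_to_split(stats, total_calls=10, threshold=0.4))  # [('A1', 'A3')]
```

Each pair returned is a pointer the block storage control unit would delete, splitting the chain at that point.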
Further, in steps S4-S5: the dynamic change monitoring unit monitors changes to the stored data in real time. When stored data is detected to have been deleted, the storage unit that held it is identified as a_i, the change early warning unit sends an early warning signal to the data change tracing unit, and on receiving this signal the data change tracing unit queries why the data was deleted.
Further, if the deleted data had a pointer, the storage unit it pointed to is identified as a_j. If the data stored in a_j is called after the stored data was deleted and before the storage link interrupted by the deletion has been reconnected, the number of such calls is counted as N. If N > K/2, the change early warning unit sends an early warning signal to the address reservation analysis unit, which sets a reserved storage address as follows: obtain the p pointers that point to the deleted data, located on nodes {f_1, f_2, …, f_p} of the storage link, and the numbers of times the data corresponding to these pointers is called, N' = {N_1', N_2', …, N_p'}; then, for any one pointer pointing to the deleted data, calculate the feasibility coefficient E_i of reserving a storage address on its corresponding data storage unit [formula not reproduced].
Here N_i' is the number of times the data corresponding to the i-th pointer is called, and f_i is the node of the storage link on which that pointer is located. This gives the feasibility coefficient set E = {E_1, E_2, …, E_p}; the highest feasibility coefficient is selected, and the storage address of a_j is reserved on the data storage unit corresponding to it. Because a chained storage structure is easier to modify than a sequential one, data is added, modified or deleted more often. When data connected to deleted data must be called, its storage address may not be found in time, and modifying the pointer addresses takes time; a reserved address therefore lets the connected data be found and called promptly.
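A sketch of the reservation trigger and selection in step S5. The feasibility coefficients E_i are passed in directly because their formula is not reproduced in this text; the numeric values, node labels and function names below are hypothetical.

```python
def needs_reservation(successor_calls, total_calls):
    """Trigger the address-reservation analysis when N > K / 2."""
    return successor_calls > total_calls / 2


def choose_reservation_unit(feasibility):
    """Pick the storage unit with the highest feasibility coefficient E_i.

    feasibility maps each unit holding a pointer to the deleted data
    (nodes f_1 .. f_p) to its E_i, computed elsewhere.
    """
    return max(feasibility, key=feasibility.get)


# Hypothetical values for illustration only (N = 6 calls, K = 10).
if needs_reservation(successor_calls=6, total_calls=10):
    unit = choose_reservation_unit({"f1": 0.35, "f2": 0.72, "f3": 0.41})
    reserved = {unit: "address of aj"}  # reserve a_j's address on that unit
    print(reserved)  # {'f2': 'address of aj'}
```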
Compared with the prior art, the invention has the following beneficial effects:
The invention collects information on the data to be stored, analyzes the logical relationships among the stored data and constructs a suitable chained storage structure. When analyzing those relationships, it optimizes the logical relation coefficient using both the correlation of the data content and the frequency with which data is called, which makes the judgment of logical relationships more reasonable, so that the resulting chained storage structure mitigates the scattering and debugging inconvenience of existing chained storage structures. By splitting the chained storage structure and storing data in suitable blocks, data no longer has to be fetched from the head, which speeds up fetching of stored data. When data changes, the address of the data storage unit connected to the changed data is reserved, which speeds up calls to the connected data before the broken link is repaired.
Drawings
The accompanying drawings are provided for a further understanding of the invention and constitute a part of this specification; together with the embodiments of the invention, they serve to explain the invention. In the drawings:
FIG. 1 is a block diagram of a data information storage system in a big data scenario of the present invention;
FIG. 2 is a flowchart of the data information storage method in a big data scenario according to the present invention.
Detailed Description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that they are intended only to illustrate and explain the invention, not to limit it.
Referring to FIGS. 1-2, the present invention provides the following technical solution: a data information storage system for a big data scenario, characterized in that the system comprises a data information acquisition module, a database, a data storage planning module, a data dynamic monitoring module and an exception handling module;
the data information acquisition module acquires storage space, storage address and stored data type information and transmits all acquired information to the database;
the database stores and manages the received data, which is accessed by the data storage planning module;
the data storage planning module analyzes the logical relationships among the stored data, plans the data storage mode and stores the data;
the data dynamic monitoring module monitors dynamic changes of the stored data in real time and sends an early warning signal to the exception handling module when the data changes;
the exception handling module traces data changes and analyzes whether a storage address needs to be reserved for the data, and if so, plans the position of the reserved storage address.
The data information acquisition module comprises a storage space acquisition unit, a storage address acquisition unit and a stored data acquisition unit. The storage space acquisition unit acquires the space capacity available for data storage; the storage address acquisition unit acquires the storage addresses allocated by default in the storage space; and the stored data acquisition unit acquires the types of data to be stored.
The data storage planning module comprises a logical relationship analysis unit, an overall link planning unit, a data block judgment unit and a block storage control unit. The logical relationship analysis unit retrieves the collected types of data to be stored, analyzes the logical relationships among them and transmits the result to the overall link planning unit; the overall link planning unit sets pointers according to those logical relationships, constructs a chained storage structure and stores the data in chained form; the data block judgment unit judges, from how the data is called, whether block storage is needed, and if so, the block storage control unit stores the corresponding data in blocks.
The data dynamic monitoring module comprises a dynamic change monitoring unit, a data call monitoring unit and a change early warning unit. The dynamic change monitoring unit monitors changes to the chain-stored data in real time, and when stored data is deleted it uses the change early warning unit to send an early warning signal to the exception handling module. The data call monitoring unit monitors calls to the chain-stored data in real time: if, after stored data has been deleted and before the storage link it interrupted has been reconnected, the data pointed to by the deleted data's pointer is fetched, an early warning signal is sent to the exception handling module through the change early warning unit.
The exception handling module comprises a data change tracing unit and an address reservation analysis unit. After receiving an early warning signal of a data change, the data change tracing unit queries the reason for the change in the chain-stored data. After receiving an early warning signal that the data pointed to by deleted data has been fetched, the address reservation analysis unit analyzes whether a reserved storage address should be set for that pointed-to data, and if so, plans where the reserved storage address is located.
A data information storage method for a big data scenario, characterized in that the method comprises the following steps:
S1: collecting storage space, storage address and stored data information;
S2: analyzing the logical relationships among the stored data and constructing a chained storage structure;
S3: judging whether the data needs to be stored in blocks: if not, storing the data according to the original chained storage structure; if so, splitting the chained storage structure and storing the corresponding data in blocks;
S4: monitoring changes to the data and calls to the data in real time, and issuing an early warning when a data change is abnormal;
S5: tracing the data change and reserving a storage address.
In steps S1-S2: the storage space acquisition unit acquires the space capacity to be stored, W; the storage address acquisition unit acquires the set of storage units allocated by default in the storage space, a = {a_1, a_2, …, a_n}, where n is the number of default storage units; and the stored data acquisition unit acquires the set of data to be stored, A = {A_1, A_2, …, A_m}, comprising m types of data whose capacities form the set W' = {W_1', W_2', …, W_m'}. The logical relationship analysis unit then analyzes the logical relationships among the data to be stored: the total number of data calls is K; for any two data A_i and A_j, the number of times they are called simultaneously is counted as k_i, and the logical relation coefficient Q_i between them is calculated as:
$$Q_i = \mathrm{sim}(A_i, A_j)\,\frac{k_i}{K}$$
where sim(A_i, A_j) is the correlation coefficient between data A_i and A_j. This yields the set of logical relation coefficients of the data set to be stored, Q = {Q_1, Q_2, …, Q_{m(m-1)/2}}. A logical relation coefficient threshold Q' is set, and each Q_i is compared with Q': if Q_i ≤ Q', the logical relation coefficient between the two data does not exceed the threshold; if Q_i > Q', it exceeds the threshold, and a pointer is placed between the two data to connect them, building up the chained storage structure. Optimizing the logical relation coefficient in this way makes the judgment of logical relationships among data more reasonable and effectively reduces the scattering of the chained storage structure.
In step S3: the data block judgment unit judges whether block storage is needed. The M pairs of data whose logical relation coefficients exceed the threshold are counted, giving M pointers, and the storage capacity of each pointer, w, is collected. For any two data A_i and A_{i+1} connected by a pointer, the numbers of times K_i' and K_{i+1}' that each is called are obtained, together with the number of times K' that they are called simultaneously. The remaining storage space capacity W_remain after constructing the chained storage structure is then calculated [formula not reproduced], where W_i' is the capacity of any one of the m data and W is the space capacity to be stored. W_remain is compared with W: if W_remain > W/3, the remaining storage space exceeds 1/3 of the space to be stored and block storage is not performed; if W_remain ≤ W/3, the remaining storage space does not exceed 1/3 of the space to be stored and block storage is judged necessary. In that case, the necessary coefficient F_i for storing data A_i and A_{i+1} in blocks is calculated as:
$$F_i = \frac{K_i' + K_{i+1}' - K'}{K}$$
where K is the total number of times the stored data is called. This yields the set of necessary coefficients for block storage of the pointer-connected data, F = {F_1, F_2, …, F_M}. A necessary coefficient threshold F' is set, and each F_i is compared with F': if F_i > F', the necessary coefficient for storing A_i and A_{i+1} in blocks exceeds the threshold, so A_i and A_{i+1} are stored in blocks and the pointer between them is deleted. Data stored in blocks no longer has to be fetched from the head, which speeds up calls to stored data.
In steps S4-S5: and monitoring the change condition of the stored data in real time by using a change dynamic monitoring unit: when the stored data is detected to be deleted, the deleted data is confirmed to be stored in the storage unit ai, the change early warning unit is utilized to send an early warning signal to the data change tracing unit, and the data change tracing unit inquires the reason for the data to be deleted after receiving the early warning signal of the data change.
If the deleted data holds a pointer, the storage unit it points to is confirmed as $a_j$. If, after the stored data is deleted and before the storage link broken by the deletion is reconnected, the data stored in $a_j$ is detected being called, the number of such calls is counted as $N$. If $N > K/2$, the change early-warning unit sends an early-warning signal to the address reservation analysis unit, which sets a reserved storage address as follows: obtain the p pointers that point to the deleted data, located at nodes $\{f_1, f_2, \dots, f_p\}$ of the storage link, with corresponding data call counts $N' = \{N_1', N_2', \dots, N_p'\}$, and calculate the feasibility coefficient $E_i$ of reserving the storage address on the data storage unit corresponding to any one pointer to the deleted data according to the following formula:
Wherein $N_i'$ represents the number of times the data corresponding to a pointer to the deleted data is called, and $f_i$ represents the node of the storage link at which that data is located. This yields the feasibility coefficient set $E = \{E_1, E_2, \dots, E_p\}$; the highest feasibility coefficient, $E_{\max}$, is selected, and the storage address of $a_j$ is reserved on the data storage unit corresponding to it, so that the connected data can be located and called promptly.
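The trigger and the final selection step of S4-S5 can be sketched as below. Because the formula for the feasibility coefficient $E_i$ is not reproduced in this text, the sketch takes the $E_i$ values as inputs and implements only the $N > K/2$ trigger and the choice of $E_{\max}$; the function names, parameters and unit labels are hypothetical.

```python
from typing import Dict, Optional, Sequence


def should_request_reservation(calls_to_aj: int, total_calls: int) -> bool:
    """N > K/2: a_j was called often while its incoming link was broken."""
    return calls_to_aj > total_calls / 2


def plan_reservation(
    aj: str,
    calls_to_aj: int,
    total_calls: int,
    units: Sequence[str],
    feasibility: Sequence[float],
) -> Optional[Dict[str, str]]:
    """Reserve a_j's storage address on the unit with the highest E_i (E_max).

    units[i] is the data storage unit of the i-th pointer to the deleted data;
    feasibility[i] is its E_i, assumed to be computed elsewhere.
    Returns {unit: a_j}, or None when no reservation is made.
    """
    if not should_request_reservation(calls_to_aj, total_calls):
        return None
    if not units or len(units) != len(feasibility):
        return None
    best = max(range(len(units)), key=lambda i: feasibility[i])
    return {units[best]: aj}
```

For example, with $K = 10$, $N = 6$ and hypothetical coefficients `[0.2, 0.9, 0.5]` for units `u1`, `u2`, `u3`, the address of `a_j` is reserved on `u2`; with $N = 5$ no reservation is triggered.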
Embodiment one: the space capacity to be stored is $W = 1$ GB, the data set to be stored is $A = \{A_1, A_2, A_3\}$, and the capacity set of the 3 data is $W' = \{W_1', W_2', W_3'\} = \{200, 300, 260\}$ MB. The total number of data calls is $K = 10$, and the pairs $(A_1, A_2)$, $(A_1, A_3)$ and $(A_2, A_3)$ are called simultaneously $K_1 = 6$, $K_2 = 3$ and $K_3 = 1$ times respectively. With the correlation coefficients $\mathrm{sim}(A_1, A_2) = 0.5$, $\mathrm{sim}(A_1, A_3) = 0.8$ and $\mathrm{sim}(A_2, A_3) = 0.2$, the formula gives the set of logical relation coefficients $Q = \{Q_1, Q_2, Q_3\} = \{0.3, 0.24, 0.02\}$. A logical relation coefficient threshold $Q'$ is set, and comparing each $Q_i$ with $Q'$ gives $Q_1 > Q'$, $Q_2 > Q'$ and $Q_3 < Q'$, so pointers are set between $A_1$ and $A_2$ and between $A_1$ and $A_3$, and a chain storage structure is constructed. $M = 2$ groups of data have logical relation coefficients above the threshold, so the number of pointers is 2, and each pointer occupies $w = 8$ bytes. Data $A_1$, $A_2$ and $A_3$ are called 7, 1 and 2 times respectively, and the pointer-connected pairs $(A_1, A_2)$ and $(A_1, A_3)$ are called simultaneously 6 and 3 times. According to the formula, the remaining storage space capacity after constructing the chain storage structure is $W_{\text{rem}} = 256$ MB. Since $W_{\text{rem}} \le W/3$, the remaining storage space does not exceed 1/3 of the space to be stored, and block storage is judged necessary. According to the formula, the necessity coefficients for storing $(A_1, A_2)$ and $(A_1, A_3)$ in blocks are $F_1 = 0.2$ and $F_2 = 0.6$. A necessity-coefficient threshold is set; $F_2$ exceeds it while $F_1$ does not, so data $A_1$ and $A_3$ are stored in blocks: the pointer between $A_1$ and $A_3$ is deleted.
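The embodiment's arithmetic can be checked by hand. The expressions for $Q_i$ and $F_i$ are not reproduced above, but every listed value is consistent with $Q_i = \frac{K_i}{K}\,\mathrm{sim}(A_i, A_j)$ and $F_i = \frac{K_i' + K_{i+1}' - K'}{K}$; these are inferred forms, not formulas confirmed by the text. Taking 1 GB = 1024 MB:

$$
\begin{aligned}
Q_1 &= \tfrac{6}{10}\times 0.5 = 0.3, & Q_2 &= \tfrac{3}{10}\times 0.8 = 0.24, & Q_3 &= \tfrac{1}{10}\times 0.2 = 0.02,\\
F_1 &= \tfrac{7+1-6}{10} = 0.2, & F_2 &= \tfrac{7+2-3}{10} = 0.6, & \tfrac{W_{\text{rem}}}{W} &= \tfrac{256\ \text{MB}}{1024\ \text{MB}} = 0.25 \le \tfrac{1}{3}.
\end{aligned}
$$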
Finally, it should be noted that the foregoing is merely a preferred embodiment of the present invention, and the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A data information storage method in a big data scenario, characterized in that the method comprises the following steps:
S1: collecting storage space, storage address and stored data information;
S2: analyzing the logical relations between the stored data and constructing a chain storage structure;
S3: judging whether the data needs to be stored in blocks: if not, storing the data according to the original chain storage structure; if so, splitting the chain storage structure and controlling the corresponding data to be stored in blocks;
S4: monitoring changes to the data and calls made on it in real time, and issuing an early warning when a data change is abnormal;
S5: tracing the data change and reserving a storage address;
In steps S1-S2: a storage space acquisition unit acquires the space capacity to be stored, $W$; a storage address acquisition unit acquires the set of storage units allocated by default in the storage space, $a = \{a_1, a_2, \dots, a_n\}$, where n is the number of default storage units; a stored data acquisition unit acquires the data set to be stored, $A = \{A_1, A_2, \dots, A_m\}$, comprising m kinds of data with capacity set $W' = \{W_1', W_2', \dots, W_m'\}$; and a logical relation analysis unit analyzes the logical relations among the data to be stored: with $K$ the total number of data calls, any two data $A_i$ and $A_j$ are selected, the number of times they are called simultaneously is counted as $K_i$, and the logical relation coefficient $Q_i$ between them is calculated according to the following formula:
Wherein $\mathrm{sim}(A_i, A_j)$ represents the correlation coefficient between data $A_i$ and $A_j$, with value range (0, 1). This yields the set of logical relation coefficients of the data in the data set to be stored, $Q = \{Q_1, Q_2, \dots, Q_{m(m-1)/2}\}$. A logical relation coefficient threshold $Q'$ is set, and each $Q_i$ is compared with $Q'$: if $Q_i \le Q'$, the logical relation coefficient between the two corresponding data does not exceed the threshold; if $Q_i > Q'$, it exceeds the threshold, so a pointer is set between the two data, connecting them, and a chain storage structure is constructed;
In step S3: a data block judging unit judges whether block storage of the data is needed: count the M groups of data whose logical relation coefficients exceed the threshold, so that the number of pointers is M; collect the storage capacity of each pointer, $w$; select any two data $A_i$ and $A_{i+1}$ connected by a pointer, obtain the numbers of times they are called, $K_i'$ and $K_{i+1}'$, and the number of times they are called simultaneously, $K'$; and calculate the remaining storage space capacity $W_{\text{rem}}$ after constructing the chain storage structure according to the following formula:
Wherein $W_i'$ represents the capacity of any one of the m data and $W$ represents the space capacity to be stored. $W_{\text{rem}}$ is compared with $W$: if $W_{\text{rem}} > W/3$, the remaining storage space exceeds 1/3 of the space to be stored, and block storage is not performed; if $W_{\text{rem}} \le W/3$, the remaining storage space does not exceed 1/3 of the space to be stored, and block storage is judged necessary. If block storage is judged necessary, the necessity coefficient $F_i$ for storing data $A_i$ and $A_{i+1}$ in blocks is calculated according to the following formula:
Wherein $K$ represents the total number of times the stored data are called. This yields the set of necessity coefficients for block storage of the pointer-connected data, $F = \{F_1, F_2, \dots, F_M\}$. A necessity-coefficient threshold is set and each $F_i$ is compared with it: if $F_i$ exceeds the threshold, the necessity of storing data $A_i$ and $A_{i+1}$ in blocks is high enough, and $A_i$ and $A_{i+1}$ are stored in blocks: the pointer between $A_i$ and $A_{i+1}$ is deleted.
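Steps S1-S2 of claim 1 amount to building a sparse pointer graph over the data set. A minimal Python sketch, assuming $Q_i = \frac{K_i}{K}\,\mathrm{sim}(A_i, A_j)$ (the form the numbers of embodiment one are consistent with, not a formula stated in the text) and a hypothetical parameter `q_threshold` standing in for $Q'$, whose definition is not reproduced here; the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Pair = FrozenSet[str]


def logical_relation(k_pair: int, k_total: int, sim: float) -> float:
    """Q_i for one pair of data.

    ASSUMPTION: Q_i = (K_i / K) * sim(A_i, A_j), inferred from embodiment one.
    """
    return k_pair / k_total * sim


def build_pointers(
    items: List[str],
    k_pair: Dict[Pair, int],
    sim: Dict[Pair, float],
    k_total: int,
    q_threshold: float,
) -> List[Tuple[str, str]]:
    """Return the pairs (A_i, A_j) that get a pointer because Q_i > Q'."""
    pointers = []
    for a, b in combinations(items, 2):  # all m(m-1)/2 pairs
        key = frozenset((a, b))
        q = logical_relation(k_pair.get(key, 0), k_total, sim.get(key, 0.0))
        if q > q_threshold:
            pointers.append((a, b))
    return pointers
```

With the counts and correlation coefficients of embodiment one and a hypothetical $Q' = 0.1$, this sets pointers A1-A2 and A1-A3 and none between A2 and A3, as in the embodiment. Unordered `frozenset` keys are used because the pair relation is symmetric.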
2. The data information storage method in a big data scenario according to claim 1, wherein, in steps S4-S5, a change dynamic monitoring unit monitors changes to the stored data in real time: when stored data is detected to have been deleted, the storage unit that held it is confirmed as $a_i$, a change early-warning unit sends an early-warning signal to a data change tracing unit, and the data change tracing unit, on receiving the signal, queries why the data was deleted.
3. The data information storage method in a big data scenario according to claim 2, wherein: if the deleted data holds a pointer, the storage unit it points to is confirmed as $a_j$; if, after the stored data is deleted and before the storage link broken by the deletion is reconnected, the data stored in $a_j$ is detected being called, the number of such calls is counted as $N$; if $N > K/2$, the change early-warning unit sends an early-warning signal to an address reservation analysis unit, which sets a reserved storage address as follows: the p pointers pointing to the deleted data are obtained, located at nodes $\{f_1, f_2, \dots, f_p\}$ of the storage link, with corresponding data call counts $N' = \{N_1', N_2', \dots, N_p'\}$, and the feasibility coefficient $E_i$ of reserving the storage address on the data storage unit corresponding to any one pointer to the deleted data is calculated according to the following formula:
Wherein $N_i'$ represents the number of times the data corresponding to a pointer to the deleted data is called, and $f_i$ represents the node of the storage link at which that data is located. This yields the feasibility coefficient set $E = \{E_1, E_2, \dots, E_p\}$; the highest feasibility coefficient, $E_{\max}$, is selected, and the storage address of $a_j$ is reserved on the data storage unit corresponding to it.
4. A data information storage system in a big data scenario, applied to the data information storage method in a big data scenario according to claim 1, characterized in that the system comprises a data information acquisition module, a database, a data storage planning module, a data dynamic monitoring module and an exception handling module;
the data information acquisition module is used for acquiring storage space, storage address and stored data type information and transmitting all acquired information to the database;
the database is used for storing and managing the received data, which are called by the data storage planning module;
the data storage planning module is used for analyzing the logical relations between the data to be stored, planning the data storage mode and storing the data;
the data dynamic monitoring module is used for monitoring dynamic changes of the stored data in real time and sending an early-warning signal to the exception handling module when the data changes;
the exception handling module is used for tracing data changes and analyzing whether a storage address needs to be reserved for the data: if so, planning the location of the reserved storage address.
5. The data information storage system in a big data scenario according to claim 4, wherein the data information acquisition module comprises a storage space acquisition unit, a storage address acquisition unit and a stored data acquisition unit: the storage space acquisition unit is used for acquiring the space capacity available for data storage; the storage address acquisition unit is used for acquiring the storage addresses allocated by default in the storage space; and the stored data acquisition unit is used for acquiring the types of data to be stored.
6. The data information storage system in a big data scenario according to claim 4, wherein the data storage planning module comprises a logical relation analysis unit, an overall link planning unit, a data block judging unit and a block storage control unit: the logical relation analysis unit is used for retrieving the collected types of data to be stored, analyzing the logical relations among the data to be stored and transmitting the analysis result to the overall link planning unit; the overall link planning unit is used for setting pointers according to the logical relations between the data to be stored, constructing a chain storage structure and storing the data in chained form; the data block judging unit is used for judging, from how the data are called, whether block storage is needed, and if so, the block storage control unit controls the corresponding data to be stored in blocks.
7. The data information storage system in a big data scenario according to claim 4, wherein the data dynamic monitoring module comprises a change dynamic monitoring unit, a data call monitoring unit and a change early-warning unit: the change dynamic monitoring unit is used for monitoring changes to chain-stored data in real time, and when stored data is deleted, the change early-warning unit sends an early-warning signal to the exception handling module; the data call monitoring unit is used for monitoring calls to chain-stored data in real time: if, after stored data is deleted and before the storage link broken by the deletion is reconnected, the data pointed to by the deleted data's pointer is called, an early-warning signal is sent to the exception handling module through the change early-warning unit.
8. The data information storage system in a big data scenario according to claim 4, wherein the exception handling module comprises a data change tracing unit and an address reservation analysis unit: the data change tracing unit is used for querying the reason for a change to chain-stored data after receiving an early-warning signal of the data change; the address reservation analysis unit is used for analyzing, after receiving an early-warning signal that the data pointed to by the deleted data has been called, whether a reserved storage address needs to be set for that data, and if so, planning the location of the reserved storage address.
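The signal flow between the data dynamic monitoring module and the exception handling module in claims 4, 7 and 8 can be sketched as two cooperating objects. This is an illustrative sketch only: the class and method names, the alert labels, and the choice to alert just once per broken link are assumptions, while the $N > K/2$ trigger follows claim 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Alert:
    kind: str  # hypothetical labels: "deleted" or "pointed_data_called"
    unit: str  # storage unit concerned (a_i for a deletion, a_j for a call)


@dataclass
class ExceptionHandlingModule:
    traced: List[str] = field(default_factory=list)                # data change tracing unit
    reservation_requests: List[str] = field(default_factory=list)  # address reservation analysis unit

    def receive(self, alert: Alert) -> None:
        if alert.kind == "deleted":
            self.traced.append(alert.unit)  # query why the data was deleted
        elif alert.kind == "pointed_data_called":
            self.reservation_requests.append(alert.unit)  # analyze a reserved address


@dataclass
class DataDynamicMonitoringModule:
    handler: ExceptionHandlingModule
    total_calls: int  # K
    broken: Dict[str, int] = field(default_factory=dict)  # a_j -> calls N since the deletion

    def on_delete(self, unit_i: str, pointed_unit: Optional[str] = None) -> None:
        """Change dynamic monitoring unit: report the deletion, start watching a_j."""
        self.handler.receive(Alert("deleted", unit_i))
        if pointed_unit is not None:
            self.broken[pointed_unit] = 0

    def on_reconnect(self, unit_j: str) -> None:
        """The broken link was repaired; stop counting calls to a_j."""
        self.broken.pop(unit_j, None)

    def on_call(self, unit: str) -> None:
        """Data call monitoring unit: alert once N exceeds K/2."""
        if unit not in self.broken:
            return
        self.broken[unit] += 1
        if self.broken[unit] > self.total_calls / 2:
            self.handler.receive(Alert("pointed_data_called", unit))
            del self.broken[unit]  # alert only once per broken link (assumption)
```

With $K = 4$, deleting the data in `a1` (which pointed to `a2`) and then calling `a2` three times produces one tracing entry for `a1` and one reservation request for `a2`; reconnecting the link first suppresses the request.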

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210137567.8A CN114546268B (en) 2022-02-15 2022-02-15 Data information storage system and method under big data scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210137567.8A CN114546268B (en) 2022-02-15 2022-02-15 Data information storage system and method under big data scene

Publications (2)

Publication Number Publication Date
CN114546268A CN114546268A (en) 2022-05-27
CN114546268B true CN114546268B (en) 2024-04-19

Family

ID=81675302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210137567.8A Active CN114546268B (en) 2022-02-15 2022-02-15 Data information storage system and method under big data scene

Country Status (1)

Country Link
CN (1) CN114546268B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574201A (en) * 2016-01-05 2016-05-11 卡斯柯信号有限公司 Data formatting and file storage method based on real-time acquired data characteristic
KR102100346B1 (en) * 2019-08-29 2020-04-14 (주)프람트테크놀로지 Apparatus and method for managing dataset
CN113424144A (en) * 2019-03-12 2021-09-21 英特尔公司 Computing data storage system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113454954A (en) * 2019-01-29 2021-09-28 推特股份有限公司 Real-time event detection on social data streams
US11429730B2 (en) * 2019-11-25 2022-08-30 Duality Technologies, Inc. Linking encrypted datasets using common identifiers

Also Published As

Publication number Publication date
CN114546268A (en) 2022-05-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant