CN106886376A - A marine monitoring data copy management method based on multi-attribute optimization - Google Patents
- Publication number
- CN106886376A (application CN201710201232.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- copy
- node
- attribute
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a marine monitoring data copy management method based on multi-attribute optimization, comprising the steps of: S1: inputting marine monitoring big data; S2: creating marine monitoring big data copies; S3: selecting the rack in which a node is placed; S4: judging the storage space of the node; S5: establishing a node attribute evaluation matrix and processing the matrix; S6: establishing a weighting matrix of the key node attributes and determining the optimal solution and the worst solution; S7: calculating the relative closeness of each node to the optimal solution and selecting the node with the minimum closeness; S8: outputting the copy placement scheme of the marine monitoring big data. The advantages are that data are distributed efficiently in the cloud environment and as few copies as possible are retained while data reliability is ensured, reducing unnecessary waste of storage space; the continually changing access heat of each data item can be obtained; it can be determined when to create or delete a copy in the dynamic copy mode; and network delay is reduced, improving the overall stability of the system.
Description
Technical Field
The invention relates to the technical field of data monitoring, in particular to a marine monitoring data copy management method based on multi-attribute optimization.
Background
With the rapid development of marine observation and forecasting technology and the explosive growth of the related data volume, marine big data is becoming one of the important applications of scientific big data. Marine big data mainly comprises observation data such as radar and satellite data, numerical forecasting model results, forecast products and the like. According to previous studies on the volume of marine data, the total amount of marine data worldwide was about 25 PB in 2014 and is expected to reach 275 PB in 2030. To better support the processing and analysis of real-time marine monitoring big data, a distributed storage mode in a cloud environment is needed. However, in the complex network environment of a cloud storage system, data stored as a single copy is often affected by the network and by rack storage nodes, so that data transmission is delayed or data access even fails. Adding copies of the data can therefore increase the stability of the system and improve the performance and efficiency of data access. It is thus necessary to design a copy management method for marine monitoring big data.
Replication technology copies a data item into multiple copies and stores them on multiple nodes of a distributed system, so as to improve the reliability, load balance and access rate of the system; it is the most common and important data management mechanism. However, while improving system performance in many respects, replication also brings a series of management and overhead problems. Researchers have proposed a variety of copy management strategies.
One is a static distributed data replication algorithm for GFS, which decides the placement of copies of a data block from three aspects: 1) placing the new copy on a data server whose disk space utilization is lower than the average; 2) limiting the number of copies each data server has recently created; 3) distributing different copies of the data block to different racks. The disadvantage of this algorithm is that it assigns a fixed number of copies to all data in the system, which is not an optimal choice for the data.
Another is CIR, a cost-effective dynamic data replication strategy for cloud data centers, which adopts an incremental replication mode so that the number of copies is minimized while the system reliability requirement is met. The method can reduce data storage costs, especially when the data only needs to be stored for a brief period or has a relatively low availability requirement. However, the approach is based only on reliability parameters and the price model of Amazon S3, making it unsuitable for Google clusters, which have higher failure rates than Amazon S3 storage units.
Chinese patent CN201610912503.5, published on 2017.03.15, discloses a method for accessing distributed block data, which includes: obtaining a data block to be accessed according to a received data access request; acquiring at least two node servers on which at least two copies of the data block to be accessed are respectively located; dividing at least two numerical intervals according to the load values of the at least two node servers; generating a random number within the at least two numerical intervals; selecting a node server from the at least two node servers according to the random number and the at least two numerical intervals; and accessing a copy of the data block to be accessed from the selected node server. The method can balance the load of the node servers, but the number of copies is fixed and cannot change as data access changes. Although load balance can be ensured, the data cannot be efficiently distributed in a cloud environment, so storage space is wasted.
Chinese invention patent CN201580024098.7, published on 2017.02.22, discloses a distributed remote data storage access method, comprising: storing user data in a first portion of the data storage of a first network attached storage device (NAS); enabling file sharing functionality of the first NAS; designating a second portion of the data storage of the first NAS for shared data storage in conjunction with the enabling; providing a copy of at least a portion of the user data to a second NAS for storage therein; receiving a copy of third party data from the second NAS; storing the copy of the third party data in the second portion of the data storage of the first NAS; downloading at least a portion of the user data from the first NAS; and downloading at least a portion of the copy of the user data from the second NAS. The method distributes data among one or more data storage devices and can provide increased data security or data access through data redundancy, but it is suitable only for storing and accessing small amounts of data; it cannot be used to store and access large amounts of data, and cannot provide efficiency and security when large amounts of data are accessed.
Chinese invention patents CN201610862411.0 and CN201610822923.4, published on 2017.02.22, respectively disclose a data storage method and a data reading method. The data storage method comprises: performing a data writing operation; acquiring a pre-stored data version number and updating the data version number; checking whether the data version numbers of the copies of the data are consistent; and, if they are inconsistent, selecting the copy with the most complete data to replace the other copies. The data reading method comprises: the server on which the primary copy is located receiving a data reading request from a user; selecting the server with the smallest load according to the load information of the servers holding all the copies; if the server on which the primary copy is located is not the server with the smallest load, sending the data reading request to the server on which the standby copy with the smallest load is located, so that that server performs the data reading operation; and receiving the data read by that server and displaying the data to the user. The data storage method reduces data writing delay and improves block storage performance; the data reading method balances the load of the servers on which the copies are located, improves data reading efficiency, and solves the prior-art problems of wasted resources on the server holding the standby copy and overload on the server holding the primary copy. However, these methods cannot dynamically change the number of copies so as to distribute data efficiently across servers and reduce the waste of storage space.
Copies of data not only improve the reliability and security of the data, but also balance the load to a certain extent and improve the response speed of data access. Replication has therefore become a key technology for marine big data storage and management. The purpose of creating copies is to improve the efficiency and reliability of the system, and selecting an appropriate location to store a copy after it is created is equally important. A cloud environment for storing marine big data is characterized by a large number of storage nodes, complex and changeable users, and large differences in node performance. Given these characteristics, efficiently distributing data objects in the cloud environment while keeping as few copies as possible, on the premise of ensuring data reliability, so as to reduce unnecessary waste of storage space, is a challenging problem.
Therefore, there is a need for a copy management method that can efficiently distribute data objects in a cloud environment, keep as few copies as possible while ensuring data reliability, reduce unnecessary waste of storage space, and dynamically change the number of copies according to their access heat. No such method has been reported so far.
Disclosure of Invention
The invention aims to provide a marine monitoring data copy management method based on multi-attribute optimization, aiming at the defects in the prior art.
In order to achieve the purpose, the invention adopts the technical scheme that:
A marine monitoring data copy management method based on multi-attribute optimization comprises the following steps:
S1: inputting ocean monitoring big data;
S2: creating ocean monitoring big data copies;
S3: selecting the rack in which a node is placed;
S4: judging the storage space of the node;
S5: establishing a node attribute evaluation matrix, and processing the matrix;
S6: establishing a weighting matrix of the key attributes of the nodes, and determining an optimal solution and a worst solution;
S7: calculating the relative closeness of each node to the optimal solution, and selecting the node with the minimum closeness;
S8: outputting a copy placement scheme for the ocean monitoring big data;
wherein step S5 establishes the node attribute evaluation matrix by the TOPSIS method, and the key attributes in step S6 include node access volume, node response ratio and node bandwidth.
As a preferred technical solution, the work flow of step S2 is:
S21: calculating the data heat of the ocean monitoring big data;
S22: calculating the copy heat of the ocean monitoring big data;
S23: calculating the number of copies of the ocean monitoring big data;
S24: creating the ocean monitoring big data copies.
As a preferred technical solution, the data heat in step S21 is a weighted average of the access frequency of the data in each period, denoted Data_h, and is calculated as shown in (1):
$$\alpha + \beta = 1$$
$$Data\_h_0 = 0$$
$$Data\_h_i = \alpha \, Data\_h_{i-1} + \beta f_i \qquad (1)$$
wherein $Data\_h_0$ is the heat value when the data is initially created, and the initial heat is set to 0 for convenience of calculation; $Data\_h_i$ is the heat value of the data at the end of the i-th period; $f_i$ is the access frequency of the data recorded by the monitoring process during the i-th period; $\alpha$ is the historical access factor for the (i-1)-th period; and $\beta$ is the access factor for the i-th period.
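The recurrence in formula (1) can be sketched as follows. This is a minimal illustration only: the function names `update_heat` and `heat_history` and the default value of `alpha` are assumptions made for the example and do not appear in the patent.

```python
def update_heat(prev_heat: float, freq: float, alpha: float = 0.4) -> float:
    """One step of formula (1): Data_h_i = alpha * Data_h_{i-1} + beta * f_i."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    return alpha * prev_heat + beta * freq


def heat_history(freqs, alpha: float = 0.4):
    """Return Data_h_1 .. Data_h_n for per-period access frequencies f_1 .. f_n."""
    heat = 0.0  # Data_h_0 = 0, as in formula (1)
    history = []
    for f in freqs:
        heat = update_heat(heat, f, alpha)
        history.append(heat)
    return history
```

A larger `alpha` makes the heat value react more slowly to a sudden change in access frequency, because more weight is given to the history.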
As a preferred technical solution, the copy heat in step S22 is calculated as follows: assuming that the data has g copies in the cloud storage system, the storage nodes distribute client requests as evenly as possible across the copies, and the heat of each copy is calculated as shown in (2):
$$DReplica\_h_i = \frac{Data\_h_i}{g} = \frac{\alpha \, Data\_h_{i-1} + \beta f_i}{g} \qquad (2)$$
As a preferred technical solution, calculating the number of copies in step S23 includes calculating a minimum copy value and a maximum copy value, where the minimum copy value is calculated according to the reliability requirement of the user, as shown in (3):
$$R \le (1 - C)(1 - L^a) \qquad (3)$$
wherein R is the reliability requirement of a user for certain data, a is the minimum number of copies, L is the failure rate of a storage node (L may also be taken as the failure rate of a single copy), C is the environment failure rate caused by various uncertain factors in the cloud environment, and $(1 - C)(1 - L^a)$ is the probability that the data is successfully accessed;
the maximum copy value is calculated according to the consistency overhead of system operation, as shown in (4):
$$b \le \frac{W}{h \times v} \qquad (4)$$
wherein b is the maximum number of copies, h is the size of the updated copy content, v is the frequency with which users update the copy, and W is the traffic per unit time caused by maintaining copy consistency.
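As a rough sketch of how formulas (3) and (4) bound the number of copies, the helpers below search for the smallest integer a satisfying (3) and take the largest integer b satisfying (4). The function names, the argument checks and the example units are assumptions for illustration; the patent does not prescribe how the inequalities are solved.

```python
import math


def min_replicas(R: float, L: float, C: float) -> int:
    """Smallest integer a with R <= (1 - C) * (1 - L**a), per formula (3)."""
    if not (0.0 < L < 1.0 and 0.0 <= C < 1.0):
        raise ValueError("expected 0 < L < 1 and 0 <= C < 1")
    if not 0.0 <= R < 1.0 - C:
        raise ValueError("R must be below 1 - C, otherwise no a satisfies (3)")
    a = 1
    while (1.0 - C) * (1.0 - L ** a) < R:
        a += 1  # each extra copy lowers the chance that all copies fail
    return a


def max_replicas(W: float, h: float, v: float) -> int:
    """Largest integer b with b <= W / (h * v), per formula (4)."""
    if h <= 0 or v <= 0:
        raise ValueError("h and v must be positive")
    return math.floor(W / (h * v))
```

For instance, with a hypothetical reliability requirement R = 0.99, node failure rate L = 0.1 and environment failure rate C = 0.001, two copies give 0.999 × 0.99 ≈ 0.989, which is just short of R, so three copies are needed.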
As a preferred technical solution, the specific process of creating the copy in step S24 is as follows:
S241: calculating the copy heat of the ocean monitoring big data;
S242: judging whether the copy heat value lies between the decrease threshold $threshold_2$ and the increase threshold $threshold_1$; if it is greater than the increase threshold $threshold_1$, proceeding to step S243; if it is less than the decrease threshold $threshold_2$, proceeding to step S245; and if it lies between $threshold_2$ and $threshold_1$, proceeding to step S247;
S243: judging whether the number of copies is larger than the maximum copy value b; if so, proceeding to step S247, and if not, proceeding to step S244;
S244: adding one copy, increasing the number of copies by 1, and repeating steps S241-S242;
S245: judging whether the number of copies is smaller than the minimum copy value a; if so, proceeding to step S247, and if not, proceeding to step S246;
S246: deleting one copy, decreasing the number of copies by 1, and repeating steps S241-S242;
S247: outputting the ocean monitoring big data copies.
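The loop formed by steps S241 to S247 can be sketched as below. This is an illustrative reading, not the patent's implementation: the name `adjust_replicas` is hypothetical, the bounds are applied so that the count never leaves the range [a, b], and a direction guard (an added assumption) prevents the loop from alternating between adding and deleting a copy.

```python
def adjust_replicas(data_heat: float, g: int, a: int, b: int,
                    t_inc: float, t_dec: float) -> int:
    """Return the copy count after applying steps S241-S247.

    g is the current number of copies, a and b the minimum and maximum,
    t_inc the increase threshold and t_dec the decrease threshold.
    """
    if g < 1 or a < 1 or a > b:
        raise ValueError("expected g >= 1 and 1 <= a <= b")
    if t_dec >= t_inc:
        raise ValueError("decrease threshold must be below increase threshold")
    direction = 0  # +1 after adding, -1 after deleting (assumed guard)
    while True:
        replica_heat = data_heat / g  # S241, using formula (2)
        if replica_heat > t_inc and g < b and direction >= 0:
            g += 1            # S243 -> S244: add one copy
            direction = 1
        elif replica_heat < t_dec and g > a and direction <= 0:
            g -= 1            # S245 -> S246: delete one copy
            direction = -1
        else:
            return g          # S247: output the current copies
```

Because each added copy lowers the per-copy heat, the loop stops either when the heat falls between the two thresholds or when a bound is reached.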
As a preferred technical solution, the specific process in step S5 is to establish an attribute evaluation matrix by the TOPSIS method: given r alternatives, each with s attributes, where $h_{ij}$ denotes the j-th attribute of the i-th alternative, an attribute evaluation matrix H is established, and the matrix H is normalized to obtain a matrix G, as shown in (5):
wherein,
As a preferred technical solution, the specific process of step S6 is as follows:
S61: selecting key attributes including node access volume, node response ratio and node bandwidth;
S62: establishing a weighting matrix WG, as shown in (6):
wherein, in $wg_{ij}$, $w_j$ is the weight of the j-th column key attribute among all the attributes, s is the number of attributes of each alternative, i = 1, 2, ..., r, and j = 1, 2, ..., s;
S63: determining the optimal solution $G^+$ and the worst solution $G^-$ of the attribute evaluation matrix, where the optimal solution $G^+$ is the set of the maximum values of each column and the worst solution $G^-$ is the set of the minimum values of each column, as shown in (7) and (8) respectively:
$$H^+ = (h_1^+, h_2^+, \dots, h_s^+) \qquad (7)$$
$$H^- = (h_1^-, h_2^-, \dots, h_s^-) \qquad (8)$$
As a preferred technical solution, the specific process of step S7 is as follows:
S71: calculating the distance from each node to the optimal solution and to the worst solution, as shown in (9) and (10):
S72: calculating the relative closeness of each node to the optimal solution, as shown in (11), and then selecting the node with the minimum closeness by sorting,
wherein $U_i$ is the relative closeness of the i-th node to the optimal solution.
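Steps S5 to S7 can be illustrated with the sketch below. Formulas (5), (6) and (9) to (11) are not reproduced in this text, so the sketch fills them with commonly used TOPSIS choices, all of which are assumptions: vector normalization of each column, Euclidean distances, and a closeness value U_i = D_i⁺ / (D_i⁺ + D_i⁻), chosen so that the node nearest the optimal solution has the minimum U_i, matching step S72. It also assumes every attribute has been oriented so that a larger value is better. The name `topsis_rank` is hypothetical.

```python
import math


def topsis_rank(H, weights):
    """Return (index of chosen node, list of U_i) for an r x s matrix H.

    Assumes all entries of H are positive and larger values are better.
    """
    r, s = len(H), len(H[0])
    if len(weights) != s:
        raise ValueError("one weight per attribute is required")
    # S5 (assumed form): vector-normalize each column of H to obtain G
    norms = [math.sqrt(sum(H[i][j] ** 2 for i in range(r))) for j in range(s)]
    if any(n == 0 for n in norms):
        raise ValueError("every attribute column needs a non-zero entry")
    # S62: weighted matrix WG, wg_ij = w_j * g_ij
    WG = [[weights[j] * H[i][j] / norms[j] for j in range(s)] for i in range(r)]
    # S63: optimal solution = column maxima, worst solution = column minima
    best = [max(row[j] for row in WG) for j in range(s)]
    worst = [min(row[j] for row in WG) for j in range(s)]
    U = []
    for row in WG:
        d_plus = math.dist(row, best)    # S71 (assumed Euclidean distance)
        d_minus = math.dist(row, worst)
        total = d_plus + d_minus
        U.append(d_plus / total if total > 0 else 0.0)  # S72 (assumed form)
    chosen = min(range(r), key=U.__getitem__)
    return chosen, U
```

With this definition, a node that equals the optimal solution in every attribute gets U_i = 0 and a node that equals the worst solution gets U_i = 1.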
The invention has the advantages that:
1. Data are efficiently distributed in the cloud environment, and as few copies as possible are retained while data reliability is ensured, so unnecessary waste of storage space is reduced;
2. The continually changing access heat of each data item can be obtained;
3. It can be determined when to create or delete a copy in the dynamic copy mode;
4. Network delay is reduced, and the overall stability of the system is improved.
Drawings
FIG. 1 is a flow chart of the marine monitoring data copy management method based on multi-attribute optimization according to the invention.
FIG. 2 is a copy creation flow chart of the marine monitoring data copy management method based on multi-attribute optimization according to the invention.
Detailed Description
The following detailed description of the present invention will be made with reference to the accompanying drawings.
Example 1
Referring to fig. 1, the method for managing marine monitoring data copies based on multi-attribute optimization according to the present invention includes the following steps:
S1: inputting ocean monitoring big data;
S2: creating ocean monitoring big data copies;
S3: selecting the rack in which a node is placed;
S4: judging the storage space of the node;
S5: establishing a node attribute evaluation matrix, and processing the matrix;
S6: establishing a weighting matrix of the key attributes of the nodes, and determining an optimal solution and a worst solution;
S7: calculating the relative closeness of each node to the optimal solution, and selecting the node with the minimum closeness;
S8: outputting a copy placement scheme for the ocean monitoring big data;
wherein step S5 establishes the node attribute evaluation matrix by the TOPSIS method, and the key attributes in step S6 include node access volume, node response ratio and node bandwidth.
Example 2
The invention relates to a marine monitoring data copy management method based on multi-attribute optimization, which comprises the following specific implementation steps:
S1: Inputting ocean monitoring big data
Inputting the ocean monitoring big data into a cloud storage system;
S2: Creating ocean monitoring big data copies
S21: calculating data heat of ocean monitoring big data
Ocean monitoring big data is characterized by massive volume, real-time requirements, diversity and the like. Different data differ in popularity, and even the same data may differ in popularity across time periods. The data heat attribute is therefore used to measure the popularity of the data, and the heat of each data item is obtained by calculation. The higher the heat of the data, the more popular the data is. If more copies are created for highly popular data, the copies spread the requests for that data and prevent system hot spots; in addition, placing the copies sensibly lets user requests be served from a nearby copy where possible, which reduces network transmission delay and improves the response speed of the system. Conversely, fewer copies are created for data with low popularity, so that copy redundancy is reduced as far as possible while data reliability is ensured, which saves system storage space to a certain extent.
In a cloud environment, a monitoring process is set up for the storage nodes and is responsible for counting how often users access each data item in different time periods. Although the popularity of the data in the next time period cannot be predicted exactly, analysis of data access patterns shows that data accessed frequently in the recent past tends to be accessed again in the following period. It can therefore be assumed that the access request pattern of the data in the next period will be substantially the same as in the preceding few periods.
The data heat is a weighted average of the access frequency of the data in each period, denoted Data_h. The life cycle of each data item is counted from the moment the data is created, at which point the monitoring process starts to monitor the number of accesses to the data, and the corresponding heat is calculated from the access frequency of the data in each period. Two factors influence the heat value of the data at the end of a period: the heat value of the data at the end of the previous period, and the access frequency of the data in the current period. The calculation formula is shown in (1):
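A monitoring process of this kind could be sketched as a simple per-period counter. The class name `AccessMonitor`, its methods and the `period_length` parameter are hypothetical names chosen for illustration; the patent does not describe the monitoring process at this level of detail.

```python
from collections import Counter


class AccessMonitor:
    """Counts accesses per data item and reports frequencies per period."""

    def __init__(self):
        self._counts = Counter()

    def record(self, data_id: str) -> None:
        """Register one access to the given data item."""
        self._counts[data_id] += 1

    def end_period(self, period_length: float = 1.0) -> dict:
        """Return access frequency per data item, then reset the counters."""
        freqs = {k: n / period_length for k, n in self._counts.items()}
        self._counts.clear()
        return freqs
```

The frequencies returned at the end of each period play the role of f_i when the heat value of a data item is updated.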
$$\alpha + \beta = 1$$
$$Data\_h_0 = 0$$
$$Data\_h_i = \alpha \, Data\_h_{i-1} + \beta f_i \qquad (1)$$
wherein $Data\_h_0$ is the initial heat value when the data is created, and the initial heat is set to 0 for convenience of calculation; $Data\_h_i$ is the heat value of the data at the end of the i-th period; $f_i$ is the access frequency of the data recorded by the monitoring process during the i-th period; $\alpha$ is the historical access factor for the (i-1)-th period; and $\beta$ is the access factor for the i-th period.
S22: calculating copy heat of ocean monitoring big data
The copy heat is calculated as follows: assuming that the data has g copies in the cloud storage system, the storage nodes distribute client requests as evenly as possible across the copies, and the heat of each copy is calculated as shown in (2):
$$DReplica\_h_i = \frac{Data\_h_i}{g} = \frac{\alpha \, Data\_h_{i-1} + \beta f_i}{g} \qquad (2)$$
S23: calculating the number of copies of ocean monitoring big data
The number of copies of the ocean monitoring big data is continuously changed in the life cycle of the ocean monitoring big data, and if the number of the copies is too small, the usability of the data cannot be better ensured; if the number of copies is too large, the storage space of the system is wasted. Therefore, the minimum value of the copies is calculated according to the reliability requirement of the user, and the maximum value of the number of the copies is calculated according to the consistency overhead allowed by the system.
S231: calculating the copy minimum value of the ocean monitoring big data
In a cloud environment, the reliability and the effectiveness of data are key factors influencing the minimum value of the number of copies, and a user can put forward different reliability requirements according to the importance of the data, wherein the requirements represent the success rate of access required by the user. In addition, in a cloud environment, there are generally many storage nodes, and a node failure is considered to be a normal condition.
The minimum copy value is calculated according to the reliability requirement of the user, as shown in (3):
$$R \le (1 - C)(1 - L^a) \qquad (3)$$
wherein R is the reliability requirement of a user for certain data, a is the minimum number of copies, L is the failure rate of a storage node (L may also be taken as the failure rate of a single copy), C is the environment failure rate caused by various uncertain factors in the cloud environment, and $(1 - C)(1 - L^a)$ is the probability that the data is successfully accessed.
S232: calculating the maximum value of the copy of the ocean monitoring big data
The maximum value of the number of the copies is mainly determined by the overhead of copy consistency maintenance, and the greater the maximum value of the copies is, the greater the difficulty and the overhead of the copy consistency maintenance are. Each update of the copy by the system causes data traffic within the system, so the overhead of consistency maintenance is measured by the data traffic caused by the update of the copy per unit time.
The maximum copy value is calculated according to the consistency overhead of system operation, as shown in (4):
$$b \le \frac{W}{h \times v} \qquad (4)$$
wherein b is the maximum number of copies, h is the size of the updated copy content, v is the frequency with which users update the copy, and W is the traffic per unit time caused by maintaining copy consistency.
S24: ocean monitoring big data copy creation
The copy heat of the data reflects how intensively users are accessing the data at the current time. When the access intensity is too high, a copy can be added; the new copy takes over part of the requests for the object, which greatly reduces user access delay, saves network bandwidth and effectively improves system load balance. When the access intensity is too low, the number of copies can be reduced, which avoids wasting storage resources and improves utilization. Therefore, for the copy heat of each data item, computed from $Data\_h_i$, an increase threshold $threshold_1$ and a decrease threshold $threshold_2$ are set, which are used to determine when to create or delete a copy in the dynamic copy mode.
Referring to Fig. 2, the process of creating copies of the ocean monitoring big data is as follows:
S241: calculating the copy heat of the ocean monitoring big data;
S242: comparing the copy heat with the decrease threshold threshold_2 and the increase threshold threshold_1: if it is greater than the increase threshold threshold_1, proceeding to step S243; if it is less than the decrease threshold threshold_2, proceeding to step S245; and if it lies between threshold_2 and threshold_1, proceeding to step S247;
S243: judging whether the number of copies is greater than the maximum number of copies b; if so, proceeding to step S247, and if not, proceeding to step S244;
S244: adding one copy, increasing the number of copies by 1, and repeating steps S241-S242;
S245: judging whether the number of copies is smaller than the minimum number of copies a; if so, proceeding to step S247, and if not, proceeding to step S246;
S246: deleting one copy, decreasing the number of copies by 1, and repeating steps S241-S242;
S247: outputting the ocean monitoring big data copies.
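Steps S241-S247 can be sketched as a simple loop. This is a hedged illustration, not the patented implementation: the function name `adjust_replicas` and its parameters are hypothetical, the copy heat is taken as the data heat divided by the current number of copies, the bound checks use >= and <= so that the count never leaves the range [a, b], and the `max_steps` guard is an added assumption to stop the loop if the two thresholds are so close that adding and deleting a copy would alternate forever.

```python
def adjust_replicas(data_heat: float, g: int, a: int, b: int,
                    inc_threshold: float, dec_threshold: float,
                    max_steps: int = 1000) -> int:
    """Return the adjusted number of copies for one data item.

    data_heat: current heat of the data item
    g: current number of copies (at least 1)
    a, b: minimum and maximum number of copies
    inc_threshold, dec_threshold: threshold_1 and threshold_2
    """
    if g < 1 or a > b or dec_threshold > inc_threshold:
        raise ValueError("inconsistent arguments")

    for _ in range(max_steps):
        replica_heat = data_heat / g          # S241: heat per copy
        if replica_heat > inc_threshold:      # S242 -> S243
            if g >= b:                        # already at the maximum
                break                         # S247
            g += 1                            # S244: add one copy
        elif replica_heat < dec_threshold:    # S242 -> S245
            if g <= a:                        # already at the minimum
                break                         # S247
            g -= 1                            # S246: delete one copy
        else:
            break                             # S247: heat within thresholds
    return g


# Hypothetical values: a hot data item starting with 2 copies.
print(adjust_replicas(data_heat=100, g=2, a=2, b=6,
                      inc_threshold=30, dec_threshold=10))  # prints 4
```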
S3: selecting a rack for placing nodes;
S4: judging the storage space of the node
The storage space of the node is checked; if the space is sufficient, the attributes of the node are acquired.
S5: establishing a node attribute evaluation matrix and processing the matrix
An attribute evaluation matrix is established using the TOPSIS method. There are r alternatives, each with s attributes, and h_ij denotes the j-th attribute of the i-th alternative. The attribute evaluation matrix H is established and then normalized to obtain the matrix G, as shown in (5):
wherein,
S6: establishing a weighting matrix of the key attributes of the nodes, and determining the optimal solution and the worst solution
S61: selecting key attributes including node access amount, node response ratio and node bandwidth
The node access amount is the number of accesses to the node per unit time; lower is better, because it avoids excessive concentration of users' read and write operations. The node response ratio represents the availability of the node and is expressed as the proportion of correct responses to total requests; higher is better, because it indicates more stable service performance. The node bandwidth is the highest data rate at which data can be transmitted from one node to another per unit time in network transmission; higher is better, because it facilitates users' reads and writes.
S62: establishing a weighting matrix WG, as shown in (6):
wherein w_j in wg_ij is the weight of the j-th column key attribute among all the attributes, s is the number of attributes of each alternative, i = 1, 2, …, r, and j = 1, 2, …, s;
S63: determining the optimal solution G^+ and the worst solution G^- of the attribute evaluation matrix, wherein the optimal solution G^+ is the set of the maxima of each column and the worst solution G^- is the set of the minima of each column; the calculation formulas are shown in (7) and (8), respectively:
H^+ = (h_1^+, h_2^+, …, h_s^+) (7)
H^- = (h_1^-, h_2^-, …, h_s^-) (8)
S7: calculating the relative closeness of each node to the optimal solution, and selecting the node with the minimum closeness
S71: calculating the distance from each node to the optimal solution and to the worst solution, wherein the calculation formulas are shown in (9) and (10):
S72: calculating the relative closeness of each node to the optimal solution, wherein the calculation formula is shown in (11), and then selecting the node with the minimum closeness by sorting,
wherein U_i is the relative closeness of the i-th node to the optimal solution.
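Steps S5 to S7 follow the general shape of a TOPSIS ranking, which can be sketched as below. Because formulas (5), (6) and (9) to (11) are not reproduced in this text, every formula in the sketch is an assumption taken from the textbook TOPSIS procedure: vector normalization of each column, multiplication by a per-attribute weight, Euclidean distances, and a closeness defined as D+ / (D+ + D-) so that the node with the minimum value is nearest to the optimal solution. The `benefit` flags, which treat node access amount as a lower-is-better attribute, as well as the function name, weights and node values, are likewise hypothetical.

```python
import math


def topsis_closeness(H, weights, benefit):
    """Return U_i = D+_i / (D+_i + D-_i) for every row (node) of H.

    H: r x s list of attribute values, one row per candidate node
    weights: s attribute weights
    benefit: s booleans, True if a larger attribute value is better
    """
    r, s = len(H), len(H[0])

    # Assumed normalization: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(H[i][j] ** 2 for i in range(r))) for j in range(s)]
    if any(n == 0 for n in norms):
        raise ValueError("an attribute column is all zeros")

    # Weighted normalized matrix WG.
    WG = [[weights[j] * H[i][j] / norms[j] for j in range(s)] for i in range(r)]

    # Optimal and worst solutions, column by column.
    cols = list(zip(*WG))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    closeness = []
    for row in WG:
        d_plus = math.dist(row, best)    # distance to the optimal solution
        d_minus = math.dist(row, worst)  # distance to the worst solution
        total = d_plus + d_minus
        closeness.append(d_plus / total if total > 0 else 0.0)
    return closeness


# Hypothetical nodes: (access amount, response ratio, bandwidth).
nodes = [[120, 0.99, 100], [300, 0.95, 100], [80, 0.90, 50]]
U = topsis_closeness(nodes, weights=[0.4, 0.3, 0.3],
                     benefit=[False, True, True])
chosen = min(range(len(U)), key=U.__getitem__)  # node with minimum closeness
print(chosen, U)
```

With this definition a smaller U_i means the node lies nearer to the optimal solution, which matches the selection of the minimum in step S72; many textbook presentations instead use D- / (D+ + D-) and select the maximum.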
S8: outputting the copy layout scheme of the ocean monitoring big data
The copy layout scheme is obtained from the nodes selected in step S7, and the copy layout scheme of the ocean monitoring big data is output.
The marine monitoring data copy management method based on multi-attribute optimization has the following advantages: data is stored efficiently in the cloud environment; as few copies as possible are retained while data reliability is ensured, which reduces unnecessary waste of storage space; the constantly changing access heat of each data item can be obtained; the time at which a copy is created or deleted in the dynamic replica mode can be determined; and network delay is reduced and the overall stability of the system is improved.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and additions without departing from the method of the present invention, and such modifications and additions shall also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A marine monitoring data copy management method based on multi-attribute optimization is characterized by comprising the following steps:
S1: inputting ocean monitoring big data;
S2: creating ocean monitoring big data copies;
S3: selecting a rack for placing nodes;
S4: judging the storage space of the node;
S5: establishing a node attribute evaluation matrix, and processing the matrix;
S6: establishing a weighting matrix of the key attributes of the nodes, and determining an optimal solution and a worst solution;
S7: calculating the relative closeness of each node to the optimal solution, and selecting the node with the minimum closeness;
S8: outputting a copy layout scheme of the ocean monitoring big data;
wherein step S5 establishes the node attribute evaluation matrix by the TOPSIS method, and the key attributes in step S6 include node access amount, node response ratio and node bandwidth.
2. The marine monitoring data copy management method based on multi-attribute optimization according to claim 1, wherein the working process of step S2 is as follows:
S21: calculating the data heat of the ocean monitoring big data;
S22: calculating the copy heat of the ocean monitoring big data;
S23: calculating the number of copies of the ocean monitoring big data;
S24: creating the ocean monitoring big data copies.
3. The marine monitoring data copy management method based on multi-attribute optimization according to claim 2, wherein the data heat in step S21 is a weighted average of the frequency with which the data is accessed in each period, denoted Data_h, and the calculation formula is shown in (1):
α + β = 1
Data_h_0 = 0
Data_h_i = α Data_h_{i-1} + β f_i (1)
wherein Data_h_0 is the heat value when the data is initially created, and the initial heat value is set to 0 for ease of calculation; Data_h_i is the heat value of the data at the end of the i-th period; f_i is the access frequency of the data in the i-th period as recorded by the monitoring process; α is the historical access factor, applied to the heat of period i-1; and β is the access factor of the i-th period.
4. The marine monitoring data copy management method based on multi-attribute optimization according to claim 2, wherein the copy heat in step S22 is calculated as follows: assuming that the data has g copies in the cloud storage system and that the storage nodes distribute client requests across the copies as evenly as possible, the heat of each copy is calculated as shown in (2):
DReplica_h_i = Data_h_i / g = (α Data_h_{i-1} + β f_i) / g (2)
5. The marine monitoring data copy management method based on multi-attribute optimization according to claim 2, wherein calculating the number of copies in step S23 comprises calculating a minimum number of copies and a maximum number of copies, the minimum number of copies being calculated according to the reliability requirement of the user, and the calculation formula is shown in (3):
R ≤ (1-C)(1-L^a) (3)
wherein R is the reliability a user requires for a given data item, a is the minimum number of copies, L is the failure rate of a storage node (L can also be the failure rate of a single copy), C is the environment failure rate caused by various uncertain factors in the cloud environment, and (1-C)(1-L^a) is the probability that the data item is accessed successfully;
the maximum number of copies is calculated from the consistency overhead of system operation, and the calculation formula is shown in (4):
b ≤ W/(h×v) (4)
wherein b is the maximum number of copies, h is the size of the updated copy content, v is the frequency at which users update the copy, and W is the traffic caused by maintaining copy consistency per unit time.
6. The marine monitoring data copy management method based on multi-attribute optimization according to claim 2, wherein the specific process of copy creation in step S24 is as follows:
S241: calculating the copy heat of the ocean monitoring big data;
S242: comparing the copy heat with the decrease threshold threshold_2 and the increase threshold threshold_1: if it is greater than the increase threshold threshold_1, proceeding to step S243; if it is less than the decrease threshold threshold_2, proceeding to step S245; and if it lies between threshold_2 and threshold_1, proceeding to step S247;
S243: judging whether the number of copies is greater than the maximum number of copies b; if so, proceeding to step S247, and if not, proceeding to step S244;
S244: adding one copy, increasing the number of copies by 1, and repeating steps S241-S242;
S245: judging whether the number of copies is smaller than the minimum number of copies a; if so, proceeding to step S247, and if not, proceeding to step S246;
S246: deleting one copy, decreasing the number of copies by 1, and repeating steps S241-S242;
S247: outputting the ocean monitoring big data copies.
7. The marine monitoring data copy management method based on multi-attribute optimization according to claim 1, wherein the specific process of step S5 is as follows: an attribute evaluation matrix is established using the TOPSIS method, wherein there are r alternatives, each with s attributes, and h_ij denotes the j-th attribute of the i-th alternative; the attribute evaluation matrix H is established and then normalized to obtain the matrix G, as shown in (5):
wherein,
8. The marine monitoring data copy management method based on multi-attribute optimization according to claim 1, wherein the specific process of step S6 is as follows:
S61: selecting key attributes including node access amount, node response ratio and node bandwidth;
S62: establishing a weighting matrix WG, as shown in (6):
wherein w_j in wg_ij is the weight of the j-th column key attribute among all the attributes, s is the number of attributes of each alternative, i = 1, 2, …, r, and j = 1, 2, …, s;
S63: determining the optimal solution G^+ and the worst solution G^- of the attribute evaluation matrix, wherein the optimal solution G^+ is the set of the maxima of each column and the worst solution G^- is the set of the minima of each column; the calculation formulas are shown in (7) and (8), respectively:
H^+ = (h_1^+, h_2^+, …, h_s^+) (7)
H^- = (h_1^-, h_2^-, …, h_s^-) (8)
9. The marine monitoring data copy management method based on multi-attribute optimization according to claim 1, wherein the specific process of step S7 is as follows:
S71: calculating the distance from each node to the optimal solution and to the worst solution, wherein the calculation formulas are shown in (9) and (10):
S72: calculating the relative closeness of each node to the optimal solution, wherein the calculation formula is shown in (11), and then selecting the node with the minimum closeness by sorting,
wherein U_i is the relative closeness of the i-th node to the optimal solution.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710201232.7A CN106886376B (en) | 2017-03-30 | 2017-03-30 | A kind of marine monitoring data copy management method optimized based on more attributes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106886376A true CN106886376A (en) | 2017-06-23 |
CN106886376B CN106886376B (en) | 2019-08-30 |
Family
ID=59182458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710201232.7A Active CN106886376B (en) | 2017-03-30 | 2017-03-30 | A kind of marine monitoring data copy management method optimized based on more attributes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106886376B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295117A (en) * | 2013-05-20 | 2013-09-11 | 东南大学 | Method and system for matching tractor and trailer |
CN103425756A (en) * | 2013-07-31 | 2013-12-04 | 西安交通大学 | Copy management strategy for data blocks in HDFS |
CN103997512A (en) * | 2014-04-14 | 2014-08-20 | 南京邮电大学 | Data duplicate quantity determination method for cloud storage system |
CN105787269A (en) * | 2016-02-25 | 2016-07-20 | 三明学院 | Heterogeneous multi-attribute variable-weight decision-making method based on regret theory |
CN106095336A (en) * | 2016-06-10 | 2016-11-09 | 北京银信长远科技股份有限公司 | Independent weight factor and the method for velocity factor is set for data trnascription |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522151A (en) * | 2017-09-15 | 2019-03-26 | 北京京东尚科信息技术有限公司 | Method and device for data redundancy storage |
CN109697018A (en) * | 2017-10-20 | 2019-04-30 | 北京京东尚科信息技术有限公司 | The method and apparatus for adjusting memory node copy amount |
CN110187072A (en) * | 2019-04-04 | 2019-08-30 | 沈阳大学 | A kind of methods of water environment quality assessment based on exchange premium degree model |
CN115033187A (en) * | 2022-08-10 | 2022-09-09 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
CN115033187B (en) * | 2022-08-10 | 2022-11-08 | 蓝深远望科技股份有限公司 | Big data based analysis management method |
Also Published As
Publication number | Publication date |
---|---|
CN106886376B (en) | 2019-08-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |