CN116683916A

CN116683916A - Disaster recovery system of data center

Info

Publication number: CN116683916A
Application number: CN202310966546.1A
Authority: CN
Inventors: 秦丽娟; 尤沛; 姚新美; 孙艺梦; 刘晓森
Original assignee: Shandong Wukesong Electric Technology Co ltd
Current assignee: Shandong Wukesong Electric Technology Co ltd
Priority date: 2023-08-03
Filing date: 2023-08-03
Publication date: 2023-09-01
Anticipated expiration: 2043-08-03
Also published as: CN116683916B

Abstract

The invention relates to the technical field of data processing, and in particular to a disaster recovery system of a data center. The system comprises a data preprocessing module, a similarity acquisition module, an encoding module and a storage module. The data preprocessing module acquires hospital information data and generates a data sequence to be encoded; the data sequence to be encoded is read in and a matched character string is obtained; a search buffer is updated according to the matched character string; character frequency distribution sequences are obtained from the character types and frequencies contained in the search buffers before and after updating; the similarity of the search buffers before and after updating is obtained from those character frequency distribution sequences; the length of the search buffer is expanded according to the similarity result; the data sequence to be encoded is encoded according to the finally obtained search buffer; and the encoded hospital information data is stored, thereby constructing a disaster recovery system for the hospital information data. The invention improves the data storage rate and the storage efficiency of the disaster recovery system.

Description

Disaster recovery system of data center

Technical Field

The invention relates to the technical field of data compression, in particular to a disaster recovery system of a data center.

Background

In IT, a disaster recovery system is an environment provided for a computer information system that can cope with various disasters. When the computer system suffers irresistible natural disasters such as floods, or war and other man-made disasters, the disaster recovery system ensures the safety of user data. At present, hospitals have developed into modern comprehensive hospitals. To make hospital management scientific and modern, data is shared comprehensively, and together this forms a comprehensive hospital information management system. Because the hospital information system involves many important systems, such as clinical systems, laboratory information systems, medical image management and patient information, such a huge information system generates a large amount of data. As a key institution, a hospital usually also holds experimental data and the like in the system. If the hospital information system is struck by a natural disaster or attacked by hackers, data may be damaged and the system may even be paralyzed; establishing a hospital information disaster recovery system is therefore very important.

Because the data of the hospital information system is huge and complex, backing it up to construct the disaster recovery system consumes a great deal of manpower and material resources. Compressing the data before backing it up improves backup efficiency, reduces the operating pressure on the computer system, and ensures data integrity. LZ77 coding, a lossless compression method based on data repetition, achieves a strong compression ratio, but it considers only the repetition of data within the current search buffer. If the search buffer is long, the time efficiency of encoding decreases; conversely, if the search buffer is short, it is less likely to contain the character strings that appear in the data sequence to be encoded, so compression efficiency decreases.

Disclosure of Invention

The invention provides a disaster recovery system of a data center, which aims to solve the existing problems.

The disaster recovery system of the data center adopts the following technical scheme:

one embodiment of the present invention provides a disaster recovery system for a data center, the system comprising:

the data preprocessing module is used for acquiring hospital information data, processing the hospital information data by using a smoothing algorithm and expanding the hospital information data according to rows to acquire a data sequence to be encoded;

the similarity acquisition module is used for performing a matching operation on the data sequence to be encoded according to a preset search buffer to obtain a matched character string; encoding the matched character string according to the LZ77 coding algorithm to obtain an encoding result; updating the search buffer according to the matched character string; obtaining character frequency distribution sequences from the character types and frequencies contained in the search buffers before and after updating; and obtaining the similarity of the search buffers before and after updating from those character frequency distribution sequences;

the coding module is used for adjusting the length of the search buffer zone according to the similarity of the search buffer zones before and after updating to obtain a final search buffer zone; according to the final search buffer area, continuing to perform matching operation on the data sequence to be encoded until all characters in the data sequence to be encoded have completed traversing to stop iteration, and forming the encoding results of all the matched character strings in the encoding process into compressed data of hospital information data;

the storage module is used for storing the compressed data of the hospital information data and realizing the construction of a disaster recovery system of the hospital information data.

Preferably, the method for updating the search buffer according to the matched character string includes the following specific steps:

and removing the character string matched in the search buffer and the character before the matched character string in the search buffer from the search buffer, and adding the character string matched in the data sequence to be coded and the character next to the character string to the tail end of the search buffer to finish updating the search buffer.

Preferably, the method for obtaining the character frequency distribution sequence according to the character types and the frequency contained in the search buffer before and after updating includes the following specific steps:

acquiring character types of the search buffer before and after updating, integrating the character types, and respectively counting the occurrence frequencies of all the character types in the search buffer before and after updating to form a character frequency distribution sequence of the search buffer before updating and a character frequency distribution sequence of the search buffer after updating, wherein characters corresponding to each position in the character frequency distribution sequence of the search buffer before and after updating are the same.

Preferably, the step of obtaining the similarity of the search buffers before and after updating according to the character frequency distribution sequences of the search buffers before and after updating includes the following specific formulas:

(The formula itself is not reproduced in this text.) In the formula, the similarity is that of the search buffers before and after updating at a given slide of the search buffer, the slide index ranging from 1 to the number of search buffers needed to traverse the entire data sequence to be encoded. The other quantities are: the length of the character string that the search buffer before updating matches in the data sequence to be encoded at that slide; the frequencies corresponding to each character in the character frequency distribution sequences of the search buffers before and after updating, respectively; and the number of character categories in the search buffers before and after updating. exp() denotes the exponential function with the natural constant e as its base.

Preferably, the length of the search buffer area is adjusted according to the similarity of the search buffer areas before and after updating to obtain a final search buffer area, which comprises the following specific steps:

A similarity threshold is preset, and the similarity of the search buffers before and after updating is judged. When the similarity of the search buffers before and after updating is greater than or equal to the similarity threshold, the updated search buffer is taken as the final search buffer. When the similarity is smaller than the similarity threshold, the updated search buffer is extended forward by a characters, according to a preset extension length a, to obtain a re-updated search buffer; the similarity between the search buffer before updating and the re-updated search buffer is obtained, and the similarity judgment is repeated until the final search buffer is obtained, at which point iteration stops.

The technical scheme of the invention has the beneficial effects that: the searching buffer area can be updated according to the matching result of the searching buffer area and the data sequence to be coded, the length of the searching buffer area is shortened, and the matching speed can be improved; judging whether to adjust the search buffer area according to the similarity of the search buffer areas before and after updating, and expanding the search buffer area forwards to ensure the compression rate; the embodiment improves the compression efficiency of hospital information data.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a block diagram of a disaster recovery system of a data center according to the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purposes, the following detailed description is given below of a data center disaster recovery system according to the present invention, and the specific implementation, structure, characteristics and effects thereof, with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the disaster recovery system of the data center provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a block diagram of a disaster recovery system for a data center according to an embodiment of the present invention is shown, where the system includes the following modules:

the data acquisition module 101 acquires hospital information data by using a hospital information system and classifies the hospital information data to obtain a data sequence to be encoded.

Data in the hospital information system is acquired, and the acquired data is classified into categories such as clinical information, laboratory information, patient information and medical image information. Because the LZ77 encoding algorithm compresses data on the basis of repetition, and unprocessed data may be affected by noise and other factors that reduce its repetitiveness, the compression effect may suffer. This embodiment therefore applies a smoothing algorithm to the acquired information data of each category, so that data at adjacent time points and spatial points become more similar and the likelihood of repeated data increases.

It should be noted that hospital information data collected through a hospital information system generally contains several kinds of information, such as dates, departments and patient numbers, and is stored in a two-dimensional data table. To facilitate subsequent compression, this embodiment treats the data table as a two-dimensional matrix, transposes the matrix, expands it row by row into a one-dimensional data sequence, and takes that sequence as the data sequence to be encoded.
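As a minimal illustration of the transpose-then-expand step, the sketch below flattens a small table into a one-dimensional sequence. The function name `table_to_sequence` and the sample rows are hypothetical and not part of the embodiment; the table is assumed to be rectangular.

```python
from typing import List, Sequence


def table_to_sequence(table: Sequence[Sequence[str]]) -> List[str]:
    """Transpose a rectangular table, then read it out row by row."""
    if not table:
        return []
    width = len(table[0])
    if any(len(row) != width for row in table):
        raise ValueError("table must be rectangular")
    transposed = list(zip(*table))  # each column of the table becomes a row
    return [cell for row in transposed for cell in row]


# Hypothetical records: date, department, patient number.
rows = [
    ["2023-08-01", "cardiology", "P001"],
    ["2023-08-01", "cardiology", "P002"],
]
sequence = table_to_sequence(rows)
```

After transposition, values from the same column sit next to each other in the sequence, which gives a repetition-based encoder more to match.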

So far, the data sequence to be encoded of the hospital information is obtained.

And the similarity obtaining module 102 reads in the data to be encoded, matches the data to be encoded according to a preset search buffer area, updates the search buffer area and obtains the similarity of the search buffer areas before and after updating.

It should be noted that the conventional LZ77 coding algorithm fixes the length of the search buffer. If the search buffer is too long, the compression rate improves but the time efficiency of encoding decreases; conversely, if the search buffer is too short, the time efficiency of encoding improves but the compression rate is low and compression efficiency decreases. Furthermore, because the length of the search buffer in LZ77 coding is fixed, after each match the search buffer advances by the length of the matched character string in the data sequence to be encoded, which keeps its length fixed. The matching result therefore depends only on the search buffer length, which easily leads to reduced compression efficiency or poor time efficiency. This embodiment therefore adjusts the length of the search buffer according to the matching result and shortens the search buffer as much as possible.
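For reference, a conventional encoder with a fixed-length search buffer, of the kind discussed above, might be sketched as follows. This is an illustrative simplification and not the embodiment: the names `longest_match` and `lz77_encode`, the triple layout (offset counted back from the end of the buffer, match length, next character) and the restriction of matches to the inside of the buffer are all assumptions made for the sketch.

```python
from typing import List, Tuple

Triple = Tuple[int, int, str]  # (offset back from buffer end, match length, next char)


def longest_match(buffer: str, lookahead: str) -> Tuple[int, int]:
    """Return (start index in buffer, length) of the longest prefix of
    `lookahead` found inside `buffer`; one character is always left over
    to serve as the 'next character' of the triple."""
    best_start, best_len = 0, 0
    max_len = len(lookahead) - 1
    for start in range(len(buffer)):
        length = 0
        while (length < max_len
               and start + length < len(buffer)
               and buffer[start + length] == lookahead[length]):
            length += 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


def lz77_encode(data: str, window: int) -> List[Triple]:
    """Encode `data` with a search buffer of at most `window` characters."""
    triples: List[Triple] = []
    pos = 0
    while pos < len(data):
        buffer = data[max(0, pos - window):pos]
        start, length = longest_match(buffer, data[pos:])
        offset = len(buffer) - start if length else 0
        triples.append((offset, length, data[pos + length]))
        pos += length + 1  # the buffer slides past the match and the next char
    return triples


triples = lz77_encode("abababc", window=4)
```

Enlarging `window` lets the scan in `longest_match` find longer matches, at the cost of examining more start positions per step, which is the trade-off described above.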

First, the initial search buffer length is preset, and the data sequence to be encoded is matched against the initial search buffer by longest match. This embodiment uses a specific initial length for illustration (the value is not reproduced in this text) and does not limit the initial length.

The initial search buffer is slid to obtain the longest match with the data sequence to be encoded, and the matched character string is denoted L. The matched character string L and the one character following it are encoded according to the search buffer to obtain an encoding result. Encoding the matched character string L together with the character after L is existing practice in the LZ77 coding algorithm and is not described in detail here. After the encoding result is obtained, the search buffer is updated: the position of the matched character string L is located in the search buffer; L and all characters before L are removed from the initial search buffer; and the character string L matched in the data sequence to be encoded, together with the character following L, is appended to the end of the search buffer. This completes the update of the search buffer. The length of the updated search buffer therefore equals the initial length minus the number of characters that preceded L in the initial search buffer, plus one.
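The update step just described can be sketched as below. The function name `update_search_buffer` and its parameters are hypothetical; the match is assumed to lie entirely inside the buffer, at index `match_start` with length `match_len`.

```python
def update_search_buffer(buffer: str, match_start: int, match_len: int,
                         next_char: str) -> str:
    """Drop the matched string and everything before it, then append the
    matched string and the character that followed it in the input."""
    if match_start < 0 or match_len < 0 or match_start + match_len > len(buffer):
        raise ValueError("match must lie inside the buffer")
    matched = buffer[match_start:match_start + match_len]
    kept = buffer[match_start + match_len:]  # characters after the match
    return kept + matched + next_char


# Buffer "abcdefgh"; the input matched "cde" (index 2) and was followed by "x".
updated = update_search_buffer("abcdefgh", match_start=2, match_len=3, next_char="x")
```

In this example the two characters before the match are discarded, so the buffer shrinks from 8 to 7 characters. With no match (`match_len` of 0 at index 0), nothing is removed and the buffer simply grows by the next character.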

It should be noted that the portion removed from the search buffer comprises the matched character string and the characters before it. Because the matched character string L is appended to the end of the initial search buffer, a later occurrence of L in the data sequence to be encoded does not affect the matching result. However, the removed portion also contains the part of the initial search buffer that preceded L; if characters from that part appear again in the data sequence to be encoded, compression efficiency decreases because the search buffer is too short. Therefore, to ensure encoding time efficiency while improving compression efficiency, this embodiment evaluates the similarity of the search buffers before and after updating.

The higher the similarity of the search buffers before and after updating, the smaller the influence of the update on the matching result; otherwise the influence is large and the search buffer needs to be adjusted. Because character frequency intuitively reflects how similar the contents of two search buffers are in character distribution, this embodiment counts the character frequencies of the search buffers before and after updating, generates the character frequency distribution sequence of each, and quantifies the similarity of the two search buffers by the divergence between the two character frequency distribution sequences.

It should be noted that the longer the character string matched between the search buffer and the data sequence to be encoded, the better the search buffer before updating performed, and the more credible the similarity of the search buffers before and after updating that is obtained from the divergence. The shorter the matched character string, the worse the search buffer before updating performed, and the less meaningful the divergence between the search buffers before and after updating is as a measure. This embodiment therefore uses the length of the matched character string to adjust how the divergence quantifies the similarity of the search buffers before and after updating.

The character frequency distribution sequences of the search buffers before and after updating are obtained as follows. First, the character types appearing in the search buffer before updating form the character set of the search buffer before updating; similarly, the character set of the updated search buffer is obtained. The union of the two sets gives the set of all characters of the search buffers before and after updating. Then the occurrence frequency of every kind of character in this union set is counted separately in the search buffer before updating and in the search buffer after updating, forming the character frequency distribution sequence of the search buffer before updating and that of the search buffer after updating. Note that the characters corresponding to each position in the two character frequency distribution sequences are the same.
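A minimal sketch of building the two aligned sequences is given below. The function name `frequency_sequences` is hypothetical, and the use of relative frequencies (counts divided by buffer length) and of sorted order for the shared character set are assumptions; the text only requires that both sequences index the same characters.

```python
from collections import Counter
from typing import List, Tuple


def frequency_sequences(before: str, after: str) -> Tuple[List[str], List[float], List[float]]:
    """Return the shared character list and the two frequency sequences,
    aligned so that position j refers to the same character in both."""
    if not before or not after:
        raise ValueError("both buffers must be non-empty")
    alphabet = sorted(set(before) | set(after))  # union of the two character sets
    count_before, count_after = Counter(before), Counter(after)
    p = [count_before[ch] / len(before) for ch in alphabet]
    q = [count_after[ch] / len(after) for ch in alphabet]
    return alphabet, p, q


alphabet, p, q = frequency_sequences("abcdefgh", "fghcdex")
```

A character that appears in only one buffer gets frequency 0 in the other sequence, which keeps the positions aligned.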

Because the divergence is defined on the relative difference between two sequences, quantifying the similarity of the search buffers before and after updating by the divergence captures the frequency change of each corresponding character more accurately; this embodiment therefore computes the divergence to quantify the similarity. In addition, the longer the character string matched between the search buffer and the data sequence to be encoded, the more convincing the divergence is as a measure of the similarity of the search buffers before and after updating; a shorter matched character string indicates that the search buffer before updating matched poorly. The matched length of the character string is therefore used to adjust the divergence of the search buffers before and after updating, giving a more accurate similarity result. Accordingly, this embodiment constructs the following similarity calculation formula:

(The formula itself is not reproduced in this text.) In the formula, the similarity is that of the search buffers before and after updating at a given slide of the search buffer, the slide index ranging from 1 to the number of search buffers needed to traverse the entire data sequence to be encoded. The other quantities are: the length of the character string that the search buffer before updating matches in the data sequence to be encoded at that slide; the frequencies corresponding to each character in the character frequency distribution sequences of the search buffers before and after updating, respectively; and the number of all character types counted in the search buffers before and after updating. exp() denotes the exponential function with the natural constant e as its base.

The larger the divergence between the character frequency distribution sequences of the search buffers before and after updating, the lower the similarity of the search buffers before and after updating; this embodiment therefore uses the exponential function to construct a negative correlation between the divergence and the similarity. Moreover, the longer the character string matched in the data sequence to be encoded, the better the divergence alone measures the similarity of the search buffers before and after updating, so an adjustment coefficient that depends on the matched length is used. When the matched character string is long, the adjustment coefficient approaches 0 and the adjusted similarity result stays almost consistent with the divergence. Conversely, when the matched character string is short, the similarity measured by the divergence alone is less credible and the adjustment coefficient changes the similarity result by a larger amount, which makes the subsequent search buffer expansion more accurate.
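Since the formula is not available here, the sketch below shows only one form that is consistent with the description, and every specific in it is an assumption: the divergence is taken to be the Kullback-Leibler divergence, the adjustment coefficient is taken to be the reciprocal of the matched length, the two are combined as exp(-(1 + 1/length) * divergence), and zero frequencies in the second sequence are clamped to a small `eps`. The function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """Kullback-Leibler divergence of p from q (assumed choice of divergence).
    Zero entries of q are clamped to eps so the logarithm stays finite."""
    if len(p) != len(q):
        raise ValueError("sequences must be aligned and of equal length")
    total = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            total += pj * math.log(pj / max(qj, eps))
    return max(total, 0.0)  # guard against tiny negatives caused by clamping


def similarity(p: Sequence[float], q: Sequence[float], match_len: int) -> float:
    """Assumed form: exp(-(1 + 1/match_len) * divergence). A long match leaves
    the divergence almost unadjusted; a short match penalises it more."""
    if match_len < 1:
        raise ValueError("match_len must be at least 1 in this sketch")
    return math.exp(-(1.0 + 1.0 / match_len) * kl_divergence(p, q))


same = similarity([0.5, 0.5], [0.5, 0.5], match_len=3)
short_match = similarity([0.5, 0.5], [0.9, 0.1], match_len=1)
long_match = similarity([0.5, 0.5], [0.9, 0.1], match_len=10)
```

Under these assumptions identical distributions give a similarity of 1, and for the same pair of distributions a longer match yields a higher similarity than a shorter one.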

So far, the similarity of the search buffers before and after updating in the encoding process is obtained.

The encoding module 103 adjusts the length of the search buffer according to the similarity result and re-evaluates the similarity, stopping once the similarity reaches the preset threshold, and then encodes the data sequence to be encoded according to the finally obtained search buffer.

It should be noted that a high similarity of the search buffers before and after updating indicates that encoding can continue with the updated search buffer. This embodiment therefore judges the similarity of the search buffers before and after updating as follows:

A similarity threshold is preset. When the similarity of the search buffers before and after updating is greater than or equal to the similarity threshold, the updated search buffer is taken as the final search buffer. When the similarity is smaller than the similarity threshold, the updated search buffer is extended forward by a characters, according to a preset extension length a, to obtain a re-updated search buffer; the similarity between the search buffer before updating and the re-updated search buffer is obtained, and the similarity judgment is repeated until the final search buffer is obtained, at which point iteration stops. This embodiment describes the procedure with example values of the similarity threshold and of a (the values are not reproduced in this text) and does not limit those values.
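The threshold test and forward extension can be sketched as a loop over window start positions. Everything named here is hypothetical: buffers are modelled as half-open index ranges over the input, `expand_until_similar` returns the adjusted start of the updated buffer, and `jaccard` is only a stand-in similarity used to keep the example self-contained. Stopping at the start of the pre-update buffer is an added assumption that guarantees termination.

```python
from typing import Callable


def expand_until_similar(data: str, old_start: int, old_end: int,
                         new_start: int, new_end: int,
                         similarity: Callable[[str, str], float],
                         threshold: float, step: int) -> int:
    """Extend the updated buffer data[new_start:new_end] forward (towards
    earlier data) by `step` characters at a time until its similarity to the
    pre-update buffer data[old_start:old_end] reaches `threshold`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    before = data[old_start:old_end]
    start = new_start
    while start > old_start and similarity(before, data[start:new_end]) < threshold:
        start = max(old_start, start - step)
    return start


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of the two character sets."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 1.0


data = "abcdefghcdex"
# Pre-update buffer is data[0:8]; the updated buffer is data[5:12].
adjusted_start = expand_until_similar(data, 0, 8, 5, 12, jaccard,
                                      threshold=0.7, step=2)
accepted_start = expand_until_similar(data, 0, 8, 5, 12, jaccard,
                                      threshold=0.6, step=2)
```

With the higher threshold the buffer is extended twice before it is accepted; with the lower threshold the updated buffer is accepted as it stands.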

It should be noted that, the encoding algorithm provided in the embodiment of the present invention is a process of encoding while searching, where the search buffer continuously slides rightward, and the length of the search buffer is updated and adjusted according to the length of the character string and the similarity of the search buffer and the data sequence to be encoded, so as to obtain the final search buffer.

And in the encoding process, continuing to perform matching operation on the data sequence to be encoded according to the final search buffer until all characters in the data sequence to be encoded have completed traversing to stop iteration, and forming all encoding results in the encoding process into compressed data of hospital information data.

So far, compressed data of hospital information data is acquired.

The storage module 104 is used for storing the compressed hospital information data and realizing the establishment of a hospital information data disaster recovery system.

The hospital information data is compressed and stored, so that the storage rate and the storage efficiency of the hospital information data are greatly improved. When faced with natural disasters and man-made attacks, the stored hospital information data can be accessed and used continuously.

In use, the compressed data must be decompressed. The decompression process is as follows: an empty decoded data sequence is created, and the compressed data sequence of the hospital information data is read in; it is typically a series of triples, each comprising a pointer to the data, the length of the matched character string, and the next character. The position of the character string is located in the search buffer from the pointer and the length value, and the located character string followed by the next character in the triple is taken as the decoding result of that triple. The located character string and all characters before it are removed from the search buffer, and the decoding result is appended to the end of the search buffer to update it. The similarity of the search buffers before and after updating is obtained as in the similarity obtaining module. If the similarity is greater than the threshold, the next triple is decoded with the updated search buffer; otherwise the search buffer is extended forward to obtain the final search buffer, and then the next triple is decoded.
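The triple-by-triple decoding can be illustrated with the simplified sketch below. It is not the embodiment's decoder: it omits the similarity check and buffer extension, and it assumes the pointer is an offset counted back from the end of the already decoded output. The name `lz77_decode` and the sample triples are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[int, int, str]  # (offset back from end of output, length, next char)


def lz77_decode(triples: Iterable[Triple]) -> str:
    """Rebuild the data by copying each referenced string from the decoded
    output and then appending the triple's next character."""
    out: List[str] = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        if length and (offset <= 0 or start < 0):
            raise ValueError("pointer refers outside the decoded data")
        for k in range(length):
            out.append(out[start + k])  # copy the matched character string
        out.append(next_char)
    return "".join(out)


decoded = lz77_decode([(0, 0, "a"), (0, 0, "b"), (2, 2, "a"), (4, 1, "c")])
```

Because each triple is resolved against data that has already been decoded, the decoder needs no dictionary beyond its own output.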

The establishment of the disaster recovery system for the hospital information data ensures that the risks of data loss and service interruption are reduced to the greatest extent, ensures the safety and reliability of the hospital information data, and realizes the protection of the hospital data information by using the disaster recovery system.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A data center disaster recovery system, comprising the following modules:

2. The disaster recovery system of claim 1, wherein the updating the search buffer area according to the matched character string comprises the following specific methods:

3. The disaster recovery system of claim 1, wherein the obtaining the character frequency distribution sequence according to the character types and frequencies contained in the search buffers before and after the updating comprises the following specific methods:

4. The disaster recovery system of claim 3, wherein the obtaining the similarity of the search buffers before and after updating according to the character frequency distribution sequences of the search buffers before and after updating comprises the following specific formulas:

(The formula itself is not reproduced in this text.) In the formula, the similarity is that of the search buffers before and after updating at a given slide of the search buffer, the slide index ranging from 1 to the number of search buffers needed to traverse the entire data sequence to be encoded. The other quantities are: the length of the character string that the search buffer before updating matches in the data sequence to be encoded at that slide; the frequencies corresponding to each character in the character frequency distribution sequences of the search buffers before and after updating, respectively; and the number of all character categories in the search buffers before and after updating. exp() denotes the exponential function with the natural constant e as its base.

5. The disaster recovery system of claim 1, wherein the adjusting the length of the search buffer according to the similarity of the search buffer before and after updating to obtain the final search buffer comprises the following specific steps: