CN108806773A

CN108806773A - Method for designing a medical image cloud storage platform

Info

Publication number: CN108806773A
Application number: CN201810487531.6A
Authority: CN
Inventors: 闫凤麒; 徐志坚; 陆明名
Original assignee: Shanghai Hee Hee Mdt Infotech Ltd
Current assignee: Shanghai Hee Hee Mdt Infotech Ltd
Priority date: 2018-05-21
Filing date: 2018-05-21
Publication date: 2018-11-13
Anticipated expiration: 2038-05-21
Also published as: CN108806773B

Abstract

A method for designing a medical image cloud storage platform, relating to medical image storage technology. The invention addresses two problems. HDFS is not suited to storing small medical image files, and small image files must be read quickly from the merged files. The invention merges all image files from multiple examination files into one large file close to 128 MB, which it calls an SData file. Before an examination file is added to a file container, the system checks whether the current container size plus the size of the examination file exceeds a threshold. If it does, a new file container is created. The set of small files is merged into several SData data files, which are uploaded to HDFS, and the corresponding SData index files are stored in an index pool. This effectively reduces the NameNode memory that small medical image files consume in HDFS and makes HDFS suitable for storing them. The introduced indexing, prefetching and caching mechanisms also effectively speed up file reads.

Description

Method for designing a medical image cloud storage platform

Technical field

The present invention relates to medical image storage technology.

Background art

Imaging equipment is now widely used in clinical practice, and medical image data is growing rapidly. Regional medical image data has reached the petabyte (PB) scale, and storing it on conventional storage architectures is expensive. Most current hospital PACS use an "online / near-line / offline" storage scheme. Under this scheme, offline data has very poor availability and cannot be retrieved in real time. The development of the mobile Internet has placed new demands on existing medical service models. A high-performance, highly reliable storage system for massive image data will be the cornerstone of meeting them.

Prior art

Many techniques can efficiently address the storage of and access to small files. They fall into two classes. The first class consists of general-purpose solutions, whose core idea is to merge small files into large files. The second class consists of solutions to specific problems. These usually exploit the characteristics of the data in a particular domain and optimize for a few specific issues.

General-purpose solutions mainly include HAR, SequenceFile and MapFile, which Hadoop itself provides, together with improvements to these schemes. These schemes can effectively reduce the number of files in HDFS. However, they have several shortcomings when applied to medical image storage.

Solutions to specific problems do not target medical imaging and do not account for the particular characteristics of medical images. They are therefore not suitable for direct application to medical imaging.

Summary of the invention

Merging small medical image files is a key technology for a regional medical imaging platform. It raises two main problems:

1. How to overcome the unsuitability of HDFS for storing small medical image files;

2. How to read small image files quickly from the merged files.

The present invention addresses these two problems and the shortcomings of existing schemes in two ways. It proposes a multi-strategy merging model for files of different image types. It also proposes a prefetching and caching mechanism based on the medical image information hierarchy model.

The technical solution to be protected by the present invention is characterized as follows:

A method for designing a medical image cloud storage platform, characterized in that, for storage, a design for SData files is proposed. All image files in multiple examination files are merged into one large file close to 128 MB, which the present invention calls an SData file. The maximum capacity of an SData file container is 128 MB. Before an examination file is added to a file container, the system checks whether the current container size plus the size of the examination file exceeds the threshold. If it does, a new file container is created.
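The container-filling rule can be sketched as follows. This is a minimal illustration and not the patented implementation. Examination files arrive as hypothetical `(name, size_in_bytes)` pairs. A file larger than the limit simply gets a container of its own.

```python
from dataclasses import dataclass, field

# Maximum SData container capacity given in the text: 128 MB.
CONTAINER_LIMIT = 128 * 1024 * 1024


@dataclass
class Container:
    """In-memory stand-in for one SData file container (hypothetical)."""
    files: list = field(default_factory=list)
    size: int = 0


def pack(exam_files, limit=CONTAINER_LIMIT):
    """Fill containers in arrival order.

    Before adding a file, check whether the current container size plus the
    file size exceeds the limit; if it does, open a new container first.
    """
    containers = []
    for name, size in exam_files:
        if not containers or containers[-1].size + size > limit:
            containers.append(Container())
        containers[-1].files.append(name)
        containers[-1].size += size
    return containers
```

Consider examination files of 60 MB, 50 MB and 30 MB. They yield two containers. The first two files fit together at 110 MB, but adding the third would reach 140 MB, so it opens a new container.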

An SData file consists of two parts, an SData index file and an SData data file. Each index file corresponds to exactly one data file. The SData file format designed by the present invention is based on SequenceFile and extends it. It introduces an index and defines its own file container.

The SData index file has two parts, a file header and a file body. The file header contains a file identifier, a name mapping table and a synchronization marker. The name mapping table records small-file names and their serial numbers as key-value pairs. The file body consists of file records, and each record has three fields:

- the record length;
- the serial number (a fixed-length field);
- the file address.

The SData data file also has a file header and a file body, and its header is the same as that of the SData index file. The file body consists of file records, and each record has three fields:

- the record length;
- the serial number (fixed-length);
- the file content.
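Below is a minimal sketch of the two record layouts. It assumes field widths that the text does not specify:

- a 4-byte record length;
- a 4-byte serial number;
- in the index record, an 8-byte offset of the matching data record, used as the "file address".

The file header (identifier, name mapping table, synchronization marker) is reduced here to a plain dictionary. Every index record has the same size in this sketch, so the serial number doubles as the record's position in the index body.

```python
import struct

# Hypothetical field widths; the text fixes only the field order.
# Index record: record length (int32), serial (uint32), data offset (int64)
# Data record:  record length (int32), serial (uint32), then content bytes
INDEX_REC = struct.Struct(">iIq")
DATA_HDR = struct.Struct(">iI")


def build_bodies(small_files):
    """small_files: list of (name, content bytes).

    Returns (name_map, index_body, data_body); name_map plays the role of the
    header's name mapping table (small-file name -> serial number).
    """
    name_map, index_body, data_body = {}, bytearray(), bytearray()
    for serial, (name, content) in enumerate(small_files):
        name_map[name] = serial
        offset = len(data_body)  # address of this record in the data body
        data_body += DATA_HDR.pack(DATA_HDR.size + len(content), serial)
        data_body += content
        index_body += INDEX_REC.pack(INDEX_REC.size, serial, offset)
    return name_map, bytes(index_body), bytes(data_body)


def read_small_file(name, name_map, index_body, data_body):
    """Name -> serial (header map) -> index record -> data record content."""
    serial = name_map[name]
    _, idx_serial, offset = INDEX_REC.unpack_from(index_body, serial * INDEX_REC.size)
    if idx_serial != serial:
        raise ValueError("index record does not match serial number")
    rec_len, _ = DATA_HDR.unpack_from(data_body, offset)
    return data_body[offset + DATA_HDR.size: offset + rec_len]
```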

The set of small files is merged into several SData data files, which are uploaded to HDFS. The corresponding SData index files are stored in an index pool.

Further, a multi-strategy merging scheme for small files is provided. Examination files produced by different imaging devices are merged with different strategies. The medical image information hierarchy model organizes medical image files in four levels: patient, examination, series and image. A patient may have multiple examinations. Each examination may contain one or more image series, and each series contains one or more images.

Examination files of modalities such as MR and CT usually contain multiple series, each holding multiple image files. In practice, a doctor usually retrieves one series at a time for review. Such files are therefore merged into SData files with the series as the unit. All files of the same series are stored in the same SData file, while different series of one examination may be stored in two different SData files.

Some modalities, such as CR, produce very few images in a single examination. For these, files are merged into SData files with the examination as the unit, so all files of one examination are stored in the same SData file.
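The choice of merge unit can be illustrated with a small grouping function. The record fields (`modality`, `study_uid`, `series_uid`, `file`) are hypothetical names. The sketch treats every modality other than CT and MR at the examination level. This is an assumption, since the text names only CR for that case.

```python
from collections import defaultdict

# Modalities merged per series, as named in the text; all others are
# merged per examination (assumption for modalities the text does not name).
SERIES_LEVEL = {"CT", "MR"}


def merge_unit(record):
    """Return the key of the unit this small file is merged by."""
    if record["modality"] in SERIES_LEVEL:
        return (record["study_uid"], record["series_uid"])
    return (record["study_uid"],)


def group_for_merge(records):
    """Group small files by merge unit, preserving their input order."""
    groups = defaultdict(list)
    for r in records:
        groups[merge_unit(r)].append(r["file"])
    return dict(groups)
```

Each resulting group would then be handled as one unit when filling 128 MB containers, so that a series or examination is not split.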

The file merging model first maps the file names of the examination files according to the corresponding rules and classifies the files. It then merges the classified files according to the merging strategy and builds indexes at the same time.

Further, a file preprocessing procedure is provided:

1) Read the file and determine whether it is a DICOM file. If it is, continue; otherwise, end the procedure.

2) Parse the data elements in the DICOM file, such as file UID, file name, examination date and file type.

3) Generate a file mapping table that records the file UID, examination date, file type and serial number. The serial number is generated during merging. Large files are named examinationDate_fileType_serialNumber.suffix. The file mapping table links the small files to be merged with the large files they are merged into.

4) Organize the examination files, per examination, into a two-level directory hierarchy by examination date and file type.
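Steps 1, 3 and 4 can be sketched in Python without a DICOM library. The check relies on the DICOM Part 10 file layout, a 128-byte preamble followed by the four bytes `DICM`. It will therefore reject DICOM data stored without that preamble. The `.sdata` suffix and the unpadded serial number are assumptions. The text gives only the pattern examinationDate_fileType_serialNumber.suffix.

```python
from pathlib import Path


def is_dicom(path):
    """DICOM Part 10 files start with a 128-byte preamble followed by b"DICM"."""
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def big_file_name(exam_date, file_type, serial, suffix="sdata"):
    """Naming rule from the text: examDate_fileType_serial.suffix (suffix assumed)."""
    return f"{exam_date}_{file_type}_{serial}.{suffix}"


def organize_path(root, exam_date, file_type, file_name):
    """Two-level directory layout: root/examDate/fileType/fileName."""
    return Path(root) / exam_date / file_type / file_name
```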

Further, the overall merging procedure is provided:

1) Take one day as the unit, scan all examination files of that day and preprocess them. Organize the files by type, so that examination files of the same type are merged together.

2) Merge the examination files into SData file containers using the multi-strategy merging of small files. Examination files produced by different imaging devices use different merging strategies.

3) Take the files out of the examination files one by one and put them into the SData data file. Generate their indexes at the same time and place them in the SData index file container.

4) Check whether the current container size plus the size of the examination file exceeds the threshold. If it does, create a new file container.

5) Finally, store the generated set of SData files in the HDFS cluster through the HDFS client, and put the index files into the index pool.

A method for designing a medical image cloud storage platform, characterized in that, for reading, a two-level index file design is also proposed. The two-level index consists of two parts.

The first-level index is generated from three values of the large file: its examination date (for example, 20180307), its examination type and its serial number.

The second part is the index file of each SData file. It stores the addresses of the small files within the corresponding data file.

To speed up index lookup, a mapping table from small-file names to serial numbers is added to the headers of both the data file and the index file. The present invention manages index files and data files by combining centralized and distributed storage. Index files are stored centrally in the index pool, while data files are uploaded to HDFS, which separates index files from data files.
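The two-level lookup can be modeled with in-memory dictionaries. In this hypothetical sketch, the first level maps a large-file name to its HDFS path. The second level holds one entry per SData file. Each entry carries the header's name-to-serial table plus each small file's offset and length. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SDataIndex:
    """Second-level index of one SData file (hypothetical model)."""
    name_to_serial: dict  # header mapping table: small-file name -> serial
    addresses: dict       # serial -> (offset, length) within the data file


class IndexPool:
    """Centrally held index pool, kept apart from the data files in HDFS."""

    def __init__(self):
        self.first_level = {}   # large-file name -> HDFS path
        self.second_level = {}  # large-file name -> SDataIndex

    def register(self, big_name, hdfs_path, index):
        self.first_level[big_name] = hdfs_path
        self.second_level[big_name] = index

    def locate(self, big_name, small_name):
        """Return (hdfs_path, serial, offset, length) for one small file."""
        path = self.first_level[big_name]
        index = self.second_level[big_name]
        serial = index.name_to_serial[small_name]
        offset, length = index.addresses[serial]
        return path, serial, offset, length
```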

Further, a prefetching and caching strategy is provided:

In practice, a doctor analyzing a patient's condition usually retrieves all images of an examination or of a series, rather than a single image. The present invention uses prefetching to speed up file reads. When an image is requested, the block containing the requested file is read from HDFS to the file server. At the same time, prefetching is triggered, and some of the other image files in the same examination are prefetched.

In modalities such as CT and MR, a single examination contains many DICOM files. For these, a small series of files is used as the prefetch unit. One such series contains 30 small DICOM files, and this prefetch count can be adjusted to actual conditions.

In modalities such as CR, a single examination contains few DICOM files. For these, prefetching is performed at the examination level. When an image file is requested, the cache is checked first. If the file is not in the cache, prefetching is triggered and the related files are cached on the file server.
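The two prefetch granularities can be sketched as a function that selects the files to fetch together with the requested one. The inputs are hypothetical ordered file lists. Starting the 30-file window at the requested image is an assumption. The text fixes only the window size and the series-versus-examination distinction.

```python
PREFETCH_COUNT = 30          # files per prefetch unit, per the text (adjustable)
SERIES_LEVEL = {"CT", "MR"}  # modalities prefetched by series in the text


def prefetch_set(requested, modality, series_files, exam_files, count=PREFETCH_COUNT):
    """Return the files to load along with `requested`.

    series_files / exam_files: ordered file names of the requested file's
    series and examination (hypothetical inputs).
    """
    if modality in SERIES_LEVEL:
        # Assumed window: the requested image and the images following it.
        start = series_files.index(requested)
        return series_files[start:start + count]
    # Examination-level prefetch, e.g. CR.
    return list(exam_files)
```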

Further, the cache referred to in the present invention sits between HDFS and the client and is implemented by a file server. When a user requests an image through the client, the file server is checked first. If the file is there, it is returned directly to the client. If not, it is read from HDFS and cached on the file server. This design reduces frequent client access to HDFS.
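A minimal read-through cache for the file server might look like this. The `fetch_from_hdfs` callable stands in for an HDFS read. Both the LRU eviction and the entry-count capacity are assumptions, because the text does not say how the file server evicts cached files. Prefetched files could be stored through `put` without counting as client requests.

```python
from collections import OrderedDict


class FileServerCache:
    """Read-through cache between the client and HDFS.

    LRU eviction with a fixed number of entries is an assumption.
    """

    def __init__(self, fetch_from_hdfs, capacity=1000):
        self.fetch = fetch_from_hdfs  # callable: name -> bytes (hypothetical)
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hdfs_reads = 0

    def get(self, name):
        if name in self.entries:          # hit: return directly to the client
            self.entries.move_to_end(name)
            return self.entries[name]
        data = self.fetch(name)           # miss: read from HDFS
        self.hdfs_reads += 1
        self.put(name, data)              # and keep it on the file server
        return data

    def put(self, name, data):
        """Insert or refresh an entry, evicting the least recently used."""
        self.entries[name] = data
        self.entries.move_to_end(name)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```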

On a file read, the mapping relations give the name of the large file that holds the requested small file, and the corresponding local index file is read. The large-file name and the small file's address within the large file are then used to read the data from the DataNode that holds the large file. Prefetching and caching are performed at this point.

Advantageous effects

The invention effectively reduces the NameNode memory that small medical image files consume in HDFS, making HDFS suitable for storing small image files. The introduced indexing, prefetching and caching mechanisms also effectively speed up file reads.

Description of the drawings

Fig. 1 Flow chart of file storage and reading

Fig. 2 SMERGE merging model

Fig. 3 Technical route of the medical image cloud storage system

Detailed description of the embodiments

The technical solution of the present invention is described in detail below with reference to the accompanying drawings.

Prior-art medical image storage systems built on HDFS have many advantages, but they face one unavoidable issue. Hadoop itself was designed for large files, whereas medical image files are typical small files: common CT and MR images are a few hundred KB in size. Storing small files directly in HDFS causes three problems:

- The load on the NameNode rises sharply, because more memory is needed to hold metadata.
- Frequent writes and reads of medical image files pose a serious challenge to the NameNode.
- Processing many small files means more MapReduce tasks, which increases CPU overhead and reduces cluster efficiency.

Without any optimization, performance in medical image storage therefore drops considerably. For this reason, the present invention provides a design scheme for a medical image cloud storage platform. The scheme is based on HDFS and starts from the storage and reading of medical images.

The invention discloses a model for merging small medical files: the SMERGE merging model.

The scheme based on the SMERGE merging model is described below in three parts:

One. File storage

Two. File reading

Three. System implementation

File storage covers (1) file preprocessing, (2) the design of SData files and (3) multi-strategy merging of small files.

File reading covers (1) the two-level index mechanism and (2) the file prefetching and caching mechanism.

One. Storage

(1) File preprocessing

Preprocessing consists of the following steps: file reading, file type check, file parsing, file mapping and file organization. The detailed preprocessing flow is shown in the left part of Fig. 1.

(2) Design of the SData files (SMERGE merging model, as shown in Fig. 2)

All image files in multiple examination files are merged into one large file close to 128 MB, which the present invention calls an SData file. The maximum capacity of an SData file container is 128 MB. Before an examination file is added to a file container, the system checks whether the current container size plus the size of the examination file exceeds the threshold. If it does, a new file container is created. There are two reasons for this design:

(1) Every file in Hadoop occupies at least one data block. A 1 MB file, for example, uses only 1 MB of disk space, but it still counts as a whole block in the NameNode's metadata, just as a full 128 MB block does. Packing files as close to 128 MB as possible therefore saves a large number of data blocks.

(2) It also prevents one examination file from being split across two data blocks during storage.
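The block-saving argument can be made concrete by counting the file and block objects the NameNode must track. The sketch below assumes 500 KB images, which fall within the "hundreds of KB" range cited earlier, and one block per started 128 MB. It counts objects only and does not estimate bytes of NameNode memory.

```python
import math

BLOCK = 128 * 1024 * 1024  # HDFS block size assumed in the text


def namenode_objects(file_sizes, block=BLOCK):
    """File objects plus block objects when each size is stored as one HDFS file."""
    files = len(file_sizes)
    blocks = sum(max(1, math.ceil(s / block)) for s in file_sizes)
    return files + blocks


def merged_sizes(file_sizes, limit=BLOCK):
    """Container sizes produced by filling containers sequentially up to `limit`."""
    sizes = []
    for s in file_sizes:
        if not sizes or sizes[-1] + s > limit:
            sizes.append(0)
        sizes[-1] += s
    return sizes


images = [500 * 1024] * 100_000  # hypothetical: 100,000 images of 500 KB each
unmerged = namenode_objects(images)
merged = namenode_objects(merged_sizes(images))
```

Storing 100,000 such images separately yields 200,000 objects: 100,000 files plus 100,000 blocks. Sequential merging instead produces 382 containers of up to 262 images each, or 764 objects.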

(3) Multi-strategy merging of small files

The number and size of the image files produced in a patient's examination depend on the imaging device. The total data volume of a single examination differs considerably between imaging devices. Different merging strategies are therefore applied to the examination files produced by different imaging devices.

The medical image information hierarchy model organizes medical image files in four levels: patient, examination, series and image. A patient may have multiple examinations. Each examination may contain one or more image series, and each series contains one or more images. The multi-strategy merging in the present invention is based on this hierarchy model.

Examination files of modalities such as MR and CT usually contain multiple series, each holding multiple image files. In practice, a doctor usually retrieves one series at a time for review. We therefore merge such files into SData files with the series as the unit. All files of the same series are stored in the same SData file, while different series of one examination may be stored in two different SData files.

Some modalities, such as CR, produce very few images in a single examination. For these, we merge files into SData files with the examination as the unit, so all files of one examination are stored in the same SData file.

The file merging model first maps the file names of the examination files according to the corresponding rules and classifies the files. It then merges the classified files according to the merging strategy and builds indexes at the same time. The overall merging procedure follows the five steps given earlier in the summary of the invention.

Two. Reading

(1) Design of the two-level index file

The design of the two-level index file consists of two parts.

(2) Prefetching and caching policy

Prefetching is a widely used storage optimization technique. It significantly reduces disk I/O cost. It exploits correlations between accessed files to load data into the cache before that data is requested, which effectively improves file response times. HDFS currently provides no prefetching. Medical image files are inherently correlated, and access to them has strong locality. Given this, the prefetching and caching policy disclosed in the present invention can improve read performance.

In modalities such as CT and MR, a single examination contains many DICOM files. For these, we use a small series of files as the prefetch unit. One such series contains 30 small DICOM files, and this prefetch count can be adjusted to actual conditions.

In modalities such as CR, a single examination contains few DICOM files. For these, we prefetch at the examination level. When we request an image file, the cache is checked first. If the file is not in the cache, prefetching is triggered and the related files are cached on the file server.

The cache referred to in the invention sits between HDFS and the client and is implemented by a file server. When a user requests an image through the client, the file server is checked first. If the file is there, it is returned directly to the client. If not, it is read from HDFS and cached on the file server. This design reduces frequent client access to HDFS.

On a file read, the mapping relations give the name of the large file that holds the requested small file, and the corresponding local index file is read. The large-file name and the small file's address within the large file are then used to read the data from the DataNode that holds the large file. Prefetching and caching are performed at this point. The file reading flow is shown in the right part of Fig. 1.

The file reading flow is explained in detail as follows:

1) The user sends a request for an image file.

2) The requested file name is mapped to the name of the large file it was merged into.

3) The index is read from the first-level index pool to obtain the address of the large file.

4) The index is read from the second-level index pool to obtain the small file's serial number and its address within the large file.

5) Determine whether the small file is cached. If it is, go to step 6; otherwise, go to step 7.

6) Locate the target file in the cache according to the index.

7) Locate the target file on disk according to the index.

8) Obtain the file and return it.
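The eight steps can be combined into one function. All structures are hypothetical stand-ins. Plain dictionaries play the roles of the mapping table, the two index pools and the file-server cache, and a `read_range` callable replaces an HDFS read. Caching the file after a miss follows the caching policy described above rather than step 7 itself.

```python
def read_image(name, name_to_big, level1, level2, cache, read_range):
    """Read-flow sketch with hypothetical in-memory structures.

    name_to_big: small-file name -> large-file name (file mapping table)
    level1:      large-file name -> large-file address (first-level index pool)
    level2:      large-file name -> {small name: (serial, offset, length)}
    cache:       dict standing in for the file-server cache
    read_range:  callable(address, offset, length) -> bytes, in place of HDFS
    """
    big = name_to_big[name]                        # step 2: map to large file
    address = level1[big]                          # step 3: large-file address
    _serial, offset, length = level2[big][name]    # step 4: position in large file
    if name in cache:                              # step 5: cached?
        return cache[name]                         # steps 6 and 8
    data = read_range(address, offset, length)     # step 7: read from storage
    cache[name] = data                             # cache per the caching policy
    return data                                    # step 8
```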

Three. System implementation: based on the technical scheme above, an implementation of a storage system for small medical image files is further provided.

The technical route of the whole system is shown in Fig. 3. The overall framework of the system consists of three layers: the client layer, the processing layer and the storage layer.

The system design focuses on two aspects, storage and display.

For storage and reading, we use the Hadoop storage architecture to store medical images. Hadoop is not suited to small files, so we build a small-file layer (the processing layer) on top of the Hadoop storage architecture. The small-file layer converts small files into large files suitable for storage. It also handles the preprocessing, mapping, indexing, prefetching and caching of small files.

For display, users can access medical image files across terminals through a browser. We parse the files with JavaScript and render the images with Canvas.

Brief summary

Medical image data is growing rapidly. Regional medical image data has reached the PB scale, and conventional storage architectures are very expensive. Most current hospital PACS use an "online / near-line / offline" storage scheme, under which offline data has very poor availability and cannot be retrieved in real time. Medical image storage systems for PB-scale data are now mostly built on HDFS. However, HDFS is designed for large files, whereas medical image files are small. The traditional approach merges all DICOM files of a single examination into one large file. It ignores that the merged file is often still a small file relative to the HDFS block size. Common medical image types such as CT, CR and US differ in file size and number. The present invention therefore proposes a multi-strategy merging model that distinguishes between image types. First, the small files of the same imaging modality within one day are merged into a large file, and a two-level index is built for them. Second, prefetching and caching mechanisms exploit the correlations among small medical image files to make storing and accessing them more efficient. The invention can effectively reduce the NameNode load in HDFS. It also improves the efficiency of storing and accessing small medical image files on HDFS (Hadoop Distributed File System).

Claims

1. A method for designing a medical image cloud storage platform, characterized in that, for storage, a design for SData files is proposed, namely: all image files in multiple examination files are merged into one large file close to 128 MB, which the present invention calls an SData file; the maximum capacity of an SData file container is 128 MB, and before an examination file is added to a file container, it is determined whether the current container size plus the size of the examination file exceeds the threshold, and if so, a new file container is created;

the SData file consists of two parts, an SData index file and an SData data file, the index files corresponding one-to-one with the data files; the SData file format designed by the present invention is based on and extends SequenceFile, introducing an index and defining its own file container;

the SData index file is divided into a file header and a file body, wherein the file header contains a file identifier, a name mapping table and a synchronization marker, the name mapping table recording small-file names and their corresponding serial numbers as key-value pairs; the file body consists of a number of file records, each record consisting of three parts: the record length, the serial number (fixed-length) and the file address;

the SData data file is likewise divided into a file header and a file body, the file header being the same as that of the SData index file; the file body consists of a number of file records, each record consisting of three parts: the record length, the serial number (fixed-length) and the file content;

the set of small files is merged into several SData data files, which are uploaded to HDFS, and the corresponding SData index files are stored in an index pool.

2. The method for designing a medical image cloud storage platform according to claim 1, characterized in that a multi-strategy merging scheme for small files is further provided, namely: different merging strategies are applied to examination files produced by different imaging devices; based on the medical image information hierarchy model, medical image files are organized in four levels: patient, examination, series and image; a patient may have multiple examinations, each examination may contain one or more image series, and each series contains one or more images;

for examination files such as MR and CT, which usually contain multiple series each with multiple image files, and of which a doctor in practice usually retrieves one series at a time, the files are merged into the SData files with the series as the unit, that is, all files of the same series are stored in the same SData file, while different series of one examination may be stored in two SData files;

for image types such as CR, where a single examination produces very few images, the files are merged into the SData files with the examination as the unit, that is, all files of one examination are stored in the same SData file;

the file merging model maps the file names of the examination files according to the corresponding rules and classifies the files, and the classified files are merged according to the merging strategy while indexes are built.

3. The method for designing a medical image cloud storage platform according to claim 2, characterized in that a file preprocessing procedure is further provided:

1) reading the file and determining whether it is a DICOM file; if so, continuing the procedure, otherwise ending it;

2) parsing the data elements in the DICOM file, such as file UID, file name, examination date and file type;

3) generating a file mapping table that records the file UID, examination date, file type and serial number (the serial number being generated during merging), the large files being named examinationDate_fileType_serialNumber.suffix, and the file mapping table being used to establish the mapping between the small files to be merged and the large files after merging.

4. The method for designing a medical image cloud storage platform according to claim 3, characterized in that an overall merging procedure is further provided:

1) taking one day as the unit, scanning all examination files of that day and preprocessing them, organizing the files by type, and merging examination files of the same type;

2) merging the examination files into SData file containers according to the multi-strategy merging of small files, examination files produced by different imaging devices using different merging strategies;

3) taking the files out of the examination files one by one, putting them into the SData data file while generating their indexes, and placing the indexes in the SData index file container;

4) determining whether the current container size plus the size of the examination file exceeds the threshold, and if so, creating a new file container;

5) finally, storing the generated set of SData files in the HDFS cluster through the HDFS client, and putting the index files into the index pool.

5. The method for designing a medical image cloud storage platform according to any one of claims 1 to 4, characterized in that, for reading, a two-level index file design is further proposed, the two-level index file consisting of two parts;

the first-level index is generated from the examination date of the large file (for example, 20180307), the examination type and the serial number;

the second part is the index file of each SData file, which stores the addresses of the small files within the corresponding data file.

6. The method for designing a medical image cloud storage platform according to claim 5, characterized in that, to speed up index lookup, a mapping table from small-file names to serial numbers is added to the headers of the data file and the index file.

7. The method for designing a medical image cloud storage platform according to claim 6, characterized in that index files and data files are managed by combining centralized and distributed storage: the index files are stored centrally in the index pool and the data files are uploaded to HDFS, thereby separating the index files from the data files.

8. The method for designing a medical image cloud storage platform according to claim 5, characterized in that a prefetching and caching strategy is further provided: prefetching is used to speed up file reads; when an image is requested, the block containing the requested file is read from HDFS to the file server while prefetching is triggered, so that some of the image files in the same examination are prefetched;

for modalities such as CT and MR, where a single examination contains a large number of DICOM files, a small series of files is used as the prefetch unit, one such series containing 30 small DICOM files, and the prefetch count can be changed according to actual conditions;

for modalities such as CR, where a single examination contains a limited number of DICOM files, prefetching is performed at the examination level; when an image file is requested, the cache is first checked for the file, and if the file is not in the cache, prefetching is triggered and the related files are cached on the file server.

9. The method for designing a medical image cloud storage platform according to claim 8, characterized in that the cache is located between HDFS and the client and is implemented by a file server; when a user requests an image through the client, the file server is first checked for the file; if the file is present, it is returned directly to the client; if not, it is read from HDFS and cached on the file server;

on a file read, the large-file name corresponding to the requested small file is obtained from the mapping relations and the corresponding local index file is read, and the data is read from the DataNode holding the large file using the large-file name and the small file's address within the large file, with prefetching and caching performed.