CN104679892A - Medical image storing method

Info

Publication number: CN104679892A
Application number: CN201510117839.8A
Authority: CN
Inventors: 徐宇
Original assignee: CHENGDU YINGTAI SCIENCE & TECHNOLOGY Co Ltd
Current assignee: CHENGDU YINGTAI SCIENCE & TECHNOLOGY Co Ltd
Priority date: 2015-03-18
Filing date: 2015-03-18
Publication date: 2015-06-03

Abstract

The invention provides a medical image storage method. The method comprises the following steps: a PACS (picture archiving and communication system) is configured with an online-archive two-tier storage architecture; a cloud storage system uses Hadoop as its infrastructure; Hadoop runs distributed parallel programs on a cluster consisting of a large number of nodes to complete the computation of massive data; HDFS (Hadoop Distributed File System) adopts a client-server architecture and divides a file into blocks; the blocks are stored on different data nodes in a distributed manner; an abstraction layer for processing files is created for medical image files; and the acquired image files are stored. By processing the image data in a cloud-computing-based PACS, the method improves image storage efficiency and retrieval speed.

Description

A medical image storage method

Technical field

The present invention relates to image storage and processing, and in particular to a medical image storage method.

Background art

With the development of digital imaging technology, large numbers of medical images are being produced, and this massive volume of medical image data can support clinical diagnosis. How to manage and organize these medical images effectively is a difficult problem faced by medical staff. A PACS offers a good solution for storing and transmitting medical images in digital form, and mass storage is one of its key technologies. Content-based medical image retrieval has developed against the background of the PACS architecture. Medical image retrieval is a typical data-intensive computing process, and for massive medical image collections a single-node retrieval system can hardly meet real-time requirements. A cloud-computing-based PACS has distributed, parallel processing capability: it decomposes a large task into subtasks and assigns them to working nodes that complete the work together, which offers a new approach to medical image retrieval. At present, however, image retrieval on PACS platforms is still slow and inefficient, which constrains the development of medical image processing.

Therefore, no effective solution has yet been proposed for the above problems in the related art.

Summary of the invention

To solve the above problems in the prior art, the present invention proposes a medical image storage method, comprising:

The PACS is configured with an online-archive two-tier storage architecture, and the cloud storage system is built on Hadoop. The whole storage system consists of an HDFS-based physical layer, a middle layer that provides data services for processing and storing images, an interface layer that calls those services, and a concrete application layer. The physical layer provides storage capacity; its storage architecture is HDFS, which implements load balancing and data backup and exposes a unified storage access interface. The middle layer stores and reads image data through the interface provided by the HDFS of the physical layer. The interface layer wraps the functions of the middle layer, and the application layer uses the functional interfaces provided by the interface layer to write distributed parallel processing applications. Hadoop is developed in Java, processes massive data in parallel, and consists of the distributed file system HDFS and the MapReduce parallel computing model; in Hadoop development, distributed parallel programs run on a computing cluster made up of a large number of nodes to complete the computation of massive data. HDFS adopts a client/server architecture: an HDFS cluster consists of one NameNode and a group of DataNodes. The NameNode is a central node responsible for managing the file system namespace and client access to files. Each node in the cluster runs one DataNode, which manages the data stored on its node, handles read and write requests from file system clients, and creates, deletes, and replicates data blocks under the unified scheduling of the NameNode. HDFS divides files into blocks and stores these blocks in a distributed manner on different DataNodes; each block can also be replicated several times and stored on different DataNodes. For medical image files, an abstraction layer for processing files is established, and the collected image files are stored.

Preferably, establishing the abstraction layer for processing files and storing the collected image files further comprises:

Treating each medical image as one frame and merging all images of a single examination into one sequence image file. The image data is kept in the pixel data element; the pixel data in its value field is either raw data or encapsulated data, and the value of encapsulated pixel data consists of multiple separate pixel data streams, thereby representing a multi-frame image; or

Grouping the medical image files in order of series number and image number, with the total file size of each group being 64 MB, then compressing each group into one compressed file for storage; on download, each group is decompressed and displayed as soon as it has been downloaded.

Preferably, in the Map stage of the MapReduce parallel computing model, the MapReduce framework first divides the input data of a task into a number of fixed-size splits, then decomposes each split into multiple key-value pairs (Key1, Value1) and passes them to the Map operation; the Map operation on each node processes each key-value pair to form new key-value pairs (Key2, Value2), which are grouped by identical Key2 value to form (Key2, list(Value2)) and passed to Reduce as its input; the Map operation sends key-value pairs with the same Key2 value to the same node for processing in the Reduce stage;

In the Reduce stage, the (Key2, list(Value2)) output by Map becomes the input; after distributed processing of this input, key-value pairs (Key3, Value3) are obtained and written to HDFS or an HBase database according to the user's needs.

Compared with the prior art, the present invention has the following advantages:

The present invention proposes an image processing method for a cloud-computing-based PACS, which improves image storage efficiency and retrieval speed.

Brief description of the drawings

Fig. 1 is a flowchart of the medical image storage method according to an embodiment of the present invention.

Detailed description of the embodiments

A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention can also be practiced according to the claims without some or all of these details.

In order to improve medical image retrieval efficiency, one aspect of the present invention provides a medical image storage method. Fig. 1 is a flowchart of the medical image storage method according to an embodiment of the present invention. The PACS architecture of the present invention is an "online-archive" two-tier storage architecture. This both meets the demand for PB-scale storage capacity and gives real-time access to data that was previously "offline", improving system availability. The cloud storage system of the PACS is built on Hadoop; the whole architecture consists of an HDFS-based physical layer, a middle layer providing data services for processing and storing images, an interface layer that calls these services, and a concrete application layer. In the physical layer, the storage devices provide massive capacity; the storage architecture is HDFS, which implements functions such as load balancing and data backup and exposes a unified storage access interface. The middle layer stores and reads image data, which it does through the interface provided by the HDFS of the physical layer. The interface layer further wraps the functions of the middle layer so that development and programming become easier. The application layer uses the functional interfaces provided by the interface layer to write distributed parallel processing applications.

Hadoop, the open distributed computing framework of the cloud platform, is developed in Java and processes massive data in parallel; it mainly consists of the distributed file system HDFS and the MapReduce parallel computing model. In Hadoop development, distributed parallel programs run on a large computing cluster made up of many nodes to complete the computation of massive data, without the developer having to deal with scheduling, distributed storage, fault tolerance, network communication, or load balancing in parallel programming.

Medical images are usually small files: the larger ones, such as DR and CR images, are about 10 MB, while CT and MR files are only a few hundred KB. Because the default data block size in HDFS is 64 MB, storing too many small files consumes a large amount of memory on the HDFS master node, the NameNode. Because each file is additionally replicated several times, the present invention establishes an abstraction layer for handling small files, which processes the image files collected for each patient. The small-file storage and query problem of cloud storage is addressed by adapting the file system. Given that PACS image file types are fairly uniform, two storage schemes are proposed.

The first scheme treats each image as one frame and merges all images of a single examination into one sequence image file. In a medical image file, the image data is kept in the pixel data element; the pixel data in its value field can be raw data or encapsulated data. The value of encapsulated pixel data consists of multiple separate pixel data streams, thereby representing a multi-frame image. With this scheme, images can be displayed only after the whole file has been downloaded, rather than displayed while downloading as doctors are used to. When a single examination produces many images (a CT study can reach a thousand images), the total size of the image file reaches hundreds of MB or even the GB range, and the download time is long.

The second scheme is packing and compression. The patient's image files are grouped in order of series number and image number, with the total file size of each group about 64 MB, and each group is then compressed into one compressed file for storage. On download, each group is decompressed and displayed as soon as it has been downloaded, so that images are displayed while downloading. A further advantage of this scheme is that the compression is lossless, and the compressed file is usually less than half the total size of the original files, which significantly reduces network transfer time.
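The second scheme can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the function names, the use of ZIP (DEFLATE) as the lossless compressor, and the in-memory representation of files as (name, data) pairs are all assumptions made for illustration. The input list is assumed to be already sorted by series number and image number.

```python
import io
import zipfile

GROUP_LIMIT = 64 * 1024 * 1024  # target group size in bytes (about 64 MB, as in the text)

def group_by_size(files, limit=GROUP_LIMIT):
    """Split an ordered list of (name, data) pairs into consecutive groups.

    A new group is started whenever adding the next file would push the
    current group above `limit`. A single file larger than `limit` still
    forms its own group.
    """
    groups, current, size = [], [], 0
    for name, data in files:
        if current and size + len(data) > limit:
            groups.append(current)
            current, size = [], 0
        current.append((name, data))
        size += len(data)
    if current:
        groups.append(current)
    return groups

def compress_group(group):
    """Losslessly compress one group into an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in group:
            zf.writestr(name, data)
    return buf.getvalue()

def decompress_group(blob):
    """Return the (name, data) pairs of one archive in stored order."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [(name, zf.read(name)) for name in zf.namelist()]
```

A viewer following this scheme would call `decompress_group` on each archive as soon as it arrives, so the first images can be shown while later groups are still downloading.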

The HDFS of the present invention adopts a client/server architecture. An HDFS cluster consists of one NameNode and a group of DataNodes. The NameNode is a central node responsible for managing the file system namespace and client access to files. Generally each node in the cluster runs one DataNode, which manages the data stored on its node, handles read and write requests from file system clients, and creates, deletes, and replicates data blocks under the unified scheduling of the NameNode. HDFS divides files into blocks and stores these blocks across different DataNodes; each block can also be replicated several times and stored on different DataNodes, which gives high fault tolerance and high throughput for data reads and writes.
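The block splitting and replication described above can be pictured with a toy Python model. This is only a sketch: the round-robin placement and the names `split_into_blocks` and `place_replicas` are assumptions for illustration and do not reproduce the actual HDFS placement policy, which takes rack topology into account.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=3):
    """Toy NameNode metadata: map each block index to `replication` distinct nodes.

    Replicas are assigned round-robin, which is an illustrative assumption,
    not the real HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement
```

In this model the dictionary returned by `place_replicas` plays the role of the metadata held by the NameNode, while the blocks themselves would live on the DataNodes it names.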

First, the contourlet transform and LBP are used to extract frequency-domain and spatial-domain features of the medical sample image. A Map operation then matches them against the features in the medical image feature library, and a Reduce operation collects and sorts the matching results of the Map tasks. The best retrieval result for the medical image is finally obtained from the sorted results.

In the Map (mapping) stage of the MapReduce model, the MapReduce framework first divides the input data of a task into a number of fixed-size splits, then decomposes each split into multiple key-value pairs (Key1, Value1) and passes them to the Map operation. The Map operation on each node processes each key-value pair to form new key-value pairs (Key2, Value2), which are grouped by identical Key2 value to form (Key2, list(Value2)) and passed to Reduce as its input. In general, key-value pairs with the same Key2 value are sent to the same node for processing in the Reduce stage.

In the Reduce stage, the (Key2, list(Value2)) output by Map becomes the input. After processing this input, key-value pairs (Key3, Value3) are obtained and written to the location specified in HDFS or an HBase database, according to the user's needs.
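The key-value flow of the two stages can be simulated in a single process. The sketch below is an assumption-laden illustration in plain Python, not Hadoop code: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, and the grouping dictionary stands in for the shuffle that Hadoop performs between nodes.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Simulate one MapReduce job on a list of (Key1, Value1) pairs.

    map_fn(key1, value1) returns an iterable of (Key2, Value2) pairs.
    reduce_fn(key2, values) returns an iterable of (Key3, Value3) pairs.
    """
    # Map stage plus grouping: collect every Value2 under its Key2,
    # producing the (Key2, list(Value2)) form described in the text.
    grouped = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            grouped[key2].append(value2)

    # Reduce stage: each Key2 and its value list is processed once.
    output = []
    for key2 in sorted(grouped):
        output.extend(reduce_fn(key2, grouped[key2]))
    return output

# Hypothetical example: count images per modality.
def modality_map(image_id, modality):
    return [(modality, 1)]

def modality_reduce(modality, ones):
    return [(modality, sum(ones))]
```

Calling `run_mapreduce([(1, "CT"), (2, "MR"), (3, "CT")], modality_map, modality_reduce)` groups the two CT records under one key before reducing, which mirrors how pairs with the same Key2 reach the same Reduce node.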

The PACS-based medical image retrieval method is described in detail below.

1. Extracting Contourlet-domain features

A one-level Contourlet decomposition divides the Fourier plane into 4 quadrants, and the coefficients after decomposition consist of 4 subbands with directions π/4 + kπ/2, k = 0, 1, 2, 3. A two-level Contourlet decomposition further divides each quadrant into 4 parts, giving 12 directions in total, namely π/12 + kπ/6, k = 0, 1, …, 11; the coefficients after decomposition consist of 16 subbands, of which the 4 subbands around the center are low-frequency texture components and the rest are high-frequency texture. A multi-level Contourlet decomposition keeps subdividing the previous level, but with too many levels obvious aliasing appears, so 1 to 3 levels of decomposition are generally used.

Let f_U denote the coefficients after Contourlet decomposition, and let the real and imaginary parts of the n-th subband be denoted f^{nr}_U and f^{ni}_U respectively, n = 1, 2, …, 32. The mean μ_n and standard deviation σ_n of the modulus of the n-th subband are then:

\mu_{n} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} | f_{U}(i, j) | = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{ f^{ni}_{U}(i, j)^{2} + f^{nr}_{U}(i, j)^{2} }

\sigma_{n} = \sqrt{ \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} ( | f_{U}(i, j) | - \mu_{n} )^{2} }

where M and N are the numbers of rows and columns of each subband. The final feature of the image is:

F = [\mu_{1}, \sigma_{1}, \mu_{2}, \sigma_{2}, \ldots, \mu_{n}, \sigma_{n}]
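As a worked illustration of the two statistics, the following Python sketch computes the mean and standard deviation of the modulus for each subband and concatenates them into the feature vector F. It assumes the subbands are already available as pairs of M × N nested lists (real part, imaginary part); the Contourlet decomposition itself is not implemented, and the function names are hypothetical.

```python
import math

def subband_stats(real, imag):
    """Mean and standard deviation of the modulus of one M x N subband."""
    m, n = len(real), len(real[0])
    moduli = [
        math.hypot(real[i][j], imag[i][j])  # sqrt(real^2 + imag^2)
        for i in range(m)
        for j in range(n)
    ]
    mu = sum(moduli) / (m * n)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in moduli) / (m * n))
    return mu, sigma

def feature_vector(subbands):
    """Concatenate (mu, sigma) of every subband: [mu_1, sigma_1, mu_2, ...]."""
    features = []
    for real, imag in subbands:
        features.extend(subband_stats(real, imag))
    return features
```

For a 2 × 2 subband with real part [[3, 0], [0, 3]] and imaginary part [[4, 0], [0, 4]], the moduli are 5, 0, 0, 5, so both the mean and the standard deviation are 2.5.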

2. Extracting LBP features

LBP characterizes how the gray levels of the pixels in a neighborhood vary relative to the central point. It focuses on changes in pixel gray level, which matches how human vision perceives image texture. LBP is therefore computed over the image and its histogram is used as the spatial-domain feature of the image.

where

s(g_{i} - g_{c}) = \begin{cases} 1, & g_{i} - g_{c} \geq 0 \\ 0, & g_{i} - g_{c} < 0 \end{cases}

U(LBP3) = | s(g_{7} - g_{c}) - s(g_{0} - g_{c}) |

Here g_c is the gray value of the center pixel of a neighborhood, and g_i are the gray values of the pixels of the 3 × 3 neighborhood centered on g_c, taken clockwise.
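The thresholding function s can be put to work in a short Python sketch. Because the LBP code formula itself is not reproduced in the text, the sketch assumes the basic LBP definition (the sum of s(g_i - g_c) · 2^i over the 8 neighbors) and a 256-bin normalized histogram; the clockwise order starting at the top-left pixel and all function names are likewise assumptions.

```python
def s(diff):
    """Threshold function: 1 if the neighbor is at least as bright as the center."""
    return 1 if diff >= 0 else 0

# Clockwise (row, column) offsets around the center, starting at the top-left
# pixel. The starting point is an assumption; the text only says "clockwise".
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """Basic 8-bit LBP code of the pixel at (r, c) of a 2-D gray-level list."""
    gc = img[r][c]
    return sum(
        s(img[r + dr][c + dc] - gc) << i for i, (dr, dc) in enumerate(OFFSETS)
    )

def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In a perfectly flat region every neighbor equals the center, so every bit is set and the code is 255.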

3. Similarity matching

A mean distance metric is used for the similarity of Contourlet-domain features:

SimContourlet(P, Q) = \sum_{i=1}^{6} | EP_{i} - EQ_{i} |

where P is the medical image to be retrieved, Q is an image in the medical image library, and EP_i and EQ_i denote the mean distance of the i-th component of images P and Q respectively.

For the LBP feature of an image, the feature is first normalized, and the similarity is then computed with the Euclidean distance:

SimLBP(P, Q) = \sqrt{ \sum_{i=1}^{32} ( WP_{i} - WQ_{i} )^{2} }

where WP_i and WQ_i denote the i-th component of the normalized feature vectors of images P and Q respectively.

Because SimContourlet and SimLBP have different value ranges, they are normalized externally as follows:

Sim′Contourlet(P, Q) = (SimContourlet(P, Q) - μ_Contourlet) / (6σ_Contourlet)

Sim′LBP(P, Q) = (SimLBP(P, Q) - μ_LBP) / (6σ_LBP)

where σ_Contourlet, μ_Contourlet and σ_LBP, μ_LBP denote the standard deviation and mean of SimContourlet and SimLBP respectively.

The final distance between two medical images is:

Sim(P, Q) = w_1 Sim′Contourlet(P, Q) + w_2 Sim′LBP(P, Q)

where w_1 and w_2 are weights satisfying w_1 + w_2 = 1.
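The matching formulas of this section can be combined in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the divisor is read as 6σ, and the means and standard deviations are taken as given inputs rather than estimated from a library.

```python
import math

def sim_contourlet(ep, eq):
    """Sum of absolute component differences between two Contourlet features."""
    return sum(abs(p - q) for p, q in zip(ep, eq))

def sim_lbp(wp, wq):
    """Euclidean distance between two normalized LBP feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(wp, wq)))

def normalise(distance, mu, sigma):
    """External normalization: (distance - mu) / (6 * sigma)."""
    return (distance - mu) / (6 * sigma)

def fused_distance(d_contourlet, d_lbp, stats_contourlet, stats_lbp, w1=0.5):
    """Weighted sum of the two normalized distances, with w1 + w2 = 1.

    stats_contourlet and stats_lbp are (mu, sigma) pairs of the raw distances.
    """
    w2 = 1.0 - w1
    return (w1 * normalise(d_contourlet, *stats_contourlet)
            + w2 * normalise(d_lbp, *stats_lbp))
```

Because both terms are distances, a smaller fused value means the two images are more alike; the default of equal weights is only a placeholder, since the text does not give values for w_1 and w_2.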

4. Medical image retrieval steps

Medical images and their features are all stored in HBase. When the HBase data set is very large, scanning the whole table to search takes a long time. To reduce image retrieval time and improve retrieval efficiency, the MapReduce computing component is used to parallelize medical image retrieval.

The MapReduce-based medical image retrieval steps are as follows:

(1) Collect the medical images, extract the corresponding features, and store the feature data in HDFS.

(2) The user submits a retrieval request, and the Contourlet-domain features and LBP features of the medical image to be retrieved are extracted.

(3) Map stage: the features of the medical image to be retrieved are matched for similarity against the image features in HBase. The output of the map operation is key-value pairs <similarity, image ID>.

(4) All <similarity, image ID> key-value pairs output by map are sorted by similarity and repartitioned, then fed to the reduce nodes.

(5) Reduce stage: all <similarity, image ID> key-value pairs are collected and sorted by similarity, and the top N key-value pairs are written to HDFS.

(6) The IDs of the images most similar to the medical image to be retrieved are output, and the user obtains the final medical retrieval result.
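Steps (3) to (6) can be sketched as a single-process simulation in Python. The sketch assumes that the similarity value is a distance (so the smallest values rank first), that the feature table has already been divided into splits, and that the helper names `map_match`, `reduce_top_n`, and `retrieve` are free to choose; none of this is Hadoop or HBase API.

```python
import heapq

def map_match(query, split, distance):
    """Map: compare the query feature with every (image_id, feature) in one split."""
    return [(distance(query, feature), image_id) for image_id, feature in split]

def reduce_top_n(pairs, n):
    """Reduce: keep the n pairs with the smallest distance, in ascending order."""
    return heapq.nsmallest(n, pairs)

def retrieve(query, splits, distance, n):
    """Run the map step on every split, then reduce to the top-n image IDs."""
    pairs = []
    for split in splits:
        pairs.extend(map_match(query, split, distance))
    return [image_id for _, image_id in reduce_top_n(pairs, n)]

def l1_distance(a, b):
    """Illustrative distance function; any feature distance could be plugged in."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

In a real cluster each call to `map_match` would run on a different node, and only the <similarity, image ID> pairs, not the features, would travel to the reduce side.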

In addition, considering precision, computational complexity, and scalability, another embodiment of the present invention uses SIFT to detect and describe local features, uses K-means clustering to construct a vocabulary, represents images as weighted vectors, and builds an inverted index to achieve efficient retrieval.

Because Hadoop is designed as a tool for large-scale offline data processing and does not guarantee real-time online processing, the online retrieval part is still carried out in the traditional way, while the offline processing part is designed on Hadoop. To process image data better, an improved Hadoop image processing method is introduced first; on this basis, the offline part is implemented in three stages: feature vector generation, feature clustering, and image vector representation with inverted index construction.

The improved Hadoop image processing method is described as follows:

To process image data better and avoid the inefficiency of small files, the method borrows the idea of merging files from the sequence file approach and stores a large number of small images in one large image library file. What is stored, however, is no longer serialized key-value pairs or floating-point arrays but all the information of the original images. This not only effectively reduces the memory demand on the NameNode but also lowers task management overhead and can noticeably improve processing efficiency; preserving the original image information also helps with complex image processing requirements. To allow random reads of image data, an index file is needed that records the offsets of all image data within the image library file. Any image in the image library file can then be accessed conveniently by its offset.
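The image library file and its offset index can be illustrated with a small in-memory Python sketch. The names `build_library` and `read_image` are hypothetical, and storing the length next to each offset is an added assumption for convenience; the text itself only mentions offsets.

```python
import io

def build_library(images):
    """Concatenate raw image bytes into one library and record where each starts.

    images: ordered list of (image_id, raw_bytes).
    Returns (library_bytes, index) with index[image_id] = (offset, length).
    """
    library = io.BytesIO()
    index = {}
    for image_id, data in images:
        index[image_id] = (library.tell(), len(data))
        library.write(data)
    return library.getvalue(), index

def read_image(library, index, image_id):
    """Random access: slice one image out of the library using the index."""
    offset, length = index[image_id]
    return library[offset:offset + length]
```

With the index in hand, a task can fetch a single image without scanning the library, and the file system only has to track one large file instead of thousands of small ones.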

Feature vector generation:

First, for each pixel X = (x, y) in the image, compute the Hessian matrix at scale σ:

H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{xy}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix}

where L_xx(X, σ) is the convolution of the Gaussian second-order derivative in the x direction with the image at pixel X, L_xy(X, σ) is the corresponding convolution for the mixed second-order derivative in x and y, and L_yy(X, σ) is that for the second-order derivative in the y direction. This matrix is made up of second-order derivatives and can be computed with approximate Gaussian kernels at different scales σ, so the Hessian value becomes a function of 3 variables, H(x, y, σ); the positions and corresponding scales at which it reaches a local maximum in both the spatial domain and the scale domain are then found. The feature descriptor is computed from wavelet responses. For each feature point, the wavelet responses in the x and y directions (denoted dx and dy) are computed within a circular region of radius 6σ; the responses inside a window covering 60° are summed, and as the window rotates, the direction of the longest resulting vector is taken as the main direction. Next, a square region of size 20σ is constructed along this direction and divided into 4 × 4 subregions. For the 25 sample points of each subregion the dx and dy responses are computed and summed, and 4 descriptor values are extracted per subregion: [Σdx, Σdy, Σ|dx|, Σ|dy|]. With 16 subregions this yields a 64-dimensional vector, which is finally normalized.
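The last step, assembling the 64-dimensional descriptor, can be sketched in Python. The sketch assumes that the dx and dy responses have already been sampled on a 20 × 20 grid aligned with the main direction (so each of the 4 × 4 subregions holds 5 × 5 = 25 samples); the detection and orientation steps are not implemented, and `descriptor_64` is a hypothetical name.

```python
import math

def descriptor_64(dx, dy):
    """Build a normalized 64-dimensional descriptor from 20 x 20 response grids.

    For each of the 4 x 4 subregions the four sums
    [sum dx, sum dy, sum |dx|, sum |dy|] are appended in row-major order.
    """
    vec = []
    for block_row in range(4):
        for block_col in range(4):
            sum_dx = sum_dy = sum_abs_dx = sum_abs_dy = 0.0
            for r in range(5 * block_row, 5 * block_row + 5):
                for c in range(5 * block_col, 5 * block_col + 5):
                    sum_dx += dx[r][c]
                    sum_dy += dy[r][c]
                    sum_abs_dx += abs(dx[r][c])
                    sum_abs_dy += abs(dy[r][c])
            vec.extend([sum_dx, sum_dy, sum_abs_dx, sum_abs_dy])
    # Normalize to unit length; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

Keeping both the signed sums and the sums of absolute values lets the descriptor distinguish a subregion with uniform gradient from one whose gradients alternate in sign and cancel out.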

Because feature detection and description are independent between images, the above computation only needs to be wrapped in the Map operation, and this stage can be completed by the Map part alone. The MapReduce design of this stage is as follows:

1) Map. The input is images of the form <id, data>. The Map operation runs the SIFT algorithm on each input image to extract feature vectors and counts the number of features fn in that image; this feature count is used later to normalize term frequency. The output has the form <(id:fn), feature>.

2) Reduce. The Reduce operation acts like an identity function: it simply passes each key-value pair through to the output.

After this stage, a description file of the feature vectors contained in each image is obtained.

Feature clustering:

First, K samples are selected at random as initial cluster centers, and each remaining sample is assigned to a cluster according to its distance to the cluster centers. K new cluster centers are then recomputed, and each sample is again assigned to a new cluster according to its distance to the cluster centers. This is iterated until the objective function converges or a fixed number of iterations is reached.

This iterative process can be implemented by repeatedly invoking a MapReduce job, with each MapReduce run corresponding to one iteration. The MapReduce design of this stage is as follows:

1) Map. The input is the samples to be assigned, of the form <line_num, ((id:fn), feature)>, together with the cluster centers of the previous iteration (or the initial ones). Here (id:fn) takes no part in the computation; it only identifies the image the feature belongs to and the number of features that image contains. The Map operation computes the nearest cluster center for each input sample and labels it with its new cluster. The output has the form <cluster_id, ((id:fn), feature)>.

2) Reduce. The input is sample lists of the form <cluster_id, [((id:fn), feature)]>; again (id:fn) takes no part in the computation. All samples with the same cluster_id flow to the same Reduce task. The Reduce operation accumulates the number of samples with the same cluster_id and the sum of each component of their vectors, then averages each component to obtain the new cluster center. The output has the form <cluster_id, cluster_mean>.

After this stage, a description file of the features contained in each image and the vocabulary word each feature belongs to is obtained, together with a vocabulary description file, in which cluster_id is the word number and the cluster center cluster_mean represents the word.
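One K-means iteration in the Map/Reduce form described above can be simulated in plain Python. The function names are hypothetical, squared Euclidean distance is assumed as the distance measure, and keeping the old center for a cluster that receives no samples is an assumption the text does not address.

```python
from collections import defaultdict

def nearest(centres, feature):
    """Index of the cluster center closest to feature (squared Euclidean distance)."""
    return min(
        range(len(centres)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(centres[k], feature)),
    )

def kmeans_map(samples, centres):
    """Map: emit <cluster_id, (tag, feature)>; tag carries (id:fn) untouched."""
    for tag, feature in samples:
        yield nearest(centres, feature), (tag, feature)

def kmeans_reduce(cluster_id, members):
    """Reduce: average each vector component to get <cluster_id, cluster_mean>."""
    dim = len(members[0][1])
    sums = [0.0] * dim
    for _tag, feature in members:
        for d in range(dim):
            sums[d] += feature[d]
    return cluster_id, [total / len(members) for total in sums]

def kmeans_iteration(samples, centres):
    """Run one Map + Reduce pass and return the updated list of centers."""
    groups = defaultdict(list)
    for cluster_id, sample in kmeans_map(samples, centres):
        groups[cluster_id].append(sample)
    new_centres = list(centres)  # clusters with no members keep their old center
    for cluster_id, members in groups.items():
        new_centres[cluster_id] = kmeans_reduce(cluster_id, members)[1]
    return new_centres
```

A driver would call `kmeans_iteration` repeatedly, feeding each result back in as the next set of centers, until the centers stop moving or a fixed number of iterations is reached.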

Image vector representation and inverted index construction:

Each image is represented as a vector in which each component corresponds to one word of the vocabulary and the value of the component is a precomputed weight. When a word does not occur in the image, its component is 0.

Once images are expressed as vectors, the similarity between them can be computed with cosine similarity. When the vector dimension is very high and the total number of images is large, this computation is very costly and an efficient index structure is needed. Because the words of the vocabulary seldom appear together in the same image, image vectors are sparse and have many zero components. Images that contain none of the words in the query image therefore need not take part in the computation, and an inverted index implements this filtering.

The following MapReduce design implements weight computation and inverted index construction:

1) Map. The input is the description file of the features contained in each image and the word each feature belongs to, of the form <line_num, (cluster_id, ((id:fn), feature))>. Only the cluster_id, id, and fn information is needed here. For each input, the Map operation extracts a key-value pair of the form <cluster_id, (id:fn)> as output.

2) Reduce. The input is record lists of the form <cluster_id, [(id:fn)]>. All records with the same cluster_id flow to the same Reduce task. For the value list [(id:fn)] of a given cluster_id, the Reduce operation accumulates two counters, tc and dc: for each id seen for the first time it adds 1 to both tc and dc, and for an id that has already been seen it adds 1 only to tc. It then takes the logarithm of N divided by dc to obtain the inverse document frequency, and divides the tc of each id by the corresponding fn to obtain the normalized term frequency. The output has the form <(cluster_id:idf), [(id:tf)]>, where (cluster_id:idf) is a word with its inverse document frequency and the list [(id:tf)] is the posting list of that word.

After this stage, an inverted index file is obtained, which stores the vector representation of every image in the image library. During online retrieval, SIFT feature vectors are likewise extracted from the query image, and each feature is assigned to the word at the smallest distance from it. The weight vector of the image is then computed, and the posting lists for the words occurring in the image are looked up in the inverted index and merged. Finally, the cosine similarity between the query image vector and each retrieved image vector is computed, and the results are ordered from highest to lowest similarity.
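The index construction and the online lookup can be tied together in one Python sketch. Several points are assumptions made for illustration: N is taken to be the total number of images, each vector component is taken to be tf × idf, image vector norms are precomputed by a helper, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def index_reduce(values, n_images):
    """Reduce for one vocabulary word; values is its list of (image_id, fn)."""
    tc = Counter()  # occurrences of the word in each image
    fn_of = {}      # total feature count of each image
    for image_id, fn in values:
        tc[image_id] += 1
        fn_of[image_id] = fn
    dc = len(tc)    # number of distinct images containing the word
    idf = math.log(n_images / dc)
    postings = sorted((i, tc[i] / fn_of[i]) for i in tc)
    return idf, postings

def build_index(records, n_images):
    """records: one (cluster_id, image_id, fn) triple per feature (the Map output)."""
    grouped = defaultdict(list)
    for cluster_id, image_id, fn in records:
        grouped[cluster_id].append((image_id, fn))
    return {cid: index_reduce(vals, n_images) for cid, vals in grouped.items()}

def image_norms(index):
    """Euclidean norm of every image's tf * idf vector, computed from the index."""
    squares = defaultdict(float)
    for idf, postings in index.values():
        for image_id, tf in postings:
            squares[image_id] += (tf * idf) ** 2
    return {i: math.sqrt(v) for i, v in squares.items()}

def search(index, norms, query_words):
    """Rank images by cosine similarity to the query, using only matching postings."""
    counts = Counter(w for w in query_words if w in index)
    q = {w: c / len(query_words) * index[w][0] for w, c in counts.items()}
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    dots = defaultdict(float)
    for word, weight in q.items():
        idf, postings = index[word]
        for image_id, tf in postings:
            dots[image_id] += weight * tf * idf
    scored = [(dots[i] / (q_norm * norms[i]), i)
              for i in dots if q_norm > 0 and norms[i] > 0]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Only images that share at least one word with the query ever enter `dots`, which is the filtering effect that motivates the index structure.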

In summary, the present invention proposes an image processing method for a cloud-computing-based PACS, which improves image storage efficiency and retrieval speed.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing system. They can be concentrated on a single computing system or distributed over a network of multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.

It should be understood that the above embodiments of the present invention are used only to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. In addition, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims

1. A medical image storage method, characterized by comprising:

2. The method according to claim 1, characterized in that establishing the abstraction layer for processing files and storing the collected image files further comprises:

3. The method according to claim 2, characterized in that, in the Map stage of the MapReduce parallel computing model, the MapReduce framework first divides the input data of a task into a number of fixed-size splits, then decomposes each split into multiple key-value pairs (Key1, Value1) and passes them to the Map operation; the Map operation on each node processes each key-value pair to form new key-value pairs (Key2, Value2), which are grouped by identical Key2 value to form (Key2, list(Value2)) and passed to Reduce as its input; the Map operation sends key-value pairs with the same Key2 value to the same node for processing in the Reduce stage;