CN110246567A

CN110246567A - Medical image preprocessing method

Info

Publication number: CN110246567A
Application number: CN201810186353.3A
Authority: CN
Inventors: 王国利; 李亮; 郭斌; 刘力; 徐琰
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2018-03-07
Filing date: 2018-03-07
Publication date: 2019-09-17
Anticipated expiration: 2038-03-07
Also published as: CN110246567B

Abstract

The invention discloses a medical image preprocessing method comprising the following steps. Label information semantic imaging: pre-read a digital medical image and its annotation information, and convert the formatted text annotations into multi-level classification mask images using a discrimination algorithm. Region-of-interest extraction: read the digital medical image, remove the transparency channel to obtain an image, extract the contours of the tissue regions in the current slide image, and divide the image into tissue regions and background regions. Multi-mask sample classification and extraction: using the generated multi-level classification masks, extract positive and negative samples from the tissue regions, and package the sample data information into structured data that can be used for neural network model training and prediction. This technical solution enables more efficient and more accurate data preprocessing.

Description

A medical image preprocessing method

Technical field

The present invention relates to the technical field of medical image processing, and in particular to a medical image preprocessing method.

Background art

The following is only background information on the present technology as understood by the inventors and does not necessarily constitute prior art.

Computer-aided diagnosis (CAD) refers to methods that combine imaging, medical image processing, and other possible physiological and biochemical means with computer analysis to help detect lesions and improve diagnostic accuracy. Also called the doctor's "third eye", CAD systems are widely used and help improve the sensitivity and specificity of diagnosis.

To use medical image information accurately and efficiently, computer-aided diagnosis based on cancer medical images has become an industry hot spot. The most actively studied and most widely applied approach at present is to understand and recognize medical images through machine learning and deep learning. Computer-aided diagnosis based on machine learning mainly covers four aspects: (1) image preprocessing; (2) segmentation of the region of interest (ROI); (3) feature extraction, selection, and classification; (4) identification (classification or segmentation) of tumor regions. In image preprocessing, the ultra-high resolution of pathology images poses a major challenge to the preprocessing approach.

Existing classification and recognition techniques based on image segmentation are generally used to analyze pathology images at low resolution and cannot effectively handle ultra-high-resolution digital medical images with such a huge data volume. In addition, in medical image recognition tasks, on the one hand the samples used include both large postoperative section samples and puncture (biopsy) samples used in early screening; on the other hand, the shape and area fraction of the tissue differ from one digital slide to another, so that the computational cost and the accuracy of sample extraction become a pair of requirements that are hard to balance. As the front end and input data source of an artificial neural network, how to preprocess medical image data efficiently and quickly has become one of the problems to be studied in the field of medical imaging.

Summary of the invention

To overcome the deficiencies of the prior art, the technical problem solved by the present invention is to provide a medical image preprocessing method that enables efficient and accurate data preprocessing.

To solve the above technical problem, the technical solution adopted by the present invention is as follows:

A medical image preprocessing method, comprising the following steps:

Label information semantic imaging: pre-read a digital medical image and its annotation information, and convert the formatted text annotations into multi-level classification mask images using a discrimination algorithm;

Region-of-interest extraction: read the digital medical image, remove the transparency channel to obtain an image, extract the contours of the tissue regions in the current slide image, and divide the image into tissue regions and background regions;

Multi-mask sample classification and extraction: using the generated multi-level classification masks, extract positive and negative samples from the tissue regions, and package the sample data information into structured data that can be used for neural network model training and prediction.

To preprocess medical images efficiently and quickly, the inventors use in this technical solution a coordinate mapping strategy between the multi-resolution levels of the medical image pyramid, construct identification masks at two granularities, coarse and fine, link them through coordinates, and use them respectively to locate tissue regions quickly and to delimit them accurately.

Compared with other approaches, the quick localization and accurate delimitation of tissue regions in this technical solution make it possible to preprocess medical images efficiently and quickly.

In one or more embodiments, in the label information semantic image step:

More specifically, the transparency channel is the alpha channel;

More specifically, the format of the text annotations is XML;

It should be noted that the text annotations may be in any of several formats; in one embodiment the format is XML, and other embodiments may use other formats as needed.

More specifically, the discrimination algorithm is a closed-polygon coordinate discrimination algorithm.

It should be noted that in this embodiment the discrimination algorithm is a closed-polygon coordinate discrimination algorithm. Compared with other discrimination algorithms, it directly determines, based on the principle of similar triangles, the positional relationship between a pixel and the nearest-neighbor polygon vertices, so it can quickly determine whether each pixel is enclosed by a closed polygon. The annotations are thereby converted quickly into multi-level classification mask images, which improves the computational efficiency of this step and hence of data preprocessing.

In one or more more specific embodiments, the label information semantic imaging step specifically includes:

Load the resolution information of the digital medical image and, according to that information, construct a zero matrix of the same size as an equivalent blank mask;

Load the XML-format annotation information corresponding to the digital medical image; this information records the coordinates of the closed polygon of each of the several annotated regions;

Use the closed-polygon coordinate discrimination algorithm to determine whether each pixel lies inside any closed polygon, and convert the XML-format text annotation data into mask images accordingly, obtaining multi-level classification masks that include an annotation mask and an exclusion mask.
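The first two sub-steps above can be sketched as follows. This is a minimal illustration only: the patent does not specify the XML schema, so the element and attribute names (`Annotation`, `Group`, `Coordinate`, `X`, `Y`) and the helper names `load_polygons` and `blank_mask` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation layout; the real schema is not given in the source.
SAMPLE_XML = """
<Annotations>
  <Annotation Name="lesion_0" Group="label">
    <Coordinate X="2" Y="1"/>
    <Coordinate X="6" Y="1"/>
    <Coordinate X="6" Y="5"/>
    <Coordinate X="2" Y="5"/>
  </Annotation>
  <Annotation Name="hole_0" Group="exclude">
    <Coordinate X="3" Y="2"/>
    <Coordinate X="5" Y="2"/>
    <Coordinate X="5" Y="4"/>
  </Annotation>
</Annotations>
"""


def load_polygons(xml_text):
    """Return {group: [polygon, ...]}; each polygon is a list of (x, y) vertices."""
    groups = {}
    root = ET.fromstring(xml_text.strip())
    for annotation in root.iter("Annotation"):
        polygon = [(float(c.get("X")), float(c.get("Y")))
                   for c in annotation.iter("Coordinate")]
        groups.setdefault(annotation.get("Group"), []).append(polygon)
    return groups


def blank_mask(width, height):
    """Zero matrix with the same size as the image, used as the blank mask."""
    return [[0] * width for _ in range(height)]


polygons = load_polygons(SAMPLE_XML)
mask = blank_mask(8, 6)  # toy size; a real slide is far larger
```

Grouping polygons by a `Group` attribute is one possible way to keep annotation regions and exclusion regions apart before each group is rasterized into its own mask.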

As one or more more specific embodiments, in the closed-polygon coordinate discrimination algorithm, the positional relationship between the pixel and the closed polygon is obtained by the following formula:

in_ploy = (E_1y - P_y)(E_1x - E_0x) - (E_1x - P_x)(E_1y - E_0y)

where E₀ and E₁ denote the two endpoints of one edge of the closed polygon, the subscripts x and y denote horizontal and vertical coordinates, and P denotes the pixel to be judged.

It should be noted that, for a given pixel, the formula needs only the coordinates of the two vertices of the closed-polygon edge nearest to that pixel to determine directly whether the pixel lies inside the polygon. Compared with other approaches, this formula converts annotations into multi-level classification mask images more quickly, which further improves the computational efficiency of this step and of data preprocessing, thereby achieving the object of this solution.

In one or more embodiments, the label information semantic imaging step further includes: unifying the identifiers of the digital medical image and its corresponding mask images, and performing a correctness check by preloading.

In a specific application example, for each digital medical image, the paths of its multi-level classification masks (such as the annotation mask and exclusion mask mentioned below) are re-bound by filename matching. All source digital medical images and their attached mask images are then preloaded automatically, and any digital medical image file that cannot be read is flagged. If such a file exists, regeneration of the mask image is attempted; if that fails, the slide file and its XML annotation file are removed. This correctness check provides a quick self-check of all raw data to be processed and prevents later processing from being interrupted because data cannot be loaded.
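The filename-matching part of this check might look like the sketch below. The file extension, the mask suffixes, and the function name `pair_slides_with_masks` are illustrative assumptions; the sketch only tests that the files exist, whereas the described method actually preloads them.

```python
import tempfile
from pathlib import Path


def pair_slides_with_masks(slide_dir, mask_dir, slide_ext=".tif",
                           mask_suffixes=("_label.png", "_exclude.png")):
    """Match each slide to its mask files by file stem.

    Returns (paired, incomplete): `paired` maps a slide name to its mask
    names, `incomplete` lists slides with at least one missing mask.
    """
    paired, incomplete = {}, []
    for slide in sorted(Path(slide_dir).glob(f"*{slide_ext}")):
        masks = [Path(mask_dir) / f"{slide.stem}{s}" for s in mask_suffixes]
        if all(m.is_file() for m in masks):
            paired[slide.name] = [m.name for m in masks]
        else:
            incomplete.append(slide.name)  # candidate for mask regeneration
    return paired, incomplete


# Toy data set: slide "a" has both masks, slide "b" lacks its exclusion mask.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a.tif", "b.tif", "a_label.png", "a_exclude.png", "b_label.png"):
        (root / name).touch()
    paired, incomplete = pair_slides_with_masks(root, root)
```

Slides that end up in `incomplete` would be the ones for which mask regeneration is attempted before the slide and its annotation file are dropped.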

In one or more embodiments, in the region-of-interest extraction step:

More specifically, the digital medical image is read using OpenSlide;

More specifically, the image is an RGB image;

It should be noted that the image in this technical solution may be an RGB image or an image in another format, depending on the specific embodiment.

More specifically, after the image is obtained, the method further includes color space conversion, erosion, and dilation to extract the tissue region contours.

It should be noted that this technical feature allows the tissue regions in a medical image to be delimited quickly by a preset color threshold, and large tissue regions can be located directly at the coarse-scale level, which improves the efficiency of the region-of-interest extraction step and hence of data preprocessing.

In one or more more specific embodiments, the region-of-interest extraction step specifically includes:

The digital medical image is read using OpenSlide, and the alpha channel is removed to obtain an RGB image. Through RGB-to-HSV color space conversion followed by processing with an erosion kernel and a dilation kernel, the outer contour of the tissue in the digital medical image is delimited, with the area outside the contour treated as the background region, and the bounding box of each tissue region is taken as a region of interest (ROI) of the digital medical image.

In a specific application example, the source digital medical image S is loaded with OpenSlide and its transparency channel is removed to obtain an RGB image, which is then converted from RGB space to HSV space to obtain the binary image M₀ of the digital medical image. A dilation kernel fills the scattered holes inside large tissue areas, and an erosion kernel then removes noise and separates independent tissue region elements. The final result is the contours of the tissue regions in the medical image and the tissue mask image M₁, in which background and tissue regions are distinguished. A bounding box is also generated on M₁ for each separate white block.
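A toy sketch of this sequence (threshold in HSV space, dilate, erode, take the bounding box) is given below in plain Python on a 7 × 7 image. The saturation threshold of 0.2, the 3 × 3 kernels, and all function names are assumptions for illustration; a real pipeline would read the slide with OpenSlide and use an image library, and would compute one bounding box per connected white block rather than a single box.

```python
import colorsys


def tissue_mask(rgb, sat_threshold=0.2):
    """Binary mask M0: 1 where the HSV saturation exceeds the threshold."""
    return [[1 if colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] > sat_threshold
             else 0 for (r, g, b) in row] for row in rgb]


def _morph(mask, hit):
    """3x3 morphology; hit=any gives dilation, hit=all gives erosion.
    Pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])

    def at(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[1 if hit(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             else 0 for x in range(w)] for y in range(h)]


def dilate(mask):
    return _morph(mask, any)


def erode(mask):
    return _morph(mask, all)


def bounding_box(mask):
    """(x_min, y_min, x_max, y_max) over all white pixels, or None if empty."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


WHITE, PINK = (255, 255, 255), (200, 80, 150)
image = [[WHITE] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        image[y][x] = PINK
image[3][3] = WHITE               # small hole inside the tissue block

m0 = tissue_mask(image)           # fine binary image, hole still present
m1 = erode(dilate(m0))            # hole filled, block restored to its size
roi = bounding_box(m1)
```

In this toy case the hole at (3, 3) is present in `m0` but filled in `m1`, which is the difference the method later exploits when it keeps M₀ as the finer mask.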

In one or more embodiments, in the multi-mask sample classification and extraction step, the sample data information is packaged using TFRecords.

It should be noted that TFRecords encapsulates information such as sample metadata and labels directly in a single record, and a large number of records can be packed into one standalone file. This reduces code redundancy and greatly reduces the system's I/O load, which saves loading resources, improves data loading efficiency, and indirectly improves data processing efficiency.

In one or more embodiments, in the multi-mask sample classification and extraction step, the sample data information includes one or more of the sample's characteristic information, source information, position coordinates, image information, and label data;

More specifically, the characteristic information includes one or more of the sample file name and the sample storage path; the source information includes the file name of the digital medical image from which the sample originates; the position coordinates include the center coordinates in the Level-0 coordinate system of the original image; the image information includes the sample image data file; and the label data includes the sample label.

It should be noted that the characteristic information determines the specific path of the sample in the dataset; the source information determines the correspondence between the sample and its source medical image, and these two items are used to trace questionable samples later; the position coordinates are used to trace the specific location of a questionable sample in the corresponding medical image; the image information is the sample image itself, the basic content; and the label data determines whether the sample is positive or negative.
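The fields listed above can be pictured as one record per sample. The sketch below is an assumption-laden stand-in: it uses a dataclass serialized to JSON bytes rather than the TFRecords format named in the text, and the field names, `encode`, and `decode` are hypothetical.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class SampleRecord:
    sample_name: str          # characteristic information
    sample_path: str          # characteristic information
    source_slide: str         # source information
    center_xy_level0: tuple   # position in the Level-0 coordinate system
    image_bytes: bytes        # image information (encoded sample image)
    label: int                # label data: 1 = positive, 0 = negative


def encode(record):
    """Serialize one record to bytes (JSON stand-in for a TFRecords entry)."""
    fields = asdict(record)
    fields["image_bytes"] = base64.b64encode(record.image_bytes).decode("ascii")
    return json.dumps(fields).encode("utf-8")


def decode(blob):
    """Inverse of encode()."""
    fields = json.loads(blob.decode("utf-8"))
    fields["image_bytes"] = base64.b64decode(fields["image_bytes"])
    fields["center_xy_level0"] = tuple(fields["center_xy_level0"])
    return SampleRecord(**fields)


record = SampleRecord("s_0001.png", "dataset/pos/s_0001.png", "slide_01.tif",
                      (40960, 28672), b"\x89PNG...", 1)
restored = decode(encode(record))
```

Keeping the source slide name and Level-0 center in every record is what allows a questionable sample to be traced back to its position in the original slide.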

In one or more more specific embodiments, the multi-mask sample classification and extraction step specifically includes:

The result of subtracting the exclusion mask from the annotation mask is used as the lesion-region identification mask. When the bounding box of the region of interest (ROI) is scanned with a sliding window, the exclusion mask is first removed according to the tissue-region identification mask, the tissue regions are then divided into positive and negative according to the lesion-region identification mask, and positive and negative samples are extracted respectively.

As one or more more specific embodiments, while the tissue outer contour is being delimited, the first binary image obtained after the color space conversion is retained as the fine-scale tissue-region identification mask. When the bounding box is scanned with the sliding window, the proportion of white pixels in the fine-scale tissue-region identification mask at the same position is computed at the same time through coordinate conversion, and blank background regions that may exist inside the tissue are removed by a preset threshold, yielding the tissue-region samples.

It should be noted that, because the original image is enormous, the identification mask (a binary image) obtained in the basic scheme is a coarse-scale binary image of low fineness. When samples are extracted, the sliding window moves over the original image and may encounter small blank areas that cannot be identified in the coarse-scale binary image. The above scheme therefore yields accurate tissue-region samples at a finer scale.
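The coordinate conversion and white-pixel check can be sketched as below. It assumes the mask is an integer-factor downsample of the Level-0 image; the function name `white_ratio` and the parameter `downsample` are illustrative, not taken from the source.

```python
def white_ratio(mask, x0, y0, size, downsample):
    """Fraction of white pixels of `mask` under a Level-0 window.

    (x0, y0): top-left corner of the window in Level-0 coordinates.
    size: side length of the window in Level-0 pixels.
    downsample: Level-0 pixels per mask pixel (assumed to be an integer).
    """
    mx0, my0 = x0 // downsample, y0 // downsample
    msize = max(1, size // downsample)
    cells = [mask[y][x]
             for y in range(my0, min(my0 + msize, len(mask)))
             for x in range(mx0, min(mx0 + msize, len(mask[0])))]
    return sum(1 for v in cells if v) / len(cells) if cells else 0.0


# 4 x 4 mask standing for a 256 x 256 Level-0 image (downsample factor 64).
mask = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
top_left = white_ratio(mask, 0, 0, 128, 64)          # 3 of 4 mask pixels are white
bottom_right = white_ratio(mask, 128, 128, 128, 64)  # no white pixels
```

A window would then be kept as a tissue sample only if its ratio exceeds the preset threshold.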

Compared with the prior art, the beneficial effects of the present invention are:

1. The medical image preprocessing method of the present invention uses a coordinate mapping strategy between the multi-resolution levels of the medical image pyramid, constructs identification masks at two granularities, coarse and fine, links them through coordinates, and uses them respectively to locate tissue regions quickly and to delimit them accurately. Medical images can thus be preprocessed efficiently and quickly, achieving efficient and accurate data preprocessing of medical images.

2. The medical image preprocessing method of the present invention uses a closed-polygon coordinate discrimination algorithm to convert the formatted text annotations into multi-level classification mask images. Because the algorithm directly determines, based on the principle of similar triangles, the positional relationship between a pixel and the nearest-neighbor polygon vertices, it can quickly determine whether each pixel is enclosed by a closed polygon. The annotations are thereby converted quickly into multi-level classification mask images, which improves the computational efficiency of this step and hence of data preprocessing.

3. In the medical image preprocessing method of the present invention, the label information semantic imaging step further includes unifying the identifiers of the digital medical image and its corresponding mask images and performing a correctness check by preloading. This check provides a quick self-check of all raw data to be processed, avoids pauses caused by interruptions when data cannot be loaded during later processing, and improves the smoothness and efficiency of data processing.

4. In the medical image preprocessing method of the present invention, after the image is obtained, color space conversion, erosion, and dilation are further used to extract the tissue region contours. This feature allows the tissue regions in a medical image to be delimited quickly by a preset color threshold, and large tissue regions can be located directly at the coarse-scale level, which improves the efficiency of the region-of-interest extraction step and hence of data preprocessing.

5. In the medical image preprocessing method of the present invention, the sample data information is packaged using TFRecords in the multi-mask sample classification and extraction step. Because the packaging step in medical image processing would otherwise generate a large amount of redundant code, using TFRecords in this scheme greatly reduces the system's I/O load, which saves loading resources, improves data loading efficiency, and indirectly improves data processing efficiency.

6. In the medical image preprocessing method of the present invention, while the tissue outer contour is being delimited, the first binary image obtained after the color space conversion is retained as the fine-scale tissue-region identification mask, so that tissue-region samples of different fineness are obtained. This scheme yields accurate tissue-region samples at a finer scale, which makes identification more precise and improves the accuracy of image processing.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is a schematic diagram of the color space conversion, erosion, dilation, and outer contour generation stages in the region-of-interest extraction step of the present invention;

Fig. 2 is a schematic diagram of the generation stages of the lesion-region mask, exclusion mask, lesion-region identification mask, and normal-region identification mask in the multi-mask sample classification and extraction step of the present invention;

Fig. 3 is a schematic diagram of multi-region sample extraction in the present invention;

Fig. 4 is a schematic diagram of the calculation process of the closed-polygon coordinate discrimination method in the label information semantic imaging step of the present invention;

Fig. 5 is a schematic flow diagram of the framework of a preferred embodiment of the present invention.

Detailed description of the embodiments

To further explain the technical means adopted by the present invention to achieve its intended object and their effects, the specific embodiments, structure, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments:

Embodiment 1

This embodiment provides the medical image preprocessing method of the present invention, which comprises the following steps:

Region-of-interest extraction: read the digital medical image, remove the transparency channel to obtain an image, extract the contours of the tissue regions in the current slide image, and divide the image into tissue regions and background regions, as shown in Fig. 1;

Multi-mask sample classification and extraction: using the generated multi-level classification masks, extract positive and negative samples from the tissue regions, as shown in Figs. 2 and 3, and package the sample data information into structured data that can be used for neural network model training and prediction.

The above is one basic implementation of this technical solution. In this implementation, the inventors use a coordinate mapping strategy between the multi-resolution levels of the medical image, construct identification masks at two granularities, coarse and fine, link them through coordinates, and use them respectively to locate tissue regions quickly and to delimit them accurately. This quick localization and accurate delimitation of tissue regions make it possible to preprocess medical images efficiently and quickly.

Embodiment 2

This embodiment is a preferred embodiment based on Embodiment 1 above. Embodiment 2 differs from the above embodiment in that, in this embodiment, the label information semantic imaging step has one or more of the following preferred implementations, which may be implemented individually or in combination:

In some embodiments, the transparency channel is the alpha channel.

In some embodiments, the format of the text annotations is XML. The text annotations may be in any of several formats; in one implementation the format is XML, and other implementations may use other formats as needed.

In some embodiments, the discrimination algorithm is a closed-polygon coordinate discrimination algorithm. Compared with other discrimination algorithms, it directly determines, based on the principle of similar triangles, the positional relationship between a pixel and the nearest-neighbor polygon vertices, so it can quickly determine whether each pixel is enclosed by a closed polygon. The annotations are thereby converted quickly into multi-level classification mask images, which improves the computational efficiency of this step and hence of data preprocessing.

As one or more more specific implementations, in the closed-polygon coordinate discrimination algorithm, as shown in Fig. 4, the positional relationship between the pixel and the closed polygon is obtained by the following formula:

in_ploy = (E_1y - P_y)(E_1x - E_0x) - (E_1x - P_x)(E_1y - E_0y)

For a given pixel, the formula needs only the coordinates of the two vertices of the closed-polygon edge nearest to that pixel to determine directly whether the pixel lies inside the polygon. Compared with other approaches, this formula converts annotations into multi-level classification mask images more quickly, which further improves the computational efficiency of this step and of data preprocessing, thereby achieving the object of this solution.

In a specific application example, when point P lies between E₀ and E₁ in the vertical direction (E_0y ≤ P_y ≤ E_1y or E_0y ≥ P_y ≥ E_1y), the value of in_ploy is calculated by the above formula. When in_ploy = 0, the pixel is on the edge; when in_ploy < 0, the pixel is to the left of the edge; when in_ploy > 0, the pixel is to the right of the edge.
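The per-edge test can be written directly from the formula. The sketch below is illustrative only: the function names and the sample edge are assumptions, and the sign values shown hold for this particular edge orientation (E₀ = (0, 0), E₁ = (0, 4)).

```python
def in_ploy(e0, e1, p):
    """Signed value of the edge test for edge E0 -> E1 and pixel P.

    Each argument is an (x, y) pair. Zero means P lies on the line
    through the edge; the sign tells on which side of the edge P lies.
    """
    (e0x, e0y), (e1x, e1y), (px, py) = e0, e1, p
    return (e1y - py) * (e1x - e0x) - (e1x - px) * (e1y - e0y)


def spans_vertically(e0, e1, p):
    """True when P_y lies between E0_y and E1_y (either ordering)."""
    return min(e0[1], e1[1]) <= p[1] <= max(e0[1], e1[1])


E0, E1 = (0, 0), (0, 4)                # a vertical edge
on_edge = in_ploy(E0, E1, (0, 2))      # 0: pixel on the edge
left = in_ploy(E0, E1, (-1, 2))        # negative: pixel with smaller x
right = in_ploy(E0, E1, (1, 2))        # positive: pixel with larger x
applies = spans_vertically(E0, E1, (1, 2))
skipped = spans_vertically(E0, E1, (1, 5))
```

The value is a two-dimensional cross product, so only multiplications and subtractions on the two edge endpoints and the pixel are needed for each test.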

For each element of the zero matrix, all closed polygons and all of their edges are traversed, and the positional relationship between the pixel represented by the matrix element and each closed polygon is determined according to the following formula:

where i denotes the i-th edge of a closed polygon, N denotes the number of edges of the closed polygon, and in_ploy denotes the contribution of the i-th edge to the discrimination of the pixel.

When IN_PLOY = 0, the pixel is an interior point of the closed polygon, and its pixel value is updated to white (255, 255, 255).

By the above method, the XML-format text annotation data are converted into mask images, yielding the annotation mask and the exclusion mask (background region).

In this way, the annotated regions identified in the XML annotation data can be located quickly and accurately, and the text annotation data are converted into region identification mask images.
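A minimal sketch of turning closed polygons into a mask follows. It uses the standard even-odd ray-casting test as a stand-in, not the patent's own discrimination formula; the function names, the single-channel white value 255, and the choice to test pixel centers are assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd rule: cast a ray toward +x and count edge crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        if (y0 > py) != (y1 > py):            # edge spans the ray vertically
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(width, height, polygons):
    """Start from a zero matrix and set pixels inside any polygon to 255."""
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Test the pixel center so that polygon borders are unambiguous.
            if any(point_in_polygon(x + 0.5, y + 0.5, poly) for poly in polygons):
                mask[y][x] = 255
    return mask


square = [(1, 1), (4, 1), (4, 4), (1, 4)]
mask = rasterize(6, 5, [square])
white_pixels = sum(v == 255 for row in mask for v in row)
```

This brute-force loop visits every pixel for every polygon; on a full-resolution slide it would normally be restricted to each polygon's bounding box or run on a downsampled level.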

In one or more embodiments, the label information semantic imaging step further includes: unifying the identifiers of the digital medical image and its corresponding mask images, and performing a correctness check by preloading.

The remaining implementations of this embodiment are the same as in the above embodiment. All implementations listed in this embodiment may be implemented individually or in combination with Embodiment 1 above to form different embodiments, which are not described again here.

Embodiment 3

This embodiment is a preferred embodiment based on Embodiment 1 above. Embodiment 3 differs from the above embodiment in that, in this embodiment, the region-of-interest extraction step has one or more of the following preferred implementations, which may be implemented individually or in combination:

In some embodiments, the digital medical image is read using OpenSlide.

In some embodiments, the image is an RGB image. The image in this technical solution may be an RGB image or an image in another format, depending on the specific implementation.

In some embodiments, after the image is obtained, color space conversion, erosion, and dilation are further used to extract the tissue region contours. This feature allows the tissue regions in a medical image to be delimited quickly by a preset color threshold, and large tissue regions can be located directly at the coarse-scale level, which improves the efficiency of the region-of-interest extraction step and hence of data preprocessing.

In some embodiments, the interest region extraction step is specifically included:

The remaining implementations of this embodiment are the same as in the above embodiments. All implementations listed in this embodiment may be implemented individually or in combination with Embodiment 1 or 2 above to form different embodiments, which are not described again here.

Embodiment 4

This embodiment is a preferred embodiment based on Embodiment 1 above. Embodiment 4 differs from the above embodiment in that, in this embodiment, the multi-mask sample classification and extraction step has one or more of the following preferred implementations, which may be implemented individually or in combination:

In some embodiments, the sample data information is packaged using TFRecords. TFRecords encapsulates information such as sample metadata and labels directly in a single record, and a large number of records can be packed into one standalone file. This reduces code redundancy and greatly reduces the system's I/O load, which saves loading resources, improves data loading efficiency, and indirectly improves data processing efficiency.

In some embodiments, in the multi-mask sample classification and extraction step, the sample data information includes one or more of the sample's characteristic information, source information, position coordinates, image information, and label data;

More specifically, the characteristic information includes one or more of the sample file name and the sample storage path; the source information includes the file name of the digital medical image from which the sample originates; the position coordinates include the center coordinates in the Level-0 coordinate system of the original image; the image information includes the sample image data file; and the label data includes the sample label. In this preferred scheme, the characteristic information determines the specific path of the sample in the dataset; the source information determines the correspondence between the sample and its source medical image, and these two items are used to trace questionable samples later; the position coordinates are used to trace the specific location of a questionable sample in the corresponding medical image; the image information is the sample image itself, the basic content; and the label data determines whether the sample is positive or negative.

In some embodiments, more exposure mask sample classification extraction steps are specifically included:

As one or more more specific implementations, while the tissue outer contour is being delimited, the first binary image obtained after the color space conversion is retained as the fine-scale tissue-region identification mask. When the sliding window scans the bounding box, the proportion of white pixels in the fine-scale tissue-region identification mask at the same position is computed at the same time through coordinate conversion, and blank background regions that may exist inside the tissue are removed by a preset threshold, yielding tissue-region samples of different fineness. In this preferred scheme, because the original image is enormous, the identification mask (a binary image) obtained in the basic scheme is a coarse-scale binary image of low fineness; when samples are extracted, the sliding window moves over the original image and may encounter small blank areas that cannot be identified in the coarse-scale binary image. The above scheme therefore yields accurate tissue-region samples at a finer scale.

In one or more more specific implementations, two kinds of XML annotation files are used to generate the lesion-region mask M_obj and the exclusion mask M_exc, respectively. The result of M_obj - M_exc is the lesion-region identification mask M_pos, and the result of M₀ - (M_obj - M_exc) is the normal-region identification mask M_neg. A 256 × 256 sliding window then scans the bounding boxes on M₁ without overlap, and its coordinates are mapped at the same time onto M_pos and M_neg. For positive samples: when the proportion of white pixels of M_pos within the window is greater than 70%, and the proportion of white pixels in the region of the same size at the same position on M₀ is greater than 40%, the tissue block within the sliding window is extracted as a positive sample. For negative samples: when the proportion of white pixels of M_neg within the window is greater than 25%, and the proportion of white pixels in the region of the same size at the same position on M_pos is less than 20%, the tissue block within the sliding window is extracted as a negative sample. In this preferred scheme, M₁ is the coarser tissue-region identification mask at coarse granularity (coarse scale), and M_pos and M_neg are the finer identification masks at fine granularity (fine scale); the correspondence between them is determined by coordinate conversion. The former is used first to locate tissue regions quickly; the window is then mapped onto the corresponding position of the fine-grained masks, where computing the white-pixel proportion allows tissue blocks to be identified accurately.
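The mask arithmetic and the threshold rules above can be sketched on toy masks as follows. This is a simplified illustration: all masks are assumed to share one resolution, the whole image is scanned instead of the bounding boxes on M₁, the window is 2 × 2 rather than 256 × 256, and the function names are hypothetical. The thresholds (70%, 40%, 25%, 20%) are the ones stated in the text.

```python
def subtract(a, b):
    """Pixelwise a AND NOT b for binary masks given as rows of 0/1."""
    return [[1 if va and not vb else 0 for va, vb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def ratio(mask, x, y, size):
    """Fraction of white pixels in the size x size window at (x, y)."""
    window = [v for row in mask[y:y + size] for v in row[x:x + size]]
    return sum(window) / len(window)


def classify_windows(m0, m_obj, m_exc, size):
    """Scan non-overlapping windows and label them positive or negative."""
    m_pos = subtract(m_obj, m_exc)   # lesion-region identification mask
    m_neg = subtract(m0, m_pos)      # normal-region identification mask
    samples = []
    for y in range(0, len(m0) - size + 1, size):
        for x in range(0, len(m0[0]) - size + 1, size):
            if ratio(m_pos, x, y, size) > 0.70 and ratio(m0, x, y, size) > 0.40:
                samples.append((x, y, "positive"))
            elif ratio(m_neg, x, y, size) > 0.25 and ratio(m_pos, x, y, size) < 0.20:
                samples.append((x, y, "negative"))
    return samples


m0 = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]]     # tissue
m_obj = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # lesion
m_exc = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # excluded
samples = classify_windows(m0, m_obj, m_exc, size=2)
```

Windows that satisfy neither rule, such as the two bottom windows here, are simply skipped, which is how mostly blank background is kept out of the sample set.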

The remaining implementation details of this embodiment are the same as in the embodiments described above. Each implementation listed in this embodiment can be carried out on its own or in combination with Embodiment 1, 2 or 3 above to form different embodiments; Fig. 5 shows one of them. The details are not repeated here.

The embodiments above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any insubstantial variation or substitution made by those skilled in the art on the basis of the present invention falls within the scope of protection claimed by the present invention.

Claims

1. A medical image preprocessing method, characterized by comprising the following steps:

Label information semantic imaging: pre-reading a digital medical image and its label information, and converting the formatted text label information into multi-level classification mask images using a discrimination algorithm;

Region-of-interest extraction: reading the digital medical image, removing the transparency channel to obtain an image, extracting the tissue-region contour in the current slide image, and dividing the image into a tissue region and a background region;

Multi-mask sample classification and extraction: using the generated multi-level classification masks, extracting positive samples and negative samples from the tissue region, and packaging the sample data information to form structured data that can be used for neural network model training and prediction.

2. The medical image preprocessing method according to claim 1, characterized in that, in the label information semantic imaging step:

Preferably, the transparency channel is the Alpha channel;

Preferably, the format of the text label is XML;

Preferably, the discrimination algorithm is a closed-polygon coordinate discrimination algorithm.

3. The medical image preprocessing method according to claim 2, characterized in that the label information semantic imaging step specifically comprises:

loading the resolution information of the digital medical image, and constructing a zero matrix of the same size according to the resolution information as an equivalent blank mask;

loading the XML-format label information corresponding to the digital medical image, the XML-format label information recording the coordinate information of the closed polygons of several annotated regions;

determining, using the closed-polygon coordinate discrimination algorithm, whether a pixel lies inside any of the closed polygons, and accordingly converting the XML-format text label data into mask images, to obtain multi-level classification masks comprising an annotation mask and a rejection mask.

4. The medical image preprocessing method according to claim 3, characterized in that, in the closed-polygon coordinate discrimination algorithm, the positional relationship between the pixel and the closed polygon is obtained by the following formula:

In_ploy=(E_1y-P_y)(E_1x-E_0x)-(E_1x-P_x)(E_1y-E_0y)

where E₀ and E₁ denote the two endpoints of one edge of the closed polygon, x and y denote their horizontal and vertical coordinates, and P denotes the pixel to be tested.
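
The formula can be read as an edge function whose sign tells on which side of the edge E₀E₁ the pixel P lies. The claim does not state how the per-edge values are combined, so the following sketch makes an explicit assumption: the polygon is convex with consistently ordered vertices, and a pixel counts as inside when the value has the same sign (or is zero) for every edge. Non-convex annotations would need a different combination rule. The function names other than `in_ploy` are hypothetical.

```python
def in_ploy(e0, e1, p):
    """Edge function from the claim for the edge e0 -> e1 and the point p.
    Points are (x, y) tuples; the sign gives the side of the edge."""
    return (e1[1] - p[1]) * (e1[0] - e0[0]) - (e1[0] - p[0]) * (e1[1] - e0[1])


def inside_convex(polygon, p):
    """Assumed rule: p is inside when all edge values share one sign.
    Points lying exactly on an edge (value 0) are treated as inside."""
    n = len(polygon)
    values = [in_ploy(polygon[i], polygon[(i + 1) % n], p) for i in range(n)]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)


def rasterize(polygon, width, height):
    """Start from a blank (zero) mask and set the pixels inside the polygon."""
    return [[1 if inside_convex(polygon, (x, y)) else 0 for x in range(width)]
            for y in range(height)]


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize(square, 5, 5):
    print(row)
```

Testing every pixel against every edge is quadratic in practice; the sketch is meant to show the sign test, not an efficient fill.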

5. The medical image preprocessing method according to claim 1, characterized in that the label information semantic imaging step further comprises: unifying the identifiers of the digital medical image and the corresponding mask images, and checking their correctness by preloading.

6. The medical image preprocessing method according to claim 1, characterized in that, in the region-of-interest extraction step:

Preferably, the digital medical image is read using OpenSlide;

Preferably, the image is an RGB image;

Preferably, after the image is obtained, the method further comprises extracting the tissue-region contour using color-space conversion, erosion and dilation.

7. The medical image preprocessing method according to claim 6, characterized in that the region-of-interest extraction step specifically comprises:

reading the digital medical image using OpenSlide, removing the Alpha channel to obtain an RGB image, and, through RGB-to-HSV color-space conversion and processing with an erosion kernel and a dilation kernel, delineating the outer tissue contour in the digital medical image and marking what lies outside it as the background region, while taking the bounding box of the tissue region as the region of interest (ROI) of the digital medical image.
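
A reduced sketch of this step, using only the Python standard library, is given below. It is not the claimed implementation: the image is a nested list of RGBA tuples rather than a slide read through OpenSlide, erosion and dilation are omitted, and the saturation threshold of 0.1 as well as all function names are assumptions chosen for illustration.

```python
import colorsys


def drop_alpha(rgba):
    """Remove the Alpha channel, keeping (R, G, B) for every pixel."""
    return [[px[:3] for px in row] for row in rgba]


def tissue_mask(rgb, sat_threshold=0.1):
    """Binary mask: 1 where the HSV saturation exceeds the assumed threshold.
    Stained tissue is usually more saturated than the white slide background."""
    return [[1 if colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] > sat_threshold else 0
             for (r, g, b) in row] for row in rgb]


def bounding_box(mask):
    """Bounding box (x0, y0, x1, y1) of the white pixels, upper bounds exclusive."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


white, pink = (255, 255, 255, 255), (200, 100, 150, 255)
image = [[white, white, white, white],
         [white, pink, pink, white],
         [white, white, white, white]]
mask = tissue_mask(drop_alpha(image))
print(mask)
print(bounding_box(mask))
```

The bounding box returned here plays the role of the ROI that the sliding window later scans.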

8. The medical image preprocessing method according to claim 1, characterized in that, in the multi-mask sample classification and extraction step:

Preferably, the sample data information is packaged using TFRecords;

Preferably, the sample data information comprises one or more of the sample's feature information, source information, position coordinates, image information and label data;

More preferably, the feature information comprises one or more of the sample file name and the sample storage path; the source information comprises the file name of the digital medical image from which the sample originates; the position coordinates comprise the center coordinates in the Level-0 coordinate system of the original image; the image information comprises the sample image data file; and the label data comprises the sample label.
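
The fields listed in this claim can be pictured as one record per sample. The sketch below only shows the grouping of the fields before serialization; it does not use the TFRecords API, and the class name, field names, file-naming scheme and `out_dir` default are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class SampleRecord:
    file_name: str      # feature information: sample file name
    store_path: str     # feature information: sample storage path
    source_slide: str   # source information: originating slide file name
    center_x: int       # position: center x in Level-0 coordinates
    center_y: int       # position: center y in Level-0 coordinates
    image: bytes        # image information: encoded sample image data
    label: int          # label data: 1 for positive, 0 for negative (assumed)


def make_record(source_slide, x, y, size, image, label, out_dir="samples"):
    """Build a record for the window whose top-left corner is (x, y)."""
    cx, cy = x + size // 2, y + size // 2
    name = f"{source_slide}_{cx}_{cy}_{label}.png"
    return SampleRecord(name, f"{out_dir}/{name}", source_slide, cx, cy, image, label)


record = make_record("slide_001", 512, 256, 256, b"", 1)
print(asdict(record))
```

A dictionary of this shape maps naturally onto the key/value features that a serialization format expects.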

9. The medical image preprocessing method according to claim 3 or 8, characterized in that the multi-mask sample classification and extraction step specifically comprises:

taking the result of subtracting the rejection mask from the annotation mask as the lesion-region identification mask; when the sliding window scans the bounding box of the region of interest (ROI), first removing, according to the tissue-region identification mask, the background region and the rejection region determined by the rejection mask, then dividing the tissue region into positive and negative according to the lesion-region identification mask, and extracting positive samples and negative samples respectively.

10. The medical image preprocessing method according to claim 9, characterized in that:

while the outer tissue contour is being delineated, the first binary image obtained from the color-space conversion is retained as the small-scale tissue-region identification mask; when the sliding window scans the bounding box, the white-pixel ratio of the small-scale tissue-region identification mask at the same position is computed through coordinate conversion, and a preset threshold is used to remove empty background regions that may exist inside the tissue, to obtain tissue-region samples.