Detailed description of the invention
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, this embodiment discloses a method for detecting text regions in a medical image, comprising:
S1: obtain a medical image to be detected;
S2: detect the medical image to obtain a series of connected regions, and obtain, based on a single sample image, a binary template of the text regions in the medical image;
It should be noted that the MSER algorithm may be used to detect the medical image; the details are not repeated here.
A single sample image is a medical image that contains text objects and can fully reflect the text features found in medical images. In a specific application, obtaining the binary template of the text regions in the medical image based on the single sample image may include:
Compute the local adaptive regression kernel K_R of the single sample image R and, for each connected region T, compute the local adaptive regression kernel K_T of that connected region;
Normalize K_R to obtain the weight vector matrix W_R, and normalize K_T to obtain the weight vector matrix W_T;
Apply the PCA (principal component analysis) algorithm to W_R to obtain its principal components, and retain the first d principal components to form the matrix P_R; project W_R onto P_R to obtain the feature vector F_R of the single sample image R, and project W_T onto P_R to obtain the feature vector F_T of the connected region T;
Here d is an integer whose value can be chosen as required, for example 4, 5, or 6; the embodiments of the present invention do not limit it. The projection of W_R onto P_R is expressed as F_R = P_R^T W_R, and the projection of W_T onto P_R is expressed as F_T = P_R^T W_T.
Compute the similarity between the feature vectors F_R and F_T, and determine whether the similarity is greater than a first value. If it is, set the pixel values of the corresponding connected region to 1 to obtain a text region; otherwise, set the pixel values of the corresponding connected region to 0 to obtain a background region. The text regions and background regions together form the binary template.
It should be noted that, in a specific application, the similarity may be computed with the cosine similarity measure; the details are not repeated here. The first value may differ depending on the font of the text in the medical image. For example, if the text in the medical image is set in the Song typeface, the first value may be 70%; it may of course be adjusted up or down as required, and this embodiment does not limit it.
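The projection and similarity test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the weights are treated as a single vector rather than a matrix, the columns of P_R are assumed to be the retained principal directions, the projection is assumed to be the product P_R^T W, and the function names project, cosine_similarity, and is_text_region are hypothetical.

```python
import math


def project(p_r, w):
    """Project the weight vector w (length n) onto the d columns of p_r (n x d).

    Assumption: the projection is the matrix product P_R^T w.
    """
    n = len(w)
    d = len(p_r[0])
    return [sum(p_r[i][k] * w[i] for i in range(n)) for k in range(d)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_text_region(f_r, f_t, first_value=0.7):
    """True when the similarity of F_R and F_T is greater than the first value."""
    return cosine_similarity(f_r, f_t) > first_value
```

The default of 0.7 mirrors the 70% example given for the Song typeface; other fonts may call for a different threshold.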
It can further be understood that computing the binary template in this embodiment essentially consists of computing the similarity between the feature vector of the single sample image and that of each connected region, and then building, according to the magnitude of the similarity, a binary template in which the area covered by each connected region is entirely black or entirely white.
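As an illustration of how such a template might be assembled, the sketch below paints each connected region judged to be text with 1 and leaves everything else at 0. The representation of a region as a list of (row, col) pixel coordinates and the name build_binary_template are assumptions made for this example only.

```python
def build_binary_template(height, width, regions, is_text):
    """Build a 0/1 template image.

    regions: list of connected regions, each a list of (row, col) pixels.
    is_text: parallel list of booleans (the similarity decision per region).
    Pixels of regions judged to be text become 1; all other pixels stay 0.
    """
    template = [[0] * width for _ in range(height)]
    for pixels, keep in zip(regions, is_text):
        if not keep:
            continue  # background regions are left at 0
        for row, col in pixels:
            template[row][col] = 1
    return template
```

Writing only the 1s means that a background region that overlaps a text region does not erase it; whether that matches the intended handling of nested regions is an assumption.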
S3: use the binary template to filter out the non-text regions among the connected regions to obtain text candidate regions, and further filter out the non-text regions among the text candidate regions based on character features;
In this embodiment, using the binary template to filter out the non-text regions among the connected regions means, specifically, setting the pixel values of the background regions within the connected regions to 0, which yields the text candidate regions. The corresponding mathematical expression is I_can = I_mask ∩ I_MSER, where I_can denotes the text candidate regions, I_mask the binary template, and I_MSER the connected regions.
Specifically, further filtering out the non-text regions among the text candidate regions based on character features may include:
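For two binary images of the same size, the intersection of the template with the connected-region image reduces to a pixelwise AND. A minimal sketch, assuming both images are stored as nested lists of 0/1 values (the name apply_template is hypothetical):

```python
def apply_template(i_mask, i_mser):
    """Pixelwise intersection of the binary template and the region image.

    Both inputs are assumed to be equally sized nested lists of 0/1 values.
    """
    return [
        [m & s for m, s in zip(mask_row, mser_row)]
        for mask_row, mser_row in zip(i_mask, i_mser)
    ]
```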
For each text candidate region, compute the stroke width feature SW of that region, and retain the text candidate regions whose SW is less than a second value, where SW is computed as SW = std / E, in which std and E are, respectively, the standard deviation and the mean of the stroke width of the text candidate region;
In general, because the stroke width within a single character remains largely uniform, the ratio of the stroke-width standard deviation to the stroke-width mean is small for a text candidate region, and this feature can be used to filter out some of the non-text regions. It should be noted that the second value depends on the stroke width of the characters: the larger the stroke width, the larger the value. It typically lies in the range 0.5 to 1.5.
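The stroke width test can be sketched as the ratio of standard deviation to mean over a list of stroke width samples. How the samples are measured is outside this sketch; the use of the population standard deviation, the default threshold of 1.0, and the function names are assumptions.

```python
import statistics


def stroke_width_feature(stroke_widths):
    """Ratio of the stroke-width standard deviation to the stroke-width mean.

    stroke_widths must be a non-empty list of positive measurements.
    Assumption: the population standard deviation is used.
    """
    mean = statistics.mean(stroke_widths)
    return statistics.pstdev(stroke_widths) / mean


def keep_by_stroke_width(stroke_widths, second_value=1.0):
    """Retain the candidate when its feature is less than the second value."""
    return stroke_width_feature(stroke_widths) < second_value
```

A region with perfectly uniform strokes gives a feature of 0, while widely varying widths push the ratio up and cause the region to be discarded.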
Compute the number of non-zero pixels in each resulting text candidate region, and filter out the text candidate regions whose number of non-zero pixels is greater than a third value or less than a fourth value;
In a specific application, the third and fourth values depend on the number of pixels in the text candidate region; typically they may be 0.9 times and 0.5 times that number, respectively.
Compute the ratio of the number of non-zero pixels in each resulting text candidate region to the area of that region, and filter out the text candidate regions whose ratio is greater than a fifth value or less than a sixth value;
In a specific application, the fifth and sixth values may typically be 70% and 10%, respectively.
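The occupancy test can be illustrated as below. The sketch assumes that a candidate is available as a rectangular crop of 0/1 values and that the "region area" is the area of that crop; both are assumptions, as are the function names.

```python
def fill_ratio(region):
    """Fraction of non-zero pixels in a rectangular crop (nested lists)."""
    total = sum(len(row) for row in region)
    if total == 0:
        return 0.0
    nonzero = sum(1 for row in region for value in row if value != 0)
    return nonzero / total


def keep_by_fill_ratio(region, fifth_value=0.7, sixth_value=0.1):
    """Discard candidates whose ratio is above the fifth or below the sixth value."""
    ratio = fill_ratio(region)
    return sixth_value <= ratio <= fifth_value
```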
Compute the aspect ratio (length to width) of each resulting text candidate region, and filter out the text candidate regions whose aspect ratio is greater than a seventh value or less than an eighth value;
In a specific application, the seventh and eighth values may typically be 1.2 and 0.5, respectively.
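A sketch of the aspect ratio test follows. It assumes that each candidate is described by an inclusive pixel bounding box (x_min, y_min, x_max, y_max) and that the ratio is taken as box width over box height; the text does not say which side counts as the length, so this orientation is an assumption.

```python
def aspect_ratio(box):
    """Width divided by height of an inclusive pixel bounding box."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    return width / height


def keep_by_aspect_ratio(box, seventh_value=1.2, eighth_value=0.5):
    """Discard candidates whose ratio is above the seventh or below the eighth value."""
    return eighth_value <= aspect_ratio(box) <= seventh_value
```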
For each remaining text candidate region, segment the region into multiple small blocks using the projection method or the connected-region method, determine whether each block is a character, compute the proportion of blocks that are characters, and filter out the text candidate regions whose proportion is less than a ninth value.
It should be noted that determining whether each block is a character may use existing techniques; the details are not repeated here. The ninth value may typically be 2/3.
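One way to read the projection method is to split a candidate at its empty columns and then count how many of the resulting blocks are recognized as characters. The sketch below takes that reading; the character test itself is passed in as a hypothetical callback is_character(region, block), since the text leaves it to existing techniques.

```python
def split_by_vertical_projection(region):
    """Return (start_col, end_col) pairs, inclusive, for runs of non-empty columns."""
    if not region:
        return []
    width = len(region[0])
    column_counts = [sum(1 for row in region if row[c] != 0) for c in range(width)]
    blocks = []
    start = None
    for c, count in enumerate(column_counts):
        if count > 0 and start is None:
            start = c  # a new block begins
        elif count == 0 and start is not None:
            blocks.append((start, c - 1))  # the block ended at the previous column
            start = None
    if start is not None:
        blocks.append((start, width - 1))
    return blocks


def keep_by_character_proportion(region, is_character, ninth_value=2 / 3):
    """Keep the candidate when enough of its blocks are judged to be characters."""
    blocks = split_by_vertical_projection(region)
    if not blocks:
        return False
    hits = sum(1 for block in blocks if is_character(region, block))
    return hits / len(blocks) >= ninth_value
```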
S4: merge the obtained text regions to obtain text lines.
In a specific application, S4 may include:
For each unmerged text region A among the obtained text regions, select an unmerged text region B from the other unmerged text regions and determine whether A and B can be merged. If they can, merge A and B to obtain a text region C; then select an unmerged text region D from the other unmerged text regions and determine whether C and D can be merged, and if they can, merge C and D. Repeat the steps of selecting a text region, determining whether it can be merged, and merging, until all unmerged text regions have been selected.
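The selection and merging loop can be sketched as a greedy procedure over bounding boxes. The box representation (x_min, y_min, x_max, y_max), the helper merge_boxes, and the pluggable predicate can_merge are assumptions introduced for illustration.

```python
def merge_boxes(a, b):
    """Smallest box (x_min, y_min, x_max, y_max) that covers both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def aggregate_text_lines(boxes, can_merge):
    """Greedily merge text regions into text lines.

    boxes: list of (x_min, y_min, x_max, y_max) text regions.
    can_merge: callable(a, b) -> bool deciding whether two regions belong together.
    """
    remaining = list(boxes)
    lines = []
    while remaining:
        current = remaining.pop(0)  # region A (or the merged region C)
        merged = True
        while merged:
            merged = False
            for i, other in enumerate(remaining):
                if can_merge(current, other):
                    current = merge_boxes(current, remaining.pop(i))
                    merged = True
                    break  # rescan, since the merged region has grown
        lines.append(current)
    return lines
```

Rescanning after every merge lets a region that was initially too far away join the line once the line has grown toward it.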
It should be noted that merging the obtained text regions essentially groups together the text that belongs together on the medical image. Text that belongs together on a medical image does in fact satisfy certain distance relations. For example, for horizontally adjacent text, the absolute value of the difference between the abscissa of the rightmost pixel of the preceding text region and the abscissa of the leftmost pixel of the following text region is no more than 1 pixel, and the vertical distance between the preceding and following text regions is no more than 0.5 pixel. As another example, for vertically adjacent text, the vertical distance between the upper and lower text regions is no more than 1 pixel, and the absolute value of the difference between the abscissas of the leftmost pixels of the upper and lower text regions is no more than 0.5 pixel. On this basis, a procedure for determining whether two text regions can be merged can be constructed. Taking text regions A and B as an example, the procedure is as follows:
S40: compute the vertical distance between the two text regions A and B, and determine whether the vertical distance is less than a tenth value; if it is, perform step S41; otherwise, perform step S42;
S41: compute the absolute value of the difference between the minimum abscissa of the pixels in whichever of the two text regions A and B has the larger abscissas and the maximum abscissa of the pixels in whichever has the smaller abscissas, and determine whether this absolute value is less than an eleventh value; if it is, merge the text region with the larger abscissas after the text region with the smaller abscissas;
S42: determine whether the vertical distance is less than the eleventh value; if it is, compute the absolute value of the difference between the minimum abscissa of the pixels in one of the two text regions A and B and the minimum abscissa of the pixels in the other, and determine whether this absolute value is less than the tenth value; if it is, merge whichever of the two text regions A and B has the smaller ordinates below the one with the larger ordinates.
It should further be noted that the horizontal axis of the coordinate system used for the coordinates in the embodiments of the present invention is parallel to the direction in which the characters are arranged. In addition, the tenth and eleventh values can be determined according to the typesetting of the text in the medical image; for a typical medical image, the tenth value may be 1 pixel and the eleventh value may be 0.5 pixel.
The method for detecting text regions in a medical image provided by the embodiments of the present invention uses the binary template of the text regions in the medical image to be detected to filter out the non-text regions among the connected regions and obtain text candidate regions, further filters out the non-text regions among the text candidate regions based on character features, and merges the resulting text regions to obtain text lines. Compared with the prior art, the embodiments of the present invention do not need to distinguish texture features and can improve the accuracy of text region detection.
Referring to Fig. 2, this embodiment discloses a device for detecting text regions in a medical image, comprising:
An acquiring unit 1, configured to obtain a medical image to be detected;
A computing unit 2, configured to detect the medical image to obtain a series of connected regions, and to obtain, based on a single sample image, a binary template of the text regions in the medical image;
It should be noted that the MSER algorithm may be used to detect the medical image; the details are not repeated here.
In a specific application, the computing unit may be configured to:
Compute the local adaptive regression kernel K_R of the single sample image R and, for each connected region T, compute the local adaptive regression kernel K_T of that connected region;
Normalize K_R to obtain the weight vector matrix W_R, and normalize K_T to obtain the weight vector matrix W_T;
Apply the PCA algorithm to W_R to obtain its principal components, and retain the first d principal components to form the matrix P_R; project W_R onto P_R to obtain the feature vector F_R of the single sample image R, and project W_T onto P_R to obtain the feature vector F_T of the connected region T, where d is an integer;
Compute the similarity between the feature vectors F_R and F_T, and determine whether the similarity is greater than a first value. If it is, set the pixel values of the corresponding connected region to 1 to obtain a text region; otherwise, set the pixel values of the corresponding connected region to 0 to obtain a background region. The text regions and background regions together form the binary template.
It should be noted that the similarity may be computed with the cosine similarity measure; the details are not repeated here.
A filtering unit 3, configured to use the binary template to filter out the non-text regions among the connected regions to obtain text candidate regions, and to further filter out the non-text regions among the text candidate regions based on character features;
In a practical application, the filtering unit may specifically be configured to:
For each text candidate region, compute the stroke width feature SW of that region, and retain the text candidate regions whose SW is less than a second value, where SW is computed as SW = std / E, in which std and E are, respectively, the standard deviation and the mean of the stroke width of the text candidate region;
Compute the number of non-zero pixels in each resulting text candidate region, and filter out the text candidate regions whose number of non-zero pixels is greater than a third value or less than a fourth value;
Compute the ratio of the number of non-zero pixels in each resulting text candidate region to the area of that region, and filter out the text candidate regions whose ratio is greater than a fifth value or less than a sixth value;
Compute the aspect ratio (length to width) of each resulting text candidate region, and filter out the text candidate regions whose aspect ratio is greater than a seventh value or less than an eighth value;
For each remaining text candidate region, segment the region into multiple small blocks using the projection method or the connected-region method, determine whether each block is a character, compute the proportion of blocks that are characters, and filter out the text candidate regions whose proportion is less than a ninth value.
A merging unit 4, configured to merge the obtained text regions to obtain text lines.
In this embodiment, the merging unit may specifically be configured to: for each unmerged text region A among the obtained text regions, select an unmerged text region B from the other unmerged text regions and determine whether A and B can be merged; if they can, merge A and B to obtain a text region C; then select an unmerged text region D from the other unmerged text regions and determine whether C and D can be merged, and if they can, merge C and D; and repeat the steps of selecting a text region, determining whether it can be merged, and merging, until all unmerged text regions have been selected.
In a specific application, the merging unit may specifically be configured to:
Compute the vertical distance between the two text regions A and B, and determine whether the vertical distance is less than a tenth value; if it is, compute the absolute value of the difference between the minimum abscissa of the pixels in whichever of the two text regions A and B has the larger abscissas and the maximum abscissa of the pixels in whichever has the smaller abscissas, and determine whether this absolute value is less than an eleventh value; if it is, merge the text region with the larger abscissas after the text region with the smaller abscissas; or
If the vertical distance is not less than the tenth value, determine whether the vertical distance is less than the eleventh value; if it is, compute the absolute value of the difference between the minimum abscissa of the pixels in one of the two text regions A and B and the minimum abscissa of the pixels in the other, and determine whether this absolute value is less than the tenth value; if it is, merge whichever of the two text regions A and B has the smaller ordinates below the one with the larger ordinates.
The device for detecting text regions in a medical image provided by the embodiments of the present invention uses the binary template of the text regions in the medical image to be detected to filter out the non-text regions among the connected regions and obtain text candidate regions, further filters out the non-text regions among the text candidate regions based on character features, and merges the resulting text regions to obtain text lines. Compared with the prior art, the embodiments of the present invention do not need to distinguish texture features and can improve the accuracy of text region detection.
The device for detecting text regions in a medical image of this embodiment can be used to carry out the technical solution of the method embodiment shown in Fig. 1. Its implementation principle and technical effects are similar and are not repeated here.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. Orientations or positional relationships indicated by terms such as "upper" and "lower" are based on the orientations or positional relationships shown in the drawings; they are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Unless otherwise expressly specified and limited, the terms "mount", "connected", and "connect" should be interpreted broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, or an internal connection between two elements. A person of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific situation.