Summary of the Invention
One object of the present invention is to provide a symbol recognition method and system for medical report sheets that improve the recognition accuracy of OCR systems and the efficiency of parsing medical report sheets.
In a first aspect, an embodiment of the present invention provides a symbol recognition method for medical report sheets, including:
constructing training samples according to the features of different symbols and training a classifier;
collecting the different symbols from the foreground image of a medical report picture, and obtaining the features of the different symbols to construct symbol templates;
identifying and detecting the symbol templates with the classifier, so as to obtain the abnormal indicators in the medical report sheet and their positions.
Optionally, constructing the training samples and training the classifier uses logistic regression and includes the following steps:
constructing training samples;
normalizing the training samples in size to obtain image features of the same dimension;
calculating the image feature values of the training samples;
training the classifier according to the image feature values to obtain the classifier parameters.
Optionally, before the different symbols are collected from the foreground image of the medical report picture, the method further includes a preprocessing step, which specifically includes:
obtaining the vertex coordinates of the foreground image by the Hough transform and determining the scale information of the foreground image;
applying tilt correction to the foreground image by a perspective transform to obtain an orthographic projection of the foreground image;
dividing the foreground image into several regions by local thresholding and binarizing each region.
Optionally, collecting the different symbols from the foreground image of the medical report picture further includes a step of determining the height of each line of text, including:
reading the text regions in the foreground image, and dilating and eroding the text to obtain connected domains;
taking, as the height of each line of text, the maximum difference between the positions at which the horizontal projection energy of the connected domain exceeds a preset energy value.
Optionally, when the symbol collected from the foreground image is an arrow, the method includes the following steps:
constructing a vertical line template according to the features of a vertical line;
locating all separable vertical lines in the foreground image according to the vertical line template;
constructing an arrow template according to the vertical line template;
constructing training samples for each template and obtaining the classifier parameters through training;
detecting arrows at the positions of the separable vertical lines.
Optionally, constructing the vertical line template according to the features of a vertical line includes:
marking a continuous line segment as a separable vertical line when the pixel values above, below, to the left of, and to the right of the segment are 0 and its matching degree with the corresponding position of the symbol template is greater than a preset threshold;
traversing the foreground image to locate all separable vertical lines in it.
Optionally, constructing the arrow template according to the vertical line template includes:
performing a horizontal projection, at the position of each separable vertical line, within a region of the same size as the vertical line template;
calculating the maximum difference between the positions at which the horizontal projection energy exceeds the preset energy value, thereby obtaining the line width of the vertical line.
Optionally, the function expression of the classifier is:
$$P(t) = \frac{1}{1 + e^{-t}}, \qquad t = \sum_{i=1}^{N} w_i x_i$$
where P(t) is the classification result and t is the weighted sum of the feature vector;
N is the feature dimension, w_i is the weight coefficient of the i-th feature, and x_i is the value of the i-th feature.
In a second aspect, an embodiment of the present invention further provides a symbol recognition system for medical report sheets, including:
a classifier generation module, configured to construct training samples according to the features of different symbols and train a classifier;
a symbol template construction module, configured to collect the different symbols from the foreground image of a medical report picture and obtain the features of the different symbols to construct symbol templates;
a template matching module, configured to identify and detect the symbol templates with the classifier, so as to obtain the abnormal indicators in the medical report sheet and their positions.
Compared with the prior art, the present invention not only remedies the low recognition rate of traditional OCR for special symbols, but also adapts to increasingly complex and diverse text layouts. The recognition method is also applied to parsing medical report sheets in the medical field: in a laboratory test report, an upward or downward arrow indicates that an indicator is higher or lower than normal, and thus marks an abnormal indicator. Medical report sheets can therefore be parsed quickly, which gives the invention high application value and broad prospects for development.
Detailed Description
To make the objects, features, and advantages of the present invention easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments can be combined with one another.
Many specific details are set forth in the following description to facilitate a thorough understanding of the present invention. However, the invention can also be implemented in ways other than those described here; the scope of protection of the invention is therefore not limited by the specific embodiments described below.
In one aspect, the present invention proposes a symbol recognition method for medical report sheets which, as shown in Fig. 1, includes:
S10, constructing training samples according to the features of different symbols and training a classifier;
S20, collecting the different symbols from the foreground image of a medical report picture, and obtaining the features of the different symbols to construct symbol templates;
S30, identifying and detecting the symbol templates with the classifier, so as to obtain the abnormal indicators in the medical report sheet and their positions.
It should be understood that, in the present invention, the medical report picture is a picture taken of a medical report sheet placed at some location, and the foreground image is the image of the medical report sheet within that picture.
To address the low recognition rate of prior-art OCR systems for special symbols in text, the symbol recognition method provided by this embodiment trains different classifiers for different symbols, then collects symbols from the medical report sheet to build templates, and matches the templates with the classifiers. This not only remedies the low recognition rate of traditional OCR for special symbols; the method can also be applied to parsing medical report sheets so that they are parsed quickly, which improves parsing efficiency.
In general, an image obtained by a scanner is an orthographic projection without angular deviation, which favors text recognition. When an image is captured by a camera, however, various constraints and disturbances deform the picture of the photographed object (for example, near objects appear larger than distant ones), so distortion correction is needed to avoid errors during image recognition. As shown in Fig. 2, the picture of a medical report sheet taken by a camera is visibly deformed, so the image needs to be preprocessed.
Optionally, before the different symbols are collected from the foreground image of the medical report picture, the method further includes a preprocessing step, which specifically includes:
obtaining the vertex coordinates of the foreground image by the Hough transform and determining the scale information of the foreground image;
applying tilt correction to the foreground image by a perspective transform to obtain an orthographic projection of the foreground image;
dividing the foreground image into several regions by local thresholding and binarizing each region.
First, the step of obtaining the vertex coordinates of the foreground image by the Hough transform and determining the scale information of the foreground image is described.
The present invention uses the Hough transform to detect the edges of the foreground image in the medical report picture and thereby determine the size of the foreground image. By the duality of points and lines, a given curve in the input image space is represented as a point in parameter space, so detecting the curve in the input image becomes a search for peaks in parameter space; detecting a global feature is thus converted into detecting a local one. Obtaining the equations of the edge lines of the foreground image and the intersections of those lines then yields the vertex coordinates and the scale information of the foreground image. The maximum width and the maximum height are then taken as the width and height of the corrected foreground image.
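The vertex computation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: it assumes the four edge lines have already been detected and are given in the Hough normal form (rho, theta), where x*cos(theta) + y*sin(theta) = rho. The function names `intersect` and `corners_and_size` are hypothetical.

```python
import math


def intersect(line_a, line_b):
    """Intersect two lines given in Hough normal form (rho, theta)."""
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None  # the two lines are (almost) parallel
    # Cramer's rule for a1*x + b1*y = rho1 and a2*x + b2*y = rho2.
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return (x, y)


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def corners_and_size(top, right, bottom, left):
    """Return the four vertices and the width/height of the corrected image.

    The corrected width is the larger of the two horizontal edge lengths and
    the corrected height is the larger of the two vertical edge lengths.
    Assumes no two adjacent edges are parallel.
    """
    tl = intersect(top, left)
    tr = intersect(top, right)
    br = intersect(bottom, right)
    bl = intersect(bottom, left)
    width = max(distance(tl, tr), distance(bl, br))
    height = max(distance(tl, bl), distance(tr, br))
    return [tl, tr, br, bl], (width, height)
```

For an axis-aligned sheet with edges y = 10, y = 110, x = 20, and x = 220, calling `corners_and_size((10, math.pi / 2), (220, 0.0), (110, math.pi / 2), (20, 0.0))` yields a 200 by 100 rectangle.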
Second, the step of applying tilt correction to the foreground image by a perspective transform to obtain an orthographic projection of the foreground image is described.
After the size of the foreground image has been determined, tilt correction is applied to it. In the present invention, the foreground image is mapped onto the plane of the photographed object, which is equivalent to the camera facing the medical report sheet perpendicularly, so that an ideal image shape is obtained without losing the information contained in the foreground image.
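One way to realize such a mapping is a planar homography computed from the four vertices of the foreground image and the four corners of the corrected rectangle. The sketch below is an assumption about how the perspective transform could be set up, not the method fixed by the text; the helper names `solve`, `homography`, and `warp_point` are illustrative, and a production system would normally rely on an image library's warp routine.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial
    pivoting. Assumes the system is non-singular (no three of the four
    points are collinear)."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def homography(src, dst):
    """Return the 9 homography coefficients (h[8] fixed to 1) that map the
    four src points onto the four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve(a, b) + [1.0]


def warp_point(h, x, y):
    """Map one point of the tilted image into the corrected image."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Vertices of a tilted sheet (top-left, top-right, bottom-right, bottom-left)
# and the corners of the corrected 200 x 100 rectangle (illustrative values).
src = [(30, 20), (210, 40), (230, 150), (10, 130)]
dst = [(0, 0), (200, 0), (200, 100), (0, 100)]
h = homography(src, dst)
```

To resample a whole image, the usual approach is to compute the homography in the opposite direction (corrected to tilted) and look up a source pixel for every pixel of the corrected image, so that no holes appear.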
In practice, those skilled in the art may also use other preprocessing methods to correct the tilted image, solve the above technical problem, and achieve essentially the same effect; the present invention places no limitation on this.
Finally, the step of dividing the foreground image into several regions by local thresholding and binarizing each region is described.
Because the foreground image contains 256 brightness levels, the present invention binarizes the foreground image to reduce computational complexity and improve the efficiency of recognizing special symbols.
The present invention uses local thresholding for binarization. The foreground image is divided into several regions, and a threshold is set for each region to binarize it, yielding the binarized foreground image. This separates the target from the background better in the binarized foreground image.
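Region-wise binarization can be illustrated with the short sketch below. The text does not specify how each region's threshold is chosen; here the mean brightness of the region is used, which is an assumption, as are the function name `local_binarize` and the square block shape. Dark pixels (ink) are mapped to 1 and bright pixels (paper) to 0, so that the background has pixel value 0.

```python
def local_binarize(gray, block):
    """Binarize a grayscale image (list of rows, values 0..255) block by block.

    Each block x block region gets its own threshold, here the mean
    brightness of that region (an assumed choice).
    """
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for top in range(0, rows, block):
        for left in range(0, cols, block):
            row_range = range(top, min(top + block, rows))
            col_range = range(left, min(left + block, cols))
            values = [gray[r][c] for r in row_range for c in col_range]
            threshold = sum(values) / len(values)
            for r in row_range:
                for c in col_range:
                    out[r][c] = 1 if gray[r][c] < threshold else 0
    return out


# Left half: bright paper (200) with ink at 120.
# Right half: shaded paper (100) with ink at 30.
gray = [[200, 120, 100, 30],
        [200, 200, 100, 100]]
binary = local_binarize(gray, block=2)
```

In this example a single global threshold between 100 and 120 would classify the shaded paper on the right as ink, whereas the per-region thresholds recover the ink pixel in each half.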
After the picture in Fig. 2 is preprocessed, the corrected foreground image is obtained; see Fig. 3.
For laboratory test report images taken by a camera, the present invention recognizes the arrows appearing in them and obtains their position information, so as to determine the patient's abnormal indicators more accurately and quickly.
1) Construct a vertical line template and locate all separable vertical lines in the foreground image. The vertical line template is constructed according to the features of a vertical line, namely that the pixel values within a preset range above, below, to the left of, and to the right of the line are 0.
According to the constructed vertical line template, the foreground image is traversed with different step sizes in the horizontal and vertical directions to locate all separable vertical lines in it. For example, in this embodiment the height and width of the vertical line template are initialized to 40 and 3 respectively, the horizontal step of the window is 4, and the vertical step is 2; the size of the sliding window is the width and height of the vertical line template plus the step in the horizontal and vertical directions respectively. The foreground image is then scanned using the vertical line template and these steps. According to the features of a separable vertical line (the line segment is continuous, and the pixel values within a certain range above, below, to the left of, and to the right of it are 0), a line segment in the foreground image is marked as a vertical line when its matching degree with the vertical line at the corresponding position of the template exceeds a preset threshold. In practice, the preset threshold can be determined through repeated experiments. Once the foreground image has been traversed, the coordinates of all separable vertical lines are obtained and stored in a text file for subsequent recognition.
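The scan described above might look like the following sketch. Several details are assumptions made for illustration: the matching degree is taken to be the fraction of ink pixels inside the template rectangle, the "certain range" around the line is a fixed `margin` in pixels, and each sliding window is searched at every offset it contains. The defaults 40, 3, 4, and 2 are the values given in the embodiment; the function names are hypothetical.

```python
def ink_ratio(img, top, left, height, width):
    """Matching degree (assumed): fraction of ink pixels in the rectangle."""
    total = sum(img[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def surroundings_blank(img, top, left, height, width, margin):
    """True when every pixel within `margin` of the rectangle, but outside
    it, has value 0 (the line is separable from its neighbours)."""
    rows, cols = len(img), len(img[0])
    for r in range(max(0, top - margin), min(rows, top + height + margin)):
        for c in range(max(0, left - margin), min(cols, left + width + margin)):
            inside = top <= r < top + height and left <= c < left + width
            if not inside and img[r][c]:
                return False
    return True


def find_vertical_lines(img, height=40, width=3, step_x=4, step_y=2,
                        margin=2, threshold=0.9):
    """Return (top, left) coordinates of all separable vertical lines."""
    rows, cols = len(img), len(img[0])
    found = []
    # The window moves by step_x / step_y; because it is larger than the
    # template by one step in each direction, the template is tried at
    # every offset inside the window.
    for win_top in range(0, rows, step_y):
        for win_left in range(0, cols, step_x):
            for dy in range(step_y):
                for dx in range(step_x):
                    top, left = win_top + dy, win_left + dx
                    if top + height > rows or left + width > cols:
                        continue
                    if (ink_ratio(img, top, left, height, width) > threshold
                            and surroundings_blank(img, top, left,
                                                   height, width, margin)):
                        found.append((top, left))
    return found


# Tiny binary image: one vertical stroke in column 5, rows 2 to 6.
img = [[0] * 8 for _ in range(9)]
for r in range(2, 7):
    img[r][5] = 1
lines = find_vertical_lines(img, height=5, width=1, margin=1)
```

The returned coordinates could then be written to a text file, one line per detected vertical line, for the later arrow detection stage.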
2) Determine the line height of the text and the line width of the vertical line. Basic processing such as dilation and erosion is applied to the text regions in the foreground image to obtain connected domains, and the height of each line of text is determined from the width of the connected domains. This text line height is used to initialize the size of the arrow, that is, its height, which is continually updated and corrected during detection.
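As a rough illustration of estimating text line heights from horizontal projections, the sketch below sums each row of a binary text region and measures the runs of rows whose projection energy exceeds a preset value. It assumes the region has already been dilated and eroded so that each text line forms a connected band; the names `row_projection` and `line_heights` are hypothetical, and the threshold is assumed to be non-negative.

```python
def row_projection(img):
    """Horizontal projection: number of ink pixels in each row."""
    return [sum(row) for row in img]


def line_heights(img, energy_threshold):
    """Return (start_row, height) for every band of consecutive rows whose
    projection energy exceeds the threshold."""
    bands, start = [], None
    # A trailing 0 acts as a sentinel that closes a band touching the
    # bottom edge of the image.
    for r, energy in enumerate(row_projection(img) + [0]):
        if energy > energy_threshold and start is None:
            start = r
        elif energy <= energy_threshold and start is not None:
            bands.append((start, r - start))
            start = None
    return bands


text_region = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
bands = line_heights(text_region, energy_threshold=1)
```

Here two text lines are found, starting at rows 1 and 4 with heights 2 and 3; the height of a band can serve as the initial arrow height.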
For all located separable vertical lines, this embodiment calculates the line width of the vertical line by a difference method.
First, the template for detecting arrows is constructed. An arrow consists of a vertical line and symmetric oblique lines, so the arrow template is built on the basis of the vertical line template: the abscissa of the vertical line is extended by 3 and 6 steps to the left and right respectively in the horizontal direction, and the ordinate of the vertical line is extended upward by 2 steps. Then, within a region of the same size as the vertical line template, a horizontal projection is performed at the position of each vertical line. The positions at which the horizontal projection energy exceeds the preset energy value are found, and the differences between these positions are calculated. The maximum difference gives the line width of the vertical line of the arrow. The height of the vertical line can of course be obtained in the same way.
For example, setting the positions at which the horizontal projection energy exceeds the preset energy value to 1 and all other positions to 0 gives an array such as:
A = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0];
Inverting each element of array A gives array B:
B = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1];
That is, the positions of the 1s in array B (counting from 1) are los = 1, 2, 3, 9, 10;
Taking the differences between successive positions:
c = diff(los) = (1, 1, 6, 1)
The largest difference spans the run of 1s in A plus one position, so:
d = max(c) - 1 = 5.
The line width of this vertical line is therefore 5.
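The difference method can be sketched in a few lines. The version below is written so that the result equals the number of consecutive above-threshold positions (5 for the array A in the example); the sentinel positions added at both ends, which handle a line touching the border of the region, and the function name `line_width` are assumptions made for this illustration.

```python
def line_width(projection, energy_threshold):
    """Width of the widest run of positions whose projection energy exceeds
    the threshold, computed from the gaps between below-threshold positions."""
    a = [1 if energy > energy_threshold else 0 for energy in projection]
    b = [1 - value for value in a]  # element-wise inversion of A
    # 1-based positions of the 1s in B, with sentinels before the first
    # and after the last position.
    positions = [0] + [i + 1 for i, value in enumerate(b) if value] + [len(b) + 1]
    gaps = [q - p for p, q in zip(positions, positions[1:])]
    # A gap of g between two below-threshold positions encloses g - 1
    # above-threshold positions.
    return max(gaps) - 1


A = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
width = line_width(A, energy_threshold=0)
```

Passing the raw projection energies together with the preset energy value works the same way; the already thresholded array A is used here only to match the example.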
3) Train the arrow template. This embodiment constructs training samples according to the features of an arrow. Through repeated training, the parameters of the classifier are obtained and used to detect arrows in the foreground image. At the position of each vertical line, the classifier then judges whether the features of an arrow are present; if an arrow is recognized, its position is recorded and the indicator on that line of the medical report sheet is considered abnormal.
After the line width and line height of the arrow have been determined, arrow detection is divided into an upper part and a lower part according to the features of an arrow. The upper part contains the symmetric oblique lines, while in the lower part the pixel values around the vertical line beyond a preset distance are all 0. For example, in this embodiment the upper part is checked for oblique lines within one line width on each of the left and right sides of the vertical line; in the lower part, the pixel values within one line width on each side of the vertical line are 0.
The present invention therefore distinguishes separable vertical lines from arrows by training a classifier. Training samples are constructed for each component feature of the arrow, and the classifier parameters obtained through repeated training are used in the arrow detection and recognition stage.
Optionally, obtaining the classifier by logistic regression includes:
constructing training samples;
normalizing the training samples in size to obtain image features of the same dimension;
calculating the image feature values of the training samples;
training the classifier according to the image feature values to obtain the classifier parameters.
The present invention preferably uses a logistic regression classifier to learn whether the two sides of the vertical line contain oblique lines or blank space. A logistic regression classifier is simple to implement, requires very little computation at classification time, is fast, and needs little storage. Logistic regression is a linear classifier whose function expression is:
$$P(t) = \frac{1}{1 + e^{-t}}, \qquad t = \sum_{i=1}^{N} w_i x_i$$
where P(t) is the classification result and t is the weighted sum of the feature vector;
N is the feature dimension, w_i is the weight coefficient of the i-th feature, and x_i is the value of the i-th feature.
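Evaluating such a classifier is inexpensive, as the sketch below suggests. It assumes the standard logistic function P(t) = 1 / (1 + e^(-t)) applied to the weighted sum t of the feature values; the function names and the example weights are illustrative, not values taken from the invention.

```python
import math


def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def classify(weights, features):
    """Return P(t) for t = sum of w_i * x_i over the N feature dimensions."""
    t = sum(w * x for w, x in zip(weights, features))
    return sigmoid(t)


# Hypothetical weights for a 3-dimensional feature vector.
weights = [2.0, -1.0, 0.5]
probability = classify(weights, [1.0, 0.0, 1.0])
is_oblique = probability > 0.5
```

A result above 0.5 would be read as "the patch contains the feature" (for example, an oblique line); the 0.5 decision threshold is the conventional choice and is also an assumption here.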
In this embodiment, obtaining the classifier by logistic regression includes:
(1) Construct the training sample set. A large number of arrow sample images are collected for training. During training, a square template whose width and height are a certain multiple of the line width is constructed for the upper-left part of the arrow, to the left of the vertical line; because of symmetry, the upper-right template is obtained simply by flipping it horizontally. The templates for the left and right sides of the lower part of the arrow are constructed in the same way, and the image features of these four parts are extracted to train the classifier.
(2) Normalize the images of all samples in the training sample set in size, which yields image features of the same dimension for training the classifier.
(3) Calculate the feature values of the training samples. The binarized pixel values of the foreground image and their transformations are used as image features; this representation is simple and has low computational complexity.
(4) Train the logistic regression classifier to obtain the classifier parameters.
(5) Use the trained classifier parameters to mark the positions of arrows according to the vertical line coordinates. As shown in Fig. 4, the position of the arrow is marked with dashed box 2 in the enlarged view of dashed box 1. The abnormal indicators in the medical report are obtained from the positions of the arrows. As shown in Fig. 5, the arrow marks an abnormal indicator: the 20th indicator in the medical report sheet, the absolute monocyte count, is higher than normal.
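Steps (1) to (4) can be illustrated with a small, self-contained sketch. It is not the training procedure of the invention: nearest-neighbour resampling for size normalization, the constant bias feature, stochastic gradient ascent with a fixed learning rate, and all function names and sample patches are assumptions chosen to keep the example short.

```python
import math


def mirror(img):
    """Flip a template horizontally (left-side template -> right-side one)."""
    return [row[::-1] for row in img]


def resize_nearest(img, size):
    """Size normalization: resample a binary patch to size x size."""
    rows, cols = len(img), len(img[0])
    return [[img[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]


def features(img, size=4):
    """Binarized pixel values as the feature vector, plus a constant 1.0
    that acts as a bias term (an assumption, not stated in the text)."""
    flat = [float(v) for row in resize_nearest(img, size) for v in row]
    return flat + [1.0]


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train(samples, labels, rate=0.5, epochs=200):
    """Fit the logistic regression weights by stochastic gradient ascent."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, x)
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


# Upper-left part of an up arrow: an oblique stroke rising to the right.
oblique = [[0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0],
           [1, 0, 0, 0]]
blank = [[0] * 4 for _ in range(4)]

weights = train([features(oblique), features(blank)], [1, 0])
```

After training, `predict(weights, features(oblique))` is above 0.5 and `predict(weights, features(blank))` is below 0.5. A classifier for the right-hand side can be trained in the same way on `mirror(oblique)`, and a practical training set would of course contain many more samples than the two used here.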
This embodiment illustrates the symbol recognition method using arrows only. In practice, the method provided by the present invention also applies to recognizing other symbols, such as the plus sign, minus sign, greater-than-or-equal sign, less-than-or-equal sign, full stop, exclamation mark, percent sign, Roman numerals, and asterisk. Based on the features of each symbol (for example, a full stop can be regarded as two symmetric semicircles, an exclamation mark can be divided into a vertical line and a dot, and a percent sign can be divided into an oblique line and the circles on either side of it), the symbol recognition method provided by this embodiment can be used to recognize and detect other symbols in medical report sheets; details are not repeated here.
In another aspect, an embodiment of the present invention further provides a symbol recognition system for medical report sheets which, as shown in Fig. 6, includes:
a classifier generation module, configured to construct training samples according to the features of different symbols and obtain a classifier;
a symbol template construction module, configured to collect the different symbols from the foreground image of a medical report picture and obtain the features of the different symbols to construct symbol templates;
a template matching module, configured to identify and detect the symbol templates with the classifier, so as to obtain the abnormal indicators in the medical report sheet and their positions.
Based on the same inventive concept, the symbol recognition system for medical report sheets provided by this embodiment is implemented using the above symbol recognition method; it therefore solves the same technical problem and achieves the same technical effect, which are not described in detail again here.
In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "upper" and "lower" are based on those shown in the drawings. They are used only to facilitate and simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the invention.
Although embodiments of the invention have been described with reference to the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and all such modifications and variations fall within the scope defined by the appended claims.