Method for converting handwritten notes into text, and mobile device therefor
Technical field
The present invention relates to image recognition technology, and in particular to a method, applied to a personal digital assistant (PDA) or a mobile device with PDA functions, for converting handwritten notes into text, and to a mobile device that implements the method.
Background art
Personal digital assistants (PDAs) are used more and more widely because of their powerful applications and portability. Much of this popularity comes from their ability to help users record information: a user who needs to note down an important matter as a reminder can enter it into the PDA and save it as an electronic file, which is both easy to carry and easy to process later. This has become one of the PDA's major advantages. At present, PDAs offer two main methods for entering and saving information. The first is handwriting recognition: the user writes characters one at a time in the handwriting input area, and after each character the PDA immediately recognizes it, converts it to text and shows the result in the output area. However, recognition takes time on a PDA with limited computing power, so this method is unsuitable when the user must enter many characters quickly. In a meeting or an interview, for example, the user may need to write a large amount of text continuously, and a long wait for recognition after every character is intolerable.
To address this problem, PDAs offer a second method of entering and saving information: a shorthand (note-taking) application. With it, the user can write continuously with the stylus in the handwriting input area, and the characters are moved to the output area automatically. Unlike the first method, however, what appears in the output area is still the user's original handwriting trace, that is, handwritten notes. This lets the user record a large amount of information quickly, which is very convenient.
However, the information a user needs to enter quickly and continuously is in most cases very valuable to that user, so the user would very much like to be able to edit these handwritten notes, for example to add, delete or change characters. This requires the PDA to perform text recognition on the handwritten notes as well, that is, to convert the image formed by the notes into the text it contains. Until now, however, no method has achieved this conversion on a PDA. To edit the information, the user has to re-enter it through the character-by-character handwriting recognition of the first method, which remains very inconvenient. Users therefore strongly want the whole set of notes to be recognized at once, so that the shorthand application can be used to full effect.
Whole-page recognition of character images is already an established technology on other devices such as computers, for example optical character recognition (OCR) software used together with a scanner. The recognition process of such OCR software basically consists of the following steps:
(1) scanning the character image;
(2) preprocessing the image, including skew correction and noise filtering;
(3) analyzing and understanding the page layout;
(4) segmenting the image into lines and characters;
(5) selecting and extracting features from each single-character image;
(6) classifying the pattern based on the single-character image features;
(7) outputting the recognition result for the classified pattern;
(8) editing, correcting and processing the recognition result.
In the computer-based process above, the algorithms of steps (2), (3) and (4) are fairly complex and consume a great deal of computing resources. Because a typical PDA has a modest hardware configuration, and in particular a processor with limited data-processing capability, it cannot run these algorithms. This is why, until now, PDAs have been unable to convert handwritten notes entered through a shorthand application into text.
Summary of the invention
In view of this, one object of the present invention is to provide a simple and efficient method of converting handwritten notes into text that uses only a small amount of computing resources.
Another object of the present invention is to provide a device equipped with a conversion module that uses the above method.
The above objects are achieved by the following technical solutions:
A method, applied to a PDA or a mobile device with PDA functions, for converting handwritten notes into text, comprising the following steps:
A. dividing the output area into contiguous sub-regions of identical shape; recording the character information entered inside the handwriting input area and discarding any character information entered outside it; transferring the recorded information of each character, in order, into consecutive sub-regions of the output area; and saving the resulting picture file;
B. extracting the character features from the picture file;
C. calling a handwriting recognition engine to recognize the character features and storing the recognition results in a buffer;
D. displaying on the screen the text that was formed by recognition and held in the buffer.
In the above method, when the recorded information of each character is transferred in step A into consecutive sub-regions of the output area, the character information may be compressed. The compression may use a line-decimation method and may halve each dimension (2:1 compression in each direction).
In the above method, each sub-region of the output area may be one quarter the size of the handwriting input area. The sub-regions of the output area and the handwriting input area may be rectangular or square.
In the above method, step D may further comprise setting up in advance a dictionary of common recognition errors. While the recognized text in the buffer is being displayed, the system calls this dictionary to correct the recognition results in the buffer automatically and displays the corrected results on the screen.
In the above method, the handwriting recognition engine may be a Chinese character handwriting recognition engine.
A PDA or mobile device with PDA functions which, in addition to a CPU, memory and display screen connected to a bus, further comprises a conversion module that performs the above method of converting handwritten notes into text, the conversion module being electrically connected to the CPU, the memory and the display screen.
As can be seen from the above technical solution, compared with prior-art OCR recognition, the present invention discards the parts of characters that fall outside the handwriting input area, so that every handwritten character has a fixed size and the original notes are already divided into separate, standard-size character images. No complex segmentation algorithm is then needed, and the steps that consume large amounts of computing resources (preprocessing of the original notes including skew correction and noise filtering, layout analysis and understanding, and line and character segmentation) can be omitted. This simplifies processing and increases speed, achieving the object of converting handwritten notes into text simply and efficiently with only a small amount of computing resources.
At the same time, the invention allows a whole page of original handwritten notes to be recognized in one batch on a PDA, which increases processing speed and greatly reduces the inconvenience of re-entering text. Because the method does not require an additional handwriting recognition library for image recognition, the user can make full use of existing computing resources and unnecessary extra overhead is avoided.
Brief description of the drawings
Fig. 1 shows an example of the shape and position of the handwriting input area and the output area on a PDA according to the present invention;
Fig. 2 shows the image acquisition flow of the present invention;
Fig. 3 shows the standard 16 × 16-pixel grid into which the whole output area is divided according to the present invention;
Fig. 4 shows an example of handwritten notes according to the present invention;
Fig. 5 shows an example of the shorthand file browsing window on a PDA according to the present invention;
Fig. 6 shows the flow of the present invention from image preprocessing to output of the final result;
Fig. 7 shows an example of the recognition-complete window on a PDA according to the present invention;
Fig. 8 is a schematic diagram of the device for converting handwritten notes into text according to the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments.
The method of the present invention divides the conversion of handwritten notes into text into four stages: image acquisition, image preprocessing, character recognition and post-processing. Its key feature is some special processing during data acquisition, which allows whole-page recognition of handwritten notes even on a PDA with a modest hardware configuration.
In current PDA shorthand applications, data acquisition works roughly as follows: with the shorthand application open, the user writes directly in the handwriting input area with the stylus; once the user has finished writing in the input area, the written characters are transferred to the output area and then saved as a picture file. Data acquisition in the present invention follows the same outline, but with some special processing added. The following describes this special processing in detail.
As shown in Fig. 1, assume that the handwriting input area of the PDA is 32 × 32 pixels. The example of the present invention uses two input boxes, each 32 × 32 pixels. In practice the input area may also be rectangular, for example 32 × 24 pixels.
With the shorthand application open, the user writes characters with the stylus inside the handwriting input area. The input area is normally large enough that users do not write outside it. If the user does accidentally go beyond the input area, then, as one special processing step of the invention, the part of the character outside the input area is discarded (simply ignored), so that no complex operations such as segmentation are needed later. Only the character information inside the input area is recorded.
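The clipping rule amounts to a filter over stylus points. The following is a minimal Python sketch, assuming that stroke points arrive as (x, y) pairs relative to the top-left corner of one 32 × 32 input box; the function name and point format are hypothetical and not taken from any real PDA API.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Input-box size from the example embodiment (32 x 32 pixels).
BOX_W, BOX_H = 32, 32

def clip_to_input_box(points: List[Point], box_w: int = BOX_W, box_h: int = BOX_H) -> List[Point]:
    """Keep only the stylus points that fall inside the handwriting input box.

    Points outside the box are dropped, not rescaled, so every recorded
    character stays inside a fixed-size frame and needs no later segmentation.
    """
    return [(x, y) for (x, y) in points if 0 <= x < box_w and 0 <= y < box_h]
```

Dropping points rather than shrinking the whole character keeps every character in the same frame, which is what makes the later fixed-grid layout possible.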
After the user finishes writing a character, the system records its information, and once writing in the input area is complete, the character information entered there is compressed by line decimation and displayed in the output area. Line decimation is a known technique; it is in effect a lossy compression algorithm, for example removing every odd-numbered line, which reduces storage without affecting recognition. For example, a line segment represented in binary as 10 11 00 10 01 00 01 becomes 1101000 after decimation: only the values at even-indexed positions (counting from zero, i.e. the first bit of each pair) are kept. Applying this 2:1 decimation to a handwritten character both horizontally and vertically reduces it to one quarter of its original size, so the simplified 32 × 32 character fits into a 16 × 16-pixel cell.
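The decimation step can be illustrated with list slicing. This is a minimal Python sketch, assuming a character is held as a list of rows of 0/1 pixels; it reproduces the bit-string example above and shows the 32 × 32 to 16 × 16 reduction. The function names are illustrative only.

```python
from typing import List

Bitmap = List[List[int]]

def decimate_row(bits: List[int]) -> List[int]:
    """Keep the values at even indices (0, 2, 4, ...) and drop the odd ones."""
    return bits[::2]

def decimate_bitmap(bitmap: Bitmap) -> Bitmap:
    """Apply 2:1 line decimation vertically (rows) and horizontally (columns).

    A 32 x 32 bitmap becomes 16 x 16, i.e. one quarter of the original area.
    """
    return [decimate_row(row) for row in bitmap[::2]]

# The line segment from the text: 10 11 00 10 01 00 01
segment = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print("".join(map(str, decimate_row(segment))))  # prints 1101000
```

Because the method is lossy, a one-pixel-wide stroke lying entirely on discarded rows or columns would disappear; the text's claim that recognition is unaffected relies on strokes usually being thicker than one pixel after capture.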
As shown in Fig. 3, and as the second special processing step of the invention, the position of each cell in the output area is fixed and its size is fixed at 16 × 16 pixels, so that the output area consists of a series of contiguous 16 × 16-pixel sub-regions. Each time the user writes a character in the handwriting input area, as shown in Fig. 4, it is compressed by line decimation to one quarter of its original size and then displayed in a standard 16 × 16-pixel sub-region of the output area. Every character written in the input area is processed the same way, until all characters have been entered and placed, in order, in the 16 × 16-pixel sub-regions of the output area. When the user's text fills a page, or the user finishes input, the handwritten notes in the output area are saved as a picture file.
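Placing characters into the fixed grid only requires computing each slot's origin from the character's index. The following is a minimal Python sketch, assuming the output area is filled left to right, then top to bottom, and is stored as a list of pixel rows; the 160-pixel width used in the usage note is a hypothetical value, not one given in the text.

```python
from typing import List, Tuple

Bitmap = List[List[int]]
CELL = 16  # side of one output sub-region in pixels, as in the embodiment

def cell_origin(index: int, output_width: int, cell: int = CELL) -> Tuple[int, int]:
    """Top-left (x, y) of the index-th character cell, filled left to right, then top to bottom."""
    per_row = output_width // cell
    if per_row == 0:
        raise ValueError("output area is narrower than one cell")
    row, col = divmod(index, per_row)
    return col * cell, row * cell

def paste_cell(page: Bitmap, glyph: Bitmap, index: int, cell: int = CELL) -> None:
    """Copy a cell x cell glyph into its fixed slot on the output page (in place)."""
    x0, y0 = cell_origin(index, len(page[0]), cell)
    for dy, row in enumerate(glyph):
        page[y0 + dy][x0:x0 + cell] = row
```

With an assumed 160-pixel-wide output area there are 10 cells per row, so character 10 starts the second row at (0, 16). Because the slot is a pure function of the index, the reader of the saved file can recover every character's position without any layout analysis.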
Thanks to these two special processing steps, the picture file formed by the present invention differs from that of the prior art: it can easily be divided into standard, contiguous character images. The steps required in a general recognition process (preprocessing including skew correction and noise filtering, layout analysis and understanding, and line and character segmentation) are therefore omitted. Since no large amount of resources is needed for these complex computations, the subsequent process, from preprocessing to display of the final recognition result, can be completed easily.
Next, the handwritten notes saved during data acquisition are preprocessed. As shown in Fig. 5, if the user chooses to open the recognition page, the steps shown in Fig. 6 are performed. First the saved picture file is read, and then it is cut into characters. Because every handwritten character in the output area sits at a fixed screen position and has the standard size of 16 × 16 pixels, complex processing such as skew correction, noise filtering, layout analysis and understanding, and line and character segmentation can be omitted.
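Because the cells sit on a fixed grid, cutting the saved page reduces to array slicing. This is a minimal Python sketch, assuming the picture file has already been decoded into a list of pixel rows whose dimensions are multiples of 16; file decoding is outside its scope.

```python
from typing import List

Bitmap = List[List[int]]
CELL = 16

def split_into_cells(page: Bitmap, cell: int = CELL) -> List[Bitmap]:
    """Cut a saved notes page into fixed cell x cell glyphs in reading order.

    No skew correction, layout analysis or character segmentation is done,
    because every character was written into a known grid position.
    """
    height, width = len(page), len(page[0])
    cells = []
    for y0 in range(0, height - cell + 1, cell):
        for x0 in range(0, width - cell + 1, cell):
            cells.append([row[x0:x0 + cell] for row in page[y0:y0 + cell]])
    return cells
```

The text does not say how unused cells after the last character are handled; an implementation would probably stop at the last written cell or skip blank cells, which is an assumption here.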
The actual image preprocessing is a known technique. For example, each pixel of each handwritten character can be classified as black or white and represented as 1 or 0 respectively; analyzing every pixel in this way yields a binary sequence, which serves as the basis for subsequent character recognition.
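The black/white encoding can be sketched as a threshold followed by row-major flattening. This minimal Python sketch assumes each cell holds 8-bit grey levels (0 = black, 255 = white); the threshold of 128 is an assumed value, not one given in the text.

```python
from typing import List

def binarize(glyph: List[List[int]], threshold: int = 128) -> List[int]:
    """Map each pixel to 1 (black ink) or 0 (white background), row by row.

    glyph holds 8-bit grey levels; pixels darker than the (assumed)
    threshold count as ink. The result is one flat binary sequence.
    """
    return [1 if px < threshold else 0 for row in glyph for px in row]
```

A 16 × 16 cell therefore yields a 256-element sequence, which is the input handed to the recognition engine in the next stage.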
Character recognition is also a known technique. For each character read during preprocessing, that is, each binary sequence, the system calls the handwriting recognition engine to recognize it and stores the internal code of the resulting Chinese character in the buffer. Repeating this process for every character in the output area yields the complete text of the whole set of handwritten notes.
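The recognition loop can be sketched with the engine modeled as a plain function. This is a minimal Python sketch under the assumption that the engine takes one binary sequence and returns one character; the `Recognizer` type is hypothetical and stands in for whatever interface a real PDA engine exposes, and Python strings stand in for the Chinese character internal codes.

```python
from typing import Callable, List, Sequence

# Hypothetical engine interface: one binary pixel sequence in, one character out.
Recognizer = Callable[[Sequence[int]], str]

def recognize_all(sequences: List[Sequence[int]], engine: Recognizer) -> List[str]:
    """Recognize every cell in reading order and collect the results in a buffer.

    The buffer keeps characters in their original order, so the whole page
    is available as running text for the later error-correction pass.
    """
    buffer: List[str] = []
    for seq in sequences:
        buffer.append(engine(seq))
    return buffer
```

For instance, with a toy engine that maps two fixed patterns to 今 and 天, `"".join(recognize_all(...))` yields the two-character word 今天. Keeping the results together in one buffer is what enables the whole-text error correction described next.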
In post-processing, the system can also provide automatic error correction to improve the recognition of Chinese characters entered by the user. Error correction becomes possible precisely because the invention recognizes all the input as a whole. In prior-art handwriting input, each character is recognized independently, so the logical relationships between the characters are lost and input errors cannot be detected. With the method of the present invention, the buffer holds complete sentences, so some obvious errors can be removed. There are many error-correction algorithms; for example, existing artificial-intelligence (AI) techniques can identify errors, but they are rather cumbersome. The present invention therefore sets up a dictionary of common word-recognition errors. For example, 令天你好 (in which 令, "order", has been misread for the look-alike 今) is automatically corrected to 今天你好 ("hello today"). After the whole document has been recognized, the system performs a phrase-level correction pass over the buffer, searching for words such as 令天 and replacing them with 今天. Once error correction is complete, the text in the buffer is displayed in the output area, as shown in Fig. 7, and the user can then choose to delete the original handwritten notes.
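The dictionary-based correction pass can be sketched as string replacement over the buffered text. This minimal Python sketch assumes the dictionary maps misrecognized phrases to their corrections; the single entry 令天 → 今天 follows the example above, and the dictionary's name and contents are otherwise illustrative.

```python
from typing import Dict

# A tiny stand-in for the dictionary of common recognition errors.
# Keys are misread phrases, values are their corrections.
CONFUSIONS: Dict[str, str] = {"令天": "今天"}

def auto_correct(text: str, confusions: Dict[str, str] = CONFUSIONS) -> str:
    """Replace every known misrecognized phrase in the buffered text."""
    # Try longer keys first so a long entry is not pre-empted by a shorter one.
    for wrong in sorted(confusions, key=len, reverse=True):
        text = text.replace(wrong, confusions[wrong])
    return text
```

Replacing entries one after another means a correction could in principle create a new match for a later key; for a large dictionary, a single-pass scan over the text (for example with one combined regular expression) would avoid such cascades.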
The above describes in detail the method of the present invention for converting handwritten notes into text. In addition, the invention provides a device for converting handwritten notes into text: on the basis of an existing PDA, a conversion module is added that converts handwritten notes into text using the method of the invention, as shown schematically in Fig. 8. As Fig. 8 shows, the conversion module is electrically connected to the CPU, the touch screen and the memory and, with their help, performs the steps of the conversion method above (image acquisition, image preprocessing, character recognition and post-processing), thereby converting handwritten notes into text.
In the data acquisition of the invention, compression algorithms other than line decimation may also be used, provided they reduce storage without affecting text recognition. Nor is the invention limited to PDAs: it applies to any resource-constrained mobile or handheld device that has a shorthand function. It should therefore be understood that the above is only a detailed description of one embodiment of the present invention and is not intended to limit its scope of protection.