CN106650686A - Online hand-written chemical symbol identification method based on Hidden Markov model - Google Patents

Info

Publication number
CN106650686A
Authority
CN
China
Prior art keywords
symbol
point
sample
feature
chemical symbol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611251498.4A
Other languages
Chinese (zh)
Inventor
杨巨峰
王恺
许静
陈丽怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN201611251498.4A
Publication of CN106650686A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/22: Character recognition characterised by the type of writing
    • G06V30/226: Character recognition characterised by the type of writing of cursive writing
    • G06V30/2268: Character recognition characterised by the type of writing of cursive writing using stroke segmentation
    • G06V30/2276: Character recognition characterised by the type of writing of cursive writing using stroke segmentation with probabilistic networks, e.g. hidden Markov models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/457: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

An online handwritten chemical symbol recognition method based on the hidden Markov model (HMM) addresses the online recognition of chemical symbols written by any writer on any device. The method builds a processing framework for recognizing online handwritten chemical symbols and adopts a strategy of hierarchical processing and step-by-step optimization. A support vector machine, using grid features and peripheral contour features, distinguishes organic ring symbols from non-ring symbols, with the classification error rate kept below 0.2%. A hidden Markov model then identifies the specific symbol, with an accuracy above 90%. To improve precision, a preprocessing pipeline is designed, and post-processing measures such as candidate result reliability, chemical symbol adjacency matrices, and atomic element conservation checks are employed. Experiments on data from a Tablet PC, a digitizing tablet, and a mouse-simulated pen show that the method is general, systematic, and complete, and that it can be used in the field of online handwritten chemical symbol recognition.

Description

An online handwritten chemical symbol recognition method based on the hidden Markov model
【Technical field】
The invention belongs to the fields of pattern recognition and human-computer interaction, and in particular relates to an online handwritten chemical symbol recognition method based on the hidden Markov model (HMM).
【Background technology】
A chemical formula (chemical equation) is an expression that represents the rules of a chemical reaction, and it is the most important form of expression in chemistry and chemical activity. Like mathematical formulas, chemical formulas are widely used in the natural sciences. As society becomes increasingly information-based, more and more chemistry-related work is being moved to electronic devices. However, entering chemical knowledge, and chemical formulas in particular, into a computer quickly and efficiently remains difficult. At present, chemical formulas are still entered mainly through specialized software, whose common shortcomings include complex interfaces, low efficiency, cumbersome operation, and device dependence. Because of these shortcomings, the traditional mouse-and-keyboard entry mode seriously limits the digitization of chemical knowledge, especially chemical formulas, and restricts some ordinary applications. Exploring a new, fast, and efficient way of entering chemical information has therefore become a pressing task. Compared with the traditional approach, handwritten entry with an electronic pen better meets this requirement because it is natural to operate and has a simple interface.
The recognition of online handwritten chemical symbols is independent of the analysis, understanding, and application of chemical formulas. Its main tasks are to recognize letters, digits, operators, organic rings, and other chemical symbols with high accuracy, and to pass useful layout information and time sequences to subsequent operations. Proposing a reasonably complete framework for online handwritten chemical symbol recognition is therefore meaningful in two respects. First, it lays the foundation for designing and implementing an independent chemical symbol recognizer, which can be provided to similar research as an underlying engine. Second, its real-time recognition results support syntactic analysis at the formula and substance levels and the verification of chemical rules.
Symbol recognition plays a central role in the overall problem of processing online handwritten chemical formulas: it "translates" the digital ink entered by the user into reusable chemical symbols. The difficulties of this research include: (1) the set of chemical symbols is large and contains many similar structures; (2) the size and position of a symbol carry chemical meaning, so this implicit information must be analyzed and passed on after the symbol is identified; (3) handwritten samples are heavily deformed, and stroke quality is uneven. Accurately recognizing handwritten chemical symbols is therefore a challenging task.
【Summary of the invention】
The object of the invention is to solve the problem of recognizing online handwritten chemical symbols by providing an online handwritten chemical symbol recognition method based on the hidden Markov model (HMM), so that handwritten chemical symbols entered by different users on different devices are recognized correctly.
The invention adopts a strategy of hierarchical processing and level-by-level optimization to resolve the difficulties mentioned in 【Background technology】. Global features are first used to split the whole problem into two subproblems: recognition of inorganic (non-ring) symbols and recognition of organic (ring) symbols. Fine local features are then extracted within each smaller set for further classification. This hierarchical approach considerably reduces the computational cost of the matching model and improves the usability of the recognition process. Auxiliary techniques such as preprocessing and post-processing support the main recognition stage, and the two-level optimization strategy also ensures the reliability of the recognition results.
To achieve the object of the invention, the following aspects need particular attention:
1. The ability to recognize input symbols accurately. Errors are unavoidable when any human handwriting is converted into digital results. As the most basic unit of input, the recognition accuracy of handwritten symbols is an important index of system usability. Although the standard differs between application systems, each has a critical recognition rate. When the recognition rate of single symbols falls below that critical value, the whole input becomes unreadable. For the purposes of the invention, there are two strategies for improving the chemical symbol recognition rate: one is to choose effective features that avoid interference from writing deformation; the other is to provide a queue of candidate results after recognition and use chemical rules to select the credible one. In short, online handwritten chemical symbol recognition must reach a high recognition rate to meet the requirements of practical application.
2. User independence. In a chemical formula processing system, the same kernel should recognize input from different writers. Even when writing styles differ, for example in stroke order or symbol size, the system should still recognize the input correctly. On the one hand, enlarging the training set covers more special cases. On the other hand, an appropriate preprocessing mechanism eliminates as much as possible the differences between repeated writings and unifies the symbol format. Both measures help the system become user-independent.
Technical solution of the invention
On the basis of the above considerations, the invention proposes an online handwritten chemical symbol recognition method based on the hidden Markov model (HMM), which comprises the following steps:
1. Define the set and categories of online handwritten chemical symbols and the collection standard, and preprocess the collected symbols;
2. Propose a method for extracting ring/non-ring coarse classification features from the online handwritten chemical symbol set generated in step 1;
3. Using the coarse classification features extracted in step 2, perform a two-class ring/non-ring coarse classification with a support vector machine;
4. Based on the coarse classification result, extract point-sequence-based local features from the ring and non-ring symbols separately;
5. Perform the final classification and recognition of the online handwritten chemical symbols using a hidden Markov model.
The specific processing flow of the invention is as follows:
1. Define the set and categories of online handwritten chemical symbols and the collection standard, and preprocess the collected symbols
The defined set of online handwritten chemical symbols comprises 10 Arabic numerals, 24 uppercase letters, 20 lowercase letters, 10 chemical operators, and 38 organic ring symbols, for a total of 102 chemical symbols chosen as the objects of processing.
The defined collection standard covers the sample label naming rule, the collection environment, the degree of writing standardization, and the writing time. Symbol samples collected in this way satisfy the multi-source, heterogeneous requirement and are representative. Symbol codes are named as follows: the code SXY denotes the symbol in area S whose abscissa is X and whose ordinate is Y. For example, 000 denotes the symbol '0'; 212 denotes an organic ring symbol (shown as an image in the original); and 109 denotes a 'chemical bond', which may be a line segment of any direction and any length.
The sample label is built as a block code: a character string of length 15 whose fields, from left to right, are as follows. Position 1 is the collection environment: T for a tablet PC, H for a digitizing tablet, P for an ordinary personal computer. Position 2 is the degree of writing standardization: S for very standard (Standard) writing, N for normal (Normal) writing, F for freestyle (Freestyle) writing. Positions 3-6 are the writer number. Positions 7-9 are the chemical symbol code, where position 7 is the area the symbol belongs to (area 0 holds digits and letters, area 1 holds operators and chemical bonds, area 2 holds organic ring structures) and positions 8-9 give the symbol's position within its area (a 10*10 table). Positions 10-15 are the sample number, assigned per writer.
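The 15-character label layout described above can be illustrated with a small parser. This is a minimal sketch: the names `SampleLabel` and `parse_label` and the example label are hypothetical and do not come from the patent; only the field positions and letter codes follow the text.

```python
from dataclasses import dataclass

# Letter codes as listed in the text; the dictionaries themselves are illustrative.
ENVIRONMENTS = {"T": "tablet PC", "H": "digitizing tablet", "P": "ordinary PC"}
STYLES = {"S": "standard", "N": "normal", "F": "freestyle"}
AREAS = {
    "0": "digits and letters",
    "1": "operators and chemical bonds",
    "2": "organic ring structures",
}


@dataclass
class SampleLabel:
    environment: str    # position 1
    style: str          # position 2
    writer_id: str      # positions 3-6
    symbol_code: str    # positions 7-9 (area + 2-digit table position)
    area: str           # decoded from position 7
    sample_number: str  # positions 10-15


def parse_label(label: str) -> SampleLabel:
    """Split a 15-character sample label into its fields."""
    if len(label) != 15:
        raise ValueError("label must be exactly 15 characters")
    return SampleLabel(
        environment=ENVIRONMENTS[label[0]],
        style=STYLES[label[1]],
        writer_id=label[2:6],
        symbol_code=label[6:9],
        area=AREAS[label[6]],
        sample_number=label[9:15],
    )
```

For instance, the made-up label `TS0001109000042` would decode as a standard-style sample written on a tablet PC by writer 0001, with symbol code 109 (a chemical bond in area 1) and sample number 000042.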
Preprocessing the collected symbols means applying to the raw symbol samples a series of preprocessing operations suited to their characteristics: removing duplicate points, interpolating missing points, detecting sharp points, removing hooks, and smoothing. These operations improve sample quality so that the symbol samples meet the needs of subsequent processing. The flow is shown in Fig. 1. The first preprocessing step removes duplicate points with a minimum-distance filter. Taking two sample points $P_i(x_i, y_i)$ and $P_j(x_j, y_j)$ as an example, if their Euclidean distance $D$ is below a certain threshold, only one of them is kept and the other is removed, where

$$D = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
The second step interpolates missing points. Given a stroke's sample point sequence $S = \{P_1(x_1, y_1), \ldots, P_N(x_N, y_N)\}$, the average distance between consecutive points is computed as

$$Len = \frac{1}{N-1} \sum_{i=1}^{N-1} D(i, i+1)$$

where $D(i, i+1)$ is the Euclidean distance between points $P_i(x_i, y_i)$ and $P_{i+1}(x_{i+1}, y_{i+1})$. The value $d = Len \times 70\%$ is set as the largest interval allowed between any two consecutive points in the stroke. If the distance between two consecutive points exceeds $d$, the coordinates of the point newly added between them are computed from the coordinates of these two points.
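The first two preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: duplicates are filtered against the last kept point, `Len` is taken to be the mean spacing of consecutive points, and new points are placed by linear interpolation (the patent's own interpolation formula is not available in this text). All function names are hypothetical.

```python
import math


def remove_duplicates(points, min_dist):
    """Keep a point only if it is at least min_dist from the last kept point."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept


def mean_spacing(points):
    """Average Euclidean distance between consecutive points (needs >= 2 points)."""
    gaps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return sum(gaps) / len(gaps)


def fill_gaps(points, max_gap):
    """Insert evenly spaced points wherever two neighbours are > max_gap apart.

    Linear interpolation is an assumption made for this sketch.
    """
    if len(points) < 2:
        return list(points)
    out = [points[0]]
    for a, b in zip(points, points[1:]):
        gap = math.dist(a, b)
        if gap > max_gap:
            n = math.ceil(gap / max_gap)  # number of sub-segments needed
            for k in range(1, n):
                t = k / n
                out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
        out.append(b)
    return out
```

A caller would typically compute `max_gap = 0.7 * mean_spacing(points)` after duplicate removal, mirroring the 70% setting in the text.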
The third step locates sharp points using 5 valid points. From the pairwise angles $\Phi_1, \Phi_2, \Phi_3, \Phi_4$ formed by these points, the quantity $\Phi_A$ is computed as the absolute value of a combination of the four angles (see Fig. 3); if $\Phi_A$ exceeds the preset threshold $\Phi_T = 60°$, point 3 is taken to be the sharp point being sought.
Finally, the sharp points found above are treated as control points, and a mean smoothing algorithm is applied to the other points. That is, the original point 2 is replaced by a point on the straight line connecting point 1 and point 3; likewise, point 3 is replaced by a point on the straight line between point 2 and point 4; and so on until the last point is reached.
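The smoothing step can be sketched as below. The text only says the replacement point lies on the line between the two neighbours; taking the midpoint of the original (unsmoothed) neighbours is an assumption of this sketch, as are the function and parameter names.

```python
def smooth_stroke(points, sharp_indices=()):
    """Replace each interior, non-sharp point by the midpoint of its neighbours.

    Sharp points and the two stroke endpoints act as fixed control points.
    Neighbours are read from the original sequence, not the smoothed one.
    """
    fixed = set(sharp_indices) | {0, len(points) - 1}
    smoothed = list(points)
    for i in range(1, len(points) - 1):
        if i in fixed:
            continue
        (x0, y0), (x1, y1) = points[i - 1], points[i + 1]
        smoothed[i] = ((x0 + x1) / 2, (y0 + y1) / 2)
    return smoothed
```

Keeping the sharp points fixed preserves corners that carry shape information, while jitter between them is averaged out.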
2. Propose a method for extracting ring/non-ring coarse classification features from the online handwritten chemical symbol set generated in step 1
The ring/non-ring coarse classification features are extracted by two methods.
The first method divides the symbol's bounding rectangle into a uniform 4 × 4 grid, counts the coordinate points in each subregion in order from top to bottom and from left to right, and after normalization uses the resulting 16-dimensional vector as the grid feature for coarse classification.
The grid feature and the peripheral contour feature extracted from a chemical symbol are shown in Fig. 2. The bounding rectangle of the sample is obtained first, that is, the region enclosed by the grid border in Fig. 2. The sample is then divided by its bounding rectangle into m × m subregions of equal size, and the number of points in each subregion is counted in order from top to bottom and from left to right and recorded as $N_1, N_2, \ldots, N_{m \times m}$. These raw counts are normalized by computing the ratio of the number of sample points $N_i$ in each cell to the total number of sample points of the symbol:

$$f_i = \frac{N_i}{\sum_{j=1}^{m \times m} N_j}, \quad i = 1, 2, \ldots, m \times m$$

With m = 4, the resulting grid feature has 16 dimensions; every component lies in the interval [0, 1], and the 16 values sum to 1.
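A minimal sketch of the grid feature follows. It assumes screen coordinates in which y grows downward, so the row with the smallest y is the top row; the function name `grid_feature` is hypothetical.

```python
def grid_feature(points, m=4):
    """Fraction of sample points falling in each cell of an m x m grid.

    Cells are listed row by row (top to bottom), each row left to right,
    assuming y increases downward as in typical screen coordinates.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    width = (max(xs) - x_min) or 1.0   # avoid division by zero for flat strokes
    height = (max(ys) - y_min) or 1.0
    counts = [0] * (m * m)
    for x, y in points:
        # Points on the right/bottom border are clamped into the last cell.
        col = min(int((x - x_min) / width * m), m - 1)
        row = min(int((y - y_min) / height * m), m - 1)
        counts[row * m + col] += 1
    total = len(points)
    return [c / total for c in counts]
```

Because each component is a count divided by the total number of points, the vector is non-negative and sums to 1, matching the property stated in the text.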
The second method extracts the peripheral contour feature. The sample image is scanned from its left, bottom, right, and top sides toward the right, up, left, and down, respectively, until the scan line meets a stroke or the central axis; the distances travelled by the scan lines form the peripheral contour feature of the sample. Only specific, equally spaced scan lines are used on each symbol. As shown in Fig. 2, the contour is scanned from left to right starting at the left border of the symbol's bounding rectangle; scanning and feature computation for the other three borders are analogous. Each direction uses 5 scan lines, so this feature has 20 dimensions in total.
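The contour scan can be sketched on a binary raster of the symbol. Several details here are assumptions rather than statements from the patent: the symbol is taken to be already rasterized into a list of rows of 0/1 cells, scan lines are placed at the centres of 5 equal bands, and distances are normalized by the image width or height. All names are hypothetical.

```python
def scan_positions(length, n_lines=5):
    """Indices of n_lines equally spaced scan lines across `length` cells."""
    return [int((k + 0.5) * length / n_lines) for k in range(n_lines)]


def _run_length(cells, limit):
    """Count background cells before the first stroke cell, capped at `limit`."""
    d = 0
    for cell in cells:
        if cell or d >= limit:
            break
        d += 1
    return d


def contour_feature(image, n_lines=5):
    """20-dimensional peripheral contour feature of a binary image.

    Order of the four groups: left side scanning right, bottom scanning up,
    right scanning left, top scanning down. Each scan stops at a stroke cell
    or at the central axis, and is divided by the image width or height.
    """
    h, w = len(image), len(image[0])
    rows, cols = scan_positions(h, n_lines), scan_positions(w, n_lines)
    feats = []
    for r in rows:  # left side, scanning right
        feats.append(_run_length(image[r], w // 2) / w)
    for c in cols:  # bottom side, scanning up
        feats.append(_run_length((image[y][c] for y in range(h - 1, -1, -1)), h // 2) / h)
    for r in rows:  # right side, scanning left
        feats.append(_run_length(reversed(image[r]), w // 2) / w)
    for c in cols:  # top side, scanning down
        feats.append(_run_length((image[y][c] for y in range(h)), h // 2) / h)
    return feats
```

Scanning only 5 lines per side keeps the cost independent of image resolution in one dimension, which is the motivation the embodiment gives for not scanning every row and column.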
3. Using the coarse classification features extracted in step 2, perform a two-class ring/non-ring coarse classification with a support vector machine
The two-class ring/non-ring coarse classification is implemented with a support vector machine based on the radial basis function (RBF) kernel, with the parameter combination: penalty factor = 211 and RBF kernel parameter γ = 2. The support vector machine is an important part of statistical learning theory. Compared with conventional methods, it has a solid theoretical foundation, good generalization ability, and excellent capability for handling nonlinear and high-dimensional problems. For two-class pattern classification, it not only has an elegant and intuitive formulation, but its processing efficiency is also among the best of comparable methods.
4. Based on the coarse classification result, extract point-sequence-based local features from the ring and non-ring symbols separately
The point-sequence-based local features are extracted from the stroke point sequence entered online by the user. An HMM is a doubly embedded stochastic process: one process is the sequence of state transitions, and the other is the sequence of symbols output at each transition. A chemical symbol model built with an HMM should be able to express the writing order and trajectory of the symbol's components (strokes). Starting from the first sample point of the chemical symbol, its feature representation is recorded point by point. The chosen 11-dimensional local features cover the complete position and direction information of the region around each feature point, and include: normalized horizontal distance, normalized vertical distance, aspect ratio, curvature, linearity, normalized first derivative, normalized second derivative, and writing direction. For organic chemical symbols, the point sequence is reordered before the local features are extracted.
5. Perform the final classification and recognition of the online handwritten chemical symbols using a hidden Markov model
The parameter combination of the HMM is: 6 states, with 9 Gaussian mixture components per state. This model is used to classify the symbols finely. An online handwritten symbol consists of a series of discrete sample points arranged in the temporal order in which they were written. These temporal characteristics make online handwritten symbols well suited to processing with an HMM.
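To illustrate how per-symbol HMMs can score a time-ordered point sequence, the sketch below runs the forward algorithm in the log domain and picks the best-scoring model. It is deliberately simplified and should not be read as the patent's model: observations are scalars and each state has a single Gaussian (instead of 11-dimensional features with 9 mixture components), the left-to-right topology and the `stay` probability are assumptions, and all names are hypothetical. Only the 6-state size follows the text.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, model):
    """Forward algorithm: log P(obs | model)."""
    log_pi, log_a, means, variances = model
    n = len(log_pi)
    alpha = [log_pi[s] + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_a[p][s] for p in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)


def left_to_right_model(means, var=1.0, stay=0.6):
    """Toy model: start in state 0; each step either stays or advances one state."""
    n = len(means)
    log_pi = [0.0] + [-math.inf] * (n - 1)
    log_a = [[-math.inf] * n for _ in range(n)]
    for s in range(n):
        if s == n - 1:
            log_a[s][s] = 0.0  # final state is absorbing
        else:
            log_a[s][s] = math.log(stay)
            log_a[s][s + 1] = math.log(1 - stay)
    return log_pi, log_a, list(means), [var] * n


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))
```

In a full system one model would be trained per symbol class, and ranking the classes by log-likelihood would also yield the candidate queue that the post-processing stage relies on.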
Advantages and positive effects of the invention:
The invention proposes an online handwritten chemical symbol recognition method based on the hidden Markov model. The method builds a processing framework for recognizing online handwritten chemical symbols and solves the related problems with a strategy of hierarchical processing and step-by-step optimization. A support vector machine using the grid feature and the peripheral contour feature distinguishes organic ring symbols from non-ring symbols, with the classification error rate kept below two per thousand; a hidden Markov model recognizes the specific symbol, with an accuracy above ninety percent. Experiments on data sources such as a Tablet PC, a digitizing tablet, and a mouse-simulated pen show that the method is generally applicable and can effectively solve the problem of recognizing online handwritten chemical symbols.
【Description of the drawings】
Fig. 1 is a flow chart of the preprocessing.
Fig. 2 is a schematic diagram of the grid feature and peripheral contour feature used for coarse classification.
Fig. 3 is a schematic diagram of the sharp point search algorithm.
Fig. 4 is a schematic diagram of the organic ring symbol reordering algorithm.
Fig. 5 is the overall flow chart of the method.
【Specific embodiment】
Embodiment 1
The specific implementation process is as follows:
Step 1: Collect chemical symbol samples and preprocess them
Twenty users were organized to collect samples on HP Tablet PCs. The work was carried out with the acquisition software HCSC under the Windows Vista operating system, and 12444 valid symbol samples were collected in total. One complete writing of all 102 symbols by a user counts as one set; the longest writing time for a set was 22 minutes, the shortest 6 minutes, and the average 11 minutes. The average writing time of a single sample was 2.58 seconds, with 1.753 seconds for pen-down and 1.85 seconds for pen-up.
The samples then undergo five preprocessing operations in sequence: duplicate removal, interpolation, sharp point detection, hook removal, and smoothing. Hook structures generally appear at the beginning or end of a stroke; they are short but involve large changes of angle, and they seriously affect the recognition accuracy of symbols. Sharp points are first located by the method shown in Fig. 3: using 5 valid points, the quantity $\Phi_A$ is computed as the absolute value of a combination of the pairwise angles $\Phi_1, \Phi_2, \Phi_3, \Phi_4$, and if $\Phi_A$ exceeds the preset threshold $\Phi_T = 60°$, point 3 is taken to be the sharp point being sought. Once the sharp points are found, if there are more than two of them, two line segments $Seg_b$ and $Seg_e$ are used to decide whether the stroke contains a "hook" and to locate it. $Seg_b$ is the line segment between the first two sharp points of the stroke, $Seg_e$ is the line segment between its last two sharp points, $\beta_b$ and $\beta_e$ are their respective inclination angles, and $L_{Seg}$ is the length of $Seg_b$ or $Seg_e$. Two further segments are defined: $Seg_{b+1}$ is the segment between the second and third sharp points from the start of the stroke, and $Seg_{e-1}$ is the segment between the second-to-last and third-to-last sharp points; their inclination angles are $\beta_{b+1}$ and $\beta_{e-1}$. The parameter $\lambda$ is defined as follows:
$$\lambda = |\beta_b - \beta_{b+1}| \quad \text{or} \quad \lambda = |\beta_e - \beta_{e-1}| \tag{5}$$
If the segment length $L_{Seg}$ and the angle $\lambda$ together satisfy the following condition, the corresponding segment is a "hook" and is removed directly.
$$\lambda \le Threshold_{angle} \;\&\&\; L_{Seg} \le Threshold_{len} \tag{6}$$
Here $Threshold_{angle}$ is set to 90° and $Threshold_{len}$ to 3% of the diagonal length. The diagonal is that of the stroke's bounding rectangle, obtained from the width $w$ and height $h$ of that rectangle as:
$$L_{diag} = \sqrt{w^2 + h^2}$$
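The hook test of conditions (5) and (6) can be sketched as follows. The thresholds (90° and 3% of the bounding-rectangle diagonal) come from the text; measuring inclination with `atan2`, ignoring angle wrap-around, and the function names are assumptions of this sketch.

```python
import math


def bbox_diagonal(points):
    """Diagonal length of the axis-aligned bounding rectangle of a stroke."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys))


def inclination(p, q):
    """Inclination angle of segment pq in degrees (atan2 convention, assumed)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def is_hook(stroke, p0, p1, p2, angle_thresh=90.0, len_ratio=0.03):
    """Apply condition (6) to the end segment p0-p1 and its neighbour p1-p2.

    p0, p1, p2 are the first (or last) three sharp points of the stroke.
    """
    lam = abs(inclination(p0, p1) - inclination(p1, p2))  # condition (5)
    seg_len = math.dist(p0, p1)
    return lam <= angle_thresh and seg_len <= len_ratio * bbox_diagonal(stroke)
```

For a hook at the end of the stroke, the same function can be called with the last three sharp points in reverse order.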
The above steps are repeated to preprocess all valid samples collected.
Step 2: Extract the coarse classification features of the chemical symbols: the grid feature and the peripheral contour feature
Step 2.1: Compute the grid feature
The bounding rectangle of each valid sample is obtained, and the sample is divided by it into 4*4 subregions of equal size, as shown in Fig. 2. The number of points in each subregion is counted in order from top to bottom and from left to right and recorded as $N_1, N_2, \ldots, N_{m*m}$. These raw counts are normalized by computing the ratio of the number of sample points $N_i$ in each cell to the total number of sample points of the symbol, so that each sample finally yields a 16-dimensional grid feature.
Step 2.2: Compute the peripheral contour feature
Each sample image is scanned from its left, bottom, right, and top sides toward the right, up, left, and down, respectively, until the scan line meets a stroke or the central axis; the distances travelled by the scan lines form the peripheral contour feature of the sample. To reduce the computational cost, the feature is modified so that only 5 lines in each direction are scanned, rather than every row (column). The five scan values for the left border are recorded as $l_i$ (i = 1, 2, ..., 5), where $l_i$ is the distance scanned along the i-th line starting from the left border; the contour feature is likewise normalized. Extended to the 4 directions, the peripheral contour feature has 4*5 = 20 dimensions.
Finally, each sample is transformed into a feature vector of 16 + 20 = 36 dimensions.
Step 3: Perform the two-class ring/non-ring coarse classification with a support vector machine
The 12444 valid samples collected are classified with a support vector machine based on the radial basis function (RBF) kernel. The RBF kernel is:
$$K(x, y) = \exp\left(-\gamma \lVert x - y \rVert^2\right)$$
The parameter combination adopted is: penalty factor = 211 and RBF kernel parameter γ = 2.
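For reference, the standard RBF kernel and the resulting SVM decision rule can be written in a few lines. This is a generic sketch, not the patent's trained classifier: the support vectors, coefficients, and bias below would come from training, which is not shown (the penalty factor only matters during training), and the function names are hypothetical. The default `gamma=2.0` follows the value printed in the text.

```python
import math


def rbf_kernel(x, y, gamma=2.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=2.0):
    """Return +1 or -1 from sum_i coef_i * K(sv_i, x) + bias.

    Each coef_i stands for alpha_i * y_i of a trained SVM (values assumed given).
    """
    score = sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs))
    return 1 if score + bias >= 0 else -1
```

In this setting `x` would be the 36-dimensional coarse feature vector, and the two output labels would stand for the ring and non-ring classes.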
Coarse classification of the sample set with this parameter configuration gives a ring/non-ring recognition rate of 99.82%; that is, out of 1000 arbitrarily written online handwritten chemical symbols, fewer than two are likely to be assigned to the wrong ring/non-ring class.
Step 4: Extract the point-sequence-based local features of the chemical symbols
Step 4.1: Reorder the coordinate points of organic ring symbols
Before local features are extracted, the original point sequence of each organic ring symbol must be reorganized to obtain new symbol data, as shown in Fig. 4. For a given organic symbol, the effect of this step is that all of its samples have the same "stroke order" after reordering: the whole point sequence starts from a fixed position of the organic symbol and is consistent in temporal order. The algorithm is as follows:
First step: compute the centroid of the organic symbol sample.
Second step: draw a ray outward from the centroid (the scan line); denote by θ the angle (direction angle) between the scan line and the positive X axis, and set the initial value of θ to 0.
Third step: traverse the point sequence of the organic symbol and compute the distance d from each point to the scan line. If d is less than a preset threshold T, store the point in the reordered point queue List; otherwise ignore it.
Fourth step: rotate the scan line counterclockwise by an angle Δθ (set empirically), so that its direction angle becomes θ + Δθ; return to the previous step.
Fifth step: once all data points of the organic symbol have been stored in the queue List, the reordering is complete, and the points in List are the reordered result.
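The five steps above can be sketched as a rotating-ray sweep. Several choices here are assumptions: the distance to the scan line is measured to the ray (not the infinite line), the sweep stops after one full turn, any points still unassigned are appended in their original order, and "counterclockwise" is taken in a y-up coordinate system. The default values of `delta_deg` and `threshold` are placeholders, since the text sets them empirically.

```python
import math


def reorder_ring_points(points, delta_deg=5.0, threshold=1.0):
    """Reorder points by sweeping a ray from the centroid counterclockwise."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    remaining = list(range(len(points)))
    ordered = []
    theta = 0.0
    while remaining and theta < 360.0:
        ux, uy = math.cos(math.radians(theta)), math.sin(math.radians(theta))
        still_waiting = []
        for i in remaining:
            vx, vy = points[i][0] - cx, points[i][1] - cy
            along = vx * ux + vy * uy
            # Perpendicular distance if the point projects onto the ray,
            # otherwise distance to the ray's origin (the centroid).
            d = abs(vx * uy - vy * ux) if along >= 0 else math.hypot(vx, vy)
            if d < threshold:
                ordered.append(points[i])
            else:
                still_waiting.append(i)
        remaining = still_waiting
        theta += delta_deg
    # Assumption: points never caught by the sweep keep their original order.
    ordered.extend(points[i] for i in remaining)
    return ordered
```

Whatever order the writer used, samples of the same ring then start near the same angular position, which is what lets a single left-to-right model cover all writers.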
All organic ring symbols obtained from the classification in step 3 are processed as above, and the coordinate points are reordered to obtain new symbol data.
Step 4.2: Compute the local features of the coordinate points
For the non-ring symbols and the reordered organic ring symbols, local features are extracted from the stroke point sequence entered online by the user. Starting from the first sample point of the chemical symbol, its feature representation is recorded point by point. The chosen 11-dimensional local features cover the complete position and direction information of the region around each feature point, and include: normalized horizontal distance, normalized vertical distance, aspect ratio, curvature, linearity, normalized first derivative, normalized second derivative, writing direction, and so on. This operation yields the 11-dimensional local features of each of the 12444 symbol samples.
Step 5: Perform the final classification and recognition with a hidden Markov model
The 12444 online handwritten chemical symbols are finely classified with a hidden Markov model.
For the 64 inorganic chemical symbols, 122 sets of valid samples were collected for each, giving 7808 handwritten symbols in total. They are divided into two subsets at a ratio of 3:1: the training set contains 5856 samples and the test set contains 1952 samples. The hidden Markov model parameters are 6 states, with 9 Gaussian mixture components per state. The highest first-candidate recognition rate obtained is 89.5%, and the top-two and top-three candidate accuracies are 97.0% and 98.4%, respectively.
For the 38 organic ring symbols, 122 sets of valid samples were collected for each, giving 4636 symbols in total. They are divided into two subsets at a ratio of 3:1: the training set contains 3477 samples and the test set contains 1159 samples. The hidden Markov model parameters are 8 states, with 9 Gaussian mixture components per state. The highest recognition rate obtained is 98.5%, and the top-two and top-three candidate accuracies are 99.5% and 99.9%, respectively.
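The first-candidate, top-two, and top-three accuracies reported above can be computed from ranked candidate lists as sketched below; the helper also checks the 3:1 split sizes quoted in the text. The function names and the toy data in the usage note are hypothetical.

```python
def top_n_accuracy(candidate_lists, truths, n):
    """Fraction of samples whose true label is among the first n candidates."""
    hits = sum(truth in cands[:n] for cands, truth in zip(candidate_lists, truths))
    return hits / len(truths)


def split_3_to_1(total):
    """Sizes of the training and test subsets for a 3:1 split."""
    train = total * 3 // 4
    return train, total - train
```

For example, with ranked candidates `[["C", "G", "O"], ["O", "0", "Q"], ["H", "N", "M"]]` and true labels `["C", "0", "M"]`, the top-1, top-2, and top-3 accuracies are 1/3, 2/3, and 1. The split helper gives 5856 and 1952 for the 7808 inorganic samples, and 3477 and 1159 for the 4636 organic ring samples.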

Claims (6)

1. a kind of hand script Chinese input equipment chemical symbol recognition methods based on HMM, it is characterised in that the method include with Lower step:
1st, the set of hand script Chinese input equipment chemical symbol and classification are defined and standard is gathered, the symbol to collecting is pre-processed;
2nd, ring, acyclic rough sort feature extracting method are proposed for the hand script Chinese input equipment chemical symbol set that the 1st step is generated;
3rd, the rough segmentation category feature for extracting for the 2nd step, from SVMs ring, two acyclic class rough sorts are carried out;
4th, in rough sort result, it is based on the local feature of point sequence to ring, acyclic symbol extraction respectively;
5th, final classification and the identification of hand script Chinese input equipment chemical symbol are realized using the method for HMM.
2. method according to claim 1, it is characterised in that described in the 1st step, the collection of the hand script Chinese input equipment chemical symbol of definition Close includes that 10 Arabic numerals, 24 capitalizations, 20 lowercases, 10 chemical operation symbols and 38 are organic with classification Ring symbol;The collection standard of definition includes:Sample code name naming rule, collection environment, normalized written degree and writing time;This The symbol sample that sample is collected meets multi-source heterogeneous requirement, possesses representativeness;The described symbol to collecting carries out pre- place Reason refers to that the initial symbol sample to collecting carries out a series of pretreatment operations for meeting its feature, including:Remove and repeat Point, interpolation mend point, detect sharp point, remove hook and smooth, so that symbol sample meets the needs of subsequent treatment.
3. The method according to claim 1, characterized in that the ring/non-ring coarse classification feature extraction method of step 2 comprises two kinds of features: the first divides the bounding rectangle of the symbol into a uniform 4 × 4 grid and counts, in order from top to bottom and from left to right, the number of coordinate points falling in each sub-region; after normalization, the resulting 16-dimensional vector is used as the grid feature for coarse classification; the second scans from the left, bottom, right and top sides of the sample image toward the right, up, left and down, respectively, until the scan line meets a stroke or the central axis, and records the distance traversed by each scan line as the peripheral feature of the sample; 5 scan lines are used in each direction, so this feature has 20 dimensions in total.
4. The method according to claim 1, characterized in that the two-class ring/non-ring coarse classification of step 3 is implemented with a support vector machine using a radial basis kernel function, with the parameter combination: penalty factor = 211, radial basis kernel parameter γ = 2.
5. The method according to claim 1, characterized in that the point-sequence-based local features of step 4 are extracted from the stroke point sequence entered online by the user; starting from the first sampling point of the chemical symbol, the feature representation of the symbol is recorded point by point; 11-dimensional local features are selected that cover the complete position and direction information of the region around each feature point, including: normalized horizontal distance, normalized vertical distance, aspect ratio, curvature, linearity, normalized first derivative, normalized second derivative and writing direction; for organic chemical symbols, the point sequence is reordered before the local features are extracted.
6. The method according to claim 1, characterized in that the parameter combination of the hidden Markov model of step 5 is: 6 states, with 9 Gaussian mixtures per state.
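The 4 × 4 grid feature of claim 3 can be illustrated with a short sketch. This is not the implementation of the patent: the function name `grid_feature` is hypothetical, the y axis is assumed to grow downward (screen coordinates) so that row 0 is the top row, and normalization is assumed to mean dividing each count by the total number of points, since the claim does not state which normalization is used.

```python
def grid_feature(points, rows=4, cols=4):
    """Return a normalized rows*cols point-count vector for one symbol.

    points: list of (x, y) pen coordinates of the whole symbol.
    Cells are ordered top to bottom, then left to right (row-major),
    assuming y grows downward as in screen coordinates.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero-width or zero-height bounding rectangle.
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    counts = [0] * (rows * cols)
    for x, y in points:
        # Clamp so points on the right or bottom edge fall in the last cell.
        c = min(int((x - min_x) / width * cols), cols - 1)
        r = min(int((y - min_y) / height * rows), rows - 1)
        counts[r * cols + c] += 1
    # Assumed normalization: divide by the number of points.
    total = len(points)
    return [n / total for n in counts]


# Four corners of a square plus one interior point (illustrative data).
feat = grid_feature([(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)])
```

With this normalization the 16 components sum to 1, which makes the feature independent of how many points were sampled for a symbol.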
CN201611251498.4A 2016-12-30 2016-12-30 Online hand-written chemical symbol identification method based on Hidden Markov model Pending CN106650686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611251498.4A CN106650686A (en) 2016-12-30 2016-12-30 Online hand-written chemical symbol identification method based on Hidden Markov model

Publications (1)

Publication Number Publication Date
CN106650686A true CN106650686A (en) 2017-05-10

Family

ID=58836709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611251498.4A Pending CN106650686A (en) 2016-12-30 2016-12-30 Online hand-written chemical symbol identification method based on Hidden Markov model

Country Status (1)

Country Link
CN (1) CN106650686A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834941A (en) * 2015-05-19 2015-08-12 重庆大学 Offline handwriting recognition method of sparse autoencoder based on computer input
CN105512692A (en) * 2015-11-30 2016-04-20 华南理工大学 BLSTM-based online handwritten mathematical expression symbol recognition method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
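Yang Jufeng: "Research on Key Problems in Online Handwritten Chemical Formula Processing", China Doctoral Dissertations Full-text Database, Information Science and Technology *
Wang Kejun et al.: "Recognition Method for Chemical Expressions", Journal of Central South University (Science and Technology) *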

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062529A (en) * 2017-12-22 2018-05-22 上海鹰谷信息科技有限公司 A kind of intelligent identification Method of chemical structural formula
CN108062529B (en) * 2017-12-22 2024-01-12 上海鹰谷信息科技有限公司 Intelligent identification method for chemical structural formula
CN108334839A (en) * 2018-01-31 2018-07-27 青岛清原精准农业科技有限公司 A kind of chemical information recognition methods based on deep learning image recognition technology
WO2019148852A1 (en) * 2018-01-31 2019-08-08 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology
CN108920077A (en) * 2018-06-27 2018-11-30 青岛清原精准农业科技有限公司 Chemical structural formula method for drafting based on the identification of dynamic gesture library
WO2020000673A1 (en) * 2018-06-27 2020-01-02 青岛清原精准农业科技有限公司 Method for drawing chemical structural formula based on recognition of gestures in dynamic gesture library
CN108920077B (en) * 2018-06-27 2021-07-23 青岛清原精准农业科技有限公司 Chemical structural formula drawing method based on dynamic gesture library recognition
CN112215178A (en) * 2020-10-19 2021-01-12 南京大学 Chemical experiment recording system based on pen type interaction
CN112215178B (en) * 2020-10-19 2024-05-28 南京大学 Chemical experiment recording system based on pen type interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 2017-05-10)