CN114419174A - On-line handwritten text synthesis method, device and storage medium - Google Patents


Info

Publication number
CN114419174A
CN114419174A (application CN202111486658.4A)
Authority
CN
China
Prior art keywords
text
decoder
handwriting
handwritten text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111486658.4A
Other languages
Chinese (zh)
Inventor
于凤丽
常欢
吴嘉嘉
殷兵
胡金水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202111486658.4A priority Critical patent/CN114419174A/en
Publication of CN114419174A publication Critical patent/CN114419174A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 — 2D [Two Dimensional] image generation
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods

Abstract

An online handwritten text synthesis method, device, and storage medium. The method comprises: acquiring input text, inputting it into a trained attention-based handwritten text synthesis model, and outputting hidden state features from an encoder in the model; transforming the hidden state features into context features via the model's attention mechanism, where the context features indicate the character to which the handwriting point to be decoded belongs; and inputting the context features into a decoder in the model, which outputs the online handwriting points of the input text. The method and device synthesize online handwriting points for the input text with a trained attention-based handwritten text synthesis model; because the context features output by the attention module indicate the character to which each handwriting point to be decoded belongs, the decoder can more accurately determine which input character governs the current handwriting point, so a more stable handwritten text line can be generated.

Description

On-line handwritten text synthesis method, device and storage medium
Technical Field
The present application relates to the field of smart writing technologies, and in particular, to a method and an apparatus for synthesizing an online handwritten text, and a storage medium.
Background
Reading and writing play an extremely important role in human life, corresponding respectively to taking in information from the world and outputting information to it. Therefore, the question of how to endow machines with reading (i.e., handwritten text recognition) and writing (i.e., handwritten text generation) capabilities has attracted considerable attention.
The accuracy of handwritten text recognition has improved greatly, but the synthesis of handwritten data remains difficult: the writer's handwriting style, the mixture of disconnected and connected strokes within the written content, changes in font usage, and variations in character spacing and slant make it hard to define or predict the appearance of a given character, so synthesizing realistic handwritten data is challenging.
Generally, handwritten data can be characterized in two ways: one treats it as aligned pixels, i.e., a static image as written on paper; the other represents it as a sequence of strokes, i.e., a writing trace. These two representations correspond to offline and online handwritten data, respectively. Online handwriting data usually contains more information (such as timing) and can be converted into offline data; crucially, in actual writing, people produce a character stroke by stroke in a defined order, rather than generating an image all at once. In practical applications, acquiring and labeling online handwriting data is expensive, which makes the synthesis of online handwriting data a challenging and promising task.
Disclosure of Invention
According to an aspect of the present application, there is provided an online handwritten text synthesis method, including: acquiring input text, inputting the input text into a trained handwritten text synthesis model based on an attention mechanism, and outputting hidden state characteristics by an encoder in the model; transforming the hidden state feature into a context feature based on an attention mechanism of the model, wherein the context feature can indicate a character to which a handwriting point to be decoded belongs; and inputting the context characteristics into a decoder in the model, and outputting online handwriting points of the input text by the decoder.
In an embodiment of the application, writing style information is also acquired together with the input text, and the decoder further outputs the online handwriting points of the input text based on the writing style information.
In one embodiment of the present application, the encoder includes a bidirectional long short-term memory (BiLSTM) network; the hidden state features output by the encoder comprise N hidden vectors, each of which is a concatenation of a forward state and a backward state, where N is greater than or equal to the number of characters in the input text.
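As a concrete illustration of this hidden-state layout, the per-character concatenation of forward and backward states can be sketched in numpy (a minimal sketch with hypothetical toy dimensions; a real encoder would produce these states with a BiLSTM):

```python
import numpy as np

def bidirectional_concat(forward_states, backward_states):
    """Concatenate per-step forward and backward hidden states, giving
    one hidden vector [h_fwd; h_bwd] per input character."""
    return np.concatenate([forward_states, backward_states], axis=-1)

# Toy example: N = 3 characters, hidden size 4 per direction.
rng = np.random.default_rng(0)
h_fwd = rng.standard_normal((3, 4))  # forward LSTM states
h_bwd = rng.standard_normal((3, 4))  # backward LSTM states
H_c = bidirectional_concat(h_fwd, h_bwd)
print(H_c.shape)  # → (3, 8): N hidden vectors, each twice the per-direction size
```

Here N equals the number of input characters, consistent with the claim's requirement that N be at least the character count.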
In an embodiment of the application, the decoder decodes once per time step, and at each step outputs the position offset coordinates of one handwriting point together with the handwriting state of that point.
In an embodiment of the application, the handwriting state of a point is one of: stroke-not-ended, stroke-ended, or all-text-ended.
In an embodiment of the application, at each decoding step the decoder predicts a Gaussian mixture distribution over the position of the current handwriting point relative to the previous one, and samples from this distribution to obtain the position offset coordinates of the current point.
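The sampling step can be sketched as follows — a minimal numpy sketch of drawing a (dx, dy) offset from a mixture of bivariate Gaussians. The parameter values below are hypothetical; in the actual model they would be predicted by the decoder at each step:

```python
import numpy as np

def sample_offset(pi, mu, sigma, rho, rng):
    """Sample a (dx, dy) offset from a mixture of R bivariate Gaussians.

    pi: (R,) mixture weights summing to 1; mu: (R, 2) means;
    sigma: (R, 2) per-axis standard deviations; rho: (R,) correlations.
    """
    r = rng.choice(len(pi), p=pi)  # pick a mixture component
    cov = np.array([
        [sigma[r, 0] ** 2,                   rho[r] * sigma[r, 0] * sigma[r, 1]],
        [rho[r] * sigma[r, 0] * sigma[r, 1], sigma[r, 1] ** 2],
    ])
    return rng.multivariate_normal(mu[r], cov)

# Hypothetical two-component mixture.
rng = np.random.default_rng(42)
pi = np.array([0.7, 0.3])
mu = np.array([[1.0, 0.0], [0.0, 2.0]])
sigma = np.array([[0.5, 0.5], [0.2, 0.2]])
rho = np.array([0.1, -0.1])
dx, dy = sample_offset(pi, mu, sigma, rho, rng)
```

Sampling (rather than taking the mixture mean) is what gives the generated handwriting its natural variability.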
In an embodiment of the present application, at each decoding step the decoder concatenates the previous prediction output, the current context feature, and the writing style information into a spliced feature, obtains the decoder hidden state from the spliced feature, obtains a linear mapping feature from the hidden state, and produces the current prediction output from the linear mapping feature.
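The splice-then-map computation of this embodiment can be sketched as follows (numpy, with hypothetical dimensions; a simple tanh cell stands in for the actual recurrent network, whose internals the description does not fix at this point):

```python
import numpy as np

rng = np.random.default_rng(1)
H, P, C, D = 8, 3, 6, 4  # hidden, point, context, style sizes (toy values)

W_rec = rng.standard_normal((H, H)) * 0.1
W_in = rng.standard_normal((H, P + C + D)) * 0.1
W_out = rng.standard_normal((5, H)) * 0.1  # output size 5 is arbitrary here

def decoder_step(p_prev, c_t, d_style, h_prev):
    """One decoding step: splice [p_prev; c_t; d_style], update the hidden
    state with a toy recurrent cell, then linearly map the hidden state to
    the prediction features."""
    a_t = np.concatenate([p_prev, c_t, d_style])  # spliced feature
    h_t = np.tanh(W_rec @ h_prev + W_in @ a_t)    # decoder hidden state
    o_t = W_out @ h_t                             # linear mapping feature
    return h_t, o_t

h, o = decoder_step(rng.standard_normal(P), rng.standard_normal(C),
                    rng.standard_normal(D), np.zeros(H))
print(h.shape, o.shape)  # → (8,) (5,)
```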
In one embodiment of the application, the decoder includes a bidirectional long short-term memory (BiLSTM) network.
In one embodiment of the present application, the attention-based handwritten text synthesis model is trained in two stages. First stage: the handwriting points of the handwritten text in the training set are rendered into offline picture data, and an attention-based sequence-to-sequence model is trained on this data to obtain an attention-based picture text line recognition model; this stage yields the attention position information on the original picture corresponding to each single character. Second stage: the handwritten text synthesis model is trained, using the attention position information obtained in the first stage as a supervision signal, to obtain the attention-based handwritten text synthesis model.
In an embodiment of the application, the handwritten text synthesis model is further trained with writer identification information, and the training set of the first stage includes the handwriting points of the handwritten text of the writer corresponding to that identification information.
According to another aspect of the present application, there is provided an online handwritten text synthesis apparatus, including: an encoder module for acquiring input text and outputting hidden state features based on the input text; an attention module for transforming the hidden state features into context features based on an attention mechanism, the context features indicating the character to which a handwriting point to be decoded belongs; and a decoder module for outputting the online handwriting points of the input text based on the context features.
According to yet another aspect of the present application, there is provided an online handwritten text synthesis apparatus, the apparatus comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, the computer program, when executed by the processor, causing the processor to execute the above online handwritten text synthesis method.
According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed, performs the above-described method for on-line handwritten text synthesis.
According to the method, device, and storage medium for synthesizing online handwritten text of the present application, a trained attention-based handwritten text synthesis model is used to synthesize online handwritten trace points (also called handwriting points) for the input text. Because the context features output by the attention module indicate the character to which each point to be decoded belongs, the decoder can more accurately determine which input character governs the current point, so a more stable handwritten text line can be generated.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 shows a diagram of the challenging factors of synthesizing handwritten text.
FIG. 2 is a diagram illustrating a Chinese character represented based on a sequence of strokes.
Fig. 3 shows a schematic flow chart of an online handwritten text synthesis method according to an embodiment of the application.
Fig. 4 shows a schematic diagram of a first stage of training of an attention-based handwritten text synthesis model in an online handwritten text synthesis method according to an embodiment of the application.
FIG. 5 is a diagram illustrating a second stage of training of an attention-based handwritten text synthesis model in an online handwritten text synthesis method according to an embodiment of the application.
Fig. 6 shows a schematic block diagram of an online handwritten text synthesis apparatus according to an embodiment of the present application.
Fig. 7 shows a schematic block diagram of an online handwritten text synthesis apparatus according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, exemplary embodiments according to the present application will be described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application described in the present application without inventive step, shall fall within the scope of protection of the present application.
For the synthesis of handwriting data, the writer's handwriting style, the mixture of disconnected and connected strokes within the written content, changes in font usage, and variations in character spacing and slant make it difficult to define or predict character-specific appearance characteristics, as shown in fig. 1, so synthesizing realistic handwriting data is challenging. Handwritten data is generally characterized in two ways: one treats it as aligned pixels, i.e., a static image as written on paper; the other represents it as a sequence of strokes, i.e., a writing trace, as shown in fig. 2. These two representations correspond to offline and online handwritten data, respectively. In practical applications, acquiring and labeling online handwriting data is expensive, which makes the synthesis of online handwriting data a challenging and promising task.
With the rapid development of electronic devices, people increasingly write text in the form of digital ink on stylus-based devices such as smartphones, whiteboards, or tablet computers; such online data is easy to process and manipulate. However, despite progress in handwriting recognition, handwritten text is still rarely exploited for online data synthesis, which requires generative models over sequences of handwriting points. Learning sequence generation models is a long-standing challenge in machine learning; early work belonged to the field of dynamic Bayesian networks, and later work used recurrent neural networks (RNNs). Although some RNN-based generative models have been designed for handwritten text, their synthesized text still has many defects compared with normal human writing. Generative adversarial networks (GANs) aim to create realistic images of handwritten text; researchers have applied them to optical character recognition by obtaining word-embedded representations of the corresponding characters through a bidirectional long short-term memory (BiLSTM) recurrent layer and feeding them into the generator network, but this approach cannot directly synthesize online handwritten text: although the generated images are realistic, converting an image into digital strokes requires an effective ink-extraction algorithm. Researchers have also applied variational autoencoders (VAEs) to RNNs, with experiments on generating speech and online data, but the handwritten data synthesized by this scheme is not realistic enough. Alex Graves proposed an RNN-based generator model that simulates handwriting to synthesize online text trace points, but the synthesis results are unstable and cannot produce handwritten text in a specific style.
To address at least one of the above problems, the present application provides an online handwritten text synthesis scheme, described below in connection with figs. 3 through 7.
Fig. 3 shows a schematic flow diagram of an online handwritten text synthesis method 300 according to an embodiment of the application. As shown in fig. 3, online handwritten text synthesis method 300 may include the steps of:
In step S310, input text is acquired and input into a trained attention-based handwritten text synthesis model, and hidden state features are output by an encoder in the model.
In step S320, the hidden state feature is transformed into a context feature based on the attention mechanism of the model, and the context feature can indicate the character to which the handwriting point to be decoded belongs.
In step S330, the context features are input to a decoder in the model, and the decoder outputs the online handwriting trace points of the input text.
In an embodiment of the present application, online handwritten trace points (also referred to as handwriting points) are synthesized for the input text using a trained attention-based handwritten text synthesis model. The model includes an encoder, an attention module, and a decoder. The encoder extracts features from the input text and outputs hidden state features; the attention module converts these into context features that indicate the category information of each point to be decoded (i.e., the character it belongs to); and the decoder decodes each point based on its corresponding context feature to obtain its relative position and handwriting state, finally yielding the online handwriting points of the entire input text. Because the context features output by the attention module indicate the character to which each point belongs, the decoder can more accurately determine which input character governs the current point, so a more stable handwritten text line can be generated.
In a further embodiment of the present application, writing style information may also be acquired when the input text is acquired in step S310; on this basis, the decoder may also output the online handwriting trace points of the input text based on the writing style information in step S330. In this embodiment, writing style information (e.g., feature information corresponding to a certain writer ID) is added on the decoding side, so that online handwritten data with a specific style can be generated.
The following describes a training process of a handwritten text synthesis model based on an attention mechanism in an online handwritten text synthesis method according to an embodiment of the present application with reference to fig. 4 and 5. Fig. 4 is a schematic diagram illustrating a first stage of training of a handwritten text synthesis model based on an attention mechanism in an online handwritten text synthesis method according to an embodiment of the present application. FIG. 5 is a diagram illustrating a second stage of training of an attention-based handwritten text synthesis model in an online handwritten text synthesis method according to an embodiment of the application.
First, in the embodiment of the present application, the training data may be derived from trace point data collected on electronic terminal devices (such as a tablet, smartphone, whiteboard, or tablet computer). The training data is then preprocessed. The input data is the text content; the label is the sequence of stroke points corresponding to the text, ordered by time. Each stroke point consists of the (x, y) coordinates on the tablet device and a pen-lift event: the pen coordinates are integer values bounded by the screen resolution, and the pen-lift state is recorded as 1 when the pen leaves the screen and 0 otherwise. The handwriting sequence is formally defined as x = (x_1, x_2, ..., x_T), where x_t is a stroke point and T is the number of stroke points. In actual model training, the input handwriting point at a given time is defined as the position of the current stroke coordinate point relative to that of the previous time step; it consists of a real-valued pair (x1, x2) and a binary indicator x3, where x3 = 1 if the current stroke ends and 0 otherwise. The input data is also preprocessed by computing the global mean and variance (global_mean, global_var) of all handwriting point offsets and normalizing: x := (x - global_mean) / global_var.
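The preprocessing just described — converting absolute points to relative offsets and normalizing by the global statistics — can be sketched as follows (a minimal numpy sketch; the function name and toy coordinates are hypothetical, and the division by the variance follows the formula in the text):

```python
import numpy as np

def to_normalized_offsets(points):
    """Convert absolute (x, y, pen_up) points to (dx, dy, pen_up) offsets,
    then normalize the offsets: x := (x - global_mean) / global_var."""
    points = np.asarray(points, dtype=float)
    offsets = points.copy()
    offsets[1:, :2] = points[1:, :2] - points[:-1, :2]  # relative positions
    offsets[0, :2] = 0.0                                # first point has no predecessor
    global_mean = offsets[:, :2].mean()
    global_var = offsets[:, :2].var()
    offsets[:, :2] = (offsets[:, :2] - global_mean) / global_var
    return offsets

# Toy trace: four points, pen lifted after the third.
strokes = [(10, 10, 0), (12, 11, 0), (15, 11, 1), (15, 14, 0)]
norm = to_normalized_offsets(strokes)
```

Note the pen-lift column is passed through unchanged; only the coordinate offsets are normalized.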
Next, a network trained with an attention supervision signal can be constructed. Specifically, for a given online handwritten text training set, the samples are rendered into offline picture data according to their coordinate points and stroke order. A sequence-to-sequence encoder-decoder (ED) model is built over the offline pictures, and an attention-based picture text line recognition model is trained to recognize the text content in each offline picture. After training stabilizes, the attention position information of each single character of the recognition result in the original offline picture is obtained, and this position information is mapped back to the original offline coordinate points, yielding the coordinate points corresponding to each single character, i.e., the category information of each coordinate point among the online coordinate points, as shown in fig. 4.
As shown in fig. 4, the trace point data of the handwritten text "as high" is rendered into an offline picture, which is input into the encoder of the picture text line recognition model. This encoder comprises a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), and outputs hidden state features Hc. The decoder of the picture text line recognition model comprises a BiLSTM whose output passes through a softmax classifier to produce the text recognition result "as high". The position of each character of the handwritten text in the original offline picture (the attention positions shown in fig. 4) can then be obtained and mapped back to the original offline coordinate points, giving the coordinate points corresponding to each single character, i.e., the category information of each online coordinate point. The process described above in connection with fig. 4 is the first stage of the training process.
In the second stage, a generative network from text to trace points is constructed. The network includes an encoder, a decoder, and an attention module, where the encoder and decoder each consist of a layer of bidirectional LSTM. For the input text characters X_c, the encoder encodes them as hidden states H_c = ENC_c(X_c), where H_c consists of N hidden vectors, each a concatenation of a forward state and a backward state, and N is greater than or equal to the number n of characters in the input text. In the decoding stage, to make the decoder's input most relevant to the category of the currently decoded trace point, the present scheme introduces an attention mechanism between the content encoder and the decoder: at decoding time step t, the decoder's input content (i.e., the context feature mentioned above) is c_t = attention(H_c; h_{t-1}), where h_{t-1} is the decoder hidden state of the previous time step and attention() is the attention mechanism. During training, the category information of the individual coordinate points obtained in the first stage is introduced as a supervision signal on c_t for supervised training.
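The context computation c_t = attention(H_c; h_{t-1}) can be sketched as generic content-based attention (a numpy sketch under stated assumptions: the bilinear scoring form and all dimensions below are illustrative, since the description does not fix the scoring function):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_context(H_c, h_prev, W_q, W_k):
    """Compute c_t = attention(H_c; h_{t-1}): score each encoder hidden
    vector against the previous decoder state, normalize the scores with
    softmax, and return the weighted sum as the context feature."""
    scores = (W_k @ H_c.T).T @ (W_q @ h_prev)  # (N,) alignment scores
    alpha = softmax(scores)                    # attention weights over characters
    return alpha @ H_c, alpha                  # context feature c_t and weights

rng = np.random.default_rng(7)
N, E, Hd, A = 4, 8, 6, 5  # chars, encoder dim, decoder dim, attention dim
H_c = rng.standard_normal((N, E))
h_prev = rng.standard_normal(Hd)
W_q = rng.standard_normal((A, Hd))
W_k = rng.standard_normal((A, E))
c_t, alpha = attention_context(H_c, h_prev, W_q, W_k)
```

The weights alpha are exactly what the first-stage per-character category information supervises, forcing each decoded point to attend to the character it belongs to.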
In the above network, the decoder may comprise a layer of bidirectional LSTM. The decoder predicts the next point based on the previous prediction output and the current content information from the content encoder (i.e., the context feature described above). At the current decoding time t in the training stage, the sample label predicted at the previous time, p_{t-1}, and the content output c_t are spliced to obtain a_t = [p_{t-1}; c_t], which is used to obtain the decoder hidden state at the current time, h_t = DEC(h_{t-1}; a_t); finally, h_t is mapped by a linear layer to o_t to predict the output stroke point p_t. In general, the decoder decodes once per time step, outputting at each step the position offset coordinates of one trace point and the handwriting state of that point as the current prediction output, as shown in fig. 5.
As shown in fig. 5, the input text "as high" is encoded into one-hot vectors, which are input into the encoder of the handwritten text synthesis model. The encoder's BiLSTM takes the input word embedding features and outputs hidden state features H_c. The decoder of the handwritten text synthesis model comprises a BiLSTM whose output passes through a Gaussian mixture model (GMM) and a softmax classifier to output the handwritten stroke points of "as high". During training of the handwritten text synthesis model, the attention position information obtained in the first stage is used as a supervision signal. Here, the GMM models the position offset (dx; dy) of the trace points (e.g., the decoder may use a GMM with R bivariate normal distributions). In this way, at each decoding step the decoder predicts the Gaussian mixture distribution of the current point's position relative to the previous point and samples from it to obtain the position offset coordinates of the current point. As for the softmax classifier, in embodiments of the present application the decoder may model the point state categories (p1; p2; p3) with a three-class classifier (i.e., a softmax layer). Accordingly, the handwriting state of a stroke point may be: stroke-not-ended (p1), stroke-ended (p2), or all-text-ended (p3).
In addition, the example shown in fig. 5 includes writer identification (id) information, which corresponds to the writer's style; adding this information when training the handwritten text synthesis model lets the model implicitly learn the style characteristics of the writer's handwriting. In this embodiment, at the current decoding time t in the training stage, the sample label predicted at the previous time, p_{t-1}, the content output c_t, and the writer id information d_t are spliced to obtain a_t = [p_{t-1}; c_t; d_t], which is used to obtain the decoder hidden state at the current time, h_t = DEC(h_{t-1}; a_t); finally, h_t is mapped by a linear layer to o_t to predict the output stroke point p_t. The process described above in connection with fig. 5 is the second stage of the training process.
In summary, the attention-based handwritten text synthesis model of the present application is trained in two stages. First stage: the handwriting points of the handwritten text in the training set are rendered into offline picture data, and an attention-based sequence-to-sequence model is trained on this data to obtain an attention-based picture text line recognition model; this stage yields the attention position information on the original picture corresponding to each single character. Second stage: the handwritten text synthesis model is trained, with the attention position information obtained in the first stage as a supervision signal, to obtain the attention-based handwritten text synthesis model. The training set of the first stage may include the handwriting points of the handwritten text of the writer corresponding to the writer identification information; that is, the training data contains samples written by multiple people, and each sample is marked with the writer who produced it (the writer identification information), so the model implicitly learns the style characteristics of each writer's handwriting.
Based on the above training process, in actual use the trained and converged model serves as the final handwritten text (trace point) synthesis model. A test text sequence is encoded by the encoder in fig. 5 to obtain the encoded representation of the input text. During decoding, the relative position of each generated handwriting point is obtained by sampling from the Gaussian mixture distribution predicted by the decoder, and the handwriting state of each point is predicted by the decoder's state classifier, thereby generating the online handwriting trace points of the input text. The style of the generated stroke points is controlled by the writer id information in the input.
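The inference procedure above can be sketched as an autoregressive loop (numpy; `toy_decode_step` is a hypothetical stand-in for the trained decoder's GMM sampling and state classification):

```python
import numpy as np

STROKE_CONTINUES, STROKE_ENDS, TEXT_ENDS = 0, 1, 2  # point handwriting states

def generate(decode_step, max_points=500, rng=None):
    """Autoregressive sampling: each step yields a sampled (dx, dy) offset
    and a handwriting state; generation stops at the all-text-ended state."""
    if rng is None:
        rng = np.random.default_rng()
    x, y = 0.0, 0.0
    points = []
    for _ in range(max_points):
        (dx, dy), state = decode_step(rng)
        x, y = x + dx, y + dy          # accumulate relative offsets
        points.append((x, y, state))
        if state == TEXT_ENDS:
            break
    return points

# Hypothetical stand-in decoder: random offsets, ~5% chance of ending per step.
def toy_decode_step(rng):
    state = TEXT_ENDS if rng.random() < 0.05 else STROKE_CONTINUES
    return rng.standard_normal(2), state

trace = generate(toy_decode_step, rng=np.random.default_rng(3))
```

Accumulating the sampled offsets recovers absolute coordinates, so the output can be rendered directly as an online handwriting trace.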
Based on the above description, the basic idea of the online handwritten text synthesis method according to the embodiment of the application is to generate a sequence of handwriting trace points with an autoregressive RNN generative model. To improve the stability of the synthesized trace points, the scheme introduces an attention alignment strategy and supervises the attention into forced alignment during training, yielding relatively stable generation results. Because handwritten text trace point data contains only trace point coordinates and text content, the supervision signal corresponding to attention during training is the text information corresponding to the current trace point; the actual data carries no per-character trace point annotations, i.e., no supervision information for attention. To solve this problem, the scheme renders the trace points of the training set into offline picture data and trains an attention-based seq2seq picture text line recognition model, so that this stage obtains the attention position information corresponding to each single character on the original picture. This removes the need to annotate each trace point in the online handwritten data with its character category, saving annotation cost. In addition, to enable the model to generate handwriting trace point data in multiple styles, writer id representation information is added during training to guide the model: the training data contains samples written by multiple people, and each sample is marked with its writer (the writer id information), so that the model implicitly learns the style characteristics of each writer's handwriting.
Therefore, the online handwritten text synthesis method according to the embodiment of the application can generate relatively stable handwritten text lines and can also generate online handwritten data with a specific style.
The above exemplarily illustrates an online handwritten text synthesis method provided according to an aspect of the present application. In the following, with reference to fig. 6 and fig. 7, online handwritten text synthesis apparatuses 600 and 700 provided according to another aspect of the present application are described; each can implement the aforementioned online handwritten text synthesis method 100 according to an embodiment of the present application. For brevity, reference may be made to the foregoing description for some details, and only the structural components and main functions of the online handwritten text synthesis apparatuses 600 and 700 are described here.
Fig. 6 shows a schematic block diagram of an online handwritten text synthesis apparatus 600 according to an embodiment of the present application. As shown in fig. 6, online handwritten text synthesis apparatus 600 includes an encoder module 610, an attention module 620, and a decoder module 630. The encoder module 610 is configured to obtain an input text and output a hidden state feature based on the input text. The attention module 620 is configured to transform the hidden state feature into a context feature based on an attention mechanism, where the context feature is capable of indicating a character to which a handwriting point to be decoded belongs. The decoder module 630 is used to output online handwritten trace points of the input text based on the contextual characteristics.
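The role of the attention module 620 — transforming the encoder's hidden state features into a context feature that indicates which character the next handwriting point belongs to — can be illustrated with a minimal dot-product attention. This is a sketch under simplifying assumptions; the patent does not specify the score function, and the names below are illustrative.

```python
import numpy as np

def attention_context(dec_state, enc_states):
    """Dot-product attention over encoder character states.

    dec_state: (D,) current decoder query; enc_states: (N, D) one hidden
    vector per input character. Returns the context feature and the
    alignment weights, which indicate the character being decoded.
    """
    scores = enc_states @ dec_state          # (N,) relevance per character
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = weights @ enc_states           # (D,) weighted character feature
    return context, weights
```

During supervised forced alignment, the weights vector is what would be pushed toward the character position learned in the first stage.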
The online handwritten text synthesis apparatus 600 according to the embodiment of the present application may be used to implement the online handwritten text synthesis method 100 described above. Therefore, the online handwritten text synthesis apparatus 600 can realize online handwritten text track synthesis based on attention mechanism alignment. Specifically, the apparatus 600 performs supervised forced alignment on attention during model training, so as to obtain a relatively stable generation result. In addition, considering that the ability to generate data of a specific style is very important in actual use scenarios, writer id representation information can be added during training, so that the model can control style information according to the writer id information when generating online handwriting point data.
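The writer-id conditioning mentioned above can be pictured as an embedding lookup whose result is concatenated with the decoder input at every step. This is a hypothetical sketch; the class and method names are illustrative, not from the patent.

```python
import numpy as np

class WriterStyleTable:
    """Illustrative writer-id embedding table.

    Each writer id maps to a style vector (learned during training in the
    real model; randomly initialized here) that is concatenated with the
    decoder input at every step to condition the generated style.
    """
    def __init__(self, num_writers, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.normal(size=(num_writers, dim)).astype(np.float32)

    def condition(self, decoder_input, writer_id):
        # append the writer's style vector to the current decoder input
        return np.concatenate([decoder_input, self.table[writer_id]])
```

At synthesis time, choosing a different writer_id selects a different style vector and therefore a different handwriting style for the same input text.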
Fig. 7 shows a schematic block diagram of an online handwritten text synthesis apparatus 700 according to another embodiment of the present application. As shown in fig. 7, the online handwritten text synthesis apparatus 700 may include a memory 710 and a processor 720, where the memory 710 stores a computer program to be run by the processor 720, and the computer program, when executed by the processor 720, causes the processor 720 to execute the online handwritten text synthesis method 100 according to the embodiment of the present application described above. The detailed operation of the online handwritten text synthesis apparatus 700 can be understood by those skilled in the art with reference to the foregoing description, and for the sake of brevity, a detailed description thereof is omitted here.
Furthermore, according to an embodiment of the present application, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the corresponding steps of the online handwritten text synthesis method of the embodiment of the present application. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
Based on the above description, the online handwritten text synthesis method and apparatus according to the embodiment of the application provide an online handwritten text track synthesis scheme based on attention mechanism alignment. Specifically, a given online handwritten text training set is first rendered into offline picture data according to its coordinate points and stroke order. In the first stage, a sequence-to-sequence ED model is built on the offline picture data and an attention-based picture text line recognition model is trained; this stage yields the attention position information on the original picture corresponding to each single character. In the second stage, the track point generation model is trained; the generation model is divided into an encoder, an attention module and a decoder. The encoder takes the embedding representation of the text line characters as input and extracts sequence features of the input characters through one layer of BiLSTM. The decoder performs modeling based on a Gaussian Mixture Model (GMM) to obtain the distribution parameters of the relative positions of the track coordinate points, and samples from the predicted distribution function to obtain the relative positions of the track points. During the second-stage training, the attention of the generation model is supervised with the attention learned in the first stage, so that when decoding the current stroke point the model can more accurately locate which input character is decisive, and thus generate more stable handwritten text lines. Meanwhile, in order to endow the generated text line with style information, writer id feature information can be added at the decoding end, so that online handwritten data of a specific style can be generated.
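The decoder's output layer described above — mapping the hidden state to GMM distribution parameters plus handwriting-state logits via a linear mapping — might be sketched as follows. This is an assumption-laden illustration: the 6K+3 output layout (per component: weight logit, 2 means, 2 log-stds, 1 correlation, plus 3 state logits) mirrors common handwriting-synthesis practice, not a detail confirmed by the patent.

```python
import numpy as np

def gmm_params_from_hidden(h, W, b, K):
    """Map a decoder hidden state h to GMM parameters and state logits.

    W, b stand in for learned weights of the linear output layer.
    Returns mixture weights pi, means mu, std devs sigma, correlations
    rho, and 3 handwriting-state logits (stroke not ended / stroke
    ended / all text ended).
    """
    y = W @ h + b                        # (6K + 3,) raw outputs
    p = y[:6 * K].reshape(K, 6)
    pi = np.exp(p[:, 0] - p[:, 0].max())
    pi /= pi.sum()                       # softmax -> valid mixture weights
    mu = p[:, 1:3]                       # unconstrained means
    sigma = np.exp(p[:, 3:5])            # exp -> positive std devs
    rho = np.tanh(p[:, 5])               # tanh -> correlation in (-1, 1)
    state_logits = y[6 * K:]
    return pi, mu, sigma, rho, state_logits
```

The constraints (softmax, exp, tanh) guarantee that sampling from the predicted mixture is always well defined, regardless of the raw network outputs.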
Although the example embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above-described example embodiments are merely illustrative and are not intended to limit the scope of the present application thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present application. All such changes and modifications are intended to be included within the scope of the present application as claimed in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another device, or some features may be omitted, or not executed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the present application, various features of the present application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules according to embodiments of the present application. The present application may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering. These words may be interpreted as names.
The above description is only for the specific embodiments of the present application or the description thereof, and the protection scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope disclosed in the present application, and shall be covered by the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An online handwritten text synthesis method, the method comprising:
acquiring input text, inputting the input text into a trained handwritten text synthesis model based on an attention mechanism, and outputting hidden state characteristics by an encoder in the model;
transforming the hidden state feature into a context feature based on an attention mechanism of the model, wherein the context feature can indicate a character to which a handwriting point to be decoded belongs;
and inputting the context characteristics into a decoder in the model, and outputting online handwriting points of the input text by the decoder.
2. The method of claim 1, wherein writing style information is also obtained when the input text is obtained, and wherein the decoder outputs online handwritten track points of the input text based also on the writing style information.
3. The method of claim 1 or 2, wherein the encoder comprises a bidirectional long short-term memory network, wherein the hidden-state features output by the encoder comprise N hidden vectors, each hidden vector being a concatenation of a forward state and a backward state, and wherein N is greater than or equal to the number of characters contained in the input text.
4. The method according to claim 1 or 2, wherein the decoder decodes once at each time step, and each decoding outputs the position offset coordinates of one trace point and the handwriting state of that trace point.
5. The method of claim 4, wherein the handwriting state of the stroke points comprises: stroke not-ended state, stroke ended state, all text ended state.
6. The method according to claim 4, wherein the decoder predicts, at each decoding time, a Gaussian mixture distribution of the relative position of the current trace point with respect to the previous trace point, and samples from the Gaussian mixture distribution to obtain the position offset coordinates of the current trace point.
7. The method according to claim 4, wherein, each time the decoder decodes, the previous prediction output, the current context feature and the writing style information are concatenated to obtain a concatenated feature; the concatenated feature is fed into the decoder to obtain a decoder hidden state feature; a linear mapping feature is obtained from the decoder hidden state feature; and the current prediction output is obtained from the linear mapping feature.
8. The method according to claim 1 or 2, wherein the decoder comprises a bidirectional long short-term memory network.
9. The method of claim 1 or 2, wherein the attention-based handwritten text synthesis model is trained by:
the first stage is as follows: rendering handwritten text track points in a training set into offline picture data, and training an attention-based sequence-to-sequence model on the offline picture data to obtain an attention-based picture text line recognition model, so that the first stage obtains the attention position information on the original picture corresponding to each single character;
and a second stage: and training a handwritten text synthesis model, and performing supervised training on the handwritten text synthesis model by taking the attention position information obtained in the first stage as a supervision signal in the training process to obtain the handwritten text synthesis model based on the attention mechanism.
10. The method of claim 9, wherein training the handwritten text synthesis model is further based on writer identification information corresponding to the handwritten text track points of each writer included in the training set of the first stage.
11. An apparatus for synthesizing handwritten text on-line, the apparatus comprising:
the device comprises an encoder module, a state detection module and a state conversion module, wherein the encoder module is used for acquiring an input text and outputting a hidden state characteristic based on the input text;
the attention module is used for transforming the hidden state feature into a context feature based on an attention mechanism, and the context feature can indicate a character to which a handwriting point to be decoded belongs;
and the decoder module is used for outputting the online handwriting trace points of the input text based on the context characteristics.
12. An apparatus for on-line handwritten text synthesis, characterized in that the apparatus comprises a memory and a processor, the memory having stored thereon a computer program for execution by the processor, the computer program, when executed by the processor, causing the processor to carry out the method of on-line handwritten text synthesis according to any of claims 1-10.
13. A storage medium having stored thereon a computer program which, when executed, performs an online handwritten text synthesis method according to any of claims 1-10.
CN202111486658.4A 2021-12-07 2021-12-07 On-line handwritten text synthesis method, device and storage medium Pending CN114419174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111486658.4A CN114419174A (en) 2021-12-07 2021-12-07 On-line handwritten text synthesis method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111486658.4A CN114419174A (en) 2021-12-07 2021-12-07 On-line handwritten text synthesis method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114419174A true CN114419174A (en) 2022-04-29

Family

ID=81264799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111486658.4A Pending CN114419174A (en) 2021-12-07 2021-12-07 On-line handwritten text synthesis method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114419174A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114973279A (en) * 2022-06-17 2022-08-30 北京百度网讯科技有限公司 Training method and device for handwritten text image generation model and storage medium
CN114973279B (en) * 2022-06-17 2023-02-17 北京百度网讯科技有限公司 Training method and device for handwritten text image generation model and storage medium
CN117633276A (en) * 2024-01-25 2024-03-01 江苏欧帝电子科技有限公司 Writing track recording and broadcasting method, system and terminal

Similar Documents

Publication Publication Date Title
Aksan et al. Deepwriting: Making digital ink editable via deep generative modeling
US11899927B2 (en) Simulated handwriting image generator
JP4745758B2 (en) Spatial recognition and grouping of text and graphics
Krishnan et al. Textstylebrush: transfer of text aesthetics from a single example
CN111291629A (en) Method and device for recognizing text in image, computer equipment and computer storage medium
CN114419174A (en) On-line handwritten text synthesis method, device and storage medium
CN109446873A (en) Hand-written script recognition methods, system and terminal device
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN113283336A (en) Text recognition method and system
CN113378815A (en) Model for scene text positioning recognition and training and recognition method thereof
Mayr et al. Spatio-temporal handwriting imitation
JP2019028094A (en) Character generation device, program and character output device
Li et al. Style transfer for QR code
CN112037239B (en) Text guidance image segmentation method based on multi-level explicit relation selection
JP3243700U (en) Syntax-driven formula identification system
Kaddoura A Primer on Generative Adversarial Networks
Vankadaru et al. Text Identification from Handwritten Data using Bi-LSTM and CNN with FastAI
Wang et al. An approach based on Transformer and deformable convolution for realistic handwriting samples generation
Lian et al. CVFont: Synthesizing Chinese Vector Fonts via Deep Layout Inferring
CN112836467A (en) Image processing method and device
Hai-Sheng et al. Style transfer for QR code
Jia et al. Printed score detection based on deep learning
Chen et al. Script-level word sample augmentation for few-shot handwritten text recognition
CN114463760B (en) Character image writing track recovery method based on double-stream coding
CN115937869A (en) Chinese character skeleton generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination