CN112001236A - Writing behavior identification method and device based on artificial intelligence - Google Patents

Writing behavior identification method and device based on artificial intelligence

Info

Publication number
CN112001236A
CN112001236A (application CN202010668299.3A)
Authority
CN
China
Prior art keywords
writing
image data
image
neural network
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010668299.3A
Other languages
Chinese (zh)
Inventor
高旻昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingteng Intelligent Technology Co ltd
Original Assignee
Shanghai Lingteng Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lingteng Intelligent Technology Co ltd filed Critical Shanghai Lingteng Intelligent Technology Co ltd
Priority to CN202010668299.3A
Publication of CN112001236A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/48 Matching video sequences

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an artificial-intelligence-based writing behavior recognition method and device, comprising the following steps: acquiring, in real time, collected image data on which writing behavior recognition is to be performed; locating a writing tool in the image data through an image target recognition deep learning neural network model, using either a convolutional neural network with a DenseNet structure or an image recognition backbone network with a ResNet, VGG, or Darknet structure, and extracting, frame by frame in time order, an image sequence of the writing tool and the region near it; and recognizing and outputting the writing behavior state of the extracted image sequence using a deep learning neural network model based on the DenseNet structure. The embodiment of the invention realizes recognition of writing behavior, improves the accuracy of behavior recognition, better meets user requirements, and improves user experience.

Description

Writing behavior identification method and device based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for recognizing writing behaviors based on artificial intelligence.
Background
Artificial Intelligence (AI) is a new technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems, among others.
With the popularization of internet technology and the development of artificial intelligence, artificial intelligence has become increasingly important in behavior recognition applications.
Traditional behavior recognition relies mainly on dedicated personnel keeping their eyes fixed on a surveillance picture for analysis. This mode of behavior recognition is time-consuming and labor-intensive, carries high labor costs, is strongly influenced by human subjective factors, and suspicious situations in the surveillance picture are easily overlooked, so behavior cannot be judged accurately.
Disclosure of Invention
The present application provides an artificial-intelligence-based writing behavior recognition method and device, aiming to solve the problems that existing behavior recognition technology is time-consuming, labor-intensive, costly in labor, and prone to misjudgment, and thereby to achieve recognition of writing behavior.
An embodiment of the first aspect of the present application proposes an artificial intelligence-based writing behavior recognition method, including:
receiving collected image data needing to be subjected to writing behavior recognition in real time;
extracting the writing tool and the image sequence of the area near the writing tool from the image data according to the time sequence;
and recognizing and outputting the writing behavior state of the extracted image sequence by utilizing a preset deep learning neural network model.
Further, receiving the collected image data needing to be subjected to writing behavior recognition in real time, further comprising:
receiving image data acquired by image acquisition equipment in real time, wherein the image data comprises a plurality of interactive subjects.
Further, extracting a sequence of images of the writing instrument in chronological order from the image data, further comprising:
acquiring a set of continuous image data in time order; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus.
Further, extracting a sequence of images of a region near the writing instrument in chronological order with respect to the image data, further comprising:
extracting a set of consecutive image data in chronological order; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
Further, the deep learning neural network model adopts a recurrent neural network model, and the training of the recurrent neural network model comprises:
and constructing a recurrent neural network, acquiring and marking a plurality of image sequences, inputting the marked image sequences into the recurrent neural network, performing data cyclic feedback and transmission in a time evolution direction, and outputting an identification result.
Further, the deep learning neural network model adopts a three-dimensional convolutional neural network model, and training the three-dimensional convolutional neural network model comprises:
and constructing a three-dimensional convolutional neural network, acquiring a plurality of image sequences, inputting the image sequences into the three-dimensional convolutional neural network as a whole as three-dimensional data, taking a time axis as a third dimension, and outputting an identification result of any scale on the time axis.
An embodiment of the second aspect of the present application provides an artificial-intelligence-based writing behavior recognition apparatus, which adopts the artificial-intelligence-based writing behavior recognition method described in any of the items above, and includes an acquisition module, an extraction module, and a processing module.
the acquisition module is used for acquiring acquired image data needing writing behavior recognition;
the extraction module is used for extracting the writing tool and the image sequence of the area near the writing tool from the image data according to the time sequence;
and the processing module is used for recognizing and outputting the writing behaviors of the extracted image sequence by utilizing a preset deep learning neural network model.
Further, the extraction module comprises a first extraction module and a second extraction module.
The first extraction module is used for acquiring a group of continuous image data according to a time sequence; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus;
the second extraction module is used for extracting a group of continuous image data according to a time sequence; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
In an embodiment of the third aspect of the present application, a terminal device is provided, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for artificial intelligence based recognition of writing behavior as proposed in the first aspect when executing the program.
An embodiment of the fourth aspect of the present application proposes a computer-readable storage medium on which a computer program is stored which, when being executed by a processor, carries out the method for artificial intelligence based recognition of writing behavior as proposed by the first aspect.
Compared with the prior art, the present application extracts a plurality of image sequences and inputs them into a deep learning neural network model, thereby recognizing writing behaviors, improving the accuracy of behavior recognition, better meeting user requirements, and improving user experience.
Drawings
FIG. 1 is a flow chart of a method for recognizing writing behaviors based on artificial intelligence according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of convolutional neural network recognition according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a recurrent neural network model RNN according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a recurrent neural network model RNN identification according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a three-dimensional convolutional neural network model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of three-dimensional convolutional neural network model identification according to an embodiment of the present invention;
FIG. 7 is a block diagram of an artificial intelligence based recognition system for writing behavior in accordance with an embodiment of the present invention;
FIG. 8 is a block diagram of an artificial intelligence based recognition device for writing behavior in accordance with one embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative and are intended to explain the present invention, but are not to be construed as limiting the present invention.
Specifically, the embodiments of the present invention provide an artificial-intelligence-based writing behavior recognition method, aiming to solve the problems in the prior art that behavior recognition depends mainly on manual identification, which is time-consuming, labor-intensive, costly in labor, and has a high error rate. The method obtains a segment of continuous image data ordered by time sequence and uses a trained deep neural network model to recognize writing behavior, improving the accuracy of writing behavior recognition, better meeting user requirements, and improving user experience.
A method and a device for recognizing writing behavior based on artificial intelligence according to embodiments of the invention are described below with reference to the drawings.
Example one
Referring to fig. 1, the present embodiment provides an artificial intelligence-based writing behavior recognition method, which may include the following steps.
Step 101, acquiring acquired image data needing to be subjected to writing behavior recognition in real time.
The execution subject of the artificial-intelligence-based writing behavior recognition method provided by the embodiment of the invention is the artificial-intelligence-based writing behavior recognition device provided by the embodiment of the invention. The device is provided with image acquisition equipment and can collect image data in real time, so it can be configured in any terminal device with an image acquisition function to realize recognition of writing behavior.
Further, the embodiment acquires, in real time, the image data collected by the image acquisition device, wherein the image data includes a plurality of interactive subjects. The image data includes at least a writing instrument, a writing medium, and an interactive subject such as a hand that holds the writing instrument and performs a writing action on the writing medium. The present embodiment recognizes, through the image acquisition device, the image regions of the writing instrument and the area near it in order to determine the writing behavior state. Handwriting information left on the writing medium by the writing instrument can also be used in the judgment.
And 102, locating the writing tool in the image data through an image target recognition deep learning neural network model based on a convolutional neural network with a DenseNet structure, or an image recognition backbone network with a ResNet, VGG, or Darknet structure, and extracting image sequences of the writing tool and the region near it frame by frame in time order.
The DenseNet algorithm underlying the deep neural network model is consistent in spirit with the ResNet algorithm, but establishes dense connections between each layer and all preceding layers. DenseNet achieves feature reuse by concatenating features along the channel dimension, and attains better performance than ResNet with fewer parameters and lower computational cost.
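The dense-connection idea can be illustrated with a minimal pure-Python sketch (not the patent's actual network): each layer receives the channel-wise concatenation of everything produced before it, so earlier features are reused rather than recomputed.

```python
def dense_block(x, layers):
    """x: initial list of channel values; layers: callables that map the
    concatenated features seen so far to a new list of channels."""
    features = [x]                                       # every output so far
    for layer in layers:
        concatenated = [c for f in features for c in f]  # channel-wise concat
        features.append(layer(concatenated))
    # as in DenseNet, the block's output concatenates all layer outputs
    return [c for f in features for c in f]

# Toy "layers": each produces one new channel equal to the sum of its inputs.
layers = [lambda channels: [sum(channels)] for _ in range(3)]
out_feats = dense_block([1.0, 2.0], layers)
print(out_feats)  # the input channels survive to the output alongside new ones
```

Because every layer sees all earlier outputs, the original input channels propagate unchanged to the block's output, which is the mechanism behind DenseNet's parameter efficiency.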
Regarding the image recognition backbone networks based on the ResNet, VGG, or Darknet structure: the ResNet network can greatly accelerate the training of a neural network and improve model accuracy. At the same time, ResNet generalizes very well and can even be used directly in an InceptionNet network. ResNet adds a direct shortcut channel to the network, i.e., the idea of the Highway Network. To a certain extent, ResNet solves the loss of information that a traditional convolutional or fully connected network suffers during information transmission, and also solves the problem that very deep networks cannot be trained because of vanishing or exploding gradients. By passing the input directly through to the output, it protects the integrity of the information; the whole network only needs to learn the difference between input and output, which simplifies the learning objective and its difficulty. As for the VGG network, VGGNet uses 3 × 3 convolution kernels and 2 × 2 pooling kernels throughout, improving performance by progressively deepening the network structure. The increase in the number of layers does not cause an explosion in the number of parameters, which are concentrated mainly in the last three fully connected layers. Moreover, two 3 × 3 convolutional layers in series are equivalent to one 5 × 5 convolutional layer, and three 3 × 3 convolutional layers in series are equivalent to one 7 × 7 convolutional layer; that is, the receptive field of three stacked 3 × 3 layers equals that of one 7 × 7 layer. However, the 3 × 3 stack has only about half as many parameters as the 7 × 7 layer, and it performs 3 nonlinear operations where the latter performs only 1, so the former has a stronger capability for learning features.
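The receptive-field and parameter claims above can be checked with a few lines of arithmetic; the channel width C below is a hypothetical value, and biases are ignored.

```python
def receptive_field(num_layers, kernel=3):
    # stacked stride-1 convolutions: the field grows by (kernel - 1) per layer
    return 1 + num_layers * (kernel - 1)

def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out  # weight count only, biases ignored

C = 64  # hypothetical channel width, equal in and out
stacked_3x3 = 3 * conv_params(3, C, C)   # three 3x3 layers in series
single_7x7 = conv_params(7, C, C)        # one 7x7 layer

print(receptive_field(3))                # 7: same field of view as one 7x7
print(stacked_3x3 / single_7x7)          # 27/49, roughly half the parameters
```

The ratio 27/49 ≈ 0.55 holds for any equal channel width, which is why the text can say "about half" without fixing C.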
Linear transformations are added using 1 × 1 convolutional layers without changing the number of output channels. As for other uses of 1 × 1 convolutional layers, they are often used to refine features, i.e., to combine the features of multiple channels and condense them into an output with more or fewer channels, without changing the size of each picture; sometimes a 1 × 1 convolutional layer can also be used in place of a fully connected layer. During training, VGGNet first trains a simple level-A network, then reuses the weights of network A to initialize the several more complex models that follow, so convergence is faster. The VGGNet authors conclude that LRN layers do not help much, deeper networks work better, and 1 × 1 convolution is also very effective but less so than 3 × 3 convolution, since 3 × 3 kernels can learn larger spatial features. The Darknet network is easy to install: select the required options (CUDA, cuDNN, OpenCV, etc.) in the Makefile and a direct make completes installation in a few minutes. It has no dependencies: the whole framework is written in C and can run independently of any library; the author has even written substitute functions for OpenCV. Its structure is clear, and the source code is easy to inspect and modify: the basic files of the framework are in the src folder, and the defined detection and classification functions are in the examples folder, so the source code can be viewed and modified directly as needed. It has a friendly Python interface: although Darknet is written in C, a Python interface is provided, and a trained weight-format model can be called directly from Python. It is easy to port: deploying the framework locally on a machine is very simple, and the CPU or GPU can be used according to the machine's configuration; in particular, Darknet is extremely convenient to deploy locally for detection and recognition tasks.
An image sequence is a group of image data continuous in time order; image data of the same size captured at fixed time intervals can also be regarded as a piece of video data.
The step comprises extracting an image sequence of the writing tool frame by frame from the image data in time order, and extracting an image sequence of the region near the writing tool frame by frame from the image data in time order.
Extracting the image sequence of the writing instrument frame by frame in time order from the image data further comprises acquiring a set of consecutive image data in time order; recognizing writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus.
Extracting a sequence of images of a region near the writing instrument frame by frame in time order for the image data further includes: extracting a set of continuous image data frame by frame in time order; recognizing writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
Further, in step 102, recognition of the writing instrument is performed by a writing instrument locating neural network. The network consists of 5 convolutional layers and 3 deconvolutional layers, and its output is a feature image the same size as the input image. During training, the target feature image is a two-dimensional normal distribution X ~ N(u, d) centered on the pen tip of the writing instrument: the value at the pen tip is the maximum, 1, and it decays gradually to 0 toward the edges, so that to the naked eye it looks like a light spot centered on the pen tip. The loss function of the neural network is the L1 error between the target feature image and the inferred feature image; the optimization algorithm is Adam SGD; the training data comprise about one million images, and training converges after about one million iterations. The output of the convolutional encoder is the input of the deconvolutional decoder. The approximate position of the pen tip is obtained through this neural network; taking that position as the reference center, offsets are computed in the four directions up, down, left, and right, and the image of the region near the writing instrument is cropped out. The specific offset distance is scaled in proportion to the size of the writing instrument.
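The target feature image and the crop around the pen tip described above might be sketched as follows; the spread sigma and the crop margin are illustrative assumptions, not values from the patent.

```python
import math

def target_heatmap(h, w, tip_y, tip_x, sigma=2.0):
    """2D Gaussian 'light spot': 1.0 at the pen tip, decaying toward 0."""
    return [[math.exp(-((y - tip_y) ** 2 + (x - tip_x) ** 2) / (2 * sigma ** 2))
             for x in range(w)]
            for y in range(h)]

def crop_near_tip(h, w, tip_y, tip_x, margin):
    """Offsets up/down/left/right from the tip, clamped to the image bounds.
    In practice the margin would scale with the writing instrument's size."""
    top, bottom = max(0, tip_y - margin), min(h, tip_y + margin + 1)
    left, right = max(0, tip_x - margin), min(w, tip_x + margin + 1)
    return top, bottom, left, right

hm = target_heatmap(9, 9, 4, 4)
print(hm[4][4])                       # 1.0 exactly at the pen tip
print(crop_near_tip(9, 9, 4, 4, 3))   # a 7x7 window around the tip
```

An L1 loss against such a target would simply sum the absolute pixel differences between this heatmap and the network's inferred feature image.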
And 103, recognizing and outputting the writing behavior state of the extracted image sequence by using a deep learning neural network model based on a DenseNet structure.
Recognition of the writing behavior state means that, in a given image sequence, a certain proportion (in the range 0-100%) of the consecutive frames of image data are in a writing state, where the writing state may be marked as 1 and the non-writing state as 0. In this embodiment, a sufficient number of image sequences are acquired and labeled in this way.
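A minimal sketch of this labeling scheme follows; the 50% decision threshold is an assumption for illustration, not a value taken from the patent.

```python
def writing_proportion(frame_labels):
    """Fraction of frames labeled 1 (writing) in a sequence."""
    return sum(frame_labels) / len(frame_labels)

def sequence_state(frame_labels, threshold=0.5):
    """1 if the sequence counts as writing, else 0 (threshold is assumed)."""
    return 1 if writing_proportion(frame_labels) >= threshold else 0

labels = [0, 1, 1, 1, 0, 1, 1, 0]  # hypothetical per-frame marks
print(writing_proportion(labels))   # 0.625
print(sequence_state(labels))       # 1
```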
Further, recognition of the writing behavior state is performed by a writing state recognition neural network, for which there are three different sets of technical schemes. In the first scheme, a deep convolutional neural network (CNN) independently extracts features from each frame; the input data are pictures of 240-800 resolution, and features are extracted by 3-7 CNN modules, each comprising a convolutional layer, a pooling layer, and an activation layer. After feature extraction, the output of the CNN module sequence is fed to a softmax layer for classification, yielding the writing state of each frame. In the second scheme, on the basis of the CNN structure, a recurrent neural network of matching data size is attached after the CNN module sequence so that the features extracted from consecutive frames are associated; through the recurrence and temporal memory of the RNN, continuous information from the preceding and following frames is compared, and analyzing data of a certain duration (3-10 frames, about 1-3 seconds) yields a writing state classification result with better accuracy. The third scheme uses a three-dimensional convolutional neural network: multiple past consecutive frames are input at once, and features are extracted by a three-dimensional CNN module. Its input data are multiple frames of image data of the same size, with one more time-sequence dimension than the width and height of the pictures, hence the name three-dimensional convolution. The size of the corresponding CNN module sequence likewise changes from two dimensions to three, but its basic structure and computation are completely unchanged. With this scheme, feature extraction can be performed directly on the multi-frame image sequence, which is then fed into a softmax classifier for writing state recognition and classification.
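The third scheme's treatment of time as an extra data dimension can be illustrated with a naive single-channel 3D convolution (valid padding, stride 1); this is a pure-Python sketch, not the patent's actual network.

```python
def conv3d(volume, kernel):
    """Valid, stride-1, single-channel 3D convolution over a (T, H, W) stack."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):          # slide along the time axis too
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = sum(volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                        for dt in range(kt)
                        for dy in range(kh)
                        for dx in range(kw))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

# Two 3x3 frames and a 2x2x2 averaging kernel: the output is a 1x2x2 volume.
frames = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
kernel = [[[0.125, 0.125], [0.125, 0.125]],
          [[0.125, 0.125], [0.125, 0.125]]]
out_vol = conv3d(frames, kernel)
print(out_vol)
```

The kernel spans both frames at once, which is exactly what distinguishes this from applying a 2D convolution to each frame separately.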
Referring to fig. 2-6, in the present embodiment, the deep learning neural network model may include, but is not limited to, a recurrent neural network model, and a three-dimensional convolutional neural network model.
Training the recurrent neural network model includes: constructing a recurrent neural network, acquiring and labeling a plurality of image sequences, inputting the labeled image sequences into the recurrent neural network, performing cyclic feedback and transmission of data in the time evolution direction, and outputting a recognition result.
Further, a recurrent neural network structure is built, and the structure can be a standard recurrent neural network structure such as RNN, LSTM, GRU and the like. The recurrent neural network is a neural network structure which takes sequence data as input, carries out recurrent feedback and transmission of the data in the time evolution direction and finally outputs the data.
At the output layer, the result may be a discrete classification or a continuous regression value; in terms of output quantity, the network may produce either a single result or a continuous stream of results, one per step along the time evolution.
In this embodiment, the recurrent neural network structure is trained with the labeled image data. When a real-time sequence is fed continuously into the trained structure, it can continuously output, in real time and over a fixed duration, the classification result of the input sequence at each time scale. Thus, during real-time image acquisition, whether the physical activity represented by the series of image sequences at the current moment is a writing state can be determined through computation by the recurrent neural network.
At any given moment, the recurrent neural network can output a result that depends only on past data, without needing to obtain all the data over the whole time period.
Training a three-dimensional convolutional neural network model comprises: and constructing a three-dimensional convolutional neural network, acquiring a plurality of image sequences, integrally inputting the plurality of image sequences as three-dimensional data into the three-dimensional convolutional neural network, taking the time axis as a third dimension, and outputting an identification result of any scale on the time axis.
The three-dimensional convolutional neural network of this embodiment takes the time axis as a third dimension and can likewise accomplish the work of the recurrent neural network. This type of deep learning network does not emphasize the relationship between the time evolution direction and the order of data input: in the three-dimensional convolutional neural network of this embodiment, the image sequence is input as a single three-dimensional data volume. Logical information in the data as a whole is extracted by designing three-dimensional convolution kernels, and the output may be a discrete classification result or a continuous regression value; the output structure may also cover multiple time scales.
For the same set of data, with consistent picture size, content, and duration, the recognition result can be output at any time scale within the period only after all data in that period have been obtained. This may introduce a delay at each moment, but correspondingly may bring higher accuracy.
Example two
Referring to fig. 7, the present embodiment provides an artificial intelligence based writing behavior recognition system, which employs an artificial intelligence based writing behavior recognition method according to an embodiment, and the system includes: the device comprises an acquisition module 100, an extraction module 200 and a processing module 300.
The acquisition module 100 is used for acquiring the acquired image data which needs to be subjected to writing behavior recognition.
The extraction module 200 is configured to locate the writing tool in the image data through an image target recognition deep learning neural network model based on a convolutional neural network with a DenseNet structure, or through an image recognition backbone network, and to extract the writing tool and an image sequence of the region near the writing tool frame by frame in time order.
The extraction module 200 includes a first extraction module 200 and a second extraction module 200, the first extraction module 200 is configured to obtain a group of continuous image data frame by frame according to a time sequence; recognizing writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus; the second extraction module 200 is configured to extract a group of consecutive image data frame by frame in time order; recognizing writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
The processing module 300 is configured to perform writing behavior recognition on the extracted image sequence by using a preset deep learning neural network model.
The present embodiment further provides an artificial intelligence based writing behavior recognition apparatus that applies the artificial intelligence based writing behavior recognition method of the first embodiment. As shown in fig. 8, the apparatus includes an image pickup device 11, an AI smart device 10, and an output device. In one embodiment, the image pickup device 11 and the output device are embedded in the AI smart device 10.
The image pickup device 11 is used for collecting the image data on which writing behavior recognition needs to be performed.
The AI smart device 10 is configured to obtain the collected image data on which writing behavior recognition needs to be performed, and to extract the writing tool and an image sequence of the region near the writing tool from the image data in chronological order. The extraction comprises: acquiring a group of continuous image data in chronological order; recognizing the writing tool among the plurality of interaction bodies in each frame of image data and taking the writing tool as the interaction focus; and locating the interaction focus in each frame of image data, extracting an image sequence of the interaction focus, and obtaining the motion trajectory of the interaction focus. The extraction further comprises: extracting a group of continuous image data in chronological order; recognizing the writing tool among the plurality of interaction bodies in each frame of image data and taking the writing tool as the interaction focus; and locating the interaction focus in each frame of image data, extracting a focus-region image within a preset range near the interaction focus, and obtaining an image sequence of the interaction focus region. The AI smart device 10 is further configured to perform writing behavior recognition on the extracted image sequence using a preset deep learning neural network model.
The output device is used for outputting the recognition result of the AI smart device 10 in voice or image/video form.
The output device in the artificial intelligence based writing behavior recognition apparatus provided in this embodiment may be a display device, such as the display 12 or a display tool including an indicator light, for indicating that a writing behavior is currently performed.
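The camera, AI device, output flow of fig. 8 can be sketched as below. This is a minimal illustration only; the detector and classifier are hypothetical stand-ins for the extraction and processing modules described above, and the frame format is invented for the example.

```python
# Hypothetical end-to-end sketch of the fig. 8 pipeline: frames from the
# image pickup device flow into the AI device, which extracts a per-frame
# focus signal, classifies the sequence, and returns a label for output.
def ai_device(frames, detect, classify):
    focus_seq = [detect(f) for f in frames]  # extraction-module stand-in
    return classify(focus_seq)               # processing-module stand-in

# Assumed detector: reads a precomputed pen-region score out of each frame.
detect = lambda f: f["pen_region"]
# Assumed classifier: mean-score threshold over the extracted sequence.
classify = lambda seq: "writing" if sum(seq) / len(seq) > 0.5 else "idle"

frames = [{"pen_region": 0.9}, {"pen_region": 0.7}, {"pen_region": 0.8}]
label = ai_device(frames, detect, classify)
```

The point of the factoring is that the acquisition, extraction, and processing stages stay independently replaceable, which mirrors the module split of the system embodiment.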
An embodiment of the present invention provides a terminal device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the artificial intelligence based writing behavior recognition method described above is implemented.
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the artificial intelligence based writing behavior recognition method of the preceding embodiments.
Furthermore, an embodiment of the present invention provides a computer program product; when the instructions in the computer program product are executed by a processor, the artificial intelligence based writing behavior recognition method of the foregoing embodiment is performed.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. The method for recognizing the writing behavior based on the artificial intelligence is characterized by comprising the following steps:
acquiring acquired image data needing to be subjected to writing behavior recognition in real time;
positioning a writing tool in the image data through an image target recognition deep learning neural network model based on a convolutional neural network with a Densenet structure, or through an image recognition backbone network based on a Resnet, VGG or Darknet structure, and extracting the writing tool and an image sequence of the region near the writing tool frame by frame in chronological order;
and recognizing and outputting the writing behavior state of the extracted image sequence by using a deep learning neural network model based on a Densenet structure.
2. The artificial intelligence based recognition method of writing behavior according to claim 1, wherein the acquiring of the image data required to perform recognition of writing behavior in real time further comprises:
the method comprises the steps of obtaining image data collected by image collecting equipment in real time, wherein the image data comprises a plurality of interactive bodies.
3. The artificial intelligence based recognition method of writing behavior according to claim 2, wherein an image sequence of a writing instrument is extracted in chronological order from the image data, further comprising:
acquiring a group of continuous image data frame by frame according to a time sequence; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus.
4. A method for artificial intelligence based recognition of writing behavior as defined in claim 3, wherein the extracting a sequence of images of a region near a writing instrument frame by frame in time order for the image data further comprises:
extracting a set of continuous image data frame by frame in time order; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
5. The artificial intelligence based recognition method of writing behavior according to claim 4, wherein the deep learning neural network model employs a recurrent neural network model, and the training of the recurrent neural network model comprises:
and constructing a recurrent neural network, acquiring and marking a plurality of image sequences, inputting the marked image sequences into the recurrent neural network, performing data cyclic feedback and transmission in a time evolution direction, and outputting an identification result.
6. The artificial intelligence based recognition method of writing behavior according to claim 4, wherein the deep learning neural network model employs a three-dimensional convolutional neural network model, and training the three-dimensional convolutional neural network model comprises:
and constructing a three-dimensional convolutional neural network, acquiring a plurality of image sequences, inputting the image sequences into the three-dimensional convolutional neural network as a whole as three-dimensional data, taking a time axis as a third dimension, and outputting an identification result of any scale on the time axis.
7. An artificial intelligence based recognition apparatus of writing behavior, characterized in that the artificial intelligence based recognition method of writing behavior according to any one of claims 1 to 6 is adopted, the apparatus comprising: an acquisition module, an extraction module and a processing module,
the acquisition module is used for acquiring acquired image data needing writing behavior recognition in real time;
the extraction module is used for positioning the writing tool in the image data through an image target recognition deep learning neural network model based on a convolutional neural network with a Densenet structure, or through a conventional image recognition backbone network, and extracting the writing tool and an image sequence of the region near the writing tool frame by frame in chronological order;
and the processing module is used for performing writing behavior recognition on the extracted image sequence by using a deep learning neural network model based on a Densenet structure and outputting the result.
8. The artificial intelligence based recognition apparatus of a writing behavior according to claim 7, wherein the extraction module includes a first extraction module and a second extraction module,
the first extraction module is used for acquiring a group of continuous image data frame by frame according to a time sequence; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; positioning the interactive focus of each frame of image data, extracting an image sequence of the interactive focus, and acquiring a motion track of the interactive focus;
the second extraction module is used for extracting a group of continuous image data frame by frame according to a time sequence; identifying writing tools in a plurality of interaction bodies in each frame of image data, and taking the writing tools as interaction focuses; and positioning the interactive focus of each frame of image data, extracting a focus area image in a preset range near the interactive focus, and acquiring an image sequence of the interactive focus area.
9. A terminal device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the artificial intelligence based recognition method of writing behavior as claimed in any one of claims 1 to 6 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for artificial intelligence based recognition of writing behavior as claimed in any one of claims 1 to 6.
CN202010668299.3A 2020-07-13 2020-07-13 Writing behavior identification method and device based on artificial intelligence Pending CN112001236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010668299.3A CN112001236A (en) 2020-07-13 2020-07-13 Writing behavior identification method and device based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010668299.3A CN112001236A (en) 2020-07-13 2020-07-13 Writing behavior identification method and device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN112001236A true CN112001236A (en) 2020-11-27

Family

ID=73466832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010668299.3A Pending CN112001236A (en) 2020-07-13 2020-07-13 Writing behavior identification method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN112001236A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469163A (en) * 2021-06-07 2021-10-01 北京易康医疗科技有限公司 Medical information recording method and device based on intelligent paper pen

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003067129A (en) * 2001-08-22 2003-03-07 Axiom Co Ltd Writing identification system and writing material for writing identification system
US20070005537A1 (en) * 2005-06-02 2007-01-04 Microsoft Corporation Handwriting recognition using a comparative neural network
CN103109251A (en) * 2010-07-12 2013-05-15 Snt技术股份公司 Writing detection device
US20170109578A1 (en) * 2015-10-19 2017-04-20 Myscript System and method of handwriting recognition in diagrams
CN108764070A (en) * 2018-05-11 2018-11-06 西北大学 A kind of stroke dividing method and calligraphic copying guidance method based on writing video
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 A kind of finger point reading character recognition method and interpretation method based on artificial intelligence
CN110765814A (en) * 2018-07-26 2020-02-07 杭州海康威视数字技术股份有限公司 Blackboard writing behavior recognition method and device and camera
CN110941976A (en) * 2018-09-24 2020-03-31 天津大学 Student classroom behavior identification method based on convolutional neural network
CN111291840A (en) * 2020-05-12 2020-06-16 成都派沃智通科技有限公司 Student classroom behavior recognition system, method, medium and terminal device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RONG FU等: "Learning Behavior Analysis in Classroom Based on Deep Learning", 《2019 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND INFORMATION PROCESSING》, 27 February 2020 (2020-02-27), pages 206 - 212 *
WANLI YANG 等: "Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks", 《FUTURE INTERNET 2018》, vol. 10, no. 12, 22 December 2018 (2018-12-22), pages 1 - 11 *
万晨: "基于人工智能的学生课堂行为分析方案研究", 《电脑知识与技术》, vol. 16, no. 19, 5 July 2020 (2020-07-05), pages 3 - 5 *
迟元峰等: "基于深度学习的人体行为识别研究", 《工业控制计算机》, vol. 30, no. 1, 17 March 2017 (2017-03-17), pages 104 - 105 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469163A (en) * 2021-06-07 2021-10-01 北京易康医疗科技有限公司 Medical information recording method and device based on intelligent paper pen
CN113469163B (en) * 2021-06-07 2024-03-29 北京易康医疗科技有限公司 Medical information recording method and device based on intelligent paper pen


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination