CN108764070A

CN108764070A - A kind of stroke dividing method and calligraphic copying guidance method based on writing video

Info

Publication number: CN108764070A
Application number: CN201810446094.3A
Authority: CN
Inventors: 龚晓庆; 许鹏飞; 刘宝英; 陈�峰; 郭军; 肖云; 陈晓江; 房鼎益
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2018-05-11
Filing date: 2018-05-11
Publication date: 2018-11-06
Anticipated expiration: 2038-05-11
Also published as: CN108764070B

Abstract

The invention discloses a stroke segmentation method based on writing video and a calligraphy copying guidance method, wherein, based on the stroke segmentation method of writing video, the convolutional neural network is redesigned, and the convolutional neural network and the cyclic neural network are combined to greatly improve The parameters of the neural network are reduced, and the training speed is accelerated; and the ability to express the spatial characteristics of the video frame is enhanced, and at the same time, the time-domain motion information of calligraphy writing is extracted to achieve high-accuracy fine-grained action recognition. The calligraphy copying instruction method of the present invention adopts the above-mentioned stroke segmentation method based on the writing video, and utilizes the stroke video obtained by stroke segmentation to realize accurate guidance on the calligraphy copying process.

Description

A stroke segmentation method and calligraphy copying guidance method based on writing video

技术领域technical field

本发明属于计算机识别技术领域，涉及一种基于书写视频的笔画分割方法及书法临摹指导方法。The invention belongs to the technical field of computer recognition, and relates to a stroke segmentation method based on writing video and a calligraphy copying instruction method.

背景技术Background technique

中国书法是一种独特的视觉艺术，是数千年中国文化的结晶，对中国传统文化的传承起到重要的作用。随着中国对其传统文化的逐渐重视，越来越多的中国人以及外国人开始学习中国书法。而书法学习的基础在于掌握书写过程中的运笔之法，正所谓“书法之妙，全在运笔”(语出清代康有为)，而在古代亦有书法家蔡邕著有《笔势》一书，足见书法运笔在中国书法书写过程中的重要性。Chinese calligraphy is a unique visual art, the crystallization of thousands of years of Chinese culture, and plays an important role in the inheritance of Chinese traditional culture. As China pays more and more attention to its traditional culture, more and more Chinese and foreigners begin to learn Chinese calligraphy. The basis of calligraphy learning lies in mastering the method of using the brush in the process of writing. As the saying goes, "The magic of calligraphy lies in the use of the brush" (from Kang Youwei in the Qing Dynasty). , which shows the importance of calligraphy strokes in the process of Chinese calligraphy writing.

书法初学者常见的学习方法是临摹学习，在此过程中，首先需要掌握的就是基本笔画的运笔方法。然而，目前中国书法严重缺乏教学资源，导致很多书法学习者难以获得优质的书法教学，这将成为严重阻碍中国书法的学习推广的主要因素之一。为了对书法学习者进行指导，采用对书法动作进行识别的方法实现。The common learning method for beginners of calligraphy is to learn by copying. In this process, the first thing that needs to be mastered is the method of using the basic strokes. However, at present, there is a serious lack of teaching resources in Chinese calligraphy, which makes it difficult for many calligraphy learners to obtain high-quality calligraphy teaching, which will become one of the main factors that seriously hinder the learning and promotion of Chinese calligraphy. In order to guide calligraphy learners, the method of recognizing calligraphy actions is adopted.

现有技术中缺少对书法书写动作识别的相关工作(无论是针对静态图像还是动态视频)。与之相类似的工作有基于图像或视频的人体行为或动作的识别，主要包括传统机器学习方法和深度学习方法。随着深度学习方法的广泛应用，大量基于卷积神经网络(CNN)的方法用于静态视频帧中的动作识别，相对于传统机器学习方法，识别效果得到显著提升。Andrej等人在CNN的框架中加入时间卷积，来获取时域上的局部运动信息，相对于静态CNN的识别方法提高了2％。但由于该时间卷积只能获取短暂数帧的时域信息，而在运动幅度小的视频中，几帧图像的静态信息几乎一致，难以达到获取时域运动信息的目的，因此此类方法对于动作变化细微的视频，识别精度有限。There is a lack of related work on calligraphy writing action recognition in the prior art (whether it is for static images or dynamic videos). Similar works include recognition of human behavior or actions based on images or videos, mainly including traditional machine learning methods and deep learning methods. With the wide application of deep learning methods, a large number of methods based on convolutional neural networks (CNN) are used for action recognition in static video frames. Compared with traditional machine learning methods, the recognition effect has been significantly improved. Andrej et al. added time convolution to the CNN framework to obtain local motion information in the time domain, which improved by 2% compared to the static CNN recognition method. However, since the time convolution can only obtain temporal information of a few short frames, and in videos with small motion ranges, the static information of several frames of images is almost the same, and it is difficult to obtain temporal motion information. For videos with subtle motion changes, the recognition accuracy is limited.

发明内容Contents of the invention

针对现有技术中存在的问题，本发明的目的在于，提供一种基于书写视频的笔画分割方法，该方法能够对书写者的提笔和落笔工作进行准确分割，实现笔画分割的目的。Aiming at the problems existing in the prior art, the object of the present invention is to provide a stroke segmentation method based on writing video, which can accurately segment the writer's work of raising and lowering the pen, so as to achieve the purpose of stroke segmentation.

为了实现上述目的，本发明采用以下技术方案：In order to achieve the above object, the present invention adopts the following technical solutions:

一种基于书写视频的笔画分割方法，该方法用于将书写视频按照笔画分割成多个笔画视频，包括以下步骤：A stroke segmentation method based on writing video, the method is used to divide the writing video into multiple stroke videos according to strokes, comprising the following steps:

步骤1，获取书写单个字的书写视频，该书写视频包括多帧图像；将该多帧图像分成多个视频组，每个视频组包括n帧连续的图像；Step 1, obtain the writing video of writing single word, this writing video comprises multi-frame image; This multi-frame image is divided into a plurality of video groups, and each video group comprises n frames of continuous images;

步骤2，针对每个视频组，将视频组包括的所有图像输入到卷积神经网络中，卷积神经网络输出该视频组对应的图像空间特征向量；该卷积神经网络包括两个第一卷积组、两个第二卷积组、三个全连接层和一个softmax层，其中，第一卷积组包括按照数据传输方向依次连接的两个卷积层和一个池化层，第二卷积组包括按照数据传输方向依次连接的三个卷积层和一个池化层；三个全连接层和一个softmax层按照数据传输方向依次连接，三个全连接层和一个softmax层为卷积神经网络的最后四层；Step 2, for each video group, input all images included in the video group into the convolutional neural network, and the convolutional neural network outputs the image space feature vector corresponding to the video group; the convolutional neural network includes two first volumes Product group, two second convolutional groups, three fully connected layers and a softmax layer, wherein the first convolutional group includes two convolutional layers and a pooling layer connected in sequence according to the direction of data transmission, the second convolutional layer The product group includes three convolutional layers and a pooling layer connected in sequence according to the direction of data transmission; three fully connected layers and a softmax layer are connected in sequence according to the direction of data transmission, and three fully connected layers and a softmax layer are convolution neural networks. The last four layers of the network;

步骤3，将视频组对应的图像空间特征向量输入到循环神经网络中，循环神经网络输出该视频组的状态为书写状态或者非书写状态；Step 3, input the image space feature vector corresponding to the video group into the recurrent neural network, and the recurrent neural network outputs the state of the video group as writing state or non-writing state;

步骤4，在状态为非书写状态的所有视频组中选取相邻的两个视频组，该选取的相邻的两个视频组之间的所有视频组组合在一起形成一个笔画视频；状态为非书写状态的所有视频组中的第一个视频组之前的所有视频组组合在一起形成一个笔画视频；状态为非书写状态的所有视频组中的最后一个视频组之后的所有视频组组合在一起形成一个笔画视频。Step 4, select two adjacent video groups in all video groups whose state is a non-writing state, and all video groups between the selected adjacent two video groups are combined to form a stroke video; the state is non-writing All video groups before the first video group in the writing state are combined to form a stroke video; all video groups in the non-writing state after the last video group are combined to form a stroke video A stroke video.

可选地，所述两个第一卷积组、两个第二卷积组、三个全连接层和一个softmax层按照数据传输方向依次连接。Optionally, the two first convolution groups, the two second convolution groups, the three fully connected layers and the one softmax layer are sequentially connected according to the data transmission direction.

可选地，所述两个第一卷积组和两个第二卷积组中的所有卷积层的卷积窗口的大小均为3×3，所有池化层的的池化窗口均为2×2；Optionally, the convolution windows of all convolution layers in the two first convolution groups and the two second convolution groups are 3×3, and the pool windows of all pool layers are 2×2;

第一个第一卷积组中的卷积层的卷积核的个数为64，第二个第一卷积组的卷积层的卷积核的个数为128，第一个第二卷积组中的卷积层的卷积核的个数为256，第二个第二卷积组的卷积层的卷积核的个数为512。The number of convolution kernels of the convolution layer in the first first convolution group is 64, the number of convolution kernels of the convolution layer of the second first convolution group is 128, and the first second The number of convolution kernels of the convolution layer in the convolution group is 256, and the number of convolution kernels of the convolution layer of the second convolution group is 512.

本发明还提供一种书法临摹指导方法，包括以下步骤：The present invention also provides a calligraphy copying instruction method, comprising the following steps:

步骤1，将临摹者的书写视频和标准的书写视频分别分割成多个临摹笔画视频和多个标准笔画视频；其中，临摹者的书写视频和标准的书写视频为同一个字的书写视频，多个临摹笔画视频和多个标准笔画视频按照笔画一一对应；Step 1, the writing video of the imitator and the standard writing video are divided into a plurality of copying stroke videos and a plurality of standard stroke videos respectively; wherein, the writing video of the imitator and the standard writing video are writing videos of the same word, more There is a one-to-one correspondence between a copy stroke video and multiple standard stroke videos according to the strokes;

步骤2，对临摹笔画视频和标准笔画视频均采用轨迹跟踪方法进行处理，分别得到临摹笔画视频对应的轨迹坐标序列和标准笔画视频对应的轨迹坐标序列；Step 2, all adopt trajectory tracking method to copy stroke video and standard stroke video to process, respectively obtain the corresponding trajectory coordinate sequence of copying stroke video and the corresponding trajectory coordinate sequence of standard stroke video;

步骤3，利用动态时间规整方法计算临摹笔画视频的轨迹坐标序列和与该临摹笔画视频对应的标准笔画视频的轨迹坐标序列之间的相似度；Step 3, utilize the dynamic time warping method to calculate the similarity between the track coordinate sequence of the copying stroke video and the track coordinate sequence of the standard stroke video corresponding to this copying stroke video;

步骤4，步骤3得到的所有相似度组合形成一个特征向量，将该特征向量输入到线性回归模型中，该线性回归模型输出评分值；Step 4, all similarities obtained in step 3 are combined to form a feature vector, and the feature vector is input into the linear regression model, and the linear regression model outputs a score value;

所述步骤1中的将临摹者的书写视频和标准的书写视频分别分割成多个临摹笔画视频和多个标准笔画视频，根据上述的基于书写视频的笔画分割方法得到。In the step 1, the copying person's writing video and the standard writing video are respectively divided into a plurality of copying stroke videos and a plurality of standard stroke videos, which are obtained according to the above-mentioned stroke segmentation method based on writing videos.

与现有技术相比，本发明具有以下技术效果：本发明的基于书写视频的笔画分割方法，重新设计卷积神经网络，将卷积神将网络和循环神经网络相结合，大大减少了神经网络的参数，加快了训练速度；并且增强了对视频帧的空间特征的表达能力，同时又提取了书法字书写的时域运动信息，实现高准确率的细粒度动作识别。本发明的书法临摹指导方法，采用上述基于书写视频的笔画分割方法，利用笔画分割得到的笔画视频，能够实现对书法临摹过程的精确指导。Compared with the prior art, the present invention has the following technical effects: the stroke segmentation method based on the writing video of the present invention redesigns the convolutional neural network, combines the convolutional neural network and the cyclic neural network, and greatly reduces the number of neural networks. The parameters speed up the training speed; and enhance the ability to express the spatial characteristics of the video frame, and at the same time extract the time-domain motion information of calligraphy writing to achieve high-accuracy fine-grained action recognition. The calligraphy copying instruction method of the present invention adopts the above-mentioned stroke segmentation method based on the writing video, and utilizes the stroke video obtained by stroke segmentation to realize accurate guidance on the calligraphy copying process.

下面结合附图和具体实施方式对本发明的方案作进一步详细地解释和说明。The solution of the present invention will be further explained and described in detail in conjunction with the accompanying drawings and specific embodiments.

附图说明Description of drawings

图1是本发明的基于书写视频的笔画分割方法的流程图。Fig. 1 is a flowchart of the stroke segmentation method based on writing video of the present invention.

具体实施方式Detailed ways

本发明公开了一种基于书写视频的笔画分割方法，参见图1，该方法用于将书写视频按照笔画分割成多个笔画视频，具体包括以下步骤The present invention discloses a method for segmenting strokes based on writing videos, as shown in Figure 1. The method is used to segment writing videos into multiple stroke videos according to strokes, and specifically includes the following steps

步骤1，获取书写单个字的书写视频，该书写视频包括多帧图像；将该多帧图像分成多个视频组，每个视频组包括n帧连续的图像，在本实施例中，n＝5。Step 1, obtain the writing video of writing a single word, this writing video comprises multi-frame image; This multi-frame image is divided into a plurality of video groups, each video group comprises n frames of continuous images, in this embodiment, n=5 .

步骤2，针对每个视频组，将视频组包括的所有图像输入到卷积神经网络中，卷积神经网络输出该视频组对应的图像空间特征向量；该卷积神经网络包括两个第一卷积组、两个第二卷积组、三个全连接层和一个softmax层，其中，第一卷积组包括按照数据传输方向依次连接的两个卷积层和一个池化层，第二卷积组包括按照数据传输方向依次连接的三个卷积层和一个池化层；三个全连接层和一个softmax层按照数据传输方向依次连接，三个全连接层和一个softmax层为卷积神经网络的最后四层。Step 2, for each video group, input all images included in the video group into the convolutional neural network, and the convolutional neural network outputs the image space feature vector corresponding to the video group; the convolutional neural network includes two first volumes Product group, two second convolutional groups, three fully connected layers and a softmax layer, wherein the first convolutional group includes two convolutional layers and a pooling layer connected in sequence according to the direction of data transmission, the second convolutional layer The product group includes three convolutional layers and a pooling layer connected in sequence according to the direction of data transmission; three fully connected layers and a softmax layer are connected in sequence according to the direction of data transmission, and three fully connected layers and a softmax layer are convolution neural networks. The last four layers of the network.

步骤3，将视频组对应的图像空间特征向量输入到循环神经网络中，循环神经网络输出该视频组的状态为书写状态或者非书写状态。In step 3, the image space feature vector corresponding to the video group is input into the cyclic neural network, and the cyclic neural network outputs the state of the video group as a writing state or a non-writing state.

本发明将大的卷积核分解为若干个小型卷积核叠加。使用多个较小的卷积核的卷积层代替一个卷积核较大的卷积层，使得网络的层次更深，不仅可以减少参数，并且进行了更多的非线性映射，可以增加网络的拟合表达能力，不仅可以提取运笔的整体运动信息，并且也可以提取笔锋的细节特征。The present invention decomposes a large convolution kernel into several small convolution kernels for superposition. Using multiple convolutional layers with smaller convolutional kernels instead of one convolutional layer with larger convolutional kernels makes the network deeper, which not only reduces parameters, but also performs more nonlinear mappings, which can increase the network. Fitting expression ability can not only extract the overall movement information of the brush stroke, but also extract the detailed features of the stroke.

优选地，在又一实施例中，两个第一卷积组、两个第二卷积组、三个全连接层和一个softmax层按照数据传输方向依次连接。通过多个小窗口的卷积层组合代替传统的大窗口卷积，使得网络的层次更深，对书法中细微运笔姿态的空间特征表达能力更强，另一方面，有效的减少了整个网络的参数，效率更高。Preferably, in yet another embodiment, two first convolutional groups, two second convolutional groups, three fully connected layers and one softmax layer are sequentially connected according to the data transmission direction. The traditional large-window convolution is replaced by the combination of multiple small-window convolution layers, which makes the network layer deeper and has a stronger ability to express the spatial characteristics of the subtle gestures of calligraphy. On the other hand, it effectively reduces the parameters of the entire network. ,higher efficiency.

可选地，在又一实施例中，两个第一卷积组和两个第二卷积组中的所有卷积层的卷积窗口的大小均为3×3，所有池化层的的池化窗口均为2×2；第一个第一卷积组中的卷积层的卷积核的个数为64，第二个第一卷积组的卷积层的卷积核的个数为128，第一个第二卷积组中的卷积层的卷积核的个数为256，第二个第二卷积组的卷积层的卷积核的个数为512。Optionally, in yet another embodiment, the convolution windows of all the convolutional layers in the two first convolutional groups and the two second convolutional groups are both 3×3, and the sizes of all pooling layers The pooling windows are all 2×2; the number of convolution kernels of the convolution layer in the first first convolution group is 64, and the number of convolution kernels of the convolution layer of the second first convolution group is The number is 128, the number of convolution kernels of the convolution layer in the first second convolution group is 256, and the number of convolution kernels of the convolution layer of the second second convolution group is 512.

本发明还公开了一种书法临摹指导方法，包括以下步骤：The invention also discloses a calligraphy copying instruction method, comprising the following steps:

步骤1，将临摹者的书写视频和标准的书写视频分别分割成多个临摹笔画视频和多个标准笔画视频；其中，临摹者的书写视频和标准的书写视频为同一个字的书写视频，多个临摹笔画视频和多个标准笔画视频按照笔画一一对应。Step 1, the writing video of the imitator and the standard writing video are divided into a plurality of copying stroke videos and a plurality of standard stroke videos respectively; wherein, the writing video of the imitator and the standard writing video are writing videos of the same word, more There is a one-to-one correspondence between a copy stroke video and a plurality of standard stroke videos according to the strokes.

步骤2，对临摹笔画视频和标准笔画视频均采用轨迹跟踪方法进行处理，分别得到临摹笔画视频对应的轨迹坐标序列和标准笔画视频对应的轨迹坐标序列；此处的轨迹跟踪方法采用TLD轨迹跟踪算法。Step 2, the video of copied strokes and the video of standard strokes are processed by the trajectory tracking method, and the corresponding trajectory coordinate sequence of the copied stroke video and the corresponding trajectory coordinate sequence of the standard stroke video are obtained respectively; the trajectory tracking method here adopts the TLD trajectory tracking algorithm .

步骤3，利用动态时间规整方法(DTW)计算临摹笔画视频的轨迹坐标序列和与该临摹笔画视频对应的标准笔画视频的轨迹坐标序列之间的相似度。Step 3, using the dynamic time warping method (DTW) to calculate the similarity between the trajectory coordinate sequence of the copied stroke video and the trajectory coordinate sequence of the standard stroke video corresponding to the copied stroke video.

步骤4，步骤3得到的所有相似度组合形成一个特征向量，将该特征向量输入到线性回归模型中，该线性回归模型输出评分值。In step 4, all similarities obtained in step 3 are combined to form a feature vector, which is input into a linear regression model, and the linear regression model outputs a score value.

上述步骤1中将临摹者的书写视频和标准的书写视频分别分割成多个临摹笔画视频和多个标准笔画视频，根据上述基于书写视频的笔画分割方法得到。In the above step 1, the copying person's writing video and the standard writing video are respectively divided into a plurality of copying stroke videos and a plurality of standard stroke videos, which are obtained according to the above-mentioned stroke segmentation method based on the writing video.

实施例Example

本实施例建立了一个书法书写视频数据库，并基于该数据库对本发明的方法，以及现有方法进行测试。书法视频数据库的数据来源主要是浙江大学于钟华老师的魔鬼训练营的1位书法教师和6名学员在进行书法临摹书写时录制的高清视频。共录制了此7位书法书写者分别书写“十，古，大，夫，内，上，不，兄，左，子”等汉字的颜体楷书，这十个字是学员开始练习书法时经常练习书写的汉字，并且每人书写每个字20遍，因此完整字书写的片段共计有7*10*20＝1400个完整的单字书写视频段，视频帧的分辨率为1920*1080，帧频率为50帧/秒。基于这些视频数据，分别进行书法笔画分割为笔画书写子视频，TLD进行毛笔轨迹的追踪，DTW进行毛笔轨迹特征的转换，线性回归进行书法轨迹的评估等任务。In this embodiment, a calligraphic writing video database is established, and the method of the present invention and existing methods are tested based on the database. The data source of the calligraphy video database is mainly the high-definition videos recorded by a calligraphy teacher and 6 students in the devil training camp of Mr. Yu Zhonghua of Zhejiang University when they were copying and writing calligraphy. A total of 7 calligraphers have recorded the Yan Ti regular script of Chinese characters such as "Shi, Gu, Da, Fu, Nei, Shang, No, Brother, Zuo, Zi". These ten characters are often practiced by students when they start practicing calligraphy. written Chinese characters, and each person writes each word 20 times, so there are 7*10*20=1400 complete single-character writing video segments in total, the video frame resolution is 1920*1080, and the frame frequency is 50 frames per second. Based on these video data, calligraphic strokes are segmented into stroke writing sub-videos, TLD is used to track brush trajectories, DTW is used to convert features of brush trajectories, and linear regression is used to evaluate calligraphy trajectories.

(1)将书写视频分割成笔画视频(1) Divide writing video into stroke video

对10个汉字的1400多个视频分别应用本发明的方法(MCNN-LSTM)，CNN和CNN-LSTM来识别每个完整字的每个笔画之间的切换时间点，在这些汉中的书写过程中，一个落笔与一个提笔之间的过程就是书写一个笔画的过程；然后按照笔画分割的结果将完整字的视频划分为若干个笔画的子视频。其中训练数据是从1400个视频中随机选择300个视频，然后标记落笔视频帧序列，书写过程帧序列以及提笔帧序列,剩余其他视频作为测试数据。在测试中，CNN的输入是单个帧数据，进行视频帧识别，CNN-LSTM和MCNN-LSTM的输入是连续5帧的帧序列，进行连续帧序列的识别，及提笔或落笔的动作帧序列识别。MCNN-LSTM的卷积层结构设计如前文所述，LSTM隐藏层采用256个神经元，同样采用随机梯度下降法采用随机梯度下降法，bath size为50，学习率为0.001。测试实验结果如表1所示：Apply the method of the present invention (MCNN-LSTM) to more than 1400 videos of 10 Chinese characters, CNN and CNN-LSTM to identify the switching time points between each stroke of each complete character, during the writing process of these Chinese characters , the process between a pen down and a pen up is the process of writing a stroke; then according to the result of the stroke segmentation, the video of the complete character is divided into sub-videos of several strokes. The training data is to randomly select 300 videos from 1400 videos, and then mark the pen-down video frame sequence, the writing process frame sequence and the pen-lifting frame sequence, and the remaining other videos are used as test data. In the test, the input of CNN is a single frame data for video frame recognition, the input of CNN-LSTM and MCNN-LSTM is a frame sequence of 5 consecutive frames, for the recognition of continuous frame sequence, and the action frame sequence of pen lifting or pen down identify. The convolutional layer structure design of MCNN-LSTM is as mentioned above. The hidden layer of LSTM uses 256 neurons, and the stochastic gradient descent method is also used. The bath size is 50 and the learning rate is 0.001. The test results are shown in Table 1:

从表1中数据可以看出，CNN的识别率有两行数据，其原因在于此CNN主要是进行单帧识别的，因此第一行是利用大约1万帧作为训练，剩余的3万帧作为测试的帧识别率。而第二，三，四行是经过帧标签核对，计算得出的连续帧序列的识别率，第二行是连续五帧中五帧全对的视频段识别率，第三行是连续五帧中只要有4帧识别正确的视频段识别率，第四行是连续五帧中只要有3帧识别正确的视频段识别率。从实验数据可以看出单纯的卷积神经网络难以在具有时序信息的视频段识别中取得较好的效果，只有连续帧中60％的帧识别正确的视频段识别率才接近具有时序特征的LSTM的识别结果。相比之下，将连续帧序列经过卷积神经网络得到的特征作为LSTM的输入，然后进行视频段的识别，在原有的基础上，能够得到很大的性能提升，可见LSTM在具有时序特征的信号处理方法具有优良的特性。而相比之下，本文设计的MCNN具有多层小卷积核叠加，相对单纯的大卷积核或者单纯的小卷积核，MCNN能够更好地捕获细节和整体运动特征，再将其与LSTM结合。因此，MCNN-LSTM能够取得更好的视频帧序列的识别效果。从实验数据可以看出，虽然MCNN-LSTM在10个字中与其他方法的识别率差异参差不齐，但是总体上全线优于对比方法。It can be seen from the data in Table 1 that the recognition rate of CNN has two rows of data. The reason is that this CNN mainly performs single-frame recognition, so the first row uses about 10,000 frames as training, and the remaining 30,000 frames as The frame recognition rate of the test. The second, third, and fourth lines are the recognition rate of the continuous frame sequence calculated after frame label checking, the second line is the video segment recognition rate of five consecutive frames in five consecutive frames, and the third line is five consecutive frames As long as there are 4 frames in the correct video segment recognition rate, the fourth row is the video segment recognition rate as long as 3 frames are correctly recognized in five consecutive frames. From the experimental data, it can be seen that the pure convolutional neural network is difficult to achieve good results in video segment recognition with timing information. Only 60% of the frames in the continuous frames are correctly recognized. The video segment recognition rate is close to the LSTM with timing characteristics. recognition results. In contrast, the features obtained by the continuous frame sequence through the convolutional neural network are used as the input of the LSTM, and then the recognition of the video segment is performed. On the basis of the original, a great performance improvement can be obtained. The signal processing method has excellent properties. In contrast, the MCNN designed in this paper has multiple layers of small convolution kernels superimposed. Compared with simple large convolution kernels or simple small convolution kernels, MCNN can better capture details and overall motion features, and then combine it with LSTMs combined. Therefore, MCNN-LSTM can achieve better recognition effect of video frame sequence. It can be seen from the experimental data that although the difference in the recognition rate between MCNN-LSTM and other methods in 10 words is uneven, it is generally better than the comparison method across the board.

表1Table 1

(2)对书法临摹进行指导(2) Guidance on calligraphy copying

利用MCNN-LSTM对完整字的视频按笔画划分为笔画子视频后，再利用FasterRCNN-TLD在这些子视频中进行毛笔轨迹的追踪，得到每个人在写每个字的每个笔画的轨迹信息。然后将该轨迹信息转换为线性回归模型中所需要的轨迹相依相似度特征信息，并结合书法老师对每个学员书写字过程的评分以得出每个人在写每个笔画时的得分以及写整个字的得分。其中表2显示的是六位学员在书写“大”时每个笔画的评分和整个字的评分(该评分是每个人写20遍的评分的均值)与人工评分的对比，其他字的评价与之类似。该人工评分是他们的老师，即模板书写者对各位学员书写过程的评价。从表中数据可以看出，本文提出的评价系统基本符合人工评价。为后期对书法书写过程的自动评价奠定基础。After using MCNN-LSTM to divide the video of complete characters into stroke sub-videos, use FasterRCNN-TLD to track the brush trajectory in these sub-videos, and obtain the trajectory information of each stroke of each person writing each character. Then the trajectory information is converted into the trajectory-dependent similarity feature information required in the linear regression model, and combined with the calligraphy teacher's score on each student's writing process to obtain each person's score when writing each stroke and the entire writing process. word score. Wherein Table 2 shows the comparison between the ratings of each stroke and the ratings of the whole word (the rating is the average value of the ratings written by each person 20 times) and the manual ratings of six students when writing "big", and the ratings of other characters are compared with similar to The manual scoring is their teacher, that is, the template writer's evaluation of each student's writing process. It can be seen from the data in the table that the evaluation system proposed in this paper basically conforms to manual evaluation. It lays the foundation for the automatic evaluation of the calligraphy writing process in the later stage.

表2Table 2

Claims

1. A stroke segmentation method based on writing video, the method is used to divide writing video into a plurality of stroke videos according to strokes, it is characterized in that, comprising the following steps:

Step 1, obtain the writing video of writing single word, this writing video comprises multi-frame image; This multi-frame image is divided into a plurality of video groups, and each video group comprises n frames of continuous images;

Step 2, for each video group, input all images included in the video group into the convolutional neural network, and the convolutional neural network outputs the image space feature vector corresponding to the video group; the convolutional neural network includes two first volumes Product group, two second convolutional groups, three fully connected layers and a softmax layer, wherein the first convolutional group includes two convolutional layers and a pooling layer connected in sequence according to the direction of data transmission, the second convolutional layer The product group includes three convolutional layers and a pooling layer connected in sequence according to the direction of data transmission; three fully connected layers and a softmax layer are connected in sequence according to the direction of data transmission, and three fully connected layers and a softmax layer are convolution neural networks. The last four layers of the network;

Step 3, input the image space feature vector corresponding to the video group into the recurrent neural network, and the recurrent neural network outputs the state of the video group as writing state or non-writing state;

Step 4, select two adjacent video groups in all video groups whose state is a non-writing state, and all video groups between the selected adjacent two video groups are combined to form a stroke video; the state is non-writing All video groups before the first video group in the writing state are combined to form a stroke video; all video groups in the non-writing state after the last video group are combined to form a stroke video A stroke video.

2. the stroke segmentation method based on writing video as claimed in claim 1, is characterized in that, described two first convolution groups, two second convolution groups, three fully connected layers and a softmax layer according to data The transmission direction is connected sequentially.

3. the stroke segmentation method based on writing video as claimed in claim 2, is characterized in that, the size of the convolution window of all convolution layers in described two first convolution groups and two second convolution groups Both are 3×3, and the pooling windows of all pooling layers are 2×2;

The number of convolution kernels of the convolution layer in the first first convolution group is 64, the number of convolution kernels of the convolution layer of the second first convolution group is 128, and the first second The number of convolution kernels of the convolution layer in the convolution group is 256, and the number of convolution kernels of the convolution layer of the second convolution group is 512.

4. A calligraphy copying instruction method, comprising the following steps:

Step 1, the writing video of the imitator and the standard writing video are divided into a plurality of copying stroke videos and a plurality of standard stroke videos respectively; wherein, the writing video of the imitator and the standard writing video are writing videos of the same word, more There is a one-to-one correspondence between a copy stroke video and multiple standard stroke videos according to the strokes;

Step 2, all adopt trajectory tracking method to copy stroke video and standard stroke video to process, respectively obtain the corresponding trajectory coordinate sequence of copying stroke video and the corresponding trajectory coordinate sequence of standard stroke video;

Step 3, utilize the dynamic time warping method to calculate the similarity between the track coordinate sequence of the copying stroke video and the track coordinate sequence of the standard stroke video corresponding to this copying stroke video;

Step 4, all similarities obtained in step 3 are combined to form a feature vector, and the feature vector is input into the linear regression model, and the linear regression model outputs a score value;

In said step 1, the writing video and standard writing video of the imitator are respectively divided into a plurality of copying stroke videos and a plurality of standard stroke videos, according to the writing video based on any one of claims 1-3 The stroke segmentation method is obtained.