CN113591743B

CN113591743B - Handwriting video identification method, system, storage medium and computing device

Info

Publication number: CN113591743B
Application number: CN202110895033.7A
Authority: CN
Inventors: 梁循; 吴佳辰; 黄伟兰
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2021-08-04
Filing date: 2021-08-04
Publication date: 2023-11-24
Anticipated expiration: 2041-08-04
Also published as: CN113591743A

Abstract

The invention relates to a calligraphy video recognition method, system, storage medium and computing device. The method comprises: acquiring and processing initial calligraphy video data; collecting prior knowledge of the stroke order and cursive symbols of running and cursive script; extracting key-frame pictures from the initial calligraphy video data using the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multidimensional vector for each text, and concatenating the vectors generated from the pictures in time order to form a video vector; and performing dimensionality-reduction visualization on the video vector to complete classification and recognition. The invention can improve the accuracy of video recognition of running and cursive calligraphy and is broadly applicable in the technical field of video data recognition.

Description

Calligraphy video recognition method, system, storage medium and computing device

Technical field

The present invention relates to the technical field of video data recognition, and in particular to a calligraphy video recognition method, system, storage medium and computing device for running and cursive script.

Background art

Running script and cursive script arose to make fast, continuous writing convenient. In the course of writing, strokes are therefore joined, deformed, simplified and written loosely, so that the character shapes differ from those of ordinary simplified characters written in regular script, which makes handwritten calligraphy difficult to recognize. This simplification is not arbitrary, however; it follows certain rules. Since ancient times the simplification rules of cursive script have not been fixed, but over their gradual evolution a conventional cursive form has been settled, by common practice, for a given character or a given structure. In the evolution of script styles, running and cursive script developed a distinctive technical system of their own, the most important parts of which are the stroke order and the simplification into cursive symbols.

Adjusting the stroke order makes running and cursive script more natural and convenient to write continuously. For example, the heart radical (竖心旁, 忄) changes from "first the left dot, then the right dot, and finally the vertical stroke with a 'hanging dew' ending (垂露竖)" to "first a short vertical stroke, then a turn of the brush into a short horizontal stroke, then a flick of the brush toward the upper left, continuing into a long vertical stroke". Cursive symbols are concise symbols written in place of the radicals of regular script; these cursive components reflect regularities summarized by calligraphers over successive dynasties and have continued to develop and evolve. In modern times, Yu Youren (于右任) consolidated the ways of writing these cursive components into standard cursive symbols, and the book 《草书字法解析》 (Analysis of Cursive Script Character Forms) later extended them to 71 radical cursive symbols and 355 character-root cursive symbols. Ordinary readers also need some of this prior knowledge before they can read handwritten running and cursive calligraphy, so introducing information about the stroke order and symbols of running and cursive calligraphy is essential to calligraphy recognition, and especially to the recognition of calligraphy videos. Compared with ordinary image features, video features additionally contain the temporal changes of image features, so videos of calligraphy being written better reflect the stroke-order information of running and cursive script. Many mature methods already exist in the fields of video action recognition and video classification. In recent years, neural networks have achieved results that nearly surpass human performance on computer vision tasks such as image recognition and object detection, and researchers increasingly use neural networks for video tasks as well, for example networks based on three-dimensional convolution and two-stream networks.

However, research on calligraphy video recognition remains scarce.

Summary of the invention

In view of the above problems, the object of the present invention is to provide a calligraphy video recognition method, system, storage medium and computing device that improve the accuracy of video recognition of running and cursive calligraphy.

To achieve the above object, the present invention adopts the following technical solution: a calligraphy video recognition method, comprising: acquiring and processing initial calligraphy video data; collecting prior knowledge of the stroke order and cursive symbols of running and cursive script; extracting key-frame pictures from the initial calligraphy video data using the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multidimensional vector for each text, and concatenating the multidimensional vectors generated from the pictures in time order to form a video vector; and performing dimensionality-reduction visualization on the video vector to complete classification and recognition.

Further, acquiring and processing the initial calligraphy video data comprises: crawling the initial calligraphy video data with a web crawler; selecting videos that are clear and in which occlusion of the written characters does not exceed a preset range; and clipping single-character videos out of the selected videos.

Further, the prior knowledge comprises stroke-order information in which running and cursive script differ from regular script.

Further, extracting key-frame pictures from the initial calligraphy video data using the prior knowledge comprises: calling the opencv package to capture video frames at a preset interval and save them as pictures; obtaining the approximate progress positions of the key frames in the video according to the stroke-order information and cursive symbols of running and cursive script; and automatically selecting a fixed number of key-frame pictures from each calligraphy video according to the key frames.

Further, converting the key-frame pictures into text comprises: converting the feature information of the pixels of each picture into text for storage; standardizing each picture to a fixed length and width and converting it to grayscale; and extracting the image value matrix of the picture, generating its transpose, and concatenating the image value matrix with its transpose to obtain the text of the picture.

Further, forming the video vector comprises: calling the gensim package and using the Doc2Vec document embedding model to vectorize the picture text, with the length of the text vector and the window parameter set in advance; traversing vector dimensions and window parameters to determine the optimal parameters for the Doc2Vec document embedding model; and concatenating the vectors generated from the pictures of the same video in time order to form the video vector.

Further, performing dimensionality-reduction visualization on the video vector comprises: performing manifold learning on the video vectors for dimensionality-reduction visualization, converting the high-dimensional matrix into a set of two-dimensional vectors, and plotting each document as a scatter point; in the plot of the dimensionality-reduction result, vectors of the same character cluster close together, and classification and recognition are completed from the resulting plot.

A calligraphy video recognition system, comprising: an initial data acquisition module, a prior knowledge collection module, a text conversion module, a vectorization module and a recognition module; the initial data acquisition module is configured to acquire and process initial calligraphy video data; the prior knowledge collection module is configured to collect prior knowledge of the stroke order and cursive symbols of running and cursive script; the text conversion module extracts key-frame pictures from the initial calligraphy video data using the prior knowledge and converts the key-frame pictures into text; the vectorization module vectorizes the text of each picture to obtain a multidimensional vector for each text, and concatenates the multidimensional vectors generated from the pictures in time order to form a video vector; the recognition module performs dimensionality-reduction visualization on the video vector to complete classification and recognition.

A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any of the above methods.

A computing device, comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the above methods.

By adopting the above technical solutions, the present invention has the following advantages:

1. The present invention elevates the problem of AI-based static character recognition to a problem of recognizing running and cursive characters with the aid of prior knowledge of dynamic stroke order.

2. The present invention introduces prior knowledge such as the stroke order and cursive symbols of running and cursive script, improving the accuracy of video recognition of running and cursive calligraphy.

3. The present invention uses an unsupervised algorithm for the embedding training of videos and images, which facilitates wider application of the invention.

Description of the drawings

Figure 1 is a schematic flow chart of a calligraphy video recognition method in an embodiment of the present invention;

Figure 2 is a schematic flow chart of a method for recognizing running script and cursive script in an embodiment of the present invention;

Figure 3 is a schematic diagram of crawled videos stored on a local disk in an embodiment of the present invention;

Figure 4 is a schematic diagram of prior knowledge of cursive symbols in an embodiment of the present invention;

Figure 5 is a schematic structural diagram of a computing device in an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the described embodiments fall within the scope of protection of the present invention.

It should be noted that the terminology used here serves only to describe specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used here, singular forms are intended to include plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprise" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

The present invention is a method that applies video embedding, based on prior knowledge of cursive stroke order and cursive symbols, to recognize videos of running and cursive calligraphy; it involves techniques and methods such as online video acquisition, video processing, video embedding and picture feature extraction. The invention concerns only the extraction of time-series visual-modality information from video: it must disregard background changes in the video and the occlusion, jitter and viewpoint changes introduced by filming, and accurately identify the character-writing information in the calligraphy video. The focus of the invention is the recognition of calligraphy videos, elevating the problem of AI-based static character recognition to a problem of recognizing running and cursive characters with the aid of prior knowledge of dynamic stroke order. Because single-character calligraphy videos must first be segmented and building a dataset in-house is time-consuming, the method used in the present invention targets small-scale datasets and adopts an unsupervised approach.

In one embodiment of the present invention, as shown in Figure 1, a calligraphy video recognition method is provided. This embodiment is illustrated with the method applied to a terminal; it will be understood that the method can also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the two. The recognition method provided by this embodiment can be used not only for video recognition of running and cursive calligraphy but also in other fields to recognize other video data; for example, it can also recognize regular-script calligraphy videos. This embodiment takes running and cursive script as an example and places no limitation on other calligraphy types. In this embodiment, the method comprises the following steps:

Step 1. Acquire and process initial calligraphy video data;

Step 2. Collect prior knowledge of the stroke order and cursive symbols of running and cursive script;

Step 3. Extract key-frame pictures from the initial calligraphy video data using the prior knowledge, and convert the key-frame pictures into text;

Step 4. Vectorize the text of each picture to obtain a multidimensional vector for each text, and concatenate the multidimensional vectors generated from the pictures in time order to form a video vector;

Step 5. Perform dimensionality-reduction visualization on the video vector to complete classification and recognition.

In a preferred embodiment, acquiring and processing the initial calligraphy video data in step 1 comprises the following steps:

Step 11. Crawl the initial calligraphy video data with a web crawler;

Calligraphy writing videos on short-video websites are crawled and cached in local files; part of the result is shown in Figure 3.

Step 12. To ensure good training results, select videos that are clear and in which occlusion of the written characters does not exceed a preset range; in this embodiment the preset occlusion range is preferably 25%;

Step 13. Clip single-character videos out of the selected videos so that each video contains the writing process of only one Chinese character, name the videos, and cut away the user watermark at the beginning and end of each video.

Specifically: in this embodiment a web crawler is used to crawl the initial calligraphy video data, which is then processed. Today's short-video websites host a large number of calligraphy writing videos, but the filming quality of these videos is uneven, and every downloaded video ends with a few seconds of user watermark. The downloaded videos therefore need to be processed: they are screened manually to obtain videos that are clear and have no significant occlusion of the written content, and the final few seconds of watermark are removed from all of them. Because some videos contain the writing process of several characters, whereas this embodiment processes and recognizes single-character videos, such videos must be clipped.
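As a non-limiting illustration of steps 12 and 13, the following Python sketch applies the 25% occlusion threshold and computes which frames remain once a trailing watermark is cut. The `VideoInfo` record, its field names and the `tail_seconds` parameter are hypothetical; the embodiment screens videos manually, so the occluded fraction is assumed here to be a manually estimated value.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical record describing one downloaded clip."""
    name: str
    frame_count: int
    fps: float
    occluded_fraction: float  # manually estimated share of the character that is hidden, 0..1


def passes_occlusion_check(video: VideoInfo, max_occlusion: float = 0.25) -> bool:
    """Keep a clip only if occlusion does not exceed the preset range (25% here)."""
    return video.occluded_fraction <= max_occlusion


def frames_to_keep(video: VideoInfo, tail_seconds: float) -> range:
    """Frame indices that remain after the trailing watermark is cut away."""
    tail_frames = int(round(tail_seconds * video.fps))
    last = max(video.frame_count - tail_frames, 0)
    return range(0, last)
```

For a 300-frame clip at 30 frames per second with a 3-second watermark, `frames_to_keep` keeps frames 0 to 209.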

In a preferred embodiment, prior knowledge of the stroke order and cursive symbols of running and cursive script is collected in step 2; the prior knowledge comprises stroke-order information in which running and cursive script differ from regular script.

Specifically: with reference to the web pages 草书偏旁部首大全 (a complete collection of cursive radicals) and 最全草书偏旁写法 (the most complete guide to writing cursive radicals), and to books including Yu Youren's 《标准草书》 (Standard Cursive Script), Liu Dongqin's 《草书字法解析》 (Analysis of Cursive Script Character Forms) and Sun Baowen's 《行书实用字典》 (A Practical Dictionary of Running Script), and in combination with everyday writing habits, commonly used stroke-order information and cursive symbols in which running and cursive script differ from regular script are collected.

Because the standard reference works, in setting standards for cursive symbols, lean toward avoiding confusion among cursive forms, they favor symbols with a unique correspondence; actual usage, however, does not follow this. The present invention therefore takes everyday usage into account and summarizes the cursive symbols commonly used in running and cursive script, together with the radicals they represent and the characters in which they are used. Because the present invention focuses on conveying the idea of the solution, only 35 groups of commonly used cursive symbols were collected as an example.

Because the stroke-order information of running and cursive script is collected, the method of the present invention can not only recognize single-character videos of running and cursive calligraphy but also distinguish, for a given character, a regular-script calligraphy video from a running or cursive calligraphy video.
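One possible way to hold such prior knowledge in a program is sketched below. The `StrokeOrderPrior` record and the `collect_priors` helper are hypothetical names; the single entry encodes the heart-radical example from the background section, and the patent's own 35 symbol groups are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class StrokeOrderPrior:
    """Hypothetical record: one component and its two stroke orders."""
    component: str
    regular_order: Tuple[str, ...]
    cursive_order: Tuple[str, ...]

    def differs_from_regular(self) -> bool:
        return self.regular_order != self.cursive_order


# Heart radical: regular script writes two dots then the vertical stroke;
# running and cursive script start with a short vertical stroke instead.
HEART_RADICAL = StrokeOrderPrior(
    component="忄",
    regular_order=("left dot", "right dot", "vertical"),
    cursive_order=("short vertical", "short horizontal", "long vertical"),
)


def collect_priors(candidates: Iterable[StrokeOrderPrior]) -> Dict[str, StrokeOrderPrior]:
    """Keep only components whose cursive stroke order differs from regular script."""
    return {p.component: p for p in candidates if p.differs_from_regular()}
```

Filtering on `differs_from_regular` mirrors the text above, which collects only the stroke orders that differ from regular script.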

In a preferred embodiment, extracting key-frame pictures from the initial calligraphy video data using the prior knowledge in step 3 comprises the following steps:

Step 311. Call the opencv package to capture video frames at a preset interval and save them as pictures;

Step 312. Obtain the approximate progress positions of the key frames in the video according to the stroke-order information and cursive symbols of running and cursive script;

Although video length and the number of extracted pictures differ, a person writes each stroke at a similar speed. For the set of writing videos of each character to be recognized, an approximate series of progress positions of the key frames in the video is set according to the stroke-order rules and cursive symbols of running and cursive script.

Step 313. Automatically select a fixed number of key-frame pictures from each calligraphy video according to the key frames.

Specifically: because of training time, it is not feasible to extract and train on every frame of a video, so this embodiment uses the prior knowledge collected above to extract the key frames of single-character calligraphy videos. For the set of writing videos of each character to be recognized, an approximate series of progress positions of the key frames in the video is set according to the stroke-order rules and cursive symbols of running and cursive script. According to the key frames so set, a fixed number of key-frame pictures is automatically selected from each calligraphy video.
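Steps 311 to 313 can be sketched as follows. The sketch assumes that every stroke takes an equal share of the writing time, a simplification of the remark that strokes are written at a similar speed, and places one key frame at the end of each stroke; in the embodiment the positions are set from the prior knowledge rather than computed this way. All function names are hypothetical, and OpenCV is imported only inside the function that reads the video.

```python
from typing import List, Sequence


def stroke_progress_positions(n_strokes: int) -> List[float]:
    """Approximate progress (0..1) at which each stroke ends, assuming
    every stroke takes a similar share of the writing time."""
    if n_strokes < 1:
        raise ValueError("n_strokes must be >= 1")
    return [(k + 1) / n_strokes for k in range(n_strokes)]


def keyframe_indices(n_sampled: int, positions: Sequence[float]) -> List[int]:
    """Map progress positions to indices into the list of sampled frames."""
    if n_sampled < 1:
        raise ValueError("no sampled frames")
    return [min(int(round(p * (n_sampled - 1))), n_sampled - 1) for p in positions]


def sample_frames(video_path: str, interval: int) -> list:
    """Read every `interval`-th frame of the video with OpenCV."""
    import cv2  # requires the opencv-python package

    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % interval == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames


def extract_keyframes(video_path: str, interval: int, positions: Sequence[float]) -> list:
    """Return a fixed number of key frames, one per progress position."""
    frames = sample_frames(video_path, interval)
    return [frames[i] for i in keyframe_indices(len(frames), positions)]
```

Because the number of positions is fixed per character, every video of that character yields the same number of key frames regardless of its length. The returned frames could then be saved as pictures with `cv2.imwrite`.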

In a preferred embodiment, converting the key-frame pictures into text in step 3 comprises the following steps:

Step 321. Convert the feature information of the pixels of each picture into text for storage;

Step 322. Standardize each picture to a fixed length and width and convert it to grayscale;

Step 323. Extract the image value matrix (that is, the pixel matrix) of the picture, generate its transpose, and concatenate the image value matrix with its transpose to obtain the text of the picture.

This embodiment uses the Doc2Vec algorithm, an unsupervised learning method, so the key-frame pictures must first be converted into text. Each picture is first converted to grayscale so that it contains only brightness information and no redundant color information. A white pixel has the value 255, a black pixel has the value 0, and the values in between are shades of gray. The image value matrix of the picture is extracted and its transpose is generated; to capture the horizontal and vertical features of the picture at the same time, the image value matrix and its transpose are concatenated. The resulting text form of the picture is saved in a txt file and stored locally.
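A minimal sketch of steps 321 to 323 follows. The output format (one matrix row per line, pixel values separated by spaces) and the 64-pixel side length are assumptions, since the source specifies only a fixed length and width; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def transpose(matrix: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix."""
    return [list(column) for column in zip(*matrix)]


def matrix_to_text(matrix: Matrix) -> str:
    """Rows of the matrix followed by rows of its transpose, one row per line,
    so the text carries both horizontal and vertical pixel sequences."""
    rows = matrix + transpose(matrix)
    return "\n".join(" ".join(str(value) for value in row) for row in rows)


def picture_to_text(image_path: str, size: int = 64) -> str:
    """Load a picture as grayscale, resize it to size x size, and textualize it."""
    import cv2  # requires the opencv-python package

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    image = cv2.resize(image, (size, size))
    return matrix_to_text(image.tolist())
```

In this format each pixel value acts as a "word" and each row or column as a sequence of words, which is the form a document embedding model expects.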

In a preferred embodiment, forming the video vector in step 4 comprises the following steps:

Step 41. Call the gensim package and use the Doc2Vec document embedding model to vectorize the picture text, with the length of the text vector and the window parameter set in advance;

The Doc2Vec model can create a fixed-length vector representation of a document regardless of the document's length. The Doc2Vec function in the gensim package is used: the text representation of each picture is passed to it, with parameters such as the document vector length and the window preset.

Step 42. Traverse vector dimensions and window parameters to determine the optimal parameters for the Doc2Vec document embedding model;

Step 43. Concatenate the vectors generated from the pictures of the same video in time order to form the video vector.

Specifically: a Doc2Vec model is trained on the picture text; the text representation of each picture is passed to the function, with parameters such as the document vector length and the window preset. The PV-DM model in Doc2Vec is used to train the vector-space representation, and the model outputs a multidimensional vector for each text. Each dimension represents a latent feature of the image that the text represents, and together these features summarize the horizontal and vertical features of the calligraphy image.
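Steps 41 to 43 might look like the sketch below, written against the gensim 4 API (`Doc2Vec`, `TaggedDocument`, `model.dv`), where `dm=1` selects PV-DM. The values `min_count=1` and `epochs=40` and all helper names are assumptions. The source also does not state the criterion by which the optimal parameters are chosen, so `parameter_grid` only enumerates the candidate combinations to be traversed.

```python
from itertools import product
from typing import Dict, Iterable, List, Sequence, Tuple


def parameter_grid(vector_sizes: Iterable[int], windows: Iterable[int]) -> List[Tuple[int, int]]:
    """All (vector_size, window) pairs to traverse when tuning Doc2Vec."""
    return list(product(vector_sizes, windows))


def concatenate_in_time_order(frame_vectors: Dict[int, Sequence[float]]) -> List[float]:
    """Join per-picture vectors, keyed by key-frame index, into one video vector."""
    video_vector: List[float] = []
    for frame_index in sorted(frame_vectors):
        video_vector.extend(frame_vectors[frame_index])
    return video_vector


def train_picture_embeddings(texts: Dict[str, str], vector_size: int, window: int) -> Dict[str, List[float]]:
    """Train PV-DM on picture texts (tag -> text) and return tag -> vector."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument  # requires gensim >= 4

    documents = [TaggedDocument(words=text.split(), tags=[tag]) for tag, text in texts.items()]
    # dm=1 selects the PV-DM variant; min_count and epochs are assumed values.
    model = Doc2Vec(vector_size=vector_size, window=window, min_count=1, dm=1, epochs=40)
    model.build_vocab(documents)
    model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)
    return {tag: model.dv[tag].tolist() for tag in texts}
```

Because every video contributes the same fixed number of key frames, the concatenated video vectors all have the same length, which the dimensionality-reduction step requires.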

In a preferred embodiment, performing dimensionality-reduction visualization on the video vector in step 5 comprises the following steps:

Step 51. Perform manifold learning on the video vectors for dimensionality-reduction visualization, convert the high-dimensional matrix into a set of two-dimensional vectors, and plot each document as a scatter point;

In this embodiment, the t-SNE method is used for the dimensionality-reduction visualization.

Step 52. In the plot of the dimensionality-reduction result, vectors of the same character cluster close together; classification and recognition are completed from the resulting plot.

Specifically: after the single-character calligraphy videos have been represented as vectors by unsupervised learning, manifold learning is performed on the generated video vectors, the t-SNE method is used for dimensionality-reduction visualization, the high-dimensional matrix is converted into a set of two-dimensional vectors, and each document is plotted as a scatter point. The plot of the dimensionality-reduction result shows that vectors of the same character cluster close together.
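The reduction and plotting described above can be sketched with scikit-learn's `TSNE` and matplotlib, both assumed to be installed and imported only where they are used. The perplexity default and the function names are hypothetical; scikit-learn requires the perplexity to be smaller than the number of samples.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def reduce_to_2d(video_vectors: Sequence[Sequence[float]], perplexity: float = 5.0) -> List[Point]:
    """Embed high-dimensional video vectors in two dimensions with t-SNE."""
    import numpy as np
    from sklearn.manifold import TSNE  # requires scikit-learn

    data = np.asarray(video_vectors, dtype=float)
    # perplexity must be smaller than the number of video vectors
    embedded = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(data)
    return [(float(x), float(y)) for x, y in embedded]


def group_points_by_character(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, List[Point]]:
    """Collect the 2-D points of each character so each can be drawn as one series."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return dict(groups)


def plot_groups(groups: Dict[str, List[Point]]) -> None:
    """Draw one scatter series per character."""
    import matplotlib.pyplot as plt  # requires matplotlib

    for label, pts in groups.items():
        xs, ys = zip(*pts)
        plt.scatter(xs, ys, label=label)
    plt.legend()
    plt.show()
```

Drawing one series per character makes the clustering visible directly in the scatter plot.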

Using this vector visualization method, classification experiments can also be carried out on labeled single-character videos. Unrecognized single-character calligraphy videos can be recognized with the method of the present invention.
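The source does not spell out how a label is read off the plot. One possible reading, offered purely as an assumption, is to assign an unlabeled point to the character whose labeled points have the nearest centroid in the two-dimensional map. Since t-SNE does not project new points into an existing map, this sketch further assumes the unlabeled videos were embedded together with the labeled ones. The function names are hypothetical.

```python
from math import dist
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def centroids(groups: Dict[str, List[Point]]) -> Dict[str, Point]:
    """Mean position of the labeled points of each character."""
    return {
        label: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for label, pts in groups.items()
    }


def classify(point: Point, labeled_groups: Dict[str, List[Point]]) -> str:
    """Assumed rule: pick the character whose centroid is closest to the point."""
    centres = centroids(labeled_groups)
    return min(centres, key=lambda label: dist(point, centres[label]))
```

A k-nearest-neighbour vote over the labeled points would be an equally plausible reading; the centroid rule is used here only because it is the shortest to state.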

In one embodiment of the present invention, a calligraphy video recognition system is provided, comprising: an initial data acquisition module, a prior knowledge collection module, a text conversion module, a vectorization module and a recognition module;

the initial data acquisition module is configured to acquire and process initial calligraphy video data;

the prior knowledge collection module is configured to collect prior knowledge of the stroke order and cursive symbols of running and cursive script;

the text conversion module extracts key-frame pictures from the initial calligraphy video data using the prior knowledge and converts the key-frame pictures into text;

the vectorization module vectorizes the text of each picture to obtain a multidimensional vector for each text, and concatenates the multidimensional vectors generated from the pictures in time order to form a video vector;

the recognition module performs dimensionality-reduction visualization on the video vector to complete classification and recognition.

The system provided by this embodiment is used to carry out the method embodiments described above; for the specific flow and details, refer to those embodiments, which are not repeated here.
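The relationship between the five modules can be illustrated with a small composition sketch. The class name, field names and the choice to model each module as a plain callable are assumptions made for illustration, not the structure of the claimed system.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CalligraphyVideoRecognitionSystem:
    """Hypothetical wiring of the five modules as interchangeable callables."""
    acquire_initial_data: Callable[[], Any]
    collect_prior_knowledge: Callable[[], Any]
    convert_to_text: Callable[[Any, Any], Any]
    vectorize: Callable[[Any], Any]
    recognize: Callable[[Any], Any]

    def run(self) -> Any:
        videos = self.acquire_initial_data()
        priors = self.collect_prior_knowledge()
        texts = self.convert_to_text(videos, priors)   # key frames -> text
        video_vectors = self.vectorize(texts)          # text -> video vectors
        return self.recognize(video_vectors)           # reduction and classification
```

Passing the prior knowledge only to the text conversion module reflects the description above, in which the prior knowledge is used when extracting key frames.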

如图5所示，为本发明一实施例中提供的计算设备结构示意图，该计算设备可以是终端，其可以包括：处理器(processor)、通信接口(Communications Interface)、存储器(memory)、显示屏和输入装置。其中，处理器、通信接口、存储器通过通信总线完成相互间的通信。该处理器用于提供计算和控制能力。该存储器包括非易失性存储介质、内存储器，该非易失性存储介质存储有操作系统和计算机程序，该计算机程序被处理器执行时以实现一种识别方法；该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该通信接口用于与外部的终端进行有线或无线方式的通信，无线方式可通过WIFI、管理商网络、NFC(近场通信)或其他技术实现。该显示屏可以是液晶显示屏或者电子墨水显示屏，该输入装置可以是显示屏上覆盖的触摸层，也可以是计算设备外壳上设置的按键、轨迹球或触控板，还可以是外接的键盘、触控板或鼠标等。处理器可以调用存储器中的逻辑指令，以执行如下方法：As shown in Figure 5, it is a schematic structural diagram of a computing device provided in an embodiment of the present invention. The computing device may be a terminal, which may include: a processor (processor), a communication interface (Communications Interface), a memory (memory), a display screens and input devices. Among them, the processor, communication interface, and memory complete communication with each other through the communication bus. The processor is used to provide computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. When the computer program is executed by the processor, it implements an identification method; the internal memory is non-volatile. Provides an environment for the operation of operating systems and computer programs in permanent storage media. The communication interface is used for wired or wireless communication with external terminals. The wireless mode can be implemented through WIFI, manager network, NFC (Near Field Communication) or other technologies. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covered on the display screen, or may be a button, trackball or touch pad provided on the housing of the computing device, or may be an external Keyboard, trackpad or mouse, etc. The processor can call logical instructions in memory to perform methods such as:

acquiring and processing initial calligraphy video data; collecting prior knowledge of the stroke order and cursive symbols of running script and cursive script; extracting key-frame pictures from the initial calligraphy video data in combination with the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multi-dimensional vector for each text, and concatenating the vectors generated from the pictures in time order to form a video vector; and performing dimensionality reduction and visualization on the video vector to complete classification and recognition.

In addition, when the logical instructions in the memory described above are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Those skilled in the art will understand that the structure shown in Figure 5 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computing device to which the solution is applied. A specific computing device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.

In one embodiment of the present invention, a computer program product is provided. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided in the method embodiments above, for example: acquiring and processing initial calligraphy video data; collecting prior knowledge of the stroke order and cursive symbols of running script and cursive script; extracting key-frame pictures from the initial calligraphy video data in combination with the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multi-dimensional vector for each text, and concatenating the vectors generated from the pictures in time order to form a video vector; and performing dimensionality reduction and visualization on the video vector to complete classification and recognition.

In one embodiment of the present invention, a non-transitory computer-readable storage medium is provided. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the methods provided in the embodiments above, for example: acquiring and processing initial calligraphy video data; collecting prior knowledge of the stroke order and cursive symbols of running script and cursive script; extracting key-frame pictures from the initial calligraphy video data in combination with the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multi-dimensional vector for each text, and concatenating the vectors generated from the pictures in time order to form a video vector; and performing dimensionality reduction and visualization on the video vector to complete classification and recognition.

The implementation principles and technical effects of the computer-readable storage medium provided by the above embodiment are similar to those of the method embodiments above and are not repeated here.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A calligraphy video recognition method, characterized by comprising:

acquiring and processing initial calligraphy video data;

collecting prior knowledge of the stroke order and cursive symbols of running script and cursive script;

extracting key-frame pictures from the initial calligraphy video data in combination with the prior knowledge, and converting the key-frame pictures into text;

vectorizing the text of each picture to obtain a multi-dimensional vector for each text, and concatenating the multi-dimensional vectors generated from the pictures in time order to form a video vector;

performing dimensionality reduction and visualization on the video vector to complete classification and recognition;

wherein the prior knowledge comprises: stroke-order information in running script and cursive script that differs from regular-script writing;

and wherein extracting key-frame pictures from the initial calligraphy video data in combination with the prior knowledge comprises:

calling the opencv package to capture video frames at preset intervals and save them as pictures;

obtaining the progress positions of the key frames in the video according to the stroke-order information and cursive symbols of running script and cursive script;

automatically filtering each calligraphy video according to the key frames to obtain a fixed number of key-frame pictures.
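The key-frame extraction steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the representation of prior knowledge as fractions of the video's progress (`progress_positions`), and the PNG file naming are all assumptions made for illustration. Only `cv2.VideoCapture`, `cv2.CAP_PROP_POS_FRAMES` and `cv2.imwrite` are real OpenCV calls.

```python
from typing import List


def sample_frame_indices(total_frames: int, interval: int) -> List[int]:
    """Indices of the frames captured at a preset interval (every `interval`-th frame)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


def key_frame_indices(total_frames: int, progress_positions: List[float]) -> List[int]:
    """Map progress positions in [0, 1] to frame indices.

    `progress_positions` is a hypothetical encoding of the prior knowledge:
    the fractions of the video at which a stroke or cursive symbol is complete.
    A fixed-length list yields a fixed number of key frames per video.
    """
    last = total_frames - 1
    return [min(last, max(0, round(p * last))) for p in progress_positions]


def save_frames(video_path: str, indices: List[int], out_prefix: str) -> List[str]:
    """Save the requested frames as PNG pictures (requires opencv-python)."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    capture = cv2.VideoCapture(video_path)
    saved = []
    try:
        for index in indices:
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)  # seek to the frame
            ok, frame = capture.read()
            if not ok:
                continue  # skip frames that cannot be decoded
            path = f"{out_prefix}_{index:06d}.png"
            cv2.imwrite(path, frame)
            saved.append(path)
    finally:
        capture.release()
    return saved
```

For a 101-frame video, `key_frame_indices(101, [0.0, 0.25, 0.5, 1.0])` selects frames 0, 25, 50 and 100, which could then be passed to `save_frames`.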

2. The recognition method according to claim 1, characterized in that acquiring and processing the initial calligraphy video data comprises:

using a crawler to crawl the initial calligraphy video data;

selecting the videos whose picture is clear and in which occlusion of the written content does not exceed a preset range;

extracting single-character videos from the selected videos.

3. The recognition method according to claim 1, characterized in that converting the key-frame pictures into text comprises:

converting the feature information of the pixels of each picture into text for storage;

normalizing the picture to a fixed height and width and converting it to grayscale;

extracting the numerical matrix of the picture, generating its transpose, and concatenating the numerical matrix and its transpose to obtain the text of the picture.
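The picture-to-text conversion of claim 3 can be sketched as follows. The claim does not say how the matrix and its transpose are concatenated or how values become tokens, so this sketch assumes a row-wise concatenation with each gray value written as a decimal string. The function names and the default 32 x 32 size are hypothetical; `cv2.imread`, `cv2.IMREAD_GRAYSCALE` and `cv2.resize` are real OpenCV calls.

```python
from typing import List

Matrix = List[List[int]]


def load_gray_matrix(path: str, width: int = 32, height: int = 32) -> Matrix:
    """Read a picture as grayscale at a fixed size (requires opencv-python)."""
    import cv2  # imported lazily so the pure-Python helpers work without OpenCV

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(path)
    return cv2.resize(image, (width, height)).tolist()


def transpose(matrix: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix."""
    return [list(column) for column in zip(*matrix)]


def picture_to_text(gray: Matrix) -> List[str]:
    """Concatenate the matrix rows with the rows of its transpose as string tokens.

    Assumption: the rows of the transpose are appended after the original rows,
    so the text carries both the row-wise and the column-wise pixel sequences.
    """
    rows = gray + transpose(gray)
    return [str(value) for row in rows for value in row]
```

Under these assumptions a 2 x 2 picture `[[1, 2], [3, 4]]` becomes the token list `['1', '2', '3', '4', '1', '3', '2', '4']`.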

4. The recognition method according to claim 1, characterized in that forming the video vector comprises:

calling the gensim package and using the Doc2Vec document embedding model to vectorize the picture text, with the length and window parameters of the text vector preset;

traversing the vector dimensions and window parameters to determine the optimal parameters for the Doc2Vec document embedding model;

concatenating the vectors generated from the pictures of the same video in chronological order to form the video vector.
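The vectorization steps of claim 4 can be sketched as below, assuming gensim 4.x. The function names, `min_count=1`, `epochs=20` and the `score` callback used for the parameter search are assumptions for illustration; the patent does not state how "optimal" is measured. `Doc2Vec`, `TaggedDocument` and `infer_vector` are real gensim APIs.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def train_doc2vec(picture_texts: Dict[str, List[str]], vector_size: int, window: int):
    """Train a Doc2Vec model on picture texts keyed by a picture tag (requires gensim)."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=tokens, tags=[tag])
              for tag, tokens in picture_texts.items()]
    # min_count and epochs are illustrative choices, not values from the patent.
    return Doc2Vec(documents=corpus, vector_size=vector_size, window=window,
                   min_count=1, epochs=20)


def picture_vector(model, tokens: List[str]) -> List[float]:
    """Embed one picture text; infer_vector is stochastic unless seeded."""
    return model.infer_vector(tokens).tolist()


def best_parameters(vector_sizes: Iterable[int], windows: Iterable[int],
                    score: Callable[[int, int], float]) -> Tuple[int, int]:
    """Traverse every (vector_size, window) pair and keep the highest-scoring one.

    `score` is a hypothetical callback, for example one that trains a model
    and measures how well same-character videos cluster.
    """
    return max(itertools.product(vector_sizes, windows),
               key=lambda pair: score(*pair))


def concatenate_in_time_order(frame_vectors: Dict[int, Sequence[float]]) -> List[float]:
    """Concatenate per-picture vectors, ordered by frame index, into one video vector."""
    video_vector: List[float] = []
    for frame_index in sorted(frame_vectors):
        video_vector.extend(frame_vectors[frame_index])
    return video_vector
```

Because every video contributes the same fixed number of key-frame pictures, the concatenated video vectors all have the same length and can be stacked into one matrix for the dimensionality-reduction step.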

5. The recognition method according to claim 1, characterized in that performing dimensionality reduction and visualization on the video vector comprises:

performing manifold learning on the video vectors for dimensionality-reduction visualization, converting the high-dimensional matrix into a set of two-dimensional vectors, treating each document as a scatter point, and plotting the points as a graph;

on the resulting dimensionality-reduction map, the vectors of the same character gather in nearby regions of the map, and classification and recognition are completed based on the obtained map.
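Claim 5 can be illustrated with the sketch below. The claim names manifold learning but no specific algorithm, so t-SNE from scikit-learn is used here purely as an assumed example, and the nearest-centroid rule for reading a class off the map is likewise an assumption. Note that t-SNE has no transform for unseen points, so a new video would have to be embedded together with the labeled ones.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def reduce_to_2d(video_vectors: Sequence[Sequence[float]],
                 perplexity: float = 5.0, seed: int = 0) -> List[List[float]]:
    """Project video vectors to 2-D with t-SNE (requires numpy and scikit-learn).

    The perplexity must be smaller than the number of video vectors.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    data = np.asarray(video_vectors, dtype=float)
    model = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return model.fit_transform(data).tolist()


def centroids(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Point]:
    """Mean 2-D position of the scatter points belonging to each character."""
    sums: Dict[str, Tuple[float, float, int]] = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(point: Point, centers: Dict[str, Point]) -> str:
    """Assign a point to the character whose centroid is closest on the map."""
    return min(centers, key=lambda label: math.dist(point, centers[label]))
```

The 2-D points returned by `reduce_to_2d` can be drawn as a scatter plot, one point per video, with any plotting library.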

6. A calligraphy video recognition system, characterized by comprising: an initial data acquisition module, a prior knowledge collection module, a text conversion module, a vectorization module and a recognition module;

the initial data acquisition module is used to acquire and process initial calligraphy video data;

the prior knowledge collection module is used to collect prior knowledge of the stroke order and cursive symbols of running script and cursive script;

the text conversion module extracts key-frame pictures from the initial calligraphy video data in combination with the prior knowledge, and converts the key-frame pictures into text;

the vectorization module vectorizes the text of each picture to obtain a multi-dimensional vector for each text, and concatenates the multi-dimensional vectors generated from the pictures in time order to form a video vector;

the recognition module performs dimensionality reduction and visualization on the video vector to complete classification and recognition.
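The five modules of claim 6 can be pictured as a simple pipeline. The sketch below is a hypothetical wiring only: the class name, field names and call signatures are assumptions, and each module is represented by a plain callable so that any concrete implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CalligraphyRecognitionSystem:
    """Hypothetical composition of the five modules named in the claim."""
    acquire: Callable[[str], Any]        # initial data acquisition module
    collect_prior: Callable[[], Any]     # prior knowledge collection module
    to_text: Callable[[Any, Any], Any]   # text conversion module
    vectorize: Callable[[Any], Any]      # vectorization module
    recognize: Callable[[Any], Any]      # recognition module

    def run(self, source: str) -> Any:
        """Pass data through the modules in the order the claim lists them."""
        videos = self.acquire(source)
        prior = self.collect_prior()
        texts = self.to_text(videos, prior)
        video_vectors = self.vectorize(texts)
        return self.recognize(video_vectors)
```

Keeping each module behind a callable makes it possible to test the data flow with stubs before real crawling, OpenCV or Doc2Vec code is attached.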

7. A computer-readable storage medium storing one or more programs, characterized in that the one or more programs include instructions that, when executed by a computing device, cause the computing device to perform the method according to any one of claims 1 to 5.

8. A computing device, characterized by comprising: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the method according to any one of claims 1 to 5.