CN113591743B - Handwriting video identification method, system, storage medium and computing device - Google Patents
- Publication number
- CN113591743B (application CN202110895033.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- handwriting
- pictures
- key frame
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The application relates to a handwriting video identification method, system, storage medium and computing device. The method comprises: acquiring and processing initial handwriting video data; collecting prior knowledge of the stroke order of running and cursive script and of cursive symbols; extracting key-frame pictures from the initial handwriting video data using the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multidimensional vector for each text, and splicing the vectors generated from the pictures in time order to form a video vector; and performing dimension-reduction visualization on the video vectors to complete classification and recognition. The application can improve the accuracy of recognizing running-script and cursive-script calligraphy videos and can be widely applied in the technical field of video data recognition.
Description
Technical Field
The application relates to the technical field of video data identification, and in particular to a handwriting video identification method, system, storage medium and computing device for running-cursive script.
Background
Running-cursive script arose to make rapid, continuous writing convenient, so strokes are joined, deformed, simplified and abbreviated during writing. Its style therefore differs from ordinary regular-script writing of simplified characters, which makes it difficult to recognize. The simplification is not random, however; it follows certain rules. Historically these rules were not fixed from the outset but evolved gradually, until a cursive form became established by convention for a given character or structure. In the evolution of script styles, cursive script developed its own technical system, the most important elements of which are the adjustment of stroke order and the use of cursive symbols.
Adjusting the stroke order makes cursive writing more natural and convenient in continuous writing. For example, the vertical-heart radical (忄) is written in regular script as the left dot first, then the right dot, and finally the vertical stroke; in cursive it can instead be written as a short vertical first, then a short horizontal with a folded stroke, then a turn to the left and a long vertical. Cursive symbols are concise symbols written in place of regular-script radicals; the cursive components were summarized by calligraphers of past generations and have developed continuously. In modern times Mr. Yu Youren generalized the writing of cursive components into standard cursive symbols, and the book "Analysis of Cursive Character Methods" later expanded these into 71 radical cursive symbols and 355 radical cursive symbols.
Ordinary readers also draw on such prior knowledge when recognizing running-cursive handwriting, so introducing stroke-order and cursive-symbol information is very important in handwriting recognition, especially in the recognition of handwriting videos. Compared with ordinary image features, video features also contain the temporal change of the image features, so handwriting video better reflects the stroke-order information of running and cursive script. There are already many mature methods in the fields of video action recognition and video classification. In recent years neural networks have achieved results comparable to, or nearly better than, those of humans in computer vision tasks such as image recognition and object detection, and researchers have begun to apply neural networks to video tasks, for example three-dimensional convolutional neural networks and two-stream neural networks.
However, research in the field of handwriting video recognition remains uncommon.
Disclosure of Invention
In view of the above problems, the application aims to provide a handwriting video identification method, system, storage medium and computing device that improve the accuracy of recognizing running-script and cursive-script calligraphy videos.
To achieve the above purpose, the application adopts the following technical scheme: a handwriting video recognition method, comprising: acquiring and processing initial handwriting video data; collecting prior knowledge of the stroke order of running and cursive script and of cursive symbols; extracting key-frame pictures from the initial handwriting video data using the prior knowledge, and converting the key-frame pictures into text; vectorizing the text of each picture to obtain a multidimensional vector for each text, and splicing the vectors generated from the pictures in time order to form a video vector; and performing dimension-reduction visualization on the video vectors to complete classification and recognition.
Further, acquiring and processing the initial handwriting video data comprises: crawling the initial handwriting video data with a web crawler; selecting videos that are clear and in which occlusion of the written text portion does not exceed a preset range; and cutting single-character videos from the selected videos.
Further, the prior knowledge includes: stroke-order information of running and cursive script that differs from the regular-script way of writing.
Further, extracting key-frame pictures from the initial handwriting video data using the prior knowledge comprises: calling the OpenCV package, capturing video frames at a preset interval, and saving them as pictures; obtaining the approximate progress position of each key frame in the video according to the stroke-order information of running and cursive script and the cursive symbols; and automatically screening each handwriting video according to the key frames to obtain a fixed number of key-frame pictures.
Further, converting the key-frame pictures into text comprises: converting the pixel feature information of each picture into text for storage; normalizing the picture to a fixed height and width and converting it to grayscale; and extracting the image numerical matrix of the picture, generating its transpose, and splicing the image numerical matrix with its transpose to obtain the text of the picture.
Further, forming the video vector comprises: calling the gensim package and using the Doc2Vec document embedding model to vectorize the picture text, with the text vector length and window parameters preset; traversing the vector dimension and window parameters to determine the optimal parameters for the Doc2Vec document embedding model; and splicing the vectors generated from the pictures of the same video in time order to form the video vector.
Further, performing dimension-reduction visualization on the video vectors comprises: performing manifold learning on the video vectors for dimension-reduction visualization, converting the high-dimensional matrix into a set of two-dimensional vectors, treating each document as a scatter point, and drawing a plot; and, since vectors of the same character cluster close together on the plot of the dimension-reduction result, completing classification and recognition according to the plot.
A handwriting video recognition system, comprising: an initial data acquisition module, a prior knowledge collection module, a text conversion module, a vectorization module and an identification module; the initial data acquisition module is used for acquiring and processing initial handwriting video data; the prior knowledge collection module is used for collecting prior knowledge of the stroke order of running and cursive script and of cursive symbols; the text conversion module is used for extracting key-frame pictures from the initial handwriting video data using the prior knowledge, and converting the key-frame pictures into text; the vectorization module vectorizes the text of each picture to obtain a multidimensional vector for each text, and splices the vectors generated from the pictures in time order to form a video vector; and the identification module performs dimension-reduction visualization on the video vectors to complete classification and recognition.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described above.
A computing apparatus, comprising: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
Due to the adoption of the technical scheme, the application has the following advantages:
1. By means of dynamic stroke prior knowledge, the application elevates the problem of AI-based static character recognition to the problem of running-cursive character recognition.
2. The application introduces prior knowledge such as the stroke order of running and cursive script and cursive symbols, improving the accuracy of recognizing running-script and cursive-script calligraphy videos.
3. The application uses an unsupervised algorithm for the embedding training of videos and images, which facilitates wider application of the method.
Drawings
FIG. 1 is a flow chart of a handwriting video recognition method according to an embodiment of the application;
FIG. 2 is a schematic flow chart of a method for recognizing running and cursive script in an embodiment of the application;
FIG. 3 is a schematic diagram of crawled videos stored on a local disk in an embodiment of the application;
FIG. 4 is a schematic diagram of cursive symbol prior knowledge in an embodiment of the application;
FIG. 5 is a schematic diagram of a computing device in an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present application. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which are obtained by a person skilled in the art based on the described embodiments of the application, fall within the scope of protection of the application.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The application discloses a method for recognizing running-cursive handwriting videos by video embedding, based on prior knowledge of running-cursive stroke order and cursive symbols, and involves techniques for network video acquisition, video processing, video embedding and picture feature extraction. The application is concerned only with extracting temporal visual-pattern information from a video; it must disregard background changes as well as occlusion, shake and viewpoint changes introduced by filming in order to accurately recognize the writing information in the handwriting video. The application targets calligraphy videos and, by means of dynamic stroke prior knowledge, elevates AI-based static character recognition to running-cursive character recognition. Because the calligraphy videos must be segmented into single characters and building a database is time-consuming, the application adopts an unsupervised method suited to a small-scale data set.
In one embodiment of the application, as shown in FIG. 1, a handwriting video recognition method is provided. The embodiment is illustrated as applied to a terminal; it is understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between them. The recognition method of this embodiment can be used not only to recognize running-cursive handwriting videos but also in other fields to recognize other video data, for example videos of rough handwriting; this embodiment takes running-cursive handwriting as an example and does not limit other handwriting types. In this embodiment, the method includes the following steps:
step 1, acquiring and processing initial handwriting video data;
step 2, collecting prior knowledge of the stroke order of running and cursive script and of cursive symbols;
step 3, extracting key-frame pictures from the initial handwriting video data using the prior knowledge, and converting the key-frame pictures into text;
step 4, vectorizing the text of each picture to obtain a multidimensional vector for each text, and splicing the vectors generated from the pictures in time order to form a video vector;
step 5, performing dimension-reduction visualization on the video vectors to complete classification and recognition.
In a preferred embodiment, acquiring and processing the initial handwriting video data in step 1 comprises the following steps:
step 11, crawling initial handwriting video data with a web crawler;
handwriting videos are crawled from a short-video website and cached in local files; partial results are shown in FIG. 3.
Step 12, to ensure the training effect, selecting videos that are clear and in which occlusion of the written text portion does not exceed a preset range; in this embodiment, the preset range is preferably 25% occlusion;
step 13, cutting single-character videos from the selected videos so that each video contains the writing process of only one Chinese character, naming the videos, and cutting away the user watermarks at the beginning and end of the videos.
Specifically, this embodiment uses a web crawler to crawl the initial handwriting video data, which is then processed. Short-video websites now host a large number of handwriting videos, but these videos vary in shooting quality, and every downloaded video ends with a few seconds of user watermark. The downloaded videos therefore need to be processed: videos that are clear and show no significant occlusion of the written content are selected manually, and the final seconds of watermark are uniformly deleted. Because a single video may contain the writing of several characters, while this embodiment processes and recognizes single-character videos, the videos are also cut into single-character clips.
In a preferred embodiment, prior knowledge of the stroke order of running and cursive script and of cursive symbols is collected in step 2, wherein the prior knowledge comprises stroke-order information of running and cursive script that differs from the regular-script way of writing.
Specifically, common stroke-order information and cursive symbols that differ from the regular-script way of writing are collected by combining everyday writing habits with reference materials such as web pages on cursive radicals and their writing methods, Yu Youren's "Standard Cursive Script", Liu Dongqin's "Analysis of Cursive Character Methods", and Sun Baowen's "Practical Dictionary of Cursive Script".
When defining cursive symbol standards, the standard references aim to avoid confusion between cursive forms and therefore favor symbols with a unique correspondence, but actual usage does not follow this. The application therefore draws on everyday usage to summarize the cursive symbols commonly used in running and cursive script, together with their representative radicals and example characters. Since the application focuses on the approach to the problem, only 35 groups of common cursive symbols are collected as an example.
Because stroke-order information of running-cursive script is collected, the method can not only recognize single-character videos of running-cursive calligraphy but also distinguish, for a given character, regular-script calligraphy from running-cursive calligraphy.
In a preferred embodiment, extracting key-frame pictures from the initial handwriting video data using the prior knowledge in step 3 comprises the following steps:
step 311, calling the OpenCV package, capturing video frames at a preset interval, and saving them as pictures;
step 312, obtaining the approximate progress position of each key frame in the video according to the stroke-order information of running and cursive script and the cursive symbols;
although the video lengths and the numbers of extracted pictures differ, a person writes each stroke at a similar speed. For the group of writing videos of each character to be recognized, the approximate progress positions of a series of key frames in the videos are set according to the stroke-order rules of running and cursive script and the cursive symbols.
Step 313, automatically screening each handwriting video according to the key frames to obtain a fixed number of key-frame pictures.
Specifically, because of training time, it is not feasible to extract and train on every frame of a video, so this embodiment uses the prior knowledge to extract key frames from the single-character handwriting videos. For the group of writing videos of each character to be recognized, the approximate progress positions of a series of key frames are set according to the stroke-order rules of running and cursive script and the cursive symbols. Each handwriting video is then screened automatically at the set key-frame positions to obtain a fixed number of key-frame pictures.
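The key-frame selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `keyframe_indices` and `extract_keyframes`, and the representation of progress positions as fractions between 0.0 and 1.0, are assumptions made for the example. OpenCV is imported inside the function that needs it, so the index calculation can be used on its own.

```python
from typing import List, Sequence


def keyframe_indices(total_frames: int, progress: Sequence[float]) -> List[int]:
    """Map relative progress positions (0.0 to 1.0) to frame indices."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    last = total_frames - 1
    # Clamp so that positions slightly outside [0, 1] still give valid indices.
    return [min(last, max(0, round(p * last))) for p in progress]


def extract_keyframes(video_path: str, progress: Sequence[float]) -> list:
    """Read the frames at the given progress positions from one video."""
    import cv2  # OpenCV, imported lazily; only needed when reading real videos

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in keyframe_indices(total, progress):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the key frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```

Because the same list of progress positions is applied to every video of a character, each video yields the same number of key frames regardless of its length, which is what the later splicing step relies on.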
In a preferred embodiment, converting the key-frame pictures into text in step 3 comprises the following steps:
step 321, converting the pixel feature information of each picture into text for storage;
step 322, normalizing the picture to a fixed height and width and converting it to grayscale;
step 323, extracting the image numerical matrix (i.e. the pixel matrix) of the picture, generating its transpose, and splicing the image numerical matrix with its transpose to obtain the text of the picture.
This embodiment uses the unsupervised Doc2Vec algorithm, so the key-frame pictures are first converted into text. The picture is first converted to grayscale so that the image contains only brightness information and no redundant color information. A white pixel has the value 255, a black pixel has the value 0, and values in between are shades of gray. The image numerical matrix of the picture is then extracted and its transpose generated; the two are spliced together so that the horizontal and vertical features of the picture are captured at the same time. The resulting picture text is saved locally in a txt file.
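The conversion of a grayscale picture to text can be sketched as below. The source does not specify the exact text layout, so this example assumes that each pixel value becomes one whitespace-separated token, that each matrix row becomes one line, and that "splicing" means appending the rows of the transpose after the rows of the original matrix. The names `matrix_to_text` and `picture_to_text` and the default size of 64 are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def transpose(matrix: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix given as nested lists."""
    return [list(col) for col in zip(*matrix)]


def matrix_to_text(matrix: Matrix) -> str:
    """Render the matrix rows followed by the rows of its transpose as text."""
    rows = matrix + transpose(matrix)  # horizontal features, then vertical ones
    return "\n".join(" ".join(str(v) for v in row) for row in rows)


def picture_to_text(image_path: str, size: int = 64) -> str:
    """Load a picture as grayscale, normalize its size, and convert it to text."""
    import cv2  # OpenCV, imported lazily; only needed when reading real pictures

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    gray = cv2.resize(gray, (size, size))  # assumed fixed width and height
    return matrix_to_text(gray.tolist())
```

For a 2x2 matrix `[[0, 255], [128, 64]]`, `matrix_to_text` yields four lines: the two original rows followed by the two columns read as rows.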
In a preferred embodiment, forming the video vector in step 4 comprises the following steps:
step 41, calling the gensim package and using the Doc2Vec document embedding model to vectorize the picture text, with the text vector length and window parameters preset;
the Doc2Vec model can create a fixed-length vector representation of a document regardless of the document's length. The Doc2Vec function in the gensim package is used: the text representation of each picture is input to the function, and parameters such as the document vector length and the window are preset.
Step 42, traversing the vector dimension and window parameters to determine the optimal parameters for the Doc2Vec document embedding model;
step 43, splicing the vectors generated from the pictures of the same video in time order to form the video vector.
Specifically, a Doc2Vec model is trained on the picture texts: the text representation of each picture is input to the function, and parameters such as the document vector length and the window are preset. The PV-DM model in Doc2Vec is used to train the vector space representation. The model outputs a multidimensional vector for each text; each dimension represents a hidden feature of the image that the text represents, and together the dimensions summarize the horizontal and vertical features of the calligraphy image.
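A minimal sketch of the vectorization and splicing steps, assuming gensim 4.x, is shown below. The function names, the default `vector_size=50` and `window=5`, and the choice of 40 epochs are assumptions for illustration; the source only states that the vector length and window are preset and then traversed. `dm=1` selects the PV-DM variant named in the text.

```python
from typing import Dict, List, Sequence


def train_doc2vec(texts: Dict[str, str], vector_size: int = 50, window: int = 5):
    """Train a PV-DM Doc2Vec model on picture texts keyed by a picture tag."""
    # gensim is imported lazily; it is only needed for training.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(words=text.split(), tags=[tag])
            for tag, text in texts.items()]
    model = Doc2Vec(vector_size=vector_size, window=window,
                    dm=1, min_count=1, epochs=40)  # dm=1 means PV-DM
    model.build_vocab(docs)
    model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
    return model


def video_vector(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-picture vectors, already in time order, into one vector."""
    out: List[float] = []
    for vec in frame_vectors:
        out.extend(float(x) for x in vec)
    return out
```

In gensim 4.x the trained vector of a picture is available as `model.dv[tag]`, so a video vector could be built with `video_vector([model.dv[t] for t in tags_in_time_order])`. The parameter traversal of step 42 could be a loop over candidate `vector_size` and `window` values that retrains the model each time; the source does not state the criterion used to pick the optimum, so none is assumed here.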
In a preferred embodiment, performing dimension-reduction visualization on the video vectors in step 5 comprises the following steps:
step 51, performing manifold learning on the video vectors for dimension-reduction visualization, converting the high-dimensional matrix into a set of two-dimensional vectors, treating each document as a scatter point, and drawing a plot;
in this embodiment, the t-SNE method is used for dimension-reduction visualization.
Step 52, since vectors of the same character cluster close together on the plot of the dimension-reduction result, completing classification and recognition according to the plot.
Specifically, after the single-character handwriting videos have been represented as vectors by unsupervised learning, manifold learning is performed on the generated video vectors: the t-SNE method is used for dimension-reduction visualization, the high-dimensional matrix is converted into a set of two-dimensional vectors, each document is treated as a scatter point, and a plot is drawn. On the plot of the dimension-reduction result, the vectors of the same character can be seen to cluster close together.
With this vector visualization method, classification experiments can be performed on labeled single-character videos, and unidentified single-character handwriting videos can also be recognized.
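The dimension-reduction and recognition step can be sketched as follows, assuming scikit-learn's `TSNE`. The source completes classification "according to the plot" without naming a rule, so the nearest-centroid assignment here is a swapped-in assumption for illustration, as are the function names and the default perplexity.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def reduce_to_2d(video_vectors: Sequence[Sequence[float]], perplexity: float = 5.0):
    """Project video vectors to two dimensions with t-SNE."""
    # Imported lazily; only needed when running the actual projection.
    import numpy as np
    from sklearn.manifold import TSNE

    X = np.asarray(video_vectors, dtype=float)
    # perplexity should be smaller than the number of videos
    return TSNE(n_components=2, perplexity=perplexity,
                random_state=0).fit_transform(X)


def centroids(points: Sequence[Point], labels: Sequence[str]) -> Dict[str, Point]:
    """Average the 2-D points of each labeled character."""
    groups = defaultdict(list)
    for p, label in zip(points, labels):
        groups[label].append(p)
    return {label: (sum(x for x, _ in ps) / len(ps),
                    sum(y for _, y in ps) / len(ps))
            for label, ps in groups.items()}


def nearest_label(point: Point, cents: Dict[str, Point]) -> str:
    """Assign an unlabeled point to the character with the closest centroid."""
    return min(cents, key=lambda label: dist(point, cents[label]))
```

scikit-learn's `TSNE` has no separate `transform` method for new samples, so in this sketch an unidentified video would be embedded together with the labeled videos in one `reduce_to_2d` call and then assigned with `nearest_label`.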
In one embodiment of the application, a handwriting video recognition system is provided, comprising: an initial data acquisition module, a prior knowledge collection module, a text conversion module, a vectorization module and an identification module;
the initial data acquisition module is used for acquiring and processing initial handwriting video data;
the prior knowledge collection module is used for collecting prior knowledge of the stroke order of running and cursive script and of cursive symbols;
the text conversion module is used for extracting key-frame pictures from the initial handwriting video data using the prior knowledge, and converting the key-frame pictures into text;
the vectorization module vectorizes the text of each picture to obtain a multidimensional vector for each text, and splices the vectors generated from the pictures in time order to form a video vector;
the identification module performs dimension-reduction visualization on the video vectors to complete classification and recognition.
The system provided in this embodiment is used to execute the above method embodiments; for the specific flow and details, refer to the above embodiments, which are not repeated here.
Fig. 5 is a schematic structural diagram of a computing device provided in an embodiment of the present application. The computing device may be a terminal and may include a processor, a communication interface, a memory, a display screen, and an input device. The processor, the communication interface, and the memory communicate with one another through a communication bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and a computer program which, when executed by the processor, implements the identification method; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface is used for wired or wireless communication with an external terminal; the wireless mode may be implemented through WiFi, a carrier network, NFC (near field communication), or other technologies. The display screen may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad arranged on the housing of the computing device; or an external keyboard, touchpad, or mouse. The processor may call logic instructions in the memory to perform the following method:
acquiring and processing initial handwriting video data; collecting prior knowledge of the stroke order of cursive script and of cursive script symbols; extracting key-frame pictures from the initial handwriting video data in combination with the prior knowledge, and converting the key-frame pictures into texts; vectorizing the texts of the pictures to obtain a multidimensional vector for each text, and splicing the multidimensional vectors generated from the pictures in time order to form a video vector; and performing dimension-reduction visualization on the video vectors to complete classification and recognition.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It will be appreciated by those skilled in the art that the architecture shown in fig. 5 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computing devices to which the present inventive arrangements may be applied, and that a particular computing device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment of the present application, there is provided a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the method embodiments described above, for example: acquiring and processing initial handwriting video data; collecting prior knowledge of the stroke order of cursive script and of cursive script symbols; extracting key-frame pictures from the initial handwriting video data in combination with the prior knowledge, and converting the key-frame pictures into texts; vectorizing the texts of the pictures to obtain a multidimensional vector for each text, and splicing the multidimensional vectors generated from the pictures in time order to form a video vector; and performing dimension-reduction visualization on the video vectors to complete classification and recognition.
In one embodiment of the present application, there is provided a non-transitory computer-readable storage medium storing server instructions that cause a computer to perform the methods provided by the above embodiments, for example: acquiring and processing initial handwriting video data; collecting prior knowledge of the stroke order of cursive script and of cursive script symbols; extracting key-frame pictures from the initial handwriting video data in combination with the prior knowledge, and converting the key-frame pictures into texts; vectorizing the texts of the pictures to obtain a multidimensional vector for each text, and splicing the multidimensional vectors generated from the pictures in time order to form a video vector; and performing dimension-reduction visualization on the video vectors to complete classification and recognition.
The foregoing embodiment provides a computer readable storage medium, which has similar principles and technical effects to those of the foregoing method embodiment, and will not be described herein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (8)
1. A method for identifying a calligraphic video, comprising:
acquiring and processing initial handwriting video data;
collecting prior knowledge of the stroke order of cursive script and of cursive script symbols;
extracting key-frame pictures from the initial handwriting video data in combination with the prior knowledge, and converting the key-frame pictures into texts;
vectorizing the texts of the pictures to obtain a multidimensional vector for each text, and splicing the multidimensional vectors generated from the pictures in time order to form a video vector;
performing dimension-reduction visualization on the video vectors to complete classification and recognition;
wherein the prior knowledge includes: stroke order information in running script and cursive script that differs from the way regular script is written;
and wherein extracting the key-frame pictures from the initial handwriting video data in combination with the prior knowledge comprises:
calling the OpenCV package, capturing video frames at preset intervals, and saving the video frames as pictures;
obtaining the progress positions of key frames in the video according to the stroke order information of running script and cursive script and the cursive script symbols;
and automatically screening each handwriting video according to the key frames to obtain a fixed number of key-frame pictures.
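The frame-sampling and key-frame screening steps of claim 1 might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the encoding of the prior knowledge as fractional progress positions (0.0 to 1.0), and the output file naming are all hypothetical; only the use of OpenCV and of a preset sampling interval comes from the text.

```python
import os


def sample_indices(total_frames, interval):
    """Indices of the frames captured at a preset interval (in frames)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


def pick_key_frames(sampled, total_frames, progress_positions):
    """For each progress position, pick the nearest sampled frame index.

    progress_positions is a hypothetical encoding of the prior knowledge:
    fractions (0.0-1.0) of the video at which informative strokes are
    expected. The result has a fixed length, one index per position.
    """
    if not sampled:
        return []
    last = max(total_frames - 1, 0)
    return [min(sampled, key=lambda i: abs(i - p * last))
            for p in progress_positions]


def save_frames(video_path, frame_indices, out_dir):
    """Write the chosen frames to disk as pictures using OpenCV."""
    import cv2  # imported lazily so the helpers above run without OpenCV

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = []
    try:
        for idx in frame_indices:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the frame
            ok, frame = cap.read()
            if not ok:
                continue  # skip frames that cannot be decoded
            path = os.path.join(out_dir, f"frame_{idx:06d}.png")
            cv2.imwrite(path, frame)
            saved.append(path)
    finally:
        cap.release()
    return saved
```

In practice the total frame count could be read from the capture object with `cap.get(cv2.CAP_PROP_FRAME_COUNT)`. Using the same list of progress positions for every video is what keeps the number of key-frame pictures per video fixed.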
2. The identification method of claim 1, wherein the acquiring and processing of the initial handwriting video data comprises:
crawling initial handwriting video data by adopting a crawler;
selecting videos that are visually clear, in which the text portion exceeds a preset range, and in which the written content is not occluded;
and intercepting the single-word video in the screened video.
3. The identification method of claim 1, wherein said converting the key-frame pictures into texts comprises:
converting the feature information of the pixels of each picture into text for storage;
normalizing each picture to a fixed length and width, and converting it to grayscale;
and extracting the image numerical matrix of the picture, generating the transpose of the image numerical matrix, and splicing the image numerical matrix with its transpose to obtain the text of the picture.
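The matrix-to-text conversion of claim 3 can be illustrated with a short sketch. It assumes the picture has already been resized and converted to grayscale (for example with OpenCV's `cv2.resize` and `cv2.cvtColor`) and is available as a list of rows of integer pixel values. Treating each pixel value as one "word" and the function names are assumptions; the claim does not specify the tokenization.

```python
def transpose(matrix):
    """Transpose a rectangular matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def picture_to_text(gray_matrix):
    """Turn a grayscale matrix into a token sequence.

    The tokens are the pixel values read row by row, followed by the pixel
    values of the transpose (the same picture read column by column).
    Mapping one pixel value to one token is an assumption.
    """
    rows = gray_matrix + transpose(gray_matrix)
    return [str(value) for row in rows for value in row]
```

Appending the transpose means the resulting text carries both the horizontal and the vertical neighbourhood of each pixel, which a context-window model would otherwise see only along rows.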
4. The identification method of claim 1, wherein said combining to form the video vector comprises:
calling the gensim package, using a Doc2Vec document embedding model to vectorize the picture texts, and presetting the text vector length and the window parameter;
traversing the vector dimension and the window parameter to determine the optimal parameters for the Doc2Vec document embedding model;
and splicing the vectors generated from the pictures of the same video in time order to form the video vector.
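A hedged sketch of the vectorization and splicing in claim 4 follows. It assumes gensim 4.x (where document vectors are exposed as `model.dv`); the function names, the default parameter values, and the `(timestamp, vector)` input format are illustrative assumptions rather than values from the patent.

```python
def embed_texts(token_lists, vector_size=50, window=5, epochs=20):
    """Train a Doc2Vec model on picture texts; return one vector per text.

    Assumes gensim 4.x. The default parameter values are placeholders.
    """
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument  # lazy import

    docs = [TaggedDocument(words=tokens, tags=[str(i)])
            for i, tokens in enumerate(token_lists)]
    model = Doc2Vec(docs, vector_size=vector_size, window=window,
                    min_count=1, epochs=epochs)
    return [model.dv[str(i)].tolist() for i in range(len(docs))]


def splice_video_vector(timed_vectors):
    """Concatenate per-picture vectors in time order into one video vector.

    timed_vectors: iterable of (timestamp, vector) pairs for one video.
    """
    video_vector = []
    for _, vec in sorted(timed_vectors, key=lambda pair: pair[0]):
        video_vector.extend(vec)
    return video_vector
```

The parameter traversal in the claim could be a simple loop over candidate `(vector_size, window)` pairs calling `embed_texts`; the criterion for "optimal" is not stated in the text, so it is left out here. Because every video contributes the same fixed number of key-frame pictures, the spliced video vectors all have the same length.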
5. The identification method of claim 1, wherein said performing dimension-reduction visualization on the video vectors comprises:
performing manifold learning on the video vectors for dimension-reduction visualization, converting the high-dimensional matrix into a set of two-dimensional vectors, treating each document as a scatter point, and drawing a plot;
and, in the plot of the dimension-reduction result, vectors of the same character gather at nearby positions, and classification and recognition are completed according to the resulting plot.
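The dimension-reduction step of claim 5 might look like the sketch below. The claim only says "manifold learning"; t-SNE from scikit-learn is used here as one possible choice, and the nearest-centroid rule is an assumption standing in for "classification according to the plot". All function names and default values are hypothetical.

```python
def reduce_to_2d(video_vectors, perplexity=5.0, seed=0):
    """Project video vectors to 2-D with t-SNE (one possible manifold learner).

    perplexity must be smaller than the number of videos.
    """
    import numpy as np  # lazy imports; require NumPy and scikit-learn
    from sklearn.manifold import TSNE

    data = np.asarray(video_vectors, dtype=float)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="random",
                random_state=seed)
    return tsne.fit_transform(data).tolist()


def nearest_centroid_label(point, labelled_points):
    """Label a 2-D point by the closest class centroid.

    labelled_points: dict mapping a character label to a list of (x, y)
    points already placed on the plot (an assumed input format).
    """
    def centroid(points):
        xs, ys = zip(*points)
        return sum(xs) / len(xs), sum(ys) / len(ys)

    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    centroids = {label: centroid(pts) for label, pts in labelled_points.items()}
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))
```

The two-dimensional points returned by `reduce_to_2d` can be drawn as a scatter plot with any plotting library; if videos of the same character do cluster as the claim describes, a distance-based rule such as the one above is enough to read a label off the plot.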
6. A handwriting video recognition system, comprising: the system comprises an initial data acquisition module, a priori knowledge collection module, a text conversion module, a vectorization module and an identification module;
the initial data acquisition module is used for acquiring and processing initial handwriting video data;
the prior knowledge collection module is used for collecting prior knowledge of the stroke order of cursive script and of cursive script symbols;
the text conversion module is used for extracting key-frame pictures from the initial handwriting video data in combination with the prior knowledge, and converting the key-frame pictures into texts;
the vectorization module vectorizes the texts of the pictures to obtain a multidimensional vector for each text, and splices the multidimensional vectors generated from the pictures in time order to form a video vector;
the identification module performs dimension-reduction visualization on the video vectors to complete classification and recognition;
wherein the prior knowledge includes: stroke order information in running script and cursive script that differs from the way regular script is written;
and wherein extracting the key-frame pictures from the initial handwriting video data in combination with the prior knowledge comprises:
calling the OpenCV package, capturing video frames at preset intervals, and saving the video frames as pictures;
obtaining the progress positions of key frames in the video according to the stroke order information of running script and cursive script and the cursive script symbols;
and automatically screening each handwriting video according to the key frames to obtain a fixed number of key-frame pictures.
7. A computer readable storage medium storing one or more programs, wherein the one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-5.
8. A computing device, comprising: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110895033.7A CN113591743B (en) | 2021-08-04 | 2021-08-04 | Handwriting video identification method, system, storage medium and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113591743A (en) | 2021-11-02 |
CN113591743B (en) | 2023-11-24 |
Family
ID=78255306
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110895033.7A Active CN113591743B (en) | 2021-08-04 | 2021-08-04 | Handwriting video identification method, system, storage medium and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113591743B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170083805A (en) * | 2016-01-11 | 2017-07-19 | 경북대학교 산학협력단 | Distinction method and system for characters written in caoshu characters or cursive characters |
CN108932508A (en) * | 2018-08-13 | 2018-12-04 | 杭州大拿科技股份有限公司 | A kind of topic intelligent recognition, the method and system corrected |
CN110019817A (en) * | 2018-12-04 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of detection method, device and the electronic equipment of text in video information |
CN110580352A (en) * | 2017-07-04 | 2019-12-17 | 艾朝君 | Chinese character and line book intercommunication mutual identification technical method |
CN111436005A (en) * | 2019-01-15 | 2020-07-21 | 北京字节跳动网络技术有限公司 | Method and apparatus for displaying image |
CN111881310A (en) * | 2019-12-07 | 2020-11-03 | 杭州华冬人工智能有限公司 | Chinese character hard-stroke writing intelligent guidance and scoring method and guidance scoring system |
CN112015955A (en) * | 2020-09-01 | 2020-12-01 | 清华大学 | Multi-mode data association method and device |
CN112036522A (en) * | 2020-07-20 | 2020-12-04 | 上海卓希智能科技有限公司 | Calligraphy individual character evaluation method, system and terminal based on machine learning |
CN112183335A (en) * | 2020-09-28 | 2021-01-05 | 中国人民大学 | Handwritten image recognition method and system based on unsupervised learning |
CN112766080A (en) * | 2020-12-31 | 2021-05-07 | 北京搜狗科技发展有限公司 | Handwriting recognition method and device, electronic equipment and medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200134444A1 (en) * | 2018-10-31 | 2020-04-30 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks |
Non-Patent Citations (5)
Title |
---|
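An investigation of the modified direction feature for cursive character recognition; Michael Blumenstein et al.; Pattern Recognition; Vol. 40, No. 2; pp. 376-388 * |
Research on handwritten Chinese character recognition technology based on deep learning; Sun Weiwei; China Master's Theses Full-text Database, Information Science and Technology; No. 5; pp. I138-1130 * |
Deep models and their application in visual text analysis; Zhang Shuye; China Doctoral Dissertations Full-text Database, Information Science and Technology; No. 2; pp. I138-179 * |
Mirror Turing test: machine recognition of classical poetry; Xue Yang et al.; Chinese Journal of Computers; Vol. 44, No. 7; pp. 1398-1413 * |
Research on nickname recognition in long-text martial arts novels; Tang Feng et al.; Journal of Chinese Information Processing; Vol. 33, No. 8; pp. 132-142 * |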
Also Published As
Publication number | Publication date |
---|---|
CN113591743A (en) | 2021-11-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||