WO2023000442A1 - Pen tip tracking method, medium, and computing device - Google Patents

Pen tip tracking method, medium, and computing device

Info

Publication number
WO2023000442A1
Authority
WO
WIPO (PCT)
Prior art keywords
pen tip
pen
tracking
video
frame
Prior art date
Application number
PCT/CN2021/115507
Other languages
French (fr)
Chinese (zh)
Inventor
向大凤
Original Assignee
北京华文众合科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京华文众合科技有限公司
Publication of WO2023000442A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The invention relates to the field of image tracking, and in particular to a pen tip tracking method, medium, and computing device.
  • The main purpose of the present invention is to provide a pen tip tracking method, medium, and computing device that solve the problems described in the background art.
  • To this end, the present invention proposes a pen tip tracking method, comprising: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, where the template image includes the pen tip to be tracked and the specific detection model is trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
  • The specific detection model is trained on a training sample set containing a plurality of different pen tip images: multiple pen tip tracking video clips are acquired and split frame by frame into frame image data; pen tip images are extracted from all the frame image data to build the training sample set; and the model is trained on that set so that it can automatically detect pen tip images.
  • Optionally, after the training sample set is constructed it is normalized, and the specific detection model is trained on the normalized set.
  • The multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions: clips shot under the same conditions share the same shooting angle, lighting, and background, while clips shot under different conditions vary in shooting angle, lighting, and background.
  • Acquiring the multiple pen tip images from the multi-frame image data includes using a specific tool to detect and extract them from all the frame image data.
  • The pen tip tracking video clips include both hard pen tip clips and soft (brush) pen tip clips.
  • Using the specific detection model to obtain a template image from the pen-moving video includes taking the first pen tip image the model detects in the video as the template image.
  • Determining the position of the pen tip in the pen-moving video includes: inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response; inputting each frame after the one containing the first pen tip image into the same network to obtain that frame's second feature response; matching the second feature response against the first feature response as the target type; computing the convolutional cross-correlation of the two responses to obtain a response distribution for each frame; and mapping the response distribution onto each frame of the video, selecting the position with the highest response score as the pen tip position in that frame.
  • Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data and the frame data is normalized; the specific detection model then obtains the template image and determines the pen tip position from the normalized frame data.
  • The present invention also proposes a medium on which a computer program is stored; when executed by a processor, the program implements any of the methods described above.
  • The present invention also proposes a computing device comprising a processor configured to implement any of the methods described above when executing a computer program stored in a memory.
  • Because the detection model is trained on a training sample set containing many different pen tip images, it transfers well and adapts to a wide range of application environments, so it can automatically recognize a variety of pen tip images during tracking. After a pen tip image is recognized automatically, the Siamese-network-based tracking model tracks it, which ensures the robustness, real-time performance, and accuracy of the tracking task.
  • Fig. 1 is a step diagram of an embodiment of the pen tip tracking method of the present invention;
  • Fig. 2 is a flowchart of an embodiment of the pen tip tracking method of the present invention;
  • Fig. 3 is a schematic structural diagram of the detection model in the pen tip tracking method of the present invention;
  • Fig. 4 is a schematic structural diagram of the tracking model in the pen tip tracking method of the present invention;
  • Fig. 5 is a schematic structural diagram of a copying system using the pen tip tracking method of the present invention;
  • FIG. 6 is a schematic structural diagram of a medium according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • The embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Therefore, the present disclosure may be embodied as complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
  • According to the embodiments of the present invention, a pen tip tracking method, medium, and computing device are proposed.
  • The inventors also found that deep learning has performed extremely well in image and video processing, particularly in the widely used fields of recognition and detection.
  • The local perception property of deep neural networks helps with small-target tracking tasks: each neuron does not need to perceive all pixels in the image, only a local patch.
  • Neural units in different layers are connected locally; that is, the units in each layer connect only to some of the units in the previous layer.
  • This local connectivity pattern ensures that the learned model parameters respond most strongly to spatially local patterns.
  • Such a network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
  • Tracking built on the matching ability of Siamese networks has recently become a hot topic in machine learning. Introducing the Siamese network structure into the small-target task of pen tip tracking can therefore effectively improve the task's resistance to light, shadow, and other factors while ensuring robustness, real-time performance, and accuracy. The inventors also found that existing Siamese-network target tracking methods require the tracking target to be selected manually before tracking, which is unsuitable for the pen tip tracking scenario.
  • Accordingly, the present invention proposes a pen tip tracking method that trains the detection model in advance and then uses the trained model to detect the pen tip in the video without manual intervention, achieving automatic detection and tracking while overcoming the prior art's susceptibility to light, shadow, and similar factors.
  • The pen tip tracking method according to an exemplary embodiment of the present invention is described below with reference to FIG. 1 and includes the following steps:
  • Step S100: Obtain a pen-moving video.
  • Step S200: Use a specific detection model to obtain a template image from the pen-moving video; the template image includes the pen tip to be tracked, and the model is trained on a training sample set containing a plurality of different pen tip images.
  • Step S300: Determine the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
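  • For orientation, the sketch below shows how these three steps could compose in code. It is a minimal illustration only, assuming hypothetical detector and tracker wrappers around the trained detection model and the Siamese tracking model; none of these names come from the patent itself.

```python
import cv2  # OpenCV, assumed available for video I/O

def track_pen_tip(video_path, detector, tracker):
    """Sketch of steps S100-S300. `detector` and `tracker` are hypothetical
    wrappers around the trained detection model and the Siamese tracking
    model; they are illustrative, not named in the patent."""
    cap = cv2.VideoCapture(video_path)  # S100: obtain the pen-moving video
    template = None
    positions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if template is None:
            # S200: the first pen tip image the detection model finds in the
            # video becomes the template image.
            template = detector.detect(frame)
            if template is None:
                continue  # frames before the pen tip appears are skipped
        # S300: Siamese matching of the template against the current frame.
        positions.append(tracker.locate(template, frame))
    cap.release()
    return positions
```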
  • For step S200, the specific detection model must first be obtained, which involves the following steps:
  • Step S210: Obtain multiple pen tip tracking videos. These include clips shot under the same shooting conditions and clips shot under different conditions; for example, a single video may contain clips under both same and different conditions, or all clips in one video may share one set of conditions while the conditions differ between videos.
  • The videos may be prepared in advance (for example, for the pen tip the user wants to track) and supplied through a preset interface or upload, or they may be shot on the spot with the pen the user wants to track. Either way, the set must contain both clips shot under the same conditions and clips shot under different conditions, for example: several clips under the same lighting, angle, and background, plus several clips under different angles, lighting, and backgrounds; or several clips under the same lighting and background but different angles.
  • In short, when preparing the pen tip tracking video data, it suffices that the multiple videos together include both clips shot under the same shooting conditions and clips shot under different shooting conditions.
  • Step S220: Split the multiple pen tip tracking videos frame by frame to obtain multi-frame image data. A frame-splitting plug-in can be used, or one can be written, to segment all the videos acquired in step S210 frame by frame, yielding every frame of every video; see the sketch below.
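  • As an illustration, the frame splitting of step S220 could be implemented with OpenCV rather than a dedicated plug-in. This is a sketch under that assumption; the patent does not prescribe a library.

```python
import cv2

def split_frames(video_paths):
    """Split each pen tip tracking video into frames, yielding (i, j, frame):
    frame j of video i, matching the X_ij indexing used in step S230."""
    for i, path in enumerate(video_paths):
        cap = cv2.VideoCapture(path)
        j = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of video i
            yield i, j, frame
            j += 1
        cap.release()
```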
  • Step S230: Based on the multi-frame image data, extract the pen tip images from all frames and build the training sample set. From the frame data obtained in step S220, a frame image data set {X_i} can be established, where X_ij denotes the j-th frame of the i-th video.
  • Each frame in {X_i} is then examined to obtain the (relatively small) pen tip image, either by manual box selection with tools such as labelme, labelimg, or yolo-mark, or by automatic box selection with tools such as Vatic, Sloth, or Rectlabel.
  • Individual frames of the videos acquired in step S210 may contain no pen tip; such frames are simply ignored, and only the pen tip regions of frames that do contain one are selected.
  • All the selected pen tip images together form the training sample set, denoted {Z}, where Z_i is the i-th pen tip template image.
  • Any of the above box-selection tools may be used; other embodiments may use different tools, and the technical solution of the present invention places no restriction on them.
  • Step S240: Train on the training sample set to obtain the specific detection model. An existing detection model can be used, or one can be constructed for training.
  • Figure 3 is a schematic structural diagram of the two-dimensional convolutional detection network constructed in this embodiment.
  • The detection model includes, connected in sequence, a first two-dimensional convolutional layer conv2D_1 and first through fifth residual modules residual_block_1 to residual_block_5. The third residual module also connects to a first concatenation layer concatendate_1, which feeds a first convolution block conv2D_block_1 followed by a second convolutional layer conv2D_2. The fourth residual module also connects to a second concatenation layer concatendate_2, which feeds a second convolution block conv2D_block_2; conv2D_block_2 connects to a first upsampling module upsampling2D_1, which feeds a third convolutional layer conv2D_3 connected back into concatendate_1, and conv2D_block_2 also connects to a fourth convolutional layer conv2D_4. The fifth residual module connects to a fifth convolutional layer conv2D_5 and a sixth convolutional layer conv2D_6; conv2D_5 also connects to a second upsampling module upsampling_2, which feeds concatendate_2.
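  • The following PyTorch sketch is one possible reading of this Fig. 3 topology: a residual backbone whose deeper features are upsampled and concatenated with shallower ones to feed multiple detection heads. The residual-block internals, the head channel widths, and the output parameterization are simplified assumptions, not the patent's exact layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual unit; the patent's residual_block_* modules also
    downsample, approximated here with a strided 3x3 convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.down = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(c_out, c_out // 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out // 2, c_out, 3, padding=1))
    def forward(self, x):
        x = self.down(x)
        return torch.relu(x + self.body(x))

class NibDetector(nn.Module):
    """Sketch of the two-scale detection network of Fig. 3: conv2D_1 plus
    residual_block_1..5 with upsample-and-concatenate skip paths feeding
    detection heads. Backbone channel widths follow the text; the heads and
    the output format (e.g. box coordinates plus objectness) are assumed."""
    def __init__(self, out_ch=6):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)    # conv2D_1
        self.r1 = ResidualBlock(32, 64)               # residual_block_1
        self.r2 = ResidualBlock(64, 128)              # residual_block_2
        self.r3 = ResidualBlock(128, 256)             # residual_block_3
        self.r4 = ResidualBlock(256, 512)             # residual_block_4
        self.r5 = ResidualBlock(512, 1024)            # residual_block_5
        self.conv5 = nn.Conv2d(1024, 256, 1)          # conv2D_5
        self.head_deep = nn.Conv2d(1024, out_ch, 1)   # conv2D_6 (coarse head)
        self.up2 = nn.Upsample(scale_factor=2)        # upsampling_2
        self.block2 = nn.Conv2d(512 + 256, 128, 1)    # conv2D_block_2 (simplified)
        self.conv3 = nn.Conv2d(128, 128, 1)           # conv2D_3
        self.up1 = nn.Upsample(scale_factor=2)        # upsampling2D_1
        self.block1 = nn.Conv2d(256 + 128, 256, 1)    # conv2D_block_1 (simplified)
        self.head_mid = nn.Conv2d(128, out_ch, 1)     # conv2D_4
        self.head_fine = nn.Conv2d(256, out_ch, 1)    # conv2D_2 (fine head)

    def forward(self, x):
        x = self.stem(x)
        f1 = self.r1(x); f2 = self.r2(f1)
        f3 = self.r3(f2); f4 = self.r4(f3); f5 = self.r5(f4)
        out_deep = self.head_deep(f5)
        y = self.up2(self.conv5(f5))                  # deep features up to f4 scale
        y = self.block2(torch.cat([f4, y], dim=1))    # concatendate_2
        out_mid = self.head_mid(y)
        z = self.up1(self.conv3(y))                   # up to f3 scale
        z = self.block1(torch.cat([f3, z], dim=1))    # concatendate_1
        out_fine = self.head_fine(z)
        return out_deep, out_mid, out_fine
```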
  • Once built, the detection model can be trained with the pen tip images in the training sample set established in step S230.
  • Per step S210, those images cover a variety of shooting conditions: the same lighting, angle, and background; different lighting, angles, and backgrounds; and mixtures where some conditions match and others differ (for example, the same lighting at different angles, or the same angle against different backgrounds).
  • Because the training sample set thus contains pen tip images in many forms, under many combinations of lighting, background, angle, and shadow, a detection model trained on it transfers well, adapts to a wide range of application environments, and can automatically detect and recognize pen tip images in pen-moving videos shot under all kinds of conditions.
  • To obtain the template image, the pen-moving video is input into the trained detection model, and the first pen tip image the model detects in the video is used as the template image.
  • Step S300 is then performed: based on the template image, the pen-moving video, and the tracking model built on a Siamese network, the position of the pen tip in the pen-moving video is determined.
  • The tracking model comprises a first convolutional layer conv_1, a first pooling layer pool_1, a second convolutional layer conv_2, a second pooling layer pool_2, and third, fourth, and fifth convolutional layers conv_3, conv_4, and conv_5, with the following parameters:
  • conv_1: kernel 11×11, stride 2, 96 channels
  • pool_1: kernel 3×3, stride 2
  • conv_2: kernel 5×5, stride 1, 256 channels
  • pool_2: kernel 3×3, stride 2
  • conv_3: kernel 3×3, stride 1, 384 channels
  • conv_4: kernel 3×3, stride 1, 384 channels
  • conv_5: kernel 3×3, stride 1, 256 channels
  • This tracking network structure is lightweight, so it runs and processes data quickly.
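  • The layer sizes above fully determine the embedding branch of the Siamese tracker, so it can be written down directly; a PyTorch sketch follows. The ReLU activations and the absence of padding are assumptions, since the text specifies only kernels, strides, and channel counts. Both the template image and each search frame pass through this same network with shared weights.

```python
import torch.nn as nn

# Embedding branch of the Siamese tracking model, following the layer sizes
# given in the text (Fig. 4). Activations and padding are assumptions.
embed = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),    # conv_1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_1
    nn.Conv2d(96, 256, kernel_size=5, stride=1),   # conv_2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_2
    nn.Conv2d(256, 384, kernel_size=3, stride=1),  # conv_3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, stride=1),  # conv_4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, stride=1),  # conv_5
)
```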
  • After step S200 recognizes the first pen tip image, the frame containing it is referred to here as the first frame (for convenience only; it is not necessarily the first frame of the video). The pen tip thus appears from this first frame onward, and the pen tip position in the video consists of its position in the first frame plus its position in every subsequent frame.
  • The pen tip position in the first frame is determined as follows: the first pen tip image is input into the Siamese network of the tracking model to obtain its first feature response, and the position of that first feature response on the first frame is taken as the pen tip position in the first frame.
  • The pen tip position in each frame after the first is determined as follows: the first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response; each subsequent frame is input into the same network to obtain that frame's second feature response; and the second feature response is matched against the first feature response as the target type, with the matched position on the second feature response taken as the pen tip position in the pen-moving video.
  • In practice, the pen-moving video can be split frame by frame in real time and the resulting frames streamed to the specific detection model.
  • The pen tip image is recognized automatically and used as the template image; it is input into the Siamese network, and the first feature response is extracted and used as the target type.
  • For the T-th frame, the convolutional cross-correlation between the target response and that frame's response yields a response distribution f(T_0, T); this distribution is mapped back onto the T-th frame image, and the region with the highest response score is selected as the tracked-target position for that frame.
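  • A minimal sketch of this matching step is shown below: the template's feature response is slid over each frame's feature response as a convolution kernel, and the peak of the resulting score map is mapped back to image coordinates. The function names and the stride parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def response_map(template_feat, frame_feat):
    """Convolutional cross-correlation f(T_0, T): slide the template's
    feature response over the frame's feature response.
    template_feat: (C, h, w) first feature response of the template image.
    frame_feat:    (C, H, W) second feature response of frame T."""
    score = F.conv2d(frame_feat.unsqueeze(0),     # (1, C, H, W)
                     template_feat.unsqueeze(0))  # (1, C, h, w) as the kernel
    return score.squeeze(0).squeeze(0)            # (H-h+1, W-w+1)

def peak_position(score, stride=1):
    """Map the highest-scoring response back to image coordinates; `stride`
    stands in for the total stride of the embedding network (an assumption)."""
    idx = torch.argmax(score)
    row, col = divmod(idx.item(), score.shape[1])
    return col * stride, row * stride  # (x, y) of the pen tip in the frame
```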
  • Optionally, after the pen tip images have been detected in each frame and the training sample set established, the pen tip images in the set are normalized and the detection model is trained on the normalized images.
  • For example, min-max normalization can be applied to the pen tip images in the training sample set.
  • Normalization yields standard pen tip images of the same form, and training the detection model on images of a single standard form is easier and simpler.
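  • A min-max normalization of a pen tip image might look like the following sketch; the exact variant is not specified beyond "maximum or minimum value" normalization, so this is one plausible reading.

```python
import numpy as np

def minmax_normalize(img):
    """Min-max normalization of a pen tip image to the range [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi <= lo:
        return np.zeros_like(img)  # constant image: nothing to rescale
    return (img - lo) / (hi - lo)
```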
  • The multiple pen tip tracking videos include hard pen tip tracking videos and soft (brush) pen tip tracking videos.
  • These differ only in the type of pen tip; they are acquired as in step S210, which is not repeated here. Because both hard and soft pen tips are covered, the training sample set contains images of both under various conditions, so a detection model trained on it can automatically recognize hard pen tip images as well as soft pen tip images.
  • The hard pen tip tracking videos include clips shot under the same shooting conditions and clips shot under different conditions, and the soft pen tip tracking videos likewise include clips under the same and under different shooting conditions.
  • Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data, the frame data is normalized, and the specific detection model obtains the template image and determines the pen tip position from the normalized frame data. For example, after the pen-moving video is obtained, a frame-splitting tool extracts every frame; min-max normalization is then applied to all frames, producing frames of uniform form that are input into the detection model for pen tip detection, which is easier and faster.
  • In summary, the technical solution of the present invention first obtains pen tip tracking video data under various shooting conditions, splits it frame by frame, and builds a training sample set from the resulting frames. Because the videos cover various shooting conditions, so do the pen tip images in the training sample set, and the detection model trained on it can therefore accurately identify the pen tip in pen-moving videos from different environments.
  • When the detection model detects the first pen tip image, that image and the pen-moving video are input into the Siamese network, and convolution computations through the network locate the first pen tip image in each frame of the video, forming the tracking result. Using the Siamese network for these convolution computations effectively improves the tracking task's resistance to shooting conditions such as light and shadow, while ensuring its robustness, real-time performance, and accuracy.
  • The computer-readable storage medium shown is an optical disc 200 on which a computer program (that is, a program product) is stored; when executed by a processor, the program implements the steps described above, for example: obtaining a pen-moving video; using a specific detection model to obtain a template image that includes the pen tip to be tracked, the model having been trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video.
  • Examples of the computer-readable storage medium also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other optical and magnetic storage media, which are not enumerated further here.
  • FIG. 7 is a block diagram of an exemplary computing device 300, which may be a computer system or a server. The computing device 300 shown in FIG. 7 is only an example and should not limit the functions or scope of use of this embodiment of the present invention.
  • Components of computing device 300 may include, but are not limited to: one or more processors or processing units 310, system memory 320, and a bus 330 connecting the different system components (including the system memory and the processing unit 310).
  • Computing device 300 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computing device 300 and include both volatile and nonvolatile media, removable and non-removable media.
  • System memory may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 321 and/or cache memory 322.
  • Computing device 300 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • Storage 323 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive").
  • A disk drive for reading from and writing to a removable non-volatile magnetic disk may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media).
  • Each drive can be connected to the bus via one or more data medium interfaces.
  • The system memory includes at least one program product having a set (e.g., at least one) of program modules 324 configured to perform the functions of the various embodiments of the present invention.
  • Program modules 324 generally perform the functions and/or methodologies of the described embodiments of the invention.
  • Computing device 300 may also communicate with one or more external devices 340 (e.g., keyboards, pointing devices, displays, etc.). Such communication may occur through input/output (I/O) interfaces 350. Computing device 300 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 360. As shown in FIG. 7, the network adapter 360 communicates with the other modules of computing device 300 (such as the processing unit 310) through the bus. It should be appreciated that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with computing device 300.
  • The processing unit 310 executes various functional applications and data processing by running programs stored in the system memory, for example the steps of the method described above: obtaining a pen-moving video; using a specific detection model to obtain a template image that includes the pen tip to be tracked, the model having been trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video.
  • FIG. 5 shows a schematic structural diagram of an exemplary copying system suitable for implementing embodiments of the present invention.
  • The copying system includes a copy board 400, a camera 420, a computer 430, and a display screen 440. The camera 420 captures the pen-moving video of the copying pen 410 on the copy board 400 and, being connected to the computer 430, sends the captured video to it; the computer 430 may be the computing device of the example above.
  • The computer 430 executes the pen tip tracking method described above, determines the motion position of the pen tip from the pen-moving video captured by the camera 420, and, being connected to the display screen 440, displays the determined pen tip position on the screen for the copyist's reference.
  • A pen tip tracking method, comprising: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, the template image including the pen tip to be tracked and the model being trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
  • The pen tip tracking method of technical solution 1, wherein training the specific detection model on a training sample set containing a plurality of different pen tip images comprises: acquiring multiple pen tip tracking video clips, splitting them frame by frame, building the training sample set from the pen tip images in the frames, and training the model on that set so that it can automatically detect pen tip images.
  • Optionally, the model is trained on the normalized training sample set.
  • The multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions.
  • In the pen tip tracking method of any of technical solutions 1-4, clips shot under the same shooting conditions share the same shooting angle, lighting, and background, while clips shot under different shooting conditions vary in shooting angle, lighting, and background.
  • A specific tool is used to detect and extract multiple pen tip images from all the frame image data.
  • The first pen tip image detected by the specific detection model in the pen-moving video is used as the template image.
  • The first pen tip image is input into the Siamese network of the tracking model to obtain its first feature response; the second feature response of each subsequent frame is matched against the first feature response as the target type, and the matched position on the second feature response is taken as the pen tip position in the pen-moving video.
  • The response distribution is mapped onto each frame of the video, and the position with the highest response score is selected as the pen tip position in that frame.
  • The pen tip tracking method further comprises, after the pen-moving video is obtained, splitting it into multi-frame video frame data and normalizing the frame data; the specific detection model obtains the template image and determines the pen tip position from the normalized frame data.
  • A medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any of technical solutions 1-12.
  • A computing device comprising a processor configured to implement the method of any of technical solutions 1-12 when executing a computer program stored in a memory.

Abstract

Disclosed are a pen tip tracking method, a medium, and a computing device. The method comprises: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, the template image comprising the pen tip to be tracked, and the model being trained on a training sample set comprising a plurality of different pen tip images; and determining the pen tip position in the pen-moving video on the basis of the template image, the pen-moving video, and a tracking model constructed on the basis of a Siamese (twin) network. With this method, pen tip images in pen-moving videos from different environments can be accurately identified; the tracking task's resistance to shooting conditions such as light and shadow is effectively improved; and the robustness, real-time performance, and accuracy of the tracking task are ensured.

Description

Pen tip tracking method, medium and computing device

Technical Field

The invention relates to the field of image tracking, and in particular to a pen tip tracking method, medium, and computing device.

Background Art

In current pen tip tracking tasks, the pen tip is mostly treated as a small target and tracked by template matching. However, a pen tip differs markedly from an ordinary small target: during writing it is easily affected by light, shadow, and similar factors, so tracking accuracy degrades, and the target may even be lost when the writing speed is too high.
Summary of the Invention

The main purpose of the present invention is to provide a pen tip tracking method, medium, and computing device that solve the problems described in the background art.

To achieve the above purpose, the present invention proposes a pen tip tracking method, comprising:

obtaining a pen-moving video;

using a specific detection model to obtain a template image from the pen-moving video, the template image including the pen tip to be tracked, and the specific detection model being trained on a training sample set containing a plurality of different pen tip images;

determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
Optionally, training the specific detection model on a training sample set containing a plurality of different pen tip images comprises:

acquiring multiple pen tip tracking video clips;

splitting the multiple pen tip tracking video clips frame by frame to obtain multi-frame image data;

obtaining, from the multi-frame image data, multiple pen tip images across all frames, and building a training sample set from them;

training the specific detection model on the training sample set so that it can automatically detect pen tip images.

Optionally, after the training sample set is constructed it is normalized, and the specific detection model is trained on the normalized set.

Optionally, the multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions.

Optionally, the clips shot under the same shooting conditions include multiple clips shot at the same shooting angle, with the same lighting, and against the same background; the clips shot under different shooting conditions include multiple clips shot at different angles, with different lighting, and against different backgrounds.

Optionally, obtaining the multiple pen tip images from the multi-frame image data comprises using a specific tool to detect and extract them from all the frame image data.
Optionally, the multiple pen tip tracking video clips include hard pen tip tracking clips and soft pen tip tracking clips.

Optionally, using the specific detection model to obtain a template image from the pen-moving video comprises taking the first pen tip image the model detects in the video as the template image.

Optionally, determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and the Siamese-network tracking model comprises:

inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response;

taking the first feature response as the target type, tracking the position of the target type on the frame to which the first pen tip image belongs, and using it as the pen tip position in that frame.

Optionally, the determination further comprises:

inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response;

inputting each frame of the pen-moving video after the frame containing the first pen tip image into the Siamese network to obtain each such frame's second feature response;

matching the second feature response against the first feature response as the target type, and taking the matched position on the second feature response as the pen tip position in the pen-moving video.

Optionally, this matching comprises: computing the convolutional cross-correlation of the first and second feature responses to obtain a response distribution for each frame after the frame containing the first pen tip image; mapping the response distribution onto each corresponding frame of the video; and selecting the position with the highest response score as the pen tip position in each frame.
Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data and the frame data is normalized; the specific detection model obtains the template image and determines the pen tip position from the normalized frame data.

The present invention also proposes a medium on which a computer program is stored; when executed by a processor, the program implements any of the methods described above.

The present invention also proposes a computing device comprising a processor configured to implement any of the methods described above when executing a computer program stored in a memory.

In the technical solution of the present invention, the detection model is first trained on a training sample set containing a plurality of different pen tip images, so the model transfers well, adapts to a wide range of application environments, and can automatically recognize various pen tip images during tracking; after a pen tip image is recognized automatically, the Siamese-network-based tracking model tracks it, ensuring the robustness, real-time performance, and accuracy of the tracking task.
Description of Drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing them are briefly introduced below. The drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a step diagram of an embodiment of the pen tip tracking method of the present invention;

Fig. 2 is a flowchart of an embodiment of the pen tip tracking method of the present invention;

Fig. 3 is a schematic structural diagram of the detection model in the pen tip tracking method of the present invention;

Fig. 4 is a schematic structural diagram of the tracking model in the pen tip tracking method of the present invention;

Fig. 5 is a schematic structural diagram of a copying system using the pen tip tracking method of the present invention;

Fig. 6 is a schematic structural diagram of a medium according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.

The realization of the purpose of the present invention, its functional characteristics, and its advantages are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description

The principle and spirit of the present invention are described below with reference to several exemplary embodiments. These embodiments are given only so that those skilled in the art can better understand and implement the present invention; they do not limit its scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.

Those skilled in the art will recognize that the embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Therefore, the present disclosure may be embodied as complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

According to the embodiments of the present invention, a pen tip tracking method, medium, and computing device are proposed.

Overview of the Invention

The inventors found that some current pen tip tracking methods add Kalman filtering on top of a matching algorithm, but such methods are easily disturbed by complex backgrounds, leading to target loss and tracking errors; others track the pen tip with improved particle filtering, which reduces loss but remains vulnerable to light, shadow, and similar factors; still others combine template matching with pen tip shape judgment, which is largely constrained by the shape of the tip and cannot handle soft-pen (brush) calligraphy scenarios.

The inventors also found that deep learning performs extremely well in image and video processing, particularly in the widely used fields of recognition and detection. For example, the local perception property of deep neural networks helps with small-target tracking: each neuron need not perceive all pixels in the image, only a local patch. Neural units in different layers are connected locally, i.e., the units in each layer connect only to some of the units in the previous layer. This local connectivity ensures that the learned parameters respond most strongly to spatially local patterns, and such a network is highly invariant to translation, scaling, tilting, and other deformations. Moreover, tracking built on the matching ability of Siamese networks has recently become a hot topic in machine learning, so introducing the Siamese structure into the small-target pen tip tracking task can effectively improve its resistance to light, shadow, and other factors while ensuring robustness, real-time performance, and accuracy. The inventors further found that existing Siamese-network tracking methods require the target to be selected manually before tracking, which does not suit the pen tip tracking scenario. Accordingly, the present invention proposes a pen tip tracking method that trains the detection model in advance and then uses the trained model to detect the pen tip tracking video without manual intervention, achieving automatic detection and tracking while overcoming the prior art's susceptibility to light, shadow, and similar factors. Having introduced the basic principles of the present invention, its various non-limiting embodiments are described in detail below.
示例性方法exemplary method
下面参考图1来描述根据本发明示例性实施方式的笔尖跟踪方法,包括如下步骤:The pen tip tracking method according to an exemplary embodiment of the present invention is described below with reference to FIG. 1 , including the following steps:
步骤S100:获取运笔视频。Step S100: Obtain a pen-moving video.
步骤S200:采用特定的检测模型从所述运笔视频中获取模板图像,所述模板图像包括待跟踪的笔尖,所述特定的检测模型基于包括多个不同笔尖图像的训练样本集训练得到。Step S200: Using a specific detection model to obtain a template image from the pen movement video, the template image includes the pen tip to be tracked, and the specific detection model is trained based on a training sample set including a plurality of different pen tip images.
步骤S300:基于所述模板图像、所述运笔视频以及基于孪生网络构建的跟踪模型,确定所述运笔视频中的笔尖位置。Step S300: Based on the template image, the pen-moving video and the tracking model constructed based on the Siamese network, determine the position of the pen tip in the pen-moving video.
对于步骤S200,采用特定的检测模型从所述运笔视频中获取模板图像,所述模板图像包括待跟踪的笔尖,所述特定的检测模型基于包括多个不同笔尖图像的训练样本集训练得到;首先需要获得该特定的检测模型,包括以下步骤:For step S200, a specific detection model is used to obtain a template image from the pen movement video, the template image includes a pen tip to be tracked, and the specific detection model is obtained based on a training sample set including a plurality of different pen tip images; first The need to obtain this specific detection model involves the following steps:
步骤S210:获取多个笔尖跟踪视频;在本步骤中,多个笔尖跟踪视频包括了相同拍摄条件和不同拍摄条件的多个笔尖跟踪视频片段,比如:在同一个视频中包含了多个相同拍摄条件和不同拍摄条件的笔尖跟踪视频片段,或者同一个笔尖跟踪视频中的所有片段均使用了相同的拍摄条件,而多个笔尖跟踪视频的拍摄条件又彼此不同。Step S210: Obtain multiple pen tip tracking videos; in this step, multiple pen tip tracking videos include multiple pen tip tracking video clips under the same shooting conditions and different shooting conditions, for example: multiple identical shootings are included in the same video Conditions and different shooting conditions of the tip tracking video clips, or all the clips in the same tip tracking video use the same shooting conditions, but the shooting conditions of multiple tip tracking videos are different from each other.
另外多个笔尖跟踪视频可以是事先准备好的,然后按照预设的接口或上传方式提供,例如可以是针对用户想要跟踪的笔尖事先准备好的。或者也可以是现场拍摄的笔尖跟踪视频,如:使用用户想要跟踪的笔,现场书写并进行拍摄。无论是事先准备好的,还是现场拍摄的只需满足既包括相同拍摄条件的笔尖跟踪视频片段,又包括不同拍摄条件的笔尖跟踪视频片段即可,例如,包括:在相同的拍摄光线、相同的拍摄角度、相同的拍摄背景下拍摄设的多段笔尖跟踪视频,以及在不同拍摄角度、不同拍摄光线、不同拍摄背景下的多段笔尖跟踪视频片段。又或者,在相同拍摄光线、不同拍摄角度、相同拍摄背景下的多段笔尖跟踪视频片段。In addition, multiple pen tip tracking videos may be prepared in advance, and then provided according to a preset interface or uploading method, for example, they may be prepared in advance for the pen tip that the user wants to track. Or it can be a pen tip tracking video shot on the spot, such as: using the pen that the user wants to track, writing and filming on the spot. Whether it is prepared in advance or shot on-site, it only needs to include both the pen tip tracking video clips of the same shooting conditions and the pen tip tracking video clips of different shooting conditions, for example, including: in the same shooting light, the same Shooting angles, multiple clips of pen tip tracking video under the same shooting background, and multiple clips of pen tip tracking video clips under different shooting angles, different shooting lights, and different shooting backgrounds. Or, multiple pen tip tracking video clips under the same shooting light, different shooting angles, and the same shooting background.
总之,在准备笔尖跟踪视频数据时,既可以利用预先准备的,然后按照预设的接口或上传方式提供;也可以利用现场实时拍摄的,只需满足在多个笔尖跟踪视频数据中即包括了相同拍摄条件下的笔尖跟踪视频片段,又包括了不同拍摄条件下的笔尖跟踪视频片段即可。In short, when preparing pen tip tracking video data, it can be prepared in advance and then provided according to the preset interface or upload method; it can also be used for real-time shooting on the spot, as long as it is included in multiple pen tip tracking video data The pen tip tracking video clips under the same shooting conditions may include the pen tip tracking video clips under different shooting conditions.
步骤S220:对所述多个笔尖跟踪视频进行逐帧拆分,得到多帧帧图像数据;在本步骤中,可以使用帧拆分插件,或者编写帧拆分插件,对在步骤S210中获取的全部笔尖跟踪视频进行逐帧切分,从而得到每一段笔尖跟踪视频的每一帧图像,组成了多帧帧图像数据。Step S220: split the plurality of pen tip tracking videos frame by frame to obtain multi-frame frame image data; in this step, a frame split plug-in can be used, or a frame split plug-in can be written to obtain in step S210 All the pen tip tracking videos are segmented frame by frame, so as to obtain each frame image of each pen tip tracking video, forming multiple frames of frame image data.
步骤S230:基于所述多帧帧图像数据,获取全部所述帧图像数据中的多个笔尖图像,并基于所述多个笔尖图像构建训练样本集;在本步骤中,基于步 骤S230得到的多帧帧图像数据,可以建立帧图像数据集{X i},那么其中每一帧帧图像则可以表示为X ij,即表示第i段视频中的第j帧图像。然后针对帧图像数据集中的每一帧图像进行检测获取笔尖图像,如采用labelme工具、labelimg工具、yolo-mark工具对帧图像数据集{X i}中的每一帧图像进行手动框选相对较小的笔尖图像;又如利用Vatic工具、Sloth工具、Rectlabel工具对帧图像数据集{X i}中的每一帧图像进行自动框选相对较小的笔尖图像。 Step S230: Based on the multi-frame frame image data, obtain a plurality of nib images in all the frame image data, and construct a training sample set based on the plurality of nib images; in this step, based on the multiple nib images obtained in step S230, For frame-by-frame image data, a frame image data set {X i } can be established, and each frame image can be expressed as X ij , which means the j-th frame image in the i-th segment of video. Then detect each frame image in the frame image data set to obtain the pen tip image, such as using the labelme tool, labelimg tool, and yolo-mark tool to manually frame each frame image in the frame image data set {X i } for comparison A small pen tip image; another example is to use the Vatic tool, the Sloth tool, and the Rectlabel tool to automatically select a relatively small pen tip image for each frame image in the frame image dataset {X i }.
另外对于在步骤S210中获取的笔尖跟踪视频,可能存在个别帧没有笔尖图像,此时则直接忽略该帧图像即可,只需将具有笔尖图像的帧图像中的笔尖部分框选出来即可,从而框选得到的全部笔尖图像就构成了训练样本集。可以用{Z}来表示训练样本集,则Z i就可以表示训练样本集中的第i张笔尖模板图像。需要说明的是在框选笔尖图像时可以使用labelme等上述工具,在其他实施例中也可以使用除了上述框选工具之外的其他框选工具,本发明技术方案对框选工具不做限制。 In addition, for the pen tip tracking video acquired in step S210, there may be individual frames without a pen tip image, and at this time, the frame image can be ignored directly, and only the pen tip part in the frame image with the pen tip image needs to be selected. Thus, all the nib images obtained by frame selection constitute the training sample set. {Z} can be used to represent the training sample set, then Z i can represent the i-th nib template image in the training sample set. It should be noted that the above-mentioned tools such as labelme can be used to frame the pen tip image, and other frame selection tools other than the above-mentioned frame selection tools can also be used in other embodiments, and the technical solution of the present invention does not limit the frame selection tools.
Then step S240 is performed: training on the training sample set to obtain the specific detection model. In this step, an existing detection model may be used, or a detection model may be constructed and trained. Fig. 3 is a schematic structural diagram of the two-dimensional convolutional network detection model constructed in this embodiment.
The detection model comprises, connected in sequence, a first two-dimensional convolutional layer conv2D_1, a first residual module residual_block_1, a second residual module residual_block_2, a third residual module residual_block_3, a fourth residual module residual_block_4 and a fifth residual module residual_block_5. The third residual module is further connected to a first concatenation layer concatendate_1; concatendate_1 is followed by a first two-dimensional convolution block conv2D_block_1, which is in turn followed by a second two-dimensional convolutional layer conv2D_2. The fourth residual module residual_block_4 is further connected to a second concatenation layer concatendate_2; concatendate_2 is followed by a second two-dimensional convolution block conv2D_block_2, which is connected to a first upsampling module upsampling2D_1; upsampling2D_1 is connected to a third two-dimensional convolutional layer conv2D_3, and conv2D_3 is connected to the first concatenation layer concatendate_1. The second two-dimensional convolution block conv2D_block_2 is also connected to a fourth two-dimensional convolutional layer conv2D_4. The fifth residual module residual_block_5 is connected to a fifth two-dimensional convolutional layer conv2D_5 and a sixth two-dimensional convolutional layer conv2D_6 respectively; conv2D_5 is further connected to a second upsampling module upsampling_2, and upsampling_2 is connected to the second concatenation layer concatendate_2. The parameters are as follows: conv2D_1 has a 3×3 kernel with 32 channels; conv2D_2 has a 1×1 kernel with 256 channels; conv2D_3 has a 1×1 kernel with 128 channels; conv2D_4 has a 1×1 kernel with 128 channels; conv2D_5 has a 1×1 kernel with 256 channels; conv2D_6 has an 11×11 kernel with 256 channels; residual_block_1 has a 1×1 kernel with 64 channels; residual_block_2 has a 2×2 kernel with 128 channels; residual_block_3 has an 8×8 kernel with 256 channels; residual_block_4 has an 8×8 kernel with 512 channels; residual_block_5 has a 4×4 kernel with 1024 channels. Both conv2D_block_1 and conv2D_block_2 consist of three groups of sequentially connected convolutional layers, each group comprising a convolutional layer with 128 channels and a 1×1 kernel followed by a convolutional layer with 256 channels and a 3×3 kernel.
This detection model structure is relatively lightweight and its processing speed is relatively fast, so the pen movement video can be detected in real time. Once the detection model has been constructed, it can be trained with the pen tip images in the training sample set established in step S230. As follows from step S210, the pen tip images in the training sample set cover pen tip images under all kinds of shooting conditions, for example pen tip images shot under the same shooting light, the same shooting angle and the same shooting background as well as under different shooting lights, different shooting angles and different shooting backgrounds; or pen tip images for which some shooting conditions are the same and others differ, such as the same shooting light with different shooting angles, or the same shooting angle with different shooting backgrounds. The training sample set therefore contains pen tip images of all kinds under different combinations of light, background, angle, shadow and so on. Consequently, after the detection model has been trained with the pen tip images of this training sample set, it acquires a strong transfer capability and can adapt to a much wider range of application environments, so that it can automatically detect and recognize the pen tip image in pen movement videos shot under all kinds of shooting conditions.
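The following PyTorch sketch conveys the overall topology described above: a stem convolution, five residual stages, and two upsample-and-concatenate fusion paths. It is a simplification, not the embodiment's exact network: the residual stages use 1×1/3×3 kernels instead of the listed 2×2, 4×4 and 8×8 kernels, the three-group convolution blocks are collapsed to single 1×1 layers, and the head's output layout (box plus objectness) is an assumption:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual stage: 1x1 reduction plus 3x3 stride-2 conv, with a
    1x1 stride-2 projection on the skip path so the shapes match."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),
            nn.LeakyReLU(0.1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)

    def forward(self, x):
        return self.body(x) + self.skip(x)

class NibDetector(nn.Module):
    """Five residual stages feeding two upsample-and-concatenate fusion paths,
    loosely mirroring the conv2D_1 .. residual_block_5 / concatendate topology."""
    def __init__(self, out_ch=5):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)   # conv2D_1: 3x3, 32 channels
        self.r1 = ResidualBlock(32, 64)              # residual_block_1
        self.r2 = ResidualBlock(64, 128)             # residual_block_2
        self.r3 = ResidualBlock(128, 256)            # residual_block_3
        self.r4 = ResidualBlock(256, 512)            # residual_block_4
        self.r5 = ResidualBlock(512, 1024)           # residual_block_5
        self.lat5 = nn.Conv2d(1024, 256, 1)          # conv2D_5: 1x1, 256 channels
        self.up2 = nn.Upsample(scale_factor=2)       # upsampling_2
        self.fuse4 = nn.Conv2d(512 + 256, 128, 1)    # conv2D_block_2, collapsed
        self.lat4 = nn.Conv2d(128, 128, 1)           # conv2D_3: 1x1, 128 channels
        self.up1 = nn.Upsample(scale_factor=2)       # upsampling2D_1
        self.fuse3 = nn.Conv2d(256 + 128, 256, 1)    # conv2D_block_1, collapsed
        self.head = nn.Conv2d(256, out_ch, 1)        # box + objectness (assumed)

    def forward(self, x):
        c3 = self.r3(self.r2(self.r1(self.stem(x))))  # 1/8 resolution, 256 ch
        c4 = self.r4(c3)                              # 1/16 resolution, 512 ch
        c5 = self.r5(c4)                              # 1/32 resolution, 1024 ch
        p4 = self.fuse4(torch.cat([c4, self.up2(self.lat5(c5))], dim=1))
        p3 = self.fuse3(torch.cat([c3, self.up1(self.lat4(p4))], dim=1))
        return self.head(p3)                          # per-cell pen tip predictions

# Smoke test on a 256x256 input (any size divisible by 32 works):
# NibDetector()(torch.zeros(1, 3, 256, 256)).shape -> torch.Size([1, 5, 32, 32])
```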
After the above specific detection model has been constructed, it is used to acquire the template image from the pen movement video: the pen movement video can be input into the trained specific detection model, and the first pen tip image that the specific detection model detects in the pen movement video is taken as the template image. Not every frame of the pen movement video necessarily contains a pen tip image, so the pen movement video only needs to be fed into the specific detection model in real time; as soon as the specific detection model detects the first frame of the pen movement video that contains a pen tip image, this first pen tip image is recognized automatically.
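A minimal sketch of this template-acquisition step follows, assuming a `detect` wrapper that maps one frame to a pen tip box or None (the wrapper and its box format are hypothetical):

```python
def acquire_template(frames, detect):
    """Return the first pen tip image the detector finds, as the template image.

    Frames without a pen tip are skipped, exactly as described above; the
    index of the first hit corresponds to the frame T0 discussed below.
    """
    for t, frame in enumerate(frames):
        box = detect(frame)             # hypothetical detector wrapper
        if box is not None:
            x, y, w, h = box
            return t, frame[y:y + h, x:x + w]
    return None, None                   # no pen tip appeared in the video
```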
At this point, the construction steps of the specific detection model have been set out in full, and the template image in the pen movement video has been acquired on the basis of the constructed specific detection model.
Next, step S300 is performed: based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network, the pen tip position in the pen movement video is determined. As shown in Fig. 4, the tracking model comprises, connected in sequence, a first convolutional layer conv_1, a first pooling layer pool_1, a second convolutional layer conv_2, a second pooling layer pool_2, a third convolutional layer conv_3, a fourth convolutional layer conv_4 and a fifth convolutional layer conv_5. The specific parameters are: conv_1 has an 11×11 kernel with stride 2 and 96 channels; pool_1 has a 3×3 kernel with stride 2; conv_2 has a 5×5 kernel with stride 1 and 256 channels; pool_2 has a 3×3 kernel with stride 2; conv_3 has a 3×3 kernel with stride 1 and 384 channels; conv_4 has a 3×3 kernel with stride 1 and 384 channels; conv_5 has a 3×3 kernel with stride 1 and 256 channels. The network structure of this tracking model has the advantage of being lightweight, so it runs and processes data quickly. On the other hand, once the first pen tip image has been recognized in step S200, the frame of the pen movement video in which the first pen tip image appears is, for convenience of description, called the first frame (the term "first frame" is used here only for convenience and does not mean the first frame image of the pen movement video). The pen tip thus begins to appear in this first frame, and the pen tip position in the pen movement video comprises the pen tip position in the first frame and the pen tip position in every frame after the first frame. The pen tip position in the first frame is determined as follows:
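The layer hyper-parameters above translate directly into the following PyTorch embedding branch, which both inputs of the Siamese tracker would share; the ReLU activations and their placement are an assumption, since the description lists only the convolution and pooling parameters:

```python
import torch.nn as nn

# Shared embedding branch of the Siamese tracking model (conv_1 .. conv_5).
embed = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),    # conv_1: 11x11, stride 2, 96 ch
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_1: 3x3, stride 2
    nn.Conv2d(96, 256, kernel_size=5, stride=1),   # conv_2: 5x5, stride 1, 256 ch
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_2: 3x3, stride 2
    nn.Conv2d(256, 384, kernel_size=3, stride=1),  # conv_3: 3x3, stride 1, 384 ch
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, stride=1),  # conv_4: 3x3, stride 1, 384 ch
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, stride=1),  # conv_5: 3x3, stride 1, 256 ch
)
```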
The first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image; then, taking this first feature response as the target type, the position of the target type is tracked on the frame image of the pen movement video to which the first pen tip image belongs (i.e., the position of the target type is tracked on the first frame), and this position is taken as the pen tip position in that frame image of the pen movement video (i.e., the position of the first feature response on the first frame is taken as the pen tip position of the first frame).
The pen tip position in every frame after the first frame is determined as follows:
the first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
every frame image of the pen movement video after the frame in which the first pen tip image is located is input into the Siamese network of the tracking model (i.e., starting from the second frame, every subsequent frame is input into the Siamese network), and the respective second feature response of every frame image of the pen movement video after the frame in which the first pen tip image is located is obtained (i.e., the second feature response corresponding to the second frame and to every frame thereafter);
taking the first feature response as the target type, the second feature responses are matched, and the position of the target type matched on each second feature response is taken as the pen tip position in the pen movement video.
The above method for determining the pen tip position is explained with reference to Fig. 2, as follows:
First, the pen movement video is split frame by frame in real time, and the split frame images are transmitted to the specific detection model in real time. Suppose the pen tip image first appears in frame T0 of the pen movement video; the specific detection model then automatically recognizes the pen tip image in the frame-T0 image, takes this pen tip image as the template image, inputs it into the Siamese network and extracts the first feature response. On the one hand, with this first feature response as the target type, its position is tracked on the frame-T0 image and output as the pen tip position on the frame-T0 image; on the other hand, every frame image of the pen movement video after frame T0 is input into the Siamese network, the second feature response of each such frame is extracted, and then, with the first feature response as the target type, this target type is matched against the second feature response corresponding to every frame image after frame T0; the matched target position (which can specifically be expressed in coordinates) is output as the pen tip position on every frame after the frame-T0 image.
The matching can be carried out by the following method:
let f(T0) denote the first feature response extracted by the Siamese network from the pen tip image detected in the frame-T0 image;
let f(T) denote the second feature response extracted by the Siamese network from the frame-T image, where T lies after T0;
the second feature response extracted from the frame-T image is then convolutionally cross-correlated with the first feature response, giving f(T0, T) = f(T0) * f(T), where "*" denotes the convolutional cross-correlation operation and f(T0, T) is the response distribution result, in the frame-T image of the pen movement video, of the pen tip image from the frame-T0 image;
the response distribution result f(T0, T) so obtained is then mapped back into the frame-T image, and the region with the highest response score is selected as the position result of the tracking target in the frame-T image;
once the position has been determined for every frame image of the pen movement video from the frame-T0 image onwards, the complete pen tip tracking result of the pen movement video is obtained.
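A compact sketch of this matching step follows; it realizes the cross-correlation f(T0, T) = f(T0) * f(T) with `F.conv2d`, using f(T0) as the correlation kernel. The mapping of the response peak back to pixel coordinates (which would have to account for the embedding stride) is simplified here:

```python
import torch
import torch.nn.functional as F

def track_frame(embed, template, frame):
    """Locate the template in one frame via convolutional cross-correlation.

    `template` and `frame` are (1, 3, H, W) tensors; f(T0) = embed(template)
    and f(T) = embed(frame), and the peak of the response map f(T0, T) gives
    the pen tip position in this frame.
    """
    f_t0 = embed(template)            # first feature response, f(T0)
    f_t = embed(frame)                # second feature response, f(T)
    response = F.conv2d(f_t, f_t0)    # f(T0) used as kernel over f(T)
    idx = response.flatten().argmax() # region with the highest response score
    w = response.shape[-1]
    row, col = divmod(idx.item(), w)
    return row, col                   # peak location on the response map
```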
In another example of this embodiment, after the pen tip image detection has been performed on every frame image of the frame image data and the training sample set has been established, the method further comprises: normalizing the pen tip images in the training sample set, and training the detection model with the normalized pen tip images of the training sample set. Specifically, a maximum/minimum normalization method can be applied to the pen tip images in the training sample set, so that standard pen tip images of the same form are obtained after normalization; training the detection model with standard pen tip images of the same form is then easier and simpler.
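A minimal reading of the maximum/minimum normalization mentioned above, scaling each image into [0, 1] (the exact variant is not specified, so this is an assumption):

```python
import numpy as np

def normalize(img):
    """Min-max normalize one image to [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)  # epsilon guards constant images
```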
In another example of this embodiment, the multiple pen tip tracking videos include hard pen tip tracking videos and soft pen tip tracking videos. These differ only in the type of pen tip; they are acquired as in step S210 and are not described again here. Since both hard pen tips and soft pen tips are covered, the training sample set contains soft pen tip images and hard pen tip images under all kinds of conditions; hence, after being trained with the pen tip images of this training sample set, the detection model can automatically recognize not only hard pen tip images but also soft pen tip images.
In another example of this embodiment, the multiple pen tip tracking videos include hard pen tip tracking videos and soft pen tip tracking videos, where the hard pen tip tracking videos contain pen tip tracking video clips shot under the same shooting conditions and pen tip tracking video clips shot under different shooting conditions, and the soft pen tip tracking videos likewise contain pen tip tracking video clips shot under the same shooting conditions and pen tip tracking video clips shot under different shooting conditions.
In another example of this embodiment, after the pen movement video is acquired, the method further comprises: frame-splitting the pen movement video to obtain multiple frames of pen movement video frame data, and normalizing the pen movement video frame data; the specific detection model then acquires the template image, and the pen tip position is determined, on the basis of the normalized pen movement video frame data. For example, after the pen movement video is acquired, a frame splitting tool is used to split it into individual frame images; then, on the basis of all the frame images, a maximum/minimum normalization method is applied, after which every frame image of the pen movement video is obtained in a uniform form; feeding these into the detection model to detect the pen tip image is then easier and faster.
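Reusing the hypothetical helpers sketched earlier, this preprocessing of the pen movement video before detection could then read:

```python
def preprocess_pen_video(video_path):
    """Frame-split the pen movement video and normalize every frame before it
    is passed to the detection model (helpers are the sketches given above)."""
    return [normalize(f) for f in split_video_into_frames(video_path)]
```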
As can be seen from the method of the above exemplary embodiment, the technical solution of the present invention first acquires pen tip tracking video data under different shooting conditions, then splits the pen tip tracking video data of the various shooting conditions frame by frame and acquires the pen tip image in each frame so as to establish a training sample set, which is then used to train the detection model. Since the pen tip tracking videos cover all kinds of shooting conditions, the resulting training sample set also contains pen tip images under all kinds of shooting conditions, so the trained detection model can accurately recognize the pen tip image in pen movement videos shot in different environments. Then, once the detection model has detected the first pen tip image, the detected first pen tip image and the pen movement video are input into the Siamese network; through the convolution computation of the Siamese network, the position of the first pen tip image in every frame of the pen movement video can be obtained, thereby forming the pen tip tracking result. Furthermore, because the Siamese network performs convolution computation on the first pen tip image and the frame images of the pen movement video, the tracking task's resistance to the influence of shooting conditions such as light and shadow is effectively improved, while the robustness, real-time performance and accuracy of the tracking task are also guaranteed.
Exemplary medium
Having introduced the method and apparatus of the exemplary embodiments of the present invention, the computer-readable storage medium of the exemplary embodiments of the present invention is described next with reference to Fig. 6.
Referring to Fig. 6, the computer-readable storage medium shown there is an optical disc 200 on which a computer program (i.e., a program product) is stored; when run by a processor, the computer program implements the steps recorded in the above method embodiments, for example:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other optical or magnetic storage media, which are not enumerated one by one here.
Exemplary computing device
Having introduced the method, apparatus and medium of the exemplary embodiments of the present invention, the computing device 300 of the exemplary embodiments of the present invention is described next with reference to Fig. 7, which shows a block diagram of an exemplary computing device 300 suitable for implementing embodiments of the present invention; the computing device 300 may be a computer system or a server. The computing device 300 shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the components of the computing device 300 may include, but are not limited to: one or more processors or processing units 310, a system memory 320, and a bus 330 connecting the different system components (including the system memory and the processing unit 310).
The computing device 300 typically includes a variety of computer-system-readable media. These media may be any available media accessible to the computing device 300, including volatile and non-volatile media and removable and non-removable media.
The system memory may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM 321) and/or cache memory 322. The computing device 300 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, ROM 323 may be used for reading from and writing to a non-removable, non-volatile magnetic medium (not shown in Fig. 7, commonly called a "hard drive"). Although not shown in Fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus via one or more data-media interfaces. The system memory may include at least one program product having a set (e.g., at least one) of program modules 324 configured to perform the functions of the embodiments of the present invention.
A program/utility 325 having a set (at least one) of program modules 324 may be stored, for example, in the system memory. Such program modules 324 include, but are not limited to, an operating system, one or more application programs, other program modules 324 and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 324 generally perform the functions and/or methods of the embodiments described in the present invention.
The computing device 300 may also communicate with one or more external devices 340 (such as a keyboard, pointing device or display). Such communication may take place through an input/output (I/O) interface 350. Moreover, the computing device 300 may also communicate with one or more networks (for example a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 360. As shown in Fig. 7, the network adapter 360 communicates with the other modules of the computing device 300 (such as the processing unit 310) via the bus. It should be understood that, although not shown in Fig. 7, other hardware and/or software modules may be used in conjunction with the computing device 300.
The processing unit 310 executes various functional applications and data processing by running programs stored in the system memory, for example:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
Exemplary copying system
An exemplary copying system of the present invention is described with reference to Fig. 5, which shows a schematic structural diagram of an exemplary copying system suitable for implementing embodiments of the present invention. The copying system includes a copying board 400, a camera 420, a computer 430 and a display screen 440. The camera 420 is used to shoot the pen movement video of the copying pen 410 on the copying board 400; it is connected to the computer 430 and sends the captured pen movement video to the computer 430. The computer 430, for which reference may be made to the exemplary computing device above, executes the above pen tip tracking method and determines the motion position of the pen tip from the pen movement video shot by the camera 420. The computer 430 is also connected to the display screen 440, on which the determined pen tip position is displayed for the copyist's reference.
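For illustration only, the capture-detect-track-display loop of such a copying system might be wired together as below, reusing the hypothetical `detect`, `embed` and `track_frame` pieces sketched earlier; the mapping of the response peak to pixel coordinates is simplified:

```python
import cv2
import numpy as np
import torch

def to_tensor(img):
    """BGR uint8 image -> (1, 3, H, W) float tensor in [0, 1]."""
    t = torch.from_numpy(img.astype(np.float32) / 255.0)
    return t.permute(2, 0, 1).unsqueeze(0)

def run_copying_system(embed, detect, camera_index=0):
    """Read camera frames, acquire the template on first detection, then track
    the pen tip and mark it on the display for the copyist's reference."""
    cap = cv2.VideoCapture(camera_index)
    template = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if template is None:
            box = detect(frame)              # hypothetical detector wrapper
            if box is not None:
                x, y, w, h = box
                template = to_tensor(frame[y:y + h, x:x + w])
        else:
            row, col = track_frame(embed, template, to_tensor(frame))
            # NOTE: a real system would rescale (row, col) by the embedding
            # stride; drawn as-is here for brevity.
            cv2.circle(frame, (col, row), 5, (0, 0, 255), -1)
        cv2.imshow("pen tip tracking", frame)
        if cv2.waitKey(1) == 27:             # Esc exits the loop
            break
    cap.release()
    cv2.destroyAllWindows()
```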
In addition, although the operations of the method of the present invention are described in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that the features in these aspects cannot be combined to advantage; such division is merely for convenience of expression. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural transformation made using the contents of the description and drawings of the present invention under the inventive concept of the present invention, or any direct/indirect application in other related technical fields, is included within the patent protection scope of the present invention.
Through the above description, the embodiments of the present invention provide the following technical solutions, but are not limited thereto:
1. A pen tip tracking method, comprising:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
2. The pen tip tracking method according to technical solution 1, wherein training the specific detection model on a training sample set including multiple different pen tip images comprises:
acquiring multiple pen tip tracking video clips;
splitting the multiple pen tip tracking video clips frame by frame to obtain multiple frames of frame image data;
based on the multiple frames of frame image data, acquiring multiple pen tip images from all of the frame image data, and constructing a training sample set based on the multiple pen tip images;
the specific detection model being trained on the training sample set, so that the specific detection model can automatically detect pen tip images.
3. The pen tip tracking method according to technical solution 1 or 2, further comprising: after the training sample set is constructed, normalizing the training sample set, the specific detection model being trained on the normalized training sample set.
4. The pen tip tracking method according to any one of technical solutions 1-3, wherein the multiple pen tip tracking video clips include: multiple pen tip tracking video clips shot under the same shooting conditions and under different shooting conditions.
5. The pen tip tracking method according to any one of technical solutions 1-4, wherein the multiple pen tip tracking video clips shot under the same shooting conditions include:
multiple pen tip tracking video clips shot under the same shooting angle, shooting light and shooting background;
and the multiple pen tip tracking video clips shot under the different shooting conditions include:
multiple pen tip tracking video clips shot under different shooting angles, shooting lights and shooting backgrounds.
6. The pen tip tracking method according to any one of technical solutions 1-5, wherein acquiring multiple pen tip images from all of the frame image data based on the multiple frames of frame image data comprises:
detecting and acquiring multiple pen tip images from all of the frame image data using a specific tool, based on the multiple frames of frame image data.
7. The pen tip tracking method according to any one of technical solutions 1-6, wherein the multiple pen tip tracking video clips include hard pen tip tracking video clips and soft pen tip tracking video clips.
8. The pen tip tracking method according to any one of technical solutions 1-7, wherein using a specific detection model to acquire a template image from the pen movement video comprises:
taking the first pen tip image detected by the specific detection model in the pen movement video as the template image.
9. The pen tip tracking method according to any one of technical solutions 1-8, wherein determining the pen tip position in the pen movement video based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network comprises:
inputting the first pen tip image into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
taking the first feature response as the target type, tracking the position of the target type on the frame image to which the first pen tip image belongs, and taking this position as the pen tip position in the frame image to which the first pen tip image belongs.
10. The pen tip tracking method according to any one of technical solutions 1-9, wherein determining the pen tip position in the pen movement video based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network further comprises:
inputting the first pen tip image into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
inputting every frame image of the pen movement video after the frame in which the first pen tip image is located into the Siamese network of the tracking model, to obtain the respective second feature response of every frame image of the pen movement video after the frame in which the first pen tip image is located;
taking the first feature response as the target type, matching the second feature responses, and taking the position of the target type matched on each second feature response as the pen tip position in the pen movement video.
11. The pen tip tracking method according to any one of technical solutions 1-10, wherein taking the first feature response as the target type, matching the second feature responses, and taking the position of the target type matched on each second feature response as the pen tip position in the pen movement video comprises:
performing convolutional cross-correlation computation on the first feature response and the second feature responses, to obtain the response distribution result of the first pen tip image in every frame image of the pen movement video after the frame image to which it belongs;
mapping the response distribution results into the corresponding frame images of the pen movement video, and selecting the position with the highest response score as the pen tip position in each frame image.
12. The pen tip tracking method according to any one of technical solutions 1-11, further comprising: after the pen movement video is acquired, frame-splitting the pen movement video to obtain multiple frames of pen movement video frame data and normalizing the pen movement video frame data, the specific detection model acquiring the template image, and the pen tip position being determined, on the basis of the normalized pen movement video frame data.
13. A medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method according to any one of technical solutions 1-12.
14. A computing device, characterized in that the computing device comprises a processor for implementing the method according to any one of technical solutions 1-12 when executing a computer program stored in a memory.

Claims (10)

  1. A pen tip tracking method, comprising:
    acquiring a pen movement video;
    using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
    determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
  2. The pen tip tracking method according to claim 1, wherein training the specific detection model on a training sample set including multiple different pen tip images comprises:
    acquiring multiple pen tip tracking video clips;
    splitting the multiple pen tip tracking video clips frame by frame to obtain multiple frames of frame image data;
    based on the multiple frames of frame image data, acquiring multiple pen tip images from all of the frame image data, and constructing a training sample set based on the multiple pen tip images;
    the specific detection model being trained on the training sample set, so that the specific detection model can automatically detect pen tip images.
  3. The pen tip tracking method according to claim 2, further comprising: after the training sample set is constructed, normalizing the training sample set, the specific detection model being trained on the normalized training sample set.
  4. The pen tip tracking method according to claim 2, wherein the multiple pen tip tracking video clips include: multiple pen tip tracking video clips shot under the same shooting conditions and under different shooting conditions.
  5. The pen tip tracking method according to claim 4, wherein the multiple pen tip tracking video clips shot under the same shooting conditions include:
    multiple pen tip tracking video clips shot under the same shooting angle, shooting light and shooting background;
    and the multiple pen tip tracking video clips shot under the different shooting conditions include:
    multiple pen tip tracking video clips shot under different shooting angles, shooting lights and shooting backgrounds.
  6. The pen tip tracking method according to any one of claims 2-5, wherein acquiring multiple pen tip images from all of the frame image data based on the multiple frames of frame image data comprises:
    detecting and acquiring multiple pen tip images from all of the frame image data using a specific tool, based on the multiple frames of frame image data.
  7. The pen tip tracking method according to any one of claims 2-5, wherein the multiple pen tip tracking video clips include hard pen tip tracking video clips and soft pen tip tracking video clips.
  8. The pen tip tracking method according to claim 1, wherein using a specific detection model to acquire a template image from the pen movement video comprises:
    taking the first pen tip image detected by the specific detection model in the pen movement video as the template image.
  9. A medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method according to any one of claims 1-8.
  10. A computing device, characterized in that the computing device comprises a processor for implementing the method according to any one of claims 1-8 when executing a computer program stored in a memory.
PCT/CN2021/115507 2021-07-23 2021-08-31 Pen tip tracking method, medium, and computing device WO2023000442A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110841194.8 2021-07-23
CN202110841194.8A CN113449695A (en) 2021-07-23 2021-07-23 Pen point tracking method, medium and computing device

Publications (1)

Publication Number Publication Date
WO2023000442A1 true WO2023000442A1 (en) 2023-01-26

Family

ID=77817218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115507 WO2023000442A1 (en) 2021-07-23 2021-08-31 Pen tip tracking method, medium, and computing device

Country Status (2)

Country Link
CN (1) CN113449695A (en)
WO (1) WO2023000442A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170255319A1 (en) * 2016-03-06 2017-09-07 Microsoft Technology Licensing, Llc Pen location detection
CN111260688A (en) * 2020-01-13 2020-06-09 深圳大学 Twin double-path target tracking method
CN111429482A (en) * 2020-03-19 2020-07-17 上海眼控科技股份有限公司 Target tracking method and device, computer equipment and storage medium
CN111612822A (en) * 2020-05-21 2020-09-01 广州海格通信集团股份有限公司 Object tracking method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN JING, CHEN JUN-LIN: "Multi-templates Pen Tip Tracking Algorithm Based on Particle Filtering", COMPUTER ENGINEERING, SHANGHAI JISUANJI XUEHUI, CN, vol. 37, no. 21, 5 November 2011 (2011-11-05), CN , pages 136 - 140, XP093026690, ISSN: 1000-3428, DOI: 10.3969/j.issn.1000-3428.2011.21.046 *

Also Published As

Publication number Publication date
CN113449695A (en) 2021-09-28


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE