WO2023000442A1 - Pen tip tracking method, medium, and computing device - Google Patents

Pen tip tracking method, medium, and computing device

Info

Publication number
WO2023000442A1
Authority
WO
WIPO (PCT)
Prior art keywords
pen tip
pen
tracking
video
frame
Prior art date
Application number
PCT/CN2021/115507
Other languages
French (fr)
Chinese (zh)
Inventor
向大凤
Original Assignee
北京华文众合科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京华文众合科技有限公司
Publication of WO2023000442A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The invention relates to the field of image tracking, and in particular to a pen tip tracking method, medium, and computing device.
  • The main purpose of the present invention is to provide a pen tip tracking method, medium, and computing device that solve the problems described in the background art.
  • To this end, the present invention proposes a pen tip tracking method, comprising: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, where the template image includes the pen tip to be tracked and the specific detection model is trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
  • The specific detection model is trained on a training sample set containing a plurality of different pen tip images: multiple pen tip tracking video clips are acquired and split frame by frame into frame image data; pen tip images are extracted from all the frame image data to build the training sample set; and the model is trained on that set so that it can automatically detect pen tip images.
  • Optionally, after the training sample set is constructed it is normalized, and the specific detection model is trained on the normalized set.
  • The multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions: clips shot under the same conditions share the same shooting angle, lighting, and background, while clips shot under different conditions vary in shooting angle, lighting, and background.
  • Acquiring the multiple pen tip images from the multi-frame image data includes using a specific tool to detect and extract them from all the frame image data.
  • The pen tip tracking video clips include both hard pen tip clips and soft (brush) pen tip clips.
  • Using the specific detection model to obtain a template image from the pen-moving video includes taking the first pen tip image the model detects in the video as the template image.
  • Determining the position of the pen tip in the pen-moving video includes: inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response; inputting each frame after the one containing the first pen tip image into the same network to obtain that frame's second feature response; matching the second feature response against the first feature response as the target type; computing the convolutional cross-correlation of the two responses to obtain a response distribution for each frame; and mapping the response distribution onto each frame of the video, selecting the position with the highest response score as the pen tip position in that frame.
  • Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data and the frame data is normalized; the specific detection model then obtains the template image and determines the pen tip position from the normalized frame data.
  • The present invention also proposes a medium on which a computer program is stored; when executed by a processor, the program implements any of the methods described above.
  • The present invention also proposes a computing device comprising a processor configured to implement any of the methods described above when executing a computer program stored in a memory.
  • Because the detection model is trained on a training sample set containing many different pen tip images, it transfers well and adapts to a wide range of application environments, so it can automatically recognize a variety of pen tip images during tracking. After a pen tip image is recognized automatically, the Siamese-network-based tracking model tracks it, which ensures the robustness, real-time performance, and accuracy of the tracking task.
  • Fig. 1 is a step diagram of an embodiment of the pen tip tracking method of the present invention;
  • Fig. 2 is a flowchart of an embodiment of the pen tip tracking method of the present invention;
  • Fig. 3 is a schematic structural diagram of the detection model in the pen tip tracking method of the present invention;
  • Fig. 4 is a schematic structural diagram of the tracking model in the pen tip tracking method of the present invention;
  • Fig. 5 is a schematic structural diagram of a copying system using the pen tip tracking method of the present invention;
  • FIG. 6 is a schematic structural diagram of a medium according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • The embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Therefore, the present disclosure may be embodied as complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
  • According to the embodiments of the present invention, a pen tip tracking method, medium, and computing device are proposed.
  • The inventors also found that deep learning has performed extremely well in image and video processing, particularly in the widely used fields of recognition and detection.
  • The local perception property of deep neural networks helps with small-target tracking tasks: each neuron does not need to perceive all pixels in the image, only a local patch.
  • Neural units in different layers are connected locally; that is, the units in each layer connect only to some of the units in the previous layer.
  • This local connectivity pattern ensures that the learned model parameters respond most strongly to spatially local patterns.
  • Such a network structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
  • Tracking built on the matching ability of Siamese networks has recently become a hot topic in machine learning. Introducing the Siamese network structure into the small-target task of pen tip tracking can therefore effectively improve the task's resistance to light, shadow, and other factors while ensuring robustness, real-time performance, and accuracy. The inventors also found that existing Siamese-network target tracking methods require the tracking target to be selected manually before tracking, which is unsuitable for the pen tip tracking scenario.
  • Accordingly, the present invention proposes a pen tip tracking method that trains the detection model in advance and then uses the trained model to detect the pen tip in the video without manual intervention, achieving automatic detection and tracking while overcoming the prior art's susceptibility to light, shadow, and similar factors.
  • The pen tip tracking method according to an exemplary embodiment of the present invention is described below with reference to FIG. 1 and includes the following steps:
  • Step S100: Obtain a pen-moving video.
  • Step S200: Use a specific detection model to obtain a template image from the pen-moving video; the template image includes the pen tip to be tracked, and the model is trained on a training sample set containing a plurality of different pen tip images.
  • Step S300: Determine the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
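  • For orientation, the sketch below shows how these three steps could compose in code. It is a minimal illustration only, assuming hypothetical detector and tracker wrappers around the trained detection model and the Siamese tracking model; none of these names come from the patent itself.

```python
import cv2  # OpenCV, assumed available for video I/O

def track_pen_tip(video_path, detector, tracker):
    """Sketch of steps S100-S300. `detector` and `tracker` are hypothetical
    wrappers around the trained detection model and the Siamese tracking
    model; they are illustrative, not named in the patent."""
    cap = cv2.VideoCapture(video_path)  # S100: obtain the pen-moving video
    template = None
    positions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if template is None:
            # S200: the first pen tip image the detection model finds in the
            # video becomes the template image.
            template = detector.detect(frame)
            if template is None:
                continue  # frames before the pen tip appears are skipped
        # S300: Siamese matching of the template against the current frame.
        positions.append(tracker.locate(template, frame))
    cap.release()
    return positions
```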
  • For step S200, the specific detection model must first be obtained, which involves the following steps:
  • Step S210: Obtain multiple pen tip tracking videos. These include clips shot under the same shooting conditions and clips shot under different conditions; for example, a single video may contain clips under both same and different conditions, or all clips in one video may share one set of conditions while the conditions differ between videos.
  • The videos may be prepared in advance (for example, for the pen tip the user wants to track) and supplied through a preset interface or upload, or they may be shot on the spot with the pen the user wants to track. Either way, the set must contain both clips shot under the same conditions and clips shot under different conditions, for example: several clips under the same lighting, angle, and background, plus several clips under different angles, lighting, and backgrounds; or several clips under the same lighting and background but different angles.
  • In short, when preparing the pen tip tracking video data, it suffices that the multiple videos together include both clips shot under the same shooting conditions and clips shot under different shooting conditions.
  • Step S220: Split the multiple pen tip tracking videos frame by frame to obtain multi-frame image data. A frame-splitting plug-in can be used, or one can be written, to segment all the videos acquired in step S210 frame by frame, yielding every frame of every video; see the sketch below.
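  • As an illustration, the frame splitting of step S220 could be implemented with OpenCV rather than a dedicated plug-in. This is a sketch under that assumption; the patent does not prescribe a library.

```python
import cv2

def split_frames(video_paths):
    """Split each pen tip tracking video into frames, yielding (i, j, frame):
    frame j of video i, matching the X_ij indexing used in step S230."""
    for i, path in enumerate(video_paths):
        cap = cv2.VideoCapture(path)
        j = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of video i
            yield i, j, frame
            j += 1
        cap.release()
```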
  • Step S230: Based on the multi-frame image data, extract the pen tip images from all frames and build the training sample set. From the frame data obtained in step S220, a frame image data set {X_i} can be established, where X_ij denotes the j-th frame of the i-th video.
  • Each frame in {X_i} is then examined to obtain the (relatively small) pen tip image, either by manual box selection with tools such as labelme, labelimg, or yolo-mark, or by automatic box selection with tools such as Vatic, Sloth, or Rectlabel.
  • Individual frames of the videos acquired in step S210 may contain no pen tip; such frames are simply ignored, and only the pen tip regions of frames that do contain one are selected.
  • All the selected pen tip images together form the training sample set, denoted {Z}, where Z_i is the i-th pen tip template image.
  • Any of the above box-selection tools may be used; other embodiments may use different tools, and the technical solution of the present invention places no restriction on them.
  • Step S240: Train on the training sample set to obtain the specific detection model. An existing detection model can be used, or one can be constructed for training.
  • Figure 3 is a schematic structural diagram of the two-dimensional convolutional detection network constructed in this embodiment.
  • The detection model includes, connected in sequence, a first two-dimensional convolutional layer conv2D_1 and first through fifth residual modules residual_block_1 to residual_block_5. The third residual module also connects to a first concatenation layer concatendate_1, which feeds a first convolution block conv2D_block_1 followed by a second convolutional layer conv2D_2. The fourth residual module also connects to a second concatenation layer concatendate_2, which feeds a second convolution block conv2D_block_2; conv2D_block_2 connects to a first upsampling module upsampling2D_1, which feeds a third convolutional layer conv2D_3 connected back into concatendate_1, and conv2D_block_2 also connects to a fourth convolutional layer conv2D_4. The fifth residual module connects to a fifth convolutional layer conv2D_5 and a sixth convolutional layer conv2D_6; conv2D_5 also connects to a second upsampling module upsampling_2, which feeds concatendate_2.
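  • The following PyTorch sketch is one possible reading of this Fig. 3 topology: a residual backbone whose deeper features are upsampled and concatenated with shallower ones to feed multiple detection heads. The residual-block internals, the head channel widths, and the output parameterization are simplified assumptions, not the patent's exact layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual unit; the patent's residual_block_* modules also
    downsample, approximated here with a strided 3x3 convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.down = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(c_out, c_out // 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out // 2, c_out, 3, padding=1))
    def forward(self, x):
        x = self.down(x)
        return torch.relu(x + self.body(x))

class NibDetector(nn.Module):
    """Sketch of the two-scale detection network of Fig. 3: conv2D_1 plus
    residual_block_1..5 with upsample-and-concatenate skip paths feeding
    detection heads. Backbone channel widths follow the text; the heads and
    the output format (e.g. box coordinates plus objectness) are assumed."""
    def __init__(self, out_ch=6):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)    # conv2D_1
        self.r1 = ResidualBlock(32, 64)               # residual_block_1
        self.r2 = ResidualBlock(64, 128)              # residual_block_2
        self.r3 = ResidualBlock(128, 256)             # residual_block_3
        self.r4 = ResidualBlock(256, 512)             # residual_block_4
        self.r5 = ResidualBlock(512, 1024)            # residual_block_5
        self.conv5 = nn.Conv2d(1024, 256, 1)          # conv2D_5
        self.head_deep = nn.Conv2d(1024, out_ch, 1)   # conv2D_6 (coarse head)
        self.up2 = nn.Upsample(scale_factor=2)        # upsampling_2
        self.block2 = nn.Conv2d(512 + 256, 128, 1)    # conv2D_block_2 (simplified)
        self.conv3 = nn.Conv2d(128, 128, 1)           # conv2D_3
        self.up1 = nn.Upsample(scale_factor=2)        # upsampling2D_1
        self.block1 = nn.Conv2d(256 + 128, 256, 1)    # conv2D_block_1 (simplified)
        self.head_mid = nn.Conv2d(128, out_ch, 1)     # conv2D_4
        self.head_fine = nn.Conv2d(256, out_ch, 1)    # conv2D_2 (fine head)

    def forward(self, x):
        x = self.stem(x)
        f1 = self.r1(x); f2 = self.r2(f1)
        f3 = self.r3(f2); f4 = self.r4(f3); f5 = self.r5(f4)
        out_deep = self.head_deep(f5)
        y = self.up2(self.conv5(f5))                  # deep features up to f4 scale
        y = self.block2(torch.cat([f4, y], dim=1))    # concatendate_2
        out_mid = self.head_mid(y)
        z = self.up1(self.conv3(y))                   # up to f3 scale
        z = self.block1(torch.cat([f3, z], dim=1))    # concatendate_1
        out_fine = self.head_fine(z)
        return out_deep, out_mid, out_fine
```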
  • Once built, the detection model can be trained with the pen tip images in the training sample set established in step S230.
  • Per step S210, those images cover a variety of shooting conditions: the same lighting, angle, and background; different lighting, angles, and backgrounds; and mixtures where some conditions match and others differ (for example, the same lighting at different angles, or the same angle against different backgrounds).
  • Because the training sample set thus contains pen tip images in many forms, under many combinations of lighting, background, angle, and shadow, a detection model trained on it transfers well, adapts to a wide range of application environments, and can automatically detect and recognize pen tip images in pen-moving videos shot under all kinds of conditions.
  • To obtain the template image, the pen-moving video is input into the trained detection model, and the first pen tip image the model detects in the video is used as the template image.
  • Step S300 is then performed: based on the template image, the pen-moving video, and the tracking model built on a Siamese network, the position of the pen tip in the pen-moving video is determined.
  • The tracking model comprises a first convolutional layer conv_1, a first pooling layer pool_1, a second convolutional layer conv_2, a second pooling layer pool_2, and third, fourth, and fifth convolutional layers conv_3, conv_4, and conv_5, with the following parameters:
  • conv_1: kernel 11×11, stride 2, 96 channels
  • pool_1: kernel 3×3, stride 2
  • conv_2: kernel 5×5, stride 1, 256 channels
  • pool_2: kernel 3×3, stride 2
  • conv_3: kernel 3×3, stride 1, 384 channels
  • conv_4: kernel 3×3, stride 1, 384 channels
  • conv_5: kernel 3×3, stride 1, 256 channels
  • This tracking network structure is lightweight, so it runs and processes data quickly.
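  • The layer sizes above fully determine the embedding branch of the Siamese tracker, so it can be written down directly; a PyTorch sketch follows. The ReLU activations and the absence of padding are assumptions, since the text specifies only kernels, strides, and channel counts. Both the template image and each search frame pass through this same network with shared weights.

```python
import torch.nn as nn

# Embedding branch of the Siamese tracking model, following the layer sizes
# given in the text (Fig. 4). Activations and padding are assumptions.
embed = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),    # conv_1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_1
    nn.Conv2d(96, 256, kernel_size=5, stride=1),   # conv_2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_2
    nn.Conv2d(256, 384, kernel_size=3, stride=1),  # conv_3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, stride=1),  # conv_4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, stride=1),  # conv_5
)
```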
  • After step S200 recognizes the first pen tip image, the frame containing it is referred to here as the first frame (for convenience only; it is not necessarily the first frame of the video). The pen tip thus appears from this first frame onward, and the pen tip position in the video consists of its position in the first frame plus its position in every subsequent frame.
  • The pen tip position in the first frame is determined as follows: the first pen tip image is input into the Siamese network of the tracking model to obtain its first feature response, and the position of that first feature response on the first frame is taken as the pen tip position in the first frame.
  • The pen tip position in each frame after the first is determined as follows: the first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response; each subsequent frame is input into the same network to obtain that frame's second feature response; and the second feature response is matched against the first feature response as the target type, with the matched position on the second feature response taken as the pen tip position in the pen-moving video.
  • In practice, the pen-moving video can be split frame by frame in real time and the resulting frames streamed to the specific detection model.
  • The pen tip image is recognized automatically and used as the template image; it is input into the Siamese network, and the first feature response is extracted and used as the target type.
  • For the T-th frame, the convolutional cross-correlation between the target response and that frame's response yields a response distribution f(T_0, T); this distribution is mapped back onto the T-th frame image, and the region with the highest response score is selected as the tracked-target position for that frame.
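  • A minimal sketch of this matching step is shown below: the template's feature response is slid over each frame's feature response as a convolution kernel, and the peak of the resulting score map is mapped back to image coordinates. The function names and the stride parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def response_map(template_feat, frame_feat):
    """Convolutional cross-correlation f(T_0, T): slide the template's
    feature response over the frame's feature response.
    template_feat: (C, h, w) first feature response of the template image.
    frame_feat:    (C, H, W) second feature response of frame T."""
    score = F.conv2d(frame_feat.unsqueeze(0),     # (1, C, H, W)
                     template_feat.unsqueeze(0))  # (1, C, h, w) as the kernel
    return score.squeeze(0).squeeze(0)            # (H-h+1, W-w+1)

def peak_position(score, stride=1):
    """Map the highest-scoring response back to image coordinates; `stride`
    stands in for the total stride of the embedding network (an assumption)."""
    idx = torch.argmax(score)
    row, col = divmod(idx.item(), score.shape[1])
    return col * stride, row * stride  # (x, y) of the pen tip in the frame
```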
  • Optionally, after the pen tip images have been detected in each frame and the training sample set established, the pen tip images in the set are normalized and the detection model is trained on the normalized images.
  • For example, min-max normalization can be applied to the pen tip images in the training sample set.
  • Normalization yields standard pen tip images of the same form, and training the detection model on images of a single standard form is easier and simpler.
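  • A min-max normalization of a pen tip image might look like the following sketch; the exact variant is not specified beyond "maximum or minimum value" normalization, so this is one plausible reading.

```python
import numpy as np

def minmax_normalize(img):
    """Min-max normalization of a pen tip image to the range [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi <= lo:
        return np.zeros_like(img)  # constant image: nothing to rescale
    return (img - lo) / (hi - lo)
```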
  • The multiple pen tip tracking videos include hard pen tip tracking videos and soft (brush) pen tip tracking videos.
  • These differ only in the type of pen tip; they are acquired as in step S210, which is not repeated here. Because both hard and soft pen tips are covered, the training sample set contains images of both under various conditions, so a detection model trained on it can automatically recognize hard pen tip images as well as soft pen tip images.
  • The hard pen tip tracking videos include clips shot under the same shooting conditions and clips shot under different conditions, and the soft pen tip tracking videos likewise include clips under the same and under different shooting conditions.
  • Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data, the frame data is normalized, and the specific detection model obtains the template image and determines the pen tip position from the normalized frame data. For example, after the pen-moving video is obtained, a frame-splitting tool extracts every frame; min-max normalization is then applied to all frames, producing frames of uniform form that are input into the detection model for pen tip detection, which is easier and faster.
  • In summary, the technical solution of the present invention first obtains pen tip tracking video data under various shooting conditions, splits it frame by frame, and builds a training sample set from the resulting frames. Because the videos cover various shooting conditions, so do the pen tip images in the training sample set, and the detection model trained on it can therefore accurately identify the pen tip in pen-moving videos from different environments.
  • When the detection model detects the first pen tip image, that image and the pen-moving video are input into the Siamese network, and convolution computations through the network locate the first pen tip image in each frame of the video, forming the tracking result. Using the Siamese network for these convolution computations effectively improves the tracking task's resistance to shooting conditions such as light and shadow, while ensuring its robustness, real-time performance, and accuracy.
  • The computer-readable storage medium shown is an optical disc 200 on which a computer program (that is, a program product) is stored; when executed by a processor, the program implements the steps described above, for example: obtaining a pen-moving video; using a specific detection model to obtain a template image that includes the pen tip to be tracked, the model having been trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video.
  • Examples of the computer-readable storage medium also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other optical and magnetic storage media, which are not enumerated further here.
  • FIG. 7 is a block diagram of an exemplary computing device 300, which may be a computer system or a server. The computing device 300 shown in FIG. 7 is only an example and should not limit the functions or scope of use of this embodiment of the present invention.
  • Components of computing device 300 may include, but are not limited to: one or more processors or processing units 310, system memory 320, and a bus 330 connecting the different system components (including the system memory and the processing unit 310).
  • Computing device 300 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computing device 300 and include both volatile and nonvolatile media, removable and non-removable media.
  • System memory may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 321 and/or cache memory 322.
  • Computing device 300 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • Storage 323 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive").
  • A disk drive for reading from and writing to a removable non-volatile magnetic disk may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media).
  • Each drive can be connected to the bus via one or more data medium interfaces.
  • The system memory includes at least one program product having a set (e.g., at least one) of program modules 324 configured to perform the functions of the various embodiments of the present invention.
  • Program modules 324 generally perform the functions and/or methodologies of the described embodiments of the invention.
  • Computing device 300 may also communicate with one or more external devices 340 (e.g., keyboards, pointing devices, displays, etc.). Such communication may occur through input/output (I/O) interfaces 350. Computing device 300 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 360. As shown in FIG. 7, the network adapter 360 communicates with the other modules of computing device 300 (such as the processing unit 310) through the bus. It should be appreciated that, although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with computing device 300.
  • The processing unit 310 executes various functional applications and data processing by running programs stored in the system memory, for example the steps of the method described above: obtaining a pen-moving video; using a specific detection model to obtain a template image that includes the pen tip to be tracked, the model having been trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video.
  • FIG. 5 shows a schematic structural diagram of an exemplary copying system suitable for implementing embodiments of the present invention.
  • The copying system includes a copy board 400, a camera 420, a computer 430, and a display screen 440. The camera 420 captures the pen-moving video of the copying pen 410 on the copy board 400 and, being connected to the computer 430, sends the captured video to it; the computer 430 may be the computing device of the example above.
  • The computer 430 executes the pen tip tracking method described above, determines the motion position of the pen tip from the pen-moving video captured by the camera 420, and, being connected to the display screen 440, displays the determined pen tip position on the screen for the copyist's reference.
  • A pen tip tracking method, comprising: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, the template image including the pen tip to be tracked and the model being trained on a training sample set containing a plurality of different pen tip images; and determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
  • The pen tip tracking method of technical solution 1, wherein training the specific detection model on a training sample set containing a plurality of different pen tip images comprises: acquiring multiple pen tip tracking video clips, splitting them frame by frame, building the training sample set from the pen tip images in the frames, and training the model on that set so that it can automatically detect pen tip images.
  • Optionally, the model is trained on the normalized training sample set.
  • The multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions.
  • In the pen tip tracking method of any of technical solutions 1-4, clips shot under the same shooting conditions share the same shooting angle, lighting, and background, while clips shot under different shooting conditions vary in shooting angle, lighting, and background.
  • A specific tool is used to detect and extract multiple pen tip images from all the frame image data.
  • The first pen tip image detected by the specific detection model in the pen-moving video is used as the template image.
  • The first pen tip image is input into the Siamese network of the tracking model to obtain its first feature response; the second feature response of each subsequent frame is matched against the first feature response as the target type, and the matched position on the second feature response is taken as the pen tip position in the pen-moving video.
  • The response distribution is mapped onto each frame of the video, and the position with the highest response score is selected as the pen tip position in that frame.
  • The pen tip tracking method further comprises, after the pen-moving video is obtained, splitting it into multi-frame video frame data and normalizing the frame data; the specific detection model obtains the template image and determines the pen tip position from the normalized frame data.
  • A medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any of technical solutions 1-12.
  • A computing device comprising a processor configured to implement the method of any of technical solutions 1-12 when executing a computer program stored in a memory.

Abstract

Disclosed are a pen tip tracking method, a medium, and a computing device. The method comprises: obtaining a pen-moving video; using a specific detection model to obtain a template image from the pen-moving video, the template image comprising the pen tip to be tracked, and the model being trained on a training sample set comprising a plurality of different pen tip images; and determining the pen tip position in the pen-moving video on the basis of the template image, the pen-moving video, and a tracking model constructed on the basis of a Siamese (twin) network. With this method, pen tip images in pen-moving videos from different environments can be accurately identified; the tracking task's resistance to shooting conditions such as light and shadow is effectively improved; and the robustness, real-time performance, and accuracy of the tracking task are ensured.

Description

Pen tip tracking method, medium and computing device

Technical Field

The invention relates to the field of image tracking, and in particular to a pen tip tracking method, medium, and computing device.

Background Art

In current pen tip tracking tasks, the pen tip is mostly treated as a small target and tracked by template matching. However, a pen tip differs markedly from an ordinary small target: during writing it is easily affected by light, shadow, and similar factors, so tracking accuracy degrades, and the target may even be lost when the writing speed is too high.
Summary of the Invention

The main purpose of the present invention is to provide a pen tip tracking method, medium, and computing device that solve the problems described in the background art.

To achieve the above purpose, the present invention proposes a pen tip tracking method, comprising:

obtaining a pen-moving video;

using a specific detection model to obtain a template image from the pen-moving video, the template image including the pen tip to be tracked, and the specific detection model being trained on a training sample set containing a plurality of different pen tip images;

determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and a tracking model built on a Siamese network.
Optionally, training the specific detection model on a training sample set containing a plurality of different pen tip images comprises:

acquiring multiple pen tip tracking video clips;

splitting the multiple pen tip tracking video clips frame by frame to obtain multi-frame image data;

obtaining, from the multi-frame image data, multiple pen tip images across all frames, and building a training sample set from them;

training the specific detection model on the training sample set so that it can automatically detect pen tip images.

Optionally, after the training sample set is constructed it is normalized, and the specific detection model is trained on the normalized set.

Optionally, the multiple pen tip tracking video clips include clips shot under the same shooting conditions and clips shot under different shooting conditions.

Optionally, the clips shot under the same shooting conditions include multiple clips shot at the same shooting angle, with the same lighting, and against the same background; the clips shot under different shooting conditions include multiple clips shot at different angles, with different lighting, and against different backgrounds.

Optionally, obtaining the multiple pen tip images from the multi-frame image data comprises using a specific tool to detect and extract them from all the frame image data.
Optionally, the multiple pen tip tracking video clips include hard pen tip tracking clips and soft pen tip tracking clips.

Optionally, using the specific detection model to obtain a template image from the pen-moving video comprises taking the first pen tip image the model detects in the video as the template image.

Optionally, determining the position of the pen tip in the pen-moving video based on the template image, the pen-moving video, and the Siamese-network tracking model comprises:

inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response;

taking the first feature response as the target type, tracking the position of the target type on the frame to which the first pen tip image belongs, and using it as the pen tip position in that frame.

Optionally, the determination further comprises:

inputting the first pen tip image into the Siamese network of the tracking model to obtain its first feature response;

inputting each frame of the pen-moving video after the frame containing the first pen tip image into the Siamese network to obtain each such frame's second feature response;

matching the second feature response against the first feature response as the target type, and taking the matched position on the second feature response as the pen tip position in the pen-moving video.

Optionally, this matching comprises: computing the convolutional cross-correlation of the first and second feature responses to obtain a response distribution for each frame after the frame containing the first pen tip image; mapping the response distribution onto each corresponding frame of the video; and selecting the position with the highest response score as the pen tip position in each frame.
Optionally, after the pen-moving video is obtained, it is split into multi-frame video frame data and the frame data is normalized; the specific detection model obtains the template image and determines the pen tip position from the normalized frame data.

The present invention also proposes a medium on which a computer program is stored; when executed by a processor, the program implements any of the methods described above.

The present invention also proposes a computing device comprising a processor configured to implement any of the methods described above when executing a computer program stored in a memory.

In the technical solution of the present invention, the detection model is first trained on a training sample set containing a plurality of different pen tip images, so the model transfers well, adapts to a wide range of application environments, and can automatically recognize various pen tip images during tracking; after a pen tip image is recognized automatically, the Siamese-network-based tracking model tracks it, ensuring the robustness, real-time performance, and accuracy of the tracking task.
Description of Drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings required for describing them are briefly introduced below. The drawings show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a step diagram of an embodiment of the pen tip tracking method of the present invention;

Fig. 2 is a flowchart of an embodiment of the pen tip tracking method of the present invention;

Fig. 3 is a schematic structural diagram of the detection model in the pen tip tracking method of the present invention;

Fig. 4 is a schematic structural diagram of the tracking model in the pen tip tracking method of the present invention;

Fig. 5 is a schematic structural diagram of a copying system using the pen tip tracking method of the present invention;

Fig. 6 is a schematic structural diagram of a medium according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.

The realization of the purpose of the present invention, its functional characteristics, and its advantages are further described below with reference to the embodiments and the accompanying drawings.
Detailed Description

The principle and spirit of the present invention are described below with reference to several exemplary embodiments. These embodiments are given only so that those skilled in the art can better understand and implement the present invention; they do not limit its scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.

Those skilled in the art will recognize that the embodiments of the present invention can be implemented as a system, apparatus, device, method, or computer program product. Therefore, the present disclosure may be embodied as complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.

According to the embodiments of the present invention, a pen tip tracking method, medium, and computing device are proposed.

Overview of the Invention

The inventors found that some current pen tip tracking methods add Kalman filtering on top of a matching algorithm, but such methods are easily disturbed by complex backgrounds, leading to target loss and tracking errors; others track the pen tip with improved particle filtering, which reduces loss but remains vulnerable to light, shadow, and similar factors; still others combine template matching with pen tip shape judgment, which is largely constrained by the shape of the tip and cannot handle soft-pen (brush) calligraphy scenarios.

The inventors also found that deep learning performs extremely well in image and video processing, particularly in the widely used fields of recognition and detection. For example, the local perception property of deep neural networks helps with small-target tracking: each neuron need not perceive all pixels in the image, only a local patch. Neural units in different layers are connected locally, i.e., the units in each layer connect only to some of the units in the previous layer. This local connectivity ensures that the learned parameters respond most strongly to spatially local patterns, and such a network is highly invariant to translation, scaling, tilting, and other deformations. Moreover, tracking built on the matching ability of Siamese networks has recently become a hot topic in machine learning, so introducing the Siamese structure into the small-target pen tip tracking task can effectively improve its resistance to light, shadow, and other factors while ensuring robustness, real-time performance, and accuracy. The inventors further found that existing Siamese-network tracking methods require the target to be selected manually before tracking, which does not suit the pen tip tracking scenario. Accordingly, the present invention proposes a pen tip tracking method that trains the detection model in advance and then uses the trained model to detect the pen tip tracking video without manual intervention, achieving automatic detection and tracking while overcoming the prior art's susceptibility to light, shadow, and similar factors. Having introduced the basic principles of the present invention, its various non-limiting embodiments are described in detail below.
示例性方法exemplary method
下面参考图1来描述根据本发明示例性实施方式的笔尖跟踪方法,包括如下步骤:The pen tip tracking method according to an exemplary embodiment of the present invention is described below with reference to FIG. 1 , including the following steps:
步骤S100:获取运笔视频。Step S100: Obtain a pen-moving video.
步骤S200:采用特定的检测模型从所述运笔视频中获取模板图像,所述模板图像包括待跟踪的笔尖,所述特定的检测模型基于包括多个不同笔尖图像的训练样本集训练得到。Step S200: Using a specific detection model to obtain a template image from the pen movement video, the template image includes the pen tip to be tracked, and the specific detection model is trained based on a training sample set including a plurality of different pen tip images.
步骤S300:基于所述模板图像、所述运笔视频以及基于孪生网络构建的跟踪模型,确定所述运笔视频中的笔尖位置。Step S300: Based on the template image, the pen-moving video and the tracking model constructed based on the Siamese network, determine the position of the pen tip in the pen-moving video.
对于步骤S200,采用特定的检测模型从所述运笔视频中获取模板图像,所述模板图像包括待跟踪的笔尖,所述特定的检测模型基于包括多个不同笔尖图像的训练样本集训练得到;首先需要获得该特定的检测模型,包括以下步骤:For step S200, a specific detection model is used to obtain a template image from the pen movement video, the template image includes a pen tip to be tracked, and the specific detection model is obtained based on a training sample set including a plurality of different pen tip images; first The need to obtain this specific detection model involves the following steps:
步骤S210:获取多个笔尖跟踪视频;在本步骤中,多个笔尖跟踪视频包括了相同拍摄条件和不同拍摄条件的多个笔尖跟踪视频片段,比如:在同一个视频中包含了多个相同拍摄条件和不同拍摄条件的笔尖跟踪视频片段,或者同一个笔尖跟踪视频中的所有片段均使用了相同的拍摄条件,而多个笔尖跟踪视频的拍摄条件又彼此不同。Step S210: Obtain multiple pen tip tracking videos; in this step, multiple pen tip tracking videos include multiple pen tip tracking video clips under the same shooting conditions and different shooting conditions, for example: multiple identical shootings are included in the same video Conditions and different shooting conditions of the tip tracking video clips, or all the clips in the same tip tracking video use the same shooting conditions, but the shooting conditions of multiple tip tracking videos are different from each other.
另外多个笔尖跟踪视频可以是事先准备好的,然后按照预设的接口或上传方式提供,例如可以是针对用户想要跟踪的笔尖事先准备好的。或者也可以是现场拍摄的笔尖跟踪视频,如:使用用户想要跟踪的笔,现场书写并进行拍摄。无论是事先准备好的,还是现场拍摄的只需满足既包括相同拍摄条件的笔尖跟踪视频片段,又包括不同拍摄条件的笔尖跟踪视频片段即可,例如,包括:在相同的拍摄光线、相同的拍摄角度、相同的拍摄背景下拍摄设的多段笔尖跟踪视频,以及在不同拍摄角度、不同拍摄光线、不同拍摄背景下的多段笔尖跟踪视频片段。又或者,在相同拍摄光线、不同拍摄角度、相同拍摄背景下的多段笔尖跟踪视频片段。In addition, multiple pen tip tracking videos may be prepared in advance, and then provided according to a preset interface or uploading method, for example, they may be prepared in advance for the pen tip that the user wants to track. Or it can be a pen tip tracking video shot on the spot, such as: using the pen that the user wants to track, writing and filming on the spot. Whether it is prepared in advance or shot on-site, it only needs to include both the pen tip tracking video clips of the same shooting conditions and the pen tip tracking video clips of different shooting conditions, for example, including: in the same shooting light, the same Shooting angles, multiple clips of pen tip tracking video under the same shooting background, and multiple clips of pen tip tracking video clips under different shooting angles, different shooting lights, and different shooting backgrounds. Or, multiple pen tip tracking video clips under the same shooting light, different shooting angles, and the same shooting background.
总之,在准备笔尖跟踪视频数据时,既可以利用预先准备的,然后按照预设的接口或上传方式提供;也可以利用现场实时拍摄的,只需满足在多个笔尖跟踪视频数据中即包括了相同拍摄条件下的笔尖跟踪视频片段,又包括了不同拍摄条件下的笔尖跟踪视频片段即可。In short, when preparing pen tip tracking video data, it can be prepared in advance and then provided according to the preset interface or upload method; it can also be used for real-time shooting on the spot, as long as it is included in multiple pen tip tracking video data The pen tip tracking video clips under the same shooting conditions may include the pen tip tracking video clips under different shooting conditions.
步骤S220:对所述多个笔尖跟踪视频进行逐帧拆分,得到多帧帧图像数据;在本步骤中,可以使用帧拆分插件,或者编写帧拆分插件,对在步骤S210中获取的全部笔尖跟踪视频进行逐帧切分,从而得到每一段笔尖跟踪视频的每一帧图像,组成了多帧帧图像数据。Step S220: split the plurality of pen tip tracking videos frame by frame to obtain multi-frame frame image data; in this step, a frame split plug-in can be used, or a frame split plug-in can be written to obtain in step S210 All the pen tip tracking videos are segmented frame by frame, so as to obtain each frame image of each pen tip tracking video, forming multiple frames of frame image data.
步骤S230:基于所述多帧帧图像数据,获取全部所述帧图像数据中的多个笔尖图像,并基于所述多个笔尖图像构建训练样本集;在本步骤中,基于步 骤S230得到的多帧帧图像数据,可以建立帧图像数据集{X i},那么其中每一帧帧图像则可以表示为X ij,即表示第i段视频中的第j帧图像。然后针对帧图像数据集中的每一帧图像进行检测获取笔尖图像,如采用labelme工具、labelimg工具、yolo-mark工具对帧图像数据集{X i}中的每一帧图像进行手动框选相对较小的笔尖图像;又如利用Vatic工具、Sloth工具、Rectlabel工具对帧图像数据集{X i}中的每一帧图像进行自动框选相对较小的笔尖图像。 Step S230: Based on the multi-frame frame image data, obtain a plurality of nib images in all the frame image data, and construct a training sample set based on the plurality of nib images; in this step, based on the multiple nib images obtained in step S230, For frame-by-frame image data, a frame image data set {X i } can be established, and each frame image can be expressed as X ij , which means the j-th frame image in the i-th segment of video. Then detect each frame image in the frame image data set to obtain the pen tip image, such as using the labelme tool, labelimg tool, and yolo-mark tool to manually frame each frame image in the frame image data set {X i } for comparison A small pen tip image; another example is to use the Vatic tool, the Sloth tool, and the Rectlabel tool to automatically select a relatively small pen tip image for each frame image in the frame image dataset {X i }.
另外对于在步骤S210中获取的笔尖跟踪视频,可能存在个别帧没有笔尖图像,此时则直接忽略该帧图像即可,只需将具有笔尖图像的帧图像中的笔尖部分框选出来即可,从而框选得到的全部笔尖图像就构成了训练样本集。可以用{Z}来表示训练样本集,则Z i就可以表示训练样本集中的第i张笔尖模板图像。需要说明的是在框选笔尖图像时可以使用labelme等上述工具,在其他实施例中也可以使用除了上述框选工具之外的其他框选工具,本发明技术方案对框选工具不做限制。 In addition, for the pen tip tracking video acquired in step S210, there may be individual frames without a pen tip image, and at this time, the frame image can be ignored directly, and only the pen tip part in the frame image with the pen tip image needs to be selected. Thus, all the nib images obtained by frame selection constitute the training sample set. {Z} can be used to represent the training sample set, then Z i can represent the i-th nib template image in the training sample set. It should be noted that the above-mentioned tools such as labelme can be used to frame the pen tip image, and other frame selection tools other than the above-mentioned frame selection tools can also be used in other embodiments, and the technical solution of the present invention does not limit the frame selection tools.
Then step S240 is performed: training on the training sample set to obtain the specific detection model. In this step, an existing detection model may be used, or a detection model may be constructed and trained. Fig. 3 is a schematic structural diagram of the two-dimensional convolutional network detection model constructed in this embodiment.
The detection model comprises, connected in sequence, a first two-dimensional convolutional layer conv2D_1, a first residual module residual_block_1, a second residual module residual_block_2, a third residual module residual_block_3, a fourth residual module residual_block_4 and a fifth residual module residual_block_5. The third residual module is further connected to a first concatenation layer concatendate_1; concatendate_1 is followed by a first two-dimensional convolution block conv2D_block_1, which is in turn followed by a second two-dimensional convolutional layer conv2D_2. The fourth residual module residual_block_4 is further connected to a second concatenation layer concatendate_2; concatendate_2 is followed by a second two-dimensional convolution block conv2D_block_2, which is connected to a first upsampling module upsampling2D_1; upsampling2D_1 is connected to a third two-dimensional convolutional layer conv2D_3, and conv2D_3 is connected to the first concatenation layer concatendate_1. The second two-dimensional convolution block conv2D_block_2 is also connected to a fourth two-dimensional convolutional layer conv2D_4. The fifth residual module residual_block_5 is connected to a fifth two-dimensional convolutional layer conv2D_5 and a sixth two-dimensional convolutional layer conv2D_6 respectively; conv2D_5 is further connected to a second upsampling module upsampling_2, and upsampling_2 is connected to the second concatenation layer concatendate_2. The parameters are as follows: conv2D_1 has a 3×3 kernel with 32 channels; conv2D_2 has a 1×1 kernel with 256 channels; conv2D_3 has a 1×1 kernel with 128 channels; conv2D_4 has a 1×1 kernel with 128 channels; conv2D_5 has a 1×1 kernel with 256 channels; conv2D_6 has an 11×11 kernel with 256 channels; residual_block_1 has a 1×1 kernel with 64 channels; residual_block_2 has a 2×2 kernel with 128 channels; residual_block_3 has an 8×8 kernel with 256 channels; residual_block_4 has an 8×8 kernel with 512 channels; residual_block_5 has a 4×4 kernel with 1024 channels. Both conv2D_block_1 and conv2D_block_2 consist of three groups of sequentially connected convolutional layers, each group comprising a convolutional layer with 128 channels and a 1×1 kernel followed by a convolutional layer with 256 channels and a 3×3 kernel.
This detection model structure is relatively lightweight and its processing speed is relatively fast, so the pen movement video can be detected in real time. Once the detection model has been constructed, it can be trained with the pen tip images in the training sample set established in step S230. As follows from step S210, the pen tip images in the training sample set cover pen tip images under all kinds of shooting conditions, for example pen tip images shot under the same shooting light, the same shooting angle and the same shooting background as well as under different shooting lights, different shooting angles and different shooting backgrounds; or pen tip images for which some shooting conditions are the same and others differ, such as the same shooting light with different shooting angles, or the same shooting angle with different shooting backgrounds. The training sample set therefore contains pen tip images of all kinds under different combinations of light, background, angle, shadow and so on. Consequently, after the detection model has been trained with the pen tip images of this training sample set, it acquires a strong transfer capability and can adapt to a much wider range of application environments, so that it can automatically detect and recognize the pen tip image in pen movement videos shot under all kinds of shooting conditions.
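The following PyTorch sketch conveys the overall topology described above: a stem convolution, five residual stages, and two upsample-and-concatenate fusion paths. It is a simplification, not the embodiment's exact network: the residual stages use 1×1/3×3 kernels instead of the listed 2×2, 4×4 and 8×8 kernels, the three-group convolution blocks are collapsed to single 1×1 layers, and the head's output layout (box plus objectness) is an assumption:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual stage: 1x1 reduction plus 3x3 stride-2 conv, with a
    1x1 stride-2 projection on the skip path so the shapes match."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),
            nn.LeakyReLU(0.1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)

    def forward(self, x):
        return self.body(x) + self.skip(x)

class NibDetector(nn.Module):
    """Five residual stages feeding two upsample-and-concatenate fusion paths,
    loosely mirroring the conv2D_1 .. residual_block_5 / concatendate topology."""
    def __init__(self, out_ch=5):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, padding=1)   # conv2D_1: 3x3, 32 channels
        self.r1 = ResidualBlock(32, 64)              # residual_block_1
        self.r2 = ResidualBlock(64, 128)             # residual_block_2
        self.r3 = ResidualBlock(128, 256)            # residual_block_3
        self.r4 = ResidualBlock(256, 512)            # residual_block_4
        self.r5 = ResidualBlock(512, 1024)           # residual_block_5
        self.lat5 = nn.Conv2d(1024, 256, 1)          # conv2D_5: 1x1, 256 channels
        self.up2 = nn.Upsample(scale_factor=2)       # upsampling_2
        self.fuse4 = nn.Conv2d(512 + 256, 128, 1)    # conv2D_block_2, collapsed
        self.lat4 = nn.Conv2d(128, 128, 1)           # conv2D_3: 1x1, 128 channels
        self.up1 = nn.Upsample(scale_factor=2)       # upsampling2D_1
        self.fuse3 = nn.Conv2d(256 + 128, 256, 1)    # conv2D_block_1, collapsed
        self.head = nn.Conv2d(256, out_ch, 1)        # box + objectness (assumed)

    def forward(self, x):
        c3 = self.r3(self.r2(self.r1(self.stem(x))))  # 1/8 resolution, 256 ch
        c4 = self.r4(c3)                              # 1/16 resolution, 512 ch
        c5 = self.r5(c4)                              # 1/32 resolution, 1024 ch
        p4 = self.fuse4(torch.cat([c4, self.up2(self.lat5(c5))], dim=1))
        p3 = self.fuse3(torch.cat([c3, self.up1(self.lat4(p4))], dim=1))
        return self.head(p3)                          # per-cell pen tip predictions

# Smoke test on a 256x256 input (any size divisible by 32 works):
# NibDetector()(torch.zeros(1, 3, 256, 256)).shape -> torch.Size([1, 5, 32, 32])
```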
After the above specific detection model has been constructed, it is used to acquire the template image from the pen movement video: the pen movement video can be input into the trained specific detection model, and the first pen tip image that the specific detection model detects in the pen movement video is taken as the template image. Not every frame of the pen movement video necessarily contains a pen tip image, so the pen movement video only needs to be fed into the specific detection model in real time; as soon as the specific detection model detects the first frame of the pen movement video that contains a pen tip image, this first pen tip image is recognized automatically.
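A minimal sketch of this template-acquisition step follows, assuming a `detect` wrapper that maps one frame to a pen tip box or None (the wrapper and its box format are hypothetical):

```python
def acquire_template(frames, detect):
    """Return the first pen tip image the detector finds, as the template image.

    Frames without a pen tip are skipped, exactly as described above; the
    index of the first hit corresponds to the frame T0 discussed below.
    """
    for t, frame in enumerate(frames):
        box = detect(frame)             # hypothetical detector wrapper
        if box is not None:
            x, y, w, h = box
            return t, frame[y:y + h, x:x + w]
    return None, None                   # no pen tip appeared in the video
```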
At this point, the construction steps of the specific detection model have been set out in full, and the template image in the pen movement video has been acquired on the basis of the constructed specific detection model.
Next, step S300 is performed: based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network, the pen tip position in the pen movement video is determined. As shown in Fig. 4, the tracking model comprises, connected in sequence, a first convolutional layer conv_1, a first pooling layer pool_1, a second convolutional layer conv_2, a second pooling layer pool_2, a third convolutional layer conv_3, a fourth convolutional layer conv_4 and a fifth convolutional layer conv_5. The specific parameters are: conv_1 has an 11×11 kernel with stride 2 and 96 channels; pool_1 has a 3×3 kernel with stride 2; conv_2 has a 5×5 kernel with stride 1 and 256 channels; pool_2 has a 3×3 kernel with stride 2; conv_3 has a 3×3 kernel with stride 1 and 384 channels; conv_4 has a 3×3 kernel with stride 1 and 384 channels; conv_5 has a 3×3 kernel with stride 1 and 256 channels. The network structure of this tracking model has the advantage of being lightweight, so it runs and processes data quickly. On the other hand, once the first pen tip image has been recognized in step S200, the frame of the pen movement video in which the first pen tip image appears is, for convenience of description, called the first frame (the term "first frame" is used here only for convenience and does not mean the first frame image of the pen movement video). The pen tip thus begins to appear in this first frame, and the pen tip position in the pen movement video comprises the pen tip position in the first frame and the pen tip position in every frame after the first frame. The pen tip position in the first frame is determined as follows:
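The layer hyper-parameters above translate directly into the following PyTorch embedding branch, which both inputs of the Siamese tracker would share; the ReLU activations and their placement are an assumption, since the description lists only the convolution and pooling parameters:

```python
import torch.nn as nn

# Shared embedding branch of the Siamese tracking model (conv_1 .. conv_5).
embed = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=2),    # conv_1: 11x11, stride 2, 96 ch
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_1: 3x3, stride 2
    nn.Conv2d(96, 256, kernel_size=5, stride=1),   # conv_2: 5x5, stride 1, 256 ch
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),         # pool_2: 3x3, stride 2
    nn.Conv2d(256, 384, kernel_size=3, stride=1),  # conv_3: 3x3, stride 1, 384 ch
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, stride=1),  # conv_4: 3x3, stride 1, 384 ch
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, stride=1),  # conv_5: 3x3, stride 1, 256 ch
)
```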
The first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image; then, taking this first feature response as the target type, the position of the target type is tracked on the frame image of the pen movement video to which the first pen tip image belongs (i.e., the position of the target type is tracked on the first frame), and this position is taken as the pen tip position in that frame image of the pen movement video (i.e., the position of the first feature response on the first frame is taken as the pen tip position of the first frame).
The pen tip position in every frame after the first frame is determined as follows:
the first pen tip image is input into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
every frame image of the pen movement video after the frame in which the first pen tip image is located is input into the Siamese network of the tracking model (i.e., starting from the second frame, every subsequent frame is input into the Siamese network), and the respective second feature response of every frame image of the pen movement video after the frame in which the first pen tip image is located is obtained (i.e., the second feature response corresponding to the second frame and to every frame thereafter);
taking the first feature response as the target type, the second feature responses are matched, and the position of the target type matched on each second feature response is taken as the pen tip position in the pen movement video.
The above method for determining the pen tip position is explained with reference to Fig. 2, as follows:
First, the pen movement video is split frame by frame in real time, and the split frame images are transmitted to the specific detection model in real time. Suppose the pen tip image first appears in frame T0 of the pen movement video; the specific detection model then automatically recognizes the pen tip image in the frame-T0 image, takes this pen tip image as the template image, inputs it into the Siamese network and extracts the first feature response. On the one hand, with this first feature response as the target type, its position is tracked on the frame-T0 image and output as the pen tip position on the frame-T0 image; on the other hand, every frame image of the pen movement video after frame T0 is input into the Siamese network, the second feature response of each such frame is extracted, and then, with the first feature response as the target type, this target type is matched against the second feature response corresponding to every frame image after frame T0; the matched target position (which can specifically be expressed in coordinates) is output as the pen tip position on every frame after the frame-T0 image.
The matching can be carried out by the following method:
let f(T0) denote the first feature response extracted by the Siamese network from the pen tip image detected in the frame-T0 image;
let f(T) denote the second feature response extracted by the Siamese network from the frame-T image, where T lies after T0;
the second feature response extracted from the frame-T image is then convolutionally cross-correlated with the first feature response, giving f(T0, T) = f(T0) * f(T), where "*" denotes the convolutional cross-correlation operation and f(T0, T) is the response distribution result, in the frame-T image of the pen movement video, of the pen tip image from the frame-T0 image;
the response distribution result f(T0, T) so obtained is then mapped back into the frame-T image, and the region with the highest response score is selected as the position result of the tracking target in the frame-T image;
once the position has been determined for every frame image of the pen movement video from the frame-T0 image onwards, the complete pen tip tracking result of the pen movement video is obtained.
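A compact sketch of this matching step follows; it realizes the cross-correlation f(T0, T) = f(T0) * f(T) with `F.conv2d`, using f(T0) as the correlation kernel. The mapping of the response peak back to pixel coordinates (which would have to account for the embedding stride) is simplified here:

```python
import torch
import torch.nn.functional as F

def track_frame(embed, template, frame):
    """Locate the template in one frame via convolutional cross-correlation.

    `template` and `frame` are (1, 3, H, W) tensors; f(T0) = embed(template)
    and f(T) = embed(frame), and the peak of the response map f(T0, T) gives
    the pen tip position in this frame.
    """
    f_t0 = embed(template)            # first feature response, f(T0)
    f_t = embed(frame)                # second feature response, f(T)
    response = F.conv2d(f_t, f_t0)    # f(T0) used as kernel over f(T)
    idx = response.flatten().argmax() # region with the highest response score
    w = response.shape[-1]
    row, col = divmod(idx.item(), w)
    return row, col                   # peak location on the response map
```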
In another example of this embodiment, after the pen tip image detection has been performed on every frame image of the frame image data and the training sample set has been established, the method further comprises: normalizing the pen tip images in the training sample set, and training the detection model with the normalized pen tip images of the training sample set. Specifically, a maximum/minimum normalization method can be applied to the pen tip images in the training sample set, so that standard pen tip images of the same form are obtained after normalization; training the detection model with standard pen tip images of the same form is then easier and simpler.
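A minimal reading of the maximum/minimum normalization mentioned above, scaling each image into [0, 1] (the exact variant is not specified, so this is an assumption):

```python
import numpy as np

def normalize(img):
    """Min-max normalize one image to [0, 1]."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)  # epsilon guards constant images
```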
In another example of this embodiment, the multiple pen tip tracking videos include hard pen tip tracking videos and soft pen tip tracking videos. These differ only in the type of pen tip; they are acquired as in step S210 and are not described again here. Since both hard pen tips and soft pen tips are covered, the training sample set contains soft pen tip images and hard pen tip images under all kinds of conditions; hence, after being trained with the pen tip images of this training sample set, the detection model can automatically recognize not only hard pen tip images but also soft pen tip images.
In another example of this embodiment, the multiple pen tip tracking videos include hard pen tip tracking videos and soft pen tip tracking videos, where the hard pen tip tracking videos contain pen tip tracking video clips shot under the same shooting conditions and pen tip tracking video clips shot under different shooting conditions, and the soft pen tip tracking videos likewise contain pen tip tracking video clips shot under the same shooting conditions and pen tip tracking video clips shot under different shooting conditions.
In another example of this embodiment, after the pen movement video is acquired, the method further comprises: frame-splitting the pen movement video to obtain multiple frames of pen movement video frame data, and normalizing the pen movement video frame data; the specific detection model then acquires the template image, and the pen tip position is determined, on the basis of the normalized pen movement video frame data. For example, after the pen movement video is acquired, a frame splitting tool is used to split it into individual frame images; then, on the basis of all the frame images, a maximum/minimum normalization method is applied, after which every frame image of the pen movement video is obtained in a uniform form; feeding these into the detection model to detect the pen tip image is then easier and faster.
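Reusing the hypothetical helpers sketched earlier, this preprocessing of the pen movement video before detection could then read:

```python
def preprocess_pen_video(video_path):
    """Frame-split the pen movement video and normalize every frame before it
    is passed to the detection model (helpers are the sketches given above)."""
    return [normalize(f) for f in split_video_into_frames(video_path)]
```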
As can be seen from the method of the above exemplary embodiment, the technical solution of the present invention first acquires pen tip tracking video data under different shooting conditions, then splits the pen tip tracking video data of the various shooting conditions frame by frame and acquires the pen tip image in each frame so as to establish a training sample set, which is then used to train the detection model. Since the pen tip tracking videos cover all kinds of shooting conditions, the resulting training sample set also contains pen tip images under all kinds of shooting conditions, so the trained detection model can accurately recognize the pen tip image in pen movement videos shot in different environments. Then, once the detection model has detected the first pen tip image, the detected first pen tip image and the pen movement video are input into the Siamese network; through the convolution computation of the Siamese network, the position of the first pen tip image in every frame of the pen movement video can be obtained, thereby forming the pen tip tracking result. Furthermore, because the Siamese network performs convolution computation on the first pen tip image and the frame images of the pen movement video, the tracking task's resistance to the influence of shooting conditions such as light and shadow is effectively improved, while the robustness, real-time performance and accuracy of the tracking task are also guaranteed.
Exemplary medium
Having introduced the method and apparatus of the exemplary embodiments of the present invention, the computer-readable storage medium of the exemplary embodiments of the present invention is described next with reference to Fig. 6.
Referring to Fig. 6, the computer-readable storage medium shown there is an optical disc 200 on which a computer program (i.e., a program product) is stored; when run by a processor, the computer program implements the steps recorded in the above method embodiments, for example:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, and other optical or magnetic storage media, which are not enumerated one by one here.
Exemplary computing device
Having introduced the method, apparatus and medium of the exemplary embodiments of the present invention, the computing device 300 of the exemplary embodiments of the present invention is described next with reference to Fig. 7, which shows a block diagram of an exemplary computing device 300 suitable for implementing embodiments of the present invention; the computing device 300 may be a computer system or a server. The computing device 300 shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the components of the computing device 300 may include, but are not limited to: one or more processors or processing units 310, a system memory 320, and a bus 330 connecting the different system components (including the system memory and the processing unit 310).
The computing device 300 typically includes a variety of computer-system-readable media. These media may be any available media accessible to the computing device 300, including volatile and non-volatile media and removable and non-removable media.
The system memory may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM 321) and/or cache memory 322. The computing device 300 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, ROM 323 may be used for reading from and writing to a non-removable, non-volatile magnetic medium (not shown in Fig. 7, commonly called a "hard drive"). Although not shown in Fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to the bus via one or more data-media interfaces. The system memory may include at least one program product having a set (e.g., at least one) of program modules 324 configured to perform the functions of the embodiments of the present invention.
A program/utility 325 having a set (at least one) of program modules 324 may be stored, for example, in the system memory. Such program modules 324 include, but are not limited to, an operating system, one or more application programs, other program modules 324 and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 324 generally perform the functions and/or methods of the embodiments described in the present invention.
The computing device 300 may also communicate with one or more external devices 340 (such as a keyboard, pointing device or display). Such communication may take place through an input/output (I/O) interface 350. Moreover, the computing device 300 may also communicate with one or more networks (for example a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 360. As shown in Fig. 7, the network adapter 360 communicates with the other modules of the computing device 300 (such as the processing unit 310) via the bus. It should be understood that, although not shown in Fig. 7, other hardware and/or software modules may be used in conjunction with the computing device 300.
The processing unit 310 executes various functional applications and data processing by running programs stored in the system memory, for example:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
Exemplary copying system
An exemplary copying system of the present invention is described with reference to Fig. 5, which shows a schematic structural diagram of an exemplary copying system suitable for implementing embodiments of the present invention. The copying system includes a copying board 400, a camera 420, a computer 430 and a display screen 440. The camera 420 is used to shoot the pen movement video of the copying pen 410 on the copying board 400; it is connected to the computer 430 and sends the captured pen movement video to the computer 430. The computer 430, for which reference may be made to the exemplary computing device above, executes the above pen tip tracking method and determines the motion position of the pen tip from the pen movement video shot by the camera 420. The computer 430 is also connected to the display screen 440, on which the determined pen tip position is displayed for the copyist's reference.
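For illustration only, the capture-detect-track-display loop of such a copying system might be wired together as below, reusing the hypothetical `detect`, `embed` and `track_frame` pieces sketched earlier; the mapping of the response peak to pixel coordinates is simplified:

```python
import cv2
import numpy as np
import torch

def to_tensor(img):
    """BGR uint8 image -> (1, 3, H, W) float tensor in [0, 1]."""
    t = torch.from_numpy(img.astype(np.float32) / 255.0)
    return t.permute(2, 0, 1).unsqueeze(0)

def run_copying_system(embed, detect, camera_index=0):
    """Read camera frames, acquire the template on first detection, then track
    the pen tip and mark it on the display for the copyist's reference."""
    cap = cv2.VideoCapture(camera_index)
    template = None
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if template is None:
            box = detect(frame)              # hypothetical detector wrapper
            if box is not None:
                x, y, w, h = box
                template = to_tensor(frame[y:y + h, x:x + w])
        else:
            row, col = track_frame(embed, template, to_tensor(frame))
            # NOTE: a real system would rescale (row, col) by the embedding
            # stride; drawn as-is here for brevity.
            cv2.circle(frame, (col, row), 5, (0, 0, 255), -1)
        cv2.imshow("pen tip tracking", frame)
        if cv2.waitKey(1) == 27:             # Esc exits the loop
            break
    cap.release()
    cv2.destroyAllWindows()
```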
In addition, although the operations of the method of the present invention are described in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution.
Although the spirit and principles of the present invention have been described with reference to several specific embodiments, it should be understood that the present invention is not limited to the specific embodiments disclosed, and the division into aspects does not mean that the features in these aspects cannot be combined to advantage; such division is merely for convenience of expression. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structural transformation made using the contents of the description and drawings of the present invention under the inventive concept of the present invention, or any direct/indirect application in other related technical fields, is included within the patent protection scope of the present invention.
Through the above description, the embodiments of the present invention provide the following technical solutions, but are not limited thereto:
1. A pen tip tracking method, comprising:
acquiring a pen movement video;
using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
2. The pen tip tracking method according to technical solution 1, wherein training the specific detection model on a training sample set including multiple different pen tip images comprises:
acquiring multiple pen tip tracking video clips;
splitting the multiple pen tip tracking video clips frame by frame to obtain multiple frames of frame image data;
based on the multiple frames of frame image data, acquiring multiple pen tip images from all of the frame image data, and constructing a training sample set based on the multiple pen tip images;
the specific detection model being trained on the training sample set, so that the specific detection model can automatically detect pen tip images.
3. The pen tip tracking method according to technical solution 1 or 2, further comprising: after the training sample set is constructed, normalizing the training sample set, the specific detection model being trained on the normalized training sample set.
4. The pen tip tracking method according to any one of technical solutions 1-3, wherein the multiple pen tip tracking video clips include: multiple pen tip tracking video clips shot under the same shooting conditions and under different shooting conditions.
5. The pen tip tracking method according to any one of technical solutions 1-4, wherein the multiple pen tip tracking video clips shot under the same shooting conditions include:
multiple pen tip tracking video clips shot under the same shooting angle, shooting light and shooting background;
and the multiple pen tip tracking video clips shot under the different shooting conditions include:
multiple pen tip tracking video clips shot under different shooting angles, shooting lights and shooting backgrounds.
6. The pen tip tracking method according to any one of technical solutions 1-5, wherein acquiring multiple pen tip images from all of the frame image data based on the multiple frames of frame image data comprises:
detecting and acquiring multiple pen tip images from all of the frame image data using a specific tool, based on the multiple frames of frame image data.
7. The pen tip tracking method according to any one of technical solutions 1-6, wherein the multiple pen tip tracking video clips include hard pen tip tracking video clips and soft pen tip tracking video clips.
8. The pen tip tracking method according to any one of technical solutions 1-7, wherein using a specific detection model to acquire a template image from the pen movement video comprises:
taking the first pen tip image detected by the specific detection model in the pen movement video as the template image.
9. The pen tip tracking method according to any one of technical solutions 1-8, wherein determining the pen tip position in the pen movement video based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network comprises:
inputting the first pen tip image into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
taking the first feature response as the target type, tracking the position of the target type on the frame image to which the first pen tip image belongs, and taking this position as the pen tip position in the frame image to which the first pen tip image belongs.
10. The pen tip tracking method according to any one of technical solutions 1-9, wherein determining the pen tip position in the pen movement video based on the template image, the pen movement video and the tracking model constructed on the basis of a Siamese network further comprises:
inputting the first pen tip image into the Siamese network of the tracking model to obtain the first feature response of the first pen tip image;
inputting every frame image of the pen movement video after the frame in which the first pen tip image is located into the Siamese network of the tracking model, to obtain the respective second feature response of every frame image of the pen movement video after the frame in which the first pen tip image is located;
taking the first feature response as the target type, matching the second feature responses, and taking the position of the target type matched on each second feature response as the pen tip position in the pen movement video.
11. The pen tip tracking method according to any one of technical solutions 1-10, wherein taking the first feature response as the target type, matching the second feature responses, and taking the position of the target type matched on each second feature response as the pen tip position in the pen movement video comprises:
performing convolutional cross-correlation computation on the first feature response and the second feature responses, to obtain the response distribution result of the first pen tip image in every frame image of the pen movement video after the frame image to which it belongs;
mapping the response distribution results into the corresponding frame images of the pen movement video, and selecting the position with the highest response score as the pen tip position in each frame image.
12. The pen tip tracking method according to any one of technical solutions 1-11, further comprising: after the pen movement video is acquired, frame-splitting the pen movement video to obtain multiple frames of pen movement video frame data and normalizing the pen movement video frame data, the specific detection model acquiring the template image, and the pen tip position being determined, on the basis of the normalized pen movement video frame data.
13. A medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method according to any one of technical solutions 1-12.
14. A computing device, characterized in that the computing device comprises a processor for implementing the method according to any one of technical solutions 1-12 when executing a computer program stored in a memory.

Claims (10)

  1. A pen tip tracking method, comprising:
    acquiring a pen movement video;
    using a specific detection model to acquire a template image from the pen movement video, the template image including the pen tip to be tracked, the specific detection model being trained on a training sample set that includes multiple different pen tip images;
    determining the pen tip position in the pen movement video based on the template image, the pen movement video and a tracking model constructed on the basis of a Siamese network.
  2. The pen tip tracking method according to claim 1, wherein training the specific detection model on a training sample set including multiple different pen tip images comprises:
    acquiring multiple pen tip tracking video clips;
    splitting the multiple pen tip tracking video clips frame by frame to obtain multiple frames of frame image data;
    based on the multiple frames of frame image data, acquiring multiple pen tip images from all of the frame image data, and constructing a training sample set based on the multiple pen tip images;
    the specific detection model being trained on the training sample set, so that the specific detection model can automatically detect pen tip images.
  3. The pen tip tracking method according to claim 2, further comprising: after the training sample set is constructed, normalizing the training sample set, the specific detection model being trained on the normalized training sample set.
  4. The pen tip tracking method according to claim 2, wherein the multiple pen tip tracking video clips include: multiple pen tip tracking video clips shot under the same shooting conditions and under different shooting conditions.
  5. The pen tip tracking method according to claim 4, wherein the multiple pen tip tracking video clips shot under the same shooting conditions include:
    multiple pen tip tracking video clips shot under the same shooting angle, shooting light and shooting background;
    and the multiple pen tip tracking video clips shot under the different shooting conditions include:
    multiple pen tip tracking video clips shot under different shooting angles, shooting lights and shooting backgrounds.
  6. The pen tip tracking method according to any one of claims 2-5, wherein acquiring multiple pen tip images from all of the frame image data based on the multiple frames of frame image data comprises:
    detecting and acquiring multiple pen tip images from all of the frame image data using a specific tool, based on the multiple frames of frame image data.
  7. The pen tip tracking method according to any one of claims 2-5, wherein the multiple pen tip tracking video clips include hard pen tip tracking video clips and soft pen tip tracking video clips.
  8. The pen tip tracking method according to claim 1, wherein using a specific detection model to acquire a template image from the pen movement video comprises:
    taking the first pen tip image detected by the specific detection model in the pen movement video as the template image.
  9. A medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method according to any one of claims 1-8.
  10. A computing device, characterized in that the computing device comprises a processor for implementing the method according to any one of claims 1-8 when executing a computer program stored in a memory.
PCT/CN2021/115507 2021-07-23 2021-08-31 Pen tip tracking method, medium, and computing device WO2023000442A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110841194.8 2021-07-23
CN202110841194.8A CN113449695A (en) 2021-07-23 2021-07-23 Pen point tracking method, medium and computing device

Publications (1)

Publication Number Publication Date
WO2023000442A1 true WO2023000442A1 (en) 2023-01-26

Family

ID=77817218

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115507 WO2023000442A1 (en) 2021-07-23 2021-08-31 Pen tip tracking method, medium, and computing device

Country Status (2)

Country Link
CN (1) CN113449695A (en)
WO (1) WO2023000442A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170255319A1 (en) * 2016-03-06 2017-09-07 Microsoft Technology Licensing, Llc Pen location detection
CN111260688A (en) * 2020-01-13 2020-06-09 深圳大学 Twin double-path target tracking method
CN111429482A (en) * 2020-03-19 2020-07-17 上海眼控科技股份有限公司 Target tracking method and device, computer equipment and storage medium
CN111612822A (en) * 2020-05-21 2020-09-01 广州海格通信集团股份有限公司 Object tracking method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN JING, CHEN JUN-LIN: "Multi-templates Pen Tip Tracking Algorithm Based on Particle Filtering", COMPUTER ENGINEERING, SHANGHAI JISUANJI XUEHUI, CN, vol. 37, no. 21, 5 November 2011 (2011-11-05), CN , pages 136 - 140, XP093026690, ISSN: 1000-3428, DOI: 10.3969/j.issn.1000-3428.2011.21.046 *

Also Published As

Publication number Publication date
CN113449695A (en) 2021-09-28


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE