CN113095347A - Deep learning-based mark recognition method and training method, system and electronic equipment thereof - Google Patents

Deep learning-based mark recognition method and training method, system and electronic equipment thereof

Info

Publication number
CN113095347A
Authority
CN
China
Prior art keywords
image
mark
original
marked
deep learning
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010022240.7A
Other languages
Chinese (zh)
Inventor
孙俊
蒋坤君
胡增新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunny Optical Zhejiang Research Institute Co Ltd
Original Assignee
Sunny Optical Zhejiang Research Institute Co Ltd
Application filed by Sunny Optical Zhejiang Research Institute Co Ltd filed Critical Sunny Optical Zhejiang Research Institute Co Ltd
Priority to CN202010022240.7A
Publication of CN113095347A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a deep learning-based mark recognition method and training method, and a system and an electronic device thereof. The deep learning-based mark recognition method comprises the following steps: acquiring an original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and inputting the original mark image into a pre-trained mark recognition model constructed based on a multi-task cascaded convolutional network for mark recognition, so as to output the positions of the mark's bounding box and key points on the original mark image, thereby recognizing the mark.

Description

Deep learning-based mark recognition method and training method, system and electronic equipment thereof
Technical Field
The invention relates to the technical field of mark recognition, in particular to a mark recognition method and a training method based on deep learning, a system thereof and electronic equipment.
Background
A mark (Marker) is a planar pattern of a specific design. Mark recognition typically uses image processing algorithms to locate the mark in a visual scene and determine the coordinates of its four corner points and its center point. Since the mark is usually preset and its size parameters are known, the intrinsic and extrinsic parameters of the camera can be solved from the correspondence between the mark coordinate system and the camera coordinate system, and the mark can also serve as a reference for the world coordinate system relative to the camera. Mark recognition is therefore widely applied in camera calibration, robot navigation, Augmented Reality (AR) and similar fields.
Existing mark recognition methods are usually based on traditional image processing and rely on the edge, geometric and chrominance information of the mark. Specifically, an existing recognition method converts the mark image into a binary image through graying and threshold segmentation, then obtains the mark's bounding box through erosion, contour extraction, Hough transform and similar techniques, and finally obtains the set of feature points on the mark by means such as seed filling and geometric constraints.
However, because existing mark recognition methods depend on the edge, geometric and chrominance information of the mark, their recognition accuracy is directly affected by the quality of the mark image and is extremely sensitive to factors such as illumination uniformity and jitter. For example, when part of the captured mark image is too dark or too bright due to uneven illumination in the application scene, or the captured mark image is blurred by jitter of the AR device, the noise on the effective information increases greatly and the effective information may even be lost, so that the mark ultimately cannot be recognized accurately. For AR devices in particular, the illumination in the application scene is often poor, and a worn AR device shakes with the user's body movements, so existing mark recognition methods are difficult to apply in the AR field; that is, their recognition results cannot be used for SLAM positioning of the AR device.
Disclosure of Invention
An advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, which can improve robustness against uneven illumination and camera shake, so as to obtain accurate mark recognition results in application scenes with poor illumination or jitter.
Another advantage of the present invention is to provide a deep learning-based mark recognition method, a training method, a system thereof, and an electronic device, wherein in an embodiment of the present invention, the deep learning-based mark recognition method employs an end-to-end deep learning technique, and the frame and the key point coordinates of the mark can be directly output only by inputting the mark image, which is helpful for reducing difficulty in mark recognition and improving recognition accuracy.
Another advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, wherein in an embodiment of the present invention, the method avoids relying solely on the edge, geometric and chrominance information of the mark, as conventional mark recognition methods do, and thereby avoids their sensitivity to uneven illumination and image blur.
Another advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, wherein in an embodiment of the present invention, the method can use more blurred or unevenly illuminated mark image samples in model training, so as to adapt to application scenes with jitter or poor illumination and achieve more stable mark recognition.
Another advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, wherein in an embodiment of the present invention, the method does not require a large number of labeled samples: it can apply image enhancement to the mark pattern and use image fusion to generate a large number of labeled samples, which helps reduce the cost of model training.
Another advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, wherein in an embodiment of the present invention, the method has the characteristics of fast recognition speed, high recognition accuracy and a low miss rate, and can keep the time overhead of mark recognition within a reasonable range to meet the requirements of practical application scenes.
Another advantage of the present invention is to provide a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, wherein in order to achieve the above advantages, the present invention does not need to adopt a complex structure and a huge amount of computation, and has low requirements on software and hardware. Therefore, the present invention successfully and effectively provides a solution, which not only provides a deep learning-based mark recognition method and training method, and a system and an electronic device thereof, but also increases the practicability and reliability of the deep learning-based mark recognition method and training method, and the system and the electronic device thereof.
To achieve at least one of the above advantages or other advantages and objects, the present invention provides a deep learning based mark recognition method, including the steps of:
acquiring an original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and
and inputting the original marked image into a pre-trained mark identification model constructed based on the multitask cascade convolution network for mark identification so as to output the positions of the marked frame and the key point on the original marked image, thereby realizing the identification of the mark.
In an embodiment of the present invention, the step of inputting the original mark image into the pre-trained mark recognition model constructed based on the multi-task cascaded convolutional network for mark recognition includes the steps of:
preprocessing the original marked image to generate an image pyramid, wherein the image pyramid comprises preprocessed images of different sizes;
inputting the preprocessed images in the image pyramid into a recommendation network of the mark identification model one by one to generate a plurality of candidate areas of the marked frame on the original marked image;
inputting the candidate area and the original marked image into an optimization network of the mark identification model to generate an optimization area of the frame of the mark on the original marked image; and
the preferred region and the original mark image are input to an output network of the mark recognition model to output the frame coordinates and the key point coordinates of the mark.
In an embodiment of the present invention, the step of inputting the candidate region and the original marked image into the optimization network of the mark recognition model to generate the optimized region of the border of the mark on the original marked image includes the steps of:
according to the candidate area, cutting the original marked image and adjusting the original marked image to a first preset size to obtain a corresponding candidate image; and
and optimizing the candidate image through the optimization network to obtain the optimized region after filtering and adjusting the candidate region.
In an embodiment of the present invention, the step of inputting the preferred region and the original mark image into an output network of the mark recognition model to output the frame coordinates and the key point coordinates of the mark includes the steps of:
according to the optimized area, cutting the original marked image and adjusting the original marked image to a second preset size to obtain a corresponding optimized image; and
and performing border regression and key point positioning processing on the optimized image through the output network to determine the real positions of the marked border and the marked key point on the original marked image respectively.
In an embodiment of the present invention, the first predetermined size is 24 × 24; and the second predetermined dimension is 48 x 48.
In an embodiment of the invention, the image acquisition device is one selected from a camera, a machine vision device, and an AR device.
According to another aspect of the present invention, the present invention further provides a training method of the mark recognition model, comprising the steps of:
acquiring labeled samples of a plurality of mark images; and
training the recommendation network, the optimization network and the output network of the mark recognition model based on the labeled samples of the mark images.
In an embodiment of the present invention, the step of acquiring labeled samples of a plurality of mark images includes the steps of:
performing image enhancement processing on an original mark image to obtain a plurality of mark images; and
fusing the mark images into the detection boxes of target detection samples to generate a large number of labeled samples.
According to another aspect of the present invention, the present invention further provides a deep learning-based mark recognition system for recognizing a mark in an original mark image, wherein the deep learning-based mark recognition system comprises:
the acquisition module is used for acquiring the original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and
and the mark identification module is used for inputting the original mark image into a mark identification model which is trained in advance and constructed based on the multitask cascade convolution network to carry out mark identification so as to output the positions of the marked frame and the key point on the original mark image, thereby realizing the identification of the mark.
In an embodiment of the present invention, the mark recognition module includes a preprocessing module, a region recommendation module, a region optimization module, and a region output module, which are communicably connected to each other, wherein the preprocessing module is configured to preprocess the original mark image to generate an image pyramid, and the image pyramid includes preprocessed images with different sizes; the region recommending module is used for inputting the preprocessed images in the image pyramid into a recommending network of the mark identifying model one by one so as to generate a plurality of candidate regions of the marked border on the original marked image; the region optimization module is used for inputting the candidate region and the original mark image into an optimization network of the mark identification model so as to generate an optimized region of the frame of the mark on the original mark image; the region output module is used for inputting the preferred region and the original mark image into an output network of the mark identification model so as to output the frame coordinates and the key point coordinates of the mark.
According to another aspect of the present invention, the present invention also provides an electronic device comprising:
at least one processor configured to execute instructions; and
a memory communicatively coupled to the at least one processor, wherein the memory has at least one instruction, wherein the instruction is executable by the at least one processor to cause the at least one processor to perform some or all of the steps of a deep learning based token identification method, wherein the deep learning based token identification method comprises the steps of:
acquiring an original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and
and inputting the original marked image into a pre-trained mark identification model constructed based on the multitask cascade convolution network for mark identification so as to output the positions of the marked frame and the key point on the original marked image, thereby realizing the identification of the mark.
According to another aspect of the present invention, the present invention also provides an electronic device comprising:
an AR device; and
a deep learning based marker recognition system, wherein the deep learning based marker recognition system is configured with the AR device for recognizing markers in an original marker image captured via the AR device, wherein the deep learning based marker recognition system comprises, communicatively connected in sequence:
the acquisition module is used for acquiring the original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and
and the mark identification module is used for inputting the original mark image into a mark identification model which is trained in advance and constructed based on the multitask cascade convolution network to carry out mark identification so as to output the positions of the marked frame and the key point on the original mark image, thereby realizing the identification of the mark.
Further objects and advantages of the invention will be fully apparent from the ensuing description and drawings.
These and other objects, features and advantages of the present invention will become more fully apparent from the following detailed description, the accompanying drawings and the claims.
Drawings
Fig. 1 is a flowchart illustrating a deep learning-based mark recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart illustrating the mark recognition step of the deep learning-based mark recognition method according to the above embodiment of the present invention.
Fig. 3 is a schematic flow chart illustrating a region optimization step in the deep learning-based mark identification method according to the above embodiment of the present invention.
Fig. 4 is a flowchart illustrating a region output step in the deep learning based mark identification method according to the above embodiment of the present invention.
Fig. 5 shows an application example of the deep learning-based mark recognition method according to the above-described embodiment of the present invention.
Fig. 6 is a schematic diagram illustrating a framework of a recommendation network in a tag identification model adopted by the deep learning-based tag identification method according to the above embodiment of the present invention.
FIG. 7 is a schematic diagram of a framework of an optimized network in the tag recognition model according to the above embodiment of the present invention.
Fig. 8 is a schematic diagram of a framework of an output network in the tag recognition model according to the above embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method for training a tag recognition model according to an embodiment of the present invention.
FIG. 10 shows a block diagram schematic of a deep learning based tag recognition system according to an embodiment of the invention.
FIG. 11 shows a block diagram schematic of an electronic device according to an embodiment of the invention.
FIG. 12 illustrates a perspective view of another electronic device in accordance with an embodiment of the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
In the present invention, the terms "a" and "an" in the claims and the description should be understood as meaning "one or more"; that is, an element may be singular in one embodiment and plural in another. Unless the disclosure explicitly recites that the number of an element is one, the terms "a" and "an" should not be construed as limiting that element to a single instance.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless explicitly stated or limited otherwise, the terms "connected" and "coupled" are to be interpreted broadly: a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
In application scenes such as camera calibration, robot navigation and augmented reality, the mark recognition result is often used to solve the intrinsic and extrinsic parameters of the camera or in SLAM positioning, so its accuracy directly affects the accuracy of those parameters or of the SLAM positioning. Existing mark recognition methods generally rely on the edge, geometric and chrominance information of the mark, so their recognition accuracy is directly affected by the quality of the mark image and is extremely sensitive to factors such as illumination uniformity and jitter. In such application scenes the illumination is often poor and movement or jitter occurs easily, so the captured mark image commonly suffers from uneven illumination or blur; the noise on the effective information then increases greatly, the effective information may even be lost, and existing mark recognition methods cannot recognize the mark accurately.
In recent years, deep learning has developed rapidly and is applied ever more widely; however, existing deep-learning-based target detection methods can only detect the mark's border (i.e., its bounding box) and cannot determine its key points (i.e., the center point and the four corner points). After obtaining the border, such methods therefore still need to fall back on existing mark recognition techniques that use the edge, geometric and chrominance information of the mark to determine its key points. Consequently, when the mark is blurred or the illumination is poor, existing deep-learning-based target detection methods still struggle to identify the mark's key points accurately.
Therefore, in order to solve the above problems, the present invention proposes a deep learning-based marker recognition method and training method, a system thereof, and an electronic device, which can improve robustness against uneven illumination and camera shake so as to obtain an accurate marker recognition result in an application scenario with poor illumination conditions or shake, especially an application scenario of SLAM positioning of an AR device.
Illustrative method
Referring to fig. 1-4 of the drawings, a deep learning based marker identification method according to an embodiment of the present invention is illustrated. Specifically, as shown in fig. 1, the deep learning-based mark recognition method includes the steps of:
S100: acquiring an original mark image, wherein the original mark image is an image obtained by shooting a mark through an image acquisition device; and
S200: inputting the original marked image into a pre-trained mark identification model constructed based on a multitask cascade convolution network for mark identification so as to output the positions of the marked frame and the key point on the original marked image, thereby realizing the identification of the mark.
It is worth noting that the image acquisition device may be, but is not limited to, a camera, a machine vision device, or a mobile device with image capture capability such as an AR device (e.g., AR glasses), any of which can capture the image of the mark (Marker). Because the mark recognition model is constructed based on a multi-task cascaded convolutional network, it can directly recognize and output the positions of the mark's border and key points (i.e., the center point and the four corner points of the mark) in the original mark image. The deep learning-based mark recognition method therefore does not need to determine the key points from the edge, geometric and chrominance information of the mark, which improves its robustness in scenes with uneven illumination or jitter and makes it particularly suitable for SLAM positioning with AR glasses. In addition, during training of the mark recognition model, more blurred or unevenly illuminated image samples can be used, further improving the robustness of the method under blur or poor illumination and enabling more stable mark detection.
In particular, since a large number of marks is not needed in practical applications, the time overhead of mark recognition can be kept within a reasonable range even though the mark recognition model is built on a multi-task cascaded convolutional network, making the most of the network's high detection speed, high detection accuracy and low miss rate.
More specifically, in the step S100 of the present invention, the mark may be a planar pattern of a specific pattern, such as a two-dimensional code or the like.
According to the above embodiment of the present invention, as shown in fig. 2, the step S200 of the deep learning based mark recognition method of the present invention may include the steps of:
S210: preprocessing the original marked image to generate an image pyramid, wherein the image pyramid comprises preprocessed images of different sizes;
S220: inputting the preprocessed images in the image pyramid one by one to a recommendation network of the mark recognition model to generate a plurality of candidate regions of the border of the mark on the original mark image;
S230: inputting the candidate region and the original mark image into an optimization network of the mark recognition model to generate an optimized region of the border of the mark on the original mark image; and
S240: inputting the optimized area and the original mark image into an output network of the mark identification model to output the frame coordinates and the key point coordinates of the mark.
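Taken together, these four steps form a cascade. As a structural illustration only (not the patent's reference implementation), the following sketch shows how the stages might be driven, with the three trained networks and the helper functions passed in as callables; build_pyramid is sketched after the next paragraph, and thresholds and coordinate bookkeeping are deliberately elided.

```python
def recognize_marks(image, pnet, rnet, onet, build_pyramid, crop_resize):
    """Cascade driver sketch for steps S210-S240 (structure only).

    Assumptions: pnet maps a scaled image back to candidate regions on the
    original image; rnet filters and adjusts them; onet returns the final
    boxes and key points; crop_resize(image, regions, size) crops each
    region and resizes it to size x size.
    """
    candidates = []
    for scale, scaled in build_pyramid(image):                 # S210
        candidates.extend(pnet(scaled, scale))                 # S220
    refined = rnet(crop_resize(image, candidates, 24))         # S230
    boxes, keypoints = onet(crop_resize(image, refined, 48))   # S240
    return boxes, keypoints
```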
It is noted that, as shown in fig. 5, in the step S210 of the present invention, preferably with the aspect ratio maintained, the original mark image is scaled to several different sizes to obtain pre-processed images of different sizes, and then the pre-processed images are sequentially stacked from large to small or from small to large to form an image pyramid having several layers, so as to scale the marks in the original mark image to the proper size that can be detected by the mark recognition model, which is helpful for recognizing marks of different sizes. For example, the original mark image is reduced to six different sizes with the aspect ratio maintained, so as to obtain six sizes of preprocessed images, and further obtain an image pyramid with a six-layer structure.
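A minimal sketch of this pyramid construction follows, assuming OpenCV; the 0.709 scale factor and the 12-pixel minimum side are common defaults for multi-task cascaded convolutional networks and are assumptions here, not values fixed by the patent. A factor of 0.709 roughly halves the image area per level; any factor below 1 works, trading recall against speed.

```python
import cv2


def build_image_pyramid(image, min_size=12, scale_factor=0.709):
    """Scale the original mark image to several sizes, keeping the aspect ratio.

    The pyramid stops once the shorter side would fall below the smallest
    input the recognition model accepts (min_size is an assumed value).
    """
    pyramid = []
    h, w = image.shape[:2]
    scale = 1.0
    while min(h, w) * scale >= min_size:
        resized = cv2.resize(image, (int(w * scale), int(h * scale)),
                             interpolation=cv2.INTER_AREA)
        pyramid.append((scale, resized))
        scale *= scale_factor
    return pyramid  # ordered largest to smallest, as described above
```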
As shown in fig. 5, in step S220 of the present invention, for each preprocessed image in the image pyramid, the recommendation network (Proposal Network, P-Net for short) of the mark recognition model first obtains the probability of belonging to a mark and the bounding box regression result, which are then mapped back onto the original mark image to obtain a large number of candidate regions for the mark's border.
Exemplarily, as shown in fig. 6, a preprocessed image of size 12 × 12 is input. It first passes through a convolution (Convolution, Conv) of size 3 × 3 with 10 kernels and a max pooling (MaxPooling, MP) of size 3 × 3, yielding a 5 × 5 × 10 feature map; a convolution of size 3 × 3 with 16 kernels then yields a 3 × 3 × 16 feature map; a further convolution of size 3 × 3 with 32 kernels yields a 1 × 1 × 32 feature map; finally, convolutions of size 1 × 1 with 2, 4 and 10 kernels respectively produce the outputs corresponding to the candidate regions: mark classification (Marker Classification), bounding box regression (Bounding Box Regression) and mark key point localization (Marker Points Localization).
In more detail, in this example the mark classification is 1 × 1 × 2, representing the probability that the candidate region is a mark; the bounding box regression is 1 × 1 × 4, representing the top, bottom, left and right pixel coordinates of the candidate region; and the mark key point localization is 1 × 1 × 10, representing the pixel coordinates of the five key points of the mark (i.e., its center point and four corner points) corresponding to the candidate region, each key point contributing a horizontal and a vertical coordinate.
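Translated into code, a recommendation network with exactly these layer sizes might be sketched as follows in PyTorch. The activation functions are not specified in the text, so PReLU (customary in multi-task cascaded convolutional networks) is assumed. Because the network is fully convolutional, inputs larger than 12 × 12 yield a grid of output cells, one per sliding window, which is what allows the dense predictions to be mapped back to the original mark image.

```python
import torch.nn as nn


class PNet(nn.Module):
    """Recommendation network sketch: a 12x12 input maps to a 1x1 output cell."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3),                         # -> 10x10x10
            nn.PReLU(10),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),   # -> 5x5x10
            nn.Conv2d(10, 16, kernel_size=3),                        # -> 3x3x16
            nn.PReLU(16),
            nn.Conv2d(16, 32, kernel_size=3),                        # -> 1x1x32
            nn.PReLU(32),
        )
        # Three 1x1 convolution heads: classification, box regression, key points.
        self.cls = nn.Conv2d(32, 2, kernel_size=1)        # mark / not-mark
        self.box = nn.Conv2d(32, 4, kernel_size=1)        # bounding box regression
        self.landmark = nn.Conv2d(32, 10, kernel_size=1)  # 5 key points x (x, y)

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls(feat), self.box(feat), self.landmark(feat)
```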
It should be noted that, in the above embodiment of the present invention, as shown in fig. 3, the step S230 of the deep learning based mark identification method may include the steps of:
S231: according to the candidate area, cutting the original marked image and adjusting the original marked image to a first preset size to obtain a corresponding candidate image; and
S232: optimizing the candidate image through the optimization network to obtain the optimized region after filtering and adjusting the candidate region.
Preferably, the first predetermined size is implemented as 24 × 24; that is, as shown in fig. 5, the candidate images of size 24 × 24 are input into the optimization network (Refine Network, R-Net for short), which filters out low-scoring candidate regions using a bounding box score threshold, removes part of the redundant candidate regions using non-maximum suppression (NMS), and adjusts the positions of the remaining candidate regions to obtain the optimized regions.
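The score-threshold and NMS filtering described here is standard; a minimal NumPy sketch follows, with the 0.6 score threshold and 0.7 IoU threshold chosen purely for illustration.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression over (x1, y1, x2, y2) boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top-scoring box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]
    return keep


def filter_candidates(boxes, scores, score_threshold=0.6):
    """Drop low-scoring candidate regions, then suppress redundant ones."""
    mask = scores >= score_threshold
    boxes, scores = boxes[mask], scores[mask]
    return boxes[nms(boxes, scores)]
```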
Illustratively, as shown in fig. 7, a candidate image of size 24 × 24 is input. It first passes through a convolution of size 3 × 3 with 28 kernels and a max pooling of size 3 × 3, yielding an 11 × 11 × 28 feature map; a convolution of size 3 × 3 with 48 kernels and a max pooling of size 3 × 3 then yield a 4 × 4 × 48 feature map; a convolution of size 2 × 2 with 64 kernels then yields a 3 × 3 × 64 feature map; finally, a fully connected layer produces a 128-dimensional output, from which the mark classification, bounding box regression and mark key point localization corresponding to the optimized region are obtained.
In more detail, in this example the mark classification is 2-dimensional, representing the probability that the optimized region is a mark; the bounding box regression is 4-dimensional, representing the top, bottom, left and right pixel coordinates of the optimized region; and the mark key point localization is 10-dimensional, representing the pixel coordinates of the five key points of the mark (i.e., its center point and four corner points) corresponding to the optimized region, each key point contributing a horizontal and a vertical coordinate.
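Under the same assumptions as the P-Net sketch above (PReLU activations assumed, since the text does not specify them), the optimization network of this example might look like:

```python
import torch.nn as nn


class RNet(nn.Module):
    """Optimization network sketch: 24x24 candidate image -> 128-d feature."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 28, kernel_size=3),                         # -> 22x22x28
            nn.PReLU(28),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),   # -> 11x11x28
            nn.Conv2d(28, 48, kernel_size=3),                        # -> 9x9x48
            nn.PReLU(48),
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),   # -> 4x4x48
            nn.Conv2d(48, 64, kernel_size=2),                        # -> 3x3x64
            nn.PReLU(64),
            nn.Flatten(),
            nn.Linear(3 * 3 * 64, 128),   # the 128-dimensional output above
            nn.PReLU(128),
        )
        self.cls = nn.Linear(128, 2)        # mark classification
        self.box = nn.Linear(128, 4)        # bounding box regression
        self.landmark = nn.Linear(128, 10)  # five (x, y) key points

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls(feat), self.box(feat), self.landmark(feat)
```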
Accordingly, in the above embodiment of the present invention, as shown in fig. 4, the step S240 of the deep learning based mark recognition method may include the steps of:
S241: according to the optimized area, cutting the original marked image and adjusting the cut original marked image to a second preset size to obtain a corresponding optimized image; and
S242: performing border regression and key point positioning processing on the optimized image through the output network to determine the real positions of the marked border and the marked key point on the original marked image respectively.
Preferably, the second predetermined size is implemented as 48 × 48; that is, as shown in fig. 5, the optimized images of size 48 × 48 are input into the output network (Output Network, O-Net for short), which filters out low-scoring optimized regions using a bounding box score threshold, adjusts the positions of the remaining optimized regions, and then removes part of the redundant optimized regions using non-maximum suppression (NMS), so as to determine the real position of the mark's border on the original mark image; the real positions of the mark's key points on the original mark image are determined through key point regression.
Illustratively, as shown in fig. 8, an optimized image of size 48 × 48 is input. It first passes through a convolution of size 3 × 3 with 32 kernels and a max pooling of size 3 × 3, yielding a 23 × 23 × 32 feature map; a convolution of size 3 × 3 with 64 kernels and a max pooling of size 3 × 3 then yield a 10 × 10 × 64 feature map; a convolution of size 3 × 3 with 64 kernels and a max pooling of size 2 × 2 then yield a 4 × 4 × 64 feature map; a convolution of size 2 × 2 with 128 kernels then yields a 3 × 3 × 128 feature map; finally, a fully connected layer produces a 256-dimensional output, from which the mark classification, bounding box regression and mark key point localization corresponding to the output region are obtained.
In more detail, in this example the mark classification is 2-dimensional, representing the probability that the output region is a mark; the bounding box regression is 4-dimensional, representing the top, bottom, left and right pixel coordinates of the output region; and the mark key point localization is 10-dimensional, representing the pixel coordinates of the five key points of the mark (i.e., its center point and four corner points) corresponding to the output region, each key point contributing a horizontal and a vertical coordinate.
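The text does not spell out how the 4-dimensional box regression and the 10-dimensional key point output map back to pixel coordinates. The sketch below assumes the common convention for multi-task cascaded networks, in which regressions are offsets normalized by the region's width and height; this is an assumption for illustration, not a statement of the patent.

```python
import numpy as np


def decode_outputs(region, box_reg, landmark_reg):
    """Map network outputs for one region back to original-image pixels.

    region: (x1, y1, x2, y2) of the optimized region on the original image.
    box_reg: 4 offsets; landmark_reg: 10 values for five (x, y) key points.
    Offsets normalized by the region size are an assumed convention.
    """
    x1, y1, x2, y2 = region
    w, h = x2 - x1, y2 - y1
    # Refined border (bounding box) of the mark.
    box = np.array([x1 + box_reg[0] * w, y1 + box_reg[1] * h,
                    x2 + box_reg[2] * w, y2 + box_reg[3] * h])
    # Center point and four corner points of the mark.
    xs = x1 + landmark_reg[0::2] * w
    ys = y1 + landmark_reg[1::2] * h
    keypoints = np.stack([xs, ys], axis=1)  # shape (5, 2)
    return box, keypoints
```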
According to another aspect of the present invention, the present invention further provides a training method of the mark recognition model. Specifically, in an embodiment of the present invention, as shown in fig. 9, the training method of the mark recognition model includes the steps of:
S310: acquiring labeled samples of a plurality of mark images; and
S320: training the recommendation network, the optimization network and the output network of the mark recognition model based on the labeled samples of the mark images.
It is to be noted that, in obtaining the labeled samples in step S310, the training method of the mark recognition model of the present invention can process mark images with image enhancement means such as flipping, uneven illumination and blurring in an image fusion manner, and then fuse the processed mark images into the detection boxes of open-source target detection samples to generate a large number of labeled samples. In this way, a large number of labeled samples can be obtained through image enhancement without capturing a large number of mark images in advance, which helps improve the robustness of the mark recognition model to poor illumination, jitter and similar factors while reducing its training cost.
Illustratively, as shown in fig. 9, the step S310 of the training method of the mark recognition model of the present invention may include the steps of:
S311: performing image enhancement processing on an original mark image to obtain a plurality of mark images; and
S312: fusing the mark images into the detection boxes of target detection samples to generate a large number of labeled samples.
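As a rough illustration of steps S311 and S312 (the augmentation parameters are chosen arbitrarily, and the scene image stands in for any open-source detection sample), such sample generation might look like:

```python
import cv2
import numpy as np


def augment_mark(mark):
    """Produce several augmented variants of one mark image (S311)."""
    variants = [mark, cv2.flip(mark, 1)]  # original plus horizontal flip
    # Uneven illumination: multiply by a horizontal brightness gradient.
    grad = np.linspace(0.5, 1.3, mark.shape[1], dtype=np.float32)
    uneven = np.clip(mark.astype(np.float32) * grad[None, :, None], 0, 255)
    variants.append(uneven.astype(np.uint8))
    variants.append(cv2.GaussianBlur(mark, (7, 7), 0))  # jitter-like blur
    return variants


def fuse_into_detection_box(scene, box, mark):
    """Paste an augmented mark into the detection box of a target detection
    sample (S312); the box itself becomes the labeled bounding box."""
    x1, y1, x2, y2 = box
    patch = cv2.resize(mark, (x2 - x1, y2 - y1))
    sample = scene.copy()
    sample[y1:y2, x1:x2] = patch
    return sample, box  # labeled sample: image plus its mark bounding box
```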
Illustrative System
Referring to FIG. 10 of the drawings, a deep learning based marker recognition system for recognizing a marker in an original marker image to obtain the location of the marker's bounding box and keypoints on the original marker image according to an embodiment of the present invention is illustrated. Specifically, as shown in fig. 10, the deep learning based marker recognition system 1 may include an acquisition module 10 and a marker recognition module 20 communicably connected to each other, wherein the acquisition module 10 is configured to acquire the original marker image, wherein the original marker image is an image obtained by capturing a marker via an image capturing device; the mark recognition module 20 is configured to input the original mark image into a mark recognition model which is trained in advance and constructed based on a multitask cascaded convolutional network for mark recognition, so as to output positions of a frame and a key point of the mark on the original mark image, thereby realizing recognition of the mark.
More specifically, as shown in fig. 10, the mark recognition module 20 includes a preprocessing module 21, an area recommendation module 22, an area optimization module 23, and an area output module 24, which are communicatively connected to each other, wherein the preprocessing module 21 is configured to preprocess the original mark image to generate an image pyramid, wherein the image pyramid includes preprocessed images of different sizes; the region recommendation module 22 is configured to input the preprocessed images in the image pyramid to a recommendation network of the mark identification model one by one, so as to generate a plurality of candidate regions of the marked border on the original marked image; wherein the region optimization module 23 is configured to input the candidate region and the original mark image into an optimization network of the mark recognition model to generate an optimized region of the frame of the mark on the original mark image; wherein the region output module 24 is configured to input the preferred region and the original mark image into an output network of the mark recognition model to output the frame coordinates and the key point coordinates of the mark.
In an example of the present invention, the region optimization module 23 is further configured to crop and adjust the original marked image to a first predetermined size according to the candidate region to obtain a corresponding candidate image; and performing optimization processing on the candidate image through the optimization network to obtain the optimized region after filtering and adjusting the candidate region.
In an example of the present invention, the region output module 24 is further configured to crop the original marked image according to the optimized region and adjust it to a second predetermined size to obtain a corresponding optimized image, and to perform border regression and key point positioning processing on the optimized image through the output network, so as to determine the real positions of the marked border and the key points on the original marked image respectively.
Illustrative electronic device
Next, an electronic apparatus according to an embodiment of the present invention is described with reference to fig. 11. As shown in fig. 11, the electronic device 90 includes one or more processors 91 and memory 92.
The processor 91 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 90 to perform desired functions. In other words, the processor 91 comprises one or more physical devices configured to execute instructions. For example, the processor 91 may be configured to execute instructions that are part of: one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, implement a technical effect, or otherwise arrive at a desired result.
The processor 91 may include one or more processors configured to execute software instructions. Additionally or alternatively, the processor 91 may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the processor 91 may be single core or multicore, and the instructions executed thereon may be configured for serial, parallel, and/or distributed processing. The various components of the processor 91 may optionally be distributed over two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the processor 91 may be virtualized and executed by remotely accessible networked computing devices configured in a cloud computing configuration.
The memory 92 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 91 to implement some or all of the steps of the above-described exemplary methods of the present invention, and/or other desired functions.
In other words, the memory 92 comprises one or more physical devices configured to hold machine-readable instructions executable by the processor 91 to implement the methods and processes described herein. In implementing these methods and processes, the state of the memory 92 may be transformed (e.g., to hold different data). The memory 92 may include removable and/or built-in devices. The memory 92 may include optical memory (e.g., CD, DVD, HD-DVD, blu-ray disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. The memory 92 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
It is understood that the memory 92 comprises one or more physical devices. However, aspects of the instructions described herein may alternatively be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.) that is not held by a physical device for a limited period of time. Aspects of the processor 91 and the memory 92 may be integrated together into one or more hardware logic components. These hardware logic components may include, for example, Field Programmable Gate Arrays (FPGAs), program and application specific integrated circuits (PASIC/ASIC), program and application specific standard products (PSSP/ASSP), system on a chip (SOC), and Complex Programmable Logic Devices (CPLDs).
In one example, as shown in FIG. 11, the electronic device 90 may also include an input device 93 and an output device 94, which may be interconnected via a bus system and/or other form of connection mechanism (not shown). The input device 93 may be, for example, a camera module for capturing image data or video data. As another example, the input device 93 may include or interface with one or more user input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input device 93 may include or interface with selected Natural User Input (NUI) components. Such components may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on-board or off-board. Example NUI components may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer and/or gyroscope for motion detection and/or intent recognition; electric field sensing components for assessing brain activity and/or body movement; and/or any other suitable sensor.
The output device 94 may output various information including the classification result and the like to the outside. The output devices 94 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, the electronic device 90 may further comprise the communication means, wherein the communication means may be configured to communicatively couple the electronic device 90 with one or more other computer devices. The communication means may comprise wired and/or wireless communication devices compatible with one or more different communication protocols. As a non-limiting example, the communication subsystem may be configured for communication via a wireless telephone network or a wired or wireless local or wide area network. In some embodiments, the communications device may allow the electronic device 90 to send and/or receive messages to and/or from other devices via a network such as the internet.
It will be appreciated that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Also, the order of the above-described processes may be changed.
Of course, for the sake of simplicity, only some of the components of the electronic device 90 relevant to the present invention are shown in fig. 11, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 90 may include any other suitable components, depending on the particular application.
According to another aspect of the present invention, the present invention further provides an electronic device, such as an AR device or the like configured with a camera module, wherein the electronic device is configured with the above-mentioned deep learning-based marker recognition system for recognizing the markers in the original marker image captured via the electronic device. Illustratively, as shown in fig. 12, the electronic device includes an AR device 600 and the deep learning based mark recognition system 1, wherein the deep learning based mark recognition system 1 is configured on the AR device 600 and is used for recognizing a mark in an original mark image acquired through the AR device 600 to obtain a position of a border and a key point of the mark on the original mark image. It is to be appreciated that the AR device 600 may be implemented, but is not limited to, as AR glasses with camera functionality.
Illustrative computer program product
In addition to the above-described methods and apparatus, embodiments of the present invention may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the present invention described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations for embodiments of the present invention in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, an embodiment of the present invention may also be a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps of the above-described method of the present specification.
The computer readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and the advantages, effects, etc. must not be considered to be possessed by various embodiments of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
The block diagrams of devices, apparatuses and systems involved in the present invention are given only as illustrative examples and are not intended to require or imply that connections, arrangements and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses and systems may be connected, arranged and configured in any manner. Words such as "including", "comprising" and "having" are open-ended words that mean "including, but not limited to" and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or", unless the context clearly indicates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the apparatus, devices and methods of the present invention, the components or steps may be broken down and/or re-combined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting of the invention. The objects of the invention have been fully and effectively accomplished. The functional and structural principles of the present invention have been shown and described in the examples, and any variations or modifications of the embodiments of the present invention may be made without departing from the principles.

Claims (12)

1. A deep learning-based mark recognition method, characterized by comprising the following steps:
acquiring an original mark image, wherein the original mark image is an image obtained by capturing a mark via an image acquisition device; and
inputting the original mark image into a pre-trained mark recognition model constructed based on a multitask cascaded convolutional network for mark recognition, so as to output the positions of the mark's bounding box and key points on the original mark image, thereby realizing recognition of the mark.
2. The deep learning-based mark recognition method according to claim 1, wherein the step of inputting the original mark image into the pre-trained mark recognition model for mark recognition comprises the steps of:
preprocessing the original mark image to generate an image pyramid, wherein the image pyramid comprises preprocessed images of different sizes;
inputting the preprocessed images in the image pyramid one by one into a recommendation network of the mark recognition model to generate a plurality of candidate regions of the mark's bounding box on the original mark image;
inputting the candidate regions and the original mark image into an optimization network of the mark recognition model to generate optimized regions of the mark's bounding box on the original mark image; and
inputting the optimized regions and the original mark image into an output network of the mark recognition model to output the bounding box coordinates and the key point coordinates of the mark.
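By way of non-limiting illustration, the cascade recited in claim 2 may be sketched as follows. Here `pnet`, `rnet`, and `onet` are hypothetical stand-ins for the recommendation, optimization, and output networks (the claims do not fix their architectures), and the pyramid parameters `min_size` and `scale_factor` are assumed values in the style of MTCNN rather than values required by the claims.

```python
import numpy as np

def build_image_pyramid(image: np.ndarray, min_size: int = 12,
                        scale_factor: float = 0.709) -> list:
    """Generate progressively downscaled copies of the input image."""
    pyramid, scale = [], 1.0
    h, w = image.shape[:2]
    while min(h * scale, w * scale) >= min_size:
        new_h, new_w = int(h * scale), int(w * scale)
        # Nearest-neighbor resize keeps the sketch dependency-free.
        rows = np.arange(new_h) * h // new_h
        cols = np.arange(new_w) * w // new_w
        pyramid.append((scale, image[rows][:, cols]))
        scale *= scale_factor
    return pyramid

def recognize_mark(image, pnet, rnet, onet):
    """Run the cascade: pyramid -> candidate boxes -> refinement -> final output."""
    candidates = []
    for scale, scaled in build_image_pyramid(image):
        candidates.extend(pnet(scaled, scale))  # candidate regions per scale
    optimized = rnet(candidates, image)         # filter and adjust the candidates
    boxes, keypoints = onet(optimized, image)   # bounding boxes and key points
    return boxes, keypoints
```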
3. The deep learning-based mark recognition method according to claim 2, wherein the step of inputting the candidate regions and the original mark image into the optimization network of the mark recognition model to generate the optimized regions of the mark's bounding box on the original mark image comprises the steps of:
cropping the original mark image according to the candidate regions and resizing the crops to a first predetermined size to obtain corresponding candidate images; and
processing the candidate images through the optimization network to filter and adjust the candidate regions, thereby obtaining the optimized regions.
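As a non-limiting sketch of this crop-and-resize step, assuming the 24 × 24 first predetermined size of claim 5 and a hypothetical (x1, y1, x2, y2) pixel-coordinate box convention:

```python
import numpy as np

def crop_and_resize(image: np.ndarray, box: tuple, size: int = 24) -> np.ndarray:
    """Crop one candidate region from the image and resize it to size x size."""
    x1, y1, x2, y2 = (int(v) for v in box)
    h, w = image.shape[:2]
    # Clamp the box to the image bounds; assumes a non-degenerate box remains.
    x1, y1, x2, y2 = max(0, x1), max(0, y1), min(w, x2), min(h, y2)
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbor resize keeps the sketch dependency-free.
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]

# One fixed-size candidate image per candidate region:
# candidates = np.stack([crop_and_resize(img, b, 24) for b in candidate_boxes])
```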
4. The deep learning-based mark recognition method according to claim 3, wherein the step of inputting the optimized regions and the original mark image into the output network of the mark recognition model to output the bounding box coordinates and the key point coordinates of the mark comprises the steps of:
cropping the original mark image according to the optimized regions and resizing the crops to a second predetermined size to obtain corresponding optimized images; and
performing bounding box regression and key point localization on the optimized images through the output network to determine the true positions of the mark's bounding box and key points on the original mark image.
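The claims do not fix how the output network's regression values are encoded. The following non-limiting sketch assumes one common convention: box offsets expressed as fractions of the region's width and height, and key points predicted in normalized region coordinates.

```python
def apply_box_regression(region, offsets):
    """Shift an (x1, y1, x2, y2) region by offsets given as width/height fractions."""
    x1, y1, x2, y2 = region
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)

def decode_keypoints(region, normalized_points):
    """Map key points from [0, 1] region coordinates back to image coordinates."""
    x1, y1, x2, y2 = region
    w, h = x2 - x1, y2 - y1
    return [(x1 + px * w, y1 + py * h) for px, py in normalized_points]

# Example: refine a 48 x 48 optimized region and place three key points in it.
box = apply_box_regression((100, 100, 148, 148), (0.05, 0.02, -0.03, 0.0))
points = decode_keypoints(box, [(0.25, 0.3), (0.75, 0.3), (0.5, 0.7)])
```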
5. The deep learning-based mark recognition method according to claim 4, wherein the first predetermined size is 24 × 24 and the second predetermined size is 48 × 48.
6. The deep learning-based mark recognition method according to any one of claims 1 to 5, wherein the image acquisition device is one selected from a camera, a machine vision device, and an AR device.
7. A training method of a mark recognition model, characterized by comprising the following steps:
acquiring annotated samples of a plurality of mark images; and
training a recommendation network, an optimization network, and an output network in the mark recognition model based on the annotated samples of the mark images.
8. The training method of the mark recognition model according to claim 7, wherein the step of training the recommendation network, the optimization network, and the output network in the mark recognition model based on the annotated samples of the mark images comprises the steps of:
performing image enhancement processing on original mark images to obtain a plurality of mark images; and
fusing the mark images into detection frames of target detection samples to generate a large number of annotated samples.
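As a non-limiting sketch of this sample-generation idea: augment a mark image, then paste it into the detection frame of an existing target detection sample so that the frame's box annotation can be reused as the mark's label. The specific augmentations and the nearest-neighbor paste are simplifying assumptions.

```python
import numpy as np

def enhance(mark: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply simple photometric/geometric augmentations to one mark image."""
    out = mark.astype(np.float32) * rng.uniform(0.7, 1.3)         # random brightness
    out = np.clip(out + rng.normal(0.0, 5.0, out.shape), 0, 255)  # pixel noise
    if rng.random() < 0.5:
        out = out[:, ::-1]                                        # horizontal flip
    return out.astype(np.uint8)

def fuse_into_frame(background: np.ndarray, mark: np.ndarray, frame: tuple):
    """Paste a mark into the detection frame (x1, y1, x2, y2) of a background image."""
    x1, y1, x2, y2 = frame
    h, w = y2 - y1, x2 - x1
    rows = np.arange(h) * mark.shape[0] // h    # nearest-neighbor resize
    cols = np.arange(w) * mark.shape[1] // w
    sample = background.copy()
    sample[y1:y2, x1:x2] = mark[rows][:, cols]  # assumes matching channel counts
    return sample, frame                        # the frame doubles as the box label

# rng = np.random.default_rng(0)
# sample, box = fuse_into_frame(bg_img, enhance(mark_img, rng), (50, 60, 150, 160))
```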
9. A deep learning-based mark recognition system for recognizing a mark in an original mark image, characterized in that the deep learning-based mark recognition system comprises, communicatively connected to each other:
an acquisition module for acquiring the original mark image, wherein the original mark image is an image obtained by capturing a mark via an image acquisition device; and
a mark recognition module for inputting the original mark image into a pre-trained mark recognition model constructed based on a multitask cascaded convolutional network for mark recognition, so as to output the positions of the mark's bounding box and key points on the original mark image, thereby realizing recognition of the mark.
10. The deep learning-based mark recognition system according to claim 9, wherein the mark recognition module comprises a preprocessing module, a region recommendation module, a region optimization module, and a region output module communicatively connected to each other, wherein the preprocessing module is configured to preprocess the original mark image to generate an image pyramid, wherein the image pyramid comprises preprocessed images of different sizes; the region recommendation module is configured to input the preprocessed images in the image pyramid one by one into a recommendation network of the mark recognition model to generate a plurality of candidate regions of the mark's bounding box on the original mark image; the region optimization module is configured to input the candidate regions and the original mark image into an optimization network of the mark recognition model to generate optimized regions of the mark's bounding box on the original mark image; and the region output module is configured to input the optimized regions and the original mark image into an output network of the mark recognition model to output the bounding box coordinates and the key point coordinates of the mark.
11. An electronic device, comprising:
at least one processor configured to execute instructions; and
a memory communicatively coupled to the at least one processor, wherein the memory stores at least one instruction executable by the at least one processor to cause the at least one processor to perform some or all of the steps of a deep learning-based mark recognition method, wherein the deep learning-based mark recognition method comprises the steps of:
acquiring an original mark image, wherein the original mark image is an image obtained by capturing a mark via an image acquisition device; and
inputting the original mark image into a pre-trained mark recognition model constructed based on a multitask cascaded convolutional network for mark recognition, so as to output the positions of the mark's bounding box and key points on the original mark image, thereby realizing recognition of the mark.
12. An electronic device, comprising:
an AR device; and
a deep learning-based mark recognition system, wherein the deep learning-based mark recognition system is configured on the AR device for recognizing a mark in an original mark image captured via the AR device, and wherein the deep learning-based mark recognition system comprises, communicatively connected in sequence:
an acquisition module for acquiring the original mark image, wherein the original mark image is an image obtained by capturing a mark via an image acquisition device; and
a mark recognition module for inputting the original mark image into a pre-trained mark recognition model constructed based on a multitask cascaded convolutional network for mark recognition, so as to output the positions of the mark's bounding box and key points on the original mark image, thereby realizing recognition of the mark.
CN202010022240.7A 2020-01-09 2020-01-09 Deep learning-based mark recognition method and training method, system and electronic equipment thereof Pending CN113095347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010022240.7A CN113095347A (en) 2020-01-09 2020-01-09 Deep learning-based mark recognition method and training method, system and electronic equipment thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010022240.7A CN113095347A (en) 2020-01-09 2020-01-09 Deep learning-based mark recognition method and training method, system and electronic equipment thereof

Publications (1)

Publication Number Publication Date
CN113095347A true CN113095347A (en) 2021-07-09

Family

ID=76663608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010022240.7A Pending CN113095347A (en) 2020-01-09 2020-01-09 Deep learning-based mark recognition method and training method, system and electronic equipment thereof

Country Status (1)

Country Link
CN (1) CN113095347A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269333A (en) * 2018-01-08 2018-07-10 平安科技(深圳)有限公司 Face identification method, application server and computer readable storage medium
CN110046941A (en) * 2019-04-23 2019-07-23 杭州智趣智能信息技术有限公司 A kind of face identification method, system and electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAIPENG ZHANG et al.: "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks", IEEE Signal Processing Letters *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661577A (en) * 2022-11-01 2023-01-31 吉咖智能机器人有限公司 Method, apparatus, and computer-readable storage medium for object detection
CN115661577B (en) * 2022-11-01 2024-04-16 吉咖智能机器人有限公司 Method, apparatus and computer readable storage medium for object detection

Similar Documents

Publication Publication Date Title
CN109376667B (en) Target detection method and device and electronic equipment
US20200160040A1 (en) Three-dimensional living-body face detection method, face authentication recognition method, and apparatuses
CN107358149B (en) Human body posture detection method and device
CN106650662B (en) Target object shielding detection method and device
US11527105B2 (en) System and method for scalable cloud-robotics based face recognition and face analysis
US9098888B1 (en) Collaborative text detection and recognition
JP2020523665A (en) Biological detection method and device, electronic device, and storage medium
CN109325933A (en) A kind of reproduction image-recognizing method and device
US10592726B2 (en) Manufacturing part identification using computer vision and machine learning
KR20190028349A (en) Electronic device and method for human segmentation in image
CN111222395A (en) Target detection method and device and electronic equipment
CN111444744A (en) Living body detection method, living body detection device, and storage medium
CN109840883B (en) Method and device for training object recognition neural network and computing equipment
CN111985458B (en) Method for detecting multiple targets, electronic equipment and storage medium
CN110222641B (en) Method and apparatus for recognizing image
CN107959798B (en) Video data real-time processing method and device and computing equipment
CN111598065A (en) Depth image acquisition method, living body identification method, apparatus, circuit, and medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN112560584A (en) Face detection method and device, storage medium and terminal
CN110909685A (en) Posture estimation method, device, equipment and storage medium
CN112070077B (en) Deep learning-based food identification method and device
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN113095347A (en) Deep learning-based mark recognition method and training method, system and electronic equipment thereof
CN110222576B (en) Boxing action recognition method and device and electronic equipment
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210709