CN111523390A - Image recognition method and augmented reality AR icon recognition system - Google Patents


Info

Publication number
CN111523390A
CN111523390A
Authority
CN
China
Prior art keywords
icon
image
frame
candidate
candidate frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010217757.1A
Other languages
Chinese (zh)
Other versions
CN111523390B (en)
Inventor
林健
周志敏
刘海伟
丛林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yixian Advanced Technology Co ltd
Original Assignee
Hangzhou Yixian Advanced Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co ltd filed Critical Hangzhou Yixian Advanced Technology Co ltd
Priority to CN202010217757.1A
Publication of CN111523390A
Application granted
Publication of CN111523390B
Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a system for augmented reality (AR) icon recognition, comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device. The projector projects images into the working area of the camera device; the camera device acquires the image of the icon to be recognized; the main control device extracts a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device instructs the projector to play the dynamic effect corresponding to that category. The system solves the problem that icon detection accuracy and efficiency of AR projection systems are not high, and improves both.

Description

Image recognition method and augmented reality AR icon recognition system
Technical Field
The invention relates to the field of image recognition, and in particular to an image recognition method and an augmented reality (AR) icon recognition system.
Background
An interactive augmented reality (AR) projection system is a human-computer interaction system combining a projector with a color depth camera: an interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and animation, pictures, and sound are played at a recognized object or a specific position in response through the projector, achieving the purpose of augmented reality. AR projection systems are very suitable for industries such as education and entertainment.
In the detection scenario of an AR projection system, detection of multiple classes of icons must be realized. Icons have weaker features than faces, vehicles, and the like; there may be scale differences among icons, and icons may be confused with one another in various ways. In addition, models must be developed rapidly: each application (APP) corresponds to a different detection model, and a large amount of time cannot be spent collecting large amounts of data for model training. Furthermore, when recognizing icons, the interactive AR projection system faces two constraints: complex light and shadow conditions and strict latency requirements. The algorithm therefore needs to be lightweight to reduce time consumption, have good illumination invariance to cope with lighting changes such as overexposure and underexposure, and complete multi-class icon classification and recognition.
In the related art, during icon detection in an AR projection system, the multi-task convolutional neural network (MTCNN) algorithm is used directly and extended: the number of feature points is adjusted, the network structure is slightly changed, or a new feature enhancement module is connected in series, and after such adjustments the method is migrated directly to different usage scenarios. This is essentially single-category keypoint detection, and a large amount of data must be collected to ensure the MTCNN training effect.
For the problem in the related art that icon detection accuracy and efficiency of AR projection systems are not high, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention at least solve the problem in the related art that the icon detection accuracy and efficiency of AR projection systems are not high.
According to an aspect of the present invention, there is provided an icon recognition method, the method including:
acquiring an image of an icon, and marking a labeling frame on the icon in the image;
determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image;
and screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon.
In one embodiment, the determining of training samples using the labeling frame as a reference includes:
generating candidate frames using the labeling frame as a reference;
and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
In one embodiment, the determining of training samples using the labeling frame as a reference includes:
extending the labeling frame by a preset proportion to obtain an extension frame;
executing the grabcut algorithm within the extension frame to complete extraction of the foreground label;
and determining a labeling frame and candidate frames on the foreground label, and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
In one embodiment, the training of the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image includes:
in the case that the MTCNN model comprises a Pnet layer and an Rnet layer, the two layers are cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region of the icon.
According to another aspect of the present invention, there is also provided an icon recognition method applied to an augmented reality (AR) projection system, the method including:
acquiring an image of the icon to be recognized;
extracting, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames;
performing hierarchical aggregation on the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon.
In one embodiment, the acquiring of the image of the icon to be recognized includes:
acquiring the image of the icon to be recognized within a preset detection area of the projected image.
In one embodiment, the performing of hierarchical aggregation on the first candidate frames to obtain second candidate frames includes:
when the positions of first candidate frames overlap, classifying those first candidate frames into one class of candidate frames and taking their circumscribed rectangle as the second candidate frame.
In one embodiment, after the identifying of the category of the icon, the method includes:
generating an indication signal according to the category, wherein the indication signal is used to instruct the projector to play the dynamic effect corresponding to the category.
In one embodiment, before the acquiring of the image of the icon to be recognized, the method includes:
when a projection trigger button of the projection area is triggered, generating a trigger instruction, wherein the trigger instruction instructs acquisition of the image of the icon.
According to another aspect of the present invention, there is also provided a system for augmented reality (AR) icon recognition, the system comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device;
the projector projects an image in the working area of the camera device;
the camera device acquires the image of the icon to be recognized;
the main control device extracts, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon;
and the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
The system of the invention thus combines projector, main control device, and camera device as above: the projector projects images into the working area of the camera device, the camera device acquires the image of the icon to be recognized, the main control device extracts a foreground region through the Pnet layer of the MTCNN model to obtain first candidate frames and classifies and aggregates them into second candidate frames, the Rnet layer crops the icon image from the second candidate frames and identifies the icon category, and the main control device instructs the projector to play the corresponding dynamic effect. This solves the problem that icon detection accuracy and efficiency of AR projection systems are not high, and improves both.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present invention;
FIG. 2 is a first flowchart of a method for identifying icons according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a scenario of training sample labeling according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the effect of the extension step of foreground label extraction according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of foreground label extraction according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of background-replacement data enhancement based on the foreground label, according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of a training phase of icon recognition according to an embodiment of the present invention;
FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present invention;
FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The interactive AR projection system is a human-computer interaction system combining a projector with a color depth camera: an interactive interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and animations, pictures, and sounds are played at recognized objects or specific positions in response through the projector, achieving augmented reality. It is very suitable for use in the education industry.
Color depth camera: a device that can capture color images (RGB frames) and depth images (depth frames), often abbreviated as an "RGB-D camera". Color images are acquired on the same principle as with an ordinary camera; depth images may be acquired by structured light, time of flight, a binocular camera, and so on. Taking the structured-light scheme as an example, the camera includes an infrared emission module, an infrared fill-light module, an RGB + infrared camera module, and the like.
Projector: a device capable of projecting images or video onto any plane. Projector manufacturers integrate a Digital Micromirror Device (DMD) display core, light source, lens light path, and heat dissipation into a single mechanism to form an integral component.
In this embodiment, a system for augmented reality AR icon recognition is provided. Fig. 1 is a block diagram of the structure of a system for augmented reality AR icon recognition according to an embodiment of the present invention. As shown in fig. 1, the system includes: a projector 12, a main control device 14, and a camera device 16, wherein the main control device 14 is connected to the projector 12 and to the camera device 16. The projector 12 projects an image in the working area of the camera device 16; the camera device 16 acquires the image of the icon to be recognized; the main control device 14 extracts a foreground region of the image through the Pnet layer of the multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates them to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device 14 instructs the projector 12 to play the dynamic effect corresponding to the category of the icon. With this system, the problem that icon detection accuracy and efficiency of the AR projection system are not high is solved, and both are improved.
In one embodiment, card icon detection via an interactive AR projection system is a very popular form of education. Various types of APP can be developed based on the detection algorithm, and young children complete corresponding learning through interaction with physical cards, including animal detection, color and shape detection, and the like. Compared with the purely click-based interaction of tablets, mobile phones, and the like, interaction with physical objects such as card icons is richer in form and more popular with young children.
The embodiment of the invention provides a new object detection algorithm for interactive AR projection systems that realizes stable detection of icons based on MTCNN and is particularly suitable for preschool education scenarios; different APPs correspond to different detection content according to the education scenario, such as animal icon detection, plant icon detection, and shape icon detection. In actual use of the interactive AR projection system, the background projection of the corresponding APP is superimposed on the physical card; meanwhile, the color camera is easily affected by ambient light when acquiring images, causing problems such as overexposure and underexposure, so the detection algorithm must be robust to light and shadow variations. Compared with a card recognition algorithm requiring completely fixed positions, detection at unfixed positions has a wider application range and is friendlier both to APP development and to young users, so card detection within a certain area must be realized. At the same time, considering the hardware limits of the AR projection system, the forward latency of the detection algorithm must be small, with overall detection latency within 500 ms. If the recognized objects are easily confused with one another, the detection algorithm must be able to distinguish them, reducing false detections and missed detections. For example, when detecting objects of color and shape categories, a rectangle can be formed by splicing two squares, and the detection algorithm needs to avoid falsely detecting the rectangle as two squares.
In this embodiment, a method for icon identification is provided, and fig. 2 is a flowchart of a first icon identification method according to an embodiment of the present invention, as shown in fig. 2, the method includes the following steps:
step S202, acquiring an image of an icon, and marking a labeling frame on the icon in the image; an image of the icons to be recognized is collected through the AR projection system. For example, the acquired image may be 640x480 pixels and may contain several icons to be recognized; a rectangular labeling frame must be provided for every icon, and labeling software such as labelme can be used to draw the labeling frames;
step S204, determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples. Data enhancement is performed after the training samples are extracted and includes geometric transformations such as rotation and perspective transformation. In addition, because the AR projection system must cope with demanding illumination conditions, illumination enhancement is added, which may include simulating data under different conditions such as overexposure, shadow, and contrast change (a sketch of such augmentation follows). After enhancement, the training sample data can be uniformly scaled to a fixed size to facilitate subsequent model training;
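As an illustration of this enhancement step, the following is a minimal sketch of geometric and illumination augmentation using OpenCV; the function names and the parameter values (gamma, contrast, rotation angle) are illustrative assumptions, not values prescribed by the embodiment:

```python
import cv2
import numpy as np

def augment_illumination(img, gamma=0.5, contrast=1.2, brightness=-30):
    # Simulate lighting variation: gamma < 1 brightens (overexposure),
    # gamma > 1 darkens (shadow); alpha/beta apply a linear contrast change.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    out = cv2.LUT(img, table)                      # gamma curve
    return cv2.convertScaleAbs(out, alpha=contrast, beta=brightness)

def augment_geometry(img, angle=15):
    # Rotate around the image center, keeping the original canvas size.
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))
```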
step S206, training the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image; the MTCNN model realizes foreground extraction and multi-class detection of the icons;
and step S208, screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon; a combination with good effect can be screened from the trained MTCNN models and converted into the commonly used pb format for subsequent image recognition.
Through steps S202 to S208, the method comprises a training stage and a detection stage: the training stage determines training samples from the image of the labeled icon and trains the MTCNN model on them, and the detection stage detects icons in the actual scene with the obtained MTCNN model, thereby solving the problem that icon detection accuracy and efficiency of the AR projection system are not high, and improving both.
In one embodiment, the process of determining the training samples using the labeling frame as a reference comprises: generating candidate frames using the labeling frame as a reference; and determining the foreground region of the icon and the training samples of the recognition category according to the Intersection-over-Union (IOU) of the labeling frame and the candidate frames. Optionally, the MTCNN model performs detection by extracting a foreground region from the input image and determining the specific category of that region. Fig. 3 is a schematic view of a training-sample labeling scene according to an embodiment of the present invention. As shown in fig. 3, the training samples may include pos samples, part samples, and neg samples generated from the labeling data: pos samples direct the network to locate the foreground region and identify the category, part samples direct the network to locate the foreground region, and neg samples direct the network to identify the background. The data generation method randomly generates candidate rectangular regions of different scales and positions using the labeling frame as a reference, and classifies them by judging their IOU with the labeling frame. The IOU is a concept used in target detection: the overlap rate of the generated area(C) (candidate frame) and area(G) (original labeling frame), that is, the ratio of their intersection to their union, as shown in formula 1:

IOU = area(C ∩ G) / area(C ∪ G)    (1)
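For concreteness, formula 1 can be computed as in the following sketch, where each box is assumed to be given as (x1, y1, x2, y2):

```python
def iou(box_c, box_g):
    # Intersection over union of candidate frame C and labeling frame G.
    ix1, iy1 = max(box_c[0], box_g[0]), max(box_c[1], box_g[1])
    ix2, iy2 = min(box_c[2], box_g[2]), min(box_c[3], box_g[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_c = (box_c[2] - box_c[0]) * (box_c[3] - box_c[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / float(area_c + area_g - inter)
```

With this, each candidate frame can be assigned to the neg, part, or pos class by comparing iou() against the thresholds given next.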
Here, a candidate frame whose IOU with the labeling frame is smaller than 0.3 is a neg sample, a candidate frame whose IOU is larger than 0.65 is a pos sample, and a candidate frame in between is a part sample. In the class notation, 0 denotes the neg samples, 1 to n denote the pos samples, and -1 to -n denote the part samples.
In one embodiment, determining the training samples using the labeling frame as a reference and performing data enhancement on them includes: extending the labeling frame by a preset proportion to obtain an extension frame; executing the grabcut algorithm within the extension frame to complete extraction of the foreground label; and determining a labeling frame and candidate frames on the foreground label, then determining the foreground region of the icon and the training samples of the recognition category according to the IOU of the labeling frame and the candidate frames. For example, during the generation of training sample data the labeling frame is always rectangular and located in the foreground region; if the foreground region is damaged during sample generation, recognition stability may be affected, so the foreground region must be further extracted. When the integrity of the foreground region is ensured in the training sample data, the influence of the background where the icon sits can be reduced; and where the icon foreground has an irregular shape that makes manual labeling of the foreground region laborious, the foreground can be extracted automatically with the grabcut algorithm. FIG. 4 illustrates the extension step of foreground label extraction according to an embodiment of the present invention, and FIG. 5 illustrates the extracted foreground label. As shown in figs. 4 and 5, the specific extraction process (sketched in code after this passage) is: take the original rectangular labeling frame as a reference and extend it to obtain the extension frame, with an extension ratio of 1.2 that can be adjusted to the actual icon; then execute the grabcut algorithm within the extension frame to complete foreground label extraction.
In addition, FIG. 6 is a schematic diagram of background-replacement data enhancement based on the foreground label according to an embodiment of the present invention. As shown in fig. 6, after the foreground label is extracted, data enhancement can be performed by substituting several preset backgrounds, and pos, part, and neg samples can be generated from the labeling frame on top of the foreground label. This ensures the integrity of the training samples after foreground label extraction, improves the robustness of the recognition model to different desktop environments, and, by enhancing the training samples through foreground label extraction, improves the accuracy of icon recognition.
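The following is a minimal sketch of the extension-and-grabcut extraction described above, using OpenCV's grabCut on an 8-bit color image; the helper name and the default values are illustrative assumptions:

```python
import cv2
import numpy as np

def extract_foreground(img, box, expand=1.2, iters=5):
    # Extend the labeling frame `box` = (x1, y1, x2, y2) by `expand`,
    # then run grabCut inside the extension frame to pull out the icon.
    h, w = img.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * expand / 2.0, (y2 - y1) * expand / 2.0
    ex1, ey1 = int(max(0, cx - hw)), int(max(0, cy - hh))
    ex2, ey2 = int(min(w, cx + hw)), int(min(h, cy + hh))
    roi = img[ey1:ey2, ex1:ex2]
    mask = np.zeros(roi.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    # Initialize grabCut with the original (unextended) frame inside the ROI.
    rect = (x1 - ex1, y1 - ey1, x2 - x1, y2 - y1)
    cv2.grabCut(roi, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return roi * fg[:, :, np.newaxis].astype(roi.dtype)
```

The extracted foreground can then be pasted onto the preset replacement backgrounds to generate the enhanced samples.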
In some embodiments, the MTCNN model includes a Pnet layer and an Rnet layer, cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region. FIG. 7 is a schematic flow diagram of the training phase of icon recognition according to an embodiment of the present invention. As shown in fig. 7, after the training sample data is enhanced, it can be uniformly scaled to the required sizes to facilitate model training: the Pnet layer may require an input size of 12x12 pixels, and the Rnet layer an input size of 24x24 pixels. The training sample data is then managed and packaged, which includes controlling the neg:part:pos sample ratio to 2:1:1 and packing the data into the format required for training. The joint detection algorithm of the Pnet and Rnet layers in the MTCNN model transfers the single-class keypoint detection of the related art to multi-class detection: the algorithm is modified to cancel the third network used for keypoint detection, and the first two networks, the Pnet layer and the Rnet layer, are expanded to multi-class detection, enhancing and optimizing the training function of the neural networks.
In addition, the MTCNN model trains two layers, a Pnet layer and an Rnet layer, which are used in cascade at inference time: the Pnet layer completes detection and extraction of foreground regions on the whole image, the Rnet layer further completes recognition within those foreground regions to obtain the specific categories and optimize the foreground positions, and finally the results are returned. The Pnet and Rnet layers can be trained separately using training sample data at the corresponding scales. The model of the embodiment of the invention does not prescribe the training framework: frameworks such as MATLAB, Caffe, TensorFlow, Keras, and PyTorch can all be used, and preferably TensorFlow is used to complete the training. Both the Pnet layer and the Rnet layer may be fully convolutional networks comprising three branches: foreground judgment, minimum bounding box (bbox) regression, and category classification (a sketch of such a network follows). After model training is finished, the effect of the cascaded Pnet-detection and Rnet-recognition model is verified, a combination with better effect is screened from the historical training models, and it is converted into the commonly used pb format for the subsequent detection stage.
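As an illustration of such a three-branch fully convolutional network, the following Keras sketch follows the layer sizes of the original MTCNN Pnet; the multi-class category branch and its width are assumptions of this sketch rather than a prescribed architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_pnet(num_classes):
    # Fully convolutional, so the same weights run on whole images at
    # detection time; 12x12 crops are used at training time.
    x_in = tf.keras.Input(shape=(None, None, 3))
    x = layers.Conv2D(10, 3, activation="relu")(x_in)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    # Three 1x1 heads: foreground judgment, bbox regression, category.
    fg = layers.Conv2D(2, 1, activation="softmax", name="foreground")(x)
    bbox = layers.Conv2D(4, 1, name="bbox")(x)
    cls = layers.Conv2D(num_classes, 1, activation="softmax", name="category")(x)
    return tf.keras.Model(x_in, [fg, bbox, cls])
```

An Rnet-layer sketch would follow the same pattern on 24x24 inputs with a deeper body.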
In this embodiment, a method of icon recognition applied to an augmented reality AR projection system is provided. FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present invention, and FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present invention. As shown in figs. 8 and 9, the method includes the following steps:
step S502, acquiring an image of the icon to be recognized, where the acquired image may be a high-resolution color image of the AR projection area, for example a color image of 640x480 pixels;
step S504, extracting, according to the image, a foreground region of the image through the Pnet layer of the multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and performing hierarchical aggregation on the first candidate frames to obtain second candidate frames. After the AR projection system loads the Pnet-layer and Rnet-layer files, the ratio of the training scale of the Pnet layer to the preset scale of the icon is calculated, the original image is scaled accordingly, and the Pnet layer is run to detect and extract the corresponding foreground regions, yielding Pnet candidate frames containing position information and category information. For example, if the icons to be recognized are about 50x50 pixels, the preset scale is 50; the ratio of the Pnet training scale 12 to the preset scale is 0.24, so the original image is reduced by a factor of 0.24. In some embodiments, because the cards used in the same task are basically consistent in size, the Pnet detection phase is fixed to a single scale; meanwhile, owing to the adaptability of the Pnet layer to foreground regions, the preset icon scale is insensitive to interference from the AR projection system and has good scale invariance. In addition, when the image sent to the Pnet layer for recognition is too small, the classification result of the Pnet layer is often inaccurate, and the image can be classified more accurately by the Rnet layer. When all icons are basically consistent in size, or differ in size but are not easily confused with one another, the candidate frames generated by the Pnet layer are sent directly to the Rnet layer for classification; but if icons can be confused, for example when part of icon A resembles icon B, sending the candidate frames directly to the Rnet layer causes false detection. In that case, the Pnet candidate frames must be hierarchically aggregated to obtain candidate frames with basically correct positions and sizes before they are sent to the Rnet layer;
step S506, the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon: the candidate frames generated directly by the Pnet layer or obtained by hierarchical aggregation are used to crop the corresponding images from the original image, which are scaled to 24x24 pixels and sent to the Rnet layer for category recognition and candidate-frame position optimization. A sketch of this detection pipeline follows.
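The detection stage can be summarized by the following sketch; pnet_detect, aggregate, and rnet_classify are hypothetical callables standing in for the loaded Pnet model, the hierarchical aggregation described in the next section, and the loaded Rnet model:

```python
import cv2

PNET_SCALE, RNET_SIZE = 12, 24

def detect_icons(img, preset_scale, pnet_detect, aggregate, rnet_classify):
    # Single fixed scale, e.g. 12 / 50 = 0.24 for icons of about 50x50 pixels.
    ratio = PNET_SCALE / float(preset_scale)
    small = cv2.resize(img, None, fx=ratio, fy=ratio)
    candidates = pnet_detect(small)        # first candidate frames
    merged = aggregate(candidates)         # second candidate frames
    results = []
    for (x1, y1, x2, y2) in merged:
        # Map the frame back to original-image coordinates, crop, and
        # scale to the Rnet input size for category recognition.
        x1, y1, x2, y2 = (int(v / ratio) for v in (x1, y1, x2, y2))
        crop = cv2.resize(img[y1:y2, x1:x2], (RNET_SIZE, RNET_SIZE))
        results.append(rnet_classify(crop))
    return results
```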
Through steps S502 to S506, rapid detection of card-type icons is realized according to the requirements of the AR projection scenario: the MTCNN model algorithm realizes multi-class detection, the hierarchical aggregation algorithm lets the model support detection of icons at different scales, and recognition is fast and accurate, making the method suitable for developing interactive applications in related scenarios such as education. The problem that icon detection accuracy and efficiency of the AR projection system are not high is thereby solved, and both are improved.
In one embodiment, hierarchically aggregating the first candidate frames to obtain the second candidate frames includes: when the positions of first candidate frames overlap, classifying those frames into one class and taking their circumscribed rectangle as the second candidate frame. The aggregation method judges the relevance of the candidate frames according to their positions, groups overlapping candidate frames into one class, and generates the circumscribed rectangle of the group as a whole. "Hierarchical" means that a large-scale candidate frame is obtained by aggregating small-scale candidate frames. The hierarchical aggregation flow takes all candidate frames generated by the Pnet layer as input and outputs the aggregated candidate frames; the specific flow is as follows (see the sketch after these steps):
step S1, obtain all candidate frames generated by the Pnet layer, n in total;
step S2, calculate the degree of intersection of the candidate frames pairwise, generating an n x n upper-triangular candidate-frame distance matrix D;
step S3, number all the candidate frames and obtain an aggregation vector C by taking each frame's serial number as its aggregation id; in the initial state all candidate frames are mutually independent;
step S4, traverse candidate frames i = 1 … n;
step S5, traverse candidate frames j = i … n;
step S6, if the distance d_ij is greater than the threshold, merge candidate frame j and candidate frame i according to the applicable case below;
step S7, if candidate frames i and j are both independent, C_j = i;
step S8, if candidate frame i is independent and candidate frame j is not (in which case necessarily C_j < i), then C_i = j;
step S9, if candidate frame i is not independent and candidate frame j is independent, then C_j = i;
step S10, if neither candidate frame i nor candidate frame j is independent, let Min = min(C_j, C_i) and Max = max(C_j, C_i), traverse the aggregation vector C, and replace every value equal to Max with Min;
step S11, obtain the updated aggregation vector C, in which the unrepeated ids are the aggregation centers;
step S12, fuse the candidate frames belonging to the same aggregation center according to the aggregation vector C, taking the upper and lower extreme values to obtain the circumscribed rectangle;
and step S13, return all the circumscribed rectangles, which are the candidate frames after Pnet hierarchical aggregation.
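One way to realize steps S1 to S13 is the following sketch. It keeps a flat aggregation vector as in step S3 and merges clusters by replacing the larger id with the smaller, as in step S10; overlaps() is a hypothetical predicate implementing the distance threshold of step S6:

```python
def aggregate(frames, overlaps):
    # `frames` are Pnet candidate frames as (x1, y1, x2, y2) tuples.
    n = len(frames)
    c = list(range(n))  # aggregation vector: each frame starts independent
    for i in range(n):
        for j in range(i + 1, n):
            if overlaps(frames[i], frames[j]):
                lo, hi = min(c[i], c[j]), max(c[i], c[j])
                # Merge the two clusters: redirect every id equal to hi.
                c = [lo if v == hi else v for v in c]
    merged = {}
    for idx, root in enumerate(c):
        x1, y1, x2, y2 = frames[idx]
        if root in merged:
            mx1, my1, mx2, my2 = merged[root]
            # Circumscribed rectangle: take the extreme values.
            merged[root] = (min(mx1, x1), min(my1, y1),
                            max(mx2, x2), max(my2, y2))
        else:
            merged[root] = (x1, y1, x2, y2)
    return list(merged.values())
```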
Through the hierarchical aggregation algorithm, the recognition neural network model of the AR projection system can adapt to detection of icons at different scales. If the multi-scale problem were solved directly by the neural network model, more training data would need to be collected and a larger, heavier model used, increasing the forward latency of deployment, while the false-detection problem still could not be completely eradicated. Hierarchical aggregation solves the multi-scale confusion problem purely through data post-processing, and is more flexible and efficient.
In one embodiment, the image of the icon to be recognized is acquired within a preset detection area of the projected image. FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present invention. As shown in fig. 10, in the AR icon recognition system, the card corresponding to the icon is prepared for detection in the corresponding APP, and the MTCNN algorithm model detects the image of the whole projection area. In another embodiment, a detection area may be preset within the projection area and the MTCNN algorithm model applied only to that preset area, further reducing the forward latency of the model.
In one embodiment, when the projection trigger button of the projection area is detected to be triggered, a trigger instruction is generated, and the trigger instruction instructs acquisition of the image of the icon. For example, the card icon to be recognized is placed in the projection area, the detection trigger button on the projection is clicked, the operating system side of the projection system triggers a detection task, and the image of the whole projection area or of the preset detection area is detected by the MTCNN algorithm.
In one embodiment, after the category of the icon is identified, an indication signal is generated according to the category, where the indication signal is used to instruct the projector 12 to play the dynamic effect corresponding to the category. The recognition result is returned to the system layer of the AR projection system; after receiving it, the projection system plays the corresponding animation, sound effect, and so on through the projector, and the detection algorithm waits for the next detection trigger.
In another embodiment of the invention, a computer-readable storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method of icon recognition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An icon recognition method, the method comprising:
acquiring an image of an icon, and marking a labeling frame on the icon in the image;
determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image;
and screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon.
2. The method of claim 1, wherein the determining of training samples using the labeling frame as a reference comprises:
generating candidate frames using the labeling frame as a reference;
and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
3. The method of claim 1, wherein the determining of training samples using the labeling frame as a reference comprises:
extending the labeling frame by a preset proportion to obtain an extension frame;
executing the grabcut algorithm within the extension frame to complete extraction of the foreground label, and replacing the background of the foreground label for data enhancement;
and determining a labeling frame and candidate frames on the foreground label, and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
4. The method of claim 1, wherein the training of the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image comprises:
in the case that the MTCNN model comprises a Pnet layer and an Rnet layer, the two layers are cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region of the icon.
5. An icon recognition method applied to an augmented reality (AR) projection system, the method comprising:
acquiring an image of the icon to be recognized;
extracting, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames;
performing hierarchical aggregation on the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon.
6. The method of claim 5, wherein the acquiring of the image of the icon to be recognized comprises:
acquiring the image of the icon to be recognized within a preset detection area of the projected image.
7. The method of claim 5, wherein the hierarchical aggregation of the first candidate frames to obtain second candidate frames comprises:
when the positions of first candidate frames overlap, classifying those first candidate frames into one class of candidate frames and taking their circumscribed rectangle as the second candidate frame.
8. The method of claim 5, wherein after the identifying of the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal is used to instruct the projector to play the dynamic effect corresponding to the category.
9. The method of claim 5, wherein before the acquiring of the image of the icon to be recognized, the method comprises:
when a projection trigger button of the projection area is triggered, generating a trigger instruction, wherein the trigger instruction instructs acquisition of the image of the icon.
10. A system for augmented reality (AR) icon recognition, the system comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device;
the projector projects an image in the working area of the camera device;
the camera device acquires the image of the icon to be recognized;
the main control device extracts, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon;
and the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
CN202010217757.1A 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system Active CN111523390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Publications (2)

Publication Number Publication Date
CN111523390A 2020-08-11
CN111523390B (en) 2023-11-03

Family

ID=71910429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010217757.1A Active CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Country Status (1)

Country Link
CN (1) CN111523390B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060034484A1 (en) * 2004-08-16 2006-02-16 Claus Bahlmann Method for traffic sign detection
US20190026884A1 (en) * 2016-03-30 2019-01-24 Institute Of Automation, Chinese Academy Of Sciences Method for assessing aesthetic quality of natural image based on multi-task deep learning
CN107609485A (en) * 2017-08-16 2018-01-19 中国科学院自动化研究所 The recognition methods of traffic sign, storage medium, processing equipment
CN107977671A (en) * 2017-10-27 2018-05-01 浙江工业大学 A kind of tongue picture sorting technique based on multitask convolutional neural networks
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
US20200019759A1 (en) * 2018-07-11 2020-01-16 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums
CN109635768A (en) * 2018-12-20 2019-04-16 深圳市捷顺科技实业股份有限公司 Parking stall condition detection method, system and relevant device in a kind of picture frame
CN110826391A (en) * 2019-09-10 2020-02-21 中国三峡建设管理有限公司 Bleeding area detection method, bleeding area detection system, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAOWEI CAI ET AL.: "Cascade R-CNN: Delving into High Quality Object Detection" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686851A (en) * 2020-12-25 2021-04-20 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN112686851B (en) * 2020-12-25 2022-02-08 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN113808186A (en) * 2021-03-04 2021-12-17 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113808186B (en) * 2021-03-04 2024-01-16 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113012189A (en) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 Image recognition method and device, computer equipment and storage medium
CN113409231A (en) * 2021-06-10 2021-09-17 杭州易现先进科技有限公司 AR portrait photographing method and system based on deep learning

Also Published As

Publication number Publication date
CN111523390B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN111523390B (en) Image recognition method and augmented reality AR icon recognition system
US11030458B2 (en) Generating synthetic digital assets for a virtual scene including a model of a real-world object
EP3621029A1 (en) Car insurance image processing method, apparatus, server and system
CN109947967B (en) Image recognition method, image recognition device, storage medium and computer equipment
KR102220174B1 (en) Learning-data enhancement device for machine learning model and method for learning-data enhancement
CN112560999A (en) Target detection model training method and device, electronic equipment and storage medium
CN110737785B (en) Picture labeling method and device
CN111368600A (en) Method and device for detecting and identifying remote sensing image target, readable storage medium and equipment
Beyeler OpenCV with Python blueprints
CN111476271B (en) Icon identification method, device, system, computer equipment and storage medium
CN111368944B (en) Method and device for recognizing copied image and certificate photo and training model and electronic equipment
US10891740B2 (en) Moving object tracking apparatus, moving object tracking method, and computer program product
US20230237777A1 (en) Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN111124863B (en) Intelligent device performance testing method and device and intelligent device
CN114511589A (en) Human body tracking method and system
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
CN115546824B (en) Taboo picture identification method, apparatus and storage medium
CN111488776A (en) Object detection method, object detection device and electronic equipment
CN114821062A (en) Commodity identification method and device based on image segmentation
CN114998962A (en) Living body detection and model training method and device
Bekhit Computer Vision and Augmented Reality in iOS
CN114596624B (en) Human eye state detection method and device, electronic equipment and storage medium
Qian et al. Multi-Scale tiny region gesture recognition towards 3D object manipulation in industrial design
Kulishova et al. Impact of the textbooks’ graphic design on the augmented reality applications tracking ability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant