CN111523390B - Image recognition method and augmented reality AR icon recognition system - Google Patents


Info

Publication number
CN111523390B
CN111523390B
Authority
CN
China
Prior art keywords
icon
image
frame
candidate
layer
Prior art date
Legal status
Active
Application number
CN202010217757.1A
Other languages
Chinese (zh)
Other versions
CN111523390A (en)
Inventor
林健 (Lin Jian)
周志敏 (Zhou Zhimin)
刘海伟 (Liu Haiwei)
丛林 (Cong Lin)
Current Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Original Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co., Ltd.
Priority to CN202010217757.1A priority Critical patent/CN111523390B/en
Publication of CN111523390A publication Critical patent/CN111523390A/en
Application granted granted Critical
Publication of CN111523390B publication Critical patent/CN111523390B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The application discloses a system for augmented reality (AR) icon recognition. The system comprises a projector, a main control device, and a camera device, the main control device being connected to the projector and to the camera device respectively. The projector projects images into the working area of the camera device, and the camera device captures an image of the icon to be recognized. From this image, the main control device extracts a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network (MTCNN) model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames. The Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon, and the main control device instructs the projector to play the dynamic effect corresponding to that category. The system thereby solves the problem of low accuracy and efficiency in AR projection system icon detection and improves both.

Description

Image recognition method and augmented reality AR icon recognition system
Technical Field
The present application relates to the field of image recognition, and in particular to an image recognition method and an augmented reality AR icon recognition system.
Background
An interactive augmented reality (Augmented Reality, abbreviated AR) projection system is a human-computer interaction system that combines a projector with a color depth camera: the interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and in response the projector plays animations, pictures, and sounds at recognized objects or at specific positions, achieving the goal of augmented reality. Such systems are very well suited to industries such as education and entertainment.
In the detection scenarios of an AR projection system, multi-category icon detection must be realized. Icon features are sparser than those of faces, vehicles, and the like; icons may differ in scale and are easily confused with one another. In addition, models must be developed rapidly: each application (APP) corresponds to a different detection model, so large amounts of time cannot be spent collecting large volumes of data for model training. Furthermore, the icon recognition scenario of an interactive AR projection system has two characteristics: complex lighting and shadow conditions, and strict latency requirements. The algorithm therefore needs to be lightweight to reduce time consumption, and needs good illumination invariance to cope with changes such as overexposure and underexposure, while completing multi-category icon classification and recognition.
In the related art, icon detection for an AR projection system directly reuses and extends the multi-task cascaded convolutional network (Multi-task Cascaded Convolutional Networks, abbreviated MTCNN) algorithm: the number of feature points is adjusted, the network structure is slightly changed, or a new feature-enhancement module is connected in series, and after such adjustments the method is migrated directly to different usage scenarios. This remains, in essence, single-class keypoint detection, and a large amount of data must be collected to guarantee the training effect of MTCNN.
For the problem of low icon detection accuracy and efficiency in AR projection systems in the related art, no effective solution has yet been proposed.
Disclosure of Invention
In view of the problem of low icon detection accuracy and efficiency in AR projection systems in the related art, embodiments of the present application are provided to at least solve this problem.
According to one aspect of the present application, an icon recognition method is provided, the method comprising:
acquiring an image of an icon, and annotating the icon in the image with a labeling frame;
determining training samples with the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; and
screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used to recognize images of icons.
In one embodiment, determining training samples with the labeling frame as a reference comprises:
generating candidate frames with the labeling frame as a reference; and
determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frame and the candidate frames.
In one embodiment, determining training samples with the labeling frame as a reference and performing data enhancement on the training samples comprises:
extending the labeling frame outward according to a preset ratio to obtain an extension frame;
executing the grabcut algorithm within the extension frame to extract the foreground annotation; and
determining labeling frames and candidate frames on the foreground annotation, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frames and the candidate frames.
In one embodiment, training the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image comprises:
when the MTCNN model includes a Pnet layer and an Rnet layer, cascading the Pnet layer and the Rnet layer, inputting the training samples into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifying the category of the icon and optimizing the position of the foreground region of the icon.
According to another aspect of the present application, an icon recognition method applied to an augmented reality AR projection system is also provided, the method comprising:
acquiring an image of an icon to be identified;
extracting a foreground region of the image through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames;
hierarchically aggregating the first candidate frames to obtain second candidate frames; and
the Rnet layer of the MTCNN model cropping the icon image from the second candidate frames and identifying the category of the icon.
In one embodiment, acquiring the image of the icon to be identified comprises:
acquiring the image of the icon to be identified within a preset detection area of the projected image.
In one embodiment, hierarchically aggregating the first candidate frames to obtain the second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
In one embodiment, after identifying the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal instructs the projector to play the dynamic effect corresponding to the category.
In one embodiment, before acquiring the image of the icon to be identified, the method comprises:
generating a trigger instruction when a projection trigger button in the projection area is triggered, wherein the trigger instruction indicates that the image of the icon is to be acquired.
According to another aspect of the present application, a system for augmented reality AR icon recognition is also provided, the system comprising: a projector, a main control device, and a camera device, the main control device being connected to the projector and the camera device respectively; wherein:
the projector projects images into the working area of the camera device;
the camera device captures the image of the icon to be recognized;
the main control device extracts, from the image, a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and
the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
Through the present application, a system for augmented reality AR icon recognition is provided in which the projector projects images into the working area of the camera device; the camera device captures the image of the icon to be recognized; the main control device extracts a foreground region from the image through the Pnet layer of the MTCNN model to obtain first candidate frames and hierarchically aggregates them to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device instructs the projector to play the dynamic effect corresponding to that category. This solves the problem of low accuracy and efficiency in AR projection system icon detection and improves both.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for icon recognition according to an embodiment of the present application;
FIG. 3 is a schematic view of a scenario of training sample annotation according to an embodiment of the application;
FIG. 4 is a schematic diagram of the effect of outward extension for foreground annotation extraction according to an embodiment of the application;
FIG. 5 is a schematic diagram of an effect after foreground annotation extraction according to an embodiment of the application;
FIG. 6 is a schematic diagram of data augmentation by replacing the background of foreground annotations according to an embodiment of the application;
FIG. 7 is a flow diagram of a training phase of icon recognition according to an embodiment of the application;
FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present application;
FIG. 9 is a flow chart of MTCNN model based icon detection according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings and in conjunction with embodiments. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
The interactive AR projection system is a human-computer interaction system combining a projector with a color depth camera: the interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and the projector responds by playing animations/pictures/sounds at the recognized object or at specific positions, achieving the goal of augmented reality. It is very well suited to the education industry.
Color depth camera: a device that can capture color images (RGB frames) and depth images (Depth frames), abbreviated "RGB-D camera". Color images are acquired on the same principle as an ordinary camera; depth image acquisition principles include structured light, time of flight (ToF), binocular cameras, and so on. Taking the structured-light scheme as an example, the camera comprises an infrared emission module, an infrared fill-light module, an RGB + infrared camera module, and the like.
A projector is a device that can project images or video onto any plane. The projector manufacturer integrates a digital micromirror device (Digital Micromirror Device, abbreviated DMD) display core, light source, lens optical path, and heat dissipation into a single mechanism, forming an integral component.
In this embodiment, a system for augmented reality AR icon recognition is provided. FIG. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present application. As shown in FIG. 1, the system includes a projector 12, a main control device 14, and a camera device 16, the main control device 14 being connected to the projector 12 and the camera device 16 respectively. The projector 12 projects an image onto the working area of the camera device 16, and the camera device 16 captures the image of the icon to be recognized. From this image, the main control device 14 extracts a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames. The Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon, and the main control device 14 instructs the projector 12 to play the dynamic effect corresponding to that category. Through this system, the problem of low accuracy and efficiency in AR projection system icon detection is solved, and both are improved.
In one embodiment, card-type icon detection by means of an interactive AR projection system is a very popular form of education. Various APPs can be developed on top of the detection algorithm, and young children complete the corresponding learning through interaction with physical cards, specifically including animal detection, color and shape detection, and the like. Compared with the purely click-based interaction of tablets, mobile phones, and the like, physical interaction with objects such as card icons is richer in form and more popular with young children.
The embodiments of the present application provide an object detection algorithm for a new interactive AR projection system that achieves stable icon detection based on MTCNN and is especially suitable for preschool education scenarios. Different APPs correspond to different detection content, such as animal icon detection, plant icon detection, and shape icon detection; different detection models must therefore be provided for different scenarios, rapid model development is required, and a general, robust pipeline from data generation to model training must be built. In actual use of the interactive AR projection system, the background projection of the corresponding APP is superimposed on the physical card; at the same time, the color camera is easily affected by ambient illumination when capturing images, producing problems such as overexposure and underexposure, so the detection algorithm must be robust to lighting changes. Compared with a card recognition algorithm with completely fixed positions, detection with non-fixed positions has a wider range of application and is friendlier both for APP development and for young users, so the detection algorithm needs to achieve card detection within a certain area. Meanwhile, considering the hardware limitations of the AR projection system, the forward latency of the detection algorithm must be small, with overall detection latency within 500 ms. If the objects actually recognized are easily confused, the detection algorithm must be able to distinguish them, reducing false detections and misses. For example, when detecting color and shape objects, a rectangle can be formed by splicing two squares, and the detection algorithm must avoid falsely detecting the rectangle as two squares.
In this embodiment, an icon recognition method is provided. FIG. 2 is a flowchart of an icon recognition method according to an embodiment of the present application; as shown in FIG. 2, the method includes the following steps:
Step S202: acquire an image of an icon, and annotate the icon in the image with a labeling frame. The image of the icon to be recognized is acquired through the AR projection system; for example, the acquired image may be 640x480 pixels and may contain several icons to be recognized. Rectangular labeling frames must be provided for all of the icons; optionally, the labeling frames are annotated with labeling software such as labelme;
Step S204: determine training samples with the labeling frame as a reference, and perform data enhancement on the training samples. The data enhancement includes adding geometric transformations such as rotation and perspective transformation after the training samples are extracted. In addition, because the AR projection system places high demands on illumination, illumination enhancement must also be added, which may include simulating the data states under conditions such as overexposure, shadow, and contrast change (a sketch of such augmentation follows these steps). After data enhancement, the training samples can be uniformly scaled to a fixed size to facilitate subsequent model training;
Step S206: train the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; the MTCNN model can achieve foreground extraction and multi-category detection of icons;
Step S208: screen and save the MTCNN model according to preset conditions, wherein the MTCNN model is used for image recognition of icons. Combinations with good performance can be screened from the trained MTCNN models and converted into the common pb format for subsequent image recognition.
Through steps S202 to S208, the method comprises two parts, a training phase and a detection phase: the training phase determines training samples from the images of annotated icons and trains the MTCNN model on those samples, and the detection phase uses the resulting MTCNN model to detect icons in actual scenes, thereby solving the problem of low icon detection accuracy and efficiency in AR projection systems and improving both.
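As a concrete illustration of the data enhancement described in step S204, the sketch below applies rotation, perspective transformation, and illumination changes with OpenCV. It is a minimal sketch: the function name and all parameter ranges (rotation angle, corner jitter, gamma, and contrast factors) are illustrative assumptions, not values from the original disclosure.

```python
import cv2
import numpy as np

def augment_sample(img, rng=None):
    """Geometric and illumination augmentation per step S204 (assumed ranges).
    Expects an 8-bit BGR image."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    # Rotation about the image center.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
    img = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Mild perspective transformation: jitter the four corners by up to 3%.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-0.03, 0.03, size=(4, 2)) * np.array([w, h])
    dst = (src + jitter).astype(np.float32)
    img = cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst),
                              (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Illumination enhancement: gamma simulates over/underexposure,
    # linear scaling around the mean simulates contrast change.
    gamma = rng.uniform(0.5, 2.0)
    lut = (((np.arange(256) / 255.0) ** gamma) * 255).astype(np.uint8)
    img = cv2.LUT(img, lut)
    alpha = rng.uniform(0.7, 1.3)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=(1 - alpha) * float(img.mean()))
    return img
```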
In one embodiment, the process of determining training samples based on the labeling frame comprises: generating candidate frames with the labeling frame as a reference, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union (Intersection-over-Union, abbreviated IOU) of the labeling frame and the candidate frames. Optionally, the MTCNN model is used to detect the foreground region to be extracted from an input image and judge its specific category. FIG. 3 is a schematic view of a training sample annotation scenario according to an embodiment of the present application. As shown in FIG. 3, the training samples may include pos samples, part samples, and neg samples generated from the annotation data: pos samples guide the network to locate the foreground region and recognize the category, part samples guide the network to locate the foreground region, and neg samples guide the network to recognize the background region. The data generation method for training samples is to randomly generate candidate rectangular regions of different scales and positions with the labeling frame as a reference, and classify them by their IOU with the labeling frame. IOU is a concept used in target detection: it measures the overlap between the generated candidate region C (candidate bound) and the original labeling frame G (ground truth bound), namely the ratio of the area of their intersection to the area of their union, as shown in formula (1):

IOU = area(C ∩ G) / area(C ∪ G)    (1)
A candidate frame whose IOU with the labeling frame is smaller than 0.3 is a neg sample, a candidate frame whose IOU is larger than 0.65 is a pos sample, and a candidate frame in between is a part sample. In the class labels, a neg sample is represented by 0, pos samples by 1 to n, and part samples by -1 to -n.
In one embodiment, determining training samples with the labeling frame as a reference and performing data enhancement on them comprises: extending the labeling frame outward according to a preset ratio to obtain an extension frame; executing the grabcut algorithm within the extension frame to extract the foreground annotation; determining labeling frames and candidate frames on the foreground annotation; and determining training samples for the foreground region and the recognized category of the icon according to the IOU of the labeling frames and the candidate frames. For example, in the training sample generation process the labeling frame is always rectangular and located in the foreground region, but if the foreground region is damaged while generating training samples, recognition stability may be affected, so the foreground region needs to be extracted further. When the integrity of the foreground region is guaranteed in the training sample data, the influence of the background in which the icon sits can be reduced. If the icon foreground has an irregular shape, however, manually annotating the foreground region is far too time-consuming and labor-intensive, so the foreground can be extracted automatically with the grabcut algorithm. FIG. 4 is a schematic diagram of the effect of outward extension for foreground annotation extraction according to an embodiment of the application, and FIG. 5 is a schematic diagram of the effect after foreground annotation extraction. As shown in FIGS. 4 and 5, the specific extraction flow is: with the original rectangular labeling frame as a reference, extend outward to obtain the extension frame, with an extension ratio of 1.2 that can be adjusted to the actual icon; then execute the grabcut algorithm within the extension frame to complete foreground annotation extraction. In addition, FIG. 6 is a schematic diagram of data augmentation by replacing the background of foreground annotations. As shown in FIG. 6, after the foreground annotation is extracted, data enhancement can be performed by substituting a number of preset backgrounds, and pos, part, and neg samples can be generated from the foreground annotation on the basis of the labeling frame. This guarantees the overall integrity of the training samples after foreground annotation, improves the robustness of the recognition model to different desktop environments, and improves the accuracy of icon recognition through foreground annotation extraction used in the data enhancement of the training samples. A sketch of the IOU computation and the grabcut extraction follows.
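The following is a minimal sketch of the two operations just described: classifying candidate frames by formula (1) with the thresholds above, and extracting the foreground with OpenCV's grabCut inside the extension frame. The function names, the iteration count, and the (x1, y1, x2, y2) box representation are illustrative assumptions.

```python
import cv2
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, per formula (1)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def sample_label(candidate, labeling_frame):
    """Classify a candidate frame by its IOU with the labeling frame (thresholds from the text)."""
    v = iou(candidate, labeling_frame)
    if v < 0.3:
        return "neg"
    if v > 0.65:
        return "pos"
    return "part"

def extract_foreground(img, labeling_frame, expand=1.2, iters=5):
    """Run grabcut inside the extension frame to pull out the icon foreground.
    `expand` is the 1.2 extension ratio from the text; `iters` is an assumed count.
    Expects an 8-bit BGR image."""
    x1, y1, x2, y2 = labeling_frame
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * expand, (y2 - y1) * expand
    ex1, ey1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    ex2, ey2 = min(img.shape[1], int(cx + w / 2)), min(img.shape[0], int(cy + h / 2))
    roi = img[ey1:ey2, ex1:ex2].copy()
    mask = np.zeros(roi.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    # The original labeling frame (shifted into ROI coordinates) seeds the foreground.
    rect = (x1 - ex1, y1 - ey1, x2 - x1, y2 - y1)
    cv2.grabCut(roi, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return roi * fg[:, :, None]
```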
In some embodiments, when the MTCNN model includes a Pnet layer and an Rnet layer, the Pnet layer and the Rnet layer are cascaded: the training samples input to the Pnet layer are used to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the icon's foreground region. FIG. 7 is a schematic flow chart of the training stage of icon recognition according to an embodiment of the present application. As shown in FIG. 7, after data enhancement the training samples may be uniformly scaled to fixed sizes to facilitate model training; the Pnet layer may require input data of 12x12 pixels, and the Rnet layer input data of 24x24 pixels. The training sample data is then managed and packed, which includes controlling the neg:part:pos sample ratio to 2:1:1 and packing the data into the format required for training. The joint Pnet/Rnet detection algorithm in the MTCNN model transfers the single-class keypoint detection of the related art to multi-class detection: the algorithm is modified, the third-stage network for keypoint detection is removed, and the front two stages, the Pnet layer and the Rnet layer, are extended to multi-class detection, enhancing and optimizing the training of the neural network.
In addition, the MTCNN model trains two layers, a Pnet layer and an Rnet layer, which are cascaded at inference time: the Pnet layer completes detection and extraction of foreground regions over the whole image, the Rnet layer then completes recognition within those foreground regions to obtain the specific category and optimize the foreground position, and finally the result is returned. The Pnet layer and the Rnet layer may each be trained with training sample data of the corresponding scale. The model of this embodiment does not prescribe a training framework; frameworks such as matlab, caffe, tensorflow, keras, and pytorch may all be used, with tensorflow preferred for the corresponding training. The Pnet layer and the Rnet layer can be fully convolutional networks, each containing three branches: foreground judgment, minimum bounding box (abbreviated bbox) regression, and category classification. After model training is completed, the effect of the cascaded Pnet-detection/Rnet-recognition model is verified, and combinations with good performance are screened from the historical training models and converted into the common pb format for the subsequent detection stage of image recognition.
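To make the three-branch structure concrete, here is a minimal tensorflow/Keras sketch of a Pnet-style fully convolutional network with foreground judgment, bbox regression, and category classification branches. It is a sketch under assumed layer sizes and channel counts, not the patent's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_pnet(num_classes):
    """Pnet-style fully convolutional network with three output branches.
    Channel counts are illustrative assumptions."""
    inputs = tf.keras.Input(shape=(None, None, 3))  # 12x12 crops at training time
    x = layers.Conv2D(10, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    # Branch 1: foreground judgment (foreground vs. background).
    fg = layers.Conv2D(2, 1, activation="softmax", name="foreground")(x)
    # Branch 2: minimum bounding box (bbox) regression.
    bbox = layers.Conv2D(4, 1, name="bbox")(x)
    # Branch 3: multi-category icon classification.
    cls = layers.Conv2D(num_classes, 1, activation="softmax", name="category")(x)
    return tf.keras.Model(inputs, [fg, bbox, cls])
```

At training time, a 12x12 input reduces each branch to a 1x1 map; at detection time the same network slides fully convolutionally over the whole scaled image. In the original MTCNN design the Rnet stage follows the same three-branch pattern on 24x24 inputs with fully connected heads.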
In this embodiment, an icon recognition method applied to an augmented reality AR projection system is provided. FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present application, and FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present application. As shown in FIGS. 8 and 9, the method includes the following steps:
Step S502: acquire the image of the icon to be recognized, where the acquired image may be a high-resolution color image of the AR projection area, for example a color image of 640x480 pixels;
Step S504: extract the foreground region of the image through the Pnet layer of the multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregate the first candidate frames to obtain second candidate frames. After the AR projection system loads the Pnet layer and Rnet layer files, the ratio of the Pnet training scale to the preset icon scale is calculated, the original image is scaled by that ratio, and the Pnet layer is run to detect and extract the corresponding foreground regions, yielding the Pnet candidate frames, which contain position information and category information. For example, if the icon to be recognized is about 50x50 pixels, the preset scale is 50; the ratio of the Pnet training scale of 12 to this preset scale is 0.24, so the original image is reduced to 0.24 times its size. In some embodiments, since the icon cards are substantially uniform in size within the same task, the Pnet detection stage is fixed to a single scale; owing to the Pnet layer's adaptability to foreground regions, the preset icon scale is insensitive to interference from the AR projection system and has good scale invariance. In addition, when the image fed to the Pnet layer is too small, the category result obtained by the Pnet layer is often inaccurate, and such images can be classified more accurately by the Rnet layer. When the sizes of all icons are basically consistent, or inconsistent but not easily confused, the candidate frames generated by the Pnet layer are sent directly to the Rnet layer for classification; when icons are easily confused, however, for example when part of icon A resembles part of icon B, sending the candidate frames directly to the Rnet layer causes false detections. In that case the Pnet candidate frames must be hierarchically aggregated to obtain candidate frames with basically correct positions and sizes before they are sent to the Rnet layer;
Step S506: the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon. Using the candidate frames generated directly by the Pnet layer or obtained through hierarchical aggregation, the corresponding image is cropped from the original image, scaled to 24x24 pixels, and sent to the Rnet layer for category recognition, and the positions of the candidate frames are optimized.
Through steps S502 to S506, the requirements of AR projection system scenarios are met and rapid detection of card icons is achieved: the MTCNN model algorithm realizes multi-category detection, and through the hierarchical aggregation algorithm the model supports detection of icons at different scales. Recognition is fast and accurate, making the method suitable for interactive courseware development in related scenarios such as education, solving the problem of low icon detection accuracy and efficiency in AR projection systems and improving both. A code sketch of this detection flow follows.
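A compact sketch of the detection flow in steps S502 to S506, assuming `pnet` and `rnet` are callables wrapping the loaded pb models (with Pnet returning candidate frames already mapped back to original-image coordinates) and `aggregate` is the hierarchical aggregation sketched after the steps below. All of these names are assumptions, not APIs from the patent.

```python
import cv2

PNET_SCALE = 12    # Pnet training scale from the text (pixels)
PRESET_SCALE = 50  # preset icon scale from the example (pixels)

def detect_icons(frame, pnet, rnet, aggregate):
    """Single-scale detection: scale the image so icons match the Pnet training
    scale, run Pnet, aggregate candidates, then classify each crop with Rnet."""
    ratio = PNET_SCALE / PRESET_SCALE  # 12 / 50 = 0.24
    small = cv2.resize(frame, None, fx=ratio, fy=ratio)
    candidates = pnet(small)            # candidate frames in `frame` coordinates (assumed)
    candidates = aggregate(candidates)  # hierarchical aggregation (sketched below)
    results = []
    for (x1, y1, x2, y2) in candidates:
        crop = cv2.resize(frame[y1:y2, x1:x2], (24, 24))  # Rnet input scale
        category, refined_box = rnet(crop)  # Rnet classifies and optimizes the frame
        results.append((category, refined_box))
    return results
```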
In one embodiment, hierarchically aggregating the first candidate frames to obtain second candidate frames comprises: when the positions of a plurality of first candidate frames overlap, classifying them into one class and taking their circumscribed rectangle as the second candidate frame. The aggregation method judges the correlation of candidate frames according to their positions, classifies candidate frames with overlapping positions into one class, and generates the circumscribed rectangle of the class as a whole. "Hierarchical" means that large-scale candidate frames are obtained by aggregating small-scale candidate frames. The hierarchical aggregation flow takes all candidate frames generated by the Pnet layer as input and outputs the aggregated candidate frames. The specific flow is as follows (a code sketch follows the steps):
Step S1: obtain all candidate frames generated by the Pnet layer, n in total;
Step S2: calculate the intersection degree between every pair of candidate frames to generate an n x n upper-triangular candidate frame distance matrix D;
Step S3: number all candidate frames and take the sequence numbers as aggregation ids to obtain an aggregation vector C; in the initial state all candidate frames are mutually independent;
Step S4: traverse the candidate frames for i = 1 ... n;
Step S5: traverse the candidate frames for j = i ... n;
Step S6: if the distance d_ij is greater than the threshold, merge candidate frame j with candidate frame i according to the corresponding case;
Step S7: if candidate frames i and j are both independent, C_j = i;
Step S8: if candidate frame i is independent and candidate frame j is not, then necessarily C_j < i, so C_i = C_j;
Step S9: if candidate frame i is not independent and candidate frame j is independent, C_j = C_i;
Step S10: if neither candidate frame i nor candidate frame j is independent, let Min = min(C_j, C_i) and Max = max(C_j, C_i), and traverse the aggregation vector C, replacing every value equal to Max with Min;
Step S11: obtain the updated aggregation vector C, in which each unrepeated id is an aggregation center;
Step S12: fuse the candidate frames belonging to the same aggregation center according to the aggregation vector C, taking the upper and lower extreme values to obtain the circumscribed rectangle;
Step S13: return all circumscribed rectangles to obtain the candidate frames after Pnet hierarchical aggregation.
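Read as code, steps S1 to S13 amount to a label-merging pass over pairwise overlaps followed by circumscribed-rectangle fusion. The sketch below is one possible rendering: the "intersection degree" measure and the threshold value are assumptions, and frames are (x1, y1, x2, y2) tuples.

```python
def aggregate(boxes, threshold=0.1):
    """Hierarchical aggregation of Pnet candidate frames (steps S1-S13).
    `threshold` and the intersection-degree measure are assumed, not from the patent."""
    n = len(boxes)  # Step S1

    def inter_degree(a, b):
        # Step S2: intersection area normalized by the smaller frame
        # (one plausible reading of "intersection degree").
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        inter = max(0, w) * max(0, h)
        smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                      (b[2] - b[0]) * (b[3] - b[1]))
        return inter / smaller if smaller > 0 else 0.0

    C = list(range(n))  # Step S3: every frame starts as its own aggregation id
    for i in range(n):                      # Step S4
        for j in range(i + 1, n):           # Step S5
            if inter_degree(boxes[i], boxes[j]) <= threshold:
                continue                    # Step S6: merge only above the threshold
            if C[i] == i and C[j] == j:     # Step S7: both independent
                C[j] = i
            elif C[i] == i:                 # Step S8: only j already aggregated
                C[i] = C[j]
            elif C[j] == j:                 # Step S9: only i already aggregated
                C[j] = C[i]
            else:                           # Step S10: both aggregated; relabel Max as Min
                lo, hi = min(C[i], C[j]), max(C[i], C[j])
                C = [lo if c == hi else c for c in C]
    # Steps S11-S13: fuse frames sharing an aggregation center into their
    # circumscribed rectangle and return the aggregated candidate frames.
    merged = {}
    for idx, center in enumerate(C):
        x1, y1, x2, y2 = boxes[idx]
        if center in merged:
            m = merged[center]
            merged[center] = (min(m[0], x1), min(m[1], y1),
                              max(m[2], x2), max(m[3], y2))
        else:
            merged[center] = (x1, y1, x2, y2)
    return list(merged.values())
```

With this function in place, it can serve directly as the `aggregate` argument of the detection sketch above.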
Through the hierarchical aggregation algorithm, the recognition neural network model of the AR projection system can adapt to the detection of icons at different scales. If the multi-scale problem were solved directly by the neural network model, more training data would have to be collected and a larger, heavier neural network model used, increasing the forward latency of deployment, while still not completely eradicating false detections. Hierarchical aggregation solves the multi-scale confusion problem purely through data post-processing, which is more flexible and efficient.
In one embodiment, the image of the icon to be recognized is acquired within a preset detection area of the projected image. FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present application. As shown in FIG. 10, in the AR icon recognition system the card corresponding to the icon is prepared for detection in the corresponding APP, and the MTCNN algorithm model detects the image of the whole projection area provided. In another embodiment, a detection area may be preset within the projection area, and the MTCNN algorithm model detects only that preset detection area, further reducing the forward latency of the MTCNN algorithm model.
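For the preset-detection-area variant, a small sketch under the assumption that the area is an axis-aligned rectangle in image coordinates and that `detect_icons` is the detection sketch above; the coordinates are illustrative.

```python
# Assumed preset detection area (x1, y1, x2, y2) inside the 640x480 projection image.
DETECT_AREA = (100, 80, 540, 400)

def detect_in_preset_area(frame, pnet, rnet, aggregate):
    """Run detection only inside the preset area to cut forward latency,
    then map the resulting frames back to full-image coordinates."""
    x1, y1, x2, y2 = DETECT_AREA
    roi = frame[y1:y2, x1:x2]
    results = detect_icons(roi, pnet, rnet, aggregate)
    return [(cat, (bx1 + x1, by1 + y1, bx2 + x1, by2 + y1))
            for cat, (bx1, by1, bx2, by2) in results]
```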
In one embodiment, when it is detected that a projection trigger button in the projection area is triggered, a trigger instruction is generated, the trigger instruction indicating that the image of the icon is to be acquired. For example, the card icon to be recognized is placed in the projection area, the detection trigger button in the projection area is tapped, the operating system side of the projection system triggers the detection task, and the image of the whole projection area or of the preset detection area is detected by the MTCNN algorithm.
In one embodiment, after the category of the icon is identified, an indication signal is generated according to the category, the indication signal instructing the projector 12 to play the dynamic effect corresponding to the category. The recognition result is returned to the system layer of the AR projection system; after receiving the result, the projection system projects it through the projector, playing the corresponding animation, sound effect, and so on, and the detection algorithm waits for the next detection trigger.
In another embodiment of the present application, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program performs the steps of the icon recognition method.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of technical features, it should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the application, and while their descriptions are specific and detailed, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (9)

1. An icon recognition method, applied to an augmented reality AR projection system, the method comprising:
acquiring an image of an icon to be identified;
extracting a foreground region of the image through a Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames;
hierarchically aggregating the first candidate frames to obtain second candidate frames; and
the Rnet layer of the MTCNN model cropping the icon image from the second candidate frames and identifying the category of the icon;
wherein hierarchically aggregating the first candidate frames to obtain second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
2. The method according to claim 1, wherein before acquiring the image of the icon to be identified, the method comprises:
acquiring an image of an icon, and annotating the icon in the image with a labeling frame;
determining training samples with the labeling frame as a reference, and performing data enhancement on the training samples;
training the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; and
screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used to recognize images of icons.
3. The method of claim 2, wherein determining training samples with the labeling frame as a reference comprises:
generating candidate frames with the labeling frame as a reference; and
determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frame and the candidate frames.
4. The method of claim 2, wherein determining training samples with the labeling frame as a reference and performing data enhancement on the training samples comprises:
extending the labeling frame outward according to a preset ratio to obtain an extension frame;
executing the grabcut algorithm within the extension frame to extract the foreground annotation, and replacing the background of the foreground annotation for data enhancement; and
determining labeling frames and candidate frames on the foreground annotation, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frames and the candidate frames.
5. The method of claim 2, wherein training the MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image comprises:
when the MTCNN model includes a Pnet layer and an Rnet layer, cascading the Pnet layer and the Rnet layer, inputting the training samples into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifying the category of the icon and optimizing the position of the foreground region of the icon.
6. The method of claim 1, wherein acquiring the image of the icon to be identified comprises:
acquiring the image of the icon to be identified within a preset detection area of the projected image.
7. The method of claim 1, wherein after identifying the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal instructs the projector to play the dynamic effect corresponding to the category.
8. The method of claim 1, wherein before acquiring the image of the icon to be identified, the method comprises:
generating a trigger instruction when a projection trigger button in the projection area is triggered, wherein the trigger instruction indicates that the image of the icon is to be acquired.
9. A system for augmented reality AR icon recognition, the system comprising: a projector, a main control device, and a camera device, the main control device being connected to the projector and the camera device respectively; wherein:
the projector projects images into the working area of the camera device;
the camera device captures the image of the icon to be recognized;
the main control device extracts, from the image, a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and
the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon;
wherein hierarchically aggregating the first candidate frames to obtain second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
CN202010217757.1A 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system Active CN111523390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Publications (2)

Publication Number Publication Date
CN111523390A CN111523390A (en) 2020-08-11
CN111523390B (en) 2023-11-03

Family

ID=71910429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010217757.1A Active CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Country Status (1)

Country Link
CN (1) CN111523390B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686851B (en) * 2020-12-25 2022-02-08 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN113808186B (en) * 2021-03-04 2024-01-16 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113012189A (en) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 Image recognition method and device, computer equipment and storage medium
CN113409231A (en) * 2021-06-10 2021-09-17 杭州易现先进科技有限公司 AR portrait photographing method and system based on deep learning


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7466841B2 (en) * 2004-08-16 2008-12-16 Siemens Corporate Research, Inc. Method for traffic sign detection
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
US11222196B2 (en) * 2018-07-11 2022-01-11 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609485A (en) * 2017-08-16 2018-01-19 中国科学院自动化研究所 The recognition methods of traffic sign, storage medium, processing equipment
CN107977671A (en) * 2017-10-27 2018-05-01 浙江工业大学 A kind of tongue picture sorting technique based on multitask convolutional neural networks
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109635768A (en) * 2018-12-20 2019-04-16 深圳市捷顺科技实业股份有限公司 Parking stall condition detection method, system and relevant device in a kind of picture frame
CN110826391A (en) * 2019-09-10 2020-02-21 中国三峡建设管理有限公司 Bleeding area detection method, bleeding area detection system, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhaowei Cai et al., "Cascade R-CNN: Delving into High Quality Object Detection," https://arxiv.org, 2017, full text. *

Also Published As

Publication number Publication date
CN111523390A (en) 2020-08-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant