CN111523390B - Image recognition method and augmented reality AR icon recognition system - Google Patents


Info

Publication number
CN111523390B
CN111523390B
Authority
CN
China
Prior art keywords
icon
image
frame
candidate
layer
Prior art date
Legal status
Active
Application number
CN202010217757.1A
Other languages
Chinese (zh)
Other versions
CN111523390A (en)
Inventor
林健 (Lin Jian)
周志敏 (Zhou Zhimin)
刘海伟 (Liu Haiwei)
丛林 (Cong Lin)
Current Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Original Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co., Ltd.
Priority to CN202010217757.1A priority Critical patent/CN111523390B/en
Publication of CN111523390A publication Critical patent/CN111523390A/en
Application granted granted Critical
Publication of CN111523390B publication Critical patent/CN111523390B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems


Abstract

The application discloses a system for augmented reality (AR) icon recognition. The system comprises a projector, a main control device, and a camera device, the main control device being connected to the projector and to the camera device respectively. The projector projects images into the working area of the camera device, and the camera device captures an image of the icon to be recognized. From this image, the main control device extracts a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network (MTCNN) model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames. The Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon, and the main control device instructs the projector to play the dynamic effect corresponding to that category. The system thereby solves the problem of low accuracy and efficiency in AR projection system icon detection and improves both.

Description

Image recognition method and augmented reality AR icon recognition system
Technical Field
The present application relates to the field of image recognition, and in particular to an image recognition method and an augmented reality AR icon recognition system.
Background
An interactive augmented reality (Augmented Reality, abbreviated AR) projection system is a human-computer interaction system that combines a projector with a color depth camera: the interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and in response the projector plays animations, pictures, and sounds at recognized objects or at specific positions, achieving the goal of augmented reality. Such systems are very well suited to industries such as education and entertainment.
In the detection scenarios of an AR projection system, multi-category icon detection must be realized. Icon features are sparser than those of faces, vehicles, and the like; icons may differ in scale and are easily confused with one another. In addition, models must be developed rapidly: each application (APP) corresponds to a different detection model, so large amounts of time cannot be spent collecting large volumes of data for model training. Furthermore, the icon recognition scenario of an interactive AR projection system has two characteristics: complex lighting and shadow conditions, and strict latency requirements. The algorithm therefore needs to be lightweight to reduce time consumption, and needs good illumination invariance to cope with changes such as overexposure and underexposure, while completing multi-category icon classification and recognition.
In the related art, icon detection for an AR projection system directly reuses and extends the multi-task cascaded convolutional network (Multi-task Cascaded Convolutional Networks, abbreviated MTCNN) algorithm: the number of feature points is adjusted, the network structure is slightly changed, or a new feature-enhancement module is connected in series, and after such adjustments the method is migrated directly to different usage scenarios. This remains, in essence, single-class keypoint detection, and a large amount of data must be collected to guarantee the training effect of MTCNN.
For the problem of low icon detection accuracy and efficiency in AR projection systems in the related art, no effective solution has yet been proposed.
Disclosure of Invention
In view of the problem of low icon detection accuracy and efficiency in AR projection systems in the related art, embodiments of the present application are provided to at least solve this problem.
According to one aspect of the present application, an icon recognition method is provided, the method comprising:
acquiring an image of an icon, and annotating the icon in the image with a labeling frame;
determining training samples with the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; and
screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used to recognize images of icons.
In one embodiment, determining training samples with the labeling frame as a reference comprises:
generating candidate frames with the labeling frame as a reference; and
determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frame and the candidate frames.
In one embodiment, determining training samples with the labeling frame as a reference and performing data enhancement on the training samples comprises:
extending the labeling frame outward according to a preset ratio to obtain an extension frame;
executing the grabcut algorithm within the extension frame to extract the foreground annotation; and
determining labeling frames and candidate frames on the foreground annotation, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frames and the candidate frames.
In one embodiment, training the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image comprises:
when the MTCNN model includes a Pnet layer and an Rnet layer, cascading the Pnet layer and the Rnet layer, inputting the training samples into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifying the category of the icon and optimizing the position of the foreground region of the icon.
According to another aspect of the present application, an icon recognition method applied to an augmented reality AR projection system is also provided, the method comprising:
acquiring an image of an icon to be identified;
extracting a foreground region of the image through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames;
hierarchically aggregating the first candidate frames to obtain second candidate frames; and
the Rnet layer of the MTCNN model cropping the icon image from the second candidate frames and identifying the category of the icon.
In one embodiment, acquiring the image of the icon to be identified comprises:
acquiring the image of the icon to be identified within a preset detection area of the projected image.
In one embodiment, hierarchically aggregating the first candidate frames to obtain the second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
In one embodiment, after identifying the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal instructs the projector to play the dynamic effect corresponding to the category.
In one embodiment, before acquiring the image of the icon to be identified, the method comprises:
generating a trigger instruction when a projection trigger button in the projection area is triggered, wherein the trigger instruction indicates that the image of the icon is to be acquired.
According to another aspect of the present application, a system for augmented reality AR icon recognition is also provided, the system comprising: a projector, a main control device, and a camera device, the main control device being connected to the projector and the camera device respectively; wherein:
the projector projects images into the working area of the camera device;
the camera device captures the image of the icon to be recognized;
the main control device extracts, from the image, a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and
the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
Through the present application, a system for augmented reality AR icon recognition is provided in which the projector projects images into the working area of the camera device; the camera device captures the image of the icon to be recognized; the main control device extracts a foreground region from the image through the Pnet layer of the MTCNN model to obtain first candidate frames and hierarchically aggregates them to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device instructs the projector to play the dynamic effect corresponding to that category. This solves the problem of low accuracy and efficiency in AR projection system icon detection and improves both.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for icon recognition according to an embodiment of the present application;
FIG. 3 is a schematic view of a scenario of training sample annotation according to an embodiment of the application;
FIG. 4 is a schematic diagram of the effect of outward extension for foreground annotation extraction according to an embodiment of the application;
FIG. 5 is a schematic diagram of an effect after foreground annotation extraction according to an embodiment of the application;
FIG. 6 is a schematic diagram of data augmentation by replacing the background of foreground annotations according to an embodiment of the application;
FIG. 7 is a flow diagram of a training phase of icon recognition according to an embodiment of the application;
FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present application;
FIG. 9 is a flow chart of MTCNN model based icon detection according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings and in conjunction with embodiments. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
The interactive AR projection system is a human-computer interaction system combining a projector with a color depth camera: the interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and the projector responds by playing animations/pictures/sounds at the recognized object or at specific positions, achieving the goal of augmented reality. It is very well suited to the education industry.
Color depth camera: a device that can capture color images (RGB frames) and depth images (Depth frames), abbreviated "RGB-D camera". Color images are acquired on the same principle as an ordinary camera; depth image acquisition principles include structured light, time of flight (ToF), binocular cameras, and so on. Taking the structured-light scheme as an example, the camera comprises an infrared emission module, an infrared fill-light module, an RGB + infrared camera module, and the like.
A projector is a device that can project images or video onto any plane. The projector manufacturer integrates a digital micromirror device (Digital Micromirror Device, abbreviated DMD) display core, light source, lens optical path, and heat dissipation into a single mechanism, forming an integral component.
In this embodiment, a system for augmented reality AR icon recognition is provided. FIG. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present application. As shown in FIG. 1, the system includes a projector 12, a main control device 14, and a camera device 16, the main control device 14 being connected to the projector 12 and the camera device 16 respectively. The projector 12 projects an image onto the working area of the camera device 16, and the camera device 16 captures the image of the icon to be recognized. From this image, the main control device 14 extracts a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames. The Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon, and the main control device 14 instructs the projector 12 to play the dynamic effect corresponding to that category. Through this system, the problem of low accuracy and efficiency in AR projection system icon detection is solved, and both are improved.
In one embodiment, card-type icon detection by means of an interactive AR projection system is a very popular form of education. Various APPs can be developed on top of the detection algorithm, and young children complete the corresponding learning through interaction with physical cards, specifically including animal detection, color and shape detection, and the like. Compared with the purely click-based interaction of tablets, mobile phones, and the like, physical interaction with objects such as card icons is richer in form and more popular with young children.
The embodiments of the present application provide an object detection algorithm for a new interactive AR projection system that achieves stable icon detection based on MTCNN and is especially suitable for preschool education scenarios. Different APPs correspond to different detection content, such as animal icon detection, plant icon detection, and shape icon detection; different detection models must therefore be provided for different scenarios, rapid model development is required, and a general, robust pipeline from data generation to model training must be built. In actual use of the interactive AR projection system, the background projection of the corresponding APP is superimposed on the physical card; at the same time, the color camera is easily affected by ambient illumination when capturing images, producing problems such as overexposure and underexposure, so the detection algorithm must be robust to lighting changes. Compared with a card recognition algorithm with completely fixed positions, detection with non-fixed positions has a wider range of application and is friendlier both for APP development and for young users, so the detection algorithm needs to achieve card detection within a certain area. Meanwhile, considering the hardware limitations of the AR projection system, the forward latency of the detection algorithm must be small, with overall detection latency within 500 ms. If the objects actually recognized are easily confused, the detection algorithm must be able to distinguish them, reducing false detections and misses. For example, when detecting color and shape objects, a rectangle can be formed by splicing two squares, and the detection algorithm must avoid falsely detecting the rectangle as two squares.
In this embodiment, an icon recognition method is provided. FIG. 2 is a flowchart of an icon recognition method according to an embodiment of the present application; as shown in FIG. 2, the method includes the following steps:
Step S202: acquire an image of an icon, and annotate the icon in the image with a labeling frame. The image of the icon to be recognized is acquired through the AR projection system; for example, the acquired image may be 640x480 pixels and may contain several icons to be recognized. Rectangular labeling frames must be provided for all of the icons; optionally, the labeling frames are annotated with labeling software such as labelme;
Step S204: determine training samples with the labeling frame as a reference, and perform data enhancement on the training samples. The data enhancement includes adding geometric transformations such as rotation and perspective transformation after the training samples are extracted. In addition, because the AR projection system places high demands on illumination, illumination enhancement must also be added, which may include simulating the data states under conditions such as overexposure, shadow, and contrast change (a sketch of such augmentation follows these steps). After data enhancement, the training samples can be uniformly scaled to a fixed size to facilitate subsequent model training;
Step S206: train the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; the MTCNN model can achieve foreground extraction and multi-category detection of icons;
Step S208: screen and save the MTCNN model according to preset conditions, wherein the MTCNN model is used for image recognition of icons. Combinations with good performance can be screened from the trained MTCNN models and converted into the common pb format for subsequent image recognition.
Through steps S202 to S208, the method comprises two parts, a training phase and a detection phase: the training phase determines training samples from the images of annotated icons and trains the MTCNN model on those samples, and the detection phase uses the resulting MTCNN model to detect icons in actual scenes, thereby solving the problem of low icon detection accuracy and efficiency in AR projection systems and improving both.
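As a concrete illustration of the data enhancement described in step S204, the sketch below applies rotation, perspective transformation, and illumination changes with OpenCV. It is a minimal sketch: the function name and all parameter ranges (rotation angle, corner jitter, gamma, and contrast factors) are illustrative assumptions, not values from the original disclosure.

```python
import cv2
import numpy as np

def augment_sample(img, rng=None):
    """Geometric and illumination augmentation per step S204 (assumed ranges).
    Expects an 8-bit BGR image."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    # Rotation about the image center.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-15, 15), 1.0)
    img = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Mild perspective transformation: jitter the four corners by up to 3%.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-0.03, 0.03, size=(4, 2)) * np.array([w, h])
    dst = (src + jitter).astype(np.float32)
    img = cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst),
                              (w, h), borderMode=cv2.BORDER_REPLICATE)
    # Illumination enhancement: gamma simulates over/underexposure,
    # linear scaling around the mean simulates contrast change.
    gamma = rng.uniform(0.5, 2.0)
    lut = (((np.arange(256) / 255.0) ** gamma) * 255).astype(np.uint8)
    img = cv2.LUT(img, lut)
    alpha = rng.uniform(0.7, 1.3)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=(1 - alpha) * float(img.mean()))
    return img
```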
In one embodiment, the process of determining training samples based on the labeling frame comprises: generating candidate frames with the labeling frame as a reference, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union (Intersection-over-Union, abbreviated IOU) of the labeling frame and the candidate frames. Optionally, the MTCNN model is used to detect the foreground region to be extracted from an input image and judge its specific category. FIG. 3 is a schematic view of a training sample annotation scenario according to an embodiment of the present application. As shown in FIG. 3, the training samples may include pos samples, part samples, and neg samples generated from the annotation data: pos samples guide the network to locate the foreground region and recognize the category, part samples guide the network to locate the foreground region, and neg samples guide the network to recognize the background region. The data generation method for training samples is to randomly generate candidate rectangular regions of different scales and positions with the labeling frame as a reference, and classify them by their IOU with the labeling frame. IOU is a concept used in target detection: it measures the overlap between the generated candidate region C (candidate bound) and the original labeling frame G (ground truth bound), namely the ratio of the area of their intersection to the area of their union, as shown in formula (1):

IOU = area(C ∩ G) / area(C ∪ G)    (1)
A candidate frame whose IOU with the labeling frame is smaller than 0.3 is a neg sample, a candidate frame whose IOU is larger than 0.65 is a pos sample, and a candidate frame in between is a part sample. In the class labels, a neg sample is represented by 0, pos samples by 1 to n, and part samples by -1 to -n.
In one embodiment, determining training samples with the labeling frame as a reference and performing data enhancement on them comprises: extending the labeling frame outward according to a preset ratio to obtain an extension frame; executing the grabcut algorithm within the extension frame to extract the foreground annotation; determining labeling frames and candidate frames on the foreground annotation; and determining training samples for the foreground region and the recognized category of the icon according to the IOU of the labeling frames and the candidate frames. For example, in the training sample generation process the labeling frame is always rectangular and located in the foreground region, but if the foreground region is damaged while generating training samples, recognition stability may be affected, so the foreground region needs to be extracted further. When the integrity of the foreground region is guaranteed in the training sample data, the influence of the background in which the icon sits can be reduced. If the icon foreground has an irregular shape, however, manually annotating the foreground region is far too time-consuming and labor-intensive, so the foreground can be extracted automatically with the grabcut algorithm. FIG. 4 is a schematic diagram of the effect of outward extension for foreground annotation extraction according to an embodiment of the application, and FIG. 5 is a schematic diagram of the effect after foreground annotation extraction. As shown in FIGS. 4 and 5, the specific extraction flow is: with the original rectangular labeling frame as a reference, extend outward to obtain the extension frame, with an extension ratio of 1.2 that can be adjusted to the actual icon; then execute the grabcut algorithm within the extension frame to complete foreground annotation extraction. In addition, FIG. 6 is a schematic diagram of data augmentation by replacing the background of foreground annotations. As shown in FIG. 6, after the foreground annotation is extracted, data enhancement can be performed by substituting a number of preset backgrounds, and pos, part, and neg samples can be generated from the foreground annotation on the basis of the labeling frame. This guarantees the overall integrity of the training samples after foreground annotation, improves the robustness of the recognition model to different desktop environments, and improves the accuracy of icon recognition through foreground annotation extraction used in the data enhancement of the training samples. A sketch of the IOU computation and the grabcut extraction follows.
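The following is a minimal sketch of the two operations just described: classifying candidate frames by formula (1) with the thresholds above, and extracting the foreground with OpenCV's grabCut inside the extension frame. The function names, the iteration count, and the (x1, y1, x2, y2) box representation are illustrative assumptions.

```python
import cv2
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes, per formula (1)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def sample_label(candidate, labeling_frame):
    """Classify a candidate frame by its IOU with the labeling frame (thresholds from the text)."""
    v = iou(candidate, labeling_frame)
    if v < 0.3:
        return "neg"
    if v > 0.65:
        return "pos"
    return "part"

def extract_foreground(img, labeling_frame, expand=1.2, iters=5):
    """Run grabcut inside the extension frame to pull out the icon foreground.
    `expand` is the 1.2 extension ratio from the text; `iters` is an assumed count.
    Expects an 8-bit BGR image."""
    x1, y1, x2, y2 = labeling_frame
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * expand, (y2 - y1) * expand
    ex1, ey1 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    ex2, ey2 = min(img.shape[1], int(cx + w / 2)), min(img.shape[0], int(cy + h / 2))
    roi = img[ey1:ey2, ex1:ex2].copy()
    mask = np.zeros(roi.shape[:2], np.uint8)
    bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)
    # The original labeling frame (shifted into ROI coordinates) seeds the foreground.
    rect = (x1 - ex1, y1 - ey1, x2 - x1, y2 - y1)
    cv2.grabCut(roi, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return roi * fg[:, :, None]
```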
In some embodiments, when the MTCNN model includes a Pnet layer and an Rnet layer, the Pnet layer and the Rnet layer are cascaded: the training samples input to the Pnet layer are used to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the icon's foreground region. FIG. 7 is a schematic flow chart of the training stage of icon recognition according to an embodiment of the present application. As shown in FIG. 7, after data enhancement the training samples may be uniformly scaled to fixed sizes to facilitate model training; the Pnet layer may require input data of 12x12 pixels, and the Rnet layer input data of 24x24 pixels. The training sample data is then managed and packed, which includes controlling the neg:part:pos sample ratio to 2:1:1 and packing the data into the format required for training. The joint Pnet/Rnet detection algorithm in the MTCNN model transfers the single-class keypoint detection of the related art to multi-class detection: the algorithm is modified, the third-stage network for keypoint detection is removed, and the front two stages, the Pnet layer and the Rnet layer, are extended to multi-class detection, enhancing and optimizing the training of the neural network.
In addition, the MTCNN model trains two layers, a Pnet layer and an Rnet layer, which are cascaded at inference time: the Pnet layer completes detection and extraction of foreground regions over the whole image, the Rnet layer then completes recognition within those foreground regions to obtain the specific category and optimize the foreground position, and finally the result is returned. The Pnet layer and the Rnet layer may each be trained with training sample data of the corresponding scale. The model of this embodiment does not prescribe a training framework; frameworks such as matlab, caffe, tensorflow, keras, and pytorch may all be used, with tensorflow preferred for the corresponding training. The Pnet layer and the Rnet layer can be fully convolutional networks, each containing three branches: foreground judgment, minimum bounding box (abbreviated bbox) regression, and category classification. After model training is completed, the effect of the cascaded Pnet-detection/Rnet-recognition model is verified, and combinations with good performance are screened from the historical training models and converted into the common pb format for the subsequent detection stage of image recognition.
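To make the three-branch structure concrete, here is a minimal tensorflow/Keras sketch of a Pnet-style fully convolutional network with foreground judgment, bbox regression, and category classification branches. It is a sketch under assumed layer sizes and channel counts, not the patent's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_pnet(num_classes):
    """Pnet-style fully convolutional network with three output branches.
    Channel counts are illustrative assumptions."""
    inputs = tf.keras.Input(shape=(None, None, 3))  # 12x12 crops at training time
    x = layers.Conv2D(10, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    # Branch 1: foreground judgment (foreground vs. background).
    fg = layers.Conv2D(2, 1, activation="softmax", name="foreground")(x)
    # Branch 2: minimum bounding box (bbox) regression.
    bbox = layers.Conv2D(4, 1, name="bbox")(x)
    # Branch 3: multi-category icon classification.
    cls = layers.Conv2D(num_classes, 1, activation="softmax", name="category")(x)
    return tf.keras.Model(inputs, [fg, bbox, cls])
```

At training time, a 12x12 input reduces each branch to a 1x1 map; at detection time the same network slides fully convolutionally over the whole scaled image. In the original MTCNN design the Rnet stage follows the same three-branch pattern on 24x24 inputs with fully connected heads.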
In this embodiment, an icon recognition method applied to an augmented reality AR projection system is provided. FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present application, and FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present application. As shown in FIGS. 8 and 9, the method includes the following steps:
Step S502: acquire the image of the icon to be recognized, where the acquired image may be a high-resolution color image of the AR projection area, for example a color image of 640x480 pixels;
Step S504: extract the foreground region of the image through the Pnet layer of the multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregate the first candidate frames to obtain second candidate frames. After the AR projection system loads the Pnet layer and Rnet layer files, the ratio of the Pnet training scale to the preset icon scale is calculated, the original image is scaled by that ratio, and the Pnet layer is run to detect and extract the corresponding foreground regions, yielding the Pnet candidate frames, which contain position information and category information. For example, if the icon to be recognized is about 50x50 pixels, the preset scale is 50; the ratio of the Pnet training scale of 12 to this preset scale is 0.24, so the original image is reduced to 0.24 times its size. In some embodiments, since the icon cards are substantially uniform in size within the same task, the Pnet detection stage is fixed to a single scale; owing to the Pnet layer's adaptability to foreground regions, the preset icon scale is insensitive to interference from the AR projection system and has good scale invariance. In addition, when the image fed to the Pnet layer is too small, the category result obtained by the Pnet layer is often inaccurate, and such images can be classified more accurately by the Rnet layer. When the sizes of all icons are basically consistent, or inconsistent but not easily confused, the candidate frames generated by the Pnet layer are sent directly to the Rnet layer for classification; when icons are easily confused, however, for example when part of icon A resembles part of icon B, sending the candidate frames directly to the Rnet layer causes false detections. In that case the Pnet candidate frames must be hierarchically aggregated to obtain candidate frames with basically correct positions and sizes before they are sent to the Rnet layer;
Step S506: the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon. Using the candidate frames generated directly by the Pnet layer or obtained through hierarchical aggregation, the corresponding image is cropped from the original image, scaled to 24x24 pixels, and sent to the Rnet layer for category recognition, and the positions of the candidate frames are optimized.
Through steps S502 to S506, the requirements of AR projection system scenarios are met and rapid detection of card icons is achieved: the MTCNN model algorithm realizes multi-category detection, and through the hierarchical aggregation algorithm the model supports detection of icons at different scales. Recognition is fast and accurate, making the method suitable for interactive courseware development in related scenarios such as education, solving the problem of low icon detection accuracy and efficiency in AR projection systems and improving both. A code sketch of this detection flow follows.
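A compact sketch of the detection flow in steps S502 to S506, assuming `pnet` and `rnet` are callables wrapping the loaded pb models (with Pnet returning candidate frames already mapped back to original-image coordinates) and `aggregate` is the hierarchical aggregation sketched after the steps below. All of these names are assumptions, not APIs from the patent.

```python
import cv2

PNET_SCALE = 12    # Pnet training scale from the text (pixels)
PRESET_SCALE = 50  # preset icon scale from the example (pixels)

def detect_icons(frame, pnet, rnet, aggregate):
    """Single-scale detection: scale the image so icons match the Pnet training
    scale, run Pnet, aggregate candidates, then classify each crop with Rnet."""
    ratio = PNET_SCALE / PRESET_SCALE  # 12 / 50 = 0.24
    small = cv2.resize(frame, None, fx=ratio, fy=ratio)
    candidates = pnet(small)            # candidate frames in `frame` coordinates (assumed)
    candidates = aggregate(candidates)  # hierarchical aggregation (sketched below)
    results = []
    for (x1, y1, x2, y2) in candidates:
        crop = cv2.resize(frame[y1:y2, x1:x2], (24, 24))  # Rnet input scale
        category, refined_box = rnet(crop)  # Rnet classifies and optimizes the frame
        results.append((category, refined_box))
    return results
```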
In one embodiment, hierarchically aggregating the first candidate frames to obtain second candidate frames comprises: when the positions of a plurality of first candidate frames overlap, classifying them into one class and taking their circumscribed rectangle as the second candidate frame. The aggregation method judges the correlation of candidate frames according to their positions, classifies candidate frames with overlapping positions into one class, and generates the circumscribed rectangle of the class as a whole. "Hierarchical" means that large-scale candidate frames are obtained by aggregating small-scale candidate frames. The hierarchical aggregation flow takes all candidate frames generated by the Pnet layer as input and outputs the aggregated candidate frames. The specific flow is as follows (a code sketch follows the steps):
Step S1: obtain all candidate frames generated by the Pnet layer, n in total;
Step S2: calculate the intersection degree between every pair of candidate frames to generate an n x n upper-triangular candidate frame distance matrix D;
Step S3: number all candidate frames and take the sequence numbers as aggregation ids to obtain an aggregation vector C; in the initial state all candidate frames are mutually independent;
Step S4: traverse the candidate frames for i = 1 ... n;
Step S5: traverse the candidate frames for j = i ... n;
Step S6: if the distance d_ij is greater than the threshold, merge candidate frame j with candidate frame i according to the corresponding case;
Step S7: if candidate frames i and j are both independent, C_j = i;
Step S8: if candidate frame i is independent and candidate frame j is not, then necessarily C_j < i, so C_i = C_j;
Step S9: if candidate frame i is not independent and candidate frame j is independent, C_j = C_i;
Step S10: if neither candidate frame i nor candidate frame j is independent, let Min = min(C_j, C_i) and Max = max(C_j, C_i), and traverse the aggregation vector C, replacing every value equal to Max with Min;
Step S11: obtain the updated aggregation vector C, in which each unrepeated id is an aggregation center;
Step S12: fuse the candidate frames belonging to the same aggregation center according to the aggregation vector C, taking the upper and lower extreme values to obtain the circumscribed rectangle;
Step S13: return all circumscribed rectangles to obtain the candidate frames after Pnet hierarchical aggregation.
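Read as code, steps S1 to S13 amount to a label-merging pass over pairwise overlaps followed by circumscribed-rectangle fusion. The sketch below is one possible rendering: the "intersection degree" measure and the threshold value are assumptions, and frames are (x1, y1, x2, y2) tuples.

```python
def aggregate(boxes, threshold=0.1):
    """Hierarchical aggregation of Pnet candidate frames (steps S1-S13).
    `threshold` and the intersection-degree measure are assumed, not from the patent."""
    n = len(boxes)  # Step S1

    def inter_degree(a, b):
        # Step S2: intersection area normalized by the smaller frame
        # (one plausible reading of "intersection degree").
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        inter = max(0, w) * max(0, h)
        smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                      (b[2] - b[0]) * (b[3] - b[1]))
        return inter / smaller if smaller > 0 else 0.0

    C = list(range(n))  # Step S3: every frame starts as its own aggregation id
    for i in range(n):                      # Step S4
        for j in range(i + 1, n):           # Step S5
            if inter_degree(boxes[i], boxes[j]) <= threshold:
                continue                    # Step S6: merge only above the threshold
            if C[i] == i and C[j] == j:     # Step S7: both independent
                C[j] = i
            elif C[i] == i:                 # Step S8: only j already aggregated
                C[i] = C[j]
            elif C[j] == j:                 # Step S9: only i already aggregated
                C[j] = C[i]
            else:                           # Step S10: both aggregated; relabel Max as Min
                lo, hi = min(C[i], C[j]), max(C[i], C[j])
                C = [lo if c == hi else c for c in C]
    # Steps S11-S13: fuse frames sharing an aggregation center into their
    # circumscribed rectangle and return the aggregated candidate frames.
    merged = {}
    for idx, center in enumerate(C):
        x1, y1, x2, y2 = boxes[idx]
        if center in merged:
            m = merged[center]
            merged[center] = (min(m[0], x1), min(m[1], y1),
                              max(m[2], x2), max(m[3], y2))
        else:
            merged[center] = (x1, y1, x2, y2)
    return list(merged.values())
```

With this function in place, it can serve directly as the `aggregate` argument of the detection sketch above.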
Through the hierarchical aggregation algorithm, the recognition neural network model of the AR projection system can adapt to the detection of icons at different scales. If the multi-scale problem were solved directly by the neural network model, more training data would have to be collected and a larger, heavier neural network model used, increasing the forward latency of deployment, while still not completely eradicating false detections. Hierarchical aggregation solves the multi-scale confusion problem purely through data post-processing, which is more flexible and efficient.
In one embodiment, the image of the icon to be recognized is acquired within a preset detection area of the projected image. FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present application. As shown in FIG. 10, in the AR icon recognition system the card corresponding to the icon is prepared for detection in the corresponding APP, and the MTCNN algorithm model detects the image of the whole projection area provided. In another embodiment, a detection area may be preset within the projection area, and the MTCNN algorithm model detects only that preset detection area, further reducing the forward latency of the MTCNN algorithm model.
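For the preset-detection-area variant, a small sketch under the assumption that the area is an axis-aligned rectangle in image coordinates and that `detect_icons` is the detection sketch above; the coordinates are illustrative.

```python
# Assumed preset detection area (x1, y1, x2, y2) inside the 640x480 projection image.
DETECT_AREA = (100, 80, 540, 400)

def detect_in_preset_area(frame, pnet, rnet, aggregate):
    """Run detection only inside the preset area to cut forward latency,
    then map the resulting frames back to full-image coordinates."""
    x1, y1, x2, y2 = DETECT_AREA
    roi = frame[y1:y2, x1:x2]
    results = detect_icons(roi, pnet, rnet, aggregate)
    return [(cat, (bx1 + x1, by1 + y1, bx2 + x1, by2 + y1))
            for cat, (bx1, by1, bx2, by2) in results]
```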
In one embodiment, when it is detected that a projection trigger button in the projection area is triggered, a trigger instruction is generated, the trigger instruction indicating that the image of the icon is to be acquired. For example, the card icon to be recognized is placed in the projection area, the detection trigger button in the projection area is tapped, the operating system side of the projection system triggers the detection task, and the image of the whole projection area or of the preset detection area is detected by the MTCNN algorithm.
In one embodiment, after the category of the icon is identified, an indication signal is generated according to the category, the indication signal instructing the projector 12 to play the dynamic effect corresponding to the category. The recognition result is returned to the system layer of the AR projection system; after receiving the result, the projection system projects it through the projector, playing the corresponding animation, sound effect, and so on, and the detection algorithm waits for the next detection trigger.
In another embodiment of the present application, a computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program performs the steps of the icon recognition method.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of technical features, it should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the application, and while their descriptions are specific and detailed, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and improvements without departing from the concept of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (9)

1. An icon recognition method, applied to an augmented reality AR projection system, the method comprising:
acquiring an image of an icon to be identified;
extracting a foreground region of the image through a Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames;
hierarchically aggregating the first candidate frames to obtain second candidate frames; and
the Rnet layer of the MTCNN model cropping the icon image from the second candidate frames and identifying the category of the icon;
wherein hierarchically aggregating the first candidate frames to obtain second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
2. The method according to claim 1, wherein before acquiring the image of the icon to be identified, the method comprises:
acquiring an image of an icon, and annotating the icon in the image with a labeling frame;
determining training samples with the labeling frame as a reference, and performing data enhancement on the training samples;
training the multi-task cascaded convolutional neural network MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image; and
screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used to recognize images of icons.
3. The method of claim 2, wherein determining training samples with the labeling frame as a reference comprises:
generating candidate frames with the labeling frame as a reference; and
determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frame and the candidate frames.
4. The method of claim 2, wherein determining training samples with the labeling frame as a reference and performing data enhancement on the training samples comprises:
extending the labeling frame outward according to a preset ratio to obtain an extension frame;
executing the grabcut algorithm within the extension frame to extract the foreground annotation, and replacing the background of the foreground annotation for data enhancement; and
determining labeling frames and candidate frames on the foreground annotation, and determining training samples for the foreground region and the recognized category of the icon according to the intersection-over-union IOU of the labeling frames and the candidate frames.
5. The method of claim 2, wherein training the MTCNN model on the training samples to obtain and optimize the foreground image and the recognized category of the icon in the image comprises:
when the MTCNN model includes a Pnet layer and an Rnet layer, cascading the Pnet layer and the Rnet layer, inputting the training samples into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifying the category of the icon and optimizing the position of the foreground region of the icon.
6. The method of claim 1, wherein acquiring the image of the icon to be identified comprises:
acquiring the image of the icon to be identified within a preset detection area of the projected image.
7. The method of claim 1, wherein after identifying the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal instructs the projector to play the dynamic effect corresponding to the category.
8. The method of claim 1, wherein before acquiring the image of the icon to be identified, the method comprises:
generating a trigger instruction when a projection trigger button in the projection area is triggered, wherein the trigger instruction indicates that the image of the icon is to be acquired.
9. A system for augmented reality AR icon recognition, the system comprising: a projector, a main control device, and a camera device, the main control device being connected to the projector and the camera device respectively; wherein:
the projector projects images into the working area of the camera device;
the camera device captures the image of the icon to be recognized;
the main control device extracts, from the image, a foreground region through the Pnet layer of a multi-task cascaded convolutional neural network MTCNN model to obtain first candidate frames, and hierarchically aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and
the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon;
wherein hierarchically aggregating the first candidate frames to obtain second candidate frames comprises:
when the positions of a plurality of first candidate frames overlap, classifying the plurality of first candidate frames into one class, and taking the circumscribed rectangle of the plurality of first candidate frames as the second candidate frame.
CN202010217757.1A 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system Active CN111523390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Publications (2)

Publication Number Publication Date
CN111523390A CN111523390A (en) 2020-08-11
CN111523390B (en) 2023-11-03

Family

ID=71910429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010217757.1A Active CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Country Status (1)

Country Link
CN (1) CN111523390B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686851B (en) * 2020-12-25 2022-02-08 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN113808186B (en) * 2021-03-04 2024-01-16 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113012189A (en) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 Image recognition method and device, computer equipment and storage medium
CN113409231A (en) * 2021-06-10 2021-09-17 杭州易现先进科技有限公司 AR portrait photographing method and system based on deep learning


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7466841B2 (en) * 2004-08-16 2008-12-16 Siemens Corporate Research, Inc. Method for traffic sign detection
WO2017166137A1 (en) * 2016-03-30 2017-10-05 中国科学院自动化研究所 Method for multi-task deep learning-based aesthetic quality assessment on natural image
US11222196B2 (en) * 2018-07-11 2022-01-11 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609485A (en) * 2017-08-16 2018-01-19 中国科学院自动化研究所 The recognition methods of traffic sign, storage medium, processing equipment
CN107977671A (en) * 2017-10-27 2018-05-01 浙江工业大学 A kind of tongue picture sorting technique based on multitask convolutional neural networks
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
CN109635768A (en) * 2018-12-20 2019-04-16 深圳市捷顺科技实业股份有限公司 Parking stall condition detection method, system and relevant device in a kind of picture frame
CN110826391A (en) * 2019-09-10 2020-02-21 中国三峡建设管理有限公司 Bleeding area detection method, bleeding area detection system, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhaowei Cai et al., "Cascade R-CNN: Delving into High Quality Object Detection," https://arxiv.org, 2017, full text. *

Also Published As

Publication number Publication date
CN111523390A (en) 2020-08-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant