CN111523390A - Image recognition method and augmented reality AR icon recognition system - Google Patents


Info

Publication number
CN111523390A
CN111523390A
Authority
CN
China
Prior art keywords
icon
image
frame
candidate
candidate frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010217757.1A
Other languages
Chinese (zh)
Other versions
CN111523390B (en)
Inventor
林健
周志敏
刘海伟
丛林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yixian Advanced Technology Co ltd
Original Assignee
Hangzhou Yixian Advanced Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co ltd filed Critical Hangzhou Yixian Advanced Technology Co ltd
Priority to CN202010217757.1A
Publication of CN111523390A
Application granted
Publication of CN111523390B
Active
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a system for augmented reality (AR) icon recognition, comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device. The projector projects images into the working area of the camera device; the camera device acquires the image of the icon to be recognized; the main control device extracts a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device instructs the projector to play the dynamic effect corresponding to that category. The system solves the problem that icon detection accuracy and efficiency of AR projection systems are not high, and improves both.

Description

Image recognition method and augmented reality AR icon recognition system
Technical Field
The invention relates to the field of image recognition, and in particular to an image recognition method and an augmented reality (AR) icon recognition system.
Background
An interactive augmented reality (AR) projection system is a human-computer interaction system combining a projector with a color depth camera: an interaction interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and animation, pictures, and sound are played at a recognized object or a specific position in response through the projector, achieving the purpose of augmented reality. AR projection systems are very suitable for industries such as education and entertainment.
In the detection scenario of an AR projection system, detection of multiple classes of icons must be realized. Icons have weaker features than faces, vehicles, and the like; there may be scale differences among icons, and icons may be confused with one another in various ways. In addition, models must be developed rapidly: each application (APP) corresponds to a different detection model, and a large amount of time cannot be spent collecting large amounts of data for model training. Furthermore, when recognizing icons, the interactive AR projection system faces two constraints: complex light and shadow conditions and strict latency requirements. The algorithm therefore needs to be lightweight to reduce time consumption, have good illumination invariance to cope with lighting changes such as overexposure and underexposure, and complete multi-class icon classification and recognition.
In the related art, during icon detection in an AR projection system, the multi-task convolutional neural network (MTCNN) algorithm is used directly and extended: the number of feature points is adjusted, the network structure is slightly changed, or a new feature enhancement module is connected in series, and after such adjustments the method is migrated directly to different usage scenarios. This is essentially single-category keypoint detection, and a large amount of data must be collected to ensure the MTCNN training effect.
For the problem in the related art that icon detection accuracy and efficiency of AR projection systems are not high, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention at least solve the problem in the related art that the icon detection accuracy and efficiency of AR projection systems are not high.
According to an aspect of the present invention, there is provided an icon recognition method, the method including:
acquiring an image of an icon, and marking a labeling frame on the icon in the image;
determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image;
and screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon.
In one embodiment, the determining of training samples using the labeling frame as a reference includes:
generating candidate frames using the labeling frame as a reference;
and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
In one embodiment, the determining of training samples using the labeling frame as a reference includes:
extending the labeling frame by a preset proportion to obtain an extension frame;
executing the grabcut algorithm within the extension frame to complete extraction of the foreground label;
and determining a labeling frame and candidate frames on the foreground label, and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
In one embodiment, the training of the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image includes:
in the case that the MTCNN model comprises a Pnet layer and an Rnet layer, the two layers are cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region of the icon.
According to another aspect of the present invention, there is also provided an icon recognition method applied to an augmented reality (AR) projection system, the method including:
acquiring an image of the icon to be recognized;
extracting, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames;
performing hierarchical aggregation on the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon.
In one embodiment, the acquiring of the image of the icon to be recognized includes:
acquiring the image of the icon to be recognized within a preset detection area of the projected image.
In one embodiment, the performing of hierarchical aggregation on the first candidate frames to obtain second candidate frames includes:
when the positions of first candidate frames overlap, classifying those first candidate frames into one class of candidate frames and taking their circumscribed rectangle as the second candidate frame.
In one embodiment, after the identifying of the category of the icon, the method includes:
generating an indication signal according to the category, wherein the indication signal is used to instruct the projector to play the dynamic effect corresponding to the category.
In one embodiment, before the acquiring of the image of the icon to be recognized, the method includes:
when a projection trigger button of the projection area is triggered, generating a trigger instruction, wherein the trigger instruction instructs acquisition of the image of the icon.
According to another aspect of the present invention, there is also provided a system for augmented reality (AR) icon recognition, the system comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device;
the projector projects an image in the working area of the camera device;
the camera device acquires the image of the icon to be recognized;
the main control device extracts, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon;
and the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
The system of the invention thus combines projector, main control device, and camera device as above: the projector projects images into the working area of the camera device, the camera device acquires the image of the icon to be recognized, the main control device extracts a foreground region through the Pnet layer of the MTCNN model to obtain first candidate frames and classifies and aggregates them into second candidate frames, the Rnet layer crops the icon image from the second candidate frames and identifies the icon category, and the main control device instructs the projector to play the corresponding dynamic effect. This solves the problem that icon detection accuracy and efficiency of AR projection systems are not high, and improves both.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of an augmented reality AR icon recognition system according to an embodiment of the present invention;
FIG. 2 is a first flowchart of a method for identifying icons according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a scenario of training sample labeling according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the effect of the extension step of foreground label extraction according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of foreground label extraction according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of background-replacement data enhancement based on the foreground label, according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of a training phase of icon recognition according to an embodiment of the present invention;
FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present invention;
FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The interactive AR projection system is a human-computer interaction system combining a projector with a color depth camera: an interactive interface is projected onto any plane by the projector, human gesture interactions are detected by the color depth camera, and animations, pictures, and sounds are played at recognized objects or specific positions in response through the projector, achieving augmented reality. It is very suitable for use in the education industry.
Color depth camera: a device that can capture color images (RGB frames) and depth images (depth frames), often abbreviated as an "RGB-D camera". Color images are acquired on the same principle as with an ordinary camera; depth images may be acquired by structured light, time of flight, a binocular camera, and so on. Taking the structured-light scheme as an example, the camera includes an infrared emission module, an infrared fill-light module, an RGB + infrared camera module, and the like.
Projector: a device capable of projecting images or video onto any plane. Projector manufacturers integrate a Digital Micromirror Device (DMD) display core, light source, lens light path, and heat dissipation into a single mechanism to form an integral component.
In this embodiment, a system for augmented reality AR icon recognition is provided. Fig. 1 is a block diagram of the structure of a system for augmented reality AR icon recognition according to an embodiment of the present invention. As shown in fig. 1, the system includes: a projector 12, a main control device 14, and a camera device 16, wherein the main control device 14 is connected to the projector 12 and to the camera device 16. The projector 12 projects an image in the working area of the camera device 16; the camera device 16 acquires the image of the icon to be recognized; the main control device 14 extracts a foreground region of the image through the Pnet layer of the multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates them to obtain second candidate frames; the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon; and the main control device 14 instructs the projector 12 to play the dynamic effect corresponding to the category of the icon. With this system, the problem that icon detection accuracy and efficiency of the AR projection system are not high is solved, and both are improved.
In one embodiment, card icon detection via an interactive AR projection system is a very popular form of education. Various types of APP can be developed based on the detection algorithm, and young children complete corresponding learning through interaction with physical cards, including animal detection, color and shape detection, and the like. Compared with the purely click-based interaction of tablets, mobile phones, and the like, interaction with physical objects such as card icons is richer in form and more popular with young children.
The embodiment of the invention provides a new object detection algorithm for interactive AR projection systems that realizes stable detection of icons based on MTCNN and is particularly suitable for preschool education scenarios; different APPs correspond to different detection content according to the education scenario, such as animal icon detection, plant icon detection, and shape icon detection. In actual use of the interactive AR projection system, the background projection of the corresponding APP is superimposed on the physical card; meanwhile, the color camera is easily affected by ambient light when acquiring images, causing problems such as overexposure and underexposure, so the detection algorithm must be robust to light and shadow variations. Compared with a card recognition algorithm requiring completely fixed positions, detection at unfixed positions has a wider application range and is friendlier both to APP development and to young users, so card detection within a certain area must be realized. At the same time, considering the hardware limits of the AR projection system, the forward latency of the detection algorithm must be small, with overall detection latency within 500 ms. If the recognized objects are easily confused with one another, the detection algorithm must be able to distinguish them, reducing false detections and missed detections. For example, when detecting objects of color and shape categories, a rectangle can be formed by splicing two squares, and the detection algorithm needs to avoid falsely detecting the rectangle as two squares.
In this embodiment, a method for icon identification is provided, and fig. 2 is a flowchart of a first icon identification method according to an embodiment of the present invention, as shown in fig. 2, the method includes the following steps:
step S202, acquiring an image of an icon, and marking a labeling frame on the icon in the image; an image of the icons to be recognized is collected through the AR projection system. For example, the acquired image may be 640x480 pixels and may contain several icons to be recognized; a rectangular labeling frame must be provided for every icon, and labeling software such as labelme can be used to draw the labeling frames;
step S204, determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples. Data enhancement is performed after the training samples are extracted and includes geometric transformations such as rotation and perspective transformation. In addition, because the AR projection system must cope with demanding illumination conditions, illumination enhancement is added, which may include simulating data under different conditions such as overexposure, shadow, and contrast change (a sketch of such augmentation follows). After enhancement, the training sample data can be uniformly scaled to a fixed size to facilitate subsequent model training;
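As an illustration of this enhancement step, the following is a minimal sketch of geometric and illumination augmentation using OpenCV; the function names and the parameter values (gamma, contrast, rotation angle) are illustrative assumptions, not values prescribed by the embodiment:

```python
import cv2
import numpy as np

def augment_illumination(img, gamma=0.5, contrast=1.2, brightness=-30):
    # Simulate lighting variation: gamma < 1 brightens (overexposure),
    # gamma > 1 darkens (shadow); alpha/beta apply a linear contrast change.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    out = cv2.LUT(img, table)                      # gamma curve
    return cv2.convertScaleAbs(out, alpha=contrast, beta=brightness)

def augment_geometry(img, angle=15):
    # Rotate around the image center, keeping the original canvas size.
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))
```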
step S206, training the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image; the MTCNN model realizes foreground extraction and multi-class detection of the icons;
and step S208, screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon; a combination with good effect can be screened from the trained MTCNN models and converted into the commonly used pb format for subsequent image recognition.
Through steps S202 to S208, the method comprises a training stage and a detection stage: the training stage determines training samples from the image of the labeled icon and trains the MTCNN model on them, and the detection stage detects icons in the actual scene with the obtained MTCNN model, thereby solving the problem that icon detection accuracy and efficiency of the AR projection system are not high, and improving both.
In one embodiment, the process of determining the training samples using the labeling frame as a reference comprises: generating candidate frames using the labeling frame as a reference; and determining the foreground region of the icon and the training samples of the recognition category according to the Intersection-over-Union (IOU) of the labeling frame and the candidate frames. Optionally, the MTCNN model performs detection by extracting a foreground region from the input image and determining the specific category of that region. Fig. 3 is a schematic view of a training-sample labeling scene according to an embodiment of the present invention. As shown in fig. 3, the training samples may include pos samples, part samples, and neg samples generated from the labeling data: pos samples direct the network to locate the foreground region and identify the category, part samples direct the network to locate the foreground region, and neg samples direct the network to identify the background. The data generation method randomly generates candidate rectangular regions of different scales and positions using the labeling frame as a reference, and classifies them by judging their IOU with the labeling frame. The IOU is a concept used in target detection: the overlap rate of the generated area(C) (candidate frame) and area(G) (original labeling frame), that is, the ratio of their intersection to their union, as shown in formula 1:

IOU = area(C ∩ G) / area(C ∪ G)    (1)
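For concreteness, formula 1 can be computed as in the following sketch, where each box is assumed to be given as (x1, y1, x2, y2):

```python
def iou(box_c, box_g):
    # Intersection over union of candidate frame C and labeling frame G.
    ix1, iy1 = max(box_c[0], box_g[0]), max(box_c[1], box_g[1])
    ix2, iy2 = min(box_c[2], box_g[2]), min(box_c[3], box_g[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_c = (box_c[2] - box_c[0]) * (box_c[3] - box_c[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    return inter / float(area_c + area_g - inter)
```

With this, each candidate frame can be assigned to the neg, part, or pos class by comparing iou() against the thresholds given next.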
Here, a candidate frame whose IOU with the labeling frame is smaller than 0.3 is a neg sample, a candidate frame whose IOU is larger than 0.65 is a pos sample, and a candidate frame in between is a part sample. In the class notation, 0 denotes the neg samples, 1 to n denote the pos samples, and -1 to -n denote the part samples.
In one embodiment, determining the training samples using the labeling frame as a reference and performing data enhancement on them includes: extending the labeling frame by a preset proportion to obtain an extension frame; executing the grabcut algorithm within the extension frame to complete extraction of the foreground label; and determining a labeling frame and candidate frames on the foreground label, then determining the foreground region of the icon and the training samples of the recognition category according to the IOU of the labeling frame and the candidate frames. For example, during the generation of training sample data the labeling frame is always rectangular and located in the foreground region; if the foreground region is damaged during sample generation, recognition stability may be affected, so the foreground region must be further extracted. When the integrity of the foreground region is ensured in the training sample data, the influence of the background where the icon sits can be reduced; and where the icon foreground has an irregular shape that makes manual labeling of the foreground region laborious, the foreground can be extracted automatically with the grabcut algorithm. FIG. 4 illustrates the extension step of foreground label extraction according to an embodiment of the present invention, and FIG. 5 illustrates the extracted foreground label. As shown in figs. 4 and 5, the specific extraction process (sketched in code after this passage) is: take the original rectangular labeling frame as a reference and extend it to obtain the extension frame, with an extension ratio of 1.2 that can be adjusted to the actual icon; then execute the grabcut algorithm within the extension frame to complete foreground label extraction.
In addition, FIG. 6 is a schematic diagram of background-replacement data enhancement based on the foreground label according to an embodiment of the present invention. As shown in fig. 6, after the foreground label is extracted, data enhancement can be performed by substituting several preset backgrounds, and pos, part, and neg samples can be generated from the labeling frame on top of the foreground label. This ensures the integrity of the training samples after foreground label extraction, improves the robustness of the recognition model to different desktop environments, and, by enhancing the training samples through foreground label extraction, improves the accuracy of icon recognition.
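The following is a minimal sketch of the extension-and-grabcut extraction described above, using OpenCV's grabCut on an 8-bit color image; the helper name and the default values are illustrative assumptions:

```python
import cv2
import numpy as np

def extract_foreground(img, box, expand=1.2, iters=5):
    # Extend the labeling frame `box` = (x1, y1, x2, y2) by `expand`,
    # then run grabCut inside the extension frame to pull out the icon.
    h, w = img.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * expand / 2.0, (y2 - y1) * expand / 2.0
    ex1, ey1 = int(max(0, cx - hw)), int(max(0, cy - hh))
    ex2, ey2 = int(min(w, cx + hw)), int(min(h, cy + hh))
    roi = img[ey1:ey2, ex1:ex2]
    mask = np.zeros(roi.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    # Initialize grabCut with the original (unextended) frame inside the ROI.
    rect = (x1 - ex1, y1 - ey1, x2 - x1, y2 - y1)
    cv2.grabCut(roi, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return roi * fg[:, :, np.newaxis].astype(roi.dtype)
```

The extracted foreground can then be pasted onto the preset replacement backgrounds to generate the enhanced samples.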
In some embodiments, the MTCNN model includes a Pnet layer and an Rnet layer, cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region. FIG. 7 is a schematic flow diagram of the training phase of icon recognition according to an embodiment of the present invention. As shown in fig. 7, after the training sample data is enhanced, it can be uniformly scaled to the required sizes to facilitate model training: the Pnet layer may require an input size of 12x12 pixels, and the Rnet layer an input size of 24x24 pixels. The training sample data is then managed and packaged, which includes controlling the neg:part:pos sample ratio to 2:1:1 and packing the data into the format required for training. The joint detection algorithm of the Pnet and Rnet layers in the MTCNN model transfers the single-class keypoint detection of the related art to multi-class detection: the algorithm is modified to cancel the third network used for keypoint detection, and the first two networks, the Pnet layer and the Rnet layer, are expanded to multi-class detection, enhancing and optimizing the training function of the neural networks.
In addition, the MTCNN model trains two layers, a Pnet layer and an Rnet layer, which are used in cascade at inference time: the Pnet layer completes detection and extraction of foreground regions on the whole image, the Rnet layer further completes recognition within those foreground regions to obtain the specific categories and optimize the foreground positions, and finally the results are returned. The Pnet and Rnet layers can be trained separately using training sample data at the corresponding scales. The model of the embodiment of the invention does not prescribe the training framework: frameworks such as MATLAB, Caffe, TensorFlow, Keras, and PyTorch can all be used, and preferably TensorFlow is used to complete the training. Both the Pnet layer and the Rnet layer may be fully convolutional networks comprising three branches: foreground judgment, minimum bounding box (bbox) regression, and category classification (a sketch of such a network follows). After model training is finished, the effect of the cascaded Pnet-detection and Rnet-recognition model is verified, a combination with better effect is screened from the historical training models, and it is converted into the commonly used pb format for the subsequent detection stage.
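As an illustration of such a three-branch fully convolutional network, the following Keras sketch follows the layer sizes of the original MTCNN Pnet; the multi-class category branch and its width are assumptions of this sketch rather than a prescribed architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_pnet(num_classes):
    # Fully convolutional, so the same weights run on whole images at
    # detection time; 12x12 crops are used at training time.
    x_in = tf.keras.Input(shape=(None, None, 3))
    x = layers.Conv2D(10, 3, activation="relu")(x_in)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu")(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    # Three 1x1 heads: foreground judgment, bbox regression, category.
    fg = layers.Conv2D(2, 1, activation="softmax", name="foreground")(x)
    bbox = layers.Conv2D(4, 1, name="bbox")(x)
    cls = layers.Conv2D(num_classes, 1, activation="softmax", name="category")(x)
    return tf.keras.Model(x_in, [fg, bbox, cls])
```

An Rnet-layer sketch would follow the same pattern on 24x24 inputs with a deeper body.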
In this embodiment, a method of icon recognition applied to an augmented reality AR projection system is provided. FIG. 8 is a second flowchart of an icon recognition method according to an embodiment of the present invention, and FIG. 9 is a schematic flowchart of MTCNN-model-based icon detection according to an embodiment of the present invention. As shown in figs. 8 and 9, the method includes the following steps:
step S502, acquiring an image of the icon to be recognized, where the acquired image may be a high-resolution color image of the AR projection area, for example a color image of 640x480 pixels;
step S504, extracting, according to the image, a foreground region of the image through the Pnet layer of the multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and performing hierarchical aggregation on the first candidate frames to obtain second candidate frames. After the AR projection system loads the Pnet-layer and Rnet-layer files, the ratio of the training scale of the Pnet layer to the preset scale of the icon is calculated, the original image is scaled accordingly, and the Pnet layer is run to detect and extract the corresponding foreground regions, yielding Pnet candidate frames containing position information and category information. For example, if the icons to be recognized are about 50x50 pixels, the preset scale is 50; the ratio of the Pnet training scale 12 to the preset scale is 0.24, so the original image is reduced by a factor of 0.24. In some embodiments, because the cards used in the same task are basically consistent in size, the Pnet detection phase is fixed to a single scale; meanwhile, owing to the adaptability of the Pnet layer to foreground regions, the preset icon scale is insensitive to interference from the AR projection system and has good scale invariance. In addition, when the image sent to the Pnet layer for recognition is too small, the classification result of the Pnet layer is often inaccurate, and the image can be classified more accurately by the Rnet layer. When all icons are basically consistent in size, or differ in size but are not easily confused with one another, the candidate frames generated by the Pnet layer are sent directly to the Rnet layer for classification; but if icons can be confused, for example when part of icon A resembles icon B, sending the candidate frames directly to the Rnet layer causes false detection. In that case, the Pnet candidate frames must be hierarchically aggregated to obtain candidate frames with basically correct positions and sizes before they are sent to the Rnet layer;
step S506, the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon: the candidate frames generated directly by the Pnet layer or obtained by hierarchical aggregation are used to crop the corresponding images from the original image, which are scaled to 24x24 pixels and sent to the Rnet layer for category recognition and candidate-frame position optimization. A sketch of this detection pipeline follows.
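The detection stage can be summarized by the following sketch; pnet_detect, aggregate, and rnet_classify are hypothetical callables standing in for the loaded Pnet model, the hierarchical aggregation described in the next section, and the loaded Rnet model:

```python
import cv2

PNET_SCALE, RNET_SIZE = 12, 24

def detect_icons(img, preset_scale, pnet_detect, aggregate, rnet_classify):
    # Single fixed scale, e.g. 12 / 50 = 0.24 for icons of about 50x50 pixels.
    ratio = PNET_SCALE / float(preset_scale)
    small = cv2.resize(img, None, fx=ratio, fy=ratio)
    candidates = pnet_detect(small)        # first candidate frames
    merged = aggregate(candidates)         # second candidate frames
    results = []
    for (x1, y1, x2, y2) in merged:
        # Map the frame back to original-image coordinates, crop, and
        # scale to the Rnet input size for category recognition.
        x1, y1, x2, y2 = (int(v / ratio) for v in (x1, y1, x2, y2))
        crop = cv2.resize(img[y1:y2, x1:x2], (RNET_SIZE, RNET_SIZE))
        results.append(rnet_classify(crop))
    return results
```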
Through steps S502 to S506, rapid detection of card-type icons is realized according to the requirements of the AR projection scenario: the MTCNN model algorithm realizes multi-class detection, the hierarchical aggregation algorithm lets the model support detection of icons at different scales, and recognition is fast and accurate, making the method suitable for developing interactive applications in related scenarios such as education. The problem that icon detection accuracy and efficiency of the AR projection system are not high is thereby solved, and both are improved.
In one embodiment, hierarchically aggregating the first candidate frames to obtain the second candidate frames includes: when the positions of first candidate frames overlap, classifying those frames into one class and taking their circumscribed rectangle as the second candidate frame. The aggregation method judges the relevance of the candidate frames according to their positions, groups overlapping candidate frames into one class, and generates the circumscribed rectangle of the group as a whole. "Hierarchical" means that a large-scale candidate frame is obtained by aggregating small-scale candidate frames. The hierarchical aggregation flow takes all candidate frames generated by the Pnet layer as input and outputs the aggregated candidate frames; the specific flow is as follows (see the sketch after these steps):
step S1, obtain all candidate frames generated by the Pnet layer, n in total;
step S2, calculate the degree of intersection of the candidate frames pairwise, generating an n x n upper-triangular candidate-frame distance matrix D;
step S3, number all the candidate frames and obtain an aggregation vector C by taking each frame's serial number as its aggregation id; in the initial state all candidate frames are mutually independent;
step S4, traverse candidate frames i = 1 … n;
step S5, traverse candidate frames j = i … n;
step S6, if the distance d_ij is greater than the threshold, merge candidate frame j and candidate frame i according to the applicable case below;
step S7, if candidate frames i and j are both independent, C_j = i;
step S8, if candidate frame i is independent and candidate frame j is not (in which case necessarily C_j < i), then C_i = j;
step S9, if candidate frame i is not independent and candidate frame j is independent, then C_j = i;
step S10, if neither candidate frame i nor candidate frame j is independent, let Min = min(C_j, C_i) and Max = max(C_j, C_i), traverse the aggregation vector C, and replace every value equal to Max with Min;
step S11, obtain the updated aggregation vector C, in which the unrepeated ids are the aggregation centers;
step S12, fuse the candidate frames belonging to the same aggregation center according to the aggregation vector C, taking the upper and lower extreme values to obtain the circumscribed rectangle;
and step S13, return all the circumscribed rectangles, which are the candidate frames after Pnet hierarchical aggregation.
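One way to realize steps S1 to S13 is the following sketch. It keeps a flat aggregation vector as in step S3 and merges clusters by replacing the larger id with the smaller, as in step S10; overlaps() is a hypothetical predicate implementing the distance threshold of step S6:

```python
def aggregate(frames, overlaps):
    # `frames` are Pnet candidate frames as (x1, y1, x2, y2) tuples.
    n = len(frames)
    c = list(range(n))  # aggregation vector: each frame starts independent
    for i in range(n):
        for j in range(i + 1, n):
            if overlaps(frames[i], frames[j]):
                lo, hi = min(c[i], c[j]), max(c[i], c[j])
                # Merge the two clusters: redirect every id equal to hi.
                c = [lo if v == hi else v for v in c]
    merged = {}
    for idx, root in enumerate(c):
        x1, y1, x2, y2 = frames[idx]
        if root in merged:
            mx1, my1, mx2, my2 = merged[root]
            # Circumscribed rectangle: take the extreme values.
            merged[root] = (min(mx1, x1), min(my1, y1),
                            max(mx2, x2), max(my2, y2))
        else:
            merged[root] = (x1, y1, x2, y2)
    return list(merged.values())
```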
Through the hierarchical aggregation algorithm, the recognition neural network model of the AR projection system can adapt to detection of icons at different scales. If the multi-scale problem were solved directly by the neural network model, more training data would need to be collected and a larger, heavier model used, increasing the forward latency of deployment, while the false-detection problem still could not be completely eradicated. Hierarchical aggregation solves the multi-scale confusion problem purely through data post-processing, and is more flexible and efficient.
In one embodiment, the image of the icon to be recognized is acquired within a preset detection area of the projected image. FIG. 10 is a schematic diagram of a programming APP detecting programming icons according to an embodiment of the present invention. As shown in fig. 10, in the AR icon recognition system, the card corresponding to the icon is prepared for detection in the corresponding APP, and the MTCNN algorithm model detects the image of the whole projection area. In another embodiment, a detection area may be preset within the projection area and the MTCNN algorithm model applied only to that preset area, further reducing the forward latency of the model.
In one embodiment, when the projection trigger button of the projection area is detected to be triggered, a trigger instruction is generated, and the trigger instruction instructs acquisition of the image of the icon. For example, the card icon to be recognized is placed in the projection area, the detection trigger button on the projection is clicked, the operating system side of the projection system triggers a detection task, and the image of the whole projection area or of the preset detection area is detected by the MTCNN algorithm.
In one embodiment, after the category of the icon is identified, an indication signal is generated according to the category, where the indication signal is used to instruct the projector 12 to play the dynamic effect corresponding to the category. The recognition result is returned to the system layer of the AR projection system; after receiving it, the projection system plays the corresponding animation, sound effect, and so on through the projector, and the detection algorithm waits for the next detection trigger.
In another embodiment of the invention, a computer-readable storage medium is also provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method of icon recognition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. An icon recognition method, the method comprising:
acquiring an image of an icon, and marking a labeling frame on the icon in the image;
determining training samples using the labeling frame as a reference, and performing data enhancement on the training samples;
training a multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image;
and screening and saving the MTCNN model according to preset recognition conditions, wherein the MTCNN model is used for image recognition of the icon.
2. The method of claim 1, wherein the determining of training samples using the labeling frame as a reference comprises:
generating candidate frames using the labeling frame as a reference;
and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
3. The method of claim 1, wherein the determining of training samples using the labeling frame as a reference comprises:
extending the labeling frame by a preset proportion to obtain an extension frame;
executing the grabcut algorithm within the extension frame to complete extraction of the foreground label, and replacing the background of the foreground label for data enhancement;
and determining a labeling frame and candidate frames on the foreground label, and determining the foreground region of the icon and the training samples of the recognition category according to the intersection-over-union (IOU) of the labeling frame and the candidate frames.
4. The method of claim 1, wherein the training of the multi-task convolutional neural network (MTCNN) model according to the training samples to obtain and optimize the foreground image and recognition category of the icon in the image comprises:
in the case that the MTCNN model comprises a Pnet layer and an Rnet layer, the two layers are cascaded: the training samples are input into the Pnet layer to extract the foreground region of the icon, and the Rnet layer identifies the category of the icon and optimizes the position of the foreground region of the icon.
5. An icon recognition method applied to an augmented reality (AR) projection system, the method comprising:
acquiring an image of the icon to be recognized;
extracting, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames;
performing hierarchical aggregation on the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon.
6. The method of claim 5, wherein the acquiring of the image of the icon to be recognized comprises:
acquiring the image of the icon to be recognized within a preset detection area of the projected image.
7. The method of claim 5, wherein the hierarchical aggregation of the first candidate frames to obtain second candidate frames comprises:
when the positions of first candidate frames overlap, classifying those first candidate frames into one class of candidate frames and taking their circumscribed rectangle as the second candidate frame.
8. The method of claim 5, wherein after the identifying of the category of the icon, the method comprises:
generating an indication signal according to the category, wherein the indication signal is used to instruct the projector to play the dynamic effect corresponding to the category.
9. The method of claim 5, wherein before the acquiring of the image of the icon to be recognized, the method comprises:
when a projection trigger button of the projection area is triggered, generating a trigger instruction, wherein the trigger instruction instructs acquisition of the image of the icon.
10. A system for augmented reality (AR) icon recognition, the system comprising: a projector, a main control device, and a camera device, wherein the main control device is connected to the projector and to the camera device;
the projector projects an image in the working area of the camera device;
the camera device acquires the image of the icon to be recognized;
the main control device extracts, according to the image, a foreground region of the image through the Pnet layer of a multi-task convolutional neural network (MTCNN) model to obtain first candidate frames, and classifies and aggregates the first candidate frames to obtain second candidate frames;
the Rnet layer of the MTCNN model crops the icon image from the second candidate frames and identifies the category of the icon;
and the main control device instructs the projector to play the dynamic effect corresponding to the category of the icon.
CN202010217757.1A 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system Active CN111523390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010217757.1A CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Publications (2)

Publication Number Publication Date
CN111523390A 2020-08-11
CN111523390B (en) 2023-11-03

Family

ID=71910429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010217757.1A Active CN111523390B (en) 2020-03-25 2020-03-25 Image recognition method and augmented reality AR icon recognition system

Country Status (1)

Country Link
CN (1) CN111523390B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060034484A1 (en) * 2004-08-16 2006-02-16 Claus Bahlmann Method for traffic sign detection
US20190026884A1 (en) * 2016-03-30 2019-01-24 Institute Of Automation, Chinese Academy Of Sciences Method for assessing aesthetic quality of natural image based on multi-task deep learning
CN107609485A (en) * 2017-08-16 2018-01-19 中国科学院自动化研究所 The recognition methods of traffic sign, storage medium, processing equipment
CN107977671A (en) * 2017-10-27 2018-05-01 浙江工业大学 A kind of tongue picture sorting technique based on multitask convolutional neural networks
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN108764208A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method and device, storage medium, electronic equipment
US20200019759A1 (en) * 2018-07-11 2020-01-16 Samsung Electronics Co., Ltd. Simultaneous recognition of facial attributes and identity in organizing photo albums
CN109635768A (en) * 2018-12-20 2019-04-16 深圳市捷顺科技实业股份有限公司 Parking stall condition detection method, system and relevant device in a kind of picture frame
CN110826391A (en) * 2019-09-10 2020-02-21 中国三峡建设管理有限公司 Bleeding area detection method, bleeding area detection system, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAOWEI CAI ET AL.: "Cascade R-CNN: Delving into High Quality Object Detection" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686851A (en) * 2020-12-25 2021-04-20 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN112686851B (en) * 2020-12-25 2022-02-08 合肥联宝信息技术有限公司 Image detection method, device and storage medium
CN113808186A (en) * 2021-03-04 2021-12-17 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113808186B (en) * 2021-03-04 2024-01-16 京东鲲鹏(江苏)科技有限公司 Training data generation method and device and electronic equipment
CN113012189A (en) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 Image recognition method and device, computer equipment and storage medium
CN113409231A (en) * 2021-06-10 2021-09-17 杭州易现先进科技有限公司 AR portrait photographing method and system based on deep learning

Also Published As

Publication number Publication date
CN111523390B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN111523390B (en) Image recognition method and augmented reality AR icon recognition system
US11030458B2 (en) Generating synthetic digital assets for a virtual scene including a model of a real-world object
EP3621029A1 (en) Car insurance image processing method, apparatus, server and system
CN109947967B (en) Image recognition method, image recognition device, storage medium and computer equipment
KR102220174B1 (en) Learning-data enhancement device for machine learning model and method for learning-data enhancement
CN112560999A (en) Target detection model training method and device, electronic equipment and storage medium
CN110737785B (en) Picture labeling method and device
CN111368600A (en) Method and device for detecting and identifying remote sensing image target, readable storage medium and equipment
Beyeler OpenCV with Python blueprints
CN111476271B (en) Icon identification method, device, system, computer equipment and storage medium
CN111368944B (en) Method and device for recognizing copied image and certificate photo and training model and electronic equipment
US10891740B2 (en) Moving object tracking apparatus, moving object tracking method, and computer program product
US20230237777A1 (en) Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN111124863B (en) Intelligent device performance testing method and device and intelligent device
CN114511589A (en) Human body tracking method and system
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
CN115546824B (en) Taboo picture identification method, apparatus and storage medium
CN111488776A (en) Object detection method, object detection device and electronic equipment
CN114821062A (en) Commodity identification method and device based on image segmentation
CN114998962A (en) Living body detection and model training method and device
Bekhit Computer Vision and Augmented Reality in iOS
CN114596624B (en) Human eye state detection method and device, electronic equipment and storage medium
Qian et al. Multi-Scale tiny region gesture recognition towards 3D object manipulation in industrial design
Kulishova et al. Impact of the textbooks’ graphic design on the augmented reality applications tracking ability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant