CN110211240A - Registration-free identification augmented reality method - Google Patents

Publication number
CN110211240A
Authority
CN
China
Prior art keywords
information
augmented reality
data
model
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910467466.5A
Other languages
Chinese (zh)
Other versions
CN110211240B (en)
Inventor
张元�
张乐
王智豪
焦世超
田杰
马珩钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China
Priority to CN201910467466.5A
Publication of CN110211240A
Application granted
Publication of CN110211240B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention belongs to the field of augmented reality and discloses a registration-identifier-free augmented reality method. The method uses a C/S (client-server) architecture and transmits information over the UDP protocol. The client provides human-computer interaction, information acquisition and dynamic loading of virtual models; the server identifies and classifies the received information with a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect. Virtual object models are uploaded to the server and dynamically loaded by the client, which reduces the memory required by the client application and allows a variety of models to be loaded and interacted with without updating the client. This solves the problems that traditional augmented reality adapts poorly to new scenes, depends heavily on identifiers and has high development requirements. The method is suitable for augmented reality applications with a large number of virtual models, especially in the field of engineering models.

Description

Registration-free identification augmented reality method
Technical Field
The invention belongs to the technical field of augmented reality, and particularly relates to a registration-identifier-free augmented reality method.
Background
Augmented reality is a technology that seamlessly integrates real-world information and virtual-world information. Entity information that is difficult to experience within a certain range of time and space in the real world (visual information, sound, taste, touch and the like) is simulated by computers and related technologies and then superimposed on the real world, where it is perceived by the human senses, producing a sensory experience beyond reality. The real environment and the virtual objects are superimposed on the same picture or space in real time and exist simultaneously.
An augmented reality method is typically realized by performing three-dimensional registration in the real world, placing the virtual information at the registered three-dimensional position, and finally displaying the virtual information on a display device. In addition, objective factors such as the angle and distance of the AR (Augmented Reality) device and the external light all affect the tracking and recognition of the identification information and the effect of model loading.
Disclosure of Invention
In traditional AR (Augmented Reality) technology, identification information and virtual information need to be placed in advance, and the recognition and tracking of the identification information are disturbed by external factors, which degrades the user experience. To address these problems, the invention provides a registration-identifier-free augmented reality method. The method is suitable for dynamically loading the virtual information of target information in augmented reality applications.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A registration-identifier-free augmented reality method adopts a C/S (Client-Server) architecture and uses UDP (User Datagram Protocol) for information transmission; the client provides human-computer interaction, information acquisition and dynamic loading of the virtual model, and the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect.
The virtual object model is uploaded to the server and dynamically loaded by the client, so that the memory required by the client application is reduced and various models can be loaded and interacted with without updating the client;
the convolutional neural network trained by transfer learning offers high recognition and classification accuracy and high speed, and owing to its strong generalization ability the recognized object is no longer limited to a specific identifier;
in terms of human-computer interaction, the user interacts with the model through convenient and quick gesture, voice and gaze input, so that operation is comfortable and natural.
Further, the human-computer interaction, information acquisition and dynamic loading functions of the virtual model of the client comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.
Still further, the information acquisition function, the gaze recognition function, the gesture recognition function, the voice recognition function, the spatial mapping function and the dynamic loading function of the client are realized by adopting the following steps:
Step C1. Information acquisition function of the client; the target is photographed with the HoloLens holographic glasses and the photograph is stored in JPG format; the photograph is converted into the Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display;
Step C2. Gaze recognition function of the client; gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the physics raycast (Physics Raycast) of the Unity3D engine, a ray is cast according to the position and direction of the user's head, and when the ray collides with a holographic object the collision result is fed back, including the position of the collision point and information about the collided object, thereby realizing tracking and selection of holographic objects in the scene; the client application can select and move virtual objects through the gaze recognition function;
Step C3. Gesture recognition function of the client; gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene;
Step C4. Voice recognition function of the client; voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application, and when the user speaks a keyword, the client application responds with the preset feedback behavior;
Step C5. Spatial mapping function of the client; spatial mapping superimposes the virtual world on the real world and is realized as follows;
Step C5.1. Scanning the environment around the user with the depth camera and environment perception cameras of the HoloLens holographic glasses and applying the built-in triangulation, so as to model and digitize the real world and obtain digital physical space information of the real world;
Step C5.2. Calculating in real time whether the obtained digital physical space can be used to place a virtual holographic object; by means of the spatial mapping function of the client, the spatial position of the virtual model is no longer restricted by the position of identification information in the real world; replacing identifier tracking with spatial mapping removes the limitation of the identifier's actual position, so that the virtual information can be combined with the real world more accurately and reasonably;
Step C6. Dynamic loading function of the client model; dynamic loading of the model is realized by the HoloLens holographic glasses accessing the server to load the virtual model.
Further, the input gestures in step C3 include three types, namely Air-tap, Navigation capture and Bloom.
Further, the virtual model stored on the server in step C6 is a compressed package obtained by packing the virtual model and its scripts into an AssetBundle in advance through the Unity3D engine and uploading the package to the server; according to the result identified by the server, the HoloLens holographic glasses access the server to download and decompress the AssetBundle package of the corresponding model, thereby realizing dynamic loading of the model. There are thousands of types of objects in daily life, and it is difficult to place all of them in a client application in advance; compared with a high-performance computer, the holographic glasses are very limited in rendering capability, memory and performance, so head-mounted augmented reality glasses cannot carry a large number of models for loading. For this reason the models are loaded dynamically from the server.
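As an illustration of the lookup implied by step C6, the following minimal sketch maps a class label returned by the server-side classifier to the download address of an AssetBundle package uploaded in advance. The catalog contents, the labels, the `.assetbundle` file extension and the function name `resolve_bundle_url` are all hypothetical and are not taken from the patent; the actual client performs the download inside Unity3D.

```python
from urllib.parse import quote, urljoin

# Hypothetical catalog: class label produced by the classifier mapped to
# the AssetBundle file that was uploaded to the server in advance.
BUNDLE_CATALOG = {
    "type99_tank": "type99_tank.assetbundle",
    "armored_vehicle": "armored_vehicle.assetbundle",
}


def resolve_bundle_url(label, base_url, catalog=BUNDLE_CATALOG):
    """Return the download URL of the bundle for a label, or None if unknown."""
    filename = catalog.get(label)
    if filename is None:
        return None  # unrecognized class: the client has nothing to load
    # Ensure exactly one trailing slash so urljoin appends instead of replacing.
    return urljoin(base_url.rstrip("/") + "/", quote(filename))
```

Because only the catalog on the server changes when a new model is added, the client application itself does not need to be updated, which is the property the method relies on.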
Furthermore, the server side identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides a virtual model; the convolutional neural network for transfer learning training is specifically realized by the following steps:
Step S1. Establishing a sample data set; a good sample data set is the basis of information classification and identification; sample images are obtained from the Internet and are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored according to the quantity proportion to expand the sample data set, and the expanded data is finally made into a data set for information identification;
Step S2. Performing model training on the data set for information identification produced in step S1; 70-80% of the sample images are randomly selected from the different categories as the training data set, the remaining sample images are used as the test data set, and the model is trained for 40-100 iterations;
Step S3. Judging the effect of model training by the loss value, the overfitting ratio and the accuracy of test data classification; the accuracy of test data classification is given by formula (1):
$$\mathrm{Accuracy} = \frac{\mathrm{ExactQuantity}}{\mathrm{TotalQuantity}} \tag{1}$$
In formula (1), ExactQuantity represents the number of correctly classified test data and TotalQuantity represents the total number of test data; the higher the accuracy of test data classification, the better the classification effect of the network model;
the loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2)
where 1{·} is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model;
the overfitting ratio is shown in formula (3)
where TrainAcc represents the accuracy on the training data and is given by formula (4):
$$\mathrm{TrainAcc} = \frac{\mathrm{TrainExactQuantity}}{\mathrm{TrainTotalQuantity}} \tag{4}$$
In formula (4), TrainExactQuantity is the number of correctly classified training data and TrainTotalQuantity is the total number of training data. The closer the overfitting ratio is to 1, the better the generalization ability of the network model.
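The three evaluation quantities of step S3 can be sketched in a few lines. This is only an illustration under stated assumptions: the loss uses the standard form of the Softmax cross-entropy, in which the indicator function selects the log-probability of the true class, and the overfitting ratio is assumed here to be the training accuracy divided by the test accuracy, which is consistent with a value of 1 being ideal but is not confirmed by the text. All function names are hypothetical.

```python
import math


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def softmax_cross_entropy(logits, labels):
    """Mean Softmax cross-entropy loss over all samples.

    logits: one list of raw class scores per sample.
    labels: index of the true class of each sample.
    The indicator 1{.} keeps only the term of the true class, so each
    sample contributes -log(softmax(scores)[label]).
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        peak = max(scores)  # subtract the maximum for numerical stability
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_sum - scores[label]
    return total / len(labels)


def overfitting_ratio(train_acc, test_acc):
    """Assumed definition: training accuracy divided by test accuracy."""
    return train_acc / test_acc
```

With this reading, a ratio well above 1 means the network fits the training data much better than the test data, and a ratio near 1 indicates good generalization.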
Further, the model training process of step S2 is:
Step S2.1. Pre-training the AlexNet network model on the ImageNet data set, thereby initializing the parameters of the AlexNet network model;
Step S2.2. Because the last three layers of the AlexNet network model are configured for 1000 classes, the last three fully-connected layers are retrained, and the parameters of the new fully-connected layers are retained, so as to adapt to the classes of the data set established in step S1;
Step S2.3. Combining the first five convolutional layers and their corresponding pooling layers, activation functions and model parameters from step S2.1 with the fully-connected layers and parameters from step S2.2, and performing fine-tuning to complete the training of the model.
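The data preparation that precedes this training (sample expansion in step S1 and the per-category split in step S2) can be sketched as follows. This is a minimal illustration that treats an image as a row-major list of pixel rows and applies all four transforms to every image; the method itself applies them according to the quantity proportion of the classes. The function names, the default split fraction of 0.75 and the fixed random seed are assumptions made for the example.

```python
import random


def rotate90(image):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate180(image):
    """Rotate an image 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def mirror_horizontal(image):
    """Flip an image left to right."""
    return [row[::-1] for row in image]


def mirror_vertical(image):
    """Flip an image top to bottom."""
    return [list(row) for row in image[::-1]]


def expand(image):
    """Return the original image followed by its four transformed copies."""
    return [image, rotate90(image), rotate180(image),
            mirror_horizontal(image), mirror_vertical(image)]


def split_by_class(samples_by_class, train_fraction=0.75, seed=0):
    """Randomly split each category into training and test samples."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test
```

Splitting inside each category, rather than over the pooled data set, keeps every class represented in both the training set and the test set.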
Furthermore, in the information transmission, the sending end processes the information, the information is transmitted using the UDP protocol, and the receiving end processes and restores the received information. The transmitted information is preprocessed according to the maximum number of bytes that can be transmitted at one time: the sending end obtains a picture by its absolute file path and then performs data encoding, data splitting and header insertion on the picture, where the header contains the file type, the file data length, the number of data packets and the data number. The receiving end decodes and reassembles the data according to the header information and checks whether the data is complete; if packets have been lost, it returns check information and requests the sending end to resend the lost packets according to the data numbers in the header.
Further, the information processing of the transmitting end includes the steps of:
Step F1. Encoding the information to be sent, encoding its type according to the information type, and inserting the encoded result into the file type field of the header;
Step F2. Counting the length of the encoded information, encoding this length, and inserting the encoded result into the file length field of the header;
Step F3. Dividing the encoded information equally into multiple groups, encoding the total number of groups, and inserting the encoded result into the data packet number field of the header;
Step F4. Numbering the divided data groups in sequence, encoding each number, and inserting the encoded result into the data number field of the header;
Step F5. Repeating steps F1 to F4 so that the split information is preprocessed and transmitted in sequence, ensuring that a large amount of information is not transmitted at the same time.
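Steps F1 to F5 can be sketched as a function that cuts an encoded payload into nearly equal groups and prefixes each group with a header. The binary layout used here (1-byte file type, 4-byte file length, 2-byte packet count, 2-byte packet number, network byte order) is an assumption chosen for the example, as are the type code `FILE_TYPE_JPG`, the default `max_payload` and the function name; the patent does not specify field widths.

```python
import struct

# Assumed header layout: file type, file length, packet count, packet number.
HEADER = struct.Struct("!BIHH")
FILE_TYPE_JPG = 1  # hypothetical type code for JPG pictures


def make_packets(payload, file_type, max_payload=1024):
    """Split payload into near-equal groups, each prefixed with a header."""
    # Number of groups needed so that no group exceeds max_payload bytes.
    total = max(1, -(-len(payload) // max_payload))
    # Near-equal group size: ceiling of file length / total number of groups.
    group_size = -(-len(payload) // total)
    packets = []
    for number in range(total):
        group = payload[number * group_size:(number + 1) * group_size]
        header = HEADER.pack(file_type, len(payload), total, number)
        packets.append(header + group)
    return packets
```

Each returned element is ready to be passed to a UDP socket's `sendto`; sending the elements one at a time matches the requirement of step F5 that the information is not transmitted all at once.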
Further, the information processing of the receiving end comprises the following steps:
Step R1. Decoding the header of the received data, classifying the data according to the file type, the IP and the source port number, and at the same time creating a new thread to receive new information;
Step R2. Creating a receiving container according to the file length in the header; multiple files can be received at the same time, where the multiple files are of the same type;
Step R3. Inserting the data content into the corresponding position in the container according to the data number and the packet length (file length / total number of packets) in the header, thereby preserving the order of the file content;
Step R4. Checking the received content; if the container contains empty entries, data packets have been lost, and the receiving end replies to the IP and source port number recorded in step R1, requesting the sending end to resend the information with the corresponding number according to the file type, the file length and the numbered position of the empty entries;
Step R5. Repeating steps R1 to R4, then decoding the information in the container and writing it out according to the file type to restore the file.
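The receiving side of steps R2 to R5 can be sketched as a small container that stores each group by its data number and reports which numbers are still empty. As a simplification, the container here is a list with one slot per packet rather than a byte buffer indexed by packet length, and the header layout (1-byte file type, 4-byte file length, 2-byte packet count, 2-byte packet number) is an assumption for the example; the class and method names are hypothetical.

```python
import struct

# Assumed header layout: file type, file length, packet count, packet number.
HEADER = struct.Struct("!BIHH")


class Reassembler:
    """Collects the packets of one file and reports missing packet numbers."""

    def __init__(self):
        self.file_type = None
        self.file_length = None
        self.slots = None  # one entry per packet, None while not received

    def add(self, packet):
        """Decode the header and store the data at its numbered position."""
        file_type, file_length, total, number = HEADER.unpack(
            packet[:HEADER.size])
        if self.slots is None:  # first packet: create the container
            self.file_type, self.file_length = file_type, file_length
            self.slots = [None] * total
        self.slots[number] = packet[HEADER.size:]

    def missing(self):
        """Packet numbers that the sender should be asked to resend."""
        return [i for i, group in enumerate(self.slots or []) if group is None]

    def restore(self):
        """Join the groups in order once every packet has arrived."""
        if self.slots is None or self.missing():
            raise ValueError("packets still missing")
        data = b"".join(self.slots)
        if len(data) != self.file_length:
            raise ValueError("restored length does not match the header")
        return data
```

Because each group is stored by its number, packets that arrive out of order over UDP still end up in the correct position, and the list returned by `missing` is exactly what a resend request would carry.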
Compared with the prior art, the invention has the following beneficial effects:
1. The invention adopts a registration-identifier-free augmented reality method in which the virtual object model is uploaded to the server and dynamically loaded by the client, which reduces the memory required by the client application and allows various models to be loaded and interacted with without updating the client; this solves the problems that traditional augmented reality adapts poorly to new scenes, depends heavily on identifiers and has high development requirements.
2. The trained convolutional neural network offers high recognition and classification accuracy and high speed, and owing to its good generalization ability the recognized object is no longer limited to a specific identifier.
3. The convolutional neural network is trained by transfer learning, which greatly reduces the required number of samples and the training time, so that the network model can be adapted to an application scene conveniently and quickly and can provide classification services.
4. Spatial mapping replaces the identifier tracking of traditional augmented reality methods, which removes the limitation of the identifier's actual position, so that the virtual information can be combined with the real world more accurately and reasonably.
5. In terms of human-computer interaction, the user interacts with the model through convenient and quick gesture, voice and gaze input, so that operation is comfortable and natural.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a diagram of a client architecture;
FIG. 3 is a UDP protocol header information diagram;
FIG. 4 is a server side architecture diagram;
FIG. 5 is the sample data set of Example 1;
FIG. 6 is the sample expansion method of Example 1;
FIG. 7 is the result of model training in Example 1;
FIG. 8 is a photograph of a Type 99 tank taken in Example 1;
FIG. 9 is the information transmission of Example 1;
FIG. 10 is the information identification and classification of Example 1;
FIG. 11 is the dynamic loading of virtual information in Example 1.
Detailed Description
The technical solutions of the present invention are described below in further detail with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1:
As shown in FIG. 1, in this embodiment a registration-identifier-free augmented reality method is implemented with a client-server architecture and uses the UDP protocol for information transmission; the client provides human-computer interaction, information acquisition and dynamic loading of the virtual model, and the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect. The human-computer interaction, information acquisition and dynamic loading functions of the client comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.
The following describes the function implementation of the client, the server, and the information transmission in detail.
Client:
The architecture diagram of the client is shown in FIG. 2. The functions of the client are realized by the following steps:
Step C1. Information acquisition function of the client; the target is photographed with the HoloLens holographic glasses and the photograph is stored in JPG format; the photograph is converted into the Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display, so that the result can be viewed directly on the UI and photographs of poor quality can be retaken;
Step C2. Gaze recognition function of the client; gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the physics raycast (Physics Raycast) of the Unity3D engine, a ray is cast according to the position and direction of the user's head, and when the ray collides with a holographic object the collision result is fed back, including the position of the collision point and information about the collided object, thereby realizing tracking and selection of holographic objects in the scene; the client application uses the gaze recognition function to select and move virtual objects;
Step C3. Gesture recognition function of the client; gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene; the input gestures include three types, namely Air-tap, Navigation capture and Bloom;
Step C4. Voice recognition function of the client; voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application, and when the user speaks a keyword, the client application responds with the preset feedback behavior; in this embodiment, the specific operation instructions and response behaviors of voice recognition and gesture recognition are shown in Table 1.
TABLE 1 specific operation commands and response behaviors for speech recognition and gesture recognition
Step C5. Spatial mapping function of the client; spatial mapping superimposes the virtual world on the real world and is realized as follows;
Step C5.1. Scanning the environment around the user with the depth camera and environment perception cameras of the HoloLens holographic glasses and applying the built-in triangulation, so as to model and digitize the real world and obtain digital physical space information of the real world;
Step C5.2. Calculating in real time whether the obtained digital physical space can be used to place a virtual holographic object; by means of the spatial mapping function of the client, the spatial position of the virtual model is no longer restricted by the position of identification information in the real world;
Step C6. Dynamic loading function of the client model; dynamic loading of the model is realized by the HoloLens head-mounted augmented reality glasses accessing the server to load the virtual model. There are thousands of types of objects in daily life, and it is difficult to place all of them in an augmented reality application in advance; compared with a high-performance computer, the HoloLens is very limited in rendering capability, memory and performance, so it cannot carry a large number of models for loading. To address this problem, the invention loads models dynamically from the server: the virtual model and its scripts are packed into an AssetBundle package through the Unity3D engine and uploaded to the server, and according to the result identified by the server, the HoloLens accesses the server to download and decompress the AssetBundle package of the corresponding model, thereby realizing dynamic loading of the model.
Information transmission:
In the information transmission, the sending end processes the information, the information is transmitted using the UDP protocol, and the receiving end processes and restores the received information. The transmitted information is preprocessed according to the maximum number of bytes that can be transmitted at one time: the sending end obtains a picture by its absolute file path and then performs data encoding, data splitting and header insertion on the picture, where the header contains the file type, the file data length, the number of data packets and the data number. The receiving end decodes and reassembles the data according to the header information and checks whether the data is complete; if packets have been lost, it returns check information and requests the sending end to resend the lost packets according to the data numbers in the header. The UDP protocol header information is shown in FIG. 3.
The information processing of the sending end comprises the following steps:
Step F1. Encoding the information to be sent, encoding its type according to the information type, and inserting the encoded result into the file type field of the header;
Step F2. Counting the length of the encoded information, encoding this length, and inserting the encoded result into the file length field of the header;
Step F3. Dividing the encoded information equally into multiple groups, encoding the total number of groups, and inserting the encoded result into the data packet number field of the header;
Step F4. Numbering the divided data groups in sequence, encoding each number, and inserting the encoded result into the data number field of the header;
Step F5. Repeating steps F1 to F4 so that the split information is preprocessed and transmitted in sequence, ensuring that a large amount of information is not transmitted at the same time.
The information processing of the receiving end comprises the following steps:
Step R1: decode the header of the received data, classify the data according to the file type, the IP address and the source port number, and at the same time create a new thread to receive new information;
Step R2: create a receiving container according to the file length in the header; several files of the same type can be received at the same time;
Step R3: insert the data content at the corresponding position in the container according to the data number and the packet length (file length / total number of packets) in the header, thereby preserving the order of the file content;
Step R4: check the received content; if the container has empty positions, data packets have been lost, so feedback is sent to the IP address and source port number recorded in step R1, requesting that the sending end resend the information with the corresponding numbers, identified by the file type, the file length and the numbered positions of the empty container entries;
Step R5: repeat steps R1 to R4, then decode and rewrite the information in the container according to the file type, restoring the file.
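By way of non-limiting illustration, steps R1 to R5 can be sketched in Python as follows. The header layout (`HEADER`), the Base64 encoding, and the names `Reassembler` and `handle_datagram` are assumptions made for this sketch. The per-transfer thread of step R1 and the actual resend request of step R4 are omitted; `missing()` only reports which data numbers would have to be requested again.

```python
import base64
import struct

# Assumed header layout: file type, file length, data packet number, data number.
HEADER = struct.Struct("!BIHH")


class Reassembler:
    """Receiving container for one file (R2), filled by data number (R3)."""

    def __init__(self, file_length: int, total: int):
        # Assumes total >= 1, i.e. the sender always emits at least one packet.
        self.packet_length = -(-file_length // total)   # file length / total number of packets, rounded up
        self.container = bytearray(file_length)          # R2: container sized from the header
        self.received = [False] * total

    def insert(self, number: int, data: bytes) -> None:
        start = number * self.packet_length              # R3: position derived from the data number
        self.container[start:start + len(data)] = data
        self.received[number] = True

    def missing(self) -> list[int]:
        """R4: data numbers whose container positions are still empty."""
        return [n for n, ok in enumerate(self.received) if not ok]

    def restore(self) -> bytes:
        """R5: decode the container once nothing is missing."""
        if self.missing():
            raise ValueError("packets still missing: %s" % self.missing())
        return base64.b64decode(bytes(self.container))


def handle_datagram(datagram: bytes, source: tuple[str, int], transfers: dict) -> Reassembler:
    """R1: decode the header and route the data by (file type, IP, source port)."""
    type_code, file_length, total, number = HEADER.unpack(datagram[:HEADER.size])
    key = (type_code, source[0], source[1])
    if key not in transfers:
        transfers[key] = Reassembler(file_length, total)
    transfers[key].insert(number, datagram[HEADER.size:])
    return transfers[key]
```

Packets may arrive in any order; the container position depends only on the data number, so ordering is restored without sorting.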
The server side:
The architecture of the server is shown in Fig. 4. The server identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides the virtual model. In this embodiment, tanks, armored vehicles, fighter planes and the like are identified and classified according to the existing armor models; the server is built with Apache, and the existing virtual models are uploaded to it. The convolutional neural network trained by transfer learning is implemented through the following steps:
Step S1: establish a sample data set. Image samples of 15 types of tanks and armored vehicles are collected from the Internet; after sorting and labeling, 1444 image samples are obtained in total, and the sample types and their quantity distribution are shown in Fig. 5. To reduce overfitting during model training and to lessen the effect of the uneven quantity distribution across sample types on recognition, the sample images collected from the Internet are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored, in proportion to the sample quantities, to expand the sample data set, as shown in (a) to (e) of Fig. 6. After expansion the data set contains 9012 images, and it is finally made into the data set for tank and armor identification;
Step S2: train the model on the data set finally produced in step S1: 75% of the sample images from the different categories are randomly selected as the training data set, the remaining sample images are used as the test data set, and the model is trained for 48 iterations. The training process is as follows:
Step S2.1: pre-train an AlexNet network model on the ImageNet data set; this step initializes the parameters of the AlexNet network model;
Step S2.2: because the last three layers of the AlexNet network model are configured for 1000 classes, retrain the last three fully-connected layers to adapt them to the new classes; this step retains the parameters of the new fully-connected layers, which match the categories of the tank and armor identification data set established in step S1;
Step S2.3: combine the first five convolutional layers, together with their pooling layers, activation functions and model parameters from step S2.1, with the fully-connected layers and parameters from step S2.2, and fine-tune the result to complete the training of the model.
Step S3: judge the effect of model training by the loss value, the overfitting ratio and the accuracy of test data classification. The accuracy of test data classification is given by formula (1).
In formula (1), ExactQuantity is the number of correctly classified test data and TotalQuantity is the total number of test data; the higher the accuracy of test data classification, the better the classification performance of the network model.
The loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2).
Here 1{·} is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model.
The overfitting ratio is given by formula (3).
Here TrainAcc is the accuracy on the training data, given by formula (4).
In formula (4), TrainExactQuantity is the number of correctly classified training data and TrainTotalQuantity is the total number of training data. The closer the overfitting ratio is to 1, the better the generalization ability of the network model. The training results of the network model trained by transfer learning are shown in Fig. 7. The final test accuracy averages 97.51%, and the overfitting ratio of the model is essentially stable at about 1.03, which shows that the network model trained by this method generalizes well.
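Formulas (1) to (4) are not reproduced in this text. By way of non-limiting illustration, the three quantities of step S3 can be sketched in Python from the surrounding definitions. The accuracy follows directly from the definitions of the correct and total counts. The cross-entropy below is the standard Softmax form, and the overfitting ratio is assumed here to be the training accuracy divided by the test accuracy; both are assumptions of this sketch rather than the exact formulas of the specification. The function names are hypothetical.

```python
import math


def accuracy(exact_quantity: int, total_quantity: int) -> float:
    """Correctly classified samples divided by all samples (used for test and training data alike)."""
    return exact_quantity / total_quantity


def softmax_cross_entropy(scores: list[list[float]], labels: list[int]) -> float:
    """Mean of -log(softmax(scores_i)[y_i]).

    The indicator function 1{.} of the text selects, for each sample, only the
    term of its true class; indexing with labels[i] has the same effect.
    """
    total = 0.0
    for row, y in zip(scores, labels):
        peak = max(row)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[y]
    return total / len(labels)


def overfitting_ratio(train_acc: float, test_acc: float) -> float:
    """Assumed definition: training accuracy over test accuracy (1 means no gap)."""
    return train_acc / test_acc
```

Under this assumed definition, a ratio slightly above 1 means the model is marginally more accurate on the training data than on the test data.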
The system was then run in practice: the HoloLens was connected to Wi-Fi, information was exchanged with the server using its IP address and port number, and the functions of each part of the system were tested. The photograph taken by the HoloLens is shown in Fig. 8, and Fig. 9 shows the information as received by the server. As can be seen from Figs. 8 and 9, verifying the information at the application layer essentially eliminates picture packet loss. The classification result of the server is shown in Fig. 10, and Fig. 11 shows the dynamic loading of virtual information. As can be seen from Fig. 10, the convolutional neural network trained by transfer learning classifies accurately and quickly, and its strong generalization ability means that the recognized object is no longer limited to a specific identifier. In Fig. 11 the Type 99 tank model is loaded and moved onto a stand by means of spatial mapping (the dotted line in Fig. 9 is a tripod). As can be seen from Fig. 11, using spatial mapping for identification tracking removes the restriction imposed by the physical location of the identifier. With the C/S architecture, the virtual object models are uploaded to the server and dynamically loaded by the client, so the memory required by the client application is reduced and multiple models can be loaded and interacted with without updating the client.
Example 2:
Example 2 differs from Example 1 only in step S2.
In Example 2, 70% of the sample images from the different categories are randomly selected as the training data set, the remaining sample images are used as the test data set, and the model is trained for 40 iterations; the results are the same as in Example 1.
Example 3:
Example 3 differs from Example 1 only in step S2.
In Example 3, 80% of the sample images from the different categories are randomly selected as the training data set, the remaining sample images are used as the test data set, and the model is trained for 100 iterations; the results are the same as in Example 1.
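The three examples differ only in the training fraction (75%, 70%, 80%) and the iteration count of step S2. By way of non-limiting illustration, the random selection of training and test samples can be sketched in Python as follows, assuming the fraction is applied within each category. The function name `split_by_category`, the `(path, label)` sample representation and the fixed seed are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def split_by_category(samples, train_fraction, seed=0):
    """Randomly pick train_fraction of each category for training; the rest is the test set.

    samples is a list of (path, label) pairs; seed makes the split repeatable.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)                              # random selection within the category
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Usage with the fractions of Examples 1, 2 and 3 on a toy data set of 3 categories.
toy = [("img%d.jpg" % i, i % 3) for i in range(120)]
for fraction in (0.75, 0.70, 0.80):
    train_set, test_set = split_by_category(toy, fraction)
    print(fraction, len(train_set), len(test_set))
```

Splitting per category keeps the class proportions of the training and test sets equal to those of the expanded data set.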
The method is suitable for augmented reality applications that require a large number of virtual models, especially in the field of engineering models.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims (10)

1. A registration-free identification augmented reality method, characterized in that: the augmented reality method adopts a client-server architecture and uses the UDP protocol for information transmission; the client provides human-computer interaction, information acquisition and dynamic loading of the virtual model, and the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect.
2. The registration-free identification augmented reality method according to claim 1, characterized in that: the human-computer interaction, information acquisition and virtual model dynamic loading functions of the client comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.
3. The registration-free identification augmented reality method according to claim 2, characterized in that: the information acquisition function, the gaze recognition function, the gesture recognition function, the voice recognition function, the spatial mapping function and the dynamic loading function of the client are implemented through the following steps:
Step C1: the information acquisition function of the client: the target is photographed with the HoloLens holographic glasses and the photograph is stored in JPG format; the photograph is then converted into Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display;
Step C2: the gaze recognition function of the client: gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the physics raycast of the Unity3D engine, a ray cast according to the position and direction of the user's head returns a collision result when it hits a holographic object, the result comprising the position of the collision point and information about the object hit, so that holographic objects in the scene can be tracked and selected;
Step C3: the gesture recognition function of the client: gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene;
Step C4: the voice recognition function of the client: voice recognition is implemented by setting keywords and corresponding feedback behaviors in the client application; when the user speaks a keyword, the client application responds with the preset feedback behavior;
Step C5: the spatial mapping function of the client: spatial mapping superimposes the virtual world on the real world and is implemented as follows;
Step C5.1: the user's surroundings are scanned with the depth camera and the environment-sensing cameras of the HoloLens holographic glasses and, using built-in triangulation, the real world is modeled and digitized to obtain digital physical space information of the real world;
Step C5.2: whether the obtained digital physical space can hold a virtual holographic object is calculated in real time; with the spatial mapping function of the client, the spatial position of the virtual model is no longer restricted by the position of the identification information in the real world;
Step C6: the dynamic loading function of the client model: dynamic loading of the model is implemented by the HoloLens holographic glasses loading the virtual model through access to the server.
4. The registration-free identification augmented reality method according to claim 3, characterized in that: the input gestures in step C3 are of three types, namely Air-tap, Navigation and Bloom.
5. The registration-free identification augmented reality method according to claim 3, characterized in that: the virtual model stored on the server in step C6 is a compressed package in which the virtual model and its scripts have been packaged in advance into an AssetBundle by the Unity3D engine and uploaded to the server; according to the result identified by the server, the HoloLens holographic glasses access the server to download and decompress the AssetBundle package of the corresponding model, thereby implementing dynamic loading of the model.
6. The registration-free identification augmented reality method according to claim 1, characterized in that: the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model; the convolutional neural network trained by transfer learning is implemented through the following steps:
Step S1: establish a sample data set; a good sample data set is the basis of information classification and identification; sample images are obtained from the Internet and are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored, in proportion to the sample quantities, to expand the sample data set, which after expansion is finally made into the data set for information identification;
Step S2: train the model on the data set for information identification finally produced in step S1: 70% to 80% of the sample images from the different categories are randomly selected as the training data set, the remaining sample images are used as the test data set, and the model is trained for 40 to 100 iterations;
Step S3: judge the effect of model training by the loss value, the overfitting ratio and the accuracy of test data classification; the accuracy of test data classification is given by formula (1)
In formula (1), ExactQuantity is the number of correctly classified test data and TotalQuantity is the total number of test data; the higher the accuracy of test data classification, the better the classification performance of the network model;
the loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2)
where 1{·} is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model;
the overfitting ratio is given by formula (3)
where TrainAcc is the accuracy on the training data, given by formula (4)
where TrainExactQuantity is the number of correctly classified training data and TrainTotalQuantity is the total number of training data; the closer the overfitting ratio is to 1, the better the generalization ability of the network model.
7. The registration-free identification augmented reality method according to claim 6, characterized in that: the model training process of step S2 is:
Step S2.1: pre-train an AlexNet network model on the ImageNet data set; this step initializes the parameters of the AlexNet network model;
Step S2.2: because the last three layers of the AlexNet network model are configured for 1000 classes, retrain the last three fully-connected layers to adapt them to the new classes; this step retains the parameters of the new fully-connected layers, which match the categories of the data set established in step S1;
Step S2.3: combine the first five convolutional layers, together with their pooling layers, activation functions and model parameters from step S2.1, with the fully-connected layers and parameters from step S2.2, and fine-tune the result to complete the training of the model.
8. The registration-free identification augmented reality method according to claim 1, characterized in that: the information transmission processes the information at the sending end, transmits it using the UDP protocol, and processes and restores the received information at the receiving end; the information to be transmitted is preprocessed according to the maximum number of bytes that can be transmitted at one time; the sending end obtains the picture from the absolute path of the file and then encodes the data, splits the data and adds header information to it, the header information containing the file type, the file data length, the data packet number and the data number; the receiving end decodes and reassembles the data according to the header information and checks whether the data is complete; if packets have been lost, it returns check information requesting that the sending end resend the lost packets according to the data numbers in the header.
9. The registration-free identification augmented reality method according to claim 8, characterized in that: the information processing of the sending end comprises the following steps:
Step F1: encode the information to be sent, encode its information type, and insert the result into the file type field of the header;
Step F2: count the length of the encoded information, encode that length, and insert the result into the file length field of the header;
Step F3: divide the encoded information equally into groups, encode the total number of packets, and insert the result into the data packet number field of the header;
Step F4: number the divided data groups in sequence, encode each number, and insert the result into the data number field of the header;
Step F5: repeat steps F1 to F4 so that the split information is preprocessed and transmitted in sequence, which ensures that a large amount of information is not transmitted at the same time.
10. The registration-free identification augmented reality method according to claim 8, characterized in that: the information processing of the receiving end comprises the following steps:
Step R1: decode the header of the received data, classify the data according to the file type, the IP address and the source port number, and at the same time create a new thread to receive new information;
Step R2: create a receiving container according to the file length in the header; several files of the same type can be received at the same time;
Step R3: insert the data content at the corresponding position in the container according to the data number and the packet length (file length / total number of packets) in the header, thereby preserving the order of the file content;
Step R4: check the received content; if the container has empty positions, data packets have been lost, so feedback is sent to the IP address and source port number recorded in step R1, requesting that the sending end resend the information with the corresponding numbers, identified by the file type, the file length and the numbered positions of the empty container entries;
Step R5: repeat steps R1 to R4, then decode and rewrite the information in the container according to the file type, restoring the file.
CN201910467466.5A 2019-05-31 2019-05-31 Registration-free identification augmented reality method Active CN110211240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910467466.5A CN110211240B (en) 2019-05-31 2019-05-31 Registration-free identification augmented reality method

Publications (2)

Publication Number Publication Date
CN110211240A true CN110211240A (en) 2019-09-06
CN110211240B CN110211240B (en) 2022-10-21

Family

ID=67789867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910467466.5A Active CN110211240B (en) 2019-05-31 2019-05-31 Registration-free identification augmented reality method

Country Status (1)

Country Link
CN (1) CN110211240B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782525A (en) * 2019-11-08 2020-02-11 腾讯科技(深圳)有限公司 Method, apparatus and medium for identifying virtual object in virtual environment
CN112486322A (en) * 2020-12-07 2021-03-12 济南浪潮高新科技投资发展有限公司 Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition
CN114205429A (en) * 2021-12-14 2022-03-18 深圳壹账通智能科技有限公司 Voice packet processing method, system, equipment and storage medium based on UDP protocol
CN117689508A (en) * 2023-12-19 2024-03-12 杭州露电数字科技集团有限公司 Intelligent teaching aid method and system based on MR equipment

Also Published As

Publication number Publication date
CN110211240B (en) 2022-10-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant