CN110211240B - Registration-free identification augmented reality method - Google Patents


Info

Publication number
CN110211240B
Authority
CN
China
Prior art keywords
information
data
augmented reality
model
client
Prior art date
Legal status
Active
Application number
CN201910467466.5A
Other languages
Chinese (zh)
Other versions
CN110211240A (en)
Inventor
张元
张乐
王智豪
焦世超
田杰
马珩钧
Current Assignee
North University of China
Original Assignee
North University of China
Priority date
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN201910467466.5A priority Critical patent/CN110211240B/en
Publication of CN110211240A publication Critical patent/CN110211240A/en
Application granted granted Critical
Publication of CN110211240B publication Critical patent/CN110211240B/en

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (under G06F 18/00 Pattern recognition; G06F 18/20 Analysing; G06F 18/24 Classification techniques)
    • G06F 3/013: Eye tracking input arrangements (under G06F 3/01 Input arrangements for interaction between user and computer; G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality)
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or tap gestures based on pressure sensed by a digitiser
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback (under G06F 3/16 Sound input; Sound output)
    • G06T 19/006: Mixed reality (under G06T Image data processing or generation, in general; G06T 19/00 Manipulating 3D models or images for computer graphics)


Abstract

The invention belongs to the technical field of augmented reality and discloses an augmented reality method that requires no registration markers. The method transmits information over a client-server (C/S) architecture using the UDP (User Datagram Protocol) protocol: the client provides human-computer interaction, information acquisition and dynamic loading of virtual models, while the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and supplies the corresponding virtual model, thereby realizing the augmented reality effect. Because virtual object models are uploaded to the server and loaded dynamically by the client, many kinds of models can be loaded and interacted with without updating the client, while the memory required by the client application is reduced; this solves the traditional augmented reality problems of poor adaptability to new scenes, heavy dependence on markers and high development cost. The method is suitable for augmented reality applications that require a large number of virtual models, especially in the field of engineering models.

Description

Registration-free identification augmented reality method
Technical Field
The invention belongs to the technical field of augmented reality, and particularly relates to an augmented reality method free of registration markers.
Background
Augmented reality technology seamlessly integrates real-world and virtual-world information. Entity information (visual information, sound, taste, touch and the like) that is otherwise difficult to experience within a certain region of time and space in the real world is simulated and superimposed by computer and other technologies, so that virtual information is applied to the real world and perceived by the human senses, achieving a sensory experience beyond reality. The real environment and the virtual objects are superimposed on the same picture or space in real time and exist simultaneously.
Augmented reality is conventionally realized by performing three-dimensional registration in the real world, placing virtual information into the three-dimensional scene, and finally displaying it on a display device. Moreover, objective factors such as the viewing angle, distance and external lighting of the AR (Augmented Reality) device all affect the tracking and recognition of marker information and the quality of model loading.
Disclosure of Invention
Aiming at the problems that marker information and virtual information in traditional AR (Augmented Reality) technology must be placed in advance, and that recognition and tracking of markers are disturbed by external factors, which degrades the user experience, an augmented reality method free of registration markers is provided. The method dynamically loads virtual information for target information and is intended for use in augmented reality applications.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
An augmented reality method free of registration markers adopts a client-server (C/S) architecture and the UDP (User Datagram Protocol) protocol for information transmission; the client provides human-computer interaction, information acquisition and dynamic loading of virtual models, and the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect.
The virtual object models are uploaded to the server and loaded dynamically by the client, so that many kinds of models can be loaded and interacted with without updating the client, while the memory required by the client application is reduced;
the convolutional neural network trained by transfer learning has high recognition and classification accuracy and speed, and its strong generalization ability means that the recognized object is no longer limited to a specific marker;
for human-computer interaction, the model is manipulated through convenient gesture, voice and gaze modes, so that user operation is comfortable and natural.
Further, the client's human-computer interaction, information acquisition and dynamic model loading comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.
Still further, the information acquisition function, the gaze recognition function, the gesture recognition function, the voice recognition function, the spatial mapping function and the dynamic loading function of the client are realized by adopting the following steps:
Step C1, the information acquisition function of the client: a target is photographed with the HoloLens holographic glasses and the result is stored in JPG format; the photographed picture is converted into Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display;
Step C2, the gaze recognition function of the client: gaze recognition is based on eye-tracking technology and is used for tracking and selecting holographic objects. Using the Physics.Raycast physical ray of the Unity3D engine, a ray cast from the position and orientation of the user's head collides with a holographic object and returns feedback on the collision, including the position of the collision point and information about the object hit, thereby realizing tracking and selection of holographic objects in the scene; through the gaze recognition function the client application can select and move virtual objects.
Step C3, the gesture recognition function of the client: gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene;
Step C4, the voice recognition function of the client: voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application; when the user speaks a keyword, the client application responds with the preset feedback behavior;
Step C5, the spatial mapping function of the client: spatial mapping superimposes the virtual world on the real world and is realized by the following method;
Step C5.1, the depth camera and environment-sensing cameras of the HoloLens holographic glasses scan the environment around the user and, with built-in triangulation, model and digitize the real world, yielding digitized physical-space information of the real world;
Step C5.2, whether the obtained digitized physical space can accommodate a virtual holographic object is computed in real time. With the client's spatial mapping function, the spatial position of the virtual model is no longer constrained by the position of marker information in the real world; using spatial mapping in place of marker tracking removes the physical placement restriction of markers, so virtual information can be combined with the real world more accurately and reasonably.
Step C6, the dynamic model loading function of the client: dynamic loading of models is realized by having the HoloLens holographic glasses load virtual models by accessing the server.
Further, the input gestures in step C3 comprise three types: Air-tap, navigation capture and Bloom.
Furthermore, the virtual models stored on the server in step C6 are compressed packages that were packed in advance, together with their scripts, into AssetBundles by the Unity3D engine and uploaded to the server. According to the result identified by the server, the HoloLens holographic glasses access the server to download and decompress the AssetBundle package of the corresponding model, thereby realizing dynamic loading of the model. Objects in daily life come in thousands of kinds, so it is difficult to place all of them in the client application in advance; moreover, compared with a high-performance computer, the rendering capability, memory and performance of the holographic glasses are very limited, so head-mounted augmented reality glasses cannot carry a large number of models for loading. For these reasons the method loads models dynamically from the server.
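As an illustration of this flow, the sketch below shows the download-and-cache step in Python; the server address, the one-bundle-per-class layout and the helper name are hypothetical, and a real HoloLens client would do the equivalent inside Unity with its AssetBundle APIs rather than in Python:

```python
import urllib.request
from pathlib import Path

# Hypothetical layout: the server exposes one AssetBundle per class name,
# e.g. http://<server>/bundles/type99_tank.assetbundle
SERVER = "http://192.0.2.10/bundles"   # placeholder server address
CACHE = Path("bundle_cache")

def fetch_bundle(class_name: str) -> Path:
    """Download (and cache) the bundle matching the recognition result."""
    CACHE.mkdir(exist_ok=True)
    local = CACHE / f"{class_name}.assetbundle"
    if not local.exists():             # fetch each model only on first use
        urllib.request.urlretrieve(f"{SERVER}/{class_name}.assetbundle", local)
    return local
```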
Furthermore, the server identifies and classifies the received information through the convolutional neural network trained by transfer learning and provides the virtual model; the transfer-learning-trained convolutional neural network is realized by the following steps:
Step S1, establishing a sample data set: a good sample data set is the basis of information classification and recognition. Sample images are obtained over the internet and, in proportion to the class counts, are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored to expand the sample data set; after expansion, the data set for information recognition is finally produced (sketches of these expansion operations and of the evaluation criteria of step S3 appear after this list of steps);
Step S2, model training is performed on the information-recognition data set produced in step S1: 70-80% of the sample images of each category are randomly selected as the training data set, the remaining sample images are used as the test data set, and model training is run for 40-100 iterations;
Step S3, the effect of model training is judged by the loss value, the overfitting ratio and the classification accuracy on the test data; the classification accuracy on the test data is shown in formula (1)

$$\mathrm{TestAcc}=\frac{\mathrm{ExactQuantity}}{\mathrm{TotalQuantity}}\tag{1}$$

In formula (1), ExactQuantity represents the number of correctly classified test samples and TotalQuantity the total number of test samples; the higher the classification accuracy on the test data, the better the classification effect of the network model;
The loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2)

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y_i=j\}\log\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}\tag{2}$$

where the predicted probability that sample $x_i$ belongs to class $j$ is

$$p(y_i=j\mid x_i;\theta)=\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}$$

and $1\{y_i=j\}$ is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model;
The overfitting ratio is shown in formula (3)

$$\mathrm{OverfitRatio}=\frac{\mathrm{TrainAcc}}{\mathrm{TestAcc}}\tag{3}$$
where TrainAcc represents the accuracy on the training data, as shown in formula (4)

$$\mathrm{TrainAcc}=\frac{\mathrm{TrainExactQuantity}}{\mathrm{TrainTotalQuantity}}\tag{4}$$

where TrainExactQuantity is the number of correctly classified training samples and TrainTotalQuantity the total number of training samples. The closer the overfitting ratio is to 1, the better the generalization ability of the network model.
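To make steps S1 and S3 concrete, two minimal Python sketches follow. The first applies the four expansion operations of step S1 to a single image; it assumes the Pillow library is available, and the file paths are illustrative:

```python
from pathlib import Path
from PIL import Image, ImageOps

def expand_sample(path: str, out_dir: str = "expanded") -> None:
    """Apply the four expansion operations of step S1 to one image."""
    img = Image.open(path)
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    variants = {
        "rot90": img.rotate(90, expand=True),
        "rot180": img.rotate(180),
        "mirror_h": ImageOps.mirror(img),  # horizontal mirror
        "mirror_v": ImageOps.flip(img),    # vertical mirror
    }
    for tag, v in variants.items():
        v.save(out / f"{Path(path).stem}_{tag}.jpg")
```

The second computes the three evaluation criteria of step S3 exactly as formulas (1) to (4) define them; the function names are our own:

```python
import math

def accuracy(correct: int, total: int) -> float:
    """Formulas (1) and (4): ExactQuantity / TotalQuantity."""
    return correct / total

def overfit_ratio(train_acc: float, test_acc: float) -> float:
    """Formula (3): values near 1 indicate good generalization."""
    return train_acc / test_acc

def softmax_cross_entropy(logits, labels) -> float:
    """Formula (2): mean negative log-probability of the true class."""
    loss = 0.0
    for scores, y in zip(logits, labels):
        m = max(scores)                          # stabilize the exponentials
        exps = [math.exp(s - m) for s in scores]
        loss -= math.log(exps[y] / sum(exps))
    return loss / len(labels)
```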
Further, the model training process of step S2 is:
Step S2.1, an AlexNet network model pre-trained on the ImageNet data set is used; this step initializes the parameters of the AlexNet network model;
Step S2.2, because the last three layers of the AlexNet network model are configured for 1000 classes, the last three fully-connected layers are retrained to fit the new classes; this step yields new fully-connected-layer parameters fitting the categories of the data set established in step S1;
Step S2.3, the first five convolutional layers with their corresponding pooling layers, activation functions and model parameters from step S2.1 are combined with the fully-connected layers and parameters from step S2.2, and fine-tuning is performed to complete the training of the model. A sketch of this procedure follows.
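The sketch below illustrates steps S2.1 to S2.3, using PyTorch/torchvision as an assumed stand-in for whatever framework the inventors used; the class count and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 15  # illustrative; embodiment 1 uses 15 tank/armor categories

# Step S2.1: AlexNet pre-trained on ImageNet initializes all parameters.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Step S2.2: the stock classifier ends in a 1000-way layer, so the three
# fully-connected layers are rebuilt for the new categories.
model.classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, NUM_CLASSES),
)

# Step S2.3: fine-tune the whole network with a small learning rate; the
# five convolutional blocks keep their pre-trained weights as initialization.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()  # Softmax cross-entropy, formula (2)
```

Freezing the convolutional layers instead of fine-tuning them is a common variant when the new data set is small.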
Furthermore, information transmission consists of processing the information at the sending end, transmitting it with the UDP protocol, and processing and restoring the received information at the receiving end. The information to be transmitted is preprocessed according to the maximum number of bytes that can be sent in one datagram: the sending end obtains a picture from its absolute file path, then encodes the data, cuts it into segments and adds header information carrying the file type, the file data length, the number of data packets and the data number. The receiving end decodes and reassembles the data according to the header information and checks whether any data is missing; if packets were lost, it returns check information and asks the sending end to resend the lost packets according to the data numbers in the header.
Further, the information processing at the sending end comprises the following steps (a minimal sketch follows this list):
Step F1, the information to be sent is encoded, the type content is encoded according to the information type, and the encoding result is inserted into the file-type field of the header;
Step F2, the length of the encoded information is counted, the counted result is encoded, and the encoding result is inserted into the file-length field of the header;
Step F3, the encoded information is divided into equal groups, the total number of groups is encoded, and the encoding result is inserted into the packet-count field of the header;
Step F4, the divided data groups are numbered in sequence, the numbers are encoded, and the encoding result is inserted into the data-number field of the header;
Step F5, steps F1 to F4 are repeated and the divided information is preprocessed and sent in sequence, so that a large amount of information is not sent all at once.
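The following minimal Python sketch is one way to realize steps F1 to F5; the header layout (a one-byte type code, a four-byte file length, and two-byte packet count and packet number) and the payload size are assumptions, since the field widths are not fixed above:

```python
import socket
import struct

MAX_PAYLOAD = 1024               # assumed per-datagram payload budget
TYPE_JPG = 1                     # hypothetical code for the file-type field
HEADER = struct.Struct("!BIHH")  # type, file length, packet count, number

def send_file(data: bytes, addr, sock) -> None:
    """Steps F1-F5: frame the data and send the groups in sequence."""
    chunks = [data[i:i + MAX_PAYLOAD] for i in range(0, len(data), MAX_PAYLOAD)]
    for number, chunk in enumerate(chunks):
        header = HEADER.pack(TYPE_JPG, len(data), len(chunks), number)
        sock.sendto(header + chunk, addr)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open("photo.jpg", "rb") as f:           # illustrative capture result
        send_file(f.read(), ("192.0.2.10", 9000), sock)
```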
Further, the information processing at the receiving end comprises the following steps (a minimal sketch follows this list):
Step R1, the header of the received data is decoded and classified according to the file type, IP and source port number, while a new thread is created to receive new information;
Step R2, a receiving container is created according to the file length in the header; several files of the same type can be received concurrently;
Step R3, the data content is inserted at the corresponding position in the container according to the data number in the header and the packet length (file length / total number of packets), which guarantees the order of the file content;
Step R4, the received content is verified; empty entries in the container indicate lost packets, feedback is sent via the IP and source port number recorded in step R1, and the sending end is asked, using the file type, the file length and the numbered positions of the container's empty entries, to resend the information with the corresponding numbers;
Step R5, steps R1 to R4 are repeated, the information in the container is decoded and rewritten according to its file type, and the file is restored.
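A matching receive-side sketch for steps R2 to R4 under the same assumed header layout; a full implementation would also key containers by file type, IP and source port and spawn a receiving thread per sender, as step R1 describes:

```python
import socket
import struct

HEADER = struct.Struct("!BIHH")  # must match the sender's assumed layout

def receive_file(sock, timeout: float = 2.0):
    """Steps R2-R4: reassemble one file and report any lost packet numbers."""
    sock.settimeout(timeout)
    slots, total = {}, None
    try:
        while total is None or len(slots) < total:
            packet, addr = sock.recvfrom(65535)
            ftype, flen, total, number = HEADER.unpack(packet[:HEADER.size])
            slots[number] = packet[HEADER.size:]  # step R3: order by number
    except socket.timeout:
        pass  # stop waiting; remaining gaps are treated as lost packets
    if total is None:
        return None, []
    missing = [n for n in range(total) if n not in slots]
    if missing:
        return None, missing  # step R4: ask the sender to resend these numbers
    return b"".join(slots[n] for n in range(total)), []
```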
Compared with the prior art, the invention has the following beneficial effects:
1. The invention adopts an augmented reality method free of registration markers: virtual object models are uploaded to a server and loaded dynamically by the client, which reduces the memory required by the client application and allows many kinds of models to be loaded and interacted with without updating the client. This solves the traditional augmented reality problems of poor adaptability to new scenes, heavy dependence on markers and high development cost.
2. The trained convolutional neural network recognizes and classifies with high accuracy and speed, and its good generalization ability means the recognized object is no longer limited to a specific marker.
3. Training the convolutional neural network by transfer learning greatly reduces the required number of samples and the training time, so the network model can be adapted to an application scene conveniently and rapidly to provide classification services.
4. Spatial mapping replaces the marker tracking of traditional augmented reality methods, removing the physical placement restriction of markers so that virtual information can be combined with the real world more accurately and reasonably.
5. For human-computer interaction, the model is manipulated through convenient gesture, voice and gaze modes, so user operation is comfortable and natural.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a diagram of a client architecture;
FIG. 3 is a UDP protocol header information diagram;
FIG. 4 is a server side architecture diagram;
FIG. 5 is the sample data set of embodiment 1;
FIG. 6 is the sample expansion method of embodiment 1;
FIG. 7 shows the model training results of embodiment 1;
FIG. 8 is a photograph of a Type 99 tank taken in embodiment 1;
FIG. 9 shows the information transmission of embodiment 1;
FIG. 10 shows the information recognition and classification of embodiment 1;
FIG. 11 shows the dynamic loading of virtual information in embodiment 1.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without creative effort, based on the embodiments of the present invention, fall within the protection scope of the present invention.
Example 1:
As shown in fig. 1, this embodiment implements an augmented reality method free of registration markers using a client-server architecture with the UDP protocol for information transmission. The client provides human-computer interaction, information acquisition and dynamic loading of virtual models; the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and supplies the virtual model, achieving the augmented reality effect. The client's human-computer interaction, information acquisition and dynamic model loading comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.
The following describes the function implementation of the client, the server, and the information transmission in detail.
A client:
The architecture of the client is shown in fig. 2; the functions of the client are realized by the following steps:
Step C1, the information acquisition function of the client: a target is photographed with the HoloLens holographic glasses and the result is stored in JPG format; the photographed picture is converted into Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display, so that the photographing result can be checked directly on the UI and poor photographs retaken.
Step C2, the gaze recognition function of the client: gaze recognition is based on eye-tracking technology and is used for tracking and selecting holographic objects. Using the Physics.Raycast physical ray of the Unity3D engine, a ray cast from the position and orientation of the user's head collides with a holographic object and returns feedback on the collision, including the position of the collision point and information about the object hit, thereby realizing tracking and selection of holographic objects in the scene; the client application uses the gaze recognition function to select and move virtual objects;
Step C3, the gesture recognition function of the client: gesture recognition recognizes and tracks the position and state of the user's hand while capturing input gestures, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene; the input gestures comprise three types: Air-tap, navigation capture and Bloom.
Step C4, the voice recognition function of the client: voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application; when the user speaks a keyword, the client application responds with the preset feedback behavior. In this embodiment, the specific operation instructions and response behaviors for voice recognition and gesture recognition are shown in Table 1.
TABLE 1 specific operating instructions and response behaviors for speech recognition and gesture recognition
Step C5, the spatial mapping function of the client: spatial mapping superimposes the virtual world on the real world and is realized by the following method;
Step C5.1, the depth camera and environment-sensing cameras of the HoloLens holographic glasses scan the environment around the user and, with built-in triangulation, model and digitize the real world, yielding digitized physical-space information of the real world;
Step C5.2, whether the obtained digitized physical space can accommodate a virtual holographic object is computed in real time; with the client's spatial mapping function, the spatial position of the virtual model is no longer limited by the position of marker information in the real world;
Step C6, the dynamic model loading function of the client: dynamic loading of models is realized by having the HoloLens head-mounted augmented reality glasses load virtual models by accessing the server. Objects in daily life come in thousands of kinds, and it is difficult to place them all in an augmented reality application in advance; compared with a high-performance computer, the HoloLens has very limited rendering capability, memory and performance, so it cannot carry a large number of models for loading. To address this, the invention loads models dynamically from the server: the virtual models and scripts are packed into AssetBundle compressed packages by the Unity3D engine and uploaded to the server, and the HoloLens accesses the server to download and decompress the AssetBundle package of the model corresponding to the server's recognition result, realizing dynamic loading of the model.
Information transmission:
Information transmission consists of processing the information at the sending end, transmitting it with the UDP protocol, and processing and restoring the received information at the receiving end. The information to be transmitted is preprocessed according to the maximum number of bytes that can be sent in one datagram: the sending end obtains a picture from its absolute file path, then encodes the data, cuts it into segments and adds header information carrying the file type, the file data length, the number of data packets and the data number. The receiving end decodes and reassembles the data according to the header information and checks whether any data is missing; if packets were lost, it returns check information and asks the sending end to resend the lost packets according to the data numbers in the header. The UDP protocol header information is shown in fig. 3.
The information processing at the sending end comprises the following steps:
Step F1, the information to be sent is encoded, the type content is encoded according to the information type, and the encoding result is inserted into the file-type field of the header;
Step F2, the length of the encoded information is counted, the counted result is encoded, and the encoding result is inserted into the file-length field of the header;
Step F3, the encoded information is divided into equal groups, the total number of groups is encoded, and the encoding result is inserted into the packet-count field of the header;
Step F4, the divided data groups are numbered in sequence, the numbers are encoded, and the encoding result is inserted into the data-number field of the header;
Step F5, steps F1 to F4 are repeated and the divided information is preprocessed and sent in sequence, so that a large amount of information is not sent all at once.
The information processing at the receiving end comprises the following steps:
Step R1, the header of the received data is decoded and classified according to the file type, IP and source port number, while a new thread is created to receive new information;
Step R2, a receiving container is created according to the file length in the header; several files of the same type can be received concurrently;
Step R3, the data content is inserted at the corresponding position in the container according to the data number in the header and the packet length (file length / total number of packets), which guarantees the order of the file content;
Step R4, the received content is checked; empty entries in the container indicate lost packets, feedback is sent via the IP and source port number recorded in step R1, and the sending end is asked, using the file type, the file length and the numbered positions of the container's empty entries, to resend the information with the corresponding numbers;
Step R5, steps R1 to R4 are repeated, the information in the container is decoded and rewritten according to its file type, and the file is restored.
The server side:
The architecture of the server is shown in fig. 4. The server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model. In this embodiment, tanks, armored vehicles, fighter aircraft and the like are identified and classified against existing armor models; the server is built with Apache, and the existing virtual models are uploaded to it. The transfer-learning-trained convolutional neural network is realized by the following steps:
Step S1, establishing a sample data set: a good sample data set is the basis of information classification and recognition. Fifteen categories of tank and armored-vehicle image samples were collected over the internet; after sorting and labeling, 1444 image samples were obtained, whose category and quantity distribution are shown in fig. 5. To reduce and avoid overfitting during model training and to lessen the impact of uneven sample counts across categories on recognition, the collected sample images were rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored, in proportion to the class counts, to expand the sample data set, as shown in (a)-(e) of fig. 6. After expansion the data set reached 9012 images and was finally made into the data set for tank-armor recognition;
Step S2, model training is performed on the data set produced in step S1: 75% of the sample images of each category are randomly selected as the training data set, the remaining sample images are used as the test data set, and model training is run for 48 iterations; the training process is as follows:
Step S2.1, an AlexNet network model pre-trained on the ImageNet data set is used; this step initializes the parameters of the AlexNet network model;
Step S2.2, because the last three layers of the AlexNet network model are configured for 1000 classes, the last three fully-connected layers are retrained to fit the new classes; this step yields new fully-connected-layer parameters fitting the categories of the tank-armor recognition data set established in step S1;
Step S2.3, the first five convolutional layers with their corresponding pooling layers, activation functions and model parameters from step S2.1 are combined with the fully-connected layers and parameters from step S2.2, and fine-tuning is performed to complete the training of the model.
S3, judging the effect of model training through the loss value, the overfitting ratio and the accuracy of test data classification; wherein, the accuracy rate of the classification of the test data is shown as the formula (1)
Figure BDA0002079865140000131
In formula (1), exact quantity represents the correct number of test data classification results, and TotalQuantity represents the total number of test data; the higher the accuracy of the classification of the test data is, the better the classification effect of the network model is.
The loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2)

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y_i=j\}\log\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}\tag{2}$$

where the predicted probability that sample $x_i$ belongs to class $j$ is

$$p(y_i=j\mid x_i;\theta)=\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}$$

and $1\{y_i=j\}$ is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model.
The overfitting ratio is shown in formula (3)

$$\mathrm{OverfitRatio}=\frac{\mathrm{TrainAcc}}{\mathrm{TestAcc}}\tag{3}$$

where TrainAcc represents the accuracy on the training data, as shown in formula (4)

$$\mathrm{TrainAcc}=\frac{\mathrm{TrainExactQuantity}}{\mathrm{TrainTotalQuantity}}\tag{4}$$
In the formula, TrainExactQuantity is the number of correctly classified training samples and TrainTotalQuantity the total number of training samples; the closer the overfitting ratio is to 1, the better the generalization ability of the network model. The training results of the network model obtained through transfer learning are shown in fig. 7: the mean final test accuracy is 97.51%, and the overfitting ratio of the model stabilizes at about 1.03, showing that the network model trained by this method generalizes well.
The system was then put into actual operation: the HoloLens was connected to Wi-Fi, information transmission with the server was established via the IP address and port number, and the functions of each part of the system were tested. The HoloLens photographing results are shown in fig. 8, and fig. 9 shows the server receiving the information. As figs. 8 and 9 show, verifying the information at the application layer essentially eliminates picture packet loss. The server's classification result is shown in fig. 10, and fig. 11 shows the dynamic loading of virtual information. As fig. 10 shows, the convolutional neural network trained by transfer learning recognizes and classifies with high accuracy and speed, and its strong generalization ability means the recognized object is no longer limited to a specific marker. In fig. 11 a Type 99 tank model is loaded and moved onto a stand by the spatial mapping method (the dotted line in fig. 9 is a tripod). As fig. 11 shows, using spatial mapping for tracking removes the physical placement restriction of markers. With the C/S architecture, virtual object models are uploaded to the server and loaded dynamically by the client, so many kinds of models can be loaded and interacted with without updating the client while reducing the memory required by the client application.
Example 2:
Embodiment 2 differs from embodiment 1 only in step S2.
In embodiment 2, 70% of the sample images of each category are randomly selected as the training data set, the remaining sample images are used as the test data set, and model training is run for 40 iterations; the results are the same as in embodiment 1.
Example 3:
Embodiment 3 differs from embodiment 1 only in step S2.
In embodiment 3, 80% of the sample images of each category are randomly selected as the training data set, the remaining sample images are used as the test data set, and model training is run for 100 iterations; the results are the same as in embodiment 1.
The method is suitable for augmented reality application requiring a large number of virtual models, especially in the field of engineering models.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the embodiments can still be modified, and some of their features can be replaced by equivalents, without departing from the spirit and scope of the invention.

Claims (8)

1. A registration-free identification augmented reality method is characterized in that: the augmented reality method adopts a client-server architecture and adopts a UDP protocol for information transmission; the client provides the functions of man-machine interaction, information acquisition and dynamic loading of the virtual model, and the server identifies and classifies the received information through a convolutional neural network of transfer learning training to provide the virtual model, so that the effect of augmented reality is realized;
the human-computer interaction, information acquisition and dynamic loading functions of the virtual model of the client comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a space mapping function and a dynamic loading function, and are realized by adopting the following steps:
c1, an information acquisition function of the client; shooting a target by utilizing holographic glasses HoloLens, and storing a shooting result in JPG format; converting the photographed picture into a Sprite format and loading the converted photographed picture to a UI carrier provided by a Unity3D engine for displaying;
c2, a gaze identification function of the client; the gaze identification is based on an eye tracking technology and is used for tracking and selecting a holographic object, and the feedback of a collision result including the position of a collision point and the information of the collision object is obtained after a Physics.Raycast physical ray of the Unity3D engine, cast according to the position and direction of the user's head, collides with the holographic object, so that the tracking and the selection of the holographic object in a scene are realized;
c3, a gesture recognition function of the client side; the gesture recognition is to capture input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers corresponding feedback to manipulate virtual objects in the scene;
c4, a voice recognition function of the client; the voice recognition is realized by setting keywords and corresponding feedback behaviors in a client application program, and when a user speaks the keywords, the client application program responds to the preset feedback behaviors;
c5, developing a space mapping function of the client; the space mapping is realized by superposing a virtual world and a real world and adopting the following method;
step C5.1, scanning the environmental data around the user and built-in triangulation by using a depth camera and an environmental perception camera which are equipped with the holographic glasses HoloLens so as to realize the modeling and digitization of the real world and obtain the digital physical space information of the real world;
c5.2, calculating whether the obtained digital physical space can be used for placing a virtual holographic object in real time; by means of the space mapping function of the client, the space position of the virtual model is not restricted by the position of the identification information in the real world any more;
c6, a dynamic loading function of the client model; the dynamic loading of the model is realized by adopting a method that holographic glasses HoloLens loads a virtual model by accessing a server.
2. The registration-free identity augmented reality method according to claim 1, wherein: the input gesture in the step C3 comprises three types of Air-tap, navigation capture and Bloom.
3. The augmented reality method of claim 1, wherein the augmented reality method comprises: the virtual model stored in the server in the step C6 is a compressed package which is obtained by packing the virtual model and the script into an AssetBundle in advance through a Unity3D engine and uploading the compressed package to the server; and the holographic glasses HoloLens accesses the server to download and decompress the AssetBundle compression packet of the corresponding model according to the result identified by the server, so that the dynamic loading of the model is realized.
4. The augmented reality method of claim 1, wherein the augmented reality method comprises: the server side identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides a virtual model; the convolutional neural network for transfer learning training is specifically realized by the following steps:
s1, establishing a sample data set; the good sample data set is the basis of information classification and identification, sample images are obtained through an internet channel, the sample images are rotated by 90 degrees, rotated by 180 degrees, subjected to horizontal mirror image and vertical mirror image operation to expand the sample data set according to the number proportion, and finally the sample data set for information identification is manufactured after expansion;
s2, performing model training on the data set for information identification finally manufactured in the step S1, randomly selecting 70-80% of sample images from different categories as training data sets, using the rest sample images as test data sets, and performing model training with the iteration times of 40-100;
S3, judging the effect of model training through the loss value, the overfitting ratio and the accuracy of test data classification; wherein the accuracy of the classification of the test data is shown in formula (1)

$$\mathrm{TestAcc}=\frac{\mathrm{ExactQuantity}}{\mathrm{TotalQuantity}}\tag{1}$$

in formula (1), ExactQuantity represents the number of correctly classified test data and TotalQuantity represents the total number of test data; the higher the accuracy of the classification of the test data, the better the classification effect of the network model;
the loss value is obtained by the cross-entropy loss function of Softmax, as shown in formula (2)

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y_i=j\}\log\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}\tag{2}$$

wherein

$$p(y_i=j\mid x_i;\theta)=\frac{e^{\theta_j^{T}x_i}}{\sum_{l=1}^{k}e^{\theta_l^{T}x_i}}$$

and $1\{y_i=j\}$ is an indicator function whose value is 1 when the expression inside the braces is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model;
the overfitting ratio is shown in formula (3)

$$\mathrm{OverfitRatio}=\frac{\mathrm{TrainAcc}}{\mathrm{TestAcc}}\tag{3}$$

wherein TrainAcc represents the accuracy of the training data, as shown in formula (4)

$$\mathrm{TrainAcc}=\frac{\mathrm{TrainExactQuantity}}{\mathrm{TrainTotalQuantity}}\tag{4}$$
Wherein, trainExactQuantity is the correct number of the classification results of the training data, and TrainTotalQuantity is the total number of the training data; the closer the overfitting ratio is to 1, the better the generalization ability of the network model is represented.
5. The registration-free identity augmented reality method of claim 4, wherein: the model training process of the step S2 comprises the following steps:
step S2.1, pre-training an AlexNet network model on a data set of ImageNet, and initializing parameters of the AlexNet network model through the step;
step S2.2 because the last three layers of the AlexNet network model are configured to 1000 classes, retraining the last three fully connected layers to adapt to the new classes, and retaining the parameters of the new fully connected layers through this step to adapt to the class of the data set established in step S1;
and step S2.3, combining the first five convolutional layers and the corresponding pooling layers, the activation functions and the model parameters in the step S2.1 with the fully-connected layers and the parameters in the step S2.2, and performing fine tuning to finish the training of the model.
6. The augmented reality method of claim 1, wherein the augmented reality method comprises: the information transmission is to process the information of the sending end, transmit the information by adopting a UDP protocol, and process and restore the received information by the receiving end; the method comprises the steps of preprocessing transmitted information according to the maximum byte number which can be transmitted once, acquiring a picture at a transmitting end according to a file absolute path, then carrying out data coding, data cutting and operation of adding header information to the picture, adding a file type, a file data length, a data packet number and a data number in the header information, finishing data decoding and recombination by a receiving end according to the data header information, checking whether data are available or not, returning check information and applying the transmitting end to resend lost packet information according to the header data number if the lost packet exists.
7. The augmented reality method of claim 6, wherein the augmented reality method comprises: the information processing of the sending end comprises the following steps:
step F1, coding the sent information, coding the type content according to the information type of the sent information, and inserting the coding result into the file type of the header;
step F2, counting the result length of the coded transmitted information, coding the content of the counted result, and inserting the coded result into the file length of the header;
step F3, equally dividing the coded result of the transmitted information into a plurality of groups, coding the content of the total number of the groups, and inserting the coded result into the data group number of the header;
step F4, numbering the divided data groups in sequence, coding the numbering content, and inserting the coding result into the data number of the header;
and F5, repeating the steps F1 to F4, and sequentially preprocessing and sending the segmented information to ensure that a large amount of information cannot be sent at the same time.
8. The registration-free identity augmented reality method of claim 6, wherein: the information processing of the receiving end comprises the following steps:
r1, decoding the received data header data, classifying according to the file type, the IP and the source port number, and simultaneously creating a new thread to receive new information;
step R2, a receiving container is created according to the length of the header file, and a plurality of file lengths are received at the same time, wherein the file lengths represent a plurality of files of the same type;
step R3, inserting the data content into the corresponding position in the container according to the data number and the packet length in the header, thereby ensuring the sequence of the file content;
step R4, checking the received content, if the container has empty information, indicating data packet loss, feeding back through the IP and the source port number recorded in the step R1, and applying the sending end to resend the information corresponding to the number through the file type, the file length and the numbering position of the empty information of the container;
and R5, repeating the steps R1 to R4, correspondingly decoding and rewriting the information in the container according to the file type, and restoring the file.
CN201910467466.5A 2019-05-31 2019-05-31 Registration-free identification augmented reality method Active CN110211240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910467466.5A CN110211240B (en) 2019-05-31 2019-05-31 Registration-free identification augmented reality method


Publications (2)

Publication Number Publication Date
CN110211240A (en) 2019-09-06
CN110211240B (en) 2022-10-21

Family

ID=67789867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910467466.5A Active CN110211240B (en) 2019-05-31 2019-05-31 Registration-free identification augmented reality method

Country Status (1)

Country Link
CN (1) CN110211240B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant