CN110211240A

CN110211240A - Registration-free identification augmented reality method

Info

Publication number: CN110211240A
Application number: CN201910467466.5A
Authority: CN
Inventors: 张元�; 张乐; 王智豪; 焦世超; 田杰; 马珩钧
Original assignee: North University of China
Current assignee: North University of China
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2019-09-06
Anticipated expiration: 2039-05-31
Also published as: CN110211240B

Abstract

The invention belongs to the field of augmented reality and discloses a registration-identifier-free augmented reality method. The method adopts a C/S (client-server) architecture and uses the UDP protocol for information transmission. The client provides human-computer interaction, information acquisition and dynamic loading of virtual models; the server identifies and classifies the received information with a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect. Virtual object models are uploaded to the server and dynamically loaded by the client, which reduces the memory required by the client application and allows multiple models to be loaded and interacted with without updating the client. This solves the problems of traditional augmented reality: poor adaptability to new scenes, heavy dependence on identifiers, and high development requirements. The method is suitable for augmented reality applications that require a large number of virtual models, especially in the field of engineering models.

Description

Registration-free identification augmented reality method

Technical Field

The invention belongs to the technical field of augmented reality, and particularly relates to a registration-identifier-free augmented reality method.

Background

Augmented reality is a technology that seamlessly integrates real-world and virtual-world information. Entity information that is difficult to experience within a certain range of time and space in the real world (visual information, sound, taste, touch and the like) is simulated by computers and other technologies and then superimposed, so that virtual information is applied to the real world and perceived by the human senses, producing a sensory experience beyond reality. The real environment and the virtual objects are superimposed on the same picture or space in real time and exist simultaneously.

The Augmented Reality method is specifically realized by performing three-dimensional registration in the real world, placing virtual information into a three-dimensional site, and finally displaying the virtual information by display equipment. In addition, objective factors such as an angle, a distance, and external light of an AR (augmented reality) device all affect the tracking recognition of the identification information and the effect of model loading.

Disclosure of Invention

To address the problems that, in traditional AR (augmented reality) technology, identification information and virtual information must be placed in advance and that the recognition and tracking of the identification information is disturbed by external factors, degrading the user experience, a registration-identifier-free augmented reality method is provided. The method is suitable for dynamically loading the virtual information of target information in augmented reality applications.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

A registration-identifier-free augmented reality method adopts a C/S (client-server) architecture and uses the UDP (User Datagram Protocol) protocol for information transmission; the client provides human-computer interaction, information acquisition and dynamic loading of the virtual model, and the server identifies and classifies the received information through a convolutional neural network trained by transfer learning and provides the virtual model, thereby achieving the augmented reality effect.

The virtual object model is uploaded to the server and dynamically loaded by the client, so that the loading and interaction of various models can be realized without updating the client while the memory required by the application of the client is reduced;

the convolutional neural network of the transfer learning training has high recognition and classification precision and high speed, and the recognition object is not limited to a specific identifier any more due to the high generalization capability of the convolutional neural network;

in the aspect of man-machine interaction, the model is interacted in a convenient and rapid gesture, voice and staring mode, and the user operation is comfortable and natural.

Further, the human-computer interaction, information acquisition and dynamic loading functions of the virtual model of the client comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a spatial mapping function and a dynamic loading function.

Still further, the information acquisition function, the gaze recognition function, the gesture recognition function, the voice recognition function, the spatial mapping function and the dynamic loading function of the client are realized by adopting the following steps:

Step C1: information acquisition function of the client; the target is photographed with the HoloLens holographic glasses and the result is stored in JPG format; the photograph is converted into the Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display;

Step C2: gaze recognition function of the client; gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the Physics.Raycast physical ray of the Unity3D engine, a ray cast from the position and direction of the user's head returns a collision result after hitting a holographic object, including the position of the collision point and information about the collided object, so that holographic objects in the scene can be tracked and selected; the client application can select and move virtual objects through the gaze recognition function;

Step C3: gesture recognition function of the client; gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene;

Step C4: voice recognition function of the client; voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application; when the user speaks a keyword, the client application responds with the preset feedback behavior;

Step C5: spatial mapping function of the client; spatial mapping superimposes the virtual world on the real world and is realized as follows:

Step C5.1: the user's surroundings are scanned with the depth camera and the environment-perception cameras of the HoloLens holographic glasses, and the built-in triangulation is used to model and digitize the real world, yielding digital physical-space information about the real world;

Step C5.2: whether the obtained digital physical space can be used for placing a virtual holographic object is calculated in real time; with the spatial mapping function of the client, the spatial position of the virtual model is no longer constrained by the position of identification information in the real world; using spatial mapping in place of identifier tracking removes the restriction of the identifier's physical position, so that virtual information can be combined with the real world more accurately and reasonably;

Step C6: dynamic loading function of the client; dynamic loading of the model is realized by the HoloLens holographic glasses accessing the server to load the virtual model.
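The gaze selection described in step C2 can be illustrated outside Unity with a small ray-casting sketch. It is only an illustration under stated assumptions: holograms are approximated by spheres, and the names `Sphere`, `Hit` and `gaze_raycast` are hypothetical and not part of the Unity3D or HoloLens APIs. As in the text, the ray starts at the head position, follows the head direction, and the result carries the collision point and the collided object.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Sphere:
    """Hypothetical stand-in for a hologram's collider (assumed spherical)."""
    name: str
    center: Vec3
    radius: float


@dataclass
class Hit:
    """Collision feedback: the object that was hit, the hit point, the distance."""
    obj: Sphere
    point: Vec3
    distance: float


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def gaze_raycast(head_pos: Vec3, head_dir: Vec3,
                 objects: Sequence[Sphere]) -> Optional[Hit]:
    """Return the nearest object hit by the gaze ray, or None if nothing is hit."""
    norm = math.sqrt(_dot(head_dir, head_dir))
    if norm == 0:
        raise ValueError("head_dir must be a non-zero vector")
    d = (head_dir[0] / norm, head_dir[1] / norm, head_dir[2] / norm)
    best: Optional[Hit] = None
    for obj in objects:
        oc = _sub(obj.center, head_pos)
        t_mid = _dot(oc, d)                    # projection of the centre onto the ray
        perp2 = _dot(oc, oc) - t_mid * t_mid   # squared distance from centre to ray
        r2 = obj.radius * obj.radius
        if perp2 > r2:
            continue                           # the ray passes beside the sphere
        t = t_mid - math.sqrt(r2 - perp2)      # distance to the first intersection
        if t < 0:
            continue                           # sphere is behind (or around) the head
        if best is None or t < best.distance:
            point = (head_pos[0] + t * d[0],
                     head_pos[1] + t * d[1],
                     head_pos[2] + t * d[2])
            best = Hit(obj, point, t)
    return best
```

Selecting the nearest hit mirrors the usual behaviour of a single ray cast: the first collider along the ray receives focus, and objects behind it are ignored.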

Further, the input gestures in step C3 are of three types, namely Air-tap, Navigation capture and Bloom.

Further, the virtual model stored on the server in step C6 is an AssetBundle compressed package: the virtual model and its scripts are packed into an AssetBundle in advance through the Unity3D engine and uploaded to the server; according to the recognition result from the server, the HoloLens holographic glasses access the server, download and decompress the AssetBundle package of the corresponding model, thereby realizing dynamic loading of the model. There are thousands of types of objects in daily life, and it is difficult to place all of them in a client application in advance; moreover, compared with a high-performance computer, the holographic glasses are very limited in rendering capacity, memory and performance, so head-mounted augmented reality glasses cannot carry a large number of models for loading. The method therefore loads models dynamically from the server.
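The download-on-demand idea can be sketched as follows. This is a minimal illustration, not the client implementation (which runs inside Unity3D): the server address, the `.assetbundle` file extension, the rule that maps a class label to a file name, and the function names `bundle_url` and `fetch_bundle` are all assumptions made for the example.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical server location (a documentation-only IP address).
SERVER_BASE = "http://192.0.2.10/bundles"


def bundle_url(label: str, base: str = SERVER_BASE) -> str:
    """Map a class label returned by the server to the URL of its model package.

    The naming rule (lower-cased, URL-quoted label plus '.assetbundle')
    is an assumption for this sketch.
    """
    return f"{base.rstrip('/')}/{quote(label.lower(), safe='')}.assetbundle"


def fetch_bundle(label: str, cache_dir, base: str = SERVER_BASE,
                 opener=urlopen) -> Path:
    """Download the package for `label` unless it is already cached locally."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / f"{quote(label.lower(), safe='')}.assetbundle"
    if not target.exists():
        # `opener` is injectable so the function can be exercised without a network.
        with opener(bundle_url(label, base)) as response:
            target.write_bytes(response.read())
    return target
```

Because the client only knows the naming rule and the server address, new models can be added on the server side without rebuilding the client, which is the property the text emphasizes.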

Furthermore, the server side identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides a virtual model; the convolutional neural network for transfer learning training is specifically realized by the following steps:

Step S1: establishing a sample data set; a good sample data set is the basis of information classification and identification; sample images are obtained from the Internet and are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored according to the proportion of sample quantities to expand the sample data set, which after expansion is made into the data set for information identification;

Step S2: performing model training on the data set for information identification made in step S1; 70-80% of the sample images are randomly selected from the different categories as the training data set, the remaining sample images are used as the test data set, and model training is performed with 40-100 iterations;

Step S3: judging the effect of model training through the loss value, the overfitting ratio and the accuracy of test data classification; the accuracy of test data classification is given by formula (1):

$$\mathrm{Accuracy} = \frac{\mathrm{ExactQuantity}}{\mathrm{TotalQuantity}} \tag{1}$$

In formula (1), ExactQuantity represents the number of correctly classified test data and TotalQuantity represents the total number of test data; the higher the accuracy of test data classification, the better the classification performance of the network model;

the loss value is obtained from the Softmax cross-entropy loss function, as shown in formula (2)

where 1{·} is an indicator function whose value is 1 when the expression inside "{ }" is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model;

overfitting ratio is shown in equation (3)

where TrainAcc represents the accuracy on the training data, as given by formula (4):

$$\mathrm{TrainAcc} = \frac{\mathrm{TrainExactQuantity}}{\mathrm{TrainTotalQuantity}} \tag{4}$$

In formula (4), TrainExactQuantity is the number of correctly classified training data and TrainTotalQuantity is the total number of training data. The closer the overfitting ratio is to 1, the better the generalization ability of the network model.
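The three indicators of step S3 can be computed as in the following sketch. It assumes that the accuracy is the fraction of correctly classified samples, that the Softmax cross-entropy loss takes its usual form (the mean negative log-probability of the true class), and that the overfitting ratio is TrainAcc divided by the test accuracy. The last assumption is inferred from the reported ratio of about 1.03 and is not stated explicitly in the text. All function names are illustrative.

```python
import math
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    exact = sum(1 for p, a in zip(predicted, actual) if p == a)
    return exact / len(actual)


def softmax_cross_entropy(logits: Sequence[Sequence[float]],
                          labels: Sequence[int]) -> float:
    """Mean negative log-probability of the true class under Softmax.

    Assumed standard form; the max-shift keeps the exponentials stable.
    """
    if len(logits) != len(labels) or len(labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[y]
    return total / len(labels)


def overfitting_ratio(train_acc: float, test_acc: float) -> float:
    """Assumed definition: training accuracy divided by test accuracy."""
    if test_acc <= 0:
        raise ValueError("test accuracy must be positive")
    return train_acc / test_acc
```

Under this reading, a ratio close to 1 means the network classifies unseen test images almost as well as the images it was trained on.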

Further, the model training process of step S2 is:

Step S2.1: pre-training the AlexNet network model on the ImageNet data set, which initializes the parameters of the AlexNet network model;

Step S2.2: because the last three layers of the AlexNet network model are configured for 1000 classes, the last three fully-connected layers are retrained to adapt to the new classes, and the parameters of the new fully-connected layers are retained to fit the classes of the data set established in step S1;

Step S2.3: combining the first five convolutional layers, with their corresponding pooling layers, activation functions and model parameters from step S2.1, with the fully-connected layers and parameters from step S2.2, and performing fine-tuning to complete the training of the model.
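The data preparation that precedes this training (the expansion in step S1 and the random split in step S2) can be sketched in plain Python. The sketch makes two simplifying assumptions: images are represented as nested lists of pixel values, and every image receives all four transforms, whereas the text applies them according to the proportion of sample quantities. Function names are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate180(img: Image) -> Image:
    return [row[::-1] for row in img[::-1]]


def mirror_horizontal(img: Image) -> Image:
    """Flip left to right."""
    return [row[::-1] for row in img]


def mirror_vertical(img: Image) -> Image:
    """Flip top to bottom."""
    return [list(row) for row in img[::-1]]


def expand(img: Image) -> List[Image]:
    """Return the original image plus its four transformed copies."""
    return [img, rotate90(img), rotate180(img),
            mirror_horizontal(img), mirror_vertical(img)]


def split_per_class(samples: Dict[str, Sequence], train_fraction: float = 0.75,
                    seed: int = 0) -> Tuple[list, list]:
    """Randomly split each category into training and test (item, label) pairs."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in samples.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(item, label) for item in shuffled[:cut]]
        test += [(item, label) for item in shuffled[cut:]]
    return train, test
```

Splitting inside each category keeps the class proportions of the training and test sets close to those of the full data set, which matters when the categories are of uneven size.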

Furthermore, in the information transmission, the sending end processes the information, the information is transmitted using the UDP protocol, and the receiving end processes and restores the received information. The transmitted information is preprocessed according to the maximum number of bytes that can be sent at one time: the sending end obtains the picture from its absolute file path, then performs data encoding and data cutting and adds header information containing the file type, the file data length, the number of data packets and the data number; the receiving end decodes and recombines the data according to the header information and checks whether the data are complete; if a packet has been lost, it returns check information requesting the sending end to resend the lost packet according to the data number in the header.

Further, the information processing of the transmitting end includes the steps of:

Step F1: encoding the information to be sent; the type is encoded according to the information type, and the encoded result is inserted into the file type field of the header;

Step F2: counting the length of the encoded information, encoding the count, and inserting the encoded result into the file length field of the header;

Step F3: dividing the encoded information equally into multiple groups, encoding the total number of packets, and inserting the encoded result into the data packet number field of the header;

Step F4: numbering the divided data groups in sequence, encoding the number, and inserting the encoded result into the data number field of the header;

Step F5: repeating steps F1 to F4, so that the split information is preprocessed in sequence and then transmitted, ensuring that a large amount of information is not transmitted at the same time.
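Steps F1 to F5 can be sketched as a function that cuts a byte string into datagrams, each carrying the four header fields named in the text. The binary layout (1-byte file type, 4-byte file length, 2-byte packet count, 2-byte data number, network byte order) and the 1024-byte datagram limit are assumptions made for this example; the text does not specify field widths or the encoding of the payload.

```python
import struct
from typing import List

# Assumed header layout: file type (1 byte), file data length (4 bytes),
# number of data packets (2 bytes), data number (2 bytes), big-endian.
HEADER = struct.Struct("!BIHH")
MAX_DATAGRAM = 1024  # assumed maximum number of bytes sent at one time


def packetize(data: bytes, file_type: int,
              max_datagram: int = MAX_DATAGRAM) -> List[bytes]:
    """Split `data` into datagrams, each prefixed with the header fields."""
    chunk = max_datagram - HEADER.size
    if chunk <= 0:
        raise ValueError("max_datagram is too small to hold the header")
    total = max(1, -(-len(data) // chunk))  # ceiling division, at least one packet
    if total > 0xFFFF:
        raise ValueError("too many packets for a 2-byte packet count")
    return [
        HEADER.pack(file_type, len(data), total, number)
        + data[number * chunk:(number + 1) * chunk]
        for number in range(total)
    ]
```

Each returned datagram could then be handed to `socket.sendto` on a UDP socket, one at a time, so that the data is not pushed onto the network in a single burst.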

Further, the information processing of the receiving end comprises the following steps:

Step R1: decoding the received header data, classifying by file type, IP and source port number, and at the same time creating a new thread to receive new information;

Step R2: creating a receiving container according to the file length in the header; if several file lengths are received at the same time, this indicates several files of the same type;

Step R3: inserting the data content into the corresponding position in the container according to the data number and the packet length (file length / total number of packets) in the header, thereby preserving the order of the file content;

Step R4: checking the received content; if the container has empty entries, data packets have been lost; feedback is sent to the IP and source port number recorded in step R1, requesting the sending end to resend the information with the corresponding number according to the file type, the file length and the position number of the empty entry in the container;

Step R5: repeating steps R1 to R4, then decoding and rewriting the information in the container according to the corresponding file type to restore the file.
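The receiving side (steps R2 to R5) can be sketched as a small container that stores each payload at the position given by its data number, reports the numbers that are still missing, and restores the file once every position is filled. The header layout is the same assumption as on the sending side (1-byte type, 4-byte length, 2-byte packet count, 2-byte data number); the class name `Reassembler` is illustrative, and the per-sender classification and threading of step R1 are omitted.

```python
import struct
from typing import List, Optional

# Assumed header layout: file type, file data length, packet count, data number.
HEADER = struct.Struct("!BIHH")


class Reassembler:
    """Collects the datagrams of one file and detects lost packets."""

    def __init__(self) -> None:
        self.file_type: Optional[int] = None
        self.length: Optional[int] = None
        self.slots: List[Optional[bytes]] = []

    def add(self, packet: bytes) -> None:
        """Store one datagram at the position given by its data number."""
        file_type, length, total, number = HEADER.unpack_from(packet)
        if not self.slots:
            # First packet seen: create the receiving container (step R2).
            self.file_type, self.length = file_type, length
            self.slots = [None] * total
        if number < len(self.slots):
            self.slots[number] = packet[HEADER.size:]

    def missing(self) -> List[int]:
        """Data numbers that should be requested again (step R4)."""
        return [i for i, part in enumerate(self.slots) if part is None]

    def data(self) -> bytes:
        """Return the restored file content (step R5)."""
        if not self.slots or self.missing():
            raise ValueError("file is incomplete")
        out = b"".join(part for part in self.slots if part is not None)
        if len(out) != self.length:
            raise ValueError("length does not match the header")
        return out
```

Indexing by data number makes the result independent of arrival order, which matters because UDP neither orders datagrams nor guarantees their delivery.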

Compared with the prior art, the invention has the following beneficial effects:

1. The invention adopts an augmented reality method without registration identifiers: the virtual object model is uploaded to the server and dynamically loaded by the client, which reduces the memory required by the client application and allows multiple models to be loaded and interacted with without updating the client; this solves the problems that traditional augmented reality adapts poorly to new scenes, depends heavily on identifiers and has high development requirements.

2. The trained convolutional neural network has high recognition and classification precision and high speed, and the recognition object is not limited to a specific mark any more due to the good generalization capability of the trained convolutional neural network.

3. The convolutional neural network is trained by using a transfer learning method, so that the requirement on the number of sample data sets is greatly reduced, the training time is reduced, a network model can be conveniently and rapidly adapted to an application scene, and classification services are provided.

4. The method of space mapping is adopted to replace the identification tracking in the traditional augmented reality method, the actual position limitation of the identification can be eliminated, and the virtual information can be more accurately and reasonably combined with the real world.

5. In the aspect of man-machine interaction, the model is interacted in a convenient and rapid gesture, voice and staring mode, and the user operation is comfortable and natural.

Drawings

FIG. 1 is a system architecture diagram of the present invention;

FIG. 2 is a diagram of a client architecture;

FIG. 3 is a UDP protocol header information diagram;

FIG. 4 is a server side architecture diagram;

FIG. 5 is the sample data set of embodiment 1;

FIG. 6 is the sample expansion method of embodiment 1;

FIG. 7 is the model training results of embodiment 1;

FIG. 8 is a photograph of a Type 99 tank taken in embodiment 1;

FIG. 9 is the information transmission of embodiment 1;

FIG. 10 is the information identification and classification of embodiment 1;

FIG. 11 is the dynamic loading of virtual information of embodiment 1.

Detailed Description

The technical solutions of the present invention will be further described in detail and fully with reference to the accompanying drawings and specific embodiments, it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1:

as shown in fig. 1, in the embodiment, an augmented reality method without a registration identifier is implemented by using a client-server architecture and using a UDP protocol for information transmission; the client side provides functions of man-machine interaction, information acquisition and dynamic loading of the virtual model, and the server side identifies and classifies the received information through the convolutional neural network of transfer learning training and provides the virtual model, so that the effect of augmented reality is achieved. The client-side human-computer interaction, information acquisition and virtual model dynamic loading functions comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a space mapping function and a dynamic loading function.

The following describes the function implementation of the client, the server, and the information transmission in detail.

A client:

architecture diagram of the client as shown in fig. 2, the function of the client is realized by the following steps:

Step C1: information acquisition function of the client; the target is photographed with the HoloLens holographic glasses and the result is stored in JPG format; the photograph is converted into the Sprite format and loaded onto a UI carrier provided by the Unity3D engine for display, so that the result can be viewed directly on the UI and poor photographs can be retaken.

Step C2: gaze recognition function of the client; gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the Physics.Raycast physical ray of the Unity3D engine, a ray cast from the position and direction of the user's head returns a collision result after hitting a holographic object, including the position of the collision point and information about the collided object, so that holographic objects in the scene can be tracked and selected; the client application uses the gaze recognition function to select and move virtual objects;

Step C3: gesture recognition function of the client; gesture recognition captures input gestures while recognizing and tracking the position and state of the user's hand, and the system automatically triggers the corresponding feedback to manipulate virtual objects in the scene; the input gestures are of three types, namely Air-tap, Navigation capture and Bloom.

Step C4: voice recognition function of the client; voice recognition is realized by setting keywords and corresponding feedback behaviors in the client application; when the user speaks a keyword, the client application responds with the preset feedback behavior; in this embodiment, the specific operation instructions and response behaviors for voice recognition and gesture recognition are shown in Table 1.

TABLE 1 specific operation commands and response behaviors for speech recognition and gesture recognition

c5.2, calculating whether the obtained digital physical space can be used for placing a virtual holographic object in real time; by means of the space mapping function of the client, the space position of the virtual model is not restricted by the position of the identification information in the real world any more;

Step C6: dynamic loading function of the client; dynamic loading of the model is realized by the HoloLens head-mounted augmented reality glasses accessing the server to load the virtual model. There are thousands of types of objects in daily life, and it is difficult to place all of them in an augmented reality application in advance; compared with a high-performance computer, the HoloLens is very limited in rendering capability, memory and performance, so it cannot carry a large number of models for loading. To address this, the invention loads models dynamically from the server: the virtual model and its scripts are packaged into an AssetBundle compressed package through the Unity3D engine and uploaded to the server, and according to the recognition result from the server the HoloLens accesses the server, downloads and decompresses the AssetBundle package of the corresponding model, thereby realizing dynamic loading of the model.

Information transmission:

In the information transmission, the sending end processes the information, the information is transmitted using the UDP protocol, and the receiving end processes and restores the received information. The transmitted information is preprocessed according to the maximum number of bytes that can be sent at one time: the sending end obtains the picture from its absolute file path, then performs data encoding and data cutting and adds header information containing the file type, the file data length, the number of data packets and the data number; the receiving end decodes and recombines the data according to the header information and checks whether the data are complete; if a packet has been lost, it returns check information requesting the sending end to resend the lost packet according to the data number in the header. The UDP protocol header information is shown in FIG. 3.

The information processing of the sending end comprises the following steps:

The information processing of the receiving end comprises the following steps:

The server side:

the architecture diagram of the server is shown in fig. 4. The server side identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides a virtual model; the method comprises the steps of identifying and classifying tanks, armored vehicles, fighter planes and the like according to the existing armor models, building a server by using Apache, and uploading the existing virtual models to the server. The convolutional neural network for transfer learning training is specifically realized by the following steps:

Step S1: establishing a sample data set; image samples of 15 types of tanks and armored vehicles are collected from the Internet, giving 1444 image samples in total after sorting and labeling; the sample types and their quantity distribution are shown in FIG. 5. To reduce overfitting during model training and to reduce the effect on recognition of the uneven quantity distribution across sample types, the sample images obtained from the Internet are rotated by 90 degrees, rotated by 180 degrees, horizontally mirrored and vertically mirrored according to the proportion of sample quantities to expand the sample data set, as shown in (a)-(e) of FIG. 6. After expansion the data set contains 9012 images and is made into the data set for tank and armored vehicle identification;

Step S2: performing model training on the data set for information identification made in step S1; 75% of the sample images are randomly selected from the different categories as the training data set, the remaining sample images are used as the test data set, and model training is performed with 48 iterations; the training process is as follows:

step S2.1, pre-training an AlexNet network model on a data set of ImageNet, and initializing parameters of the AlexNet network model through the step;

step S2.2 because the last three layers of the AlexNet network model are configured into 1000 classes, retraining the last three fully-connected layers to adapt to the new classes, and reserving the parameters of the new fully-connected layers through the step to adapt to the class of the tank armor-recognized data set established in the step S1;

In formula (1), ExactQuantity represents the number of correctly classified test data and TotalQuantity represents the total number of test data; the higher the accuracy of test data classification, the better the classification performance of the network model.

In formula (2), 1{·} is an indicator function whose value is 1 when the expression inside "{ }" is true and 0 otherwise; the closer the loss value is to 0, the better the training result of the network model.

Overfitting ratio is shown in equation (3)

In the formula, TrainExactQuantity is the number of correct training data classification results, and TrainTotalQuantity is the total number of training data. The closer the overfitting ratio is to 1, the better the generalization ability of the network model is represented. The training results of the network model through the transfer learning training are shown in fig. 7. The average value of the accuracy of the final test is 97.51%, and the overfitting ratio of the model is basically stable at about 1.03, which shows that the network model trained by the method has good generalization capability.

The system was then operated in practice: the HoloLens was connected to Wi-Fi, information transmission with the server was established using the IP address and port number, and the functions of each part of the system were tested. The photograph taken by the HoloLens is shown in FIG. 8, and FIG. 9 shows the information as received by the server. As can be seen from FIG. 8 and FIG. 9, verifying the information at the application layer essentially solves the picture packet loss problem. The classification result on the server is shown in FIG. 10, and FIG. 11 shows the dynamic loading of virtual information. As can be seen from FIG. 10, the convolutional neural network trained by transfer learning has high recognition and classification accuracy and high speed, and its strong generalization ability means the recognized object is no longer limited to a specific identifier. In FIG. 11 the Type 99 tank model is loaded and moved onto a stand by means of spatial mapping (the dotted line in FIG. 9 is a tripod). As can be seen from FIG. 11, using spatial mapping in place of identifier tracking removes the restriction of the identifier's physical position. With the C/S architecture, the virtual object model is uploaded to the server and dynamically loaded by the client, so that multiple models can be loaded and interacted with without updating the client while the memory required by the client application is reduced.

Example 2:

embodiment 2 differs from embodiment 1 only in step S2.

In the embodiment 2, 70% of sample images in different categories are randomly selected as training data sets, the rest sample images are used as testing data sets, the iteration times are 40 times, and model training is carried out; the results were the same as in example 1.

Example 3:

embodiment 3 differs from embodiment 1 only in step S2.

In the embodiment 3, 80% of sample images in different categories are randomly selected as training data sets, the rest sample images are used as test data sets, the iteration times are 100 times, and model training is carried out; the results were the same as in example 1.

The method is suitable for augmented reality application requiring a large number of virtual models, especially in the field of engineering models.

Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims

1. A registration-free identification augmented reality method is characterized in that: the augmented reality method adopts a client-server architecture and adopts a UDP protocol for information transmission; the client side provides functions of man-machine interaction, information acquisition and dynamic loading of the virtual model, and the server side identifies and classifies the received information through the convolutional neural network of transfer learning training and provides the virtual model, so that the effect of augmented reality is achieved.

2. The augmented reality method of claim 1, wherein the augmented reality method comprises: the client-side human-computer interaction, information acquisition and virtual model dynamic loading functions comprise an information acquisition function, a gaze recognition function, a gesture recognition function, a voice recognition function, a space mapping function and a dynamic loading function.

3. The augmented reality method of claim 2, wherein the augmented reality method comprises: the information acquisition function, the gaze recognition function, the gesture recognition function, the voice recognition function, the spatial mapping function and the dynamic loading function of the client are realized by adopting the following steps:

Step C2: gaze recognition function of the client; gaze recognition is based on eye tracking technology and is used to track and select holographic objects; by means of the Physics.Raycast physical ray of the Unity3D engine, a ray cast from the position and direction of the user's head returns a collision result after hitting a holographic object, including the position of the collision point and information about the collided object, so that holographic objects in the scene can be tracked and selected;

4. The augmented reality method of claim 3, wherein the augmented reality method comprises: the input gesture in the step C3 comprises three types, namely Air-tap, Navigation capture and Bloom.

5. The augmented reality method of claim 3, wherein the augmented reality method comprises: the virtual model stored in the server in the step C6 is a compressed package which packages the virtual model and the script into AssetBundle in advance through a Unity3D engine and uploads the compressed package to the server; and the holographic glasses HoloLens accesses the server to download and decompress the AssetBundle compression packet of the corresponding model according to the result identified by the server, so that the dynamic loading of the model is realized.

6. The augmented reality method of claim 1, wherein the augmented reality method comprises: the server side identifies and classifies the received information through a convolutional neural network trained by transfer learning, and provides a virtual model; the convolutional neural network for transfer learning training is specifically realized by the following steps:

overfitting ratio is shown in equation (3)

Wherein, TrainExactQuantity is the correct number of the training data classification results, and TrainTotalQuantity is the total number of the training data; the closer the overfitting ratio is to 1, the better the generalization ability of the network model is represented.

7. The augmented reality method of claim 6, wherein the augmented reality method comprises: the model training process of step S2 is:

8. The augmented reality method of claim 1, wherein: in the information transmission, the sending end processes the information, the information is transmitted using the UDP protocol, and the receiving end processes and restores the received information; the transmitted information is preprocessed according to the maximum number of bytes that can be sent at one time; the sending end obtains the picture from its absolute file path, then performs data encoding and data cutting and adds header information containing the file type, the file data length, the number of data packets and the data number; the receiving end decodes and recombines the data according to the header information and checks whether the data are complete; if a packet has been lost, it returns check information requesting the sending end to resend the lost packet according to the data number in the header.

9. The augmented reality method of claim 8, wherein the augmented reality method comprises: the information processing of the sending end comprises the following steps:

10. The augmented reality method of claim 8, wherein the augmented reality method comprises: the information processing of the receiving end comprises the following steps: