WO2017182523A1 - A method and a system for real-time remote support with use of computer vision and augmented reality

A method and a system for real-time remote support with use of computer vision and augmented reality

Info

Publication number
WO2017182523A1
Authority
WO
WIPO (PCT)
Prior art keywords
support
augmented reality
electronic device
computer vision
video
Prior art date
Application number
PCT/EP2017/059290
Other languages
French (fr)
Inventor
Dante MOCCETTI
Fabio Rezzonico
Pietro VERAGOUTH
Antonino TRAMONTE
Lorenzo CAMPO
Jacopo BOSIO
Antonio Leonardo Jacopo MURCIANO
Original Assignee
Newbiquity Sagl
Priority date
Filing date
Publication date
Priority claimed from ITUA2016A002756A external-priority patent/ITUA20162756A1/en
Priority claimed from CH00526/16A external-priority patent/CH712380A2/en
Application filed by Newbiquity Sagl filed Critical Newbiquity Sagl
Publication of WO2017182523A1 publication Critical patent/WO2017182523A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/131Protocols for games, networked simulations or virtual reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/75Indicating network or usage conditions on the user display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method for real-time remote support with use of computer vision and augmented reality, comprising the following steps: providing an augmented reality engine having computer vision programs and residing partly in an electronic device of a user requesting support and partly in a cloud computing data network; and wherein the following steps are carried out in real time: acquiring, with a camera of the electronic device of the user requesting support, a video of a work environment; transmitting and displaying said video on an electronic device of a support provider; affixing, by said support provider, a graphic marker in augmented reality on a dot that is to be tagged in said video; running said computer vision programs so as to display said video containing said graphic marker on the electronic device of said user requesting support, the spatial coordinates of the graphic marker being recalculated so that, in the sequence of images of the video displayed on said electronic device of said user requesting support, the graphic marker permanently points on the tagged dot.

Description

A METHOD AND A SYSTEM FOR REAL-TIME REMOTE SUPPORT WITH USE OF COMPUTER VISION AND AUGMENTED REALITY
DESCRIPTION
The present invention relates to a method and a system for real-time remote support with use of computer vision and augmented reality.
In more detail, the present invention relates to a method for remote maintenance/user support that uses computer vision to enable an expert user to enrich, in real time, the contents of the environment of a user requesting support.
The computer vision mechanisms in use enable the extraction of characteristic features from the environment so as to recognise objects, textures and parts of objects within an image.
The contents affixed to the environment captured by the camera of the user requesting support are "tags", which appear as graphic symbols in augmented reality.
By augmented reality (AR) or processor-mediated reality, is meant the enrichment of human sensory perception by means of information, usually manipulated and conveyed electronically, which would not be perceptible using only the five senses.
In 2009, thanks to technological improvement, augmented reality, already used in very specific fields such as the military, medicine or research, was presented to the wider public both in the form of communication campaigns, i.e. "augmented advertising" published in newspapers or on the web, and through a progressively growing number of smartphone apps for entertainment (games) or for intensifying an experience by enriching the contents associated with the environment.
Maintenance/support systems are known that use augmented reality.
These software systems enable a non-specialised operator who does not know the environment and the object he/she is to operate on to receive a quantity of augmented reality information so as to be able to carry out the requested operation. In general, for maintenance/support with augmented reality techniques, information is used that the system already stores in its memory.
In this case, there is no real-time relationship between the expert operator and the operator requesting support.
In these cases, recognition of the environment is based on recognition of a specific object and the support provided by the system is therefore limited to a particular object on which the system has been "trained" and a rigid sequence of contents.
In these cases, there is no analysis of a variable environment and the recognition of the object can take place either via physical markers affixed to the object or via specific image recognition for which the system is trained.
There also exist "live" systems that envisage an interaction between the user requesting support and the external operator offering that support.
These systems generally involve only the sending of static images.
In other words, the expert operator works on a static frame and the operator requesting support receives a picture on which "illustrative" contents have been affixed.
Lastly, systems can be hypothesised in which a video-call is used to offer live video support.
In these cases, however, the calculations are carried out entirely locally on the processor of the device which either captures the video or adds augmented reality contents. This approach might be effective where the device carrying out the calculations has a powerful processor. On mobile devices, however, two types of problem arise, due to the fact that computer vision calculations used in parallel to a video stream take up much of the CPU.
Firstly, a temporal deviation between the moment when the environment in which the non-expert operator has to operate is captured and the moment when the response information arrives makes it very difficult for the non-expert operator to interpret correctly what to do and where to do it. Secondly, the use of fast - though not very sophisticated - algorithms generates a large number of false positives and false negatives (in other words, erroneous detections).
These problems would emerge very strongly should the user requesting support modify his or her point of observation with respect to the point from which the environment and the object to operate on were detected.
This is because the calculations in this case have to flow more rapidly and must at the same time offer robust results.
The technical task of the invention is to obviate the above-described drawbacks in the prior art, while still providing a system that operates in real time.
In the scope of this technical task, an object of the present invention is to provide a method and a system for real-time remote support with use of computer vision and augmented reality, which ensure efficient and reliable support in real time.
This and other aims of the present invention are attained by a real-time remote support method with use of computer vision and augmented reality, characterised in that it comprises the following steps:
- providing a cloud computing data network;
- providing an augmented reality engine having computer vision programs residing in a mobile electronic device of a user requesting support and in said cloud computing data network;
and wherein the following steps are carried out in real time:
- acquiring, with a camera of the electronic device of the user requesting support, a video of a work environment;
- transmitting and displaying said video on an electronic device of a support provider;
- affixing, by said support provider, a graphic marker in augmented reality on a dot that is to be tagged in said video;
- running said computer vision programs so as to display said video containing said graphic marker on the electronic device of said user requesting support, the spatial coordinates of the graphic marker being recalculated so that, in the sequence of images of the video displayed on said electronic device of said user requesting support, the graphic marker permanently points on the tagged dot.
In a preferred embodiment of the invention, said programs residing in said cloud computing data network operate in an on-demand mode when said programs residing in said electronic device of said user requesting support do not provide a reliable outcome.
In a preferred embodiment of the invention said cloud computing network comprises a streaming server.
In a preferred embodiment of the invention said programs comprise a program for extracting images from a video and transforming the images into dot matrices.
In a preferred embodiment of the invention said programs comprise an augmented reality filter.
In a preferred embodiment of the invention, said augmented reality filter operates on said dot matrices so as to recalculate the spatial coordinates of the graphic marker.
In a preferred embodiment of the invention, said video acquired by said electronic device of said user requesting support is transmitted to said streaming server from which it is in turn transmitted to said electronic device of said support provider which affixes said graphic marker and returns the spatial coordinates of the tagged dot to said streaming server.
In a preferred embodiment of the invention said spatial coordinates are made available to said augmented reality filter in order to be processed.
In a preferred embodiment of the invention said processed spatial coordinates are made available to said electronic device requesting support and to said electronic device providing support.
The present invention also relates to a system for real-time remote support with use of computer vision and augmented reality, characterised in that it comprises the following components: - an electronic device of a user requesting support provided with a camera and display screen and an electronic device of a support provider provided with a display screen;
- a cloud computing data network comprising at least one streaming server;
- an augmented reality engine having computer vision programs residing in said electronic device of said user requesting support and in said cloud computing data network; said augmented reality engine being configured so as to display on said electronic device of the user requesting support a video acquired by said user requesting support to which the support provider has affixed in augmented reality a graphic marker on a dot, the spatial coordinates of which are recalculated so that in the sequence of images of the video the graphic marker points permanently on the tagged dot.
The augmented reality engine thus has computer vision programs divided into two different parts, a first part which draws the augmented reality contents, and a second part, different from the first part, which performs the computer vision calculations necessary for establishing where to draw the augmented reality contents.
The remote support method of the invention is based on a detection of the environment, which is done in duplicate mode on the electronic device of the user requesting support and in the cloud. This system, by maintaining a real-time modality in the interaction between the support provider and the user requesting support, has been shown to be more reliable, including in cases in which the scene changes by changing the position of the user requesting support, and has also been demonstrated to be faster than known systems.
These advantages originate from transferring part of the calculation (the heaviest and most complex part) onto a cloud computing network and therefore using, in parallel, both distributed calculation resources, i.e. local resources in the electronic device of the user requesting support, and central resources, i.e. resources of the cloud computing network. The complex calculations, which use heavy computer vision algorithms, instead of being "locally" performed are performed in the cloud in an on-demand mode when the locally-performed calculations give results that are not substantially reliable, while the lighter calculations are performed locally in a continuous manner.
Therefore, by performing the more laborious calculations in the cloud, integration of the service into third-party applications becomes easier and faster.
At the same time, as the calculations in the cloud are performed only on demand, when the probability of error in the local calculations proves too high, the architecture makes it possible to limit the infrastructural costs of the cloud.
Further, as the calculations are performed in the cloud, it becomes simpler to refine the quality of the positioning process of the graphic marker.
This is for two different reasons: on the one hand, there are no stringent limitations to the calculating resources and various algorithms can be used in parallel; on the other hand, the cloud algorithms are used when the "faster and lighter" algorithms locally used fail.
This improves the reliability and precision of the system.
Part of the augmented reality engine can be improved and further developed without requiring users to install updates or new software versions, given that the software updates can be made directly in the cloud.
These and other aspects of the invention will be more fully clarified upon reading of the following description of a preferred embodiment thereof, in which:
- figure 1 shows a block diagram of the functioning of the support system.
The system for real-time remote support with use of computer vision and augmented reality comprises an electronic device 1 of a user requesting support, an electronic device 2 of a support provider, a cloud computing data network 3 comprising at least one streaming server 4, and an augmented reality engine 5, 6, 7, 8, 9 having computer vision programs residing in the electronic device 1 of the user requesting support and in the cloud computing data network 3;
The electronic devices 1, 2 to which reference will be made in the following are in particular portable devices such as smartphones, though it is understood that any electronic device of another type suitable for the purpose can be used.
The augmented reality engine 5, 6, 7, 8, 9 therefore includes a part 7, 8, 9 that locally resides on the smartphone 1 of the user requesting support and a part 5, 6 residing in the cloud computing network 3.
The support method is articulated in the following steps.
The user requesting support, using his or her smartphone, contacts the support provider on his or her smartphone.
When the support provider responds to the request for support, a one-way video call is activated in which the user requesting support sends the video, captured by the camera of his or her smartphone, to the smartphone of the support provider.
The user requesting support frames on his/her smartphone the object on which he/she desires to receive support.
The support provider receives the video and uses a special interface to affix tags on the object in augmented reality.
The tags function as references which indicate the specific parts of the scene with respect to which support is being given.
The tags are kept in the correct position even when the user requesting support moves his/her smartphone.
This is made possible by the support of the above-mentioned computer vision programs, for example but not necessarily programs made available by the open source library OpenCV. The computer vision programs are activated as follows.
The support provider, when placing a tag on the video received from the smartphone of the user requesting support, sends the x and y coordinates of the dot to be "tagged" via the streaming server 4.
Therefore, these coordinates tag a dot on an image.
The computer vision algorithms can thus analyse a part of the scene, extracting characteristic features that will serve, as the images of the video flow, to recalculate the x and y coordinates of the dot to be tagged; in this way the algorithms can move the tag as the situation evolves, so as to maintain it on the object or part of the object to be tagged.
Therefore, the computer vision algorithms used serve to collect the characteristic features that can be used for denoting a part of significant information within the images which are flowing in the video streaming.
These characteristic features thus enable the algorithms to calculate and progressively recalculate the correct position in which the tags must be drawn and inserted with an augmented reality method.
Specifically, the algorithms used perform "proximity" operations applied to an image, to specific structures of the image itself, to dots or lines in an image, or even to complex structures such as objects in images.
The characteristic features can also relate to a sequence of images in movement, forms defined in terms of curves or borders between different regions of the image or specific properties of a region of the image.
Some algorithms are very fast and "light" in terms of calculation resources, such as for example algorithms based on geometric transformations.
These algorithms can therefore be run on the smartphone of the user requesting support. Other algorithms are instead slower and take up greater resources. These heavier algorithms are run in the cloud and are therefore more reliable.
It is worthy of note that the faster and lighter algorithms tend to have greater margins of error in specific conditions, such as for example in cases in which the scene completely changes and then returns to the original scene.
In other words, the support method includes using efficient and economical algorithms where resources are limited, i.e. on the smartphone of the user requesting support, and more reliable and powerful algorithms where the resources are scalable, i.e. in a cloud network.
In this way results are rapidly available and these results, when considered inadequate, are adjusted by more robust calculations.
At the same time, where the fast calculations fail, more complex calculations can be attempted, which can instead give results.
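By way of illustration, the following Python sketch shows one possible form of the "light and rapid" local calculations, using Shi-Tomasi feature extraction and Lucas-Kanade optical flow from the open source OpenCV library mentioned further on. The choice of these particular algorithms, the function names and the parameter values are assumptions made for illustration only; the invention does not mandate a specific tracker.

    import cv2
    import numpy as np

    def init_features(gray, tag_xy, radius=40, max_corners=50):
        """Extract characteristic features in a neighbourhood of the tagged dot."""
        mask = np.zeros_like(gray)
        cv2.circle(mask, (int(tag_xy[0]), int(tag_xy[1])), radius, 255, -1)
        # Shi-Tomasi corners, restricted to the region around the tag
        return cv2.goodFeaturesToTrack(gray, max_corners, 0.01, 5, mask=mask)

    def track_tag(prev_gray, gray, prev_pts, tag_xy):
        """Recalculate the x and y coordinates of the tagged dot on the next image."""
        next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
        alive = status.ravel() == 1
        if not alive.any():
            return tag_xy, prev_pts, 0.0                  # tracking lost on this image
        shift = np.median(next_pts[alive] - prev_pts[alive], axis=0).ravel()
        new_xy = (tag_xy[0] + float(shift[0]), tag_xy[1] + float(shift[1]))
        confidence = float(alive.sum()) / len(prev_pts)   # fraction of surviving features
        return new_xy, next_pts[alive].reshape(-1, 1, 2), confidence

Trackers of this kind are cheap because they only follow a small set of dots from one matrix to the next, which is precisely why they degrade when the scene changes abruptly.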
In the following, a more detailed description of the functioning of the support system is given.
The smartphone 1 of the user requesting support sends the video stream captured by its camera to the streaming server 4, which is based, for example, on the proprietary Wowza software.
The streaming server 4 makes the video flow available to the smartphone 2 of the support provider which displays it.
The smartphone 2 of the support provider is appropriately provided with a graphic interface and commands that enable the support provider to position a graphic marker on a dot to be tagged on the screen. Once the graphic marker is positioned, metadata representing the spatial coordinates x and y of the dot to be tagged is sent to the streaming server 4.
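The description does not specify the wire format of this metadata. The following sketch merely illustrates one plausible layout; the field names, the normalisation of the coordinates and the timestamp are assumptions.

    import json
    import time

    def make_tag_metadata(x, y, frame_w, frame_h):
        # Normalised coordinates let devices with different resolutions agree
        # on the same tagged dot; the timestamp helps match metadata to frames.
        return json.dumps({
            "x": x / frame_w,
            "y": y / frame_h,
            "ts": time.time(),
        })

    payload = make_tag_metadata(412, 233, 1280, 720)
    # 'payload' would travel to the streaming server 4 alongside the video flow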
The metadata (spatial coordinates x and y) is made available to and used by the programs 7, 8 of the augmented reality engine residing in the smartphone 1 of the user requesting support in order to perform "light and rapid" positioning and computer vision calculations. These calculations make it possible to reposition the dot to be tagged in the flow of video images.
This mechanism makes it possible to "tie" a graphic marker to elements, or parts of elements, present in the scene as captured by the smartphone 1 of the user requesting support.
In practice, a program 7 of the augmented reality engine residing in the smartphone 1 of the user requesting support deconstructs the video into images and transforms the images into dot matrices, and a program 8 of the augmented reality engine, again residing in the smartphone 1 of the user requesting support, performs the calculations on the dot matrices so as to recalculate the spatial coordinates x and y to be tagged on the images of the video.
Lastly, a program 9 of the augmented reality engine, again residing in the smartphone 1 of the user requesting support, uses the spatial coordinates x and y, recalculated in this way, to draw the tag in augmented reality on the images of the video.
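A minimal sketch of how the three local programs could cooperate is given below, reusing init_features and track_tag from the earlier sketch; the camera index, the initial coordinates and the display loop are illustrative assumptions, not part of the claimed method.

    import cv2

    # init_features and track_tag are the functions sketched earlier
    cap = cv2.VideoCapture(0)                             # camera of the requesting user
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # program 7: image -> dot matrix
    tag_xy = (320.0, 240.0)                               # coordinates received as metadata
    pts = init_features(prev_gray, tag_xy)

    while ok and pts is not None:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)               # program 7
        tag_xy, pts, conf = track_tag(prev_gray, gray, pts, tag_xy)  # program 8
        cv2.drawMarker(frame, (int(tag_xy[0]), int(tag_xy[1])),      # program 9
                       (0, 0, 255), cv2.MARKER_CROSS, 24, 2)
        cv2.imshow("augmented video", frame)
        prev_gray = gray
        if cv2.waitKey(1) & 0xFF == 27:                   # ESC stops the loop
            break
    cap.release()
    cv2.destroyAllWindows()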
The calculations used locally by the smartphone 1 of the user requesting support for the positioning are calculations based on dot matrices and on transformations thereof in the sequence of the video images and require a relatively moderate extraction of features characteristic of the image.
This choice enables great rapidity and low consumption of calculating resources.
However, in unusual or complex situations it is susceptible to an increase of false positives or false negatives regarding recognition.
For this reason, the augmented reality engine also has support programs residing in the cloud computing network, which is not subject to stringent limitations on calculating resources and intervenes when the augmented reality engine programs residing locally in the smartphone 1 of the user requesting support give evidence of a recognition that is not sufficiently reliable.
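As an illustration of this on-demand intervention, the sketch below gates the hand-over on the confidence score returned by the local tracker sketched earlier; the threshold value and the request_cloud_recompute() helper are hypothetical, since the description does not state how unreliability is detected or how the cloud is invoked.

    CONFIDENCE_THRESHOLD = 0.5       # assumed cut-off, to be tuned empirically

    def update_tag(prev_gray, gray, pts, tag_xy):
        new_xy, new_pts, conf = track_tag(prev_gray, gray, pts, tag_xy)
        if conf < CONFIDENCE_THRESHOLD:
            # Local recognition is not sufficiently reliable: ask the support
            # programs 5 and 6 in the cloud for new coordinates.
            new_xy = request_cloud_recompute(gray, tag_xy)   # hypothetical remote call
            new_pts = init_features(gray, new_xy)            # re-seed the local tracker
        return new_xy, new_pts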
As they have available both the video images which arrive via the streaming server 4 from the smartphone 1 of the user requesting support and the metadata relative to the spatial coordinates x, y of the dot to be tagged which arrive via the streaming server 4 from the smartphone 2 of the support provider, the support programs residing in the cloud computing network 3 possess all the data necessary for them to function.
There are at least two support programs residing in the cloud computing network 3.
The first support program 5 residing in the cloud computing network has the task of deconstructing, from the video flow sent from the smartphone 1 of the user requesting support, the single images which will be transformed into matrices.
The second support program 6 residing in the cloud computing network, for example written in Java and based on the Computer Vision library known as OpenCV, has the task of analysing the matrices created starting from the video images and, using the metadata sent from the smartphone 2 of the support provider, performing the calculations necessary for providing new metadata, i.e. the spatial coordinates x and y of the dot to be tagged for each video image.
The first support program 5 becomes necessary as the computer vision algorithms analyse images (i.e. frames) transformed into matrices.
The matrices describe relations between images of a same scene.
Given the projection of a dot of the scene in one of the images, it becomes possible to search for the corresponding dot in the other image.
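One standard way of performing such a search, sketched below on the assumption that the framed scene is approximately planar, is to estimate the homography relating the two images and transfer the tagged dot through it; ORB features, brute-force matching and RANSAC are illustrative choices, not requirements of the invention.

    import cv2
    import numpy as np

    def remap_tag(ref_gray, cur_gray, tag_xy):
        """Search in cur_gray for the dot whose projection in ref_gray is tag_xy."""
        orb = cv2.ORB_create(1000)
        kp1, des1 = orb.detectAndCompute(ref_gray, None)
        kp2, des2 = orb.detectAndCompute(cur_gray, None)
        if des1 is None or des2 is None:
            return None
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]
        if len(matches) < 4:                     # a homography needs 4 correspondences
            return None
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        if H is None:
            return None                          # relation between images not found
        new_pt = cv2.perspectiveTransform(np.float32([[tag_xy]]), H)
        return float(new_pt[0, 0, 0]), float(new_pt[0, 0, 1])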
The normal flow of the streaming through the streaming server 4 would not in itself make it possible to perform calculation operations on dot matrices.
In fact, the streaming server 4, in itself, serves only for receiving a video flow from a source and making it available to an audience.
The first support program 5, relying on a function made available by the streaming server 4 to access the video images, transforms the images into dot matrices which at this point can be processed by the second support program 6.
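The access function offered by the streaming server is not detailed in the description; the sketch below assumes it simply exposes the flow at a URL that OpenCV's FFmpeg backend can read, which is one common arrangement.

    import cv2

    def frames_as_matrices(stream_url):
        """Pull single images out of the video flow and yield them as dot matrices."""
        cap = cv2.VideoCapture(stream_url)               # e.g. an RTMP or HLS endpoint
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # matrix for program 6
        cap.release()

    # for matrix in frames_as_matrices("rtmp://streaming-server/live/support"):
    #     ...hand each matrix to the second support program 6 (URL is hypothetical)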
Therefore, the second support program 6, by using various computer vision algorithms based on the image recognition concept, can remap the x and y coordinates positioned by the smartphone 2 of the support provider onto each subsequent image of the video flow.
The second support program 6 makes new metadata available when the calculations performed locally are considered to be not sufficiently reliable.
In this way both the smartphones 1, 2 receive, from the streaming server 4, new x and y coordinates which can then be used for drawing the markers on the video images.
The method and system for real-time remote support with use of computer vision and augmented reality as they are conceived are susceptible to numerous modifications and variants, all falling within the inventive concept described and claimed.

Claims

1. A method for real-time remote support with use of computer vision and augmented reality, characterised in that it comprises the following steps:
- providing a cloud computing data network;
- providing an augmented reality engine having computer vision programs residing in an electronic device of a user requesting support and in said cloud computing data network; and wherein the following steps are carried out in real time:
- acquiring, with a camera of the electronic device of the user requesting support, a video of a work environment;
- transmitting and displaying said video on an electronic device of a support provider;
- affixing, by said support provider, a graphic marker in augmented reality on a dot that is to be tagged in said video;
- running said computer vision programs so as to display said video containing said graphic marker on the electronic device of said user requesting support, the spatial coordinates of the graphic marker being recalculated so that, in the sequence of images of the video displayed on said electronic device of said user requesting support, the graphic marker permanently points on the tagged dot.
2. The method for real-time remote support with use of computer vision and augmented reality according to the preceding claim, characterised in that said programs residing in said cloud computing data network operate in an on-demand mode when said programs residing in said electronic device of said user requesting support do not provide a reliable outcome.
3. The method for real-time remote support with use of computer vision and augmented reality according to claim 1, characterised in that said cloud computing network comprises a streaming server.
4. The method for real-time remote support with use of computer vision and augmented reality according to any preceding claim, characterised in that said programs comprise a program for extracting images from a video and transforming the images into dot matrices.
5. The method for real-time remote support with use of computer vision and augmented reality according to any preceding claim, characterised in that said programs comprise an augmented reality filter.
6. The method for real-time remote support with use of computer vision and augmented reality according to the preceding claim, characterised in that said augmented reality filter operates on said dot matrices so as to recalculate the spatial coordinates of the graphic marker.
7. The method for real-time remote support with use of computer vision and augmented reality according to any preceding claim, characterised in that said video acquired by said electronic device of said user requesting support is transmitted to said streaming server from which it is in turn transmitted to said electronic device of said support provider which affixes said graphic marker and returns the spatial coordinates of the tagged dot to said streaming server.
8. The method for real-time remote support with use of computer vision and augmented reality according to the preceding claim, characterised in that said spatial coordinates are made available to said augmented reality filter in order to be processed.
9. The method for real-time remote support with use of computer vision and augmented reality according to the preceding claim, characterised in that said processed spatial coordinates are made available to said electronic device requesting support and to said electronic device providing support.
10. The system for real-time remote support with use of computer vision and augmented reality, characterised in that it comprises the following components:
- an electronic device of a user requesting support provided with a camera and display screen and an electronic device of a support provider provided with a display screen;
- a cloud computing data network comprising at least one streaming server;
- an augmented reality engine having computer vision programs and residing in said electronic device of said user requesting support and in said cloud computing data network; said augmented reality engine being configured so as to display on said electronic device of the user requesting support a video acquired by said user requesting support to which the support provider has affixed in augmented reality a graphic marker on a dot, the spatial coordinates of which are recalculated so that in the sequence of images of the video the graphic marker points permanently on the tagged dot.
PCT/EP2017/059290 2016-04-20 2017-04-19 A method and a system for real-time remote support with use of computer vision and augmented reality WO2017182523A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IT102016000040879 2016-04-20
ITUA2016A002756A ITUA20162756A1 (en) 2016-04-20 2016-04-20 METHOD AND SYSTEM FOR REAL-TIME REMOTE SUPPORT WITH THE USE OF COMPUTER VISION AND AUGMENTED REALITY
CH00526/16 2016-04-20
CH00526/16A CH712380A2 (en) 2016-04-20 2016-04-20 Real-time remote assistance method and system using computer vision and augmented reality.

Publications (1)

Publication Number Publication Date
WO2017182523A1 2017-10-26

Family

ID=58638843

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/059290 WO2017182523A1 (en) 2016-04-20 2017-04-19 A method and a system for real-time remote support with use of computer vision and augmented reality

Country Status (1)

Country Link
WO (1) WO2017182523A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007066166A1 (en) * 2005-12-08 2007-06-14 Abb Research Ltd Method and system for processing and displaying maintenance or control instructions
WO2009036782A1 (en) * 2007-09-18 2009-03-26 Vrmedia S.R.L. Information processing apparatus and method for remote technical assistance
US20130120449A1 (en) * 2010-04-28 2013-05-16 Noboru IHARA Information processing system, information processing method and program
WO2015101393A1 (en) * 2013-12-30 2015-07-09 Telecom Italia S.P.A. Augmented reality for supporting intervention of a network apparatus by a human operator

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107872655A (en) * 2017-11-21 2018-04-03 北京恒华伟业科技股份有限公司 A kind of method and system for determining hidden danger point
CN113711260A (en) * 2019-04-12 2021-11-26 脸谱公司 Automated visual suggestion, generation and evaluation using computer vision detection
WO2022145655A1 (en) * 2020-12-29 2022-07-07 주식회사 딥파인 Augmented reality system
DE102022130357A1 (en) 2022-11-16 2024-05-16 Rheinisch-Westfälische Technische Hochschule Aachen, abgekürzt RWTH Aachen, Körperschaft des öffentlichen Rechts XR-based wireless control and management
WO2024104739A1 (en) 2022-11-16 2024-05-23 Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Körperschaft des öffentlichen Rechts Xr-based wireless control and management

Similar Documents

Publication Publication Date Title
US11710279B2 (en) Contextual local image recognition dataset
CN109584276B (en) Key point detection method, device, equipment and readable medium
US8917908B2 (en) Distributed object tracking for augmented reality application
WO2017182523A1 (en) A method and a system for real-time remote support with use of computer vision and augmented reality
CN109743626B (en) Image display method, image processing method and related equipment
JP2021508123A (en) Remote sensing Image recognition methods, devices, storage media and electronic devices
US11961271B2 (en) Multi-angle object recognition
US11700417B2 (en) Method and apparatus for processing video
EP2972950B1 (en) Segmentation of content delivery
WO2013088637A2 (en) Information processing device, information processing method and program
WO2016089502A1 (en) Automatic processing of images using adjustment parameters determined based on semantic data and a reference image
US9600720B1 (en) Using available data to assist in object recognition
US20210352343A1 (en) Information insertion method, apparatus, and device, and computer storage medium
CN110598139A (en) Web browser augmented reality real-time positioning method based on 5G cloud computing
CN114390368B (en) Live video data processing method and device, equipment and readable medium
KR20210121515A (en) Method, system, and computer program for extracting and providing text color and background color in image
CN111489284B (en) Image processing method and device for image processing
CN109871465B (en) Time axis calculation method and device, electronic equipment and storage medium
CN112585957A (en) Station monitoring system and station monitoring method
CN110942056A (en) Clothing key point positioning method and device, electronic equipment and medium
US11683453B2 (en) Overlaying metadata on video streams on demand for intelligent video analysis
KR20040006612A (en) Video geographic information system
US11436826B2 (en) Augmented reality experience for shopping
KR101525409B1 (en) Augmented method of contents using image-cognition modules
US10281294B2 (en) Navigation system and navigation method

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17719829

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17719829

Country of ref document: EP

Kind code of ref document: A1