NL2014682B1

NL2014682B1 - Method of simulating conversation between a person and an object, a related computer program, computer system and memory means.

Info

Publication number: NL2014682B1
Application number: NL2014682A
Authority: NL
Inventors: Desain Peter; Dimitriadis George; De Roos André
Original assignee: Mindaffect B V
Priority date: 2015-04-20
Filing date: 2015-04-20
Publication date: 2017-01-20
Also published as: NL2014682A

Abstract

A method of simulating a conversation between a person and an object uses a display device connected to a computer and com- prises first identifying the object and then executing the following steps at least once: (a) executing the sub steps of: 1) presenting on the display device, of a first expression, and 2) showing an image or video of the object on the display device, comprising an ani- mation of at least part of the object that is pronouncing the first expression; and (b) inputting to the display device, of a second expression, wherein the display device is a mobile device, and the image or video comprises a real time image that shows part of the environment of the display device in a perspective substantially identical to that of human eyesight, the image being contiguous with the actual field of vision the person has of his environment, and wherein the animation is overlaid on the real-time image, such that the object maintains the same position in the realtime image with respect to other objects in that image when the mobile device is moved. The image may be made by a camera fixed to the mobile device. The method may also comprise: (3) determining the viewing angle (ex) between the portable dis- play device and the object, (4) adapting the perspective of the animation of the object to show the animation of the object under said viewing angle (ex) . The viewing angle may be determined by means of a camera and image-processing.

Description

METHOD OF SIMULATING CONVERSATION BETWEEN A PERSON AND AN OBJECT, A RELATED COMPUTER PROGRAM, COMPUTER SYSTEM AND MEMORY MEANS

The invention relates to a method according to the preamble of claim 1. A conversation thus obtained consists of a series of first and second expressions in any order, typically presented audibly and/or visibly, for example as answers by the simulated object to questions by the person at the display device .

Such a conversation may be obtained by selecting the first and second expressions from a graph data structure that represents expressions as well as the order in which these expressions are allowed to occur in a possible conversation, as is known per se in the art; see e.g. patent EP 1 293 076 B9 of one of the present inventors. However, such a conversation does not need to have a predefined order of the first and second expressions and also allows for arbitrary orders of first and second expressions, to simulate for example a conversation serving the goal of making conversation ("small talk") or a conversation with a chaotic person. A similar method is disclosed in US2002026457 A1 on the name of the applicant Jensen, with an accompanying communication system for interactive distribution of information such as advertising over a computer network. The simulated conversation is between an object shown on a display device and a user of the device. The system includes a first computer, an information server containing the information to be distributed, and any number of second computers arranged to display said information as well as transmitting information such as requests for further information back to the information server over a computer network.

The information in the information server is arranged according to an information tree structure, such that when a first message is displayed on the information client, the us- er will be able to choose from a number of requests for further information, and when these further messages are presented, the user is again given such a choice, thus creating a conversation involving the user. In a particular embodiment of the system, the messages are animated computer graphics and sound. The text message from the information server may be sent through a text to speech converter which converts the text to a string of phonemes, diphones or some other representation of speech and forwards this to the information client that may include a speech synthesizer and possibly a computer graphics program generating a "talking head". Of course, prerecorded speech could be used here too. This method and system are intended to keep the attention of users for a prolonged period of time, in online advertisements .

There is a need for a realistic simulation of a conversation between the user of the device and a material or virtual object in an environment where the user of the device and the object are present. For example, the environment may be a museum, where a user consults a computer display device for obtaining information about pieces of art he sees, via a dialogue .

The invention aims to offer a method of the type described in the preamble that enables such a realistic simulated conversation.

This goal is realized by the method of claim 1.

It appears that, when a person sees the actual scene around the portable display device, and the real-time, dynamic, image on the display device in a natural perspective of human eyesight (typically such as provided by a lens of focal length between 35-50 mm on a 36x24 mm photo camera), and the real-time image is substantially contiguous with the actual field of vision the person has of his environment, this information evokes the impression that the display device shows real things present in the real situation, typically obtained by a camera, as if it were transparent like a glass window. Thus the animated object, overlaid on the real-time image, is also regarded real.

The degree to which this occurs will naturally depend on what is present in the real-time image. For example, in a museum hall, other persons may walk around in the hall and become captured in the real time image, which will add to sensation that the image shows real things present in the hall.

Also in the case the object is virtual, for instance a person who explains what is visible in a landscape shown in the real time image, the object or person is kept in the same position in the real-time image; thus the same position with respect to objects in the landscape.

The object may also be a person explaining parts of the image, such as a sculpture or a painting in the real-time image; in that case the object maintains the same position with respect to the sculpture or painting it is explaining.

The portable display device is typically that of a handheld mobile device, including a tablet computer, a smart phone or even wearable augmented reality devices such as Google Glass®, or a small laptop.

Preferably, the real-time image is obtained by a camera fixed to the mobile device, for instance integrated in the device, with the camera looking in the same direction as the person viewing the display.

Advantageously, the second expressions, i.e. those to be inputted, may be obtained from the display of the display device, where they are shown for example as a selection collection or a list of allowed second expressions, i.e. second expressions where the person holding the device may choose from. In such a case, the second expression may be selected by one of, but not limited to: pointing a cursor by a mouse device and then clicking, touching a touch screen of the display device by a finger on the desired expression on a shown list, and reading out loud the expression by the user or another person and then registering the sound and speech processing the registered sound by the at least one computer to identify the expression among the other shown selectable expressions .

The at least one computer may be the computer of the same mobile display device that contains the display, but may also be a remote computer to which the display device communicates over, for instance, a network connection such as internet. Typically, the remote computer would be a server system connected to the internet.

In an embodiment, the method also comprises the steps of : (3) by the at least one computer, determining the viewing angle (a) between the portable display device and the object, as defined by the relative direction of a line (L) running through the portable display device and the object with respect to a defined coordinate system of the object, (4) adapting, by the at least one computer, of the perspective of the animation (271) of the object shown on the portable display device to the viewing angle determined in step (3) to show the animation of the object under said viewing angle (a).

By adapting the perspective of the animated object shown on the portable display device, it will appear to the user as if the shown object is viewed under the same viewing angle as the material object, and therefore the user will obtain the impression, or illusion, of looking at the material object when looking at the display device. Such an impression is absent when the adaptation of the perspective is not made. As a result of said impression of looking at the material object, the user will also be led to a higher degree into the illusion of having a conversation with the material object itself, or, in the case the object is a sculpture or image of a real person, with that person herself or himself. Thus, a more realistic conversation is obtained. Preferably, the viewing angle is adapted on a regular basis, for instance each 0.1 sec.

In an embodiment, the distance between the material object and the display device is determined and the sizing of the shown image of the object on the display device is also adapted to said distance.

This further enhances the illusion of looking at the real, material object. The determination of the distance may be realized in a manner known per se. For example, the size of the material object may have been stored in the graph data structure or elsewhere, and may be compared to the actual size of the object on the image.

In a preferred embodiment of the invention, the viewing angle is determined by means of a camera and image-processing. This allows for a fast and precise determination of the viewing angle and the same camera and image-processing techniques may be used as those needed in creating the overlay.

The viewing angle may be determined by creating an imaginary line between the middle of the display of the display device and the middle of the material object, the material object being provided with a 2D or 3D coordinate system through its middle. It should be clear that although the middle is preferred, any other point of the display device or the object may be used as well, since in most cases the distance between the display device and the object are large enough with respect to the distances within the display device and the object to obtain viewing angles that will still offer an almost identical realistic perspective to the user as when the middles are used.

In another embodiment, the at least one computer collects data about the user and adapts at least one of the first expressions to said collected data about the user.

By taking into account data about the user, the first expressions, often being responses, and therewith the conversation, may be personalized to the user. As such the illusion of a conversation with the material object, or even a person represented by the material object, will be even stronger. In fact, the illusion may be created that the object 'knows' the user and/or has already met before, in particular when the data taken into account informs about the interests and/or knowledge of the user and/or her/his preferred ways of interacting .

Preferably, the data about the user includes at least one of, but not limited to: data collected from a social media profile of the user, observations made by a camera, microphone, fingerprint sensor or other sensor, data representing emotional or other mental states of the user as derived from observations made by sensors connected to the computer or by inputted data to the computer made by a person, data collected from any personal or group profile of the user .

In an embodiment, the display device comprises at least two sound reproduction units and at least one of the responses and/or at least one of the visual animations comprise an audio counterpart that is transmitted by said sound reproduction units, and wherein the audio counterpart is adapted at least once to sound to the user of the display device as if the audio originates from the material object.

By thus changing the audio from the sound reproduction units, e.g. in volume, phase and/or echoes, the illusion of a conversation with the material object is further enhanced. Technically, this may be realized by signal processing software available on the market.

In a preferred embodiment, the display device is part of a mobile device which also comprises a camera directed at the viewer side of the display, the method also comprising the steps of: - capturing an image by the camera and showing the image captured by the camera and the image of the object simultaneously on the display device, and - recording on video the conversation, as shown and possibly made audible in said two images on the display device.

In this manner, a film/movie is created of both actors in the interaction.

Preferably, here the first expressions are made audible via the display device, and the selection of second expressions is realized by inputting them via a microphone of the display device and speech recognition by the at least one computer .

As such, the conversation becomes even more realistic due to the absence of visible information other than the animated image or video of the object.

With an advantage, the video is created by the mobile device, yielding less data traffic and a more reliable process .

In an embodiment, identifying the object comprises the use of data obtained by the portable display device which data is location dependant, such as RFID data, WiFi data, GPS data, and earth magnetic field data.

In the case of non-moving, or stationary, objects, this may be preferable for cost effectiveness or reliability etc.

In a particularly advantageous embodiment, identifying the object comprises the steps of: (2) directing the camera of the portable display device towards the object, (3) collecting from the camera, by the at least one computer, of data about the object for identification thereof, and (4) identifying the object by the at least one computer by matching said collected data about the object with object data stored in the computer, using image recognition steps.

The use of a camera and image-recognition steps, as an alternative to, or complementary to, the use of location dependant data, has as an advantage that it offers a highly accurate identification. In particular the combination of location dependant data and camera data offers a highly increased accuracy. Another advantage of using image recognition tools is that it uses a camera that is already present in many mobile devices nowadays, leaving the only additional costs in image recognition software.

The invention further relates to a computer program that implements the method of the present invention, and to a computer system running this program and memory means storing said computer program.

The invention will now be illustrated on the basis of a preferred embodiment, referring to the accompanying drawings and merely as an illustration of the invention and not in limitation thereof. In the drawings, similar parts are given similar reference numerals. The use of "or" indicates a nonexclusive alternative without limitation unless otherwise noted. The use of "including" means "including, but not limited to", unless otherwise noted. Here

Figure 1 shows a computer system running a program that implements the method according to the invention,

Figure 2 shows a flow diagram of an implementation of the method according to the invention in the program run by the system of Figure 1,

Figure 3 shows a modified flow diagram of the method of Figure 2, now with the display device shown in addition to the steps of the method,

Figure 4a shows a person in a museum hall using a display device executing the method according to the invention, and

Figure 4b shows the display device of Figure 2a during the execution of the method according to the invention.

In Figure 1 a computer system 1 of a museum comprises a portable display device in the shape of a tablet computer 2, such as an Apple Ipad or Asus Transformer T100, and a server computer 3, and a NFC tag 4 located in the vicinity of a painting 5. The portable tablet computer 2 is connected to the server computer 3, via a WiFi access point 6 with WiFi antenna 6a and a data communication network such as the LAN 7 of the museum in this embodiment but possibly internet in another embodiment, neither of which belongs to the computer system 1, which server computer 3 is arranged to run software code that implements the method according to the invention, as will be discussed into more detail in Figure 2. Instead of, or in addition to, the LAN 7, internet may be used.

The tablet computer 2 is shown schematically and comprises a display 8 with a touch screen, a central processing unit 9, a memory unit 10, a rear camera 11 directed away from the viewer side of the display 8 and a front camera 12 directed towards the viewer side of the display 8, a microphone 13, built-in sound reproduction units in the shape of loudspeakers 14, a WiFi module 15 including a WiFi antenna 16, and a battery 17. A single multifunctional button 18 is located next to the display 8; no fixed hardware keyboard is present since the tablet computer 2 has the ability to show an interactive keyboard on the touch screen when necessary.

The components 8-18 are connected for power and/or information exchange in a common and known manner, which is shown schematically by the wiring 19.

The server computer 3 comprises a central processing unit 20, a memory 21, a network module 22 and a long term storage memory 23 comprising a relational database system.

The components 20-23 are connected by wiring for power and/or information exchange.

In Figures 2 and 3, the steps are shown of an embodiment of the method according to the invention, as implemented in server software installed on the server computer 3 and in an application on the tablet computer 2 of Figure 1.

This software comprises a graph data structure representing expressions as well as the order in which these expressions may occur in a possible conversation.

It is assumed that the server computer 3 is operational and the server software is running. The LAN 7 and WiFi access point 6 and the tablet computer 2 are also operational.

First, in step 200, the user of the tablet computer 2 starts the application program, which program logs on at the server via a webpage (using the hypertext transfer protocol or similar), via its WiFi module 15 and the LAN 7. The application program starts the rear camera 11 to capture video images .

Then, in step 210, the server computer 3 refreshes the web page on the touch screen display of the display device, now showing a message "point your tablet at an object you want to have a conversation with". In other embodiments, this explicit instruction step 210 may be left out.

In step 220, the person points the rear camera 11 at the painting 5.

In step 230, the tablet computer 2 and server computer 3 identify the painting 5, by the tablet computer 2 comparing the image obtained by the rear camera 11 to reference images present in the database system of the long term storage 23 retrieved via the server computer 3, using image processing techniques known per se.

Once the painting 5 is identified, the message is removed from the display screen and the conversation with the painting 5 starts, represented by the group 240 of steps 250 - 300.

In step 250, the images obtained by the camera are used to determine the viewing angle and distance to the object. In this embodiment, this is realized by comparing an image obtained by the camera to a reference image of the painting 5, or object, of which reference image the viewing angle and distance are known. In the images, contour lines are detected and distances between such lines or points thereon are calculated, and compared in a known manner for calculating the actual viewing angle.

In step 260 the server retrieves a first expression, which in this embodiment always comprises two parts: an utterance 261 and a selection list 262 of allowed second expressions .

In other embodiments, the selection list of allowed second expressions may be embodied in the utterance, for example, or may even be absent. In the latter case the user may, for example, provide a free format spoken expression via the microphone 13 which is then mapped onto a similar item in the list of allowed expressions, in a manner known per se.

In step 270, first the utterance is made audible on the tablet computer 2 via the loudspeakers 14 and then the image obtained by the camera is used to calculate a new image that shows an animated copy 271 of the painting 5 overlaid on the real-time video captured by the camera, which animated copy 271 is adapted to the calculated viewing angle. Next to the copy of the painting 5, the selection list of allowed second expressions is shown.

An example of the result from the steps done so far is shown in Figure 4; Figure 4a shows a user 272 holding her tablet computer 2 pointing towards the painting 5 in the museum. The painting 5 shows a self portrait of the well known Dutch post-impressionist painter Vincent van Gogh, so the tablet is directed towards an image of Vincent van Gogh. An image of this painting, obtained from the server computer 3 via the WiFi access point 6 and WiFi module 15, is also shown on the display 8, as is visible more clearly in Figure 4b.

The first expression, by the computer, is made audible via the loudspeakers of the tablet computer 2, as indicated by the balloon 271.

Then, in step 280, the user inputs a selection of a single question, or second expression, on the touch screen and, in step 290, the application on the tablet computer sends the inputted selection to the server, and returns to step 250.

The next time(s) the step 260 is done, the inputted selection causes the server computer 3 to look up a response to the input in the graph data structure, i.e. a first expression, which is then shown animated as a moving 3D object as seen with an updated calculated viewing angle a in step 270.

The viewing angle a is determined by comparing the direction of the line L shown in Figure 4a, running between the centers of the painting 5 and the tablet computer 2, and the directions of the three mutually perpendicular axes x, y, z that constitute a coordinate system. It is common practice to express the viewing angle a in terms of two angles, each one with respect to one of the three axes x, y and z.

The selection list also offers a second expression (not shown in Figure 4) that expresses the desire to end the conversation, i.e. to move on to another object or to stop. Once "end the conversation" is selected, at selection point 300, the server asks, at selection point 310, to stop or continue to another object, and if "continue" is selected, the software jumps back to step 230.

Although the steps in the shown embodiment are performed by two computers, the server computer 3 and the tablet computer 2, they may also be divided over more than these two computers and may be done on a single computer, in particular by the tablet computer.

An addition to any of the embodiments described above is the "selfie mode", which comprises switching from the rear camera 11 to the front camera 12 and directing the front camera 12 at both the user and the object. After identification of the object, the real time image captured by the front camera 12 is merged with, or overlaid on, the animation of the interaction, just like in the normal, non-selfie, mode. This is then recorded as digital photo and/or video recording, and thus is suited to be played back later, and/or to be sent to friends on social networks, for example.

For the most realistic results, the conversation, or dialogue, is not done visually, but entirely in audio, thus via loudspeakers and microphone. In this manner, no buttons or words need to be shown on the display 8.

Claims

A method for simulating a conversation between a person (272) and an object (5), which method uses a display device (2) connected for data communication to at least one computer (2; 3) which has a central processing unit (9) ) and a memory (10) and wherein the method comprises, with the display device (2), first identifying the object (5) and then performing the following steps at least once: (a) performing the sub-steps: 1 ) presenting (270), by the computer (2; 3) and on the display device (8), a first expression (261, 262), and 2) at least overlapping in time with presenting the first expression, showing an image or video (271) of at least a portion of the object on the display device, the image or video displayed on the display device comprising an animation of at least a portion of the object expressing the first expression; and (b) inputting a second expression on the display device, and which method is characterized in that the display device is a mobile device, and the image or video comprises a real time image showing part of the environment of the display device in a perspective that is substantially identical to that of human eyesight, where the real-time image is essentially contiguous with the actual field of vision that the person has on his environment, and because the animation is superimposed over the real-time image, by means of image processing techniques, such that the object holds the same position in the real time image relative to other objects in that image when the mobile device is moved.

Method according to claim 1, characterized in that the real-time image is produced by a camera fixedly attached to the mobile device in position and orientation.

Method according to claim 1, characterized in that it furthermore comprises the steps of: (3) determining at least once by at least one computer the viewing angle (ex) between the portable display device and the object as defined by the relative direction of a line (L) passing through the portable display device and the object with respect to a defined coordinate system of the object, (4) adjusting, by the at least one computer, the perspective of the animation (271) of the object object shown on the portable display device at the viewing angle (ex) determined in step (3).

Method according to one of the preceding claims, wherein the distance between the material object and the display device is determined and the size of the displayed image of the object on the display device is also adjusted to said distance.

Method according to one of the preceding claims, wherein the viewing angle is determined by means of a camera (11; 12) and image processing.

The method according to any of the preceding claims, wherein the at least one computer collects data about the user and adjusts at least one of the first expressions to said data about the user.

The method of claim 6, wherein the data about the user comprises at least one of, but is not limited to: data collected from a user's social media profile, observations made by a camera, microphone, fingerprint sensor or other sensor, data representing emotional or other mental states of the user as derived from data connected to any personal or group profile of the user by sensors connected to the computer or from data entered by a person on the computer.

A method according to any one of the preceding claims, wherein the display device comprises at least two sound reproducing units and at least one of the responses and / or at least one of the visual animations comprises a sound counterpart represented by said sound reproducing units, and wherein the sound counterpart at least once is adapted to sound to the user of the display device as if the sound came from the material object.

A method according to any one of the preceding claims, wherein the display device forms part of a mobile device which also comprises a camera (12) which is directed to the viewer side of the display, the method furthermore comprising the steps of: - an image recording by means of the camera and showing the image registered by said camera (12) and the image of the object simultaneously on the display device (2), and - recording the conversation on video, as shown on said two images on the display device.

The method of claim 9, wherein the first expressions are made audible via the display device, and the selection of second expressions is accomplished by inputting them through a microphone of the display device and speech recognition by the at least one computer.

A method according to any one of the preceding claims, wherein identifying the object comprises the use of data obtained by the portable display device that are location-dependent, such as RFID data, WiFi data, GPS data, and data on the earth's magnetic field.

A method according to any of the preceding claims, wherein identifying the object comprises the steps of: (1) pointing the camera of the portable display device to the object, (2) collecting data about the object for identification from the camera thereof, by the at least one computer using image recognition steps, and (3) identifying the object by comparing said at least one computer by said collected data about the object with data stored in the computer about the object.

A computer program comprising instructions which, when executed on a computer system comprising at least one processing unit, a memory, and a display device (8), cause the system to perform the method according to any of claims 1 to 12.

A computer system comprising at least one computer comprising a processing unit, a memory and an operating system capable of executing computer programs, a display device connected to said computer for communication, a computer program running on the at least one computer ensures that the program executes the method according to any of claims 1 to 12.

15. Memory means on which a computer program according to claim 14 is stored non-volatile.