WO2024111728A1 - User emotion interaction method and system for extended reality based on non-verbal elements - Google Patents

User emotion interaction method and system for extended reality based on non-verbal elements

Info

Publication number
WO2024111728A1
WO2024111728A1 (PCT/KR2022/019237)
Authority
WO
WIPO (PCT)
Prior art keywords
user
emotional state
emotional
emotion
extended reality
Prior art date
Application number
PCT/KR2022/019237
Other languages
French (fr)
Korean (ko)
Inventor
송광헌
이금탁
양승남
이은희
김창모
신명지
Original Assignee
주식회사 피씨엔
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 피씨엔
Publication of WO2024111728A1 publication Critical patent/WO2024111728A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/16Devices for psychotechnics; Testing reaction times ; Devices for evaluating the psychological state
    • A61B5/165Evaluating the state of mind, e.g. depression, anxiety
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16Constructional details or arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/16Constructional details or arrangements
    • G06F1/1613Constructional details or arrangements for portable computers
    • G06F1/163Wearable computers, e.g. on a belt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/002Specific input/output arrangements not covered by G06F3/01 - G06F3/16
    • G06F3/005Input arrangements through a video camera
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • the present invention relates to a user emotional interaction method and system for extended reality based on non-verbal elements.
  • the present invention was developed to solve the above-mentioned problems, and is intended to provide a user emotion interaction method and system for extended reality based on non-verbal elements that can express user emotions using user images.
  • the present invention is also intended to provide a user emotion interaction method and system for extended reality based on non-verbal elements that can recognize and utilize the user's emotional state more accurately by using artificial intelligence, even with lightweight interfaces such as a webcam and minimal wearable devices.
  • a user emotion interaction method performed on a computing device includes: registering user emotion information for a user-customized service; acquiring a captured image of the user's facial expressions and gestures; determining the user's emotional state by analyzing the captured image based on the user emotion information and learned emotion recognition technology; and reflecting the emotional state in the provided service. A computer program executing the method is also provided.
  • the step of registering the user emotion information may include providing the user with emotion-inducing content including videos or photos that induce a plurality of emotions, and generating and storing user emotion information using the user's facial expression or gesture that changes while the emotion-inducing content is played, together with the content being shown at that point in time.
  • the user emotion information can be generated based on learning data corresponding to the user's age, gender, and face shape.
  • the emotional state may also be determined using sensing information from one or more wearable devices worn by the user, and the rate at which that sensing information is reflected in determining the emotional state may be applied differently depending on the type and model of the wearable device.
  • the current situation including at least one of the location where the user is located, content currently playing around, and people nearby, can be recognized and used to determine the emotional state.
  • the method may further include calculating an accuracy value of the emotional state using the quality of the captured image, the facial expression recognition rate, and the like, and the rate at which the emotional state is reflected in the provided service may be applied differently depending on the accuracy value.
  • a step of calculating an intensity value of the emotional state according to the size of the change in the user's facial expression is further included, and a different reflection method for the emotional state in the provided service can be applied depending on the intensity value.
  • a user emotion interaction system for extended reality based on non-verbal elements is provided, including: a storage unit for registering user emotion information for a user-customized service; a communication unit for acquiring, from a user terminal, a captured image of the user's facial expressions and gestures; an emotion recognition unit that determines the user's emotional state by analyzing the captured image based on the user emotion information and learned emotion recognition technology; and an interaction unit that reflects the emotional state in the provided service.
  • the user's emotions can be expressed using images captured of the user and applied to extended reality.
  • Figure 1 is an example diagram schematically showing a user emotion recognition method for extended reality based on non-verbal elements using a simple interface according to an embodiment of the present invention.
  • Figure 2 is a functional block diagram showing the configuration of a system for user emotional interaction for extended reality based on non-verbal elements according to an embodiment of the present invention.
  • Figure 3 is a flowchart showing a user emotion interaction process according to an embodiment of the present invention.
  • Figure 4 is a flowchart showing a process of registering user emotion information for customized user recognition according to an embodiment of the present invention.
  • Figure 5 is a flowchart showing an emotional state recognition process using a wearable device in addition to an image according to an embodiment of the present invention.
  • Figure 6 is a flowchart showing an interaction process using the accuracy and intensity of a recognized emotional state according to an embodiment of the present invention.
  • Figure 7 is an example diagram showing an example of service reflection by applying the user's emotional state to an avatar according to an embodiment of the present invention.
  • first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The above terms are used only for the purpose of distinguishing one component from another.
  • terms such as first threshold value and second threshold value, which will be described later, may be designated in advance as threshold values that are substantially different from each other or partially the same; however, because expressing them with the same word, threshold value, could cause confusion, the terms first, second, and so on are added for ease of distinction.
  • Figure 1 is an exemplary diagram schematically showing a user emotion recognition scheme for extended reality based on non-verbal elements using a simple interface according to an embodiment of the present invention, and Figure 2 is a functional block diagram showing the configuration of a system for user emotion interaction for extended reality based on non-verbal elements according to an embodiment of the present invention.
  • the user's emotional state and gestures can be recognized using a photographing means such as a webcam. Additionally, by further utilizing minimized wearable devices such as wristwatch-type devices, user emotional interaction services for extended reality based on non-verbal elements can be provided.
  • Extended reality (XR) is a term that covers virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies.
  • Augmented reality (AR) and virtual reality (VR) are separate, but these two technologies are evolving together while complementing each other's shortcomings. However, at this stage, the differences are clearly visible.
  • Virtual reality (VR) requires a headset-type (HMD) terminal that covers the entire eye, and augmented reality (AR) can be expressed with glasses such as Google Glass.
  • Extended reality creates expanded reality by freely selecting individual or mixed use of virtual and augmented reality (VR and AR) technologies.
  • HoloLens developed by Microsoft is a glasses-shaped device, but it can be seen as a form of extended reality (XR) in that it displays an optimized 3D hologram by understanding real space and object information.
  • Extended reality (XR) is expected to be applied to various fields, including education, healthcare, and manufacturing.
  • the system includes a storage unit 10, a communication unit 20, and a control unit 30.
  • the control unit 30 may include a user management unit 31, an emotion recognition unit 32, and an interaction unit 33.
  • the storage unit 10 stores data necessary for the control unit 30 to function, and also stores user emotion information for customized services. User emotion information will be explained in detail later.
  • the communication unit 20 is a communication means for providing services utilizing user emotional information to user terminals connected through a communication network, etc. For example, a captured image of the user's facial expressions and gestures is acquired from the user terminal through the communication unit 20, and service data reflecting the recognized user emotional state is transmitted to the user terminal. Since these communication means will be obvious to those skilled in the art, further detailed description will be omitted.
  • control unit 30 analyzes the captured image based on learned emotional recognition technology (using artificial intelligence) to recognize the user's emotional state.
  • the emotional state of the user is reflected in the provided service.
  • the emotion recognition unit 32 of the control unit 30 is trained on learning data about the facial expressions of various people in various emotional states, and determines the user's emotional state by analyzing the captured image.
  • the emotion recognition unit 32 can improve the recognition accuracy of emotional states through continuous learning while providing services using artificial intelligence.
  • the recognition accuracy of emotional states can be increased by further using user emotion information registered in advance to correspond to the user as described above.
  • User emotion information is intended to provide customized services to users, and information on the user's facial expressions according to each emotional state is stored and utilized in advance as user emotion information.
  • the user management unit 31 performs management functions of storing, deleting, and updating such user emotion information. A detailed description of this will be provided later with reference to FIG. 4 .
  • the interaction unit 33 of the control unit 30 reflects the determined emotional state of the user in the provided service. For example, if an avatar service is being provided, gestures and facial expressions corresponding to the user's emotional state are applied to the user's avatar (see Figure 7). Of course, this is just one example, and the user's emotional state can be applied to all services provided as extended reality in addition to avatar services.
  • Figure 3 is a flowchart showing a user emotion interaction process according to an embodiment of the present invention.
  • the user emotion interaction method, performed on a computing device implemented for example in the form of a server, includes acquiring real-time captured images of the user (S20), analyzing the captured images to determine the user's emotional state (S30), and reflecting the determined emotional state in the provided service (S40).
  • the user's emotional state and gestures are recognized by analyzing the user's face and body movements captured by a recording device such as a webcam.
  • the user's emotional state is recognized by analyzing images captured of the user using analysis technology based on various learning data.
  • a step (S10) of registering and managing user emotion information may be preceded, and the recognition accuracy of emotional states can be increased by further utilizing such user emotion information.
  • Figure 4 is a flowchart showing a process of registering user emotion information for customized user recognition according to an embodiment of the present invention.
  • emotion-inducing content including videos or photos that induce a plurality of emotions is provided to the user for playback (S410).
  • user emotion information may be generated based on learning data corresponding to the user's age, gender, and face shape. For example, there may be differences between the smiling expression of a teenager and that of a person in their forties, so learning data corresponding to the user's age, gender, and face shape is used preferentially when analyzing the user's facial expressions and gestures, allowing more useful user emotion information to be generated.
  • content that induces various emotional states is provided to the user in advance for viewing, changes in the user's face or gestures at that time are observed to specify the characteristics of the user's facial expression in each emotional state, and these characteristics are utilized in later emotion recognition, thereby improving the accuracy of emotion recognition.
  • Figure 5 is a flowchart showing an emotional state recognition process using a wearable device in addition to an image according to an embodiment of the present invention.
  • when sensing information is acquired from a wearable device, a reflection rate according to the type and model of the wearable device is determined (S520), and, according to the determined reflection rate, the sensing information is used together with the information from the analysis of the captured image to determine the emotional state (S530).
  • compared to a wearable device that measures only heart rate, a device that can additionally measure body temperature and blood pressure will have its sensed values reflected at a higher rate when determining the emotional state.
  • if information additionally acquired from the user terminal, such as the place where the user is located, the content currently playing nearby, and the people present nearby, is used, the current situation can be recognized and further utilized to determine the user's emotional state. For ease of understanding, if, for example, the current location is indoors, music with a cheerful rhythm is playing nearby, and the situation is recognized as being with friends, the user's current emotional state is likely to be [excited] or [happy]; therefore, based on this information, the captured image and the sensing information of the wearable device are analyzed to recognize the user's emotional state.
  • Figure 6 is a flowchart showing an interaction process using the accuracy and intensity of a recognized emotional state according to an embodiment of the present invention.
  • the accuracy value of the recognized emotional state of the user is calculated using the image quality of the captured image, the facial expression recognition rate, and the like, and the intensity value of the emotional state is calculated according to the magnitude of the change in the user's facial expression (S610).
  • for example, if the image quality of the user's facial expression in the captured image is low and there is no emotional state that clearly corresponds to the analyzed expression, the accuracy value will be calculated as low. As for the intensity value, when the user laughs broadly rather than smiling slightly, the change in facial expression is greater, so the intensity value in that case may be calculated as higher.
  • the rate at which the emotional state is reflected in the provided service is determined according to the accuracy value (S620). For example, if the recognized emotional state is [laughing] and the accuracy value is high, a laughing expression is applied to the avatar as is; if the accuracy value is low, only a slightly smiling expression is applied to the avatar for a short time, so that the reflection rate is applied differently.
  • the manner in which the emotional state is reflected is determined according to the intensity value (S630). For example, if the intensity value is high, a laughter special effect, such as greatly enlarging the size of the face, is applied when expressing the emotional state [laughing] on the avatar's face; in contrast, if the intensity value is low, only a smiling expression without any special effect is applied to the avatar's face.
  • a computer program stored in a computer-readable medium may be provided to perform the user emotional interaction method for extended reality based on non-verbal elements according to the present invention described above.
  • Computer-readable recording media include all types of recording media storing data that can be deciphered by a computer system. For example, there may be Read Only Memory (ROM), Random Access Memory (RAM), magnetic tape, magnetic disk, flash memory, optical data storage device, etc. Additionally, the computer-readable recording medium can be distributed to computer systems connected through a computer communication network, and stored and executed as code that can be read in a distributed manner.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Tourism & Hospitality (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Public Health (AREA)
  • Educational Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Hospice & Palliative Care (AREA)
  • Surgery (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Developmental Disabilities (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)

Abstract

Disclosed are a user emotion interaction method and system for extended reality based on non-verbal elements. A user emotion interaction method performed on a computing device according to one aspect of the present invention comprises the steps of: registering user emotion information for a user-customized service; acquiring captured images of the user's facial expressions and gestures; analyzing the captured images to determine the user's emotional state on the basis of the user emotion information and trained emotion recognition technology; and applying the emotional state to a provided service.

Description

User emotion interaction method and system for extended reality based on non-verbal elements
The present invention relates to a user emotion interaction method and system for extended reality based on non-verbal elements.
Recently, services for extended reality, such as the metaverse, have been expanding beyond virtual reality.
Conventional user interfaces for extended reality rely only on verbal input devices such as keyboards and mice, and are therefore extremely limited in reflecting user movements and user expressions (emotions).
In addition, professionals such as game producers and broadcasters use a number of expensive devices to perform gesture recognition and facial expression recognition, which limits the applicability of such approaches to metaverse services aimed at a wide range of users.
Accordingly, the present invention has been devised to solve the above-mentioned problems and is intended to provide a user emotion interaction method and system for extended reality based on non-verbal elements that can express user emotions using images of the user.
The present invention is also intended to provide a user emotion interaction method and system for extended reality based on non-verbal elements that can recognize and utilize the user's emotional state more accurately using artificial intelligence, even with lightweight interfaces such as a webcam and minimal wearable devices.
Other objects of the present invention will become clearer through the preferred embodiments described below.
According to one aspect of the present invention, there is provided a user emotion interaction method for extended reality based on non-verbal elements, performed on a computing device, the method comprising: registering user emotion information for a user-customized service; acquiring a captured image of the user's facial expressions and gestures; determining the user's emotional state by analyzing the captured image based on the user emotion information and learned emotion recognition technology; and reflecting the emotional state in the provided service. A computer program executing the method is also provided.
Here, registering the user emotion information may include: providing the user with emotion-inducing content including videos or photos that induce a plurality of emotions; and generating and storing user emotion information using the user's facial expression or gesture that changes while the emotion-inducing content is played, together with the content being shown at that point in time.
The user emotion information may also be generated based on learning data corresponding to the user's age, gender, and face shape.
The emotional state may further be determined using sensing information from one or more wearable devices worn by the user, with the rate at which that sensing information is reflected in the determination applied differently depending on the type and model of the wearable device.
In addition, the current situation, including at least one of the place where the user is located, the content currently playing nearby, and the people nearby, may be recognized and used in determining the emotional state.
The method may further include calculating an accuracy value of the emotional state using the quality of the captured image, the facial expression recognition rate, and the like, and the rate at which the emotional state is reflected in the provided service may be applied differently depending on the accuracy value.
The method may further include calculating an intensity value of the emotional state according to the magnitude of the change in the user's facial expression, and the manner in which the emotional state is reflected in the provided service may be applied differently depending on the intensity value.
According to another aspect of the present invention, there is provided a user emotion interaction system for extended reality based on non-verbal elements, comprising: a storage unit for registering user emotion information for a user-customized service; a communication unit for acquiring, from a user terminal, a captured image of the user's facial expressions and gestures; an emotion recognition unit that determines the user's emotional state by analyzing the captured image based on the user emotion information and learned emotion recognition technology; and an interaction unit that reflects the emotional state in the provided service.
Aspects, features, and advantages other than those described above will become apparent from the following drawings, claims, and detailed description of the invention.
According to the present invention, the user's emotions can be expressed using images captured of the user and applied to extended reality.
In addition, according to the present invention, even with lightweight interfaces such as a webcam and minimal wearable devices, the user's emotional state can be recognized more accurately using artificial intelligence and utilized in extended reality.
Figure 1 is an exemplary diagram schematically showing a user emotion recognition scheme for extended reality based on non-verbal elements using a simple interface according to an embodiment of the present invention.
Figure 2 is a functional block diagram showing the configuration of a system for user emotion interaction for extended reality based on non-verbal elements according to an embodiment of the present invention.
Figure 3 is a flowchart showing a user emotion interaction process according to an embodiment of the present invention.
Figure 4 is a flowchart showing a process of registering user emotion information for user-customized recognition according to an embodiment of the present invention.
Figure 5 is a flowchart showing an emotional state recognition process using a wearable device in addition to an image according to an embodiment of the present invention.
Figure 6 is a flowchart showing an interaction process using the accuracy and intensity of a recognized emotional state according to an embodiment of the present invention.
Figure 7 is an exemplary diagram showing an example of applying the user's emotional state to an avatar in a provided service according to an embodiment of the present invention.
Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all changes, equivalents, and substitutes falling within the spirit and technical scope of the present invention.
When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. On the other hand, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other components exist in between.
Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another. For example, terms such as first threshold value and second threshold value, which will be described later, may be designated in advance as threshold values that are substantially different from each other or partially the same; however, because expressing them with the same word, threshold value, could cause confusion, the terms first, second, and so on are added for ease of distinction.
The terms used in this specification are merely used to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
In addition, the components of an embodiment described with reference to each drawing are not limited to that embodiment and may be implemented so as to be included in other embodiments within the scope in which the technical spirit of the present invention is maintained; it is also natural that, even if a separate description is omitted, a plurality of embodiments may be re-implemented as a single integrated embodiment.
In addition, in the description with reference to the accompanying drawings, identical or related components are given identical or related reference numerals regardless of the figure numbers, and redundant descriptions thereof are omitted. In describing the present invention, if it is determined that a detailed description of related known technology may unnecessarily obscure the gist of the present invention, that detailed description is omitted.
Figure 1 is an exemplary diagram schematically showing a user emotion recognition scheme for extended reality based on non-verbal elements using a simple interface according to an embodiment of the present invention, and Figure 2 is a functional block diagram showing the configuration of a system for user emotion interaction for extended reality based on non-verbal elements according to an embodiment of the present invention.
First, referring to Figure 1, according to this embodiment, the user's emotional state and gestures can be recognized using a capture device such as a webcam. By further utilizing minimal wearable devices such as wristwatch-type devices, a user emotion interaction service for extended reality based on non-verbal elements can be provided.
Extended reality (XR) is a term that covers virtual reality (VR), augmented reality (AR), and mixed reality (MR) technologies. While virtual reality (VR) is a technology that allows users to experience a new reality based on 360-degree images, augmented reality (AR) displays information and content through computer graphics (CG) on top of real objects. Augmented reality (AR) and virtual reality (VR) are distinct, but the two technologies are evolving together while complementing each other's shortcomings. At this stage, however, the differences are clearly apparent: virtual reality (VR) requires a head-mounted display (HMD) that covers the entire field of view, whereas augmented reality (AR) can be delivered through glasses such as Google Glass.
Extended reality (XR) freely selects individual or combined use of virtual and augmented reality (VR and AR) technologies to create an expanded reality. HoloLens, developed by Microsoft (MS), is a glasses-type device, but it can be seen as a form of extended reality (XR) in that it understands real space and object information and displays optimized 3D holograms. Extended reality (XR) is expected to be applied to various fields including education, healthcare, and manufacturing.
For interaction with users in such extended reality, technology that recognizes the user's real-time emotional state and reflects it in the service is important.
Referring to Figure 2, which shows the configuration of a system for providing a user emotion interaction service according to an embodiment of the present invention, the system according to this embodiment includes a storage unit 10, a communication unit 20, and a control unit 30, and the control unit 30 may include a user management unit 31, an emotion recognition unit 32, and an interaction unit 33.
The storage unit 10 stores the data necessary for the control unit 30 to function, and also stores user emotion information for user-customized services. The user emotion information will be described in detail later.
The communication unit 20 is a communication means for providing services utilizing user emotion information to user terminals connected through a communication network or the like. For example, a captured image of the user's facial expressions and gestures is acquired from the user terminal through the communication unit 20, and service data reflecting the recognized emotional state of the user is transmitted to the user terminal. Since such communication means will be obvious to those skilled in the art, a more detailed description is omitted.
When a captured image of the user's facial expressions and gestures is acquired from the user terminal, the control unit 30 analyzes the captured image based on learned emotion recognition technology (utilizing artificial intelligence) to recognize the user's emotional state, and reflects the recognized emotional state in the provided service.
The emotion recognition unit 32 of the control unit 30 is trained on learning data about the facial expressions of various people in various emotional states and determines the user's emotional state by analyzing the captured image. In addition, the emotion recognition unit 32 can improve the recognition accuracy of emotional states through continuous learning while providing the service using artificial intelligence. In particular, recognition accuracy can be further increased by also using the user emotion information registered in advance for the corresponding user as described above. The user emotion information is intended to provide a user-customized service: information about the user's facial expression for each emotional state is stored in advance as user emotion information and utilized. The user management unit 31 performs the management functions of storing, deleting, and updating this user emotion information, which will be described in detail later with reference to Figure 4.
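As an illustrative sketch only, and not part of the original disclosure, the way the emotion recognition unit might blend a generally trained classifier with the pre-registered user emotion information is shown below; the feature-vector representation, the emotion labels, and the blending weight are all assumptions made for this example.

```python
# Minimal sketch (assumed data format): blend a generic classifier's output
# with similarity to the user's own registered expression templates.
from math import sqrt

EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised"]  # assumed label set

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recognize_emotion(face_features, generic_probs, user_templates, user_weight=0.4):
    # user_weight is an assumed blending factor for the personalized information.
    scores = {}
    for emotion in EMOTIONS:
        template = user_templates.get(emotion)
        personal = cosine(face_features, template) if template else 0.0
        scores[emotion] = ((1 - user_weight) * generic_probs.get(emotion, 0.0)
                           + user_weight * personal)
    return max(scores, key=scores.get), scores

# Example: features close to the stored "happy" template tip the decision.
templates = {"happy": [0.9, 0.1, 0.8], "neutral": [0.1, 0.1, 0.1]}
label, _ = recognize_emotion([0.85, 0.15, 0.75],
                             {"happy": 0.5, "neutral": 0.4},
                             templates)
print(label)  # -> happy
```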
The interaction unit 33 of the control unit 30 reflects the determined emotional state of the user in the provided service. For example, if an avatar service is being provided, gestures and facial expressions corresponding to the user's emotional state are applied to the user's avatar (see Figure 7). Of course, this is only one example, and the user's emotional state can be applied to any service provided as extended reality, not only avatar services.
Figure 3 is a flowchart showing a user emotion interaction process according to an embodiment of the present invention.
Referring to Figure 3, the user emotion interaction method, performed on a computing device implemented for example in the form of a server, includes acquiring real-time captured images of the user (S20), analyzing the captured images to determine the user's emotional state (S30), and reflecting the determined emotional state in the provided service (S40).
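A minimal skeleton of this S10 to S40 flow might look as follows; the camera, model, store, and service objects are placeholders assumed purely for illustration.

```python
# Sketch of the overall loop; none of these objects are defined in the disclosure.
def run_interaction_loop(camera, emotion_model, user_store, service, user_id):
    profile = user_store.get(user_id)                  # S10: pre-registered user emotion info
    for frame in camera:                               # S20: real-time captured images
        state = emotion_model.predict(frame, profile)  # S30: determine emotional state
        service.apply_emotion(user_id, state)          # S40: reflect state in the service
```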
That is, the user's emotional state and gestures are recognized by analyzing the user's face and body movements captured by a recording device such as a webcam. In other words, rather than using sensing values from expensive sensors, the user's emotional state is recognized by analyzing images of the user with analysis technology trained on a variety of learning data.
To increase recognition accuracy, a step of registering and managing user emotion information (S10) may precede this, and the recognition accuracy of emotional states can be further increased by utilizing this user emotion information.
Figure 4 is a flowchart showing a process of registering user emotion information for user-customized recognition according to an embodiment of the present invention.
Referring to Figure 4, emotion-inducing content including videos or photos that induce a plurality of emotions is provided to the user and played (S410).
While the emotion-inducing content is being played, the captured video of the user is analyzed (S420).
It is determined whether a change has occurred in the user's facial expression and/or gesture (S430); if there has been a change, the content of the emotion-inducing content at that point in time and the user's facial expression and/or gesture are generated and stored as user emotion information (S440).
For example, if a change occurs in the user's facial expression when humorous content appears in the emotion-inducing content, the image feature information of the user's expression at that time and information on the corresponding emotional state, such as laughter, can be stored as user emotion information. Of course, a recognition process using learning data can also be added at this point to determine which emotional state the user's facial expression corresponds to.
At this time, user emotion information may also be generated based on learning data corresponding to the user's age, gender, and face shape. For example, there may be differences between the smiling expression of a teenager and that of a person in their forties, so learning data corresponding to the user's age, gender, and face shape is used preferentially when analyzing the user's facial expressions and gestures, allowing more useful user emotion information to be generated.
According to this embodiment, content that induces various emotional states is provided to the user in advance for viewing, changes in the user's face or gestures at that time are observed to specify the characteristics of the user's facial expression in each emotional state, and these characteristics are utilized in later emotion recognition, thereby improving the accuracy of emotion recognition.
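The registration flow of Figure 4 (S410 to S440) could be sketched roughly as below; the change threshold, the feature extractor, and the clip interface are illustrative assumptions rather than elements of the disclosure.

```python
# Sketch: play emotion-inducing clips, watch for expression changes, store templates.
def average(samples):
    # Element-wise mean of a list of feature vectors.
    return [sum(values) / len(values) for values in zip(*samples)]

def register_user_emotion_info(stimulus_clips, camera, extract_features,
                               change_threshold=0.25):
    store = {}
    baseline = extract_features(next(iter(camera)))   # neutral reference expression
    for clip in stimulus_clips:                       # S410: play the content
        clip.play()
        for frame in camera:                          # S420: analyze the captured video
            features = extract_features(frame)
            change = sum(abs(a - b) for a, b in zip(features, baseline)) / len(features)
            if change > change_threshold:             # S430: did the expression change?
                # S440: store the expression with the clip's target emotion label
                store.setdefault(clip.target_emotion, []).append(features)
            if clip.finished():
                break
    return {emotion: average(samples) for emotion, samples in store.items()}
```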
As described above, the emotional state of the user can be recognized with higher accuracy by further utilizing a wearable device worn by the user in addition to the image captured of the user.
Figure 5 is a flowchart showing an emotional state recognition process using a wearable device in addition to an image according to an embodiment of the present invention.
Referring to Figure 5, when sensing information is acquired from a wearable device (S510), a reflection rate according to the type and model of the wearable device is determined (S520), and, according to the determined reflection rate, the sensing information is used together with the information from the analysis of the captured image to determine the emotional state (S530).
For example, compared to a wearable device that measures only heart rate, a device that can additionally measure body temperature and blood pressure will have its sensed values reflected at a higher rate when determining the emotional state.
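A hedged sketch of the device-dependent reflection rate (S520) and the weighted combination with the image-based result (S530) is given below; the specific rate values are invented for illustration, since the disclosure only states that richer sensing is reflected at a higher rate.

```python
# Assumed reflection rates per device capability; the figures are illustrative only.
DEVICE_REFLECTION_RATE = {
    "heart_rate_only": 0.15,
    "heart_rate_and_temperature": 0.25,
    "heart_rate_temperature_blood_pressure": 0.35,
}

def fuse_with_wearable(image_scores, sensor_scores, device_model):
    # Weighted blend of image-based and sensor-based emotion scores (S530).
    rate = DEVICE_REFLECTION_RATE.get(device_model, 0.10)
    emotions = set(image_scores) | set(sensor_scores)
    return {e: (1 - rate) * image_scores.get(e, 0.0) + rate * sensor_scores.get(e, 0.0)
            for e in emotions}

fused = fuse_with_wearable({"happy": 0.7, "neutral": 0.3},
                           {"happy": 0.9, "neutral": 0.1},
                           "heart_rate_temperature_blood_pressure")
print(max(fused, key=fused.get))  # -> happy
```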
Further, if information additionally acquired from the user terminal (for example, information on the place where the user is located, the content currently playing nearby, and the people present nearby) is used, the current situation can be recognized, and the recognized current situation can be further utilized to determine the user's emotional state. For ease of understanding, if, for example, the current location is indoors, music with a cheerful rhythm is playing nearby, and the situation is recognized as being with friends, the user's current emotional state is likely to be [excited] or [happy]; therefore, based on this information, the captured image and the sensing information of the wearable device are analyzed to recognize the user's emotional state.
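The recognized current situation could likewise be treated as a simple prior over candidate emotional states, as in the sketch below; the context categories and boost factors are assumptions made only for illustration.

```python
# Assumed context-to-prior table; the disclosure gives only the worked example above.
CONTEXT_PRIORS = {
    ("indoors", "upbeat_music", "with_friends"): {"excited": 1.3, "happy": 1.2},
}

def apply_context_prior(scores, location, ambient_content, companions):
    prior = CONTEXT_PRIORS.get((location, ambient_content, companions), {})
    boosted = {emotion: value * prior.get(emotion, 1.0) for emotion, value in scores.items()}
    total = sum(boosted.values()) or 1.0
    return {emotion: value / total for emotion, value in boosted.items()}  # renormalize

print(apply_context_prior({"excited": 0.4, "sad": 0.3, "happy": 0.3},
                          "indoors", "upbeat_music", "with_friends"))
```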
Figure 6 is a flowchart showing an interaction process using the accuracy and intensity of a recognized emotional state according to an embodiment of the present invention.
Referring to Figure 6, an accuracy value of the recognized emotional state of the user is calculated using the image quality of the captured image, the facial expression recognition rate, and the like, and an intensity value of the emotional state is calculated according to the magnitude of the change in the user's facial expression (S610).
For example, if the image quality of the user's facial expression in the captured image is low and there is no emotional state that clearly corresponds to the analyzed expression, the accuracy value will be calculated as low. As for the intensity value, when the user laughs broadly rather than smiling slightly, the change in facial expression is greater, so the intensity value in that case may be calculated as higher.
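One possible way to turn these observations into numeric values, assuming inputs normalized to the range 0 to 1 and an equal weighting that the disclosure does not prescribe, is sketched below.

```python
# Sketch of S610; the weighting and normalization are assumptions.
def accuracy_value(image_quality, recognition_confidence):
    # Low image quality or an ambiguous expression match lowers accuracy.
    return round(0.5 * image_quality + 0.5 * recognition_confidence, 2)

def intensity_value(expression_change, max_change=1.0):
    # A larger deviation from the neutral expression means a stronger emotion.
    return round(min(expression_change / max_change, 1.0), 2)

print(accuracy_value(0.9, 0.8))  # clear webcam image, confident match -> 0.85
print(intensity_value(0.2))      # slight smile -> 0.2
print(intensity_value(0.8))      # broad laugh -> 0.8
```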
The rate at which the emotional state is reflected in the provided service is determined according to the accuracy value (S620). For example, if the recognized emotional state is [laughing] and the accuracy value is high, a laughing expression is applied to the avatar as is; if the accuracy value is low, only a slightly smiling expression is applied to the avatar for a short time, so that the reflection rate is applied differently.
Then, the manner in which the emotional state is reflected is determined according to the intensity value (S630). For example, if the intensity value is high, a laughter special effect, such as greatly enlarging the size of the face, is applied when expressing the emotional state [laughing] on the avatar's face; in contrast, if the intensity value is low, only a smiling expression without any special effect is applied to the avatar's face.
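Steps S620 and S630 might then be combined as in the following sketch; the accuracy and intensity thresholds and the particular special effect are illustrative assumptions.

```python
# Sketch: choose how strongly and in what manner to reflect the state on an avatar.
def reflect_on_avatar(emotion, accuracy, intensity):
    if accuracy < 0.5:
        # Low accuracy: apply only a slight, short-lived version of the expression.
        return {"expression": f"slight_{emotion}", "duration_s": 1.0, "effect": None}
    reflection = {"expression": emotion, "duration_s": 3.0, "effect": None}
    if intensity >= 0.7:
        # High intensity: add a special effect, e.g. enlarging the avatar's face.
        reflection["effect"] = "enlarge_face"
    return reflection

print(reflect_on_avatar("laughing", accuracy=0.85, intensity=0.8))
print(reflect_on_avatar("laughing", accuracy=0.40, intensity=0.8))
```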
A computer program stored in a computer-readable medium may be provided to perform the user emotion interaction method for extended reality based on non-verbal elements according to the present invention described above.
In addition, the user emotion interaction method for extended reality based on non-verbal elements described above can be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all types of recording media storing data that can be read by a computer system, for example read-only memory (ROM), random-access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices. In addition, the computer-readable recording medium can be distributed over computer systems connected through a computer communication network and stored and executed as code that is read in a distributed manner.
In addition, although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the present invention can be modified and changed in various ways without departing from the spirit and scope of the present invention set forth in the claims below.

Claims (9)

  1. A user emotion interaction method for extended reality based on non-verbal elements, performed on a computing device, the method comprising:
    registering user emotion information for a user-customized service;
    acquiring a captured image of the user's facial expressions and gestures;
    determining the user's emotional state by analyzing the captured image based on the user emotion information and learned emotion recognition technology; and
    reflecting the emotional state in a provided service.
  2. The method of claim 1, wherein registering the user emotion information comprises:
    providing the user with emotion-inducing content including videos or photos that induce a plurality of emotions; and
    generating and storing the user emotion information using the user's facial expressions or gestures that change while the emotion-inducing content is played, together with the content of the emotion-inducing content at that point in time.
  3. The method of claim 2, wherein the user emotion information is generated based on training data corresponding to the user's age, gender, and face shape.
  4. The method of claim 1, wherein the emotional state is determined by further using sensing information from one or more wearable devices worn by the user, and
    a reflection rate applied when determining the emotional state is varied depending on the type and model of the wearable device.
  5. The method of claim 1, wherein a current situation including at least one of a place where the user is located, content currently being played nearby, and nearby people is recognized and used in determining the emotional state.
  6. The method of claim 1, further comprising calculating an accuracy value of the emotional state using the image quality of the captured image, the recognition rate of the facial expression, and the like,
    wherein a reflection rate of the emotional state into the provided service is applied differently depending on the accuracy value.
  7. The method of claim 1, further comprising calculating an intensity value of the emotional state according to the magnitude of the change in the user's facial expression,
    wherein a method of reflecting the emotional state into the provided service is applied differently depending on the intensity value.
  8. A computer program stored in a computer-readable medium for performing a user emotion interaction method, the computer program causing a computer to perform steps comprising:
    registering user emotion information for a user-customized service;
    acquiring a captured image of the user's facial expressions and gestures;
    determining the user's emotional state by analyzing the captured image based on the user emotion information and trained emotion recognition technology; and
    reflecting the emotional state in a provided service.
  9. A user emotion interaction system for extended reality based on non-verbal elements, comprising:
    a storage unit for registering user emotion information for a user-customized service;
    a communication unit for acquiring, from a user terminal, a captured image of the user's facial expressions and gestures;
    an emotion recognition unit for determining the user's emotional state by analyzing the captured image based on the user emotion information and trained emotion recognition technology; and
    an interaction unit for reflecting the emotional state in the provided service.
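For orientation only, the four components recited in claim 9 could be arranged as in the following Python sketch; every class, method, and parameter name here is a hypothetical placeholder introduced for illustration and does not appear in the claims.

    # Hypothetical outline of the claimed system: storage, communication,
    # emotion recognition, and interaction components wired together.
    class UserEmotionInteractionSystem:
        def __init__(self, storage, communication, emotion_recognizer, interactor):
            self.storage = storage                        # registers user emotion information
            self.communication = communication            # receives captured images from the user terminal
            self.emotion_recognizer = emotion_recognizer  # determines the user's emotional state
            self.interactor = interactor                  # reflects the emotional state in the provided service

        def run_once(self, user_id):
            image = self.communication.receive_image(user_id)
            profile = self.storage.load(user_id)
            state = self.emotion_recognizer.recognize(image, profile)
            self.interactor.reflect(state)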
PCT/KR2022/019237 2022-11-24 2022-11-30 User emotion interaction method and system for extended reality based on non-verbal elements WO2024111728A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0159202 2022-11-24
KR1020220159202A KR20240077627A (en) 2022-11-24 2022-11-24 User emotion interaction method and system for extended reality based on non-verbal elements

Publications (1)

Publication Number Publication Date
WO2024111728A1 true WO2024111728A1 (en) 2024-05-30

Family

ID=91196305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/019237 WO2024111728A1 (en) 2022-11-24 2022-11-30 User emotion interaction method and system for extended reality based on non-verbal elements

Country Status (2)

Country Link
KR (1) KR20240077627A (en)
WO (1) WO2024111728A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130022434A (en) * 2011-08-22 2013-03-07 (주)아이디피쉬 Apparatus and method for servicing emotional contents on telecommunication devices, apparatus and method for recognizing emotion thereof, apparatus and method for generating and matching the emotional contents using the same
KR20180072543A (en) * 2016-12-21 2018-06-29 도요타 지도샤(주) In-vehicle device and route information presentation system
KR20200053163A (en) * 2018-11-08 2020-05-18 백으뜸 Apparatus and method for providing virtual reality contents without glasses
KR20200101195A (en) * 2019-02-19 2020-08-27 현대자동차주식회사 Electronic device and control method for the same
KR20220144983A (en) * 2021-04-21 2022-10-28 조선대학교산학협력단 Emotion recognition system using image and electrocardiogram

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102368300B1 (en) 2020-09-08 2022-03-02 박일호 System for expressing act and emotion of character based on sound and facial expression

Also Published As

Publication number Publication date
KR20240077627A (en) 2024-06-03

Similar Documents

Publication Publication Date Title
Yang et al. Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
WO2019013517A1 (en) Apparatus and method for voice command context
WO2020159093A1 (en) Method for generating highlight image using biometric data and device therefor
JP2020528705A (en) Moving video scenes using cognitive insights
WO2019156332A1 (en) Device for producing artificial intelligence character for augmented reality and service system using same
WO2020171621A1 (en) Method of controlling display of avatar and electronic device therefor
WO2012053867A1 (en) Method and apparatus for recognizing an emotion of an individual based on facial action units
WO2014035041A1 (en) Interaction method and interaction device for integrating augmented reality technology and bulk data
WO2020196977A1 (en) User persona-based interactive agent device and method
WO2020262800A1 (en) System and method for automating natural language understanding (nlu) in skill development
WO2019093599A1 (en) Apparatus for generating user interest information and method therefor
WO2022039366A1 (en) Electronic device and control method thereof
CN114095782A (en) Video processing method and device, computer equipment and storage medium
WO2019190076A1 (en) Eye tracking method and terminal for performing same
WO2019112154A1 (en) Method for providing text-reading based reward-type advertisement service and user terminal for performing same
Punsara et al. IoT based sign language recognition system
KR20210008075A (en) Time search method, device, computer device and storage medium (VIDEO SEARCH METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM)
WO2024054079A1 (en) Artificial intelligence mirroring play bag
WO2024111728A1 (en) User emotion interaction method and system for extended reality based on non-verbal elements
WO2020045909A1 (en) Apparatus and method for user interface framework for multi-selection and operation of non-consecutive segmented information
CN108628454B (en) Visual interaction method and system based on virtual human
WO2023277421A1 (en) Method for segmenting sign language into morphemes, method for predicting morpheme positions, and method for augmenting data
CN113655933B (en) Text labeling method and device, storage medium and electronic equipment
WO2018056653A1 (en) Method, apparatus and computer program for providing image together with translation
WO2024111775A1 (en) Method and electronic device for identifying emotion in video content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22966588

Country of ref document: EP

Kind code of ref document: A1