CN114968054A - Cognitive training interaction system and method based on mixed reality - Google Patents

Cognitive training interaction system and method based on mixed reality

Info

Publication number
CN114968054A
Authority
CN
China
Prior art keywords
target
coordinate information
touch
information
mixed reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210509406.7A
Other languages
Chinese (zh)
Other versions
CN114968054B (en)
Inventor
Liu Juan (刘娟)
Guo Yan (郭燕)
Chen Jianing (陈佳宁)
Bian Yulong (卞玉龙)
Liu Weiying (刘维莹)
Dong Xiaozhou (董晓舟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202210509406.7A priority Critical patent/CN114968054B/en
Publication of CN114968054A publication Critical patent/CN114968054A/en
Application granted granted Critical
Publication of CN114968054B publication Critical patent/CN114968054B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/30ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Primary Health Care (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention belongs to the technical field of mixed reality and provides a cognitive training interaction system and method based on mixed reality. The method comprises the following steps: predicting an acquired color image of a target top view with a trained target detection model to obtain the category of the target and the coordinate information of the target top view, and obtaining the depth coordinate information of the target from the depth image; and, in response to touch information in the target area, comparing the acquired touch coordinate information with the depth coordinate information of the target, and, if the touch position matches the target area, displaying the basic information of the target on the screen. The category and coordinate information of the target can thus be acquired in real time, better realizing virtual-real fused target recognition.

Description

Cognitive training interaction system and method based on mixed reality
Technical Field
The invention belongs to the technical field of mixed reality, and particularly relates to an interactive system and method for cognitive training based on mixed reality.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Traditional cognitive training typically has a therapist use visual support materials together with oral or behavioral guidance. Autistic children, however, often have difficulty understanding spoken language, so intuitive and vivid multi-sensory interactive content may be a better form of training. Yet it is often difficult to freely modify things in a real environment to meet these conditions.
Therefore, new approaches should be explored to meet the various needs of autistic children. Virtual reality, augmented reality, mixed reality, and related technologies have gradually shown potential value in rehabilitation training for individuals with autism. Numerous studies have shown that virtual reality, augmented reality, and mixed reality can provide a vivid, immersive training environment and can support rehabilitation training for autistic patients in many ways. However, research has also found that the safety and appeal of virtual reality may lead to addictive behavior in users and a refusal to return to the real world, and existing virtual reality techniques and environments are primarily directed at autistic patients over 6 years of age and are not recommended for young children.
Although there is a certain amount of research on mixed reality technology at home and abroad, it remains sparse compared with research on virtual reality, and a large amount of work is still needed before the technology becomes a mature method. In recent years, existing mixed-reality-based interaction systems have been able to present 3D virtual scenes and offer a simple, realistic, virtual-real fused interaction mode, but they still have limitations: they lack real-object interaction, fail to balance the relationship between real objects and the virtual scene well, and fail to make the interaction interesting enough to raise participants' initiative.
Disclosure of Invention
To solve at least one technical problem in the background art, the invention provides an interaction system and method for cognitive training based on mixed reality, with which the user can interact with the mixed reality scene directly through an infrared touch frame and can also obtain answers to questions in real time by asking aloud, improving the interest of the game and the initiative of the children while providing an immersive, vivid mixed reality experience.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides an interactive system based on cognitive training of mixed reality, which comprises: the system comprises a target identification module and a touch interaction module;
the object recognition module configured to: predicting the obtained color image of the target top view by adopting a trained target detection model to obtain the category of the target and the coordinate information of the target top view, obtaining the depth coordinate information of the target by utilizing the depth image, and obtaining the three-dimensional coordinate information of the target according to the coordinate information of the target top view and the depth coordinate information of the target;
the touch interaction module is configured to: and responding to the touch information of the target area, comparing and judging the acquired touch coordinate information with the three-dimensional coordinate information of the target, and displaying the basic information of the target on the screen if the touch coordinate information is matched with the position of the target area.
A second aspect of the invention provides an interaction method for cognitive training based on mixed reality, comprising the following steps:
predicting the acquired color image of the target top view with a trained target detection model to obtain the category of the target and the coordinate information of the target top view, obtaining the depth coordinate information of the target from the depth image, and obtaining the three-dimensional coordinate information of the target from the coordinate information of the target top view and the depth coordinate information of the target;
in response to touch information in the target area, comparing the acquired touch coordinate information with the three-dimensional coordinate information of the target, and displaying the basic information of the target on the screen if the touch position matches the target area.
A third aspect of the invention provides a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method of interaction for mixed reality based cognitive training as described above.
A fourth aspect of the invention provides a computer apparatus.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a mixed reality based cognitive training interaction method as described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
the user of the invention can observe the real target in a short distance, the relationship between the real object and the virtual scene is better balanced, besides, the related content of target detection is introduced into the research, the category and the coordinate information of the target can be obtained in real time, and the target identification effect of virtual-real fusion is better realized.
With the method, the user can interact with the mixed reality scene directly through the infrared touch frame and can obtain answers to questions in real time by asking aloud, which improves the interest of the game and the initiative of the children while providing an immersive, vivid mixed reality experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
FIG. 1 is a system block diagram of a mixed-reality-based fruit recognition box for cognitive rehabilitation training of autistic children;
FIG. 2 is a flow chart of the overall implementation of a target detection model based on YOLOv4-Tiny;
FIG. 3 is a schematic diagram of coordinate information of a mixed reality-based target top view;
FIG. 4 is a mixed reality based infrared touch frame touch interaction principle;
fig. 5 is a flow diagram of mixed reality-based touch interaction.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Mixed reality (MR) is a computer technology for displaying and interacting with the real world and virtual objects in the same visual space; a further development of virtual and augmented reality, it builds an interactive feedback loop among the real world, the virtual world, and the user by presenting virtual scene information within a real scene, thereby enhancing the realism of the user experience. MR is a rapidly developing field and is now used across many industries.
The invention recognizes that virtual reality environments are more applicable to adult individuals with autism, whereas the strengths of augmented reality and mixed reality suit the cognitive characteristics of autistic children and therefore hold greater potential for treatment and rehabilitation. An environment built with mixed reality technology can not only stimulate autistic children's active interaction through the virtual objects in the system but also expand their interest in the training course.
Example one
The embodiment provides an interactive system for cognitive training based on mixed reality, comprising a Kinect3 camera, an infrared touch frame, a host, a projector, and a projection film, wherein the Kinect3 camera, the infrared touch frame, the projector, and the projection film are each connected to the host;
the Kinect3 camera is used for acquiring a target image and target coordinate information;
the infrared touch frame is provided with a USB interface, and is regarded as mouse equipment after being connected to a host, and driver software is not required to be installed.
The projector and the projection film are used for projecting and displaying specific learning scenes and animations.
The size of the infrared touch frame can be set according to actual requirements, for example 1920 mm × 1080 mm, an aspect ratio of 16:9.
As shown in fig. 1, the host includes a target image obtaining module, a target recognition module, a touch interaction module, and a voice interaction module.
The target image acquisition module is configured to: acquire a color image of a top view of the target and a depth image corresponding to the color image.
The target recognition module is configured to: predict the acquired color image of the target top view with the trained target detection model to obtain the category of the target and the coordinate information of the target top view, obtain the depth coordinate information of the target from the depth image, and obtain real-time-updated three-dimensional coordinate information of the target from the coordinate information of the target top view and the depth coordinate information of the target.
The touch interaction module is configured to: in response to touch information in the target area, compare the acquired touch coordinate information with the real-time-updated three-dimensional coordinate information of the target, and display the basic information of the target on the screen if the touch position matches the target area.
The touch interaction module is further configured to: display a specific learning scene and animation on the projection film through the projector, create a main interface on the infrared touch frame, monitor touch click events, and pop up information on the related target if the coordinates match.
The overall theme of the interface design is as follows: the main elements follow a unified standard, with comfortable colors, a reasonable layout, and a friendly control appearance; cartoon elements are added to strengthen the children's interest in the interface and increase the fun of the whole interaction process.
The touch interaction module is further configured to: randomly generate pictures of real objects of different targets on the interface, including the target to be searched for, where clicking a picture yields answer feedback.
As shown in fig. 5, the voice interaction module includes a recording module, a voice dictation module, an answer matching module, and a voice synthesis and playing module.
The recording module is configured to: call the speech_recognition library in Python, record through the microphone of the experimental computer, and end the recording in real time when the questioner finishes asking.
The voice dictation module is configured to: transcribe the audio file generated after recording into text and store the text in a text file.
it should be noted that the voice dictation module may use a voice dictation streaming interface of science, university and news, or may select other models according to actual situations, which is not described herein.
The answer matching module is configured to: set questions and corresponding answers based on the different characteristics of the targets, and match the corresponding answer according to keywords in the question.
So that the autistic child can correctly recognize the fruit from all aspects, the system provides the questioner with answers to five kinds of question about the fruit, relating respectively to color, taste, eating efficacy, shape, and growth environment. Some common keywords are selected for each question and corresponding answers are set in advance; the wording of the answers takes into account how well, and in what way, children understand things.
The speech synthesis and playing module is configured to: convert the text in the text file to speech and play it.
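By way of illustration, the voice pipeline above might be sketched as follows in Python. The pyttsx3 playback engine, the recognize_google fallback transcriber, and the keyword/answer wording are assumptions for this sketch only; the actual system transcribes through the iFLYTEK streaming dictation interface.

```python
# Minimal sketch of the voice interaction pipeline: record -> transcribe
# -> keyword-match a preset answer -> synthesize and play. The transcribe()
# body and the answer table are illustrative assumptions.
import speech_recognition as sr
import pyttsx3

# Preset keyword -> answer table for one fruit (illustrative wording).
APPLE_ANSWERS = {
    "color": "An apple is usually red or green.",
    "taste": "An apple tastes sweet and a little sour.",
    "shape": "An apple is round, like a small ball.",
}

def record_question() -> sr.AudioData:
    """Record from the microphone, stopping when the questioner falls silent."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        return recognizer.listen(source)

def transcribe(audio: sr.AudioData) -> str:
    """Stand-in for the streaming dictation interface; any ASR service
    returning plain text could be substituted here."""
    return sr.Recognizer().recognize_google(audio, language="en-US")

def match_answer(question: str, answers: dict) -> str:
    """Match the transcribed question against the preset keywords."""
    for keyword, answer in answers.items():
        if keyword in question.lower():
            return answer
    return "I am not sure. Could you ask about color, taste, or shape?"

def speak(text: str) -> None:
    """Synthesize the matched answer and play it."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    speak(match_answer(transcribe(record_question()), APPLE_ANSWERS))
```

In the real system the answer table would cover, for each fruit, the five question types described above (color, taste, eating efficacy, shape, growth environment).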
For example, after the user has learned the fruit information on the interface, the user can click the "Let's play" button in the interface to start the fruit-finding mini-game and receive positive feedback on the result. The mini-game is designed as follows: pictures of 4 real fruits are randomly generated on the interface, including the fruit to be found, and clicking a picture yields answer feedback.
In the target image acquisition module, a Kinect3 camera is used to acquire the color image and the depth image. The Kinect3 camera is located at the center of the frame directly above the equipment; the color image is used to identify fruits, and the depth image is used to obtain fruit depth coordinate information.
as shown in fig. 2, in the identification module, the target detection model adopts a target detection model based on YOLOv 4-Tiny;
the obtained color images are from two sources, one part is a network picture, the other part is a picture collected by a camera, and a label picture is obtained by using labelimg software through manual labeling; the data is divided into a training set, a testing set and a verification set according to a certain proportion.
The main work in constructing the network is to build the backbone network and the feature pyramid and to write the decoding module. For the training parameters, label_smoothing is set to 0.005 and the batch size to 32; the model is trained for 100 epochs with a learning rate of 0.0001, updated every 5 epochs by cosine annealing. The model converged gradually from epoch 25, with a training time of about 180 minutes. Regarding the target categories, fruits are chosen for their everyday familiarity combined with medical value; for example, the selected types include apple, orange, banana, grape, mango, pineapple, and Hami melon.
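To make the link between the detection output and the target's three-dimensional coordinates concrete, the following sketch combines a predicted bounding box from the color image with the pixel-aligned depth image; the detect() callable is a placeholder for the trained YOLOv4-Tiny model, and the aligned NumPy depth array is an assumption about the Kinect3 data layout.

```python
# Sketch: build (category, x, y, z) for each detected target from the
# top-view detection result and the aligned depth image. detect() is a
# placeholder for the trained YOLOv4-Tiny model, assumed to return
# (category, (x_min, y_min, x_max, y_max)) pairs.
import numpy as np

def targets_with_3d_coords(color: np.ndarray, depth: np.ndarray, detect):
    results = []
    for category, (x_min, y_min, x_max, y_max) in detect(color):
        cx = int((x_min + x_max) / 2)   # top-view X coordinate
        cy = int((y_min + y_max) / 2)   # top-view Y coordinate
        z = float(depth[cy, cx])        # depth coordinate at that pixel
        results.append((category, cx, cy, z))
    return results
```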
As shown in fig. 3 to 4, in the touch interaction module, the specific determination method is as follows:
matching the X-axis coordinate of the touch point against the X-axis range of the target's three-dimensional coordinates, and matching the Y-axis coordinate if the ranges agree;
when the Y-axis coordinate of the touch point is greater than the Y-axis coordinate of the target's three-dimensional coordinates, the match succeeds, and the matched target's information is stored;
then judging the Z-axis coordinate and selecting the target closest to the touch plane; if the Z-axis coordinates are equal, the target whose coordinate information was stored first is selected.
The purpose of this judgment process is to compare the acquired touch coordinates (x, y) with the three-dimensional coordinate information (x, y, z) of the targets; the judgment on z selects one target from among those whose (x, y) coordinates match, as sketched below.
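One possible reading of this judgment, sketched in Python; the X-range tolerance and the (category, x, y, z) target layout are illustrative assumptions rather than values from the patent:

```python
# Sketch of the touch-matching judgment: match the touch X against each
# target's X range, require the touch Y to exceed the target's Y, then
# pick the matching target nearest the touch plane (smallest z), breaking
# ties in favor of the target whose coordinates were stored first.
X_TOLERANCE = 50  # assumed half-width of a target's X range, in pixels

def match_touch(touch_x, touch_y, targets):
    candidates = []
    for index, (category, x, y, z) in enumerate(targets):
        if abs(touch_x - x) <= X_TOLERANCE and touch_y > y:
            candidates.append((z, index, category))
    if not candidates:
        return None
    # min() over (z, index, ...) keeps the earliest-stored target on equal z
    return min(candidates)[2]
```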
As shown in fig. 5, the interactive interface design module uses the projector to display specific learning scenes and animations on the projection film at the back of the recognition box. The interface is designed with PyQt5; a main interface is created on the infrared touch frame to monitor touch click events, and information on the related fruit pops up if the coordinates match.
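A minimal PyQt5 sketch of such a main interface follows. Because the infrared frame behaves as a mouse device, touch clicks arrive as ordinary mouse press events; the widget layout, window title, and demo data are assumptions, and the matching function is injected so that any judgment (such as the one sketched above) can be used.

```python
# Sketch of a PyQt5 main interface: touches on the infrared frame arrive
# as mouse presses, so mousePressEvent runs the coordinate-matching
# judgment and pops up the matched target's information.
import sys
from PyQt5.QtWidgets import QApplication, QMessageBox, QWidget

class MainInterface(QWidget):
    def __init__(self, targets, match_touch):
        super().__init__()
        self.targets = targets            # list of (category, x, y, z)
        self.match_touch = match_touch    # coordinate-matching judgment
        self.setWindowTitle("Fruit recognition box")
        self.showFullScreen()

    def mousePressEvent(self, event):
        category = self.match_touch(event.x(), event.y(), self.targets)
        if category is not None:
            QMessageBox.information(self, "Target", f"This is a {category}!")

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MainInterface(
        targets=[("apple", 400, 300, 850)],
        match_touch=lambda x, y, ts: ts[0][0],  # stub for a quick demo
    )
    sys.exit(app.exec_())
```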
For a clear understanding of the solution of the present invention, the present embodiment describes a specific interaction process with a specific target, assuming that the target is fruit.
The setup uses an infrared touch frame with a touch area of 1920 mm × 1080 mm and an aspect ratio of 16:9; a Kinect3 camera; a RedmiBook 14 computer with an Intel(R) Core(TM) i7-8565U CPU @ 1.80 GHz (1.99 GHz), 8.00 GB of onboard RAM, and an x64 architecture; and a projector with a projection film.
The specific interactive process comprises the following steps:
First, the fruit is placed in the recognition box, and a color image and a depth image of the fruit's top view are acquired in real time by the Kinect3 camera located at the center of the frame directly above the equipment;
the color image acquired in real time is predicted with the trained YOLOv4-Tiny-based target detection model to obtain the fruit category and the fruit top-view coordinate information, and the fruit depth coordinate is obtained from the depth image corresponding to the color image;
the user touches the screen in the area corresponding to the fruit on the front of the recognition box; when the system detects touch input, it responds and compares the acquired touch coordinates with the real-time-updated X-, Y-, and Z-axis coordinates of the fruit, and if the user's touch position matches the fruit area, the basic information of the fruit is displayed on the screen;
the user can then read and learn, click the fruit cartoon image area to hold a voice conversation with the system, ask questions about the fruit, and receive spoken answers;
after learning the fruit information in the interface, the user can click the "Let's play" button in the interface to start the fruit-finding game; the system randomly generates pictures of 4 kinds of real fruit on the interface, including the kind to be found, and the user clicks a picture to receive feedback on whether the answer is correct, as sketched below.
Example two
The embodiment provides an interactive method for cognitive training based on mixed reality, which comprises the following steps:
predicting the acquired color image of the target top view with a trained target detection model to obtain the category of the target and the coordinate information of the target top view, and obtaining the depth coordinate information of the target from the depth image;
in response to touch information in the target area, comparing the acquired touch coordinate information with the depth coordinate information of the target, and displaying the basic information of the target on the screen if the touch position matches the target area.
Comparing and judging the acquired touch coordinate information against the depth coordinate information of the target comprises:
matching the X-axis coordinate of the touch point against the target depth coordinate, and matching the Y-axis coordinate if the ranges agree;
when the Y coordinate of the touch point is greater than the target depth coordinate, the match succeeds, and the matched target's information is stored;
then judging the Z-axis coordinate and selecting the target closest to the touch plane; if the Z-axis coordinates are equal, the target whose coordinate information was stored first is selected.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, implements the steps in an interactive method for mixed reality based cognitive training as described above.
Example four
The embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the steps in the interactive method based on mixed reality cognitive training as described above.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An interactive system for cognitive training based on mixed reality, comprising: a target recognition module and a touch interaction module;
the target recognition module is configured to: predict the acquired color image of the target top view with a trained target detection model to obtain the category of the target and the coordinate information of the target top view, obtain the depth coordinate information of the target from the depth image, and obtain the three-dimensional coordinate information of the target from the coordinate information of the target top view and the depth coordinate information of the target;
the touch interaction module is configured to: in response to touch information in the target area, compare the acquired touch coordinate information with the three-dimensional coordinate information of the target, and display the basic information of the target on the screen if the touch position matches the target area.
2. The interactive system for mixed reality based cognitive training as claimed in claim 1, further comprising a target image acquisition module configured to: a color image of a top view of the target and a depth image corresponding to the color image are acquired.
3. The interactive system for cognitive training based on mixed reality as claimed in claim 1, wherein said interactive system further comprises a voice interactive module, said voice interactive module comprises a recording module, a voice dictation module, an answer matching module, and a voice synthesis and playing module.
4. The interactive system for mixed reality based cognitive training as recited in claim 3, wherein the recording module is configured to: record through the microphone of the experimental computer, and end the recording in real time when the questioner finishes asking;
the voice dictation module is configured to: transcribe the audio file generated after recording into text and store the text in a text file;
the answer matching module is configured to: set questions and corresponding answers based on the different characteristics of the targets, and match the corresponding answer according to keywords in the question;
the speech synthesis and playing module is configured to: convert the text in the text file to speech and play it.
5. The mixed reality-based cognitive training interaction system of claim 1, wherein the target detection model is a target detection model based on YOLOv4-Tiny.
6. The interactive system for cognitive training based on mixed reality as claimed in claim 1, wherein the touch interaction module compares and judges the acquired touch coordinate information against the three-dimensional coordinate information of the target as follows:
matching the X-axis coordinate of the touch point against the X-axis range of the target's three-dimensional coordinates, and matching the Y-axis coordinate if the ranges agree;
and when the Y-axis coordinate of the touch point is greater than the Y-axis coordinate of the target's three-dimensional coordinates, the match succeeds, and the matched target's information is stored.
7. An interaction method for cognitive training based on mixed reality is characterized by comprising the following steps:
predicting the acquired color image of the target top view with a trained target detection model to obtain the category of the target and the coordinate information of the target top view, obtaining the depth coordinate information of the target from the depth image, and obtaining the three-dimensional coordinate information of the target from the coordinate information of the target top view and the depth coordinate information of the target;
in response to touch information in the target area, comparing the acquired touch coordinate information with the three-dimensional coordinate information of the target, and displaying the basic information of the target on the screen if the touch position matches the target area.
8. The interaction method for cognitive training based on mixed reality as claimed in claim 7, wherein comparing and judging the acquired touch coordinate information against the depth coordinate information of the target comprises:
matching the X-axis coordinate of the touch point against the target depth coordinate, and matching the Y-axis coordinate if the ranges agree;
when the Y coordinate of the touch point is greater than the target depth coordinate, the match succeeds, and the matched target's information is stored;
and judging the Z-axis coordinate and selecting the target closest to the touch plane; if the Z-axis coordinates are equal, the target whose coordinate information was stored first is selected.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the interaction method for mixed reality based cognitive training according to any one of claims 7-8.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the interaction method for mixed reality based cognitive training according to any one of claims 7-8.
CN202210509406.7A 2022-05-11 2022-05-11 Interaction system and method for cognitive training based on mixed reality Active CN114968054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210509406.7A CN114968054B (en) 2022-05-11 2022-05-11 Interaction system and method for cognitive training based on mixed reality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210509406.7A CN114968054B (en) 2022-05-11 2022-05-11 Interaction system and method for cognitive training based on mixed reality

Publications (2)

Publication Number Publication Date
CN114968054A (en) 2022-08-30
CN114968054B CN114968054B (en) 2023-09-15

Family

ID=82980962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210509406.7A Active CN114968054B (en) 2022-05-11 2022-05-11 Interaction system and method for cognitive training based on mixed reality

Country Status (1)

Country Link
CN (1) CN114968054B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117138353A (en) * 2023-09-08 2023-12-01 广州火石传娱科技有限公司 Coordinate image processing method and system applied to toy gun interaction system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108919950A (en) * 2018-06-26 2018-11-30 上海理工大学 Autism children based on Kinect interact device for image and method
CN109616179A (en) * 2018-12-07 2019-04-12 山东大学 Autism spectrum disorder mixed reality rehabilitation training system and method
CN114397958A (en) * 2021-12-07 2022-04-26 浙江大华技术股份有限公司 Screen control method and device, non-touch screen system and electronic device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108919950A (en) * 2018-06-26 2018-11-30 上海理工大学 Autism children based on Kinect interact device for image and method
CN109616179A (en) * 2018-12-07 2019-04-12 山东大学 Autism spectrum disorder mixed reality rehabilitation training system and method
CN114397958A (en) * 2021-12-07 2022-04-26 浙江大华技术股份有限公司 Screen control method and device, non-touch screen system and electronic device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117138353A (en) * 2023-09-08 2023-12-01 广州火石传娱科技有限公司 Coordinate image processing method and system applied to toy gun interaction system
CN117138353B (en) * 2023-09-08 2024-04-19 广州火石传娱科技有限公司 Coordinate image processing method and system applied to toy gun interaction system

Also Published As

Publication number Publication date
CN114968054B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
US11291919B2 (en) Development of virtual character in a learning game
Dow et al. Presence and engagement in an interactive drama
KR20210110620A (en) Interaction methods, devices, electronic devices and storage media
Tinwell et al. The uncanny wall
Smith et al. Understanding the impact of animated gesture performance on personality perceptions
US20200302827A1 (en) Tailored Interactive Learning System Using A Four-Valued Logic
Tacgin Virtual and augmented reality: an educational handbook
Lamberti et al. Using semantics to automatically generate speech interfaces for wearable virtual and augmented reality applications
Dobre et al. Immersive machine learning for social attitude detection in virtual reality narrative games
Yang et al. A dataset of human and robot approach behaviors into small free-standing conversational groups
CN114968054B (en) Interaction system and method for cognitive training based on mixed reality
Divekar et al. HUMAINE: human multi-agent immersive negotiation competition
KR102021700B1 (en) System and method for rehabilitate language disorder custermized patient based on internet of things
WO2022103370A1 (en) Systems and methods for personalized and interactive extended reality experiences
Cui et al. Virtual human: A comprehensive survey on academic and applications
Tseng Intelligent augmented reality system based on speech recognition
KR20180012192A (en) Infant Learning Apparatus and Method Using The Same
Desai et al. Effects of gender on perception and interpretation of video game character behavior and emotion
Liu An analysis of the current and future state of 3D facial animation techniques and systems
Nishida et al. Synthetic evidential study as augmented collective thought process–preliminary report
Pedro et al. Towards higher sense of presence: a 3D virtual environment adaptable to confusion and engagement
Seaborn et al. Kawaii game vocalics: a preliminary model
WO2021251940A1 (en) Tailored interactive language learning system using a four-valued logic
De Paolis et al. Augmented Reality, Virtual Reality, and Computer Graphics: 7th International Conference, AVR 2020, Lecce, Italy, September 7–10, 2020, Proceedings, Part I
Kurosu Human-Computer Interaction. Interaction Contexts: 19th International Conference, HCI International 2017, Vancouver, BC, Canada, July 9-14, 2017, Proceedings, Part II

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant