CN113589958A - Text input method, device, equipment and storage medium - Google Patents

Text input method, device, equipment and storage medium

Info

Publication number
CN113589958A
Authority
CN
China
Prior art keywords
text input
sensor
keyboard
data
character information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110897043.4A
Other languages
Chinese (zh)
Inventor
梁海宁
陆学仕
俞迪枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong Liverpool University
Original Assignee
Xian Jiaotong Liverpool University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong Liverpool University filed Critical Xian Jiaotong Liverpool University
Priority to CN202110897043.4A priority Critical patent/CN113589958A/en
Publication of CN113589958A publication Critical patent/CN113589958A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233Character input methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04812Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text

Abstract

The application relates to a text input method, device, equipment and storage medium, belonging to the field of computer technology. The method comprises: displaying a current scene image through a display component; running a virtual keyboard in response to an instruction for text input in the current scene image; determining, based on motion data collected by a first sensor, the keyboard position corresponding to the motion data on the virtual keyboard and displaying the character information corresponding to that position, where different motion data correspond to different keyboard positions; and, when eyeball data collected by a second sensor indicate that the user blinks, displaying in a target area the character information that was displayed at the moment of the blink. The method solves the problem that dwell time affects text input efficiency in dwell-based text input, and can improve both the efficiency and the accuracy of text input.

Description

Text input method, device, equipment and storage medium
[ technical field ]
The application relates to a text input method, device, equipment and storage medium, and belongs to the field of computer technology.
[ background of the invention ]
Text input is an important and frequent task on interactive devices, including Augmented Reality (AR) Head-Mounted Displays (HMDs). Text input refers to entering text content into an interactive device to support human-computer interaction functions such as reading, writing, and issuing commands.
One existing text input method for interactive devices indicates a character position using the head or eye movement of the user, and confirms the character for input based on the dwell time at that position.
However, dwell-based selection has an inherent limitation: a long dwell time reduces text input efficiency, while a short dwell time increases the likelihood of erroneous selection and thus reduces the accuracy of text input.
[ summary of the invention ]
The application provides a text input method, device, equipment and storage medium, which can solve the problem that dwell time affects text input efficiency in dwell-based text input. The application provides the following technical solutions:
in a first aspect, a text input method is provided for an interactive device that comprises a display component, a sensing component and a support component; the display component and the sensing component are mounted on the support component; the support component is used for wearing the interactive device on the user; the sensing component comprises a first sensor and a second sensor, the first sensor being used for collecting motion data of the user and the second sensor being used for tracking the user's eyeballs to obtain eyeball data; the method comprises the following steps:
displaying, by the display component, a current scene image;
responding to an instruction of acquiring text input in the current scene image, and operating a virtual keyboard;
determining a keyboard position corresponding to the action data on the virtual keyboard based on the action data collected by the first sensor, and displaying character information corresponding to the keyboard position, wherein the keyboard positions corresponding to different action data are different;
and, in a case where the eyeball data collected by the second sensor indicate that the user blinks, displaying in a target area the character information that was displayed at the moment of the blink.
Optionally, the running virtual keyboard comprises:
covering a layer of a preset size at a specified position of the current scene image to obtain the virtual keyboard, wherein the layer comprises each key required for text input, and different keys correspond to different positions on the layer.
Optionally, each key comprises a letter key and a function key;
the letter key position is not displayed under the condition that the letter key position is not indicated by the action data, and is displayed on the layer in a first mode under the condition that the letter key position is indicated by the action data;
and the function key position is displayed on the map layer through a second mode.
Optionally, the method further comprises:
acquiring a recommended word corresponding to at least one piece of character information based on the at least one piece of character information in the target area;
displaying the recommended word in a recommended word display area;
and replacing the at least one character information with the recommended word in the target area if the selected operation of the recommended word is received.
Optionally, the method further comprises:
determining whether the action data collected by the first sensor indicates the position of the recommended word;
determining whether the eyeball data collected by the second sensor indicates that the user blinks or not under the condition that the action data indicates the position of the recommended word;
and under the condition that the eyeball data collected by the second sensor indicates that the user blinks, determining that the selection operation of the recommended word is received, wherein the recommended word indicated by the selection operation is the recommended word displayed at the position indicated by the action data when the user blinks.
Optionally, the obtaining of the recommended word corresponding to the at least one piece of character information based on the at least one piece of character information in the target area includes:
determining the keyboard location based on the motion data;
inputting the keyboard position into a preset spatial model to obtain character information corresponding to the keyboard position, wherein the spatial model is obtained by estimating model parameters of a bivariate Gaussian distribution model based on sample data, and the sample data comprises sample input characters and input position information corresponding to the sample input characters;
and inputting the at least one character information into a preset language model to obtain the recommended word, wherein the language model is used for determining the prior probability of each word in a preset corpus by using the at least one character information and determining the recommended word based on the prior probability.
Optionally, the method further comprises:
under the condition that the position indicated by the action data does not belong to the area where the virtual keyboard is located and does not belong to the target area, controlling the second sensor to suspend collecting the eyeball data; and when the position indicated by the action data belongs to the area of the virtual keyboard, acquiring the eyeball data again through the second sensor.
In a second aspect, a text input device is provided for an interactive apparatus that comprises a display component, a sensing component and a support component; the display component and the sensing component are mounted on the support component; the support component is used for wearing the interactive apparatus on the user; the sensing component comprises a first sensor and a second sensor, the first sensor being used for collecting motion data of the user and the second sensor being used for tracking the user's eyeballs to obtain eyeball data; the device comprises:
the image display module is used for displaying the current scene image through the display component;
the keyboard operation module is used for responding to the instruction of acquiring the text input in the current scene image and operating a virtual keyboard;
the character display module is used for determining a keyboard position corresponding to the action data on the virtual keyboard based on the action data collected by the first sensor and displaying character information corresponding to the keyboard position, wherein the keyboard positions corresponding to different action data are different;
and the character input module is used for displaying the character information displayed during blinking in the target area under the condition that the eyeball data acquired by the second sensor indicates that the user blinks.
In a third aspect, an interactive apparatus is provided, the apparatus comprising a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the text input method provided by the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a program is stored, which, when executed by a processor, is configured to implement the text input method provided in the first aspect.
The beneficial effects of this application include at least:
a current scene image is displayed through the display component; a virtual keyboard is run in response to an instruction for text input in the current scene image; based on the motion data collected by the first sensor, the keyboard position corresponding to the motion data is determined on the virtual keyboard and the character information corresponding to that position is displayed, where different motion data correspond to different keyboard positions; and, when the eyeball data collected by the second sensor indicate that the user blinks, the character information displayed at that moment is entered in the target area. This solves the problem that dwell time affects text input efficiency in dwell-based text input: because a character can be entered immediately when the eyeball data indicate a blink, without the user performing a dwell operation, and a blink takes less time than a dwell operation, text input efficiency is improved. In addition, since no dwell time needs to be set, there is no loss of accuracy caused by a short dwell time; and because a blink is clearly distinguishable from other eye movements, it can be recognized accurately, which improves the accuracy of text input.
In addition, the letter keys of the virtual keyboard are not displayed when they are not indicated by the motion data, and the character information corresponding to a letter key is displayed when that key is indicated by the motion data. This prevents the virtual keyboard from occluding the scene image while still showing the character information indicated by the motion data.
In addition, the function keys are displayed on the virtual keyboard in a second mode with lower transparency. Because the function keys are few in number and located in the edge area during text input, the lower transparency lets the user locate them quickly, which improves text input efficiency without affecting the display of the current scene image.
In addition, when the position indicated by the motion data belongs neither to the virtual keyboard nor to the target area, the second sensor is controlled to pause the collection of eyeball data, which allows the eyes to rest and prevents text input errors caused by unintended blinks.
In addition, based on at least one piece of character information in the target area, a recommended word corresponding to that character information is obtained; the recommended word is displayed in a recommended word display area; and, when a selection operation on the recommended word is received, the recommended word replaces the character information in the target area. This addresses the low efficiency of entering text character by character: by displaying recommended words in real time, the time needed to enter the remaining characters is saved whenever a recommended word is the word the user intends to input, which improves text input efficiency.
The foregoing is only an overview of the technical solutions of the present application. To make the technical solutions of the present application clearer and implementable according to the contents of the description, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
[ description of the drawings ]
FIG. 1 is a schematic diagram of an interactive apparatus according to an embodiment of the present application;
FIG. 2 is a flow chart of a text entry method provided by one embodiment of the present application;
FIG. 3 is a schematic diagram of a text entry process provided by one embodiment of the present application;
FIG. 4 is a schematic diagram of a text entry process provided by another embodiment of the present application;
FIG. 5 is a diagram illustrating a recommended word displayed in accordance with one embodiment of the present application;
FIG. 6 is a flow chart of a text entry method provided by another embodiment of the present application;
FIG. 7 is a graphical illustration of text entry speed data for different text entry methods provided by one embodiment of the present application;
FIG. 8 is a graphical illustration of text entry error rate data for different text entry methods provided by one embodiment of the present application;
FIG. 9 is a block diagram of a text input device provided in one embodiment of the present application;
FIG. 10 is a block diagram of an interactive apparatus provided in one embodiment of the present application.
[ detailed description of the embodiments ]
The following detailed description of embodiments of the present application will be made with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
The traditional text input method based on interactive equipment comprises the following steps:
the first method adopts an air text input technology, indicates a character position corresponding to a gesture through an air gesture, or selects one key based on the gesture, so as to realize text input in a Virtual Reality (VR) screen. However, in a scenario where the hands are busy with other tasks, the air text input technology cannot achieve text input.
The second is to implement text input by employing speech recognition technology. That is, the interactive apparatus collects the user's language and converts the language into text input. However, since the speech recognition technology has high requirements for the environment, text input cannot be realized in a noisy environment.
Further, there are also techniques for determining an input character based on a stay time period by indicating a character position using head/eye movement of a subject. However, a long dwell time may reduce the efficiency of text entry, while a short dwell time may increase the likelihood of a mis-selection, reducing the accuracy of text entry.
Fig. 1 is a schematic structural diagram of an interactive apparatus according to an embodiment of the present application. In this embodiment, an interactive apparatus refers to a device capable of implementing human-computer interaction. Optionally, the interactive apparatus may be an Augmented Reality (AR) device, a VR device, a Mixed Reality (MR) device, or the like; this embodiment does not limit the device type. As shown in fig. 1, the interactive apparatus includes at least: a support assembly 110, a sensor assembly 120, a controller 130, and a display assembly 140.
The support assembly 110 is used for wearing the interactive apparatus on the user. The implementation of the support assembly 110 is determined by the wearing mode of the interactive apparatus. For example, when the interactive apparatus is worn on the head through the support assembly 110, the support assembly 110 may be a glasses frame or a helmet mount; this embodiment does not limit the implementation of the support assembly 110.
The sensor assembly 120 is mounted on the support assembly 110 so that the sensor assembly 120 moves together with the support assembly 110 as the user moves.
The sensor assembly 120 is configured to collect sensing data and send the sensing data to the controller 130, so that the controller 130 controls the interactive device to implement a corresponding function according to the sensing data.
In this embodiment, the sensor assembly 120 includes a first sensor 121 and a second sensor 122.
The first sensor 121 is configured to collect motion data of the user and transmit the collected motion data to the controller 130.
Illustratively, the motion data may be head motion data of the user. In this case, the first sensor 121 moves as the user's head moves, and different head poses correspond to different motion data.
Optionally, the first sensor 121 may be an acceleration sensor, or may also be a geomagnetic sensor, and the embodiment does not limit the type of the first sensor.
The second sensor 122 is used for tracking the eyeball of the subject to obtain eyeball data and transmitting the collected eyeball data to the controller 130.
Alternatively, the second sensor 122 may be a camera or an infrared camera, and the embodiment does not limit the type of the second sensor.
Alternatively, the number of the first sensors may be one or at least two, the number of the second sensors may be one or at least two, and the number of the first sensors and the number of the second sensors are not limited in this embodiment.
The display component 140 is used to display image content that the interactive apparatus currently needs to display. Optionally, display assembly 140 is mounted on support assembly 110. In other embodiments, the display module 140 may also be implemented by a separate display screen, and the implementation manner of the display module 140 is not limited in this embodiment.
The controller 130 is used to control the interactive apparatus. For example, the controller 130 may be used to control power on/off, screen display, text input, and the like of the interactive device; this embodiment does not limit the control functions implemented by the controller 130.
The controller 130 is communicatively coupled to the sensor assembly 120 and the display assembly 140, respectively. Optionally, the controller 130 may be mounted on the support assembly 110; alternatively, it may be implemented as a separate computer device in communication with the sensor assembly 120. The computer device may be a desktop computer, a notebook computer, a mobile phone, a television, or the like; this embodiment does not limit the implementation of the controller 130.
In this embodiment, the controller 130 is configured to: displaying a current scene image through a display component; responding to an instruction of acquiring text input in a current scene image, and operating a virtual keyboard; determining a keyboard position corresponding to the action data on the virtual keyboard based on the action data collected by the first sensor, and displaying character information corresponding to the keyboard position; and in the case that the eyeball data collected by the second sensor indicates that the user blinks, displaying character information displayed when the user blinks in the target area.
Wherein, the keyboard positions corresponding to different action data are different.
The controller 130 displays the image content in real time through the display component 140. The current scene image refers to image content to be subjected to text input. The current scene image may be an image of a real scene and/or an image of a virtual scene generated by the controller 130. Specifically, under the condition that the interactive device is an AR, the current scene image is an image of a real scene; in the case where the interactive apparatus is a VR, the current scene image is an image of a virtual scene generated by the controller 130.
In this embodiment, by acquiring the motion data collected by the first sensor 121, the controller 130 determines the keyboard position corresponding to the motion data and displays the character information corresponding to that position; by acquiring the eyeball data collected by the second sensor 122, the controller 130 displays in the target area, when the eyeball data indicate that the user blinks, the character information corresponding to the keyboard position at that moment. This solves the problem of low text input efficiency when text input relies on the user performing a dwell operation: because a character can be entered immediately when the eyeball data indicate a blink, without a dwell operation, and a blink takes less time than a dwell operation, text input efficiency is improved. In addition, since no dwell time needs to be set, there is no loss of accuracy caused by a short dwell time, and because a blink is clearly distinguishable from other eye movements, the controller 130 can recognize it accurately, which improves the accuracy of text input.
The text input method provided by the present application is described below.
Fig. 2 is a flowchart of a text input method according to an embodiment of the present application, which is described as an example of the method used in the controller 130 of the interactive apparatus of fig. 1, and the method at least includes the following steps:
step 201, displaying the current scene image through the display component.
The current scene image refers to image content to be subjected to text input. The current scene image may be an image of a real scene and/or an image of a virtual scene generated by the controller.
Specifically, when the interactive apparatus is an AR device, the current scene image includes an image of a real scene. In this case, the image of the real scene is captured by an image acquisition component on the AR device. Optionally, the current scene image may further include a virtual image generated by the controller, i.e., the current scene image is a combination of an image of a real scene and a virtual image.
In the case where the interactive apparatus is a VR, the current scene image is an image of a virtual scene generated by the controller.
Step 202, in response to acquiring an instruction for text input in the current scene image, operating a virtual keyboard.
The text input instruction indicates that the user wants to perform text input on the current scene image. Optionally, the text input instruction comprises the location of the text input.
In this embodiment, the triggering manner of the instruction of text input includes, but is not limited to, at least one of the following:
The first method is as follows: the interactive device is provided with a text input control, and a text input instruction is generated when the text input control is triggered. The text input control may be a physical key arranged on the interactive device, or a virtual control displayed by the display component. When the text input control is a virtual control displayed through the display component, the controller acquires the motion data collected by the first sensor and detects whether the motion data indicates the position of the virtual control; when the motion data indicates the position of the virtual control and a confirmation operation performed by the user is received at that position, the virtual control is determined to be triggered. The confirmation operation may be a blinking operation or a dwell operation; this embodiment does not limit the implementation of the confirmation operation.
And the second method comprises the following steps: the controller carries out image recognition on the current scene image; and when the image recognition result indicates that the current scene image comprises the target image content, generating an instruction of text input. The target image content refers to image content which needs text input. Such as: the target image content is: books, nameplates, paper, etc., and the present embodiment does not limit the type of the target image content.
The controller may perform image recognition on the current scene image using an image recognition algorithm, including but not limited to: target detection and recognition through a Convolutional Neural Network (CNN); or target detection and recognition through a Single Shot MultiBox Detector (SSD), the You Only Look Once (YOLO) family of algorithms, and the like; this embodiment does not limit the implementation of the image recognition algorithm.
The virtual keyboard includes a variety of virtual keys used for text entry; for example, the virtual keyboard comprises letter keys and function keys. The letter keys include, but are not limited to, the 26 English letters, and the function keys include, but are not limited to: digit keys, a delete key, a space key, a confirmation key, and/or symbol keys, etc.; this embodiment does not limit the kinds of letter keys and function keys.
In this embodiment, the virtual keyboard is implemented by a layer, where the layer includes the various keys required for text input and different keys correspond to different positions on the layer. Accordingly, running the virtual keyboard comprises: covering a layer of a preset size at the specified position of the current scene image to obtain the virtual keyboard.
Optionally, the positions of the different keys on the layer may be customized by the user or set by default in the interactive device; this embodiment does not limit how the key positions on the layer are set.
It should be added that the positions of the keys on the layer are pre-stored in the interactive device.
Alternatively, the size of each key may be customized by the user, and the preset size is then determined adaptively based on the size of each key. Alternatively, the size of each key is a default setting of the interactive device, for example a default key width of 3.4 cm (other values may be used in practice; this embodiment does not limit the key size); correspondingly, the preset size is also a default setting of the interactive device, for example 40 cm × 20 cm (other values may be used in practice; this embodiment does not limit the preset size). This embodiment does not limit how the size of each key is set.
In addition, the interactive apparatus also allows the user to adjust the preset size separately, i.e., the size of each key is set by default while the preset size is set by the user. In this case, the spacing between two adjacent keys on the layer is adjusted adaptively as the preset size changes.
The designated position is used to indicate a display position of the virtual keyboard. Optionally, the designated position is a center position of the virtual keyboard, at this time, since the preset size of the virtual keyboard is known, the controller takes the designated position as the center position of the virtual keyboard, and generates a map layer with the preset size at the center position, so as to obtain the virtual keyboard.
Optionally, the specified position is one meter in front of the center of the user's field of view. In actual implementation, the specified position may also be the middle of the current scene image, or a position designated by the user; this embodiment does not limit the implementation of the specified position.
Such as: referring to fig. 3, a virtual keyboard 31 is displayed on the lower half of the current scene image 30. It should be noted that the frame at the virtual keyboard 31 is only used to indicate the position of the virtual keyboard, and in actual implementation, the frame of the virtual keyboard 31 may not be displayed in the current scene image 30.
Optionally, the transparency of the letter key positions in the layer can be adjusted; and/or the transparency of the functional key position can be adjusted.
In one example, the letter keys are located in the middle region of the layer and the function keys are located in the edge region of the layer; correspondingly, the transparency of the letter keys is greater than that of the function keys. Because the function keys are few in number and located in the edge area during text input, their lower transparency lets the user locate them quickly, which improves text input efficiency without affecting the display of the current scene image. The letter keys are usually numerous and centrally placed, so their higher transparency ensures that the virtual keyboard does not affect the display of the current scene image.
The transparency of the letter key position is taken as 100% for illustration. At the moment, the letter key position is not displayed under the condition of not being indicated by action data, and is displayed on the layer in a first mode under the condition of being indicated by the action data; and the function key position is displayed on the picture layer through a second mode.
The first mode includes displaying a virtual cursor at the position indicated by the motion data and displaying the character information corresponding to that position based on the position of the virtual cursor. Displaying character information based on the position of the virtual cursor includes: displaying the character information at a preset distance from the center of the virtual cursor. The preset distance may be 0.5 cm or the width of one character; this embodiment does not limit the value of the preset distance.
Such as: referring to fig. 3, a virtual cursor (indicated by a dot) is displayed at a position 32 indicated by the motion data, and character information g corresponding to the position 32 is displayed in the vicinity of the virtual cursor.
In actual implementation, the first mode may be to display character information corresponding to the position indicated by the motion data at the position indicated by the motion data, and the embodiment does not limit the implementation of the first mode.
The second mode includes that the function key is displayed on the layer with a preset transparency, the transparency of the function key may be 80%, or may also be 60%, and the transparency of the function key is not limited in this embodiment.
Such as: referring to fig. 3, the function key buttons include three kinds, which are a skip key button 33, a space key button 34 and a delete key button 35, respectively, and each function key button is displayed in the form of a rectangular frame having a transparency of 80% (the transparency is 80% in fig. 3 as represented by oblique lines).
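Purely as an illustrative aside (not the patent's own code), the two display modes described above could be sketched roughly as follows. The QWERTY layout, key sizes, and the draw_rect/draw_text callables are assumptions introduced for the sketch; an 80% key transparency corresponds to an alpha of 0.2.

```python
# Illustrative sketch: letter keys stay fully transparent and only the character
# under the virtual cursor is revealed (first mode), while function keys are
# always drawn with 80% transparency (second mode). Layout and helpers assumed.

LETTER_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]       # assumed QWERTY layout
FUNCTION_KEYS = {"skip": (0.1, 0.95), "space": (0.5, 0.95), "delete": (0.9, 0.95)}
KEY_WIDTH, KEY_HEIGHT = 0.1, 0.25                          # normalized layer units


def build_letter_layout():
    """Return {letter: (cx, cy)} key centers on the keyboard layer."""
    layout = {}
    for row, letters in enumerate(LETTER_ROWS):
        offset = (1.0 - len(letters) * KEY_WIDTH) / 2
        for col, ch in enumerate(letters):
            layout[ch] = (offset + (col + 0.5) * KEY_WIDTH, (row + 0.5) * KEY_HEIGHT)
    return layout


LETTER_LAYOUT = build_letter_layout()


def key_under_cursor(cursor_xy, layout=LETTER_LAYOUT):
    """Letter key closest to the position indicated by the motion data."""
    cx, cy = cursor_xy
    return min(layout, key=lambda ch: (layout[ch][0] - cx) ** 2 + (layout[ch][1] - cy) ** 2)


def render_keyboard(cursor_xy, draw_rect, draw_text):
    # Second mode: function keys are always visible, at 80% transparency.
    for name, center in FUNCTION_KEYS.items():
        draw_rect(center, KEY_WIDTH, KEY_HEIGHT, alpha=0.2, label=name)
    # First mode: letter keys remain invisible; show only the dot-shaped virtual
    # cursor and the character information next to it.
    draw_rect(cursor_xy, 0.01, 0.01, alpha=1.0, label=None)
    draw_text(key_under_cursor(cursor_xy), near=cursor_xy)
```

In this sketch the letter layer itself is never drawn; only the character nearest the cursor is revealed, which mirrors the behaviour shown in fig. 3.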
Step 203, determining a keyboard position corresponding to the action data on the virtual keyboard based on the action data collected by the first sensor, and displaying character information corresponding to the keyboard position, wherein the keyboard positions corresponding to different action data are different.
Optionally, the motion data corresponds to pixel positions of the current scene image in a one-to-one manner. At this time, determining a keyboard position corresponding to the motion data on the virtual keyboard based on the motion data collected by the first sensor includes: and acquiring the pixel position indicated by the action data to obtain the keyboard position corresponding to the action data.
Alternatively, the motion data is generated when the user performs a certain action. In this embodiment, the motion data is head motion data of the user, i.e., data generated when the user moves the head. In actual implementation, the motion data may also be hand motion data of the user, i.e., data generated when the user moves the hand; this embodiment does not limit the implementation of the motion data.
Optionally, displaying character information corresponding to the keyboard position, including: inputting the keyboard position into a preset space model to obtain character information corresponding to the keyboard position; and displaying character information corresponding to the keyboard position in a first mode.
The spatial model is obtained by estimating the model parameters of a bivariate Gaussian distribution model based on sample data. The sample data comprises sample input characters and the input position information corresponding to the sample input characters.
For example, the sample input characters may be the 26 English letters, each with its one-to-one corresponding position information. Assuming the input processes of the 26 English letters are independent and follow a bivariate Gaussian distribution, the parameter values of the bivariate Gaussian distribution can be estimated from the input letters and their position information. The spatial model is adjusted according to the obtained parameter values, so that the character information corresponding to the current position can be obtained when the spatial model is triggered. For the detailed description of the spatial model, reference is made to the following; it is not repeated here.
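For illustration only, a spatial model of this kind could be fitted from per-letter sample data roughly as follows; numpy and the helper names are assumptions, not part of the patent.

```python
import numpy as np


def fit_spatial_model(samples):
    """Estimate per-letter bivariate-Gaussian parameters from sample data.

    `samples` maps each letter to a list of (x, y) positions recorded while the
    user was entering that letter (sample input characters plus their input
    positions). Returns {letter: (mu_x, mu_y, sigma_x, sigma_y, rho)}.
    """
    model = {}
    for letter, points in samples.items():
        xs, ys = np.asarray(points, dtype=float).T
        model[letter] = (xs.mean(), ys.mean(),
                         xs.std(ddof=1), ys.std(ddof=1),
                         float(np.corrcoef(xs, ys)[0, 1]))
    return model


def letter_likelihood(point, params):
    """Bivariate Gaussian density P(s_i | c_i) of an input point for one letter."""
    x, y = point
    mu_x, mu_y, sx, sy, rho = params
    zx, zy = (x - mu_x) / sx, (y - mu_y) / sy
    z = zx ** 2 - 2 * rho * zx * zy + zy ** 2
    norm = 2 * np.pi * sx * sy * np.sqrt(1 - rho ** 2)
    return float(np.exp(-z / (2 * (1 - rho ** 2))) / norm)
```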
It should be noted that, when the motion data indicates a keyboard position, character information corresponding to the keyboard position is displayed in real time.
Such as: referring to fig. 4, taking text input through the head-mounted AR device using an object as an example, the virtual keyboard 31 is displayed at a position one meter away from the center of the field of view of the using object. And driving the AR equipment to move by using the object, and correspondingly, obtaining the motion data of the head by the AR equipment. At the same time, the AR device displays a virtual cursor (indicated by a dot) at the position 32 indicated by the motion data in real time, and displays character information g corresponding to the position 32 in the vicinity of the virtual cursor.
And step 204, displaying the character information displayed during blinking in the target area under the condition that the eyeball data acquired by the second sensor indicates that the user blinks.
Before this step, the interactive apparatus needs to determine whether the subject blinks based on the eyeball data.
Optionally, the eye data comprises eye data for the left eye and/or eye data for the right eye. In a case where the eyeball data includes eyeball data of a left eye and does not include eyeball data of a right eye, the controller determines whether the subject is blinking using only the eyeball data of the left eye. In a case where the eyeball data includes eyeball data of a right eye and does not include eyeball data of a left eye, the controller determines whether the subject is blinking using only the eyeball data of the right eye. In a case where the eyeball data includes eyeball data of a left eye and eyeball data of a right eye, the controller determines whether the subject is blinking in combination with the eyeball data of the left eye and the eyeball data of the right eye.
The accuracy of determining whether the subject blinks using the eyeball data of both eyes is higher than that of determining whether the subject blinks using the eyeball data of one eye. Based on this, in the present embodiment, an example is described in which the controller determines whether or not the subject blinks in conjunction with the eyeball data of the left eye and the eyeball data of the right eye.
Specifically, determining whether the subject is blinking includes: and under the condition that the time length for which the second sensor does not acquire eyeball data is longer than the preset time length, determining that the user blinks.
Wherein the preset time duration may be preset, such as 150 milliseconds; or may be obtained based on the blinking habit of the user, and the value of the preset duration is not limited in this embodiment.
Optionally, the interactive device may further include a shortest interval duration between n consecutive blinks to distinguish the n consecutive blinks, where a value of n is an integer greater than 1. Wherein, the shortest interval duration may be preset, such as 400 ms; alternatively, the time interval may be acquired based on the blinking habit of the subject, and the value of the shortest time interval is not limited in this embodiment.
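A minimal sketch of such a blink detector, assuming the 150 ms no-data threshold and 400 ms minimum interval used as examples above; the class and method names are illustrative, not from the patent.

```python
import time

BLINK_MIN_GAP_S = 0.150       # preset duration: >150 ms without eye data = blink
BLINK_MIN_INTERVAL_S = 0.400  # shortest interval separating consecutive blinks


class BlinkDetector:
    """Detects a blink when the eye tracker stops delivering samples long enough."""

    def __init__(self):
        self.last_sample_time = None
        self.last_blink_time = -float("inf")
        self.armed = False

    def on_eye_sample(self, now=None):
        """Call whenever the second sensor delivers a valid eyeball sample."""
        self.last_sample_time = now if now is not None else time.monotonic()
        self.armed = True  # eyes are open again, a new blink may be detected

    def poll(self, now=None):
        """Return True once per blink; call periodically from the input loop."""
        now = now if now is not None else time.monotonic()
        if not self.armed or self.last_sample_time is None:
            return False
        no_data_for = now - self.last_sample_time
        if (no_data_for > BLINK_MIN_GAP_S
                and now - self.last_blink_time > BLINK_MIN_INTERVAL_S):
            self.last_blink_time = now
            self.armed = False
            return True
        return False
```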
The target area is the display area in which the characters confirmed by the user are shown. The display position of the target area may be customized by the user, or may be a default setting, for example above the virtual keyboard; this embodiment does not limit the display position of the target area.
Such as: referring to fig. 3, the target area 36 is displayed above the virtual keyboard. The input character information "bad for th" is displayed in the target area 36.
Alternatively, the interactive apparatus may provide audio feedback after displaying the character information displayed at the time of blinking in the target area, such as: typing sound and key sound, the embodiment does not limit the specific implementation manner of the audio feedback. In practical implementation, after the character information displayed during blinking is displayed in the target area, the interactive device may also provide vibration feedback, and the present embodiment does not limit the prompting manner for prompting the user that the character information has been input.
Optionally, under the condition that the position indicated by the action data does not belong to the area of the virtual keyboard and does not belong to the target area, the interactive device controls the second sensor to suspend the collection of eyeball data; and acquiring eyeball data again through the second sensor until the position indicated by the action data belongs to the area of the virtual keyboard. Thus, the interactive device can be used for resting eyes of the user, and the problem of text input errors caused by mistaken blinking is prevented.
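The pause/resume behaviour described above could look roughly like the following; the pause()/resume() methods on the eye sensor are assumed for illustration and are not a documented API.

```python
def update_eye_tracking(cursor_xy, keyboard_rect, target_rect, eye_sensor):
    """Pause eyeball-data collection while the cursor is outside both areas.

    `keyboard_rect` and `target_rect` are (x0, y0, x1, y1) rectangles; the
    pause()/resume() methods on `eye_sensor` are assumed for illustration.
    """
    def inside(rect):
        x0, y0, x1, y1 = rect
        return x0 <= cursor_xy[0] <= x1 and y0 <= cursor_xy[1] <= y1

    if not inside(keyboard_rect) and not inside(target_rect):
        eye_sensor.pause()       # rest the eyes; blinks are no longer interpreted
    elif inside(keyboard_rect):
        eye_sensor.resume()      # back over the keyboard: collect eyeball data again
```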
In other embodiments, the interactive apparatus may also display an eye rest area via the display component, and the controller no longer recognizes the blinking when the position indicated by the motion data is in the eye rest area. Optionally, the second sensor may also suspend collecting eye data. When the position indicated by the motion data leaves the eye rest area, the second sensor collects eyeball data again, and the controller continues to recognize blinking.
In summary, in the text input method provided by this embodiment, a current scene image is displayed through the display component; a virtual keyboard is run in response to an instruction for text input in the current scene image; based on the motion data collected by the first sensor, the keyboard position corresponding to the motion data is determined on the virtual keyboard and the character information corresponding to that position is displayed, where different motion data correspond to different keyboard positions; and, when the eyeball data collected by the second sensor indicate that the user blinks, the character information displayed at that moment is entered in the target area. This solves the problem that dwell time affects text input efficiency in dwell-based text input: because a character can be entered immediately when the eyeball data indicate a blink, without the user performing a dwell operation, and a blink takes less time than a dwell operation, text input efficiency is improved. In addition, since no dwell time needs to be set, there is no loss of accuracy caused by a short dwell time; and because a blink is clearly distinguishable from other eye movements, it can be recognized accurately, which improves the accuracy of text input.
In addition, the letter keys of the virtual keyboard are not displayed when they are not indicated by the motion data, and the character information corresponding to a letter key is displayed when that key is indicated by the motion data, which prevents the virtual keyboard from occluding the scene image while still showing the character information indicated by the motion data.
In addition, the function keys are displayed on the virtual keyboard in the second mode. Because the function keys are few in number and located in the edge area during text input, their lower transparency lets the user locate them quickly, which improves text input efficiency without affecting the display of the current scene image.
In addition, when the position indicated by the motion data belongs neither to the virtual keyboard nor to the target area, the second sensor is controlled to pause the collection of eyeball data, which allows the eyes to rest and prevents text input errors caused by unintended blinks.
Optionally, after the interactive apparatus receives character information confirmed by the user, it may further generate recommended words in real time based on the at least one piece of character information already entered, so as to improve the user's text input efficiency.
The recommended word is a word which is generated based on at least one character information and is related to the character information. In other words, the recommended word is a word that the interactive apparatus predicts will be input by the user.
Specifically, in step 204, in case the eye data indicates that the subject is blinking, the method further comprises: acquiring a recommended word corresponding to at least one piece of character information based on the at least one piece of character information in the target area; displaying a recommended word in a recommended word display area; and in the case of receiving a selection operation on the recommended word, replacing at least one character information with the recommended word in the target area.
Acquiring a recommended word corresponding to at least one character information based on the at least one character information in the target area, wherein the acquiring includes: determining a corresponding keyboard location based on the motion data; inputting the keyboard position into a preset space model to obtain character information corresponding to the keyboard position; and inputting the character information corresponding to each keyboard position into a preset language model to obtain a recommended word.
In this embodiment, the interactive apparatus stores a statistical decoder in advance, and the statistical decoder includes a spatial model and a language model.
The spatial model gives the probability distribution over the keyboard positions. Specifically, the spatial model is obtained by estimating the model parameters of a bivariate Gaussian distribution model based on sample data. The sample data includes sample input characters and the input position information corresponding to the sample input characters.
The language model determines prior probabilities for each word in a predetermined corpus based on the character information. Specifically, the language model is used for determining the prior probability of each word in a preset corpus by using the at least one character information, and determining the recommended word based on the prior probability.
In the following, a specific implementation of the statistical decoder is described as an example.
Assume that the set of input points on the virtual keyboard indicated by the motion data is $S = \{s_1, s_2, s_3, s_4, \dots, s_n\}$. The language model in the statistical decoder gives the most probable word in the predetermined corpus $L$, i.e. the recommended word, which is computed as:

$$W^* = \arg\max_{W \in L} P(W \mid S) \tag{1}$$

According to Bayes' rule, equation (1) can be rewritten as:

$$W^* = \arg\max_{W \in L} P(S \mid W)\,P(W) \tag{2}$$

where $L$ is the predetermined corpus, $W$ is a specific word in the corpus, and $S$ is the set of input points. $P(W)$ is the prior probability of the word in the predetermined corpus, and $P(S \mid W)$ is computed by the spatial model as:

$$P(S \mid W) = \prod_{i=1}^{n} P(s_i \mid c_i) \tag{3}$$

where the word $W$ consists of a sequence of letters $(c_1, c_2, c_3, c_4, \dots, c_n)$ that corresponds one-to-one with the input points $(s_1, s_2, s_3, s_4, \dots, s_n)$. Assuming that the input process of each letter is independent and that the user's character input behaviour follows a bivariate Gaussian distribution, $P(s_i \mid c_i)$ is computed as:

$$P(s_i \mid c_i) = \frac{1}{2\pi\sigma_{ix}\sigma_{iy}\sqrt{1-\rho_i^2}} \exp\!\left(-\frac{z_i}{2(1-\rho_i^2)}\right) \tag{4}$$

$$z_i = \frac{(x_i-\mu_{ix})^2}{\sigma_{ix}^2} - \frac{2\rho_i(x_i-\mu_{ix})(y_i-\mu_{iy})}{\sigma_{ix}\sigma_{iy}} + \frac{(y_i-\mu_{iy})^2}{\sigma_{iy}^2} \tag{5}$$

where $(\mu_{ix}, \mu_{iy})$ is the center position of the target key $c_i$, $(\sigma_{ix}, \sigma_{iy})$ are the standard deviations along the x-axis and y-axis, $\rho_i$ is the correlation coefficient, and $(x_i, y_i)$ are the coordinates of the input point $s_i$. The sample data are then used to fit the spatial model. Once the fitting is complete, whenever the user triggers a selection, the statistical decoder can infer the most likely letter from the spatial model according to:

$$c^* = \arg\max_{c \in A} P\big((x, y) \mid c\big) \tag{6}$$

where $A$ is the set of 26 English letters on the virtual keyboard, $(x, y)$ are the key coordinates on the virtual keyboard indicated by the motion data, $c$ is a character in that alphabet, and $c^*$ is the predicted character.
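For illustration only, a statistical decoder following equations (2)–(4) could be sketched as below, reusing the letter_likelihood helper from the earlier spatial-model sketch; the data structures and the length-matching simplification are assumptions, not the patent's implementation.

```python
import math


def decode_recommendations(input_points, spatial_model, corpus_priors, top_x=4):
    """Rank corpus words by P(S|W)·P(W), following equations (2)–(4).

    `spatial_model` is the per-letter parameter dict fitted earlier,
    `corpus_priors` maps each word to its prior probability P(W), and
    `input_points` are the (x, y) positions selected so far. For simplicity this
    sketch only scores words whose length matches the number of input points.
    """
    scored = []
    for word, prior in corpus_priors.items():
        if len(word) != len(input_points) or prior <= 0:
            continue
        log_p = math.log(prior)
        try:
            for point, letter in zip(input_points, word):
                log_p += math.log(letter_likelihood(point, spatial_model[letter]))
        except (KeyError, ValueError):
            continue  # word uses a letter the spatial model does not cover
        scored.append((log_p, word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_x]]
```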
In this embodiment, after the interactive apparatus calculates the recommended word, the recommended word is displayed in the recommended word display area in real time.
It should be added that, in this embodiment, the number of recommended words calculated by the interactive apparatus is x, where x is an integer greater than or equal to 1. The x recommended words are the first x words with the highest probability of being output by the language model.
Optionally, the recommended word display area may be near the target area, or may be in another area of the current scene image, and the position of the recommended word display area is not limited in this embodiment.
Such as: referring to fig. 5, a recommended word display area 51 is displayed below the target area 36 and above the virtual keyboard 31. The recommended word display area 51 displays 4 words (i.e., x has a value of 4), which are: the, to, this, that.
The selecting operation of the recommended word comprises the following steps: the motion data collected by the first sensor indicates the position of the recommended word, and the eyeball data collected by the second sensor indicates the blink of the user. At this time, the recommended word indicated by the blink-time motion data is the recommended word selected by the user.
Optionally, after the replacement of the at least one character information in the target area with the recommended word is completed, the interactive apparatus may provide audio feedback, such as: typing sound and key sound, the embodiment does not limit the specific implementation manner of the audio feedback. In practical implementation, after the recommended word replaces at least one character information, the interactive device may also provide vibration feedback, and the prompting manner for prompting the user to input the recommended word is not limited in this embodiment.
Optionally, the interactive apparatus may add a space after the generated recommended word, or may add a space after at least one character information is replaced with the recommended word, to improve the efficiency of text input using the object.
In summary, based on at least one piece of character information in the target area, a recommended word corresponding to the at least one piece of character information is obtained; displaying a recommended word in a recommended word display area; under the condition that the selection operation of the recommended word is received, replacing at least one character information by the recommended word in the target area; the problem of low text input efficiency when the text is input only through characters can be solved; by displaying the recommended word corresponding to the input character information in real time, when the recommended word is a word to be input by a user, the time for inputting subsequent characters can be saved, and the efficiency of text input is improved.
Fig. 6 is a flowchart of a text input method according to another embodiment of the present application, and this embodiment takes an interactive device, namely HoloLens2 as an example for description. In the present embodiment, an eyeball tracking sensor (i.e., a second sensor) built in the HoloLens2 is used for acquiring eyeball data, and meanwhile, a first sensor for acquiring motion data is built in the HoloLens 2. The method at least comprises the following steps:
step 601, displaying the current scene image through the display component.
Step 602, in response to acquiring an instruction for text input in the current scene image, determining a target area for text input in the current scene image, and operating a virtual keyboard.
Step 603, acquiring the motion data collected by the first sensor, and determining the position indicated by the motion data.
Step 604, when the motion data indicates a keyboard position on the virtual keyboard, displaying a virtual cursor at the keyboard position.
Alternatively, the motion data may correspond to movement data of the virtual cursor, where the movement data includes a movement direction and a movement distance. In this case, the virtual cursor is moved correspondingly on the virtual keyboard based on the motion data. In this embodiment, the motion data is head motion data of the user, i.e., data generated when the user moves the head.
The moving direction includes, but is not limited to, up, down, left, and right, and the moving direction is not limited in this embodiment.
The movement distance refers to the distance the virtual cursor moves in the movement direction. For example, taking the motion data as the user's head motion data: when the head rotates 1 degree to the right, the virtual cursor is controlled to move one pixel in the preset direction. The quantitative relationship between the motion data and the displacement value is not limited in this embodiment.
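As an illustration of one possible quantitative relationship, the sketch below maps head-rotation deltas to cursor displacement with a configurable gain of one pixel per degree, matching the example above. It is a hedged example only; the function and parameter names are assumptions, not the embodiment's actual interface.

```python
# Illustrative mapping from head-motion data to virtual-cursor movement.
# The gain of 1 pixel per degree matches the example above; real devices
# would tune this value, and the names below are assumed for illustration.

def move_cursor(cursor_xy, yaw_delta_deg, pitch_delta_deg, gain_px_per_deg=1.0):
    """Return the new cursor position after a head rotation.

    yaw_delta_deg   -- rotation to the right is positive (moves cursor right)
    pitch_delta_deg -- rotation upward is positive (moves cursor up)
    """
    x, y = cursor_xy
    x += yaw_delta_deg * gain_px_per_deg
    y -= pitch_delta_deg * gain_px_per_deg   # screen y grows downward
    return (x, y)

# Example: rotating the head 1 degree to the right moves the cursor one pixel right.
print(move_cursor((100, 100), yaw_delta_deg=1.0, pitch_delta_deg=0.0))  # (101.0, 100)
```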
Step 605, displaying, in real time, the character information at the position of the virtual cursor.
Step 606, obtaining the eyeball data collected by the second sensor and determining whether the eyeball data indicates a blink; if the eyeball data indicates that the user blinks, step 607 is performed; if not, step 603 is performed again.
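The embodiment does not specify how the eyeball data is judged to indicate a blink; one simple heuristic is to look for a brief dip of an eye-openness signal below a threshold. The sketch below illustrates that idea under assumed values for the threshold, sampling interval, and duration window.

```python
# Heuristic blink detection from an eye-openness signal (illustrative only).
# Threshold and duration window are assumed values, not taken from the embodiment.

def detect_blink(openness_samples, dt_s=0.02,
                 closed_threshold=0.2,
                 min_duration_s=0.05, max_duration_s=0.4):
    """Return True if the samples contain one closure whose duration lies
    between min_duration_s and max_duration_s (filters out long closures)."""
    closed_run = 0
    for openness in openness_samples:
        if openness < closed_threshold:
            closed_run += 1
        else:
            duration = closed_run * dt_s
            if min_duration_s <= duration <= max_duration_s:
                return True
            closed_run = 0
    return min_duration_s <= closed_run * dt_s <= max_duration_s

# Example: a 0.1 s closure sampled at 50 Hz registers as a blink.
samples = [1.0] * 10 + [0.05] * 5 + [1.0] * 10
print(detect_blink(samples))  # True
```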
Step 607, displaying, in the target area, the character information indicated by the motion data at the moment of blinking, and displaying the recommended word corresponding to the character information entered so far.
Step 608, when the motion data indicates the display position of the recommended word, determining whether the eyeball data collected by the second sensor indicates blinking; if yes, go to step 609; if not, step 603 is repeated.
Step 609, replacing the corresponding character information with the recommended word indicated by the motion data at the moment of blinking, and displaying the recommended word in the target area.
Step 610, determining whether the text input is finished; if yes, ending the process; if not, step 603 is executed again.
In this embodiment, the text input is controlled to be stopped by acquiring an instruction to end the text input.
Optionally, the instruction to end text input is triggered as follows: the interactive device is provided with an end-text-input control, and when the control is triggered, an instruction to end text input is generated. The end-text-input control may be a physical key provided on the interactive device, or a virtual control displayed through the display component. When the end-text-input control is a virtual control displayed through the display component, the controller obtains the motion data collected by the first sensor and detects whether the motion data indicates the position of the virtual control; when the motion data indicates the position of the virtual control and a confirmation operation performed by the user is received at that position, the virtual control is determined to be triggered. The confirmation operation may be a blinking operation or a stay operation; the implementation of the confirmation operation is not limited in this embodiment.
In this embodiment, because text input can be performed immediately when the eyeball data indicates that the user blinks, without requiring the user to perform a stay operation, and because the duration of a blink is shorter than that of a stay operation, the text input efficiency can be improved. In addition, since no stay duration needs to be set, the problem of low text input accuracy caused by a stay duration that is too short does not arise; and since a blink is distinct from other eye movements, it can be recognized accurately, which further improves the accuracy of text input.
In order to illustrate the advantages of the text input method provided in the present application over conventional text input methods, the effects of the method are described below along two dimensions: text input speed and text input error rate. The conventional text input methods are described by taking stay selection and slide selection as examples.
In this embodiment, the text input speed is measured in words per minute (WPM), using the following formula:
WPM = ((|S| - 1) / T) × 60 × (1/5)
where | S | represents the length of the entered character string, and T represents the task completion time in seconds. The task completion time refers to the time taken from the selection of the first character to the end of text entry.
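As a worked check of the formula above, the following sketch computes the text input speed from the entered string and the task completion time; the function name and example values are illustrative assumptions.

```python
# Computing words per minute (WPM) from the entered string and completion time.
# Implements the formula above (one word counted as five characters).

def words_per_minute(transcribed: str, task_time_s: float) -> float:
    """WPM = ((|S| - 1) / T) * 60 * (1 / 5)."""
    return (len(transcribed) - 1) / task_time_s * 60.0 / 5.0

# Example: a 35-character string entered in 60 seconds gives 6.8 WPM.
print(round(words_per_minute("the quick brown fox jumps over lazy", 60.0), 2))
```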
The text input speeds of the three text input methods are shown in fig. 7, where D-Type denotes stay selection, G-Type denotes slide selection, and E-Type denotes blink selection. Blink selection is the fastest of the three methods and is significantly faster than stay selection and slide selection. Subjects reached an average of 7.77 WPM in the first block and increased their typing speed to 11.95 WPM in the last block. In the first two blocks, stay selection is faster than slide selection; however, slide selection reaches an average of 9.84 WPM in the last block, which is faster than stay selection.
Since the virtual keyboard employs Word-level correction, the text entry error rate is measured by the Minimum Word Distance (MWD). The MWD is the minimum number of word deletions, insertions, or substitutions required to convert a transcribed string to a desired string. The text entry error rate is defined as:
Error rate = (MWD(S, P) / |P|) × 100%
where MWD (S, P) represents the MWD between the transcription phrase S and the target phrase P, and | P | represents the number of words in P.
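To make the metric concrete, the sketch below computes the MWD as a word-level edit distance and derives the error rate from it; the implementation details are illustrative assumptions rather than the exact procedure used in the experiments.

```python
# Word-level edit distance (minimum word distance, MWD) and the derived
# text-entry error rate. Illustrative implementation with assumed details.

def minimum_word_distance(transcribed: str, target: str) -> int:
    """Minimum number of word deletions, insertions, or substitutions
    needed to turn the transcribed phrase into the target phrase."""
    s, p = transcribed.split(), target.split()
    dist = [[0] * (len(p) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        dist[i][0] = i
    for j in range(len(p) + 1):
        dist[0][j] = j
    for i in range(1, len(s) + 1):
        for j in range(1, len(p) + 1):
            cost = 0 if s[i - 1] == p[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + cost) # substitution
    return dist[len(s)][len(p)]

def error_rate(transcribed: str, target: str) -> float:
    """Error rate = MWD(S, P) / |P| * 100%."""
    return minimum_word_distance(transcribed, target) / len(target.split()) * 100.0

# Example: one wrong word out of four gives a 25% error rate.
print(error_rate("the quick brown fix", "the quick brown fox"))  # 25.0
```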
The text input error rates of the three text input methods are shown in fig. 8, where D-Type denotes stay selection, G-Type denotes slide selection, and E-Type denotes blink selection. All three selection methods reach a low error rate in the last block: 4.6% for blink selection, 3.7% for stay selection, and 2.3% for slide selection. Slide selection produces a relatively low error rate across the five blocks, dropping below 3.0% in the last two blocks. With blink selection, the error rate drops from 6.9% in the first block to 4.6% in the fifth block. The average error rate for stay selection is 5.3% in the first block and drops to 3.7% in the last block.
Although there is no significant difference among the three text input methods in text input error rate, slide selection is slightly better, while blink selection has the slightly highest error rate of the three. Because slide selection requires the user to perform text input by selecting a recommended word, the user consciously chooses the correct word, which gives it a slight advantage in error rate. The main cause of the slightly higher error rate of blink selection is likely to be unintentional blinks. Therefore, the design of this embodiment is improved by adding a word recommendation area and a mechanism for preventing mistaken blinks, so as to increase the text input speed and reduce the text input error rate.
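The mistaken-blink prevention mechanism is not detailed here; one plausible form, sketched below purely as an assumption, is to accept a blink as a selection only after the cursor has rested on the same key or recommended word for a short settling time, so that blinks made while the cursor is still moving are ignored.

```python
# Illustrative guard against accidental blinks: a blink only selects a key
# if the cursor has stayed on that key for a short settling time beforehand.
# The settling time (0.15 s) and all names are assumptions, not the
# embodiment's actual values or interface.

class BlinkSelectionFilter:
    def __init__(self, settle_time_s: float = 0.15):
        self.settle_time_s = settle_time_s
        self.current_key = None
        self.time_on_key = 0.0

    def update(self, key_under_cursor, dt_s: float, blinked: bool):
        """Call once per frame; returns the selected key, or None."""
        if key_under_cursor != self.current_key:
            self.current_key = key_under_cursor
            self.time_on_key = 0.0           # cursor moved: reset the timer
        else:
            self.time_on_key += dt_s
        if blinked and self.time_on_key >= self.settle_time_s:
            return self.current_key          # deliberate blink on a settled key
        return None                          # ignore blinks while still moving
```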
The overall experimental results show that, compared with stay selection and slide selection, blink selection is a better scheme for selecting input characters during text input: it achieves a higher text input speed and a lower text input error rate.
FIG. 9 is a block diagram of a text input device provided in one embodiment of the present application. The apparatus can be used in the interactive device shown in fig. 1, and the apparatus at least comprises the following modules: an image display module 910, a keyboard operation module 920, a character display module 930, and a character input module 940.
An image display module 910, configured to display a current scene image through a display component;
a keyboard operation module 920, configured to operate a virtual keyboard in response to obtaining an instruction to perform text input in the current scene image;
a character display module 930, configured to determine, based on the motion data acquired by the first sensor, a keyboard position corresponding to the motion data on the virtual keyboard, and display character information corresponding to the keyboard position, where the keyboard positions corresponding to different motion data are different;
and a character input module 940, configured to display, in the target area, the character information that is displayed when the eyeball data acquired by the second sensor indicates that the user blinks.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: in the text input device provided in the above embodiment, only the division of the above functional modules is used for illustration when performing text input, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the text input device is divided into different functional modules to complete all or part of the above described functions. In addition, the text input device provided by the above embodiment and the text input method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment and is not described herein again.
FIG. 10 is a block diagram of an interactive apparatus provided in one embodiment of the present application. The device comprises at least a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores such as: 4 core processors, 10 core processors, etc. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one instruction for execution by processor 1001 to implement a text input method provided by method embodiments herein.
In some embodiments, the interactive apparatus may further include: a peripheral interface and at least one peripheral. The processor 1001, memory 1002 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the interactive apparatus may also include fewer or more components, which is not limited by the embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the text input method of the above method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the text input method of the above-mentioned method embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A text input method, characterized in that the method is used for an interactive device, and the interactive device comprises a display component, a sensing component and a support component; the display component and the sensing component are mounted on the support component; the support component is used for enabling a user to wear the interactive device; the sensing component comprises a first sensor and a second sensor, the first sensor is used for collecting motion data of the user, and the second sensor is used for tracking the user's eyeballs to obtain eyeball data; the method comprises the following steps:
displaying, by the display component, a current scene image;
responding to an instruction of acquiring text input in the current scene image, and operating a virtual keyboard;
determining, based on the motion data collected by the first sensor, a keyboard position corresponding to the motion data on the virtual keyboard, and displaying character information corresponding to the keyboard position, wherein different motion data correspond to different keyboard positions;
and displaying, in the target area, the character information displayed when the eyeball data acquired by the second sensor indicates that the user blinks.
2. The method of claim 1, wherein the running a virtual keyboard comprises:
covering a specified position of the current scene image with a layer of a preset size to obtain the virtual keyboard, wherein the layer comprises each key position required for text input, and different key positions are located at different positions on the layer.
3. The method of claim 2, wherein the key positions comprise letter key positions and function key positions;
the letter key position is not displayed when it is not indicated by the motion data, and is displayed on the layer in a first mode when it is indicated by the motion data;
and the function key position is displayed on the layer in a second mode.
4. The method of claim 1, further comprising:
acquiring a recommended word corresponding to at least one piece of character information based on the at least one piece of character information in the target area;
displaying the recommended word in a recommended word display area;
and replacing, in the target area, the at least one piece of character information with the recommended word when a selection operation on the recommended word is received.
5. The method of claim 4, further comprising:
determining whether the motion data collected by the first sensor indicates the position of the recommended word;
determining, when the motion data indicates the position of the recommended word, whether the eyeball data collected by the second sensor indicates that the user blinks;
and determining, when the eyeball data collected by the second sensor indicates that the user blinks, that a selection operation on the recommended word is received, wherein the recommended word indicated by the selection operation is the recommended word displayed at the position indicated by the motion data at the moment of blinking.
6. The method according to claim 4, wherein the obtaining of the recommended word corresponding to the at least one character information based on the at least one character information in the target area comprises:
determining the keyboard position based on the motion data;
inputting the keyboard position into a preset spatial model to obtain the character information corresponding to the keyboard position, wherein the spatial model is obtained by estimating model parameters of a bivariate Gaussian distribution model based on sample data, and the sample data comprises sample input characters and input position information corresponding to the sample input characters;
inputting the at least one character information into a preset language model to obtain the recommended word; the language model is used for determining the prior probability of each word in a preset corpus by using the at least one character information, and determining the recommended word based on the prior probability.
7. The method of claim 1, further comprising:
controlling the second sensor to suspend collecting the eyeball data when the position indicated by the motion data belongs neither to the area where the virtual keyboard is located nor to the target area; and collecting the eyeball data through the second sensor again when the position indicated by the motion data belongs to the area where the virtual keyboard is located.
8. A text input apparatus, characterized in that the apparatus is used for an interactive device, and the interactive device comprises a display component, a sensing component and a support component; the display component and the sensing component are mounted on the support component; the support component is used for enabling a user to wear the interactive device; the sensing component comprises a first sensor and a second sensor, the first sensor is used for collecting motion data of the user, and the second sensor is used for tracking the user's eyeballs to obtain eyeball data; the apparatus comprises:
the image display module is used for displaying the current scene image through the display component;
the keyboard operation module is used for responding to the instruction of acquiring the text input in the current scene image and operating a virtual keyboard;
the character display module is used for determining, based on the motion data collected by the first sensor, a keyboard position corresponding to the motion data on the virtual keyboard, and displaying character information corresponding to the keyboard position, wherein different motion data correspond to different keyboard positions;
and the character input module is used for displaying the character information displayed during blinking in the target area under the condition that the eyeball data acquired by the second sensor indicates that the user blinks.
9. An interactive apparatus, characterized in that the apparatus comprises a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the text input method of any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to carry out a text input method according to any one of claims 1 to 8.
CN202110897043.4A 2021-08-05 2021-08-05 Text input method, device, equipment and storage medium Pending CN113589958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110897043.4A CN113589958A (en) 2021-08-05 2021-08-05 Text input method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113589958A true CN113589958A (en) 2021-11-02

Family

ID=78255467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110897043.4A Pending CN113589958A (en) 2021-08-05 2021-08-05 Text input method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113589958A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9244612B1 (en) * 2012-02-16 2016-01-26 Google Inc. Key selection of a graphical keyboard based on user input posture
CN106293128A (en) * 2016-08-12 2017-01-04 清华大学 Blind character input method, blind input device and calculating device
CN107850939A (en) * 2015-03-10 2018-03-27 艾弗里协助通信有限公司 For feeding back the system and method for realizing communication by eyes
US20180136465A1 (en) * 2015-04-28 2018-05-17 Lg Electronics Inc. Mobile terminal and controlling method thereof
CN110140166A (en) * 2016-11-03 2019-08-16 埃利亚斯·库利 For providing the system for exempting to manually enter to computer
CN110832441A (en) * 2017-05-19 2020-02-21 奇跃公司 Keyboard for virtual, augmented and mixed reality display systems

Similar Documents

Publication Publication Date Title
JP7361156B2 (en) Managing real-time handwriting recognition
US11016658B2 (en) Managing real-time handwriting recognition
US9934430B2 (en) Multi-script handwriting recognition using a universal recognizer
CN107491186B (en) Handwriting keyboard for screen
KR101375166B1 (en) System and control method for character make-up
CN108700996B (en) System and method for multiple input management
CN106687889A (en) Display-efficient text entry and editing
US20140363082A1 (en) Integrating stroke-distribution information into spatial feature extraction for automatic handwriting recognition
EP2191397A1 (en) Enhanced rejection of out-of-vocabulary words
EP4307096A1 (en) Key function execution method, apparatus and device, and storage medium
EP3387582A1 (en) System and method for beautifying digital ink
US20150177981A1 (en) Touch-Based Text Entry Using Hidden Markov Modeling
CN113589958A (en) Text input method, device, equipment and storage medium
CN114047872A (en) Text input method and system
US20220129069A1 (en) Information processing apparatus, information processing method, and program
US20230011763A1 (en) Gesture determining method and electronic device
KR20220015831A (en) Electronic device and control method thereof
US20160357411A1 (en) Modifying a user-interactive display with one or more rows of keys

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination