CN110673723A - Speech interaction method, system, medium, and apparatus based on biometric features - Google Patents

Speech interaction method, system, medium, and apparatus based on biometric features

Info

Publication number
CN110673723A
CN110673723A
Authority
CN
China
Prior art keywords
voice interaction
voice
information
features
biometric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910872899.9A
Other languages
Chinese (zh)
Inventor
周曦
张锦宇
李继伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yuncong Information Technology Co Ltd
Original Assignee
Guangzhou Yuncong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yuncong Information Technology Co Ltd filed Critical Guangzhou Yuncong Information Technology Co Ltd
Priority to CN201910872899.9A priority Critical patent/CN110673723A/en
Publication of CN110673723A publication Critical patent/CN110673723A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01Indexing scheme relating to G06F3/01
    • G06F2203/011Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns

Abstract

The invention provides a voice interaction method, system, medium, and device based on biometric features, comprising the following steps: collecting biometric features, controlling the voice interaction process according to the biometric features, and outputting voice response information. The invention controls the voice interaction process through biometric features, which is simple to operate; combining multiple kinds of biometric features enhances the extensibility of voice interaction.

Description

Speech interaction method, system, medium, and apparatus based on biometric features
Technical Field
The present invention relates to the field of intelligent control, and in particular, to a method, system, medium, and apparatus for voice interaction based on biometric features.
Background
Human-computer Interaction (HCI) is a technology for exchanging information between a person and a computer through input and output devices, and experience-oriented interaction design is the development direction of the new generation of HCI technology. HCI input systems rely on various sensors; from a user-experience standpoint, visual and auditory inputs are the most convenient. Human-computer interaction systems based on computer vision and audio signal processing therefore have broad application prospects. For example, enterprises and communities can provide information query and visitor control through HCI-based intelligent access control devices; merchants can sell more efficiently and conveniently through HCI-based intelligent vending machines; banks can offer safer and more reliable financial transactions through ATMs (automatic teller machines) integrating HCI functions; and drivers can achieve vehicle theft prevention, navigation assistance, and even intelligent driving through driving assistants with HCI functions.
Most current mainstream HCI products respond to only a single modality of input, either face or voice, and this has several shortcomings. First, security: verifying a user through a single modality yields limited information, making many security holes difficult to avoid. Second, limited functionality: image input alone can hardly convey user instructions, while voice alone can hardly provide simple and convenient user identification. Third, during interaction, a single input modality makes it difficult to control the interaction flow well, so wake words, staff, or on-screen text are usually needed for prompting.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a voice interaction method, system, medium, and device based on biometric features, mainly solving the problem that single-modality information input and response limit human-computer interaction functionality.
In order to achieve the above and other objects, the present invention adopts the following technical solutions.
A voice interaction method based on biometric features comprises the following steps:
acquiring biometric features, controlling the voice interaction process according to the biometric features, and outputting voice response information.
Optionally, the biometric features include face features, fingerprint features, audio features, and gesture features.
Optionally, verification is performed according to the biometric features, and voice interaction is controlled according to a verification result.
Optionally, after the biometric feature is collected and before the voice interaction information is acquired, voice guidance information is acquired.
Optionally, after the biometric feature is collected, if the biometric feature passes the verification, the voice guidance information of the identity feature service is triggered; and if the verification fails, triggering the voice guidance information of the visitor feature service.
Optionally, the playing priority of the voice guidance information is set to be higher than that of other voice information.
Optionally, the voice interaction interruption is controlled in accordance with the biometric feature.
Optionally, the biometric feature is continuously detected during the voice interaction, and the voice interaction is interrupted according to the detection result.
Optionally, after the voice interaction is interrupted, the voice interaction is awakened according to the biometric feature.
Optionally, when the biometric feature is not detected in the voice interaction process, a voice interaction interruption time delay is set, and the voice interaction is maintained within the interruption time delay.
Optionally, after the voice interaction information is acquired, the voice interaction information is sent to a server side for voice interaction information processing, and the voice interaction information is recorded.
Optionally, the voice interaction information is recognized, a termination feature for terminating the voice interaction is obtained, and termination of the voice interaction is controlled according to the termination feature.
Optionally, the voice interaction information is converted into text information by the server for real-time display.
Optionally, real-time registration is performed according to the acquired biometric features.
A voice interaction system based on biometric features, comprising:
the characteristic acquisition module is used for acquiring biological identification characteristics;
and the interactive information processing module is used for controlling a voice interactive process according to the biological recognition characteristics and outputting voice response information.
Optionally, an identification module is further included for identifying the biometric characteristic.
Optionally, the recognition module includes a face recognition unit, a fingerprint recognition unit, and a gesture recognition unit.
Optionally, the system further comprises a communication module, configured to establish a connection with the server.
Optionally, the system further comprises a display module for displaying the voice interaction information in real time.
Optionally, the system further comprises a guidance module, configured to acquire voice guidance information after the biometric features are acquired and before the voice interaction information is acquired.
Optionally, the system further comprises a real-time registration module, configured to perform real-time registration according to the collected biometric features.
An apparatus, comprising:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the method for voice interaction based on biometric features.
One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method for voice interaction based on biometric features.
As described above, the voice interaction method, system, medium, and apparatus based on biometric features of the present invention have the following advantageous effects.
The voice interaction process is controlled through user-identifying biometric features: while user identity information is acquired, the interaction flow is flexibly controlled, improving the targeting and controllability of voice interaction.
Drawings
FIG. 1 is a flowchart illustrating a method for voice interaction based on biometric features according to an embodiment of the present invention.
FIG. 2 is a block diagram of a voice interaction system based on biometric features according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal device in an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a terminal device in another embodiment of the present invention.
Description of the reference symbols
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing assembly
1201 second processor
1202 second memory
1203 communication assembly
1204 Power supply Assembly
1205 multimedia assembly
1206 voice assembly
1207 input/output interface
1208 sensor assembly
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the invention schematically. The drawings show only components related to the invention rather than the number, shape, and size of components in an actual implementation; in practice the type, quantity, and proportion of components may vary freely, and the component layout may be more complicated.
Referring to FIG. 1, the present invention provides a voice interaction method, system, medium, and apparatus based on biometric features. The method includes steps S01 and S02.
In step S01, biometric features are acquired:
in one embodiment, the user identification feature comprises at least one of a face feature, a fingerprint feature, an audio feature, and a gesture feature.
In an embodiment, the biometric features may be collected through an intelligent terminal device; the intelligent terminal may be a mobile terminal such as a smartphone or a tablet (PAD). Taking face features as an example, the smartphone's camera can collect face images and/or images containing gestures, its microphone can collect audio information, and its fingerprint module can collect the user's fingerprint.
In step S02, the voice interaction process is controlled according to the biometric features, and voice response information is output.
Taking face image processing as an example, the face image is recognized for identity verification. When the collected face image matches a face image in the face database, the user identity information corresponding to that face image is extracted from the database. When the collected face image does not match any face image in the face database, voice guidance information is acquired and sent to the corresponding audio processing device for playback. For instance, if identity verification finds no matching face, the guidance information prompts the user to face the camera so the image can be recaptured for verification, or guides a new user through registration. Taking real-time registration of a new user as an example: when the feature library is found to contain no features corresponding to the current user, guidance information is acquired and the user is asked to confirm whether registration is needed; if the user chooses to register, the guidance walks the user step by step through registration, e.g., placing a finger in a designated area to enroll a fingerprint, or adjusting pose so the face is toward the camera, collecting facial features, and entering basic information.
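The verify-or-guide flow just described can be sketched as follows. This is an illustrative toy, not the patented implementation; `FaceDB`, `handle_face`, and the guidance wording are all hypothetical stand-ins for real face matching and audio playback:

```python
class FaceDB:
    """Toy in-memory face database; stands in for real face matching (hypothetical)."""
    def __init__(self):
        self.users = {}  # face_id -> user identity information

    def find_match(self, face_id):
        return self.users.get(face_id)  # None when no registered face matches


def handle_face(face_id, db, play_guidance):
    """Return identity info on a match; otherwise play voice guidance and return None."""
    identity = db.find_match(face_id)
    if identity is not None:
        return identity  # registered user: extract stored identity information
    # No match: guide the user to retry capture, or to register in real time.
    play_guidance("Face not recognized: please face the camera to retry, "
                  "or register as a new user.")
    return None
```

In this sketch, the `play_guidance` callback would hand the prompt to the audio playback device described above.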
Users can be classified into professional users and ordinary visitors. Professional users must log in to obtain specific permissions; for example, an administrator can retrieve other users' voice interaction records after face verification. An ordinary visitor can, following the guidance information, select the visitor identity and query information without registering.
In an embodiment, an infrared sensor may also be disposed on the intelligent terminal to detect the relative distance between the user and the terminal; when the distance reaches a set threshold, a command is sent to wake the corresponding user-identification feature acquisition module from its low-power state.
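A minimal sketch of this proximity wake-up logic, assuming a distance threshold of 100 cm (the embodiment does not specify a value):

```python
WAKE_DISTANCE_CM = 100  # assumed threshold; the embodiment leaves the value unspecified

def should_wake(distance_cm, threshold_cm=WAKE_DISTANCE_CM):
    """Wake the feature-acquisition module when the user is within the threshold."""
    return distance_cm <= threshold_cm
```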
In an embodiment, voice guidance information is given a higher playback priority than other voice information. When voice guidance information needs to be played, the system checks whether the audio transmission channel is occupied. If it is not occupied, the voice guidance information is sent directly through the channel to the corresponding audio player and played, guiding the user to perform the corresponding operation. If the channel is detected to be occupied, the audio data currently in the channel is interrupted and the voice guidance data is transmitted preferentially. For example, if the collected face image does not match any image in the face database, the voice guidance information prompts the user that the information does not match and asks whether real-time registration is needed.
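The priority rule above can be sketched as a single-channel player in which guidance audio preempts whatever is playing; the priority values and class names are assumptions for illustration:

```python
GUIDANCE_PRIORITY = 0  # lower number = higher priority (assumed convention)
NORMAL_PRIORITY = 1

class AudioChannel:
    """Toy single audio transmission channel with preemptive guidance playback."""
    def __init__(self):
        self.current = None  # (priority, clip) now occupying the channel, or None

    def play(self, clip, priority):
        if self.current is None:
            self.current = (priority, clip)  # channel free: play directly
            return "started"
        if priority < self.current[0]:
            self.current = (priority, clip)  # interrupt lower-priority audio
            return "preempted"
        return "rejected"                    # keep the current clip playing
```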
In one embodiment, the voice guidance information can be stored in the server, the intelligent terminal establishes connection with the server through WiFi, Bluetooth and the like, and when voice guidance is needed, the voice guidance information is downloaded from the server.
In an embodiment, after the biometric feature passes the verification, the voice guidance information may be acquired, and the user is prompted to perform the corresponding operation in real time to start the voice interaction function. For example, clicking a certain icon on the screen of the mobile phone terminal, entering a voice interaction interface, and the like.
In an embodiment, before performing the voice interaction, it may be detected in advance whether the voice guidance information or other audio information is being transmitted in the audio transmission channel, for example, the voice guidance information is detected, and before the user performs the operation corresponding to the voice guidance information, the audio information in the audio channel is cleared, so as to avoid the audio information from interfering with the voice interaction of the user.
Voice interaction is started according to the verified user's identity permissions, and the user-identification features are continuously detected during the interaction. Taking face feature detection as an example, if the user's face features are detected in the designated area, the voice interaction is maintained; if they are not detected, the voice interaction is interrupted as required.
In an embodiment, an interruption delay may be set. When no face image is captured in the designated area, the interruption delay starts; within the delay, the normal voice interaction function is maintained, i.e., the user can still input voice information and receive the corresponding voice feedback from the intelligent terminal device. When the time without a detected face exceeds the interruption delay, the voice interaction function is interrupted.
In an embodiment, a time threshold for voice interaction interruption may also be set; within this threshold, the voice interaction function can be re-awakened by reacquiring the user-identification features. For example, with the interruption delay set to 1 minute and the interruption threshold to 5 minutes: if the user temporarily leaves the terminal's face-capture area for between 1 and 5 minutes, reacquiring the user's face through the camera re-awakens the previous voice interaction without renewed verification or guidance, and the unfinished interaction flow continues where it left off. The user can also wake the interrupted flow in other ways, such as fingerprint scanning. If the user is away for more than 5 minutes, the intelligent terminal automatically terminates the current voice interaction flow.
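Using the 1-minute delay and 5-minute threshold from the example, the interruption logic reduces to a small state function (a sketch; the names are illustrative):

```python
INTERRUPT_DELAY_S = 60    # 1-minute interruption delay, as in the example
TERMINATE_AFTER_S = 300   # 5-minute interruption threshold

def session_state(seconds_since_face_lost):
    """Map time since the face was last detected to the interaction state."""
    if seconds_since_face_lost <= INTERRUPT_DELAY_S:
        return "active"       # normal interaction is maintained
    if seconds_since_face_lost <= TERMINATE_AFTER_S:
        return "interrupted"  # re-detecting the user re-awakens the session
    return "terminated"       # session ends; full verification is needed again
```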
In an embodiment, during voice interaction, the voice information input by the user is collected and sent to the server, which recognizes it, converts it into text, and sends the text back to the intelligent terminal for real-time display. At the same time, the text obtained from the voice information is written to memory as a record of the user's voice interaction content, which facilitates analyzing user requirements from the recorded text; when the information the terminal feeds back to the user is problematic, it can be adjusted effectively based on the recorded information.
Before recognizing the voice information input by the user, the server extracts features from it through an audio processor and can obtain termination features by which the user ends the voice interaction, such as the audio patterns for "goodbye" or "turn off"; if the server matches the corresponding feature, it sends a command instructing the intelligent terminal to end the current voice interaction.
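A simplified text-level sketch of termination matching (the embodiment matches audio features on the server; matching recognized text instead is an assumption for illustration):

```python
TERMINATION_PHRASES = {"goodbye", "turn off"}  # example phrases from the text

def is_termination(transcript):
    """Return True if the recognized speech contains a termination phrase."""
    text = transcript.lower()
    return any(phrase in text for phrase in TERMINATION_PHRASES)
```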
In an embodiment, the intelligent terminal can also send the user's specific gesture features to the server to control termination of the voice interaction. For example, during the interaction the terminal collects the user's gesture features through the camera; when the user makes a crossed-hands gesture, the server compares it against the gesture feature library and obtains the control command corresponding to that gesture, here ending the current voice interaction, whereupon the server sends the control command to the terminal and the current voice interaction is terminated.
In an embodiment, the server obtains the user's requirement from the voice information input by the user, retrieves the corresponding text data from the database, converts it into voice data, and sends it to the intelligent terminal, which plays the response to the user through its audio playback device. For example, if the user asks by voice to query this month's bill, the server obtains the corresponding data from the database, converts it to speech, and announces the bill by voice.
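The query-and-respond round trip might look like the following sketch; `BILL_DB` and the response wording are hypothetical, and `synthesize` stands in for the text-to-speech step:

```python
BILL_DB = {"alice": "42.50 yuan"}  # hypothetical backing data on the server side

def answer_query(user, query, synthesize):
    """Look up the requested data and hand a spoken response to the TTS layer."""
    if "bill" in query.lower():
        amount = BILL_DB.get(user, "no record found")
        return synthesize("Your bill this month is %s." % amount)
    return synthesize("Sorry, the request was not understood.")
```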
Referring to FIG. 2, this embodiment further provides a voice interaction system based on biometric features, used to execute the voice interaction method described in the foregoing method embodiments. Since the technical principle of the system embodiment is similar to that of the method embodiment, repeated description of the same technical details is omitted.
In an embodiment, the voice interaction system based on the biometric feature comprises a feature acquisition module 10 and an interaction information processing module 11, wherein the feature acquisition module 10 is configured to assist in executing the step S01 described in the foregoing method embodiment, and the interaction information processing module 11 is configured to execute the step S02 described in the foregoing method embodiment.
In one embodiment, the system further comprises an identification module, the intelligent terminal inputs the collected user identification features into the identification module to identify corresponding features, and the identification module comprises a face identification unit, a fingerprint identification unit and a gesture identification unit.
In one embodiment, the system further comprises a communication module, a display module, a guidance module, and a real-time registration module.
Taking face feature processing as an example, the feature acquisition module 10 sends the collected user-identification features through the communication module to the recognition module on the server side, which recognizes the face image for identity verification. When the collected face image matches a face image in the face database, the corresponding user identity information is extracted from the database. When it does not match, the guidance module acquires voice guidance information and sends it to the corresponding audio processing device for playback. For instance, if verification finds no matching face, the guidance information prompts the user to face the camera so the image can be recaptured, or guides a new user through registration. Taking real-time registration of a new user as an example: when the feature library contains no features corresponding to the current user, the guidance module obtains guidance information and confirms with the user whether registration is needed; if the user chooses to register, the real-time registration module guides the user through registration step by step, e.g., placing a finger in a designated area to enroll a fingerprint, or adjusting pose so the face is toward the camera, collecting facial features, and entering basic information.
During voice interaction, the intelligent terminal device collects the voice information input by the user through the audio collector and sends it to the server, which recognizes it, converts it into text, and sends the text back to the terminal for real-time display. At the same time, the text obtained from the voice information is written to memory as a record of the user's voice interaction content, facilitating analysis of user requirements from the recorded text; when the information fed back to the user is problematic, it can be adjusted effectively based on the recorded information.
An embodiment of the present application further provides an apparatus, which may include: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of fig. 1. In practical applications, the device may be used as a terminal device, and may also be used as a server, where examples of the terminal device may include: the mobile terminal includes a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, an intelligent television, a wearable device, and the like.
The present application further provides a non-transitory readable storage medium, where one or more modules (programs) are stored in the storage medium, and when the one or more modules are applied to a device, the device may be caused to execute instructions (instructions) of steps included in the speech interaction method based on biometric features in fig. 1 according to the present application.
Fig. 3 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes a function for executing each module of the speech recognition apparatus in each device, and specific functions and technical effects may refer to the above embodiments, which are not described herein again.
Fig. 4 is a schematic hardware structure diagram of a terminal device according to another embodiment of the present application. Fig. 4 is a specific embodiment of fig. 3 in an implementation process. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, the second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: a communication component 1203, a power component 1204, a multimedia component 1205, a voice component 1206, an input/output interface 1207, and/or a sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the method illustrated in fig. 1 described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power component 1204 provides power to the various components of the terminal device. The power component 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received voice signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the voice component 1206 further comprises a speaker for outputting voice signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect the open/closed state of the terminal device, the relative positioning of components, and the presence or absence of user contact with the terminal device. The sensor component 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor component 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate wired or wireless communication between the terminal device and other devices. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot for inserting a SIM card, so that the terminal device can log onto a GPRS network and establish communication with the server via the Internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 referred to in the embodiment of fig. 4 can be implemented as the input device in the embodiment of fig. 3.
In summary, the voice interaction method, system, medium, and device based on biometric features of the present invention recognize user features and control the voice interaction process according to the recognition result, which simplifies starting a voice interaction when a user logs in and improves the user experience. Through the interaction of multiple kinds of features, information such as the user's requirements and identity can be acquired simultaneously, enhancing the diversity of information control. A voice interaction delay mechanism is also provided: the voice interaction continues when the user's identifying features are briefly undetected, and after the interaction has been interrupted, it can be re-awakened by the user's identifying features within a set time, so that the user directly re-enters the interactive interface and continues the unfinished interaction, which is convenient to operate. The invention therefore effectively overcomes various shortcomings of the prior art and has high industrial value.
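The interruption-delay and wake-up behaviour summarised above can be expressed as a small session state machine. The following sketch is illustrative only: the class name, method names, and the two timing thresholds are assumptions, since the disclosure does not prescribe a concrete implementation.

```python
class VoiceInteractionSession:
    """Sketch of the biometric-gated voice interaction flow described above.

    Two timers are assumed: an interruption delay (the interaction survives a
    brief loss of the biometric feature) and a wake window (an interrupted
    interaction can be resumed by the same feature within a set time).
    """

    def __init__(self, interrupt_delay_s=5.0, wake_window_s=30.0):
        self.interrupt_delay_s = interrupt_delay_s  # grace period while the user is not detected
        self.wake_window_s = wake_window_s          # window in which an interrupted session may resume
        self.active = False
        self.last_seen = None
        self.interrupted_at = None

    def on_feature_detected(self, now):
        """A biometric feature (e.g. a face) is detected at time `now`."""
        self.last_seen = now
        if self.active:
            return "continue"
        if self.interrupted_at is not None and now - self.interrupted_at <= self.wake_window_s:
            self.active = True            # wake the interrupted session and resume where it left off
            self.interrupted_at = None
            return "resumed"
        self.active = True                # fresh session: play guidance, start interaction
        return "started"

    def on_feature_missing(self, now):
        """No biometric feature is detected at time `now`."""
        if not self.active:
            return "idle"
        if self.last_seen is not None and now - self.last_seen <= self.interrupt_delay_s:
            return "continue"             # within the interruption delay: keep the interaction alive
        self.active = False               # delay elapsed: interrupt the interaction
        self.interrupted_at = now
        return "interrupted"
```

Calling `on_feature_missing` shortly after the user disappears keeps the interaction alive; only after the delay elapses is the session interrupted, and a detection within the wake window resumes it rather than starting over.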
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (23)

1. A voice interaction method based on biometric features, characterized by comprising the following steps:
acquiring biometric features, controlling a voice interaction process according to the biometric features, and outputting voice response information.
2. The method of claim 1, wherein the biometric features comprise face features, fingerprint features, audio features, and gesture features.
3. The method of claim 1, wherein verification is performed according to the biometric features, and the voice interaction is controlled according to the verification result.
4. The voice interaction method based on biometric features according to claim 3, wherein after the biometric features are collected, voice guidance information is obtained before the voice interaction information is obtained.
5. The voice interaction method based on biometric features according to claim 4, wherein after the biometric features are collected, if the biometric features pass verification, voice guidance information for the identity-feature service is triggered; and if the verification fails, voice guidance information for the visitor-feature service is triggered.
6. The method of claim 4, wherein the voice guidance information is set to have a higher playback priority than other voice information.
7. The method of claim 1, wherein interruption of the voice interaction is controlled according to the biometric features.
8. The method of claim 7, wherein the biometric feature is continuously detected during the voice interaction, and the voice interaction is interrupted according to the detection result.
9. The method of claim 7, wherein after the voice interaction is interrupted, the voice interaction is awakened according to the biometric feature.
10. The method of claim 7, wherein when the biometric features are not detected during the voice interaction, a voice interaction interruption delay is set, and the voice interaction is maintained during the interruption delay.
11. The voice interaction method based on the biometric feature of claim 1, wherein after the voice interaction information is obtained, the voice interaction information is sent to a server side for voice interaction information processing, and the voice interaction information is recorded.
12. The method of claim 11, wherein the voice interaction information is recognized, a termination feature for terminating the voice interaction is obtained, and the termination of the voice interaction is controlled according to the termination feature.
13. The method of claim 11, wherein the voice interaction information is converted into text information by the server for real-time display.
14. The method of claim 1, wherein real-time registration is performed according to the collected biometric features.
15. A voice interaction system based on biometric features, comprising:
a feature acquisition module, configured to acquire biometric features; and
an interaction information processing module, configured to control a voice interaction process according to the biometric features and output voice response information.
16. The voice interaction system based on biometric features of claim 15, further comprising a recognition module for recognizing the biometric features.
17. The voice interaction system based on biometric features of claim 16, wherein the recognition module comprises a face recognition unit, a fingerprint recognition unit, and a gesture recognition unit.
18. The voice interaction system based on biometric features of claim 15, further comprising a communication module for establishing a connection with a server.
19. The voice interaction system based on biometric features of claim 15, further comprising a display module for displaying the voice interaction information in real time.
20. The voice interaction system according to claim 15, further comprising a guidance module for obtaining voice guidance information after the biometric feature is collected and before the voice interaction information is obtained.
21. The voice interaction system according to claim 15, further comprising a real-time registration module for performing real-time registration according to the collected biometric features.
22. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method recited by one or more of claims 1-14.
23. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method recited by one or more of claims 1-14.
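Read together, claims 1, 3-5, 11, and 13 describe a pipeline: collect a biometric feature, verify it, select the matching voice guidance, then run the interaction while sending audio to a server for processing and real-time text display. A minimal orchestration sketch follows; all six injected callables and their names are assumptions for illustration, as the claims prescribe no concrete interfaces.

```python
def run_voice_interaction(capture_feature, verify, play_guidance,
                          record_utterance, send_to_server, display_text):
    """Illustrative sketch of the claimed flow; interfaces are hypothetical."""
    feature = capture_feature()                    # claim 1: acquire biometric features
    verified = verify(feature)                     # claim 3: verify the features
    if verified:
        play_guidance("identity")                  # claim 5: identity-feature service guidance
    else:
        play_guidance("visitor")                   # claim 5: visitor-feature service guidance
    utterance = record_utterance()                 # obtain the voice interaction information
    reply, transcript = send_to_server(utterance)  # claim 11: server-side processing and recording
    display_text(transcript)                       # claim 13: real-time text display
    return reply                                   # claim 1: output voice response information
```

Each step maps to one claim, so dependent claims such as the interruption delay (claim 10) or termination feature (claim 12) would slot in around the recording step without changing the overall shape.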
CN201910872899.9A 2019-09-16 2019-09-16 Speech interaction method, system, medium, and apparatus based on biometric features Pending CN110673723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910872899.9A CN110673723A (en) 2019-09-16 2019-09-16 Speech interaction method, system, medium, and apparatus based on biometric features


Publications (1)

Publication Number Publication Date
CN110673723A true CN110673723A (en) 2020-01-10

Family

ID=69077987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910872899.9A Pending CN110673723A (en) 2019-09-16 2019-09-16 Speech interaction method, system, medium, and apparatus based on biometric features

Country Status (1)

Country Link
CN (1) CN110673723A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243254A (en) * 2020-01-15 2020-06-05 安徽工程大学 Wireless transmission circuit, device and equipment
CN111338484A (en) * 2020-03-27 2020-06-26 乌鲁木齐明华智能电子科技有限公司 Self-learning man-machine interaction method
CN112598840A (en) * 2020-12-16 2021-04-02 广州云从鼎望科技有限公司 Passing equipment control method and device based on face recognition and voice interaction, machine readable medium and equipment
CN113342170A (en) * 2021-06-11 2021-09-03 北京字节跳动网络技术有限公司 Gesture control method, device, terminal and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205281577U (en) * 2012-10-16 2016-06-01 北京银融科技有限责任公司 Transaction system
CN109346069A (en) * 2018-09-14 2019-02-15 北京赋睿智能科技有限公司 A kind of interactive system and device based on artificial intelligence
CN109670298A (en) * 2019-01-30 2019-04-23 咪付(广西)网络技术有限公司 A kind of register method and system for identification
CN109801438A (en) * 2019-01-18 2019-05-24 创新奇智(南京)科技有限公司 A kind of intelligent sales counter based on recognition of face and interactive voice
CN110022284A (en) * 2018-01-08 2019-07-16 王宾 Shared palm human-computer interaction device and its business model
CN110136716A (en) * 2019-05-21 2019-08-16 四川虹美智能科技有限公司 A kind of voice interaction processing method and interactive voice equipment



Similar Documents

Publication Publication Date Title
US11310223B2 (en) Identity authentication method and apparatus
CN110673723A (en) Speech interaction method, system, medium, and apparatus based on biometric features
CN105975182B (en) A kind of terminal operation method and terminal
CN111369418B (en) Health data management method, system, machine-readable medium and equipment
CN110647732B (en) Voice interaction method, system, medium and device based on biological recognition characteristics
US20190130411A1 (en) Method and system for data processing
US20090041308A1 (en) Object execution method and method with bio-characteristic recognition
CN105160217A (en) Application control method for intelligent watch and intelligent watch
WO2019218886A1 (en) Application pre-loading management method, device, storage medium and smart terminal
WO2020253495A1 (en) Screen lock control method, device, handheld terminal, and storage medium
CN111310725A (en) Object identification method, system, machine readable medium and device
CN111596760A (en) Operation control method and device, electronic equipment and readable storage medium
CN111695509A (en) Identity authentication method, identity authentication device, machine readable medium and equipment
CN111626229A (en) Object management method, device, machine readable medium and equipment
CN113077262A (en) Catering settlement method, device, system, machine readable medium and equipment
CN112989299A (en) Interactive identity recognition method, system, device and medium
CN111104873A (en) Face recognition method, system, equipment and medium with cooperation of multiple image processing devices
CN112446704A (en) Safe transaction management method and safe transaction management device
CN107784211A (en) Method of password authentication and device
CN109426712A (en) Unlocked by fingerprint method, apparatus and electronic equipment
CN112819983A (en) Meeting place sign-in method, device, machine readable medium and equipment
CN110766837A (en) Control method and device for passing equipment, machine readable medium and equipment
CN110766842B (en) Passing equipment control method and device, machine readable medium and equipment
CN111913770A (en) Display method, display device, electronic apparatus, and medium
CN112150685A (en) Vehicle management method, system, machine readable medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 511457 Guangdong city of Guangzhou province Nansha District Golden Road No. 26 room 1306 (only for office use)

Applicant after: Yuncong Technology Group Co.,Ltd.

Address before: 511457 Guangdong city of Guangzhou province Nansha District Golden Road No. 26 room 1306

Applicant before: GUANGZHOU YUNCONG INFORMATION TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200110