CN108132805B - Voice interaction method and device and computer readable storage medium - Google Patents

Voice interaction method and device and computer readable storage medium Download PDF

Info

Publication number
CN108132805B
Authority
CN
China
Prior art keywords
voice
user
behavior data
operation behavior
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711382608.5A
Other languages
Chinese (zh)
Other versions
CN108132805A
Inventor
马小莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen TCL New Technology Co Ltd
Original Assignee
Shenzhen TCL New Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen TCL New Technology Co Ltd filed Critical Shenzhen TCL New Technology Co Ltd
Priority to CN201711382608.5A
Publication of CN108132805A
Priority to PCT/CN2018/093750
Application granted
Publication of CN108132805B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/4401Bootstrapping
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice interaction method. The voice interaction method comprises the following steps: when a voice interaction function awakening instruction triggered by a user is received, starting the voice interaction function according to the awakening instruction and playing a preset voice response file as a response; monitoring whether a voice instruction from the user is received within a first preset time; and if no voice instruction from the user is received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guidance file according to the personal operation behavior data, and playing the voice guidance file to guide the user. The invention also discloses a voice interaction device and a computer-readable storage medium. By generating the voice guidance file from the user's own operation behavior data, the invention can provide more humanized voice help for the user, thereby improving the user experience during voice interaction.

Description

Voice interaction method and device and computer readable storage medium
Technical Field
The present invention relates to the field of communications, and in particular, to a voice interaction method and apparatus, and a computer-readable storage medium.
Background
With the progress of voice technology and the growing maturity of Internet big data, intelligent voice has become the first path for artificial intelligence to enter daily life, and major vendors have released intelligent voice products such as intelligent voice televisions, intelligent voice speakers, intelligent voice navigators and voice air detectors. By simulating the way people speak, these products replace traditional operation modes with voice and bring great convenience to consumers.
However, when a user communicates with an intelligent voice product, the product cannot feed back as naturally, promptly and helpfully as communication between people. For example, after the user wakes up the product, it does not communicate actively the way a person would: it responds only when the user issues an instruction, and that feedback is passive, untimely and offers no guidance. When the product receives no instruction from the user at all, it either ends the session directly or simply plays back its function help menu, which is of little practical use and does not really help the user. Existing intelligent voice products therefore fail to provide humanized voice help during voice interaction, and the user experience is poor.
Disclosure of Invention
The invention mainly aims to provide a voice interaction method, a voice interaction device and a computer-readable storage medium, with the aim of providing more humanized voice help and improving the user experience during voice interaction.
In order to achieve the above object, the present invention provides a voice interaction method, which includes the following steps:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for responding;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user.
Optionally, the voice interaction method further includes:
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within the second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
Optionally, the step of obtaining operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user includes:
acquiring operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
and generating and playing a corresponding voice help file according to the statistical result so as to help the user.
Optionally, the step of acquiring personal operation behavior data of the user, generating a corresponding voice guidance file according to the personal operation behavior data, and playing the voice guidance file to guide the user includes:
acquiring personal operation behavior data of a user, and performing statistical analysis on the personal operation behavior data;
and generating and playing a corresponding voice guide file according to the statistical result so as to guide the user.
Optionally, after the step of monitoring whether a voice instruction of the user is received within the first preset time, the method includes:
and if the voice command of the user is received within the first preset time, recognizing the voice command, and executing corresponding operation according to a recognition result.
In addition, to achieve the above object, the present invention further provides a voice interaction apparatus, including: a memory, a processor, and a voice interaction program stored on the memory and executable on the processor, the voice interaction program when executed by the processor implementing the steps of:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for responding;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user.
Optionally, the voice interaction program further implements the following steps when executed by the processor:
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within the second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
Optionally, the voice interaction program further implements the following steps when executed by the processor:
acquiring operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
and generating and playing a corresponding voice help file according to the statistical result so as to help the user.
Optionally, the voice interaction program further implements the following steps when executed by the processor:
acquiring personal operation behavior data of a user, and performing statistical analysis on the personal operation behavior data;
and generating and playing a corresponding voice guide file according to the statistical result so as to guide the user.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium having a voice interaction program stored thereon, where the voice interaction program, when executed by a processor, implements the following steps:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for responding;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user.
The invention provides a voice interaction method, a voice interaction device and a computer-readable storage medium. When a voice interaction function awakening instruction triggered by a user is received, the voice interaction function is started according to the awakening instruction and a preset voice response file is played as a response; whether a voice instruction from the user is received within a first preset time is then monitored; if no voice instruction is received within the first preset time, personal operation behavior data of the user is acquired, a corresponding voice guidance file is generated according to the personal operation behavior data, and the voice guidance file is played to guide the user. In this way, the terminal responds actively and promptly upon wake-up, simulating the way people interact with each other and improving the user experience. It then monitors whether a voice instruction is received within the first preset time; if not, that is, when the user is hesitating about what to do, the terminal acquires the user's personal operation behavior data, generates a corresponding voice guidance file from it and plays the file to guide the user.
Drawings
Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a voice interaction method according to a first embodiment of the present invention;
fig. 3 is a schematic view of a detailed flow of acquiring personal operation behavior data of a user, generating a corresponding voice guidance file according to the personal operation behavior data, and playing the voice guidance file to guide the user according to the embodiment of the present invention;
FIG. 4 is a flowchart illustrating a voice interaction method according to a second embodiment of the present invention;
fig. 5 is a schematic detailed flow chart of acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user in the embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the prior art, when a user communicates with an intelligent voice product, the product cannot feed back as naturally, promptly and helpfully as communication between people. For example, after the user wakes up the product, it does not communicate actively the way a person would: it responds only when the user issues an instruction, and that feedback is passive, untimely and offers no guidance. When the product receives no instruction from the user at all, it either ends the session directly or simply plays back its function help menu, which is of little practical use and does not really help the user. Existing intelligent voice products therefore fail to provide humanized voice help during voice interaction, and the user experience is poor.
In order to solve these technical problems, the invention provides a voice interaction method, a voice interaction device and a computer-readable storage medium. When a voice interaction function awakening instruction triggered by a user is received, the voice interaction function is started according to the awakening instruction and a preset voice response file is played as a response; whether a voice instruction from the user is received within a first preset time is then monitored; if no voice instruction is received within the first preset time, personal operation behavior data of the user is acquired, a corresponding voice guidance file is generated according to the personal operation behavior data, and the voice guidance file is played to guide the user. In this way, the terminal responds actively and promptly upon wake-up, simulating the way people interact with each other and improving the user experience; and when the user hesitates about what to do, the terminal generates and plays a voice guidance file based on the user's own operation behavior data to guide the user.
Referring to fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention may be an intelligent voice television, or another intelligent voice product such as an intelligent voice speaker, an intelligent robot, a smartphone, an intelligent voice alarm clock, an intelligent voice navigator or a voice air detector.
As shown in fig. 1, the terminal may include: a processor 1001 such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The sensors may include, for example, light sensors and motion sensors. Specifically, the light sensors may include an ambient light sensor that adjusts the brightness of the display screen according to ambient light, and a proximity sensor that turns off the display screen and/or backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when the terminal is stationary, and can be used for applications that recognize the terminal's attitude (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tap detection). The terminal may of course also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a voice interaction program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client and performing data communication with the client; and the processor 1001 may be configured to invoke the voice interaction program stored in the memory 1005 and perform the following operations:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for responding;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user.
Further, the processor 1001 may call the voice interaction program stored in the memory 1005, and further perform the following operations:
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within the second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
Further, the processor 1001 may call the voice interaction program stored in the memory 1005, and further perform the following operations:
acquiring operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
and generating and playing a corresponding voice help file according to the statistical result so as to help the user.
Further, the processor 1001 may call the voice interaction program stored in the memory 1005, and further perform the following operations:
acquiring personal operation behavior data of a user, and performing statistical analysis on the personal operation behavior data;
and generating and playing a corresponding voice guide file according to the statistical result so as to guide the user.
Further, the processor 1001 may call the voice interaction program stored in the memory 1005, and further perform the following operations:
and if the voice command of the user is received within the first preset time, recognizing the voice command, and executing corresponding operation according to a recognition result.
Based on the hardware structure, the invention provides various embodiments of the voice interaction method.
The invention provides a voice interaction method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a voice interaction method according to a first embodiment of the present invention.
In the embodiment of the invention, the voice interaction method comprises the following steps:
step S10, when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for response;
in the embodiment of the invention, the voice interaction method can be used for simulating the human-human interaction mode to actively respond in time in the voice interaction process of the intelligent voice product, and can perform statistical analysis according to the personal operation behaviors of the user, so that more humanized voice help is provided, the user is actively guided and helped, the voice interaction frequency is improved, and the user experience in the voice interaction process is improved. The terminal of the embodiment of the invention can be an intelligent voice television, and also can be intelligent voice products such as an intelligent voice sound box, an intelligent robot, an intelligent mobile phone, an intelligent voice alarm clock, an intelligent voice navigator, a voice air detector and the like. For convenience of description, the embodiment of the present invention is described by taking an intelligent voice telephone as an example.
In the embodiment of the invention, when the intelligent voice television receives a voice interaction function wake-up instruction triggered by the user, it starts the voice interaction function according to the wake-up instruction and plays a preset voice response file as a response. The ways in which the user can trigger the wake-up instruction include, but are not limited to, the following two: 1) the user presses the power button of the intelligent voice television, and the wake-up instruction is triggered when the television starts up; 2) the user selects an option for starting the voice interaction function in the display interface of the intelligent voice television.
It should be noted that the preset voice response file may be preset by the system or set by the user. When it is preset by the system, one or more voice response files may be stored in a voice response packet. For example, the packet may contain only a single response such as "The voice interaction function is started, please instruct", which is played actively when the voice interaction function is started; alternatively, the packet may contain responses such as "Master, happy weekend, what can I do for you?", "Good evening, master, what can I do for you?" and "Good morning, master, at your service", and the intelligent voice television selects which one to play according to the current time. By responding actively and in time when the voice interaction function is started, the invention simulates human-to-human interaction and improves the user experience.
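As a minimal illustration of this time-aware selection (the file names, time windows and the `play_audio` helper are assumptions for the sketch, not part of the patent), the logic might look like the following:

```python
import datetime
import random

# Hypothetical voice response packet: each (start_hour, end_hour) window maps to
# pre-recorded response files appropriate for that time of day.
RESPONSE_PACK = {
    (5, 11):  ["good_morning_master.wav"],
    (11, 18): ["master_what_can_i_do_for_you.wav"],
    (18, 24): ["good_evening_master.wav"],
}
DEFAULT_RESPONSES = ["voice_interaction_started_please_instruct.wav"]


def pick_response_file(now=None):
    """Select a preset voice response file according to the current time.

    If several files fall in the matching window, one is chosen at random,
    mirroring the "select by current time or call one randomly" behaviour
    described in the claims.
    """
    now = now or datetime.datetime.now()
    for (start, end), files in RESPONSE_PACK.items():
        if start <= now.hour < end:
            return random.choice(files)
    return random.choice(DEFAULT_RESPONSES)


def play_audio(path):
    """Placeholder for the television's audio playback pipeline."""
    print(f"[playing] {path}")


if __name__ == "__main__":
    play_audio(pick_response_file())
```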
Step S20, monitoring whether a voice command of a user is received within a first preset time;
if the voice command of the user is not received within the first preset time, step S30 is executed: and acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user.
After the voice interaction function is started, the intelligent voice television monitors whether a voice instruction from the user is received within a first preset time. If no voice instruction is received within the first preset time, it acquires the user's personal operation behavior data, generates a corresponding voice guidance file according to that data, and plays the file to guide the user. The first preset time is determined experimentally on the basis of psychology and may be set to 0.7 s to 1 s. The personal operation behavior data may include historical browsing records and browsing times. The data is then statistically analysed, and more humanized voice guidance is provided on the basis of the statistical result, actively guiding the user; this increases the frequency of voice interaction and improves the user experience during voice interaction.
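The timeout-driven branching around the first preset time could be sketched as follows; the window of 0.7 s to 1 s comes from the description, while `listen`, `execute` and `guide_user` are hypothetical callbacks standing in for the real recognizer and playback pipeline:

```python
import time

FIRST_PRESET_TIME_S = 1.0  # the description suggests 0.7 s to 1 s


def wait_for_voice_command(listen, timeout_s):
    """Poll the recognizer until a command arrives or the window closes.

    `listen` is assumed to return recognized text, or None if nothing was heard.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        command = listen()
        if command:
            return command
        time.sleep(0.05)
    return None


def on_wakeup(listen, execute, guide_user):
    """Branch between steps S20/S30: execute a command or fall back to guidance."""
    command = wait_for_voice_command(listen, FIRST_PRESET_TIME_S)
    if command is not None:
        execute(command)   # voice instruction received: recognize and execute it
    else:
        guide_user()       # no instruction: generate and play the voice guidance file


if __name__ == "__main__":
    on_wakeup(listen=lambda: None,
              execute=lambda cmd: print("executing", cmd),
              guide_user=lambda: print("playing personalised guidance"))
```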
Specifically, referring to fig. 3, fig. 3 is a detailed flow diagram illustrating a process of acquiring personal operation behavior data of a user, generating a corresponding voice guidance file according to the personal operation behavior data, and playing the voice guidance file to guide the user according to the personal operation behavior data in the embodiment of the present invention. Step S30 includes:
step S31, acquiring personal operation behavior data of a user, and performing statistical analysis on the personal operation behavior data;
and step S32, generating a corresponding voice guide file according to the statistical result and playing the voice guide file to guide the user.
If no voice instruction from the user is received within the first preset time, the intelligent voice television terminal may first acquire the user's personal operation behavior data, which may include historical browsing records and browsing times. The historical browsing records may include the browsing type, such as television programmes, TV series, movies or variety shows, and may further include programme categories, for example news, finance or sports for television programmes, and Korean dramas, romance dramas, costume dramas or suspense dramas for TV series, as well as the corresponding hosts or leading actors; the browsing time may distinguish working days from non-working days and morning, midday and evening. The terminal then performs statistical analysis on the personal operation behavior data and generates and plays a corresponding voice guidance file according to the statistical result, so as to guide the user. For example, if the statistical analysis shows that the user has recently been watching TV series A and B, a voice guidance file such as "Master, would you like to continue watching A or B today?" can be generated from the statistical result; if the analysis shows that the user has recently been watching movies starring actor C, a voice guidance file such as "Master, C has starred in many good movies, such as D, E and F. Would you like to watch one?" can be generated. Furthermore, in a specific embodiment, the statistical result can be combined with the current time: for example, if the analysis shows that the user watches the news simulcast at 7 p.m. every day, then at or near 7 p.m. a voice guidance file such as "The news simulcast is being / is about to be broadcast. Would you like to watch it?" can be generated from the statistical result and the current time. Because the guidance voice is built from the user's own operation behavior data, it provides more intimate and humanized help and service, and because the guidance is generated actively, the user obtains help easily and naturally, the frequency of voice interaction increases, and the user experience improves.
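A minimal sketch of how such a statistical result could be turned into guidance text, assuming a simple (title, category, hour) log as the personal operation behavior data; the record layout, the Counter-based statistics and the TTS placeholder are illustrative assumptions, not the patent's implementation:

```python
from collections import Counter
from datetime import datetime

# Hypothetical personal operation behavior data: (title, category, hour_watched).
HISTORY = [
    ("Series A", "tv_series", 20),
    ("Series A", "tv_series", 21),
    ("Series B", "tv_series", 20),
    ("News Simulcast", "news", 19),
    ("News Simulcast", "news", 19),
]


def build_guide_text(history, now=None):
    """Statistically analyse the personal history and phrase a guidance prompt."""
    now = now or datetime.now()
    # Current-time rule, e.g. the 7 p.m. news-simulcast example in the description.
    at_this_hour = Counter(title for title, _, hour in history if hour == now.hour)
    if at_this_hour:
        title, _ = at_this_hour.most_common(1)[0]
        return f"{title} is about to be broadcast. Would you like to watch it?"
    # Otherwise fall back to the titles the user has watched most often recently.
    overall = Counter(title for title, _, _ in history)
    top = [title for title, _ in overall.most_common(2)]
    return f"Master, would you like to continue watching {' or '.join(top)} today?"


def synthesize_and_play(text):
    """Stands in for generating and playing the voice guidance file (TTS)."""
    print(f"[TTS] {text}")


synthesize_and_play(build_guide_text(HISTORY))
```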
Furthermore, in the embodiment of the present invention, after step S20, the voice interaction method may further include:
and if the voice command of the user is received within the first preset time, recognizing the voice command, and executing corresponding operation according to a recognition result.
When the intelligent voice television receives a voice instruction from the user within the first preset time, it recognizes the instruction and performs the corresponding operation according to the recognition result. Specific recognition techniques are known in the art and are not described here. For example, if the voice instruction "play TV series A" is received within the first preset time, the collected speech can be recognized and the television interface controlled to jump to the episode-selection interface of series A according to the recognition result; if the voice instruction "play movie D" is received within the first preset time, the television interface is controlled to start playing movie D directly after the instruction is recognized.
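A toy dispatcher for the "recognize the instruction and perform the corresponding operation" step; the regular-expression intents and the `jump_to_episode_selection` / `play_movie` actions are assumptions standing in for a real speech-recognition and control pipeline:

```python
import re


def jump_to_episode_selection(title):
    print(f"Jumping to the episode-selection interface of {title}")


def play_movie(title):
    print(f"Playing movie {title}")


# A tiny rule-based recognizer standing in for the real ASR/NLU pipeline.
INTENTS = [
    (re.compile(r"play (?:tv )?series (?P<title>.+)", re.IGNORECASE), jump_to_episode_selection),
    (re.compile(r"play movie (?P<title>.+)", re.IGNORECASE), play_movie),
]


def execute(command_text):
    """Recognize the voice instruction and perform the corresponding operation."""
    for pattern, action in INTENTS:
        match = pattern.match(command_text)
        if match:
            action(match.group("title"))
            return True
    return False


execute("play TV series A")   # -> episode-selection interface for series A
execute("play movie D")       # -> starts playing movie D
```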
The invention provides a voice interaction method in which, when a voice interaction function awakening instruction triggered by a user is received, the voice interaction function is started according to the awakening instruction and a preset voice response file is played as a response; whether a voice instruction from the user is received within a first preset time is monitored; and if no voice instruction is received within the first preset time, personal operation behavior data of the user is acquired, a corresponding voice guidance file is generated according to that data, and the file is played to guide the user. In this way, the terminal responds actively and promptly upon wake-up, simulating human-to-human interaction and improving the user experience; and when the user hesitates about what to do, the terminal generates and plays a voice guidance file based on the user's own operation behavior data to guide the user.
Referring to fig. 4, fig. 4 is a flowchart illustrating a voice interaction method according to a second embodiment of the present invention.
Based on the first embodiment shown in fig. 2, after step S30, the voice interaction method further includes:
step S40, monitoring whether a voice command of the user is received within a second preset time;
step S51, if the voice command of the user is not received within the second preset time, acquiring the operation behavior data of each online user, and generating and playing a corresponding voice help file according to the operation behavior data to help the user;
in the embodiment of the invention, after guiding the user, the terminal continues to monitor whether the voice instruction of the user is received within the second preset time, if the voice instruction of the user is not received within the second preset time, the guiding voice may not bring real help to the user or the user may not have purposiveness, at this time, the operation behavior data of each user on line is obtained, and a corresponding voice help file is generated according to the operation behavior data and played to help the user. The second preset time is obtained based on psychology through experiments and can be set to be 2s-3s, the operation behavior data can include video watching records, network searching records, web browsing records and the like within a certain time range (such as within a month), then, the operation behavior data of each online user is subjected to statistical analysis, and voice help is provided based on statistical results of the operation behavior data of each online user, so that reference opinions are provided for the user, and user experience is improved.
Specifically, please refer to fig. 5, where fig. 5 is a schematic view of a detailed flow chart for acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user according to the operation behavior data in the embodiment of the present invention. Step S51 may include:
step S511, obtaining operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
step S512, generating and playing the corresponding voice help file according to the statistical result to help the user.
If no voice instruction from the user is received within the second preset time, the user may not have a specific goal. At this point, the intelligent voice television terminal may first acquire the operation behavior data of each online user, which may include video viewing records, network search records, web browsing records and the like within a certain time range (for example, the last month); it then performs statistical analysis on that data and generates and plays a corresponding voice help file according to the statistical result, thereby offering the user a reference and helping the user choose. For example, if statistical analysis of the online users' operation behavior data shows that most users have recently been watching a movie X or searching for information about it, a voice help file such as "Movie X is very popular recently. Would you like to watch it?" can be generated from the statistical result; alternatively, a voice help file offering a choice, such as "Recent new releases include a, b and c. Is there one you are interested in?", can be generated.
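A sketch of the aggregate statistics behind such a help prompt, assuming per-user logs of recent viewing and search records; the data layout and prompt wording are illustrative assumptions, not taken from the patent:

```python
from collections import Counter

# Hypothetical operation behavior data of each online user within the last month:
# video viewing records and network search records per user.
ONLINE_USERS = {
    "user1": {"watched": ["Movie X"], "searched": ["Movie X"]},
    "user2": {"watched": ["Movie X", "Movie Y"], "searched": []},
    "user3": {"watched": ["Movie Y"], "searched": ["Movie X"]},
}


def build_help_text(users, top_n=3):
    """Aggregate what online users watch and search for, then phrase a help prompt."""
    counts = Counter()
    for data in users.values():
        counts.update(data["watched"])
        counts.update(data["searched"])
    top = [title for title, _ in counts.most_common(top_n)]
    if not top:
        return "Here are some programmes other viewers are enjoying at the moment."
    if len(top) == 1:
        return f"{top[0]} is very popular recently. Would you like to watch it?"
    return f"Recent favourites include {', '.join(top)}. Is there one you are interested in?"


print(build_help_text(ONLINE_USERS))
```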
And step S52, if the voice command of the user is received within the second preset time, recognizing the voice command, and executing corresponding operation according to the recognition result.
When the intelligent voice television receives a voice instruction from the user within the second preset time, it recognizes the instruction and performs the corresponding operation according to the recognition result. Specific recognition techniques are known in the art and are not described here.
The present invention further provides a voice interaction apparatus, which includes a memory, a processor, and a voice interaction program stored in the memory and executable on the processor, wherein the voice interaction program, when executed by the processor, implements the steps of the voice interaction method according to any one of the above embodiments.
The specific embodiment of the voice interaction apparatus of the present invention is substantially the same as the embodiments of the voice interaction method described above, and will not be described herein again.
The present invention also provides a computer readable storage medium having a voice interaction program stored thereon, which when executed by a processor implements the steps of the voice interaction method according to any one of the above embodiments.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the voice interaction method described above, and is not described herein again.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article or system. Without further limitation, an element introduced by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article or system that comprises that element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A voice interaction method is characterized by comprising the following steps:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for response, wherein if the preset voice response file is preset by a system, one or more voice response files are stored in a voice response packet, if a plurality of voice response files are stored in the voice response packet, corresponding voice response information is selected according to current time information for playing, or a voice response file is randomly called for playing;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user, wherein the personal operation behavior data of the user is acquired, and statistical analysis is performed on the personal operation behavior data, the personal operation behavior data comprises a historical browsing record and browsing time, and the historical browsing record comprises a browsing type;
generating a corresponding voice guide file according to the statistical result, or generating a voice guide file according to the statistical result and the current time, and playing the generated voice guide file to guide the user;
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within a second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user, wherein the operation behavior data comprises a video watching record, a network searching record and a web browsing record within a certain time range;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
2. The voice interaction method according to claim 1, wherein the step of acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data and playing the voice help file to help the user comprises:
acquiring operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
and generating and playing a corresponding voice help file according to the statistical result so as to help the user.
3. The voice interaction method of claim 1, wherein the step of monitoring whether the voice command of the user is received within the first preset time is followed by:
and if the voice command of the user is received within the first preset time, recognizing the voice command, and executing corresponding operation according to a recognition result.
4. A voice interaction apparatus, comprising: a memory, a processor, and a voice interaction program stored on the memory and executable on the processor, the voice interaction program when executed by the processor implementing the steps of:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for response, wherein if the preset voice response file is preset by a system, one or more voice response files are stored in a voice response packet, if a plurality of voice response files are stored in the voice response packet, corresponding voice response information is selected according to current time information for playing, or a voice response file is randomly called for playing;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user, wherein the personal operation behavior data of the user is acquired, and statistical analysis is performed on the personal operation behavior data, the personal operation behavior data comprises a historical browsing record and browsing time, and the historical browsing record comprises a browsing type;
generating a corresponding voice guide file according to the statistical result, or generating a voice guide file according to the statistical result and the current time, and playing the generated voice guide file to guide the user;
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within a second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user, wherein the operation behavior data comprises a video watching record, a network searching record and a web browsing record within a certain time range;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
5. The voice interaction apparatus of claim 4, wherein the voice interaction program, when executed by the processor, further performs the steps of:
acquiring operation behavior data of each online user, and performing statistical analysis on the operation behavior data;
and generating and playing a corresponding voice help file according to the statistical result so as to help the user.
6. A computer-readable storage medium having stored thereon a voice interaction program, which when executed by a processor, performs the steps of:
when a voice interaction function awakening instruction triggered by a user is received, starting a voice interaction function according to the voice interaction function awakening instruction, and playing a preset voice response file for response, wherein if the preset voice response file is preset by a system, one or more voice response files are stored in a voice response packet, if a plurality of voice response files are stored in the voice response packet, corresponding voice response information is selected according to current time information for playing, or a voice response file is randomly called for playing;
monitoring whether a voice instruction of a user is received within first preset time;
if the voice instruction of the user is not received within the first preset time, acquiring personal operation behavior data of the user, generating a corresponding voice guide file according to the personal operation behavior data, and playing the voice guide file to guide the user, wherein the personal operation behavior data of the user is acquired, and statistical analysis is performed on the personal operation behavior data, the personal operation behavior data comprises a historical browsing record and browsing time, and the historical browsing record comprises a browsing type;
generating a corresponding voice guide file according to the statistical result, or generating a voice guide file according to the statistical result and the current time, and playing the generated voice guide file to guide the user;
monitoring whether a voice instruction of a user is received within second preset time;
if the voice instruction of the user is not received within a second preset time, acquiring operation behavior data of each online user, generating a corresponding voice help file according to the operation behavior data, and playing the voice help file to help the user, wherein the operation behavior data comprises a video watching record, a network searching record and a web browsing record within a certain time range;
and if the voice command of the user is received within the second preset time, recognizing the voice command and executing corresponding operation according to the recognition result.
CN201711382608.5A 2017-12-20 2017-12-20 Voice interaction method and device and computer readable storage medium Active CN108132805B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711382608.5A CN108132805B (en) 2017-12-20 2017-12-20 Voice interaction method and device and computer readable storage medium
PCT/CN2018/093750 WO2019119771A1 (en) 2017-12-20 2018-06-29 Voice interaction method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711382608.5A CN108132805B (en) 2017-12-20 2017-12-20 Voice interaction method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108132805A CN108132805A (en) 2018-06-08
CN108132805B true CN108132805B (en) 2022-01-04

Family

ID=62390875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711382608.5A Active CN108132805B (en) 2017-12-20 2017-12-20 Voice interaction method and device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN108132805B (en)
WO (1) WO2019119771A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132805B (en) * 2017-12-20 2022-01-04 深圳Tcl新技术有限公司 Voice interaction method and device and computer readable storage medium
CN109119076B (en) * 2018-08-02 2022-09-30 重庆柚瓣家科技有限公司 System and method for collecting communication habits of old people and users
CN109240640B (en) * 2018-08-30 2021-04-20 百度在线网络技术(北京)有限公司 Advertisement voice interaction method, device and storage medium
CN109218843B (en) * 2018-09-27 2020-10-23 四川长虹电器股份有限公司 Personalized intelligent voice prompt method based on television equipment
CN109360570B (en) * 2018-10-19 2022-06-21 歌尔科技有限公司 Voice recognition method of voice device, voice device and readable storage medium
CN109545207A (en) * 2018-11-16 2019-03-29 广东小天才科技有限公司 A kind of voice awakening method and device
CN109243462A (en) * 2018-11-20 2019-01-18 广东小天才科技有限公司 A kind of voice awakening method and device
CN109410944B (en) * 2018-12-12 2020-06-09 百度在线网络技术(北京)有限公司 Voice interaction method, device and terminal
CN111385595B (en) * 2018-12-29 2022-05-31 阿里巴巴集团控股有限公司 Network live broadcast method, live broadcast replenishment processing method and device, live broadcast server and terminal equipment
CN109903760A (en) * 2019-01-02 2019-06-18 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium
CN109584878A (en) * 2019-01-14 2019-04-05 广东小天才科技有限公司 A kind of voice awakening method and system
CN110120222A (en) * 2019-05-23 2019-08-13 九牧厨卫股份有限公司 A kind of voice broadcast method of the smart mirror cabinet with voice broadcast function
CN110333840B (en) * 2019-06-28 2023-04-18 百度在线网络技术(北京)有限公司 Recommendation method and device, electronic equipment and storage medium
CN111026276A (en) * 2019-12-12 2020-04-17 Oppo(重庆)智能科技有限公司 Visual aid method and related product
CN111552794B (en) * 2020-05-13 2023-09-19 海信电子科技(武汉)有限公司 Prompt generation method, device, equipment and storage medium
CN111988637A (en) * 2020-08-21 2020-11-24 广州欢网科技有限责任公司 Program recommendation method and device based on user lost moment in live television
CN113301417A (en) * 2021-04-30 2021-08-24 当趣网络科技(杭州)有限公司 Voice control method for smart television
CN113301394B (en) * 2021-04-30 2023-07-11 当趣网络科技(杭州)有限公司 Voice control method combined with user grade

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599616A (en) * 2013-10-30 2015-05-06 西安景行数创信息科技有限公司 Intelligent voice interaction guide system
CN106653016A (en) * 2016-10-28 2017-05-10 上海智臻智能网络科技股份有限公司 Intelligent interaction method and intelligent interaction device
CN107305769A (en) * 2016-04-20 2017-10-31 斑马网络技术有限公司 Voice interaction processing method, device, equipment and operating system
CN107357416A (en) * 2016-12-30 2017-11-17 长春市睿鑫博冠科技发展有限公司 A kind of human-computer interaction device and exchange method
CN107452378A (en) * 2017-08-15 2017-12-08 北京百度网讯科技有限公司 Voice interactive method and device based on artificial intelligence

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060123220A1 (en) * 2004-12-02 2006-06-08 International Business Machines Corporation Speech recognition in BIOS
US7826945B2 (en) * 2005-07-01 2010-11-02 You Zhang Automobile speech-recognition interface
CN101415257A (en) * 2007-10-16 2009-04-22 康佳集团股份有限公司 Man-machine conversation chatting method
JP2012027785A (en) * 2010-07-26 2012-02-09 Panasonic Corp Portable terminal and display method
CN103376868A (en) * 2012-04-19 2013-10-30 鸿富锦精密工业(武汉)有限公司 Power adapter identification system and method
CN102880649B (en) * 2012-08-27 2016-03-02 北京搜狗信息服务有限公司 A kind of customized information disposal route and system
CN104063043B (en) * 2013-03-22 2017-07-25 联想(北京)有限公司 A kind of control method and device
CN103391281B (en) * 2013-06-25 2016-04-20 福建星网锐捷通讯股份有限公司 The three-in-one fusion method of agreement is realized based on integrated access equipment IAD
DE112014006614B4 (en) * 2014-04-22 2018-04-12 Mitsubishi Electric Corporation User interface system, user interface controller, user interface control method, and user interface control program
JP2016224599A (en) * 2015-05-28 2016-12-28 株式会社島津製作所 Guide file creation program
CN105117008B (en) * 2015-08-20 2018-07-20 小米科技有限责任公司 Guiding method of operating and device, electronic equipment
CN106970907A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method for recognizing semantics
CN105786575B (en) * 2016-03-17 2019-06-04 北京奇虎科技有限公司 Mobile terminal and its data exempt from the method removed when for rewriting system
CN106027485A (en) * 2016-04-28 2016-10-12 乐视控股(北京)有限公司 Rich media display method and system based on voice interaction
CN106205612B (en) * 2016-07-08 2019-12-24 北京光年无限科技有限公司 Information processing method and system for intelligent robot
CN106205615B (en) * 2016-08-26 2023-06-02 王峥嵘 Control method and system based on voice interaction
CN106228975A (en) * 2016-09-08 2016-12-14 康佳集团股份有限公司 The speech recognition system of a kind of mobile terminal and method
CN106598431A (en) * 2016-11-30 2017-04-26 中国航空工业集团公司沈阳飞机设计研究所 Device for quickly guiding instruction transmission of unmanned aerial vehicle based on manned aerial vehicle
CN106531165A (en) * 2016-12-15 2017-03-22 北京塞宾科技有限公司 Portable smart home voice control system and control method adopting same
CN106910500B (en) * 2016-12-23 2020-04-17 北京小鸟听听科技有限公司 Method and device for voice control of device with microphone array
CN106648394A (en) * 2016-12-31 2017-05-10 珠海市魅族科技有限公司 Voice control method and system
CN106782606A (en) * 2017-01-17 2017-05-31 山东南工机器人科技有限公司 For the communication and interaction systems and its method of work of Dao Jiang robots
CN107066343A (en) * 2017-01-24 2017-08-18 广东欧珀移动通信有限公司 The restorative procedure of partition table, device and mobile terminal in mobile terminal
CN108132805B (en) * 2017-12-20 2022-01-04 深圳Tcl新技术有限公司 Voice interaction method and device and computer readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104599616A (en) * 2013-10-30 2015-05-06 西安景行数创信息科技有限公司 Intelligent voice interaction guide system
CN107305769A (en) * 2016-04-20 2017-10-31 斑马网络技术有限公司 Voice interaction processing method, device, equipment and operating system
CN106653016A (en) * 2016-10-28 2017-05-10 上海智臻智能网络科技股份有限公司 Intelligent interaction method and intelligent interaction device
CN107357416A (en) * 2016-12-30 2017-11-17 长春市睿鑫博冠科技发展有限公司 A kind of human-computer interaction device and exchange method
CN107452378A (en) * 2017-08-15 2017-12-08 北京百度网讯科技有限公司 Voice interactive method and device based on artificial intelligence

Also Published As

Publication number Publication date
CN108132805A (en) 2018-06-08
WO2019119771A1 (en) 2019-06-27

Similar Documents

Publication Publication Date Title
CN108132805B (en) Voice interaction method and device and computer readable storage medium
US11206448B2 (en) Method and apparatus for selecting background music for video shooting, terminal device and medium
CN107801096B (en) Video playing control method and device, terminal equipment and storage medium
EP2961172A1 (en) Method and device for information acquisition
EP2728859B1 (en) Method of providing information-of-users' interest when video call is made, and electronic apparatus thereof
US20200077137A1 (en) Video interaction method and apparatus
CN110213661A (en) Control method, smart television and the computer readable storage medium of full video
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
US9900664B2 (en) Method and system for display control, breakaway judging apparatus and video/audio processing apparatus
CN104113785A (en) Information acquisition method and device
CN112788268B (en) Information pushing method based on video recording, intelligent television and storage medium
CN112969087B (en) Information display method, client, electronic equipment and storage medium
CN110072138B (en) Video playing method, video playing equipment and computer readable storage medium
US20230142720A1 (en) Smart interactive media content guide
US20150382077A1 (en) Method and terminal device for acquiring information
US20150347461A1 (en) Display apparatus and method of providing information thereof
CN111432278A (en) Video control method, device, terminal and storage medium
US20150135218A1 (en) Display apparatus and method of controlling the same
CN114640877B (en) Information display method and device, electronic equipment and storage medium
CN113852767B (en) Video editing method, device, equipment and medium
CN111400595A (en) Content pushing method and device of application program and storage medium
CN113821417A (en) Lamp effect adaptation display processing method, intelligent terminal and storage medium
US9654835B2 (en) Method for displaying electronic program guide, electronic device, and computer readable medium
CN107820109A (en) TV method to set up, system and computer-readable recording medium
US20160154824A1 (en) Image display apparatus and information providing method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant