CN117032452A - Prompt statement response method and related device - Google Patents

Info

Publication number
CN117032452A
CN117032452A (application number CN202310952166.2A)
Authority
CN
China
Prior art keywords
voice
user
interaction
target
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310952166.2A
Other languages
Chinese (zh)
Inventor
Wang Yi (王一)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Renma Interactive Technology Co Ltd
Original Assignee
Shenzhen Renma Interactive Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Renma Interactive Technology Co Ltd
Priority to CN202310952166.2A
Publication of CN117032452A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a prompt sentence response method and a related device. The method is applied to a wearable device in a voice interaction system and comprises the following steps: in response to a voice input operation of a user on a target voice interaction interface, displaying the current user input voice in the target voice interaction interface; obtaining voice data of a first machine output sentence or a second machine output sentence according to the continuous display duration of the target voice interaction interface and a preset display duration, and displaying the corresponding machine output voice on the target voice interaction interface; and repeating the above process until each interaction node is completed or a return operation is received. In this way, the application responds, through the wearable device, to the user's voice input operations on the target voice interaction interface to realize interaction based on an interaction script, and outputs prompt sentences adapted to the user's performance so as to guide the user toward accurate replies, thereby improving the flexibility and intelligence of the interactive product.

Description

Prompt statement response method and related device
Technical Field
The application belongs to the technical field of general data processing in the Internet industry, and particularly relates to a prompt statement response method and a related device.
Background
Currently, popular interactive products on the market generally interact with users through means such as voice or text.
However, most interactive products are designed purely for entertainment and offer no help with the user's reading ability or verbal expression ability, so their practicality is low and the user interaction effect is poor.
Accordingly, there is a need for a prompt sentence response method that solves the above problems.
Disclosure of Invention
The embodiments of the application provide a prompt sentence response method and a related device, which guide a user to output correct reply sentences through voice interaction, according to the historical interaction record and a machine output sentence adapted to the user's input voice, thereby improving the flexibility and intelligence of the interactive product's response output and enhancing its practicability.
In a first aspect, an embodiment of the present application provides a prompt sentence response method applied to a wearable device in a voice interaction system, where the voice interaction system includes the wearable device and a voice server. The method includes the following steps:
responding to a voice input operation of a user on a target voice interaction interface, acquiring the current user input voice recorded by the voice input operation, and displaying the current user input voice in the target voice interaction interface, where the target voice interaction interface is associated with a target interaction script; and,
if the continuous display duration of the target voice interaction interface is less than a preset display duration, interacting with the voice server to obtain voice data of a first machine output sentence adapted to the current user input voice, where the content of the first machine output sentence is a reply sentence corresponding to the current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario;
if the continuous display duration of the target voice interaction interface is greater than or equal to the preset display duration, interacting with the voice server to obtain voice data of a second machine output sentence adapted to a first reading record and the current user input voice, where the content of the second machine output sentence includes the reply sentence and/or a target prompt sentence, the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within a preset interaction duration, the target prompt sentence is a first prompt sentence or a second prompt sentence, the first prompt sentence is a complete prompt sentence used to prompt the user to input voice conforming to the current interaction node, and the second prompt sentence is an incomplete prompt sentence used to prompt the user to input voice conforming to the current interaction node;
displaying, in the target voice interaction interface, the machine output voice corresponding to the voice data of the first machine output sentence or the second machine output sentence; and
repeating the above process until each interaction node in the target interaction scenario is completed, or, in response to a preset return operation of the user on the target voice interaction interface, jumping from the target voice interaction interface to a target application interface.
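The steps of the first aspect can be sketched, very roughly, as the following loop. This is an illustrative sketch only, not the claimed method itself; all function and parameter names (`record_input`, `first_output`, `second_output`, and so on) are hypothetical placeholders for the operations the text describes.

```python
def run_interaction(nodes, record_input, display_duration, preset_duration,
                    first_output, second_output, reading_record, show,
                    return_requested=lambda: False):
    """Drive the interaction loop over the scenario's interaction nodes.

    record_input(node)           -> current user input voice at that node
    display_duration()           -> continuous display duration so far
    first_output(voice)          -> first machine output sentence (reply only)
    second_output(voice, record) -> second machine output sentence
    show(item)                   -> display a voice item on the interface
    """
    for node in nodes:
        if return_requested():           # preset return operation: stop early
            break
        voice = record_input(node)       # voice input operation
        show(voice)                      # display current user input voice
        if display_duration() < preset_duration:
            data = first_output(voice)   # reply sentence for the current node
        else:
            data = second_output(voice, reading_record())
        show(data)                       # display the machine output voice
```

A real caller would supply recording and display callables bound to the wearable device; here they are plain placeholders so the control flow can be read in isolation.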
In a second aspect, an embodiment of the present application provides a prompt sentence response apparatus applied to a wearable device in a voice interaction system, where the voice interaction system includes the wearable device and a voice server. The apparatus comprises:
a first voice display unit, configured to respond to a voice input operation of a user on a target voice interaction interface, acquire the current user input voice recorded by the voice input operation, and display the current user input voice in the target voice interaction interface, where the target voice interaction interface is associated with a target interaction script;
a voice data acquisition unit, configured to: if the continuous display duration of the target voice interaction interface is less than a preset display duration, interact with the voice server to obtain voice data of a first machine output sentence adapted to the current user input voice, where the content of the first machine output sentence is a reply sentence corresponding to the current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario; and, if the continuous display duration of the target voice interaction interface is greater than or equal to the preset display duration, interact with the voice server to obtain voice data of a second machine output sentence adapted to a first reading record and the current user input voice, where the content of the second machine output sentence includes the reply sentence and/or a target prompt sentence, the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within a preset interaction duration, the target prompt sentence is a first prompt sentence or a second prompt sentence, the first prompt sentence is a complete prompt sentence used to prompt the user to input voice conforming to the current interaction node, and the second prompt sentence is an incomplete prompt sentence used to prompt the user to input voice conforming to the current interaction node;
a second voice display unit, configured to display, in the target voice interaction interface, the machine output voice corresponding to the voice data of the first machine output sentence or the second machine output sentence; and
a jump display unit, configured to repeat the above process until each interaction node in the target interaction scenario is completed, or, in response to a preset return operation of the user on the target voice interaction interface, jump from the target voice interaction interface to a target application interface.
In a third aspect, embodiments of the present application provide a wearable device comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, perform the steps of the first aspect of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program/instructions which, when executed by a processor, implement some or all of the steps described in the first aspect of the embodiments of the present application.
It can be seen that, in the embodiments of the present application, a user performs voice interaction using a wearable device in a voice interaction system. The wearable device obtains the user input voice by responding to the user's voice input operation on the target voice interaction interface, and then, according to the continuous display duration of the target voice interaction interface, selects and outputs a first machine output sentence or a second machine output sentence adapted to the current user input voice. By repeating this process, each interaction node of the target interaction scenario is completed, and the user is guided to independently input user input sentences conforming to the corresponding interaction nodes through the output of complete or incomplete prompt sentences; that is, the content of the machine output sentences and the completeness of the prompt sentences are adjusted to match the user's performance in voice interaction, which improves the flexibility and intelligence of the interactive product and enhances its practicability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a voice interaction system according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for responding to a prompt statement according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a target voice interactive interface according to an embodiment of the present application;
fig. 4 is a schematic view of a scene of man-machine interaction text content according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a target application interface according to an embodiment of the present application;
fig. 6 is a schematic diagram of a scenario of a preset return operation according to an embodiment of the present application;
FIG. 7a is a functional block diagram of a prompt sentence response apparatus according to an embodiment of the present application;
FIG. 7b is a functional block diagram of a prompt sentence response apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The terms first, second and the like in the description and in the claims and in the above-described figures are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a block diagram of a voice interaction system according to an embodiment of the present application. As shown in fig. 1, the voice interaction system 100 includes a wearable device 110 and a voice server 120 that are communicatively connected. The voice server 120 is provided with a voice data generation engine, which is used to add voice capabilities to a plurality of interaction scripts corresponding to a preconfigured voice interaction product. The voice server can use the voice data generation engine to generate voice data of the corresponding machine output sentences according to the preconfigured interaction scripts and send that voice data to the wearable device, so that the wearable device can generate and play the corresponding machine output voice, thereby realizing human-machine voice interaction.
The wearable device 110 may be a mobile phone, a smartwatch, smart glasses, etc., and the voice server 120 may be a single server, a cluster of voice servers, or a cloud computing service center. One voice server 120 may serve multiple wearable devices 110 simultaneously, or the voice interaction system 100 may include multiple voice servers 120, each corresponding to one or more wearable devices 110.
Based on this, the embodiment of the application provides a response method of a prompt sentence, and the embodiment of the application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 is a flow chart of a response method of a prompt sentence according to an embodiment of the present application, where the method is applied to a wearable device 110 in a voice interaction system 100, and the voice interaction system includes the wearable device 110 and a voice server 120; the method comprises the following steps:
step S201, responding to the voice input operation of the user aiming at the target voice interaction interface, acquiring the current user input voice recorded by the voice input operation, and displaying the current user input voice in the target voice interaction interface.
Wherein the target voice interactive interface is associated with a target interactive play.
Referring to fig. 3, fig. 3 is a schematic display diagram of a target voice interaction interface provided by an embodiment of the present application. As shown in fig. 3, 01 in fig. 3 is the dial of a smartwatch, through which the user performs voice interaction. In the target voice interaction interface, 02 in fig. 3 is the script name (or another name) of the target interaction script, 03 in fig. 3 is the voice bar control corresponding to the machine output voice, and likewise 04 in fig. 3 is the voice bar control generated by the user through a voice input operation on the voice input control shown as 05 in fig. 3. Both voice bar controls can play the corresponding voice in response to the user's touch operation, so that the user obtains the corresponding content, thereby realizing voice interaction.
The association between the target voice interaction interface and the target interaction script means that each interaction script in the voice interaction product corresponds to a dedicated voice interaction interface, whose design and style can be set freely according to the content of the interaction script, providing the user with a rich visual experience. In a typical interaction scenario, the user may enter the target voice interaction interface through a touch operation on a target icon control corresponding to the target interaction script, where the target icon control is the control that refers to the target interaction script in the interface where the user selects a scenario in the voice interaction product.
Step S202: if the continuous display duration of the target voice interaction interface is less than the preset display duration, interacting with the voice server to obtain voice data of a first machine output sentence adapted to the current user input voice; and if the continuous display duration of the target voice interaction interface is greater than or equal to the preset display duration, interacting with the voice server to obtain voice data of a second machine output sentence adapted to the first reading record and the current user input voice.
The content of the first machine output sentence is a reply sentence corresponding to the current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario. The content of the second machine output sentence includes the reply sentence and/or a target prompt sentence; the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within the preset interaction duration; the target prompt sentence is a first prompt sentence or a second prompt sentence, where the first prompt sentence is a complete prompt sentence used to prompt the user to input voice conforming to the current interaction node, and the second prompt sentence is an incomplete prompt sentence used to prompt the user to input voice conforming to the current interaction node.
The normal voice interaction flow is as follows: the user enters the target voice interaction interface, which first plays a machine output voice; after hearing it, the user records the user input voice through a voice input operation; finally, the wearable device interacts with the voice server according to the user input voice and outputs a new machine output voice. This process is repeated until each interaction node in the target interaction scenario is completed, or the user exits to the target application interface of the interactive product corresponding to the target voice interaction interface.
The continuous display duration of the target voice interaction interface can be understood as the duration of the voice interaction the user has performed on the current target voice interaction interface, that is, the time from when the user entered the target voice interaction interface to when the user inputs the current user input voice. The prompt sentence response method essentially determines how much prompting the user needs from the user's performance on the voice interaction script and the current actual voice response state. When the continuous display duration of the target voice interaction interface is less than the preset display duration, there are not yet enough samples to determine the user's current actual voice response state (i.e., the current voice interaction state), so the first machine output sentence is output normally, that is, only the reply sentence corresponding to the current interaction node. When the continuous display duration is greater than or equal to the preset display duration, there are enough samples, so the wearable device can determine, according to the user's current actual voice response state and current input voice, whether to output a prompt sentence at all and whether that prompt sentence should be complete or incomplete, thereby guiding the user to input the accurate content that should be replied and improving the user's expression ability and script reading ability.
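The "continuous display duration" described above is simply the elapsed time since the interface was entered, compared against the preset threshold. The following minimal sketch shows one way to track it; the class and method names are hypothetical, not from the application.

```python
import time


class InteractionInterface:
    """Track the continuous display duration of a voice interaction interface."""

    def __init__(self):
        # Recorded once, when the user enters the target interface.
        self.entered_at = time.monotonic()

    def display_duration(self):
        # Time from entering the interface to the current user input voice.
        return time.monotonic() - self.entered_at

    def needs_second_output(self, preset_duration):
        # >= preset: enough samples to assess the user's response state,
        # so the second machine output sentence path is taken.
        return self.display_duration() >= preset_duration
```

A monotonic clock is used rather than wall-clock time so that system clock adjustments cannot make the measured duration go backwards.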
In one possible example, interacting with the voice server to obtain the voice data of the corresponding second machine output sentence according to the first reading record and the current user input voice includes: determining a plurality of historical input voices input by the user according to the first reading record; determining a plurality of matching degrees according to the plurality of historical input voices and the target interaction scenario; according to the plurality of matching degrees, confirming the historical input voices whose matching degree is higher than a preset matching degree as target input voices; and judging whether the ratio of the target input voices to the plurality of historical input voices is greater than or equal to a preset ratio. If the ratio is greater than or equal to the preset ratio, the user's reading attentiveness is determined to be high, and the wearable device interacts with the voice server to obtain first voice data or second voice data adapted to the current user input voice and a pre-stored second reading record; if the ratio is less than the preset ratio, the user's reading attentiveness is determined to be low, and the wearable device interacts with the voice server to obtain third voice data according to the current user input voice.
The matching degree refers to the degree of association between the text content of a single historical input voice and the corresponding interaction node; the plurality of matching degrees correspond one-to-one to the plurality of historical input voices. The content of the second machine output sentence corresponding to the first voice data is the reply sentence; the content of the second machine output sentence corresponding to the second voice data is the target prompt sentence; the second reading record includes the machine output sentences and corresponding user input sentences at each interaction node in all interaction scripts completed by the user, where the user input sentences are the text content corresponding to the user input voices; and the content of the second machine output sentence corresponding to the third voice data is the reply sentence plus the first prompt sentence.
Whether the user is taking the current voice interaction seriously is determined from the reading record of the user's content within the preset display duration: the matching degree between each user input voice in the first reading record and its corresponding interaction node is determined, and the proportion of high matching degrees among them is computed. When this proportion is greater than or equal to the preset ratio, the user has replied attentively at most interaction nodes during the voice interaction, indicating high reading attentiveness; otherwise the reading attentiveness is low. Further, when the wearable device determines that the user's reading attentiveness is low, it outputs the third voice data for the user, that is, the reply sentence corresponding to the current interaction node together with the first prompt sentence (the complete prompt sentence). Because the user has not replied attentively, or has not replied well, during the current voice interaction, a complete prompt is provided in addition to the reply sentence at subsequent interaction nodes to guide the user's replies, taking the complete completion of the interaction nodes as the goal and mobilizing the user's enthusiasm for voice interaction.
For example, because of a user's young age, the user may output different reply contents according to the current mood when reading the interactive novel, and may sometimes reply at random just for fun, so that most of the user's reply contents are irrelevant to the current scenario node. If, on the other hand, most of the user's replies are relevant to the current scenario node, the user is deemed to want to read the interactive novel attentively.
In this example, a plurality of historical input voices input by the user are determined according to the first reading record, and the matching degree of each historical input voice is then determined according to the target interaction scenario. The proportion of historical input voices with high matching degree determines whether the user's reading attentiveness in the current voice interaction is high or low, and the wearable device interacts with the voice server to output the corresponding voice data. Analyzing the user's reading attentiveness through the first reading record to decide which voice data to acquire improves the flexibility and efficiency with which the wearable device acquires the voice data corresponding to the machine output voice, and further improves the user's enthusiasm for voice interaction and the practicability of the wearable device.
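The attentiveness decision described in this example can be sketched as below. The numeric thresholds (preset matching degree 0.7, preset ratio 0.6) and the function name are hypothetical placeholders; the application does not specify concrete values.

```python
def reading_attentiveness(matching_degrees, preset_match=0.7, preset_ratio=0.6):
    """Classify reading attentiveness from per-utterance matching degrees.

    matching_degrees: one value in [0, 1] per historical input voice,
    measuring its association with the corresponding interaction node.
    """
    if not matching_degrees:
        return "low"  # no history yet: cannot be judged attentive
    # "Target input voices": history items whose matching degree exceeds
    # the preset matching degree.
    targets = [m for m in matching_degrees if m > preset_match]
    ratio = len(targets) / len(matching_degrees)
    return "high" if ratio >= preset_ratio else "low"
```

A "high" result would lead to the first/second voice data path (intention analysis), while "low" leads to the third voice data (reply sentence plus complete prompt).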
In one possible example, interacting with the voice server to obtain the first voice data or the second voice data adapted to the current user input voice and the pre-stored second reading record includes: parsing the current user input voice to obtain a target user intention, and judging whether the target user intention is complete. If the target user intention is complete, a voice data request message carrying the voice data corresponding to the current user input voice and a first content indication message is sent to the voice server, and, in response to the voice data response message sent by the voice server, the first voice data carried by the voice data response message is extracted. If the target user intention is incomplete, an expression ability score of the user is determined according to the second reading record; a prompt sentence confirmation result is obtained according to the expression ability score; a voice data request message carrying the voice data corresponding to the user input voice, the prompt sentence confirmation result, and a second content indication message is sent to the voice server; and, in response to the voice data response message sent by the voice server, the second voice data carried by the voice data response message is extracted.
The first content indication message indicates that the voice data carried by the voice data response message is the first voice data; the expression ability score is a score obtained by evaluating the user's voice replies according to the second reading record and a preset scoring rule; the prompt sentence confirmation result indicates whether the target prompt sentence is the first prompt sentence or the second prompt sentence; and the second content indication message indicates that the voice data carried by the voice data response message is the second voice data.
When the user's reading attentiveness is high, the user input voice currently output by the user is parsed to judge whether the target user intention it expresses is complete. If the target user intention is complete, only the reply sentence of the current interaction node needs to be output, without any prompt sentence: the user's voice interaction during this period has been good, and the expressed voice can accurately convey the user's intention. However, if it is determined that the user input voice cannot express a complete user intention, only a prompt sentence is output so that the user re-enters the user input voice, guiding the user to carry out normal voice interaction and advance the script. The completeness of the prompt sentence is determined by the expression ability score derived from the user's voice inputs and the corresponding machine output voices in previously completed interaction scenarios: the higher the user's expression ability score as determined by the wearable device, the lower the completeness of the corresponding prompt sentence.
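For a highly attentive user, the request-building step just described can be sketched as follows. The dictionary fields, the qualification threshold of 60, and the function name are illustrative assumptions (the threshold is chosen only so that the scores 80 and 40 in the figure example below fall on the expected sides), not values stated in the application.

```python
def build_voice_data_request(user_voice, intent_complete, expression_score,
                             qualified_score=60):
    """Choose which voice data request to send to the voice server."""
    if intent_complete:
        # First content indication: the server should return the first
        # voice data, i.e. the reply sentence only.
        return {"voice": user_voice, "indication": "first"}
    # Incomplete intention: the prompt sentence confirmation result depends
    # on the expression ability score. A higher score means a less complete
    # (second) prompt sentence suffices.
    prompt = ("second_prompt" if expression_score >= qualified_score
              else "first_prompt")
    return {"voice": user_voice, "indication": "second",
            "prompt_confirmation": prompt}
```

With these assumed values, a score of 80 selects the incomplete (second) prompt sentence and a score of 40 selects the complete (first) one, matching the worked example that follows.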
Referring to fig. 4, fig. 4 is a schematic view of a scene of man-machine interaction text content according to an embodiment of the present application. As shown in fig. 4, in the target product interface, the robot (or the narrator) outputs the machine reply sentence corresponding to 01 in fig. 4: "The child is riding to school in mother's car, but needs to find a reason to get off. What should the child say?" The user then inputs the current user input voice: "Stomachache". In this case, if the user's reading attentiveness is high, the wearable device will judge whether the target user intention is complete. Obviously, the user only mentions current physical discomfort but does not state the intention of wanting to get off the car, so the target user intention expressed by the user input voice is incomplete. The wearable device determines the expression capability score of the user according to the records of scenarios previously completed by the user, and further determines whether to output the first prompt sentence or the second prompt sentence. Assuming the expression capability score of the user is 80, the expression capability of the user is qualified, and only the second prompt sentence, namely the incomplete prompt sentence, needs to be output; its content may be that shown at 02 in fig. 4: "Prompt: you need to get off the car now; you can tell your mother that your stomach hurts." Assuming the expression capability score of the user is 40, the expression capability of the user is not qualified, and the first prompt sentence, namely the complete prompt sentence, needs to be output so that the user can give an accurate and complete reply; as shown at 03 in fig. 4, its content may be: "Prompt: you need to tell your mother that your stomach hurts and that you want to get off the car to go to the hospital."
Thus, based on the expression capability score of the user, either the complete prompt sentence or the incomplete prompt sentence is output, so that the user can re-enter a qualified reply.
It can be seen that in this example, the first voice data or the second voice data is requested from the voice server by analyzing whether the user input voice can express the complete user intention, which improves the flexibility and intelligence of the wearable device in responding to the user input voice, assists the user in voice interaction by outputting different machine output voices, and exercises the expression capability and reading capability of the user.
In one possible example, the determining the expressive power score of the user based on the pre-stored second reading record includes: determining at least one effective interaction node according to the second reading record; obtaining the at least one expression capability initial score according to the user input statement and the machine output statement corresponding to the at least one effective interaction node; obtaining at least one reply difficulty according to the machine output statement corresponding to the at least one effective interaction node; acquiring the user age of the user and a preset expression capacity scoring formula; and obtaining the expression capability score according to the expression capability score formula, the age of the user, the at least one initial expression capability score and the at least one response difficulty.
Wherein the effective interaction node is an open reply node which the user replied to under high reading attentiveness, the open reply node being an interaction node marked in advance by the developer of the target interaction scenario and whose corresponding reply sentence is a non-multiple-choice question; the expression capability initial score refers to a score obtained by evaluating the accuracy or the structural integrity of the sentence input by the user at a single effective interaction node; the at least one expression capability initial score corresponds one-to-one to the at least one effective interaction node; the at least one reply difficulty corresponds one-to-one to the at least one effective interaction node; and the reply difficulty is related to the length and the number of keywords of the corresponding machine output sentence.
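The patent says only that reply difficulty is related to the length and keyword count of the machine output sentence. A minimal illustrative estimator might look like the following; the weights and the keyword set are assumptions introduced for the sketch, not values from the patent.

```python
def reply_difficulty(machine_sentence, keywords,
                     w_len=0.05, w_kw=0.5):
    """Estimate how hard a machine output sentence is to answer,
    from its length in words and the number of scenario keywords."""
    words = machine_sentence.split()
    # Longer sentences and more scenario keywords imply a harder reply.
    kw_count = sum(1 for w in words if w.strip(".,?!").lower() in keywords)
    return w_len * len(words) + w_kw * kw_count
```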
The expression capability scoring formula is used for calculating expression capability scores of the users at the current ages, and comprises an age variable, an expression capability initial scoring variable, a response difficulty variable and a weight coefficient corresponding to each variable.
The wearable device determines the effective interaction nodes from the second reading record. An effective interaction node is an interaction node that can be used to evaluate the expression capability of the user: the reply corresponding to the effective interaction node is an open reply, and the user's reading attentiveness was high when answering the machine sentence of that node. (That is, the user cannot simply make a multiple-choice reply based on the content of the last machine sentence; for example, if the machine sentence asks the user to choose A or B, and the content of the user's reply is "A", that interaction node is not an effective interaction node.) The reply of the user at each effective interaction node is then analyzed to determine the expression capability initial score corresponding to each reply, the reply difficulty corresponding to each effective interaction node is determined, the age of the user is acquired, and finally the expression capability score is determined comprehensively from the expression capability initial scores, the reply difficulties, and the age of the user.
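As a concrete illustration of this comprehensive determination, the sketch below combines the per-node initial scores, the reply difficulties and the user's age into one score. The patent states only that the formula contains these variables with a weight coefficient for each; the specific weights, the difficulty weighting and the age normalization here are assumptions.

```python
def expression_capability_score(age, initial_scores, difficulties,
                                w_score=0.7, w_diff=0.1, w_age=0.2):
    """Combine per-node initial scores, reply difficulties and user age."""
    assert len(initial_scores) == len(difficulties) and difficulties
    total_diff = sum(difficulties)
    # Difficulty-weighted mean: harder machine sentences count for more.
    weighted_mean = sum(s * d for s, d in zip(initial_scores, difficulties)) / total_diff
    mean_diff = total_diff / len(difficulties)
    # Illustrative age term: the same raw performance reflects stronger
    # expression ability at a lower age, so younger users get a boost.
    age_factor = 100.0 / max(age, 1.0)
    return w_score * weighted_mean + w_diff * mean_diff + w_age * age_factor
```

With these placeholder weights, a ten-year-old whose single valid node scored 70 at difficulty 1.0 would receive 0.7·70 + 0.1·1.0 + 0.2·10 = 51.1.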
It can be seen that, in this example, the at least one expression capability initial score and the at least one reply difficulty are obtained by determining the effective interaction nodes from the second reading record, and the expression capability score corresponding to the user's current age is determined according to the actual age of the user. Thus, the multivariate scoring standard can improve the accuracy and efficiency with which the wearable device determines the user's expression capability score, and further improve the accuracy of determining the completeness of the prompt sentence according to that score.
In one possible example, the obtaining the at least one expression capability initial score according to the user input sentence and the machine output sentence corresponding to the at least one valid interaction node includes: and executing the following operations aiming at the user input statement and the machine output statement corresponding to each effective interaction node to obtain the at least one expression capability initial score: judging whether a standard reply sentence exists or not according to the second reading record and the machine output sentence of the currently processed effective interaction node: if the standard reply sentence exists, determining the association degree between the user input sentence of the currently processed effective interaction node and the machine output sentence of the currently processed effective interaction node; and judging whether the association degree is larger than a preset value or not: if the association degree is judged to be larger than the preset value, determining statement similarity according to the user input statement and the standard reply statement of the currently processed effective interaction node; determining the sentence similarity as an expression capacity initial score corresponding to the currently processed effective interaction node; if the association degree is smaller than or equal to the preset value, determining that the expression capacity initial score corresponding to the currently processed effective interaction node is a preset low value; if no standard reply sentence exists, obtaining the structural integrity of the target sentence according to the user input sentence of the effective interaction node which is processed currently; and determining the structural integrity of the target sentence as an expression capability initial score corresponding to the currently processed effective interaction node.
Wherein the sentence similarity is used to characterize the accuracy of the currently processed user input sentence.
Wherein the expression capability initial score is determined by sentence similarity or sentence structural integrity. The wearable device first determines whether the currently processed effective interaction node has a standard reply sentence; for example, some interaction nodes whose reply content can be very open-ended have no standard reply sentence. If a standard reply sentence exists, the device determines whether the association degree between what the user said and the machine output sentence corresponding to the interaction node is high enough; if the association degree is sufficient, it judges whether the sentence corresponding to the user input voice deviates much from the content of the standard reply sentence, that is, whether the sentence similarity is sufficient. If the association degree is insufficient, what the user said has little relation to the question or the story line, so no comparison with the standard reply sentence is needed and a preset low value is determined directly. If the currently processed effective interaction node has no standard reply sentence, the expression capability initial score is determined according to the sentence structural integrity.
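A minimal sketch of this multi-level judgment follows. The similarity measure (a character-level difflib ratio), the thresholds, and the word-count stand-in for structural integrity are all placeholder assumptions; a real system would use semantic matching and a proper syntactic-completeness model.

```python
import difflib

def initial_score(user_sentence, machine_sentence, standard_reply=None,
                  relevance_threshold=0.3, preset_low_value=10.0):
    """Expression capability initial score (0-100) for one effective node."""
    def sim(a, b):
        # Placeholder similarity; stands in for semantic matching.
        return difflib.SequenceMatcher(None, a, b).ratio()

    if standard_reply is not None:
        # Association degree between the user's words and the machine sentence.
        if sim(user_sentence, machine_sentence) > relevance_threshold:
            # On topic: grade by closeness to the standard reply sentence.
            return 100.0 * sim(user_sentence, standard_reply)
        # Off topic: no comparison needed, return the preset low value.
        return preset_low_value
    # No standard reply: crude structural-integrity proxy via word count.
    return min(100.0, 10.0 * len(user_sentence.split()))
```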
In the example, the initial score of the expression capacity corresponding to the effective interaction node is determined according to the multi-level judgment standard, so that the comprehensiveness and the authenticity of the determination process of the expression capacity score of the wearable device to the user are guaranteed, and the accuracy of the expression capacity score determined by the wearable device is improved.
In one possible example, the interacting with the voice server to obtain third voice data according to the current user input voice includes: sending a voice data request message carrying voice data corresponding to the current user input voice and a third content indication message to the voice server; and responding to the voice data response message sent by the voice server, and extracting the third voice data carried by the voice data response message.
The third content indication message is configured to indicate that the voice data carried by the voice data response message is the third voice data, and an order of the prompt statement in the machine output statement corresponding to the third voice data is after the reply statement.
The prompt sentence is ordered after the reply sentence so that the user can answer the content of the new reply sentence on the basis of the prompt sentence, thereby guiding the user's reply.
In this example, the voice data of the user input voice and the third content indication message are sent to realize interaction with the voice server to obtain the corresponding third voice data, so that the wearable device can sequentially output the reply sentence and the prompt voice, the user can effectively output the user input voice according to the prompt sentence, and the practicability and the efficiency of the wearable device for providing the voice interaction service are improved.
Step S203, machine output voice corresponding to the voice data of the first machine output sentence or the second machine output sentence is displayed in the target voice interactive interface.
The display manner may be that a voice bar control is generated and the user controls the wearable device to output the corresponding machine output voice by clicking the voice bar control, or the machine output voice may be converted to text and output as voice, which is not limited herein.
Step S204, repeating the above process until each interactive node in the target interactive scenario is completed, or in response to a preset return operation of the user on the target voice interactive interface, skipping to display to the target application interface.
In the actual voice interaction process, the user returns, actively or passively, to the interface of the previous hierarchy, namely the target application interface. The active return manner is that the user performs a preset return operation on the target voice interactive interface, and the passive return manner is that the user completes, through voice input, the voice interaction of each interaction node in the target interaction scenario.
Referring to fig. 5, fig. 5 is a schematic view illustrating a target application interface according to an embodiment of the application. As shown in fig. 5, fig. 5 is a target application interface displayed on the watch, in which the application displays a plurality of scenarios recommended to the user according to the user's preferences or the popularity of the scenarios. 01 in fig. 5 is an icon corresponding to one of the recommended scenarios, where scenario 1, scenario 2, scenario 3 and scenario 4 are the names of the corresponding scenarios; 02 in fig. 5 is a free-scenario identifier for prompting the user that the corresponding scenario is currently in a free reading stage; 03 in fig. 5 is a catalog control, which the user can touch to jump to and display a preset catalog interface; 04 in fig. 5 is a voice input control, through which the user may control the device to execute a certain instruction, for example, jumping to the voice interactive interface of a scenario or closing the target application.
In one possible example, the preset return operation includes at least one of: aiming at touch operation, preset return gesture and preset voice input keywords of a return application interface control preset on the target voice interaction interface.
The preset return operation can be optionally adapted or selected according to the manufacturer or the type of the device. For example, the wearable device is a watch, then the preset return operation may be a touch operation for a return application interface control or a voice input "return last interface"; when the wearable device is glasses, the preset return operation may be a voice input "i want to exit".
Referring to fig. 6, fig. 6 is a schematic view of a preset return operation according to an embodiment of the present application. Fig. 6 illustrates a target voice interactive interface, where 01 in fig. 6 is a preset return gesture, i.e., a slide gesture from the upper right corner, by which the user can make the watch jump to display the target application interface; 02 in fig. 6 is a drop-down control, and clicking this control triggers the display of the text display frame corresponding to 03 in fig. 6, the text content displayed in the text display frame being the voice-to-text content corresponding to the voice bar control. Here, because the content of the user's voice input is "exit to the previous interface" and the preset voice input keyword is set to "exit", the watch will respond to the voice input by jumping to and displaying the target application interface.
It can be seen that in this example, the preset return operation may be various, adapting to the device type and the age status of the user, improving the flexibility and intelligence of the wearable device to implement voice interaction.
As can be seen, fig. 2 is a flow chart of a response method for a prompt sentence provided by an embodiment of the present application. A user performs voice interaction using a wearable device in a voice interaction system: the wearable device obtains the user input voice by responding to a voice input operation of the user on the target voice interactive interface, and then selects and outputs a first machine output sentence or a second machine output sentence adapted to the current user input voice according to the duration for which the target voice interactive interface has been displayed. Each interaction node of the target interaction scenario is completed by repeating this process, and the user is guided to independently input user input sentences that fit the corresponding interaction node by outputting a complete or incomplete prompt sentence; that is, the content of the machine output sentence and the completeness of the prompt sentence are adjusted to match the user's performance in voice interaction, which improves the flexibility and intelligence of the interactive product and enhances its practicability.
The following are apparatus embodiments of the present application, which belong to the same concept as the method embodiments of the present application and are used to perform the methods described in the method embodiments. For convenience of explanation, only the parts related to the embodiments of the present application are shown; for specific technical details that are not disclosed, please refer to the description of the method embodiments of the present application, which is not repeated here.
The response device for the prompt sentence provided by the embodiment of the application is applied to the wearable device 110 in the voice interaction system 100 shown in fig. 1, the voice interaction system 100 includes the wearable device 110 and the voice server 120, and the response device is specifically configured to execute the steps executed by the wearable device in the response method for the prompt sentence. The response device of the prompt sentence provided by the embodiment of the application can comprise modules corresponding to the corresponding steps.
The embodiment of the application can divide the functional modules of the response device of the prompt sentence according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules can be realized in a hardware mode or a software functional module mode. The division of the modules in the embodiment of the application is schematic, only one logic function is divided, and other division modes can be adopted in actual implementation.
Fig. 7a is a functional unit block diagram of a response device of a prompt sentence provided by an embodiment of the present application, where the functional unit block diagram is applied to a wearable device 110 in a voice interaction system 100, where the voice interaction system 100 includes the wearable device 110 and a voice server 120; the response means 70 of the alert sentence comprises: a first voice display unit 701, configured to respond to a voice input operation of a user on a target voice interaction interface, obtain a current user input voice recorded by the voice input operation, and display the current user input voice on the target voice interaction interface, where the target voice interaction interface is associated with a target interaction script; the voice data obtaining unit 702 is configured to interact with the voice server to obtain voice data of a first machine output sentence adapted to the current user input voice if the duration of the target voice interaction interface is less than a preset duration of display, where the content of the first machine output sentence is a reply sentence corresponding to a current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario; and if the duration of the display of the target voice interaction interface is longer than or equal to the preset display duration, interacting with the voice server to obtain voice data of a second machine output sentence adapting to the first reading record and the current voice input by the user, wherein the content of the second machine output sentence comprises the reply sentence and/or a target prompt sentence, the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within the preset interaction duration, the target prompt sentence is a first prompt sentence or a second prompt sentence, the first prompt sentence is a 
complete prompt sentence for prompting the user to input the voice conforming to the current interaction node, and the second prompt sentence is an incomplete prompt sentence for prompting the user to input the voice conforming to the current interaction node; a second voice display unit 703, configured to display, in the target voice interactive interface, a machine output voice corresponding to the voice data of the first machine output sentence or the second machine output sentence; and a skip display unit 704, configured to repeat the above process until each interactive node in the target interactive scenario is completed, or skip display to a target application interface in response to a preset return operation of the user on the target voice interactive interface.
In one possible example, in terms of interacting with the voice server according to the first reading record and the current user input voice to obtain voice data of a corresponding second machine output sentence, the voice data obtaining unit 702 is specifically configured to: determine a plurality of historical input voices input by the user according to the first reading record; determine a plurality of matching degrees according to the plurality of historical input voices and the target interaction scenario, wherein a matching degree refers to the association degree between the text content of a single historical input voice and the corresponding interaction node, and the plurality of matching degrees correspond one-to-one to the plurality of historical input voices; confirm, according to the plurality of matching degrees, that the historical input voices whose matching degree is higher than a preset matching degree are target input voices; and judge whether the ratio of the target input voices to the plurality of historical input voices is greater than or equal to a preset ratio: if greater than or equal to the preset ratio, determine that the user's reading attentiveness is high, and interact with the voice server according to the current user input voice and a pre-stored second reading record, wherein the second reading record comprises the machine output sentences at each interaction node in all interaction scenarios completed by the user and the corresponding user input sentences, the user input sentences being the text content corresponding to the user input voices input by the user; if smaller than the preset ratio, determine that the user's reading attentiveness is low, and interact with the voice server according to the current user input voice to obtain third voice data, wherein the content of the second machine output sentence corresponding to the third voice data is the reply sentence and the first prompt sentence.
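The matching-degree check performed by the voice data obtaining unit can be illustrated with the sketch below. The thresholds and the use of pre-computed matching degrees (rather than raw voice data) are simplifying assumptions made for brevity.

```python
def is_highly_attentive(matching_degrees, preset_matching_degree=0.6,
                        preset_ratio=0.7):
    """Classify the user's reading attentiveness from historical inputs."""
    if not matching_degrees:
        return False
    # Historical input voices that match their interaction node closely
    # enough are counted as target input voices.
    targets = [m for m in matching_degrees if m > preset_matching_degree]
    # High attentiveness when the on-topic share of the history is large.
    return len(targets) / len(matching_degrees) >= preset_ratio
```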
In one possible example, in the aspect of the interaction with the voice server to obtain the first voice data or the second voice data adapted to the current user input voice and the pre-stored second reading record, the voice data obtaining unit 702 is specifically configured to: obtaining target user intention according to the current user input voice analysis; judging whether the intention of the target user is complete or not: if the intention of the target user is complete, sending a voice data request message carrying voice data corresponding to the current user input voice and a first content indication message to the voice server, wherein the first content indication message is used for indicating that the voice data carried by a voice data response message is the first voice data; responding to the voice data response message sent by the voice server, and extracting the first voice data carried by the voice data response message; if the intention of the target user is incomplete, determining an expression capacity score of the user according to the second reading record, wherein the expression capacity score refers to a score obtained by evaluating the voice response of the user according to the second reading record and a preset scoring rule; obtaining a prompt statement confirmation result according to the expression capability score, wherein the prompt statement confirmation result is used for representing the target prompt statement as the first prompt statement or the second prompt statement; the voice server sends voice data corresponding to the user input voice, the prompt statement confirmation result and a voice data request message of a second content indication message, wherein the second content indication message is used for indicating that the voice data carried by the voice data response message is the second voice data; and responding to the voice data response message sent by the voice server, and extracting the second voice data carried 
by the voice data response message.
In one possible example, in the aspect of determining the expression capability score of the user according to the pre-stored second reading record, the voice data obtaining unit 702 is specifically configured to: determine at least one effective interaction node according to the second reading record, wherein the effective interaction node refers to an open reply node which the user replied to under high reading attentiveness, and the open reply node refers to an interaction node marked in advance by the developer of the target interaction scenario and whose corresponding reply sentence is a non-multiple-choice question; obtain the at least one expression capability initial score according to the user input sentences and machine output sentences corresponding to the at least one effective interaction node, wherein the expression capability initial score refers to a score obtained by evaluating the accuracy or the structural integrity of the user input sentence of a single effective interaction node, and the at least one expression capability initial score corresponds one-to-one to the at least one effective interaction node; obtain at least one reply difficulty according to the machine output sentence corresponding to the at least one effective interaction node, wherein the at least one reply difficulty corresponds one-to-one to the at least one effective interaction node, and the reply difficulty is related to the length and the number of keywords of the corresponding machine output sentence; acquire the user age of the user and a preset expression capability scoring formula, wherein the expression capability scoring formula is used for calculating the expression capability score of the user at the current age and comprises an age variable, an expression capability initial score variable, a reply difficulty variable and a weight coefficient corresponding to each variable; and obtain the expression capability score according to the
expression capability score formula, the age of the user, the at least one initial expression capability score and the at least one response difficulty.
In one possible example, in the aspect that the at least one expression capability initial score is obtained according to the user input sentence and the machine output sentence corresponding to the at least one valid interaction node, the voice data obtaining unit 702 is specifically configured to: and executing the following operations aiming at the user input statement and the machine output statement corresponding to each effective interaction node to obtain the at least one expression capability initial score: judging whether a standard reply sentence exists or not according to the second reading record and the machine output sentence of the currently processed effective interaction node: if the standard reply sentence exists, determining the association degree between the user input sentence of the currently processed effective interaction node and the machine output sentence of the currently processed effective interaction node; and judging whether the association degree is larger than a preset value or not: if the relevancy is larger than the preset value, determining statement similarity according to the user input statement of the currently processed effective interaction node and the standard reply statement, wherein the statement similarity is used for representing the accuracy of the currently processed user input statement; determining the sentence similarity as an expression capacity initial score corresponding to the currently processed effective interaction node; if the association degree is smaller than or equal to the preset value, determining that the expression capacity initial score corresponding to the currently processed effective interaction node is a preset low value; if no standard reply sentence exists, obtaining the structural integrity of the target sentence according to the user input sentence of the effective interaction node which is processed currently; and determining the structural integrity of the target sentence as an 
expression capability initial score corresponding to the currently processed effective interaction node.
In one possible example, in the aspect of interacting with the voice server according to the current user input voice to obtain third voice data, the voice data obtaining unit 702 is specifically configured to: send to the voice server a voice data request message carrying the voice data corresponding to the current user input voice and a third content indication message, wherein the third content indication message is used for indicating that the voice data carried by the voice data response message is the third voice data, and the prompt sentence in the machine output sentence corresponding to the third voice data is ordered after the reply sentence; and, in response to the voice data response message sent by the voice server, extract the third voice data carried by the voice data response message.
In one possible example, the preset return operation includes at least one of: aiming at touch operation, preset return gesture and preset voice input keywords of a return application interface control preset on the target voice interaction interface.
In the case of using integrated units, as shown in fig. 7b, fig. 7b is a functional unit block diagram of another response device for a prompt sentence according to an embodiment of the present application. In fig. 7b, the response device 71 of the prompt sentence comprises: a processing module 712 and a communication module 711. The processing module 712 is configured to control and manage the actions of the response device of the prompt sentence, for example, the steps of the first voice display unit 701, the voice data obtaining unit 702, the second voice display unit 703 and the skip display unit 704, and/or other processes for performing the techniques described herein. The communication module 711 is used to support interaction between the response device of the prompt sentence and other devices. As shown in fig. 7b, the response device of the prompt sentence may comprise a memory module 713, the memory module 713 being configured to store program code and data of the response device of the prompt sentence.
The processing module 712 may be a processor or controller, such as a central processing unit (Central Processing Unit, CPU), general purpose processor, digital signal processor (Digital Signal Processor, DSP), ASIC, FPGA or other programmable logic device, transistor logic device, hardware components, or any combination thereof. Which may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that performs the function of a computation, e.g., a combination comprising one or more microprocessors, a combination of a DSP and a microprocessor, and the like. The communication module 711 may be a transceiver, an RF circuit, a communication interface, or the like. The memory module 713 may be a memory.
For all relevant content of each scenario involved in the above method embodiment, reference may be made to the functional description of the corresponding functional module, which is not repeated here. The response device 71 of the prompt statement may perform the response method of the prompt statement shown in fig. 2.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Fig. 8 is a block diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 800 may include one or more of the following components: a processor 801, a memory 802 coupled to the processor 801, wherein the memory 802 may store one or more computer programs that may be configured to implement the methods as described in the embodiments above when executed by the one or more processors 801. The terminal device 800 may be the wearable device 110 in the above-described embodiments.
Processor 801 may include one or more processing cores. The processor 801 connects various parts within the terminal device 800 using various interfaces and lines, and performs the various functions of the terminal device 800 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 802 and invoking data stored in the memory 802. Alternatively, the processor 801 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), or programmable logic array (Programmable Logic Array, PLA). The processor 801 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 801 and may instead be implemented by a separate communication chip.
The Memory 802 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (ROM). Memory 802 may be used to store instructions, programs, code, sets of codes, or instruction sets. The memory 802 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The storage data area may also store data created by the terminal device 800 in use, etc.
It will be appreciated that the terminal device 800 may include more or fewer structural elements than those shown in the above-described block diagrams and is not limited in this regard.
The embodiments of the present application also provide a computer storage medium having stored thereon a computer program/instruction which, when executed by a processor, performs part or all of the steps of any of the methods described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative; for example, the division of the units is only one logic function division, and other division modes can be adopted in actual implementation; for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist physically separately, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware or in the form of hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, or volatile or nonvolatile memory. The nonvolatile memory may be a read-only memory (read-only memory, ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
Although the present application is disclosed above, it is not limited thereto. Those skilled in the art may readily make variations and modifications, including combinations of the different functions and implementation steps as well as software and hardware embodiments, without departing from the spirit and scope of the application.

Claims (10)

1. A response method for a prompt statement, characterized in that the method is applied to a wearable device in a voice interaction system, the voice interaction system comprising the wearable device and a voice server; the method comprises the following steps:
responding to a voice input operation of a user on a target voice interaction interface, acquiring the current user input voice recorded by the voice input operation, and displaying the current user input voice in the target voice interaction interface, wherein the target voice interaction interface is associated with a target interaction scenario; and,
if the continuous display duration of the target voice interaction interface is less than a preset display duration, interacting with the voice server to obtain voice data of a first machine output statement adapted to the current user input voice, wherein the content of the first machine output statement is a reply statement corresponding to a current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario;
if the continuous display duration of the target voice interaction interface is greater than or equal to the preset display duration, interacting with the voice server to obtain voice data of a second machine output statement adapted to a first reading record and the current user input voice, wherein the content of the second machine output statement comprises the reply statement and/or a target prompt statement, the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within a preset interaction duration, the target prompt statement is a first prompt statement or a second prompt statement, the first prompt statement is a complete prompt statement used for prompting the user to input voice conforming to the current interaction node, and the second prompt statement is an incomplete prompt statement used for prompting the user to input voice conforming to the current interaction node;
displaying, in the target voice interaction interface, the machine output voice corresponding to the voice data of the first machine output statement or the second machine output statement; and
repeating the above process until each interaction node in the target interaction scenario is completed, or, in response to a preset return operation of the user on the target voice interaction interface, jumping from the target voice interaction interface to display a target application interface.
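The duration-based branching in claim 1 can be sketched as follows. This is an illustrative Python sketch only; the function name, field names, and message shape are assumptions, not part of the disclosure:

```python
def choose_voice_request(display_duration_s, preset_display_duration_s,
                         current_user_voice, first_reading_record):
    """Decide which machine output statement to request from the voice server.

    Below the preset display duration, only a reply statement for the current
    interaction node is requested (first machine output statement); at or
    above it, the first reading record is sent along so the server can also
    decide whether to attach a complete or incomplete prompt statement
    (second machine output statement).
    """
    if display_duration_s < preset_display_duration_s:
        return {"statement": "first", "voice": current_user_voice}
    return {"statement": "second",
            "voice": current_user_voice,
            "reading_record": first_reading_record}
```

The sketch assumes the boundary case (display duration exactly equal to the preset duration) falls into the second branch, matching the "greater than or equal to" wording of the claim.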
2. The method of claim 1, wherein said interacting with the voice server to obtain voice data of the second machine output statement adapted to the first reading record and the current user input voice comprises:
determining a plurality of historical input voices input by the user according to the first reading record;
determining a plurality of matching degrees according to the plurality of historical input voices and the target interaction scenario, wherein a matching degree refers to the degree of association between the text content of a single historical input voice and the corresponding interaction node, and the plurality of matching degrees correspond one-to-one to the plurality of historical input voices;
confirming, according to the plurality of matching degrees, that the historical input voices whose matching degrees are higher than a preset matching degree are target input voices;
judging whether the ratio of the number of the target input voices to the number of the plurality of historical input voices is greater than or equal to a preset ratio:
if the ratio is greater than or equal to the preset ratio, determining that the user has high reading confidence, and interacting with the voice server to obtain first voice data or second voice data adapted to the current user input voice and a pre-stored second reading record, wherein the second reading record comprises, for each interaction node in all interaction scenarios completed by the user, the machine output statement and the corresponding user input statement, the user input statement being the text content corresponding to the user input voice input by the user;
if the ratio is less than the preset ratio, determining that the user has low reading confidence, and interacting with the voice server according to the current user input voice to obtain third voice data, wherein the content of the second machine output statement corresponding to the third voice data is the reply statement and the first prompt statement.
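The ratio test in claim 2 reduces to counting the historical inputs whose matching degree clears the preset threshold and comparing their share against the preset ratio. A minimal sketch, with hypothetical names and thresholds:

```python
def reading_confidence(matching_degrees, preset_matching_degree, preset_ratio):
    """Classify the user's reading confidence from per-node matching degrees.

    A historical input voice counts as a "target input voice" when its
    matching degree exceeds the preset matching degree; confidence is high
    when the share of target inputs reaches the preset ratio.
    """
    if not matching_degrees:
        # no history yet: treat as low confidence (an assumption; the
        # claim does not cover the empty case)
        return "low"
    targets = [m for m in matching_degrees if m > preset_matching_degree]
    ratio = len(targets) / len(matching_degrees)
    return "high" if ratio >= preset_ratio else "low"
```

For example, with matching degrees `[0.9, 0.8, 0.3]`, a preset matching degree of 0.5, and a preset ratio of 0.6, two of three inputs qualify (ratio ≈ 0.67), so confidence is high.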
3. The method of claim 2, wherein said interacting with the voice server to obtain the first voice data or the second voice data adapted to the current user input voice and the pre-stored second reading record comprises:
parsing the current user input voice to obtain a target user intention;
judging whether the intention of the target user is complete or not:
if the target user intention is complete, sending, to the voice server, a voice data request message carrying the voice data corresponding to the current user input voice and a first content indication message, wherein the first content indication message is used for indicating that the voice data carried by a voice data response message is the first voice data; and, in response to the voice data response message sent by the voice server, extracting the first voice data carried by the voice data response message;
if the target user intention is incomplete, determining an expression capability score of the user according to the second reading record, wherein the expression capability score refers to a score obtained by evaluating the voice responses of the user according to the second reading record and a preset scoring rule; obtaining a prompt statement confirmation result according to the expression capability score, wherein the prompt statement confirmation result is used for indicating that the target prompt statement is the first prompt statement or the second prompt statement; sending, to the voice server, a voice data request message carrying the voice data corresponding to the current user input voice, the prompt statement confirmation result, and a second content indication message, wherein the second content indication message is used for indicating that the voice data carried by the voice data response message is the second voice data; and, in response to the voice data response message sent by the voice server, extracting the second voice data carried by the voice data response message.
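Claim 3 selects the content indication message based on whether the parsed intent is complete. One way the request message could look, with all field names and the prompt-confirmation values being assumptions for illustration:

```python
def build_voice_data_request(intent_is_complete, voice_payload,
                             prompt_confirmation=None):
    """Build the voice data request message sent to the voice server.

    Complete intent  -> first content indication (reply statement only).
    Incomplete intent -> second content indication plus the prompt statement
    confirmation result (first vs. second prompt statement), which the
    server uses when assembling the second machine output statement.
    """
    if intent_is_complete:
        return {"voice": voice_payload, "content_indication": "first"}
    if prompt_confirmation not in ("first_prompt", "second_prompt"):
        raise ValueError("incomplete intent requires a prompt statement "
                         "confirmation result")
    return {"voice": voice_payload,
            "content_indication": "second",
            "prompt_confirmation": prompt_confirmation}
```

The `ValueError` guard reflects that, per the claim, the incomplete-intent request must always carry the prompt statement confirmation result.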
4. The method according to claim 3, wherein said determining an expression capability score of the user according to the pre-stored second reading record comprises:
determining at least one effective interaction node according to the second reading record, wherein an effective interaction node refers to an open reply node replied to by the user under high reading confidence, and an open reply node refers to an interaction node, marked in advance by a developer of the target interaction scenario, whose corresponding reply statement is open-ended rather than a multiple-choice answer;
obtaining at least one expression capability initial score according to the user input statement and the machine output statement corresponding to the at least one effective interaction node, wherein an expression capability initial score refers to a score obtained by evaluating the accuracy or the sentence structural integrity of the user input statement of a single effective interaction node, and the at least one expression capability initial score corresponds one-to-one to the at least one effective interaction node;
obtaining at least one reply difficulty according to the machine output statement corresponding to the at least one effective interaction node, wherein the at least one reply difficulty corresponds one-to-one to the at least one effective interaction node, and a reply difficulty is related to the length and the number of keywords of the corresponding machine output statement;
acquiring the user age of the user and a preset expression capability scoring formula, wherein the expression capability scoring formula is used for calculating the expression capability score of the user at the current age, and comprises an age variable, an expression capability initial score variable, a reply difficulty variable, and a weight coefficient corresponding to each variable; and
obtaining the expression capability score according to the expression capability scoring formula, the user age, the at least one expression capability initial score, and the at least one reply difficulty.
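Claim 4 names the variables of the scoring formula but does not disclose its concrete form. The sketch below is one hypothetical instance that combines an age variable, the per-node initial scores, the per-node reply difficulties, and weight coefficients; the weights, the age normalization, and the child-oriented boost are all assumptions:

```python
def expression_capability_score(user_age, initial_scores, reply_difficulties,
                                w_age=0.2, w_score=0.6, w_difficulty=0.2):
    """Hypothetical weighted form of the expression capability scoring formula.

    Each node's initial score is credited more when its reply difficulty is
    higher, and a linear age normalization gives younger users (the device
    is assumed child-oriented) up to w_age extra credit.
    """
    if len(initial_scores) != len(reply_difficulties):
        raise ValueError("one reply difficulty per effective interaction node")
    n = len(initial_scores)
    # difficulty-weighted mean of the per-node initial scores
    weighted_mean = sum(s * (1 + w_difficulty * d)
                        for s, d in zip(initial_scores, reply_difficulties)) / n
    # users younger than 12 receive a boost that shrinks linearly with age
    age_factor = 1 + w_age * max(0.0, (12 - user_age) / 12)
    return w_score * weighted_mean * age_factor
```

For instance, a 6-year-old with a single node scored 1.0 at difficulty 0 gets 0.6 × 1.0 × 1.1 = 0.66 under these illustrative weights.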
5. The method of claim 4, wherein said obtaining the at least one expression capability initial score according to the user input statement and the machine output statement corresponding to the at least one effective interaction node comprises:
performing the following operations on the user input statement and the machine output statement corresponding to each effective interaction node, to obtain the at least one expression capability initial score:
judging, according to the second reading record and the machine output statement of the currently processed effective interaction node, whether a standard reply statement exists:
if the standard reply statement exists, determining the degree of association between the user input statement of the currently processed effective interaction node and the machine output statement of the currently processed effective interaction node, and judging whether the degree of association is greater than a preset value:
if the degree of association is greater than the preset value, determining a statement similarity according to the user input statement of the currently processed effective interaction node and the standard reply statement, wherein the statement similarity is used for representing the accuracy of the currently processed user input statement, and determining the statement similarity as the expression capability initial score corresponding to the currently processed effective interaction node;
if the degree of association is less than or equal to the preset value, determining that the expression capability initial score corresponding to the currently processed effective interaction node is a preset low value;
if no standard reply statement exists, obtaining a target sentence structural integrity according to the user input statement of the currently processed effective interaction node, and determining the target sentence structural integrity as the expression capability initial score corresponding to the currently processed effective interaction node.
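The branching in claim 5 can be sketched as a single function. Because the claim does not fix how the degree of association, similarity, or structural integrity are computed, they are passed in as callables; all names and defaults are assumptions:

```python
def node_initial_score(user_statement, machine_statement, standard_reply,
                       relevance, similarity, integrity,
                       relevance_threshold=0.5, preset_low_value=0.1):
    """Score one effective interaction node per the claim-5 branching.

    With a standard reply: score by similarity to it when the user's
    statement is sufficiently associated with the machine's statement,
    otherwise fall back to the preset low value. Without a standard reply:
    score by the structural integrity of the user's statement.
    """
    if standard_reply is not None:
        if relevance(user_statement, machine_statement) > relevance_threshold:
            return similarity(user_statement, standard_reply)
        return preset_low_value
    return integrity(user_statement)
```

For example, one could plug in `difflib.SequenceMatcher(None, a, b).ratio()` from the Python standard library for both `relevance` and `similarity` as a crude text-matching stand-in.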
6. The method of claim 2, wherein said interacting with the voice server to obtain third voice data based on the current user input voice comprises:
sending, to the voice server, a voice data request message carrying the voice data corresponding to the current user input voice and a third content indication message, wherein the third content indication message is used for indicating that the voice data carried by the voice data response message is the third voice data, and in the machine output statement corresponding to the third voice data, the prompt statement is ordered after the reply statement; and
And responding to the voice data response message sent by the voice server, and extracting the third voice data carried by the voice data response message.
7. The method of claim 1, wherein the preset return operation comprises at least one of:
a touch operation on a return-to-application-interface control preset on the target voice interaction interface, a preset return gesture, and a preset voice input keyword.
8. A response device for a prompt statement, characterized in that the device is applied to a wearable device in a voice interaction system, the voice interaction system comprising the wearable device and a voice server; the device comprises:
a first voice display unit, configured to respond to a voice input operation of a user on a target voice interaction interface, acquire the current user input voice recorded by the voice input operation, and display the current user input voice in the target voice interaction interface, wherein the target voice interaction interface is associated with a target interaction scenario;
a voice data acquisition unit, configured to: if the continuous display duration of the target voice interaction interface is less than a preset display duration, interact with the voice server to obtain voice data of a first machine output statement adapted to the current user input voice, wherein the content of the first machine output statement is a reply statement corresponding to a current interaction node, and the current interaction node is any one of a plurality of interaction nodes in the target interaction scenario; and, if the continuous display duration of the target voice interaction interface is greater than or equal to the preset display duration, interact with the voice server to obtain voice data of a second machine output statement adapted to a first reading record and the current user input voice, wherein the content of the second machine output statement comprises the reply statement and/or a target prompt statement, the first reading record is an interaction record generated by the user continuously reading the target interaction scenario within a preset interaction duration, the target prompt statement is a first prompt statement or a second prompt statement, the first prompt statement is a complete prompt statement used for prompting the user to input voice conforming to the current interaction node, and the second prompt statement is an incomplete prompt statement used for prompting the user to input voice conforming to the current interaction node;
a second voice display unit, configured to display, in the target voice interaction interface, the machine output voice corresponding to the voice data of the first machine output statement or the second machine output statement; and
a skip display unit, configured to repeat the above process until each interaction node in the target interaction scenario is completed, or, in response to a preset return operation of the user on the target voice interaction interface, jump from the target voice interaction interface to display a target application interface.
9. A wearable device, comprising a processor, a memory, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any one of claims 1-7.
10. A computer program product, comprising a computer program/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-7.
CN202310952166.2A 2023-07-28 2023-07-28 Prompt statement response method and related device Pending CN117032452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310952166.2A CN117032452A (en) 2023-07-28 2023-07-28 Prompt statement response method and related device

Publications (1)

Publication Number Publication Date
CN117032452A true CN117032452A (en) 2023-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination