CN111524515A - Voice interaction method and device, electronic equipment and readable storage medium - Google Patents

Voice interaction method and device, electronic equipment and readable storage medium

Info

Publication number
CN111524515A
Authority
CN
China
Prior art keywords
reply
information
feature information
voice
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010360957.2A
Other languages
Chinese (zh)
Inventor
李俊彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Electronic Technology (Wuhan) Co., Ltd.
Original Assignee
Hisense Electronic Technology (Wuhan) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Electronic Technology (Wuhan) Co., Ltd.
Priority to CN202010360957.2A
Publication of CN111524515A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 Feedback of the input speech

Abstract

The embodiment of the application provides a voice interaction method and device, electronic equipment and a readable storage medium, which collect voice information of a user and generate at least two reply sentences according to the voice information; acquire feature information of each of the at least two reply sentences; and determine a target reply sentence among the at least two reply sentences according to pre-configured personalized feature information and the feature information of each reply sentence, and output the target reply sentence. In the embodiment of the application, when the voice interaction terminal is configured with different personalized feature information, the output reply sentence differs even if the input voice information is the same. Because the output reply sentence is related to the personalized feature information configured by the user in advance, the voice interaction process becomes more personalized.

Description

Voice interaction method and device, electronic equipment and readable storage medium
Technical Field
The embodiment of the application relates to the technical field of voice interaction, in particular to a voice interaction method and device, electronic equipment and a readable storage medium.
Background
With the development of speech recognition technology, more and more electronic devices are equipped with voice assistants, through which users can perform voice interaction with the electronic devices to solve various problems encountered in daily life.
Most of the voice assistants configured on the existing electronic devices collect voice sent by a user, convert the voice into a corresponding text by using a voice recognition technology, find reply information which is most matched with the text from a preset database based on the text, and output the reply information.
However, since the database is configured in a unified manner, the voice assistant outputs the same reply information to any type of user as long as the input voice information is the same. For example, after user A and user B each say "I have a cold" to the voice assistant, the reply messages output by the voice assistant are identical, yet the replies the two users expect to receive may differ: user A may want a way to cure the cold, while user B may want some human warmth and care. In other words, the voice interaction process of the existing voice assistant lacks personalized configuration and feels monotonous.
Disclosure of Invention
The embodiment of the application provides a voice interaction method, a voice interaction device, electronic equipment and a readable storage medium, which can solve the technical problem that the voice interaction process of the existing voice assistant is lack of personalized configuration.
In a first aspect, an embodiment of the present invention provides a voice interaction method, where the method includes:
collecting voice information of a user, and generating at least two reply sentences according to the voice information;
acquiring characteristic information of each reply statement in the at least two reply statements;
determining a target reply statement in the at least two reply statements according to the pre-configured personalized feature information and the feature information of each reply statement;
and outputting the target reply statement.
In a possible embodiment, the collecting voice information of the user and generating at least two reply sentences according to the voice information includes:
converting the voice information into a first text;
converting the first text into a second text that conforms to a specified standard;
and retrieving the associated information corresponding to the first text and the second text in a preset database to obtain the at least two reply sentences.
In a possible implementation manner, the obtaining feature information of each of the at least two reply sentences includes:
extracting key information from each reply statement, wherein the key information comprises any one or more of the following information: keywords, key phrases, and key sentences;
and determining the characteristic information of each reply statement according to the characteristic label associated with the key information.
In a possible implementation manner, the determining a target reply sentence among the at least two reply sentences according to the preconfigured personalized feature information and the feature information of each reply sentence includes:
determining the matching degree between the personalized feature information and the feature information of each reply sentence;
and determining the reply sentence corresponding to the feature information with the highest matching degree of the personalized feature information as the target reply sentence.
In a possible implementation manner, the determining a matching degree between the personalized feature information and the feature information of each reply sentence includes:
traversing each reply statement, and determining the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply statement;
and determining the sum of the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply sentence as the matching degree between the personalized feature information and the traversed feature information of the reply sentence.
In one possible implementation, the outputting the target reply statement includes:
determining broadcast tone corresponding to the personalized feature information;
and according to the broadcast tone, converting the reply statement into broadcast voice and broadcasting.
In a possible embodiment, the personalized feature information includes any one or more of the following feature tags: age, personality, occupation, sex, intellectual quotient, role.
In a second aspect, an embodiment of the present invention provides a voice interaction apparatus, where the apparatus includes:
the reply sentence generation module is used for acquiring voice information of a user and generating at least two reply sentences according to the voice information;
the acquisition module is used for acquiring the characteristic information of each reply statement in the at least two reply statements;
the determining module is used for determining a target reply statement in the at least two reply statements according to the preset personalized feature information and the feature information of each reply statement;
and the output module is used for outputting the target reply statement.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory, causing the at least one processor to perform the voice interaction method as provided by the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for voice interaction as provided in the first aspect is implemented.
The voice interaction method and device, electronic equipment and readable storage medium provided by the embodiments of the application collect voice information of a user and generate at least two reply sentences according to the voice information; acquire feature information of each of the at least two reply sentences; and determine a target reply sentence among the at least two reply sentences according to pre-configured personalized feature information and the feature information of each reply sentence, and output the target reply sentence. In the embodiment of the application, after the voice interaction terminal collects the voice information of the user, at least two reply sentences are generated, and then one target reply sentence is selected from the at least two reply sentences for output according to the feature information of the at least two reply sentences and the personalized feature information pre-configured by the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a voice interaction system according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a voice interaction method provided in an embodiment of the present invention;
fig. 3 is a schematic flowchart illustrating a detailed step of step S201 in a voice interaction method according to an embodiment of the present application;
fig. 4 is a schematic view of a usage scenario of a voice interaction method according to an embodiment of the present application;
FIG. 5 is a block diagram of a voice interaction apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The terms "first," "second," and the like in the embodiments of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the designations used as such may be interchanged under appropriate circumstances in order to facilitate describing the embodiments of the application.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a voice interaction system provided in an embodiment of the present application. As shown in fig. 1, the system provided by the present embodiment includes a voice interaction terminal 101 and a server 102. The voice interaction terminal 101 may be a children's story machine, a mobile phone, a tablet computer, a vehicle-mounted terminal, a television, a smart speaker, a wearable smart device, a smart home appliance, or the like. The implementation of the voice interaction terminal 101 is not particularly limited in this embodiment, as long as the voice interaction terminal 101 can perform voice interaction with the user.
Voice interaction (speech interaction) is based on technologies such as speech recognition, natural language understanding and speech synthesis, and gives the terminal an intelligent human-machine interaction experience of the "can listen, can speak, and understands you" type in a variety of practical application scenarios. It is applicable to multiple application scenarios, including intelligent question answering, intelligent playback, intelligent search and other scenarios.
The user inputs a question-and-answer sentence to the voice interaction terminal 101 by voice, and the voice interaction terminal 101 can obtain a question-and-answer result according to the sentence and feed the result back to the user. Specifically, the voice interaction terminal 101 may obtain the question-and-answer result locally, according to a corpus stored on the terminal itself; or it may send the query sentence to the server 102, and the server 102 obtains the question-and-answer result from a preset database and feeds the result back to the voice interaction terminal 101. This embodiment does not specifically limit the implementation: the question-and-answer result may be obtained locally by the voice interaction terminal 101 or obtained by the server 102 according to the query sentence.
In the existing voice interaction process, the corpus stored on the terminal itself and the database preset on the server 102 are both configured in a unified manner, so when the query sentences input by users are the same, the question-and-answer result output by the voice interaction terminal 101 is also the same. However, different users may expect different responses to the same query sentence: for example, after user A and user B each say "I have a cold" to the voice interaction terminal 101, user A may want to receive a way to cure the cold, while user B may want to receive some human warmth and care. In other words, the existing voice interaction process can hardly meet the personalized requirements of users.
In order to solve the technical problem, the present application provides a voice interaction method, which includes acquiring voice information of a user, generating at least two reply sentences according to the voice information, then obtaining feature information of each reply sentence of the at least two reply sentences, determining a target reply sentence in the at least two reply sentences according to pre-configured personalized feature information and the feature information of each reply sentence, and outputting the target reply sentence. Therefore, when the voice interaction terminal is configured with different personalized feature information, even if the input voice information is the same, the output reply sentence is different, and the output reply sentence in the application is related to the personalized feature information configured by the user in advance, so that the voice interaction process can be more personalized. The following examples are given for illustrative purposes.
Referring to fig. 2, fig. 2 is a schematic flow diagram of a voice interaction method provided in an embodiment of the present application. The execution subject in the embodiment of the present application may be the voice interaction terminal in the embodiment shown in fig. 1, or may be the server in the embodiment shown in fig. 1; this embodiment is not limited herein. As shown in fig. 2, the voice interaction method includes:
s201, collecting voice information of a user, and generating at least two reply sentences according to the voice information.
In the embodiment of the application, a microphone array is arranged in the voice interaction terminal, when the voice interaction terminal enters a voice interaction state, voice information sent by surrounding users is collected through the microphone array, then the voice information is identified, and at least two reply sentences are generated according to an identification result.
Optionally, the voice interaction terminal may use a local voice recognition system to recognize the voice information input by the user, and obtain a recognition result.
Or, the voice interaction terminal may also send the voice information input by the user to the server, and then the server identifies the voice information to obtain an identification result, and feeds the identification result back to the voice interaction terminal.
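As an illustration of this collection-and-recognition step, the following sketch captures one utterance and converts it to text with the open-source SpeechRecognition package; the choice of package and of the recognize_google backend is an assumption made only for illustration, not the recognition system described in this application.

```python
# Hypothetical sketch: capture one utterance and convert it to text.
# The SpeechRecognition package and the Google backend are illustrative choices,
# not the recognition system described in this application.
import speech_recognition as sr

def capture_and_recognize(language: str = "zh-CN") -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:            # microphone (array) input
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)      # wait for one utterance
    # A local engine (e.g. recognize_sphinx) or a remote server could be
    # substituted here, mirroring the two alternatives described above.
    return recognizer.recognize_google(audio, language=language)

if __name__ == "__main__":
    print(capture_and_recognize())
```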
Optionally, after obtaining the recognition result corresponding to the voice information input by the user, the voice interaction terminal may query, based on the recognition result, reply information related to the recognition result from a corpus stored in the voice interaction terminal, and then generate at least two reply sentences according to the reply information.
Or after the voice interaction terminal obtains the recognition result corresponding to the voice information input by the user, the server may also query reply information related to the recognition result from a preset database, then generate at least two reply statements according to the reply information, and feed back the generated at least two reply statements to the voice interaction terminal.
For example, after the voice interaction terminal collects the voice information "I have a cold" input by the user, reply information related to "a cold" can be queried in the local corpus or in the database preset on the server, and at least two reply sentences can be generated, such as "if you have a cold, drink more hot water", "the nearest pharmacy is 500 meters away from you and sells cold medicine", and "a cold is a common acute upper respiratory tract viral infection; pay more attention to rest and eat less greasy and cold food".
S202, acquiring characteristic information of each reply statement in the at least two reply statements.
In the embodiment of the application, the feature information of each reply statement can be determined according to the semantics or the key information of each reply statement.
For example, for "catch a cold and drink too much", the reply sentence generally comes from the literary care of relatives and friends, and the corresponding characteristic information can be "friends", "body stickers", and the like; for the 'nearest previous pharmacy is 500 meters away from you, the pharmacy has cold medicine sales', the reply sentence is generally intelligently pushed by a merchant, and the corresponding characteristic information can be 'pushing', 'merchant' and the like; for the "cold is a common acute upper respiratory viral infectious disease, which requires more attention to rest and less greasy and cold food, the reply sentence generally comes from professionals such as doctors, and the corresponding characteristic information can be" professional "," popular science ", etc.
S203, determining a target reply sentence in the at least two reply sentences according to the preset personalized feature information and the feature information of each reply sentence.
In the embodiment of the application, the user can configure the personalized feature information of the voice interaction terminal based on the personalized requirement of the user.
For example, if the user wants the voice interaction terminal to interact with the user in the identity of a friend, the personalized feature information of the voice interaction terminal may be configured in advance as "friend", "considerate" and the like. If the user wants the voice interaction terminal to interact in the identity of a professional, the personalized feature information of the voice interaction terminal may be configured in advance as "professional", "popular science" and the like.
It can be understood that, in the embodiment of the present application, a reply sentence with characteristic information identical or similar to the personalized characteristic information may be selected as a target reply sentence from the at least two reply sentences according to the preconfigured personalized characteristic information.
For example, when the personalized feature information of the voice interaction terminal is pre-configured as a "friend", a reply sentence, of which the feature information is also the "friend", in the at least two reply sentences is taken as a target reply sentence; when the personalized feature information of the voice interaction terminal is pre-configured as a professional, the reply sentence of which the feature information is the professional in the at least two reply sentences is taken as the target reply sentence.
And S204, outputting the target reply statement.
In the embodiment of the application, after the target reply sentence is determined, the target reply sentence can be displayed on a display interface of the voice interaction terminal in a text mode. Or, the target reply statement can be broadcasted after being converted into voice in a voice broadcasting mode.
According to the voice interaction method provided by the embodiment of the application, the voice interaction terminal generates at least two reply sentences after acquiring the voice information of the user, and then selects one target reply sentence from the at least two reply sentences to output according to the feature information of the at least two reply sentences and the personalized feature information pre-configured by the user, so that when the voice interaction terminal is configured with different personalized feature information, the output reply sentences can be different even if the input voice information is the same, and the output reply sentences in the application are related to the personalized feature information pre-configured by the user, so that the voice interaction process can be more personalized.
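To make steps S201 to S204 concrete, the following is a minimal, self-contained sketch of the overall flow; the candidate replies, the tag table and the exact-match scoring are stand-ins invented for illustration, not the implementation claimed in this application.

```python
# Toy, self-contained sketch of S201-S204. The candidate replies, their tags and
# the exact-match scoring are illustrative stand-ins, not the patented method.
from typing import Dict, List

CANDIDATES: Dict[str, List[str]] = {          # reply sentence -> feature tags (S202)
    "You have a cold, drink more hot water.": ["friend", "considerate"],
    "A cold is a common acute upper-respiratory viral infection; "
    "rest well and avoid greasy, cold food.": ["doctor", "professional"],
}

def matching_degree(profile: List[str], tags: List[str]) -> int:
    # Simplest possible score: number of shared tags (a real system would sum
    # per-tag similarities, as described later in this application).
    return len(set(profile) & set(tags))

def voice_interaction(recognized_text: str, profile: List[str]) -> str:
    # recognized_text would drive retrieval/generation in a full system (S201);
    # here the candidate replies are fixed for brevity.
    replies = list(CANDIDATES)
    target = max(replies, key=lambda r: matching_degree(profile, CANDIDATES[r]))  # S203
    print(target)                             # S204: text output (or TTS broadcast)
    return target

if __name__ == "__main__":
    voice_interaction("I have a cold", profile=["friend", "considerate"])
```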
Based on the content described in the foregoing embodiment, and with reference to fig. 3, fig. 3 is a schematic flowchart of the detailed steps of step S201 in a voice interaction method provided in an embodiment of the present application. In a feasible implementation of the present application, collecting the voice information of the user and generating at least two reply sentences according to the voice information, as described in step S201, specifically includes:
s301, converting the voice information into a first text.
In the embodiment of the application, after the voice interaction terminal collects the voice information of the user, the voice information can be converted into the first text through the voice recognition technology. For example, after voice information of 'i have a cold' sent by the user is collected, the voice information is converted into a first text through a voice recognition technology: i catch a cold.
S302, converting the first text into a second text that conforms to a specified standard.
In the embodiment of the application, after the first text is obtained, the first text may be converted into a second text with a specified standard through semantic recognition, part-of-speech tagging and other manners.
The specified standard can be understood as a standard sentence which can be recognized by the voice interaction terminal or the voice interaction system. For example, the first text "i have got a cold" may be converted into a second text: the user catches a cold.
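A minimal sketch of this normalization step is shown below; the rewrite rules and their patterns are purely illustrative assumptions about how a first text might be mapped to a standardized second text.

```python
# Illustrative-only normalization of the recognized "first text" into a
# standardized "second text"; the patterns below are invented examples.
import re

NORMALIZATION_RULES = [
    (re.compile(r"^I have (got )?a cold\.?$", re.IGNORECASE), "the user catches a cold"),
    (re.compile(r"^I('m| am) hungry\.?$", re.IGNORECASE), "the user is hungry"),
]

def to_second_text(first_text: str) -> str:
    for pattern, standard_form in NORMALIZATION_RULES:
        if pattern.match(first_text.strip()):
            return standard_form
    return first_text  # fall back to the raw first text when no rule applies

print(to_second_text("I have got a cold"))  # -> "the user catches a cold"
```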
S303, retrieving the associated information corresponding to the first text and the second text in a preset database to obtain the at least two reply sentences.
In the embodiment of the present application, after the first text and the second text are determined, the associated information corresponding to the first text and the second text may be retrieved from a preset database.
For example, when the first text is "I have a cold" and the second text is "the user catches a cold", a cold treatment plan, cold science-popularization information, cold symptom analysis information, caring words, comforting words, jokes and other information related to "a cold" can be retrieved.
The preset database may be a local database of the voice interaction terminal, or a local database of the server.
Optionally, the above search may be performed using an inverted index technique and/or a semantic search technique.
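For the inverted-index option, a minimal sketch is given below; the tiny corpus and the whitespace tokenizer are assumptions made only to show how postings are built and queried.

```python
# Minimal inverted-index retrieval sketch; the tiny corpus and whitespace
# tokenizer are assumptions made only to illustrate the idea.
from collections import defaultdict
from typing import Dict, List, Set

CORPUS: Dict[int, str] = {
    0: "cold treatment plan drink hot water and rest",
    1: "cold science information acute upper respiratory viral infection",
    2: "nearest pharmacy sells cold medicine 500 meters away",
}

def build_inverted_index(corpus: Dict[int, str]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in text.split():            # real systems use proper tokenization
            index[token].add(doc_id)
    return index

def retrieve(query: str, index: Dict[str, Set[int]]) -> List[str]:
    hits: Set[int] = set()
    for token in query.split():
        hits |= index.get(token, set())       # union of postings for each query term
    return [CORPUS[i] for i in sorted(hits)]

index = build_inverted_index(CORPUS)
print(retrieve("user catches a cold", index))
```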
Further, after the associated information corresponding to the first text and the second text is retrieved, key information such as keywords, key phrases, key sentences, etc. in the associated information may be extracted, and then the key information may be input into a Natural Language Generation (NLG) system, and at least two reply sentences may be generated by the NLG system.
An NLG system usually has two modes: one generates a reply sentence from a template, and the other generates a reply sentence from a model trained by deep learning.
Specifically, for the first mode, reply templates for common conversation scenarios may be preset in the database, and reply sentences are then generated from these templates using the extracted key information. For the second mode, a language generation model may be trained on historically labeled training data, and the extracted key information is then fed into the model to generate a reply sentence.
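The template mode can be sketched as follows; the template strings and the "illness" slot are invented for illustration, and a deployed system would hold many more templates per intent.

```python
# Template-mode NLG sketch (the first mode described above); the templates and
# slot names are invented for illustration.
from typing import Dict, List

REPLY_TEMPLATES: Dict[str, List[str]] = {
    "illness": [
        "You have {illness}, remember to drink more hot water.",
        "{illness} is usually self-limiting; rest well and avoid greasy food.",
    ],
}

def generate_replies(key_info: Dict[str, str]) -> List[str]:
    illness = key_info.get("illness")
    if not illness:
        return ["Sorry, I did not catch that."]
    # Fill every template for the matched intent so at least two candidates exist.
    return [template.format(illness=illness) for template in REPLY_TEMPLATES["illness"]]

print(generate_replies({"illness": "a cold"}))
```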
According to the voice interaction method provided by the embodiment of the application, after the voice information of a user is collected, the voice information is converted into a first text, and then the first text is converted into a second text with a specified standard; and then retrieving the associated information corresponding to the first text and the second text in a preset database, and generating at least two reply sentences according to the associated information. And searching based on the first text and the second text is helpful for improving the accuracy of the search result.
Based on the content described in the foregoing embodiment, in a possible implementation manner of the present application, the obtaining the feature information of each reply statement in the at least two reply statements described in step S202 specifically includes:
and extracting key information from each reply statement, and determining the characteristic information of each reply statement according to the characteristic tag associated with the key information. The key information includes any one or more of the following information: keywords, key phrases, key sentences.
Alternatively, the TextRank algorithm may be used to extract key information from each reply sentence. The TextRank algorithm is adapted from the PageRank web-page ranking algorithm; it can extract keywords and key phrases from a given text, and can extract key sentences of the text through extractive automatic summarization.
In the embodiment of the application, feature tags can be added in advance to the keywords, key phrases and key sentences in the database. For example, the feature tag "care" is added to "drink more hot water", the feature tag "professional" is added to "acute upper respiratory viral infection", and so on.
Assuming the reply sentence is "you have a cold, drink more hot water", the extracted key information includes "cold" and "drink more hot water"; the feature tag corresponding to "cold" is "disease" and the feature tag corresponding to "drink more hot water" is "care", so the feature information of this reply sentence can be determined to be "disease" and "care".
Likewise, assuming the reply sentence is "a cold is a common acute upper respiratory tract viral infection", the extracted key information includes "cold" and "acute upper respiratory tract viral infection"; the feature tag corresponding to "cold" is "disease" and the feature tag corresponding to "acute upper respiratory tract viral infection" is "professional", so the feature information of this reply sentence can be determined to be "disease" and "professional".
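A small sketch of this tag lookup is given below; the tag table and the simple substring scan are assumptions made for illustration, and a fuller system would first extract keywords with TextRank as described above.

```python
# Sketch of S202: determine a reply sentence's feature information from the
# feature tags pre-assigned to key words/phrases in the database. The tag table
# and the substring scan are assumptions made for illustration.
from typing import Dict, List

FEATURE_TAG_TABLE: Dict[str, str] = {      # key phrase -> feature tag
    "cold": "disease",
    "drink more hot water": "care",
    "acute upper respiratory viral infection": "professional",
}

def feature_info(reply_sentence: str) -> List[str]:
    sentence = reply_sentence.lower()
    tags = {tag for phrase, tag in FEATURE_TAG_TABLE.items() if phrase in sentence}
    return sorted(tags)

print(feature_info("You have a cold, drink more hot water."))
# -> ['care', 'disease'], matching the example above
```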
In the embodiment of the application, after the feature information of each reply sentence is determined, the matching degree between the preset personalized feature information and the feature information of each reply sentence is determined, and then the reply sentence corresponding to the feature information with the highest matching degree of the personalized feature information is determined as the target reply sentence.
Optionally, the personalized feature information may include any one or more of the following feature tags: age, personality, occupation, sex, intellectual quotient, role.
For better understanding of the embodiment of the present application, referring to fig. 4, fig. 4 is a schematic view of a usage scenario of a voice interaction method provided in the embodiment of the present application.
In the embodiment of the application, after the user inputs "I have a cold", two reply sentences are generated: "you have a cold, drink more hot water" and "a cold is a common acute upper respiratory tract viral infection; pay more attention to rest and eat less greasy and cold food". The feature information corresponding to "you have a cold, drink more hot water" includes: ordinary friend, care, IQ 100. The feature information corresponding to "a cold is a common acute upper respiratory tract viral infection; pay more attention to rest and eat less greasy and cold food" includes: doctor, professional answer, IQ 140.
Then, when the personalized feature parameters pre-configured by the user are "friend" and "care", the feature information corresponding to the reply sentence "you have a cold, drink more hot water" is closer to those personalized feature parameters, so this reply sentence can be output.
When the personalized feature parameter pre-configured by the user is "professional advice", since "doctor" belongs to the professional category and "professional advice" is essentially similar to "professional answer", the feature information corresponding to the reply sentence "a cold is a common acute upper respiratory viral infection; pay more attention to rest and eat less greasy and cold food" is closer to that personalized feature parameter, so this reply sentence can be output.
In a possible implementation manner, each reply sentence may be traversed, and the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply sentence is determined; and determining the sum of the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply sentence as the matching degree between the personalized feature information and the traversed feature information of the reply sentence.
It can be understood that when each feature tag in the personalized feature information has a high similarity to the feature tags in the feature information of a reply sentence, the matching degree between the personalized feature information and the feature information of that reply sentence is correspondingly higher.
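The matching degree described above can be sketched as the sum of pairwise tag similarities; in the sketch below, a generic string similarity from the standard library stands in for whatever tag-similarity measure a real system would use, so the specific scores are assumptions.

```python
# Sketch of the matching degree: the sum, over all tag pairs, of a similarity
# between the personalized tags and a reply's feature tags. difflib string
# similarity is only a stand-in for a real tag-similarity model.
from difflib import SequenceMatcher
from typing import List

def tag_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()    # 1.0 for identical tags

def matching_degree(personalized: List[str], reply_tags: List[str]) -> float:
    return sum(tag_similarity(p, r) for p in personalized for r in reply_tags)

profile = ["friend", "care"]
replies = {
    "You have a cold, drink more hot water.": ["ordinary friend", "care"],
    "A cold is an acute upper-respiratory viral infection; rest well.": ["doctor", "professional"],
}
target = max(replies, key=lambda r: matching_degree(profile, replies[r]))
print(target)   # the caring reply scores higher for this profile
```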
In this embodiment of the application, when there is only one target reply sentence, it may be output directly; when there are two or more target reply sentences, one reply sentence may be selected from them through a decision-tree ranking model and output. The decision-tree ranking model can be trained according to the importance of the personalized feature information.
After the target reply sentence is determined, the voice interaction terminal can display the target reply sentence on a display interface of the voice interaction terminal in a text form; the target reply sentence can also be broadcasted after being converted into voice.
In a feasible implementation, the broadcast tone corresponding to the personalized feature information may be determined first, and the reply sentence is then converted into broadcast voice according to that tone and broadcast, which makes the voice interaction more engaging.
For example, when the broadcast tone corresponding to the personalized feature information is soft, the reply sentence can be converted into broadcast voice with a soft tone and broadcast; when the broadcast tone corresponding to the personalized feature information is steady, the reply sentence can be converted into broadcast voice with a steady tone and broadcast.
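One way to realize such tone-dependent broadcasting is sketched below with the pyttsx3 text-to-speech package; both the package choice and the mapping from tone to rate and volume are assumptions made for illustration.

```python
# Illustrative-only TTS sketch using pyttsx3; the tone-to-parameter mapping is
# an assumption, not the broadcast mechanism defined in this application.
import pyttsx3

TONE_PRESETS = {
    "soft":   {"rate": 140, "volume": 0.7},
    "steady": {"rate": 170, "volume": 1.0},
}

def broadcast(reply_sentence: str, tone: str = "soft") -> None:
    engine = pyttsx3.init()
    preset = TONE_PRESETS.get(tone, TONE_PRESETS["steady"])
    engine.setProperty("rate", preset["rate"])      # speaking speed
    engine.setProperty("volume", preset["volume"])  # loudness, 0.0-1.0
    engine.say(reply_sentence)
    engine.runAndWait()

broadcast("You have a cold, drink more hot water.", tone="soft")
```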
Based on the content described in the foregoing embodiment, an embodiment of the present application further provides a voice interaction apparatus, and referring to fig. 5, fig. 5 is a schematic diagram of program modules of the voice interaction apparatus provided in the embodiment of the present application, in a possible implementation manner of the present application, the voice interaction apparatus 50 includes:
the reply statement generating module 501 is configured to collect voice information of a user, and generate at least two reply statements according to the voice information.
An obtaining module 502, configured to obtain feature information of each reply statement in at least two reply statements.
The determining module 503 is configured to determine a target reply sentence in the at least two reply sentences according to the preconfigured personalized feature information and the feature information of each reply sentence.
An output module 504, configured to output the target reply statement.
In the voice interaction device 50 provided in the embodiment of the present application, after the voice information of the user is collected, at least two reply sentences are generated, and then one target reply sentence is selected from the at least two reply sentences to be output according to the feature information of the at least two reply sentences and the personalized feature information pre-configured by the user, so that when the voice interaction terminal configures different personalized feature information, even if the input voice information is the same, the output reply sentences will be different, and since the output reply sentences in the present application are related to the personalized feature information pre-configured by the user, the voice interaction process can be more personalized.
In a possible implementation manner, the reply statement generation module 501 is specifically configured to:
converting the voice information into a first text;
converting the first text into a second text that conforms to a specified standard;
and searching the associated information corresponding to the first text and the second text in a preset database to obtain at least two reply sentences.
In a possible implementation, the obtaining module 502 is specifically configured to:
extracting key information from each reply statement, the key information including any one or more of the following: keywords, key phrases, and key sentences;
and determining the characteristic information of each reply statement according to the characteristic label associated with the key information.
In a possible implementation, the determining module 503 is specifically configured to:
determining the matching degree between the personalized feature information and the feature information of each reply sentence;
and determining the reply sentence corresponding to the characteristic information with the highest matching degree of the personalized characteristic information as a target reply sentence.
Optionally, traversing each reply statement, and determining a similarity between each feature tag in the personalized feature information and each feature tag in the feature information of the traversed reply statement; and determining the sum of the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply sentence as the matching degree between the personalized feature information and the traversed feature information of the reply sentence.
In one possible implementation, the output module 504 may be configured to:
determining broadcast tone corresponding to the personalized feature information;
according to the broadcast tone, the reply sentences are converted into broadcast voice and broadcast.
It should be noted that, in the embodiment of the present application, specific execution contents of the reply statement generating module 501, the obtaining module 502, the determining module 503 and the outputting module 504 may refer to related contents in the embodiments shown in fig. 2 to fig. 4, which are not described herein again.
Further, based on the content described in the foregoing embodiments, an electronic device is also provided in the embodiments of the present application, where the electronic device includes at least one processor and a memory; wherein the memory stores computer execution instructions; the at least one processor executes computer-executable instructions stored in the memory to implement the aspects described in the embodiments of the voice interaction method described above.
It should be understood that the electronic device provided in this embodiment may be used to implement the technical solutions of the above method embodiments; the implementation principles and technical effects are similar and are not described herein again.
For better understanding of the embodiment of the present application, referring to fig. 6, fig. 6 is a schematic diagram of a hardware structure of an electronic device provided in the embodiment of the present application.
As shown in fig. 6, the electronic device 60 of the present embodiment includes: a processor 601 and a memory 602; wherein
A memory 602 for storing computer-executable instructions;
the processor 601 is configured to execute the computer-executable instructions stored in the memory to implement the steps of the voice interaction method in the foregoing embodiments.
Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 602 may be separate or integrated with the processor 601.
When the memory 602 is provided separately, the device further comprises a bus 603 for connecting said memory 602 and the processor 601.
Based on the content in the foregoing embodiments, the present application further provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the steps in the voice interaction method in the foregoing embodiments are implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of voice interaction, the method comprising:
collecting voice information of a user, and generating at least two reply sentences according to the voice information;
acquiring characteristic information of each reply statement in the at least two reply statements;
determining a target reply statement in the at least two reply statements according to the pre-configured personalized feature information and the feature information of each reply statement;
and outputting the target reply statement.
2. The method of claim 1, wherein the collecting voice information of the user and generating at least two reply sentences according to the voice information comprises:
converting the voice information into a first text;
converting the first text into a second text that conforms to a specified standard;
and retrieving the associated information corresponding to the first text and the second text in a preset database to obtain the at least two reply sentences.
3. The method of claim 1, wherein the obtaining feature information of each of the at least two reply sentences comprises:
extracting key information from each reply statement, wherein the key information comprises any one or more of the following information: keywords, key phrases, and key sentences;
and determining the characteristic information of each reply statement according to the characteristic label associated with the key information.
4. The method according to any one of claims 1 to 3, wherein the determining a target reply sentence among the at least two reply sentences according to the pre-configured personalized feature information and the feature information of each reply sentence comprises:
determining the matching degree between the personalized feature information and the feature information of each reply sentence;
and determining the reply sentence corresponding to the feature information with the highest matching degree of the personalized feature information as the target reply sentence.
5. The method of claim 4, wherein the determining a matching degree between the personalized feature information and the feature information of each reply sentence comprises:
traversing each reply statement, and determining the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply statement;
and determining the sum of the similarity between each feature tag in the personalized feature information and each feature tag in the traversed feature information of the reply sentence as the matching degree between the personalized feature information and the traversed feature information of the reply sentence.
6. The method of any of claims 1 to 3, wherein the outputting the target reply statement comprises:
determining broadcast tone corresponding to the personalized feature information;
and according to the broadcast tone, converting the reply statement into broadcast voice and broadcasting.
7. The method according to any one of claims 1 to 3, wherein the personalized feature information comprises any one or more of the following feature tags: age, personality, occupation, sex, intellectual quotient, role.
8. A voice interaction apparatus, comprising:
the reply sentence generation module is used for acquiring voice information of a user and generating at least two reply sentences according to the voice information;
the acquisition module is used for acquiring the characteristic information of each reply statement in the at least two reply statements;
the determining module is used for determining a target reply statement in the at least two reply statements according to the preset personalized feature information and the feature information of each reply statement;
and the output module is used for outputting the target reply statement.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the voice interaction method of any of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the voice interaction method of any one of claims 1 to 7.
CN202010360957.2A 2020-04-30 2020-04-30 Voice interaction method and device, electronic equipment and readable storage medium Pending CN111524515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010360957.2A CN111524515A (en) 2020-04-30 2020-04-30 Voice interaction method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010360957.2A CN111524515A (en) 2020-04-30 2020-04-30 Voice interaction method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111524515A true CN111524515A (en) 2020-08-11

Family

ID=71905161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010360957.2A Pending CN111524515A (en) 2020-04-30 2020-04-30 Voice interaction method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111524515A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984763A (en) * 2020-08-28 2020-11-24 海信电子科技(武汉)有限公司 Question answering processing method and intelligent equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304154A (en) * 2017-09-19 2018-07-20 腾讯科技(深圳)有限公司 A kind of information processing method, device, server and storage medium
CN109920414A (en) * 2019-01-17 2019-06-21 平安城市建设科技(深圳)有限公司 Man-machine question-answering method, apparatus, device and storage medium
CN109949806A (en) * 2019-03-12 2019-06-28 百度国际科技(深圳)有限公司 Information interacting method and device
CN110473523A (en) * 2019-08-30 2019-11-19 北京大米科技有限公司 A kind of audio recognition method, device, storage medium and terminal
CN110740367A (en) * 2019-10-23 2020-01-31 海信电子科技(武汉)有限公司 Display device and voice instruction processing method
CN110880316A (en) * 2019-10-16 2020-03-13 苏宁云计算有限公司 Audio output method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304154A (en) * 2017-09-19 2018-07-20 腾讯科技(深圳)有限公司 A kind of information processing method, device, server and storage medium
CN109920414A (en) * 2019-01-17 2019-06-21 平安城市建设科技(深圳)有限公司 Man-machine question-answering method, apparatus, device and storage medium
CN109949806A (en) * 2019-03-12 2019-06-28 百度国际科技(深圳)有限公司 Information interacting method and device
CN110473523A (en) * 2019-08-30 2019-11-19 北京大米科技有限公司 A kind of audio recognition method, device, storage medium and terminal
CN110880316A (en) * 2019-10-16 2020-03-13 苏宁云计算有限公司 Audio output method and system
CN110740367A (en) * 2019-10-23 2020-01-31 海信电子科技(武汉)有限公司 Display device and voice instruction processing method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984763A (en) * 2020-08-28 2020-11-24 海信电子科技(武汉)有限公司 Question answering processing method and intelligent equipment
CN111984763B (en) * 2020-08-28 2023-09-19 海信电子科技(武汉)有限公司 Question answering processing method and intelligent device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination