CN107463636B - Voice interaction data configuration method and device and computer readable storage medium - Google Patents


Info

Publication number
CN107463636B
Authority
CN
China
Prior art keywords
voice
information
question
instruction
answer
Prior art date
Legal status
Active
Application number
CN201710581290.7A
Other languages
Chinese (zh)
Other versions
CN107463636A (en)
Inventor
钱庄 (Qian Zhuang)
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201710581290.7A
Publication of CN107463636A
Application granted
Publication of CN107463636B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Abstract

The disclosure provides a voice interaction data configuration method and device and a computer-readable storage medium, and belongs to the technical field of smart home. The method is applied to an audio device and includes the following steps: receiving a first voice instruction; when the first voice instruction is a data configuration instruction, playing a first voice file, where the first voice file is used to prompt the user to speak the specific content of the data to be configured; performing voice recognition on the received voice data to obtain the question information and answer information to be configured; and storing the question information and the answer information correspondingly on the audio device. Because the audio device receives voice instructions, obtains the question information and answer information to be configured through voice recognition, and stores them correspondingly, voice interaction data can be configured entirely by voice, which greatly simplifies the complex operations otherwise required during voice interaction data configuration and improves configuration efficiency.

Description

Voice interaction data configuration method and device and computer readable storage medium
Technical Field
The present disclosure relates to the field of smart home technologies, and in particular, to a voice interaction data configuration method and apparatus, and a computer-readable storage medium.
Background
With the development of speech recognition technology, the voice interaction function has been widely adopted as a new generation of interaction mode. Voice interaction means converting received speech into a form the computer can recognize, extracting the question contained in the speech, querying background data for that question to obtain the corresponding answer, and finally converting the answer into speech and feeding it back to the user. However, because voice interaction data is currently insufficient, some questions cannot be matched and therefore cannot be answered, so users need to configure their own questions and answers into the voice interaction data.
In the existing voice interaction data configuration method, the manufacturer provides a management backend; the user logs in to the management backend and manually enters and submits the questions and answers to be configured.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a data configuration method and apparatus for voice interaction, and a computer-readable storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a data configuration method for voice interaction, applied to an audio device, including: receiving a first voice instruction; when the first voice instruction is a data configuration instruction, playing a first voice file, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode; carrying out voice recognition on the received voice data to obtain question information and answer information to be configured; and correspondingly storing the question information and the answer information on the audio equipment.
In a possible manner of the first aspect, the performing voice recognition on the received voice data to obtain question information and answer information to be configured includes:
when a second voice instruction is received, performing voice recognition on the second voice instruction to obtain question information to be configured;
playing a second voice file, wherein the second voice file is used for inquiring answers of questions to be configured;
and when a third voice instruction is received, performing voice recognition on the third voice instruction to obtain answer information of the question.
In a possible manner of the first aspect, the performing voice recognition on the received voice data to obtain question information and answer information to be configured includes:
when a fourth voice instruction is received, performing voice recognition on the fourth voice instruction to obtain text information corresponding to the fourth voice instruction;
and splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
In one possible manner of the first aspect, after the storing the question information and the answer information correspondingly on the audio device, the method further includes: and sending the question information and the answer information to a designated server, wherein the designated server is used for checking the question information and the answer information.
In one possible manner of the first aspect, after the sending the question information and the answer information to the designated server, the method further includes: receiving an audit failure message, wherein the audit failure message is used for indicating that the audit by the designated server has failed.
According to a second aspect of the embodiments of the present disclosure, there is provided a data configuration apparatus for voice interaction, including:
the receiving module is used for receiving a first voice instruction;
the playing module is used for playing a first voice file when the first voice instruction is a data configuration instruction, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode;
the recognition module is used for carrying out voice recognition on the received voice data to obtain question information and answer information to be configured;
and the storage module is used for correspondingly storing the question information and the answer information on the audio equipment.
In one possible form of the second aspect, the identification module includes:
the first recognition unit is used for performing voice recognition on a second voice instruction when the second voice instruction is received to obtain question information to be configured;
the playing unit is used for playing a second voice file, and the second voice file is used for inquiring answers of questions to be configured;
and the second recognition unit is used for performing voice recognition on a third voice instruction when the third voice instruction is received to obtain answer information of the question.
In one possible manner of the second aspect, the identification module further includes:
the third recognition unit is used for performing voice recognition on a fourth voice instruction when the fourth voice instruction is received to obtain text information corresponding to the fourth voice instruction;
and the splitting unit is used for splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
In one possible form of the second aspect, the apparatus further includes: and the sending module is used for sending the question information and the answer information to a specified server, and the specified server is used for auditing the question information and the answer information.
In a possible manner of the second aspect, the receiving module is further configured to receive an audit failure message, where the audit failure message is used to indicate that the audit by the designated server has failed.
According to a third aspect of the embodiments of the present disclosure, there is provided a data configuration apparatus for voice interaction, including: a processor; a memory for storing processor-executable instructions;
wherein the processor is configured to: receiving a first voice instruction; when the first voice instruction is a data configuration instruction, playing a first voice file, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode; carrying out voice recognition on the received voice data to obtain question information and answer information to be configured; and correspondingly storing the question information and the answer information on the audio equipment.
According to a fourth aspect of embodiments of the present disclosure, a computer-readable storage medium is provided, wherein a computer program is stored in the computer-readable storage medium, and when executed by a processor, the computer program implements the method steps of any one of the first aspect.
In the present disclosure, the audio device receives voice instructions, obtains the question information and answer information to be configured through voice recognition, and stores them correspondingly, thereby achieving voice interaction data configuration. This greatly simplifies the complex operations otherwise required during voice interaction data configuration and improves configuration efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method for voice-interactive data configuration, according to an exemplary embodiment.
FIG. 2 is a flow chart illustrating a method for voice-interactive data configuration, according to an example embodiment.
FIG. 3 is a flow chart illustrating a method for voice-interactive data configuration, according to an example embodiment.
FIG. 4 is a block diagram illustrating a voice-interactive data configuration apparatus according to an example embodiment.
FIG. 5 is a block diagram illustrating a voice-interactive data configuration apparatus according to an example embodiment.
Fig. 6 is a block diagram illustrating a voice-interactive data configuration apparatus 600 in accordance with an example embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a voice interaction data configuration method according to an exemplary embodiment. The method is used in an audio device and, as shown in Fig. 1, includes the following steps.
In step 101, a first voice instruction is received.
In step 102, when the first voice command is a data configuration command, a first voice file is played, where the first voice file is used to prompt a user to input specific content of data to be configured in a voice manner.
In step 103, voice recognition is performed on the received voice data to obtain question information and answer information to be configured.
In step 104, the question information and the answer information are stored on the audio device in correspondence.
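To make the flow of steps 101 to 104 concrete, the following Python sketch simulates it with plain strings standing in for recognized speech; the keyword, function name, and dictionary store are illustrative assumptions, not details specified by the disclosure.

```python
# Minimal sketch of steps 101-104. Speech recognition and playback are
# simulated with plain strings; a real device would use an ASR engine and
# text-to-speech playback. All names and the keyword are illustrative.

CONFIG_KEYWORD = "train"   # assumed preset keyword for the data configuration instruction

def configure_by_voice(first_instruction: str, question_text: str,
                       answer_text: str, qa_store: dict) -> bool:
    """Returns True if a question/answer pair was stored."""
    # Steps 101-102: only a data configuration instruction starts the flow,
    # after which the first voice file ("What is the question?") would be played.
    if CONFIG_KEYWORD not in first_instruction.lower():
        return False
    # Step 103: the recognized question and answer arrive as text.
    question = question_text.strip()
    answer = answer_text.strip()
    # Step 104: store the pair correspondingly on the device.
    qa_store[question] = answer
    return True

qa = {}
configure_by_voice("I want to train", "Who is Xiao A?",
                   "Xiao A is an up-and-coming actress.", qa)
print(qa)  # {'Who is Xiao A?': 'Xiao A is an up-and-coming actress.'}
```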
In the present disclosure, the audio device receives voice instructions, obtains the question information and answer information to be configured through voice recognition, and stores them correspondingly, thereby achieving voice interaction data configuration. This greatly simplifies the complex operations otherwise required during voice interaction data configuration and improves configuration efficiency.
In one possible implementation manner, the performing voice recognition on the received voice data to obtain question information and answer information to be configured includes:
when a second voice instruction is received, performing voice recognition on the second voice instruction to obtain question information to be configured;
playing a second voice file, wherein the second voice file is used for inquiring answers of the questions to be configured;
and when a third voice instruction is received, performing voice recognition on the third voice instruction to obtain answer information of the question.
In one possible implementation manner, the performing voice recognition on the received voice data to obtain question information and answer information to be configured includes:
when a fourth voice instruction is received, performing voice recognition on the fourth voice instruction to obtain text information corresponding to the fourth voice instruction;
and splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
In one possible implementation, after the storing the question information and the answer information correspondence on the audio device, the method further includes: and sending the question information and the answer information to a designated server, wherein the designated server is used for checking the question information and the answer information.
In one possible implementation, after the sending the question information and the answer information to the designated server, the method further includes: receiving an audit failure message, wherein the audit failure message is used for indicating that the audit by the designated server has failed.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
FIG. 2 is a flow chart illustrating a method for voice-interactive data configuration, according to an example embodiment. As shown in fig. 2, the data configuration method of voice interaction is used in an audio device, and includes the following steps.
In step 201, an audio device receives a first voice instruction.
The audio device can both collect and play audio, and can collect audio within a certain range around it. The audio collected and played by the audio device may be sounds made by a person or other sounds in the environment, such as noise or music. In the embodiments of the disclosure, audio takes the form of voice instructions and voice files. The audio device may be a smart speaker, that is, an electronic device with a network function that can collect and play speech and interact with a person by voice.
Voice interaction refers to the technical process in which a user and a machine interact by voice. Specifically, the machine converts received speech into text information the computer can read, extracts the question from the text, queries background data for that question to obtain the corresponding answer, and finally converts the answer into speech and feeds it back to the user. Taking the smart speaker as an example, the user asks it "Which day is National Day?"; the smart speaker converts the speech into the computer-readable text "Which day is National Day?", queries the background data for this text to obtain the corresponding answer (the date of National Day), and converts the answer into speech for playback.
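The answering half of this process can be pictured with the short sketch below, which simply looks a recognized question up in stored question-and-answer data; the data entry and function name are examples for illustration only.

```python
# Illustrative lookup of a recognized question in stored voice interaction data.
qa_data = {"which day is national day": "National Day is on October 1."}  # example entry

def answer_question(recognized_text: str) -> str:
    key = recognized_text.strip().lower().rstrip("?").strip()
    # Query the background data; fall back when the question is not configured.
    return qa_data.get(key, "Sorry, I do not know the answer to that yet.")

print(answer_question("Which day is National Day?"))
```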
A voice instruction is speech uttered by the user instructing the audio device to execute a command. When the user wants the audio device to execute a command, the user issues a voice instruction to the audio device; after receiving the voice instruction, the audio device executes the corresponding command.
Taking the smart speaker as an example, if the user wants to configure a question and its corresponding answer for the smart speaker, the user can say "I want to train" to it. If this voice instruction is within the smart speaker's collection range, the smart speaker collects it and thereby receives the voice instruction.
In another possible scenario, the smart speaker may be unable to answer the user's question, which can serve as a reminder for the user to perform data configuration. That is, before receiving the first voice instruction, the user asks the smart speaker a question; the smart speaker receives the question, searches its data, and finds no answer. At this point, the smart speaker may prompt for data configuration by voice, for example playing: "This question is not in the data. To add the question and its answer, please say 'I want to train'; otherwise, please say 'no'." The user may then choose not to configure the question and its answer. If the user wants to add the question and the corresponding answer, the user issues the first voice instruction, for example "I want to train", to the smart speaker. If not, the user says "no", and no data configuration is performed.
Of course, the embodiments of the present disclosure are described by taking the first voice instruction as an example. In a practical scenario, the audio device may also act as a relay for instructions: it can collect other voice instructions and send them to a control device, so that the control device can control electronic devices in the smart home environment. This is not specifically limited in the embodiments of the present disclosure.
In step 202, when the first voice command is a data configuration command, the audio device plays a first voice file, where the first voice file is used to prompt the user to input specific content of the data to be configured in a voice manner.
After receiving the first voice instruction, the audio device can perform voice recognition on it to obtain the text information of the first voice instruction. Speech recognition is the process of converting human speech into computer-readable text information. In the embodiments of the present disclosure, the audio device converts the voice instruction issued by the user into text information through speech recognition. For example, the voice instruction received by the audio device is a speech signal; through speech recognition, the text information indicated by that signal, that is, the text information of the voice instruction, can be obtained.
It should be noted that, during speech recognition of a voice instruction, signal processing such as signal amplification and noise filtering can be applied to the received initial signal, so that accurate text information is obtained and recognition accuracy is improved.
In the embodiments of the present disclosure, to trigger the data configuration operation, the first voice instruction issued by the user should include a preset keyword for triggering the data configuration operation. A preset keyword is text configured in the audio device by the user in advance, and there may be one or several preset keywords. When the recognized text information includes a preset keyword, the corresponding data configuration operation is triggered. That is, when the text information includes the preset keyword, the audio device starts the data configuration function to perform data configuration; when the text information does not include the preset keyword, the audio device does not start the data configuration function.
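As a simple illustration of this keyword check, the sketch below assumes the user pre-configured the trigger words "train", "configure", and "set"; the word list and function name are assumptions, not part of the disclosure.

```python
# Sketch of the preset-keyword check that decides whether to start data configuration.
PRESET_KEYWORDS = {"train", "configure", "set"}   # assumed user-configured trigger words

def is_data_config_instruction(recognized_text: str) -> bool:
    text = recognized_text.lower()
    # The data configuration function starts only if a preset keyword appears.
    return any(keyword in text for keyword in PRESET_KEYWORDS)

print(is_data_config_instruction("I want to train"))   # True  -> start training mode
print(is_data_config_instruction("Play some music"))   # False -> no configuration
```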
When the data configuration operation is triggered, the audio device plays the first voice file. The first voice file is a voice message configured in the audio device by the user in advance. The user configures different voice files in the audio device in advance to prompt for the specific content of the data to be configured by voice. In the embodiments of the present disclosure, to distinguish their specific content, the voice files are divided into a first voice file and a second voice file: the first voice file prompts the user to speak the specific content of the question to be configured, and the second voice file asks for the answer to the question to be configured. The configuration may be performed by the user operating on the audio device directly, or by operating on another mobile terminal connected to the audio device.
For example, in the embodiments of the present disclosure, the training mode represents the data configuration mode. The user issues the voice instruction "I want to train"; after receiving it, the audio device obtains its text information through voice recognition. Because the text information includes the preset keyword "train", the audio device starts the training mode and plays the first voice file "What is the question?". After hearing this, the user speaks the specific content of the question. In fact, the preset keyword may also be set to "configure", "set", and so on, so that the user can trigger the data configuration operation with different voice instructions.
In step 203, the audio device receives a second voice instruction and performs voice recognition on it to obtain the question information to be configured.
The second voice instruction is the specific content of the question input by the user after hearing the first voice file played by the audio equipment. And the audio equipment receives the second voice instruction and converts the second voice instruction into text information, wherein the text information is the question information to be configured.
For example, in the embodiments of the present disclosure, the training mode represents the data configuration mode. The user issues the voice instruction "I want to train"; after receiving it, the audio device obtains its text information through voice recognition. Because the text information includes the preset keyword "train", the audio device starts the training mode and plays the first voice file "What is the question?". After hearing the prompt, the user says "Who is Xiao A?" to the audio device; the audio device receives this second voice instruction and recognizes it to obtain the question information to be configured, "Who is Xiao A?".
In step 204, the audio device plays a second voice file, where the second voice file is used to ask an answer to the question to be configured.
After obtaining the question information to be configured, the audio device plays the second voice file. The second voice file is used to prompt the user to speak the specific content of the answer corresponding to the question to be configured. For example, in step 203, after hearing the first voice file played by the audio device, the user inputs the question "Who is Xiao A?". After recognizing the second voice instruction and obtaining the question information to be configured, the audio device plays the second voice file. When the user hears the second voice file, the user should input the answer: "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B."
In step 205, the audio device receives a third voice instruction and performs voice recognition on it to obtain the answer information to be configured.
The third voice instruction is an answer corresponding to a question to be configured, which is input by the user after the user hears the second voice file played by the audio equipment. And the audio equipment receives a third voice instruction and converts the third voice instruction into text information, wherein the text information is the answer information to be configured.
For example, in step 204, the user hears the second voice file and inputs the answer corresponding to the question to be configured: "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B." The audio device receives this third voice instruction and converts it into text information, obtaining the answer information to be configured: "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B."
Steps 203 to 205 above are the process of performing voice recognition on the received voice data to obtain the question information and answer information to be configured. In practice, this process may also be implemented as follows: the user issues a fourth voice instruction to the audio device; the audio device receives the fourth voice instruction and performs voice recognition on it to obtain the corresponding text information; the audio device then splits the text information corresponding to the fourth voice instruction to obtain the question information and answer information to be configured. When splitting this text information, the audio device can extract the question information and answer information by means of specific keywords included in the fourth voice instruction. A specific keyword indicates the position of the question information or answer information within the text information corresponding to the fourth voice instruction. For example, the specific keywords may be "the question is" and "the answer is". When the user issues the fourth voice instruction "The question is who is Xiao A; the answer is Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B", the audio device receives the fourth voice instruction and converts it into the corresponding text information. The text between the specific keyword "the question is" and the specific keyword "the answer is" is the question information to be configured, "Who is Xiao A"; the text after the specific keyword "the answer is" is the answer information to be configured, "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B". Receiving the fourth voice instruction, recognizing it, and splitting the resulting text information to obtain the question information and answer information greatly reduces the interaction between the user and the audio device, simplifies the operation, and saves configuration time.
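A minimal sketch of this splitting step is given below, assuming the specific keywords are the phrases "the question is" and "the answer is"; the exact marker strings on a real device would be whatever was configured there.

```python
# Sketch of splitting the fourth voice instruction's text into question and answer
# using assumed marker phrases ("the question is" / "the answer is").

def split_question_answer(text: str):
    lowered = text.lower()
    q_marker, a_marker = "the question is", "the answer is"
    q_pos, a_pos = lowered.find(q_marker), lowered.find(a_marker)
    if q_pos == -1 or a_pos == -1 or a_pos < q_pos:
        return None   # markers missing or out of order: cannot split
    question = text[q_pos + len(q_marker):a_pos].strip(" ;,.")
    answer = text[a_pos + len(a_marker):].strip(" ;,.")
    return question, answer

print(split_question_answer(
    "The question is who is Xiao A; the answer is Xiao A is an up-and-coming "
    "actress who became famous for playing the female lead in drama B."))
```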
Steps 201 to 205 above are the process in which the user starts the data configuration mode by voice and the question information and answer information to be configured are obtained. In practice, the user may also trigger the data configuration operation through a manual operation on the audio device and then speak the question to be configured and its answer.
In step 206, the audio device stores the question information and the answer information in association with each other.
After obtaining the question information to be configured and the corresponding answer information, the audio device stores them in its internal memory with a correspondence between them. For example, the question information and the answer information may be stored in two fields of the same table: the question information in field A and the answer information in field B, so that the audio device can find field B by searching field A. The correspondence may also take other forms, such as a mapping table. In other words, when the audio device looks up the question information, it can obtain the answer information corresponding to that question information.
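The two-field storage described above might look like the following sketch, which uses an in-memory SQLite table purely to illustrate the idea that searching the question field yields the answer field; the schema is an assumption.

```python
# Sketch of correspondence storage: question in field A, answer in field B.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qa (question TEXT PRIMARY KEY, answer TEXT)")
conn.execute("INSERT OR REPLACE INTO qa VALUES (?, ?)",
             ("Who is Xiao A?", "Xiao A is an up-and-coming actress."))

row = conn.execute("SELECT answer FROM qa WHERE question = ?",
                   ("Who is Xiao A?",)).fetchone()
print(row[0])   # searching the question field returns the corresponding answer
```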
In step 207, the audio device sends the question information and the answer information to a designated server, and the designated server reviews the question information and the answer information.
The designated server refers to a server capable of providing voice interaction services for the audio device based on a question database of the designated server. In order to send the question information and answer information to be configured, which are stored in the audio device, to a specified server, the audio device needs to acquire an address of the specified server, where the address of the specified server may be configured in the audio device by a user in advance for a subsequent data configuration operation.
To improve security, the designated server may stipulate that only registered audio devices can obtain voice interaction services based on its database. The audio device may have default network identification information and may carry this information when sending the question information and answer information to be configured, so that the designated server can verify the network identification information when receiving them and, once the verification passes, audit the question information and answer information. The default network identification information may be an account registered by the audio device, or information that uniquely identifies the audio device, such as the audio device name or an audio device identifier. For example, if an audio device has not registered an account with the designated server, then when it sends the configured question information and answer information, the designated server determines by checking the network identification information that the audio device is not legitimate and refuses to provide the voice interaction service for it. The audio device may then receive a verification failure message sent by the designated server.
To ensure the legitimacy of the data, the designated server can audit the data sent by the audio device. The audit of the question information and answer information by the designated server may be performed manually or completed by a sensitive word detection tool. Sensitive words are words with a sensitive political tendency, a violent tendency, unhealthy content, or uncivilized language. The sensitive word detection tool matches the question information and answer information against the sensitive words. When the question information or answer information includes a sensitive word, the designated server does not pass the audit; when neither includes a sensitive word, the designated server passes the audit.
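A toy version of the sensitive-word audit might look like the sketch below; the word list is a placeholder, and a production server would rely on a maintained list or a dedicated detection tool.

```python
# Sketch of the sensitive-word audit performed by the designated server.
SENSITIVE_WORDS = {"badword1", "badword2"}   # placeholder entries only

def passes_audit(question: str, answer: str) -> bool:
    combined = (question + " " + answer).lower()
    # The audit fails if any sensitive word appears in the question or answer.
    return not any(word in combined for word in SENSITIVE_WORDS)

print(passes_audit("Who is Xiao A?", "Xiao A is an up-and-coming actress."))  # True
print(passes_audit("Who is Xiao A?", "badword1"))                             # False
```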
In step 208, the audio device receives the audit failure message sent by the designated server and converts it into voice for playing.
When the audit by the designated server fails, the designated server sends an audit failure message to the audio device; the audio device receives the audit failure message and converts it into voice for playing.
In practice, after receiving the audit failure message, the audio device may also prompt the user that the audit has failed by means of a display. The displayed content may include the question information, the answer information, the result of the failed audit, the reason for the failure, and so on.
For example, if the designated server's audit fails, the audio device receives the audit failure message sent by the designated server and prompts the user, by voice playback or on a display, that the audit has failed. After hearing or seeing the audit failure message, the user can modify the question information and answer information and then configure them again.
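On the device side, handling the audit result could be sketched as follows; the message fields ("status", "reason") are hypothetical, since the disclosure does not define a message format.

```python
# Sketch of reacting to the server's audit result on the audio device.
def handle_audit_result(message: dict) -> str:
    if message.get("status") == "failed":
        reason = message.get("reason", "unspecified")
        # A real device would synthesize this text to speech and play it,
        # and could also show it on a display with the question and answer.
        return f"Audit failed: {reason}. Please revise the question and answer."
    return "Audit passed. The question and answer have been added."

print(handle_audit_result({"status": "failed", "reason": "sensitive content"}))
```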
In another possible scenario, when the designated server passes the audit, it stores the question information and the answer information in the voice interaction data of its question database. When a question sent by any audio device is received, the designated server can query the stored voice interaction data to obtain the answer information.
For example, the question information to be configured is "Who is Xiao A?" and the answer information is "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B." After the designated server passes the audit, when a user sends the voice instruction "Who is Xiao A?" to an audio device, the audio device performs voice recognition on the instruction to obtain the question information "Who is Xiao A?" and sends it to the designated server. The designated server searches for the question information, obtains the corresponding answer information "Xiao A is an up-and-coming actress who became famous for playing the female lead in drama B", and returns it to the audio device, which converts the answer information into voice and plays it to the user.
In the present disclosure, the audio device receives voice instructions, obtains the question information and answer information to be configured through voice recognition, and stores them correspondingly, thereby achieving voice interaction data configuration. This greatly simplifies the complex operations otherwise required during voice interaction data configuration and improves configuration efficiency.
To facilitate an understanding of the disclosed embodiments, a specific interaction flow is provided below. Fig. 3 is a flow chart illustrating a voice interaction data configuration method according to an exemplary embodiment. Referring to Fig. 3, the user issues a voice instruction, the audio device starts the training mode, and the user completes the configuration of a question and its answer on the audio device through a question-and-answer dialogue with the device. Finally, the audio device submits the question and answer for manual review; if the review passes, they are stored in the database, and if not, the audio device tells the user that the review did not pass. This process achieves voice interaction data configuration through voice interaction between the user and the audio device.
FIG. 4 is a block diagram illustrating a voice-interactive data configuration apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes:
a receiving module 401, configured to receive a first voice instruction;
a playing module 402, configured to play a first voice file when the first voice instruction is a data configuration instruction, where the first voice file is used to prompt a user to input specific content of data to be configured in a voice manner;
the recognition module 403 is configured to perform voice recognition on the received voice data to obtain question information and answer information to be configured;
the storage module 404 is configured to correspondingly store the question information and the answer information on the audio device.
In one possible implementation, the identification module includes:
the first recognition unit is used for performing voice recognition on a second voice instruction when the second voice instruction is received to obtain question information to be configured;
the playing unit is used for playing a second voice file, and the second voice file is used for inquiring answers of the questions to be configured;
and the second recognition unit is used for performing voice recognition on the third voice instruction to obtain answer information of the question when the third voice instruction is received.
In one possible implementation, the identification module further includes:
the third recognition unit is used for performing voice recognition on a fourth voice instruction when the fourth voice instruction is received to obtain text information corresponding to the fourth voice instruction;
and the splitting unit is used for splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
In one possible implementation, referring to fig. 5, the apparatus further includes:
a sending module 405, configured to send the question information and the answer information to a specified server, where the specified server is configured to check the question information and the answer information.
In one possible implementation manner, the receiving module is further configured to receive an audit failure message, where the audit failure message is used to indicate that the audit by the designated server has failed.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating a voice-interactive data configuration apparatus 600 in accordance with an example embodiment. For example, apparatus 600 may be provided as an audio device, such as a smart speaker. Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the methods illustrated in the embodiments of fig. 1 or fig. 2.
The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any of the embodiments shown in fig. 1 or fig. 2.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data configuration method for voice interaction is applied to an audio device, and the method comprises the following steps:
receiving a first voice instruction;
when the first voice instruction is a data configuration instruction, playing a first voice file, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode;
carrying out voice recognition on the received voice data to obtain question information and answer information to be configured;
correspondingly storing the question information and the answer information on the audio equipment;
the voice recognition of the received voice data to obtain the question information and the answer information to be configured includes:
when a second voice instruction is received, performing voice recognition on the second voice instruction to obtain question information to be configured;
playing a second voice file, wherein the second voice file is used for inquiring answers of questions to be configured;
and when a third voice instruction is received, performing voice recognition on the third voice instruction to obtain answer information of the question, wherein the third voice instruction is an answer corresponding to the question to be configured and input by the user after hearing the second voice file played by the audio equipment.
2. The method of claim 1, wherein performing voice recognition on the received voice data to obtain question information and answer information to be configured further comprises:
when a fourth voice instruction is received, performing voice recognition on the fourth voice instruction to obtain text information corresponding to the fourth voice instruction;
and splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
3. The method of claim 1, wherein after storing the question information and the answer information in correspondence on the audio device, the method further comprises:
and sending the question information and the answer information to a designated server, wherein the designated server is used for checking the question information and the answer information.
4. The method of claim 3, wherein after sending the question information and the answer information to a designated server, the method further comprises:
and receiving an audit failure message, wherein the audit failure message is used for indicating that the audit by the designated server has failed.
5. A data configuration apparatus for voice interaction, the apparatus being applied to an audio device, the apparatus comprising:
the receiving module is used for receiving a first voice instruction;
the playing module is used for playing a first voice file when the first voice instruction is a data configuration instruction, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode;
the recognition module is used for carrying out voice recognition on the received voice data to obtain question information and answer information to be configured;
the storage module is used for correspondingly storing the question information and the answer information on the audio equipment;
the identification module comprises:
the first recognition unit is used for performing voice recognition on a second voice instruction when the second voice instruction is received to obtain question information to be configured;
the playing unit is used for playing a second voice file, and the second voice file is used for inquiring answers of questions to be configured;
and the second recognition unit is used for performing voice recognition on a third voice instruction when the third voice instruction is received to obtain answer information of the question, wherein the third voice instruction is an answer corresponding to the question to be configured and input by the user after the user hears the second voice file played by the audio equipment.
6. The apparatus of claim 5, wherein the identification module further comprises:
the third recognition unit is used for performing voice recognition on a fourth voice instruction when the fourth voice instruction is received to obtain text information corresponding to the fourth voice instruction;
and the splitting unit is used for splitting the text information corresponding to the fourth voice instruction to obtain the question information and the answer information to be configured.
7. The apparatus of claim 5, further comprising:
and the sending module is used for sending the question information and the answer information to a specified server, and the specified server is used for auditing the question information and the answer information.
8. The apparatus of claim 7, wherein the receiving module is further configured to receive an audit failure message, and wherein the audit failure message is used for indicating that the audit by the designated server has failed.
9. A voice-interactive data configuration apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
receiving a first voice instruction;
when the first voice instruction is a data configuration instruction, playing a first voice file, wherein the first voice file is used for prompting a user to input specific contents of data to be configured in a voice mode;
carrying out voice recognition on the received voice data to obtain question information and answer information to be configured;
correspondingly storing the question information and the answer information on audio equipment;
the voice recognition of the received voice data to obtain the question information and the answer information to be configured includes:
when a second voice instruction is received, performing voice recognition on the second voice instruction to obtain question information to be configured;
playing a second voice file, wherein the second voice file is used for inquiring answers of questions to be configured;
and when a third voice instruction is received, performing voice recognition on the third voice instruction to obtain answer information of the question, wherein the third voice instruction is an answer corresponding to the question to be configured and input by the user after hearing the second voice file played by the audio equipment.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-4.
CN201710581290.7A 2017-07-17 2017-07-17 Voice interaction data configuration method and device and computer readable storage medium Active CN107463636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710581290.7A CN107463636B (en) 2017-07-17 2017-07-17 Voice interaction data configuration method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710581290.7A CN107463636B (en) 2017-07-17 2017-07-17 Voice interaction data configuration method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN107463636A CN107463636A (en) 2017-12-12
CN107463636B true CN107463636B (en) 2021-02-19

Family

ID=60544288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710581290.7A Active CN107463636B (en) 2017-07-17 2017-07-17 Voice interaction data configuration method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN107463636B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108390859B (en) * 2018-01-22 2021-03-23 深圳慧安康科技有限公司 Intelligent robot device for intercom extension
CN108172226A (en) * 2018-01-27 2018-06-15 上海萌王智能科技有限公司 A kind of voice control robot for learning response voice and action
CN108922534A (en) * 2018-07-04 2018-11-30 北京小米移动软件有限公司 control method, device, equipment and storage medium
CN111326137A (en) * 2018-12-13 2020-06-23 允匠智能科技(上海)有限公司 Voice robot interaction system based on office intelligence
CN111400539B (en) * 2019-01-02 2023-05-30 阿里巴巴集团控股有限公司 Voice questionnaire processing method, device and system
CN111475020A (en) * 2020-04-02 2020-07-31 深圳创维-Rgb电子有限公司 Information interaction method, interaction device, electronic equipment and storage medium
CN111797606A (en) * 2020-07-09 2020-10-20 海口科博瑞信息科技有限公司 Data filling method, system, equipment and computer readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424290A (en) * 2013-09-02 2015-03-18 佳能株式会社 Voice based question-answering system and method for interactive voice system
CN106126624A (en) * 2016-06-22 2016-11-16 武汉市骏驰天下投资管理有限公司 A kind of Financial Information interactive system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140074466A1 (en) * 2012-09-10 2014-03-13 Google Inc. Answering questions using environmental context


Also Published As

Publication number Publication date
CN107463636A (en) 2017-12-12

Similar Documents

Publication Publication Date Title
CN107463636B (en) Voice interaction data configuration method and device and computer readable storage medium
US10832686B2 (en) Method and apparatus for pushing information
CN107205097B (en) Mobile terminal searching method and device and computer readable storage medium
CN108010523B (en) Information processing method and recording medium
CN105489221B (en) A kind of audio recognition method and device
CN110970021B (en) Question-answering control method, device and system
US11404052B2 (en) Service data processing method and apparatus and related device
CN109036416B (en) Simultaneous interpretation method and system, storage medium and electronic device
CN111739553A (en) Conference sound acquisition method, conference recording method, conference record presentation method and device
CN104751847A (en) Data acquisition method and system based on overprint recognition
CN109560941A (en) Minutes method, apparatus, intelligent terminal and storage medium
CN109509472A (en) Method, apparatus and system based on voice platform identification background music
CN109271503A (en) Intelligent answer method, apparatus, equipment and storage medium
CN111353065A (en) Voice archive storage method, device, equipment and computer readable storage medium
KR20190115405A (en) Search method and electronic device using the method
CN112231748A (en) Desensitization processing method and apparatus, storage medium, and electronic apparatus
CN112035630A (en) Dialogue interaction method, device, equipment and storage medium combining RPA and AI
CN109729067A (en) Voice punch card method, device, equipment and computer storage medium
CN114064943A (en) Conference management method, conference management device, storage medium and electronic equipment
CN107680598B (en) Information interaction method, device and equipment based on friend voiceprint address list
CN103176998A (en) Read auxiliary system based on voice recognition
Byalpi Alexa based Real-Time Attendance System
CN111128127A (en) Voice recognition processing method and device
CN106371905B (en) Application program operation method and device and server
WO2021159734A1 (en) Data processing method and apparatus, device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant