CN108492826B - Audio processing method and device, intelligent equipment and medium - Google Patents


Info

Publication number
CN108492826B
CN108492826B (application CN201810276931.2A)
Authority
CN
China
Prior art keywords
audio file
playing
preset
voice instruction
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810276931.2A
Other languages
Chinese (zh)
Other versions
CN108492826A (en)
Inventor
褚长森
Current Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201810276931.2A
Publication of CN108492826A
Application granted
Publication of CN108492826B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10: Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide an audio processing method, an audio processing apparatus, an intelligent device, and a medium. The method includes: receiving a voice instruction input by a user, and playing a first audio file while the voice instruction is processed; when the processing result of the voice instruction is obtained, stopping the first audio file and playing a second audio file; and, after the second audio file finishes playing, playing the processing result. Because the playing duration of the preset audio file is not less than the target duration required to obtain the processing result, the preset audio file is guaranteed to play throughout the processing of the voice instruction. With the embodiments of the invention, the intelligent device can be prevented from appearing unresponsive during the interval in which the voice instruction is processed.

Description

Audio processing method and device, intelligent equipment and medium
Technical Field
The present invention relates to the field of computers, and in particular, to an audio processing method, apparatus, intelligent device, and medium.
Background
With the development of communication technology, intelligent devices (such as smart speakers) have become increasingly common in daily life. When connected to a network, an intelligent device can retrieve data from a database and play it back as speech, thereby responding to a voice instruction the user issues to it. For example, if the user's voice instruction is "play a certain song by Li", the smart speaker searches the database for that song and plays it.
However, a certain amount of time elapses between the moment the smart device receives a voice instruction and the moment it obtains the response, and during that time the device appears to be unresponsive.
Disclosure of Invention
Embodiments of the invention provide an audio processing method, an audio processing apparatus, an intelligent device, and a medium, which prevent the intelligent device from appearing unresponsive during the gap in which a voice instruction is processed.
A first aspect of an embodiment of the present invention provides an audio processing method, including:
receiving a voice instruction;
playing a preset audio file while processing the voice instruction;
playing a processing result of the voice instruction when the preset audio file finishes playing;
wherein the playing duration of the preset audio file is not less than a target duration required to obtain the processing result.
Optionally, while processing the voice instruction, the audio processing method further includes:
performing semantic analysis on the content in the voice instruction to obtain voice content;
and determining the preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, and playing the preset audio file includes:
playing the first audio file;
and when the processing result of the voice instruction is acquired, stopping playing the first audio file and starting playing the second audio file.
Optionally, before playing the first audio file, the method further includes:
determining a target duration required for acquiring the processing result;
and processing the first audio file to enable the playing time length of the first audio file to be equal to the target time length.
Optionally, the semantic content of the first audio file and that of the second audio file are the same or different.
Optionally, playing a preset audio file includes:
playing the first audio file in a first timbre;
and, when the processing result of the voice instruction is obtained, playing the second audio file in a second timbre.
Optionally, playing a preset audio file includes: playing the preset audio file in a target timbre.
Optionally, playing the preset audio file in the target timbre includes:
obtaining the timbre of the voice instruction through a deep learning model and using it as the target timbre;
and playing the preset audio file in the target timbre.
Optionally, before the timbre of the voice instruction is obtained through the deep learning model, the method further includes:
acquiring a sample audio file;
and performing timbre recognition training on a preset deep learning model with the sample audio file to obtain a deep learning model that meets preset conditions.
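The first-aspect method above can be illustrated with a small simulation (a sketch only, not part of the claims: the helper names are invented, and real audio playback and back-end processing are stood in for by sleeps):

```python
import threading
import time

def process_instruction(instruction, result_box):
    # Stand-in for the real back-end work (database search, network query).
    time.sleep(0.05)
    result_box["text"] = f"result for {instruction}"

def handle_voice_instruction(instruction, preset_duration=0.08):
    """Play a preset audio file (simulated by a sleep) while the instruction
    is processed in the background.  The preset file's duration is chosen to
    be no shorter than the expected processing time, so the device is never
    silently unresponsive.  Returns the result and whether the preset clip
    covered the whole processing interval."""
    result_box = {}
    worker = threading.Thread(target=process_instruction,
                              args=(instruction, result_box))
    start = time.monotonic()
    worker.start()
    time.sleep(preset_duration)   # "play" the preset audio file
    worker.join()                 # the result is ready once the worker ends
    covered = (time.monotonic() - start) >= preset_duration
    return result_box["text"], covered
```

Here the 0.08 s preset clip outlasts the 0.05 s simulated processing, matching the requirement that the preset file's playing duration be not less than the target duration.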
A second aspect of the embodiments of the present invention provides an audio processing apparatus, including:
a receiving unit for receiving a voice instruction;
a processing unit for processing the voice instruction;
a playing unit for playing a preset audio file while the processing unit processes the voice instruction, where the playing duration of the preset audio file is not less than a target duration required to obtain the processing result;
the playing unit being further configured to play the processing result of the voice instruction when the preset audio file finishes playing.
Optionally, the audio processing apparatus further includes a determining unit:
the processing unit is also used for carrying out semantic analysis on the content in the voice instruction to obtain voice content;
and the determining unit is used for determining the preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, and the playing unit plays the preset audio file by: playing the first audio file; and, when the processing result of the voice instruction is obtained, stopping the first audio file and starting the second audio file.
Optionally, the audio processing apparatus further includes:
the determining unit is further used for determining a target duration required for obtaining the processing result;
and the processing unit is further used for processing the first audio file so that the playing time length of the first audio file is equal to the target time length.
Optionally, the semantic contents of the first audio file and the second audio file are the same or different.
Optionally, the playing unit plays the preset audio file by: playing the first audio file in a first timbre; and, when the processing result of the voice instruction is obtained, playing the second audio file in a second timbre.
Optionally, the playing unit plays the preset audio file by playing it in a target timbre.
Optionally, playing the preset audio file in the target timbre includes: obtaining the timbre of the voice instruction through a deep learning model as the target timbre; and playing the preset audio file in that timbre.
Optionally, before the timbre of the voice instruction is obtained through the deep learning model as the target timbre, the apparatus is further configured to: acquire a sample audio file; and perform timbre recognition training on a preset deep learning model with the sample audio file to obtain a deep learning model that meets preset conditions.
In a third aspect, an embodiment of the present invention provides an intelligent device, which includes a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, where the memory is used to store a computer program that supports the intelligent device to execute the audio processing method, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a medium storing a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides an application program comprising program instructions which, when executed, perform the method of the first aspect.
In embodiments of the invention, upon receiving a voice instruction the intelligent device begins processing it and simultaneously plays a preset audio file; after the preset audio file finishes playing, the processing result of the voice instruction is played. Because the playing duration of the preset audio file is not less than the target duration required to obtain the processing result, the device plays the preset audio file throughout the processing of the voice instruction, and is thus prevented from appearing unresponsive during the processing interval.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an audio processing method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of another audio processing method according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of another audio processing method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an intelligent device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present invention.
At present, a certain amount of time elapses from when an intelligent device receives a voice instruction to when it obtains the instruction's processing result. During this time the device appears unresponsive, so the user may wrongly conclude that it is sluggish, which degrades the user experience. To solve this problem, embodiments of the present invention provide an audio processing method, an apparatus, an intelligent device, and a medium, which ensure that the intelligent device plays a preset audio file while processing a voice instruction and begins playing the processing result after the preset audio file finishes. Because the playing duration of the preset audio file is not less than the target duration required to obtain the processing result, the device is playing the preset audio file throughout the period from receiving the voice instruction to obtaining its result, and thus avoids appearing unresponsive during the processing gap.
In embodiments of the invention, an intelligent device is any device that can both interact with a user and play audio. Interaction here means that, after receiving a voice instruction or a touch operation input by the user, the device can find the corresponding response in a database and present it to the user as played speech or displayed text. For example, the intelligent device may be a smart speaker, a smart rearview mirror, or a similar appliance; a portable device such as a mobile phone or tablet computer; or a television or other device with a voice control function.
Referring to fig. 1, a flowchart of an audio processing method according to an embodiment of the present invention is shown, as shown in fig. 1, the audio processing method may include the following steps:
101. The smart device receives a voice instruction.
The smart device may have at least one built-in or externally connected microphone for receiving voice instructions input by a user, or may receive voice instructions in other ways; this embodiment places no specific limit on the manner of input. Optionally, the received voice instruction may be input by the user directly or by another device. For example, if the smart device is a smart speaker and the user wants to listen to music, the user may say something like "play a certain song" to the speaker; here the instruction is input by the user. Alternatively, the user may input a voice instruction to the smart speaker through an audio/video monitoring device; for instance, a user who has installed such a device at home can send voice instructions to the smart device while away, in which case the instruction is input by another device. As another example, to make using the smart device more engaging, the received voice instruction may be a clip of audio the user recorded in advance.
Put simply, a voice instruction is a command issued to the smart device by voice instead of by touch; after receiving it, the device looks up the corresponding result in its local database or in a connected device and executes it. Suppose the smart device is a smart rearview mirror: a driver who wants to listen to music but cannot conveniently operate a screen can speak the name of a song to the mirror, which, after receiving the voice instruction, searches its local database or a connected device for the music and plays it automatically. As another example, suppose the smart device is a smart speaker connected to another device, such as a mobile phone, on the same network; if the user says "query my logistics information", the speaker obtains the relevant logistics information from the phone and plays it after receiving the instruction. If the smart device is a mobile phone or tablet computer and the user asks it by voice to call someone, the phone automatically dials the corresponding contact after receiving the instruction. And if the smart device is a television and the voice instruction input through the remote control is "play the movie Monster Hunt", the television searches its database for the movie and plays it.
102. While processing the voice instruction, the smart device plays a preset audio file.
103. When the preset audio file finishes playing, the smart device plays the processing result of the voice instruction.
In step 102, the preset audio file may be set in advance on the smart device or may be a system default. In practice, voice instructions vary in complexity, so the time the smart device needs to process them varies as well. A simple instruction such as "turn up the volume" takes little time to process, whereas a complex one, such as searching the whole network for all articles of a given type, may take considerably longer. During the interval in which a complex instruction is processed, the device may be in a non-responding state; the user then perceives the device as slow or unresponsive, which degrades the user's experience of the device.
To address this, the present embodiment provides an audio processing method in which, after receiving a voice instruction, the smart device begins processing it and simultaneously plays a preset audio file. Because the playing duration of the preset audio file is greater than or equal to the target duration required to obtain the processing result, the device is guaranteed to be playing the preset audio file throughout the processing of the instruction; this eases the tedium of waiting for the result and prevents the device from appearing unresponsive during the processing gap.
In step 102, the playing duration of the preset audio file played by the smart device is not less than, that is, greater than or equal to, the target duration required to obtain the processing result. In step 103, playing the processing result when the preset audio file finishes can be understood as follows: if the playing duration of the preset audio file exactly equals the target duration, the device starts playing the result just as the preset audio file ends; if the playing duration exceeds the target duration, the device may either stop the preset audio file and play the result as soon as it detects that the result is available, or play the result only after the preset audio file finishes.
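The timing rule above can be captured in a small helper (an illustrative sketch; the function name and the stop_early flag are assumptions, not terms from the patent):

```python
def result_start_time(preset_duration, target_duration, stop_early=True):
    """Return when the processing result should start playing.

    The preset audio file's duration must not be less than the target
    duration (the time needed to obtain the result).  If the two are
    equal, the result starts exactly when the preset file ends; if the
    preset file is longer, the device may either cut it short as soon as
    the result is available (stop_early=True) or let it finish first."""
    if preset_duration < target_duration:
        raise ValueError("preset file must not be shorter than processing")
    return target_duration if stop_early else preset_duration
```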
In step 102, playing the preset audio file optionally includes: playing a first audio file; and, when the processing result of the voice instruction is obtained, stopping the first audio file and starting a second audio file. The semantic content of the two files may be the same or different. For example, the preset audio may be "OK" followed by "got it", in which case the first audio file ("OK") and the second ("got it") differ in semantic content; or the preset audio may be "mm-hmm" twice, in which case the two files have the same semantic content. In short, when the preset audio file played during processing consists of a first and a second audio file, the device plays the first file from the start of processing until the result is obtained, then plays the second file, and plays the processing result once the second file finishes. Audio is therefore playing throughout, and the device avoids appearing unresponsive during the processing interval.
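The two-clip scheme yields a simple playback timeline; as a sketch (illustrative only, assuming the first clip has been stretched to cover exactly the processing time):

```python
def two_clip_schedule(target_duration, second_clip_duration):
    """Timeline for the first-clip/second-clip scheme: the first clip
    plays from 0 until the result arrives at target_duration, the second
    clip plays next, and the processing result starts only after the
    second clip ends."""
    first = (0.0, target_duration)
    second = (target_duration, target_duration + second_clip_duration)
    return {"first": first, "second": second, "result_starts": second[1]}
```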
For example, as shown in fig. 2, suppose that after receiving the voice instruction the smart device determines the preset audio file to be "mm-hmm". From the start of processing until the result is obtained, the device plays the first audio file "mm-hmm", which may be lengthened, for instance played as a drawn-out "mm-hmm…". When the device detects that the processing result has been obtained, it stops the first audio file and starts the second audio file "mm-hmm", whose playing duration is left unprocessed; when the second file finishes, the processing result is played.
Optionally, before the smart device plays the first audio file, the method further includes: determining the target duration required to obtain the processing result; and processing the first audio file so that its playing duration equals the target duration. That is, in the example of fig. 2, before playing the first audio file "mm-hmm" the device may first estimate, from the voice instruction, the target duration from receipt of the instruction to obtaining its result, and then stretch the first audio file to that duration, so that the first file finishes playing just as the device obtains the result.
Optionally, before the smart device plays the first audio file, the method may instead simply extend the first audio file's playing duration. How much it is extended is not specifically limited, provided the playing duration of the preset audio file is not less than the target duration required to obtain the processing result. There are many ways to extend the playing duration, for example slowing the playback speed or drawing out the intonation. In short, before playing the first audio file, the device may either process it so that its playing duration equals the target duration, or, without reference to the target duration, simply lengthen it.
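Slowing the playback speed amounts to choosing a playback rate below normal; a minimal sketch (the function name is an assumption, and real time-stretching would also need pitch preservation, which is omitted here):

```python
def stretch_rate(clip_duration, target_duration):
    """Playback rate (1.0 = normal speed) that stretches a clip of
    clip_duration seconds to last target_duration seconds.  A clip that
    is already long enough is played at normal speed."""
    if target_duration <= clip_duration:
        return 1.0
    return clip_duration / target_duration
```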
Accordingly, in step 102, stopping the first audio file and starting the second when the processing result is obtained has two possible implementations. In the first, the first audio file finishes exactly when the result arrives and the device then plays the second file; this corresponds to the device having processed the first file according to the target duration so that the two durations are equal. In the second, the first audio file has not finished when the result arrives, and the device stops it and starts the second file; this corresponds to the device having merely lengthened the first file, by an unspecified amount, without reference to the target duration.
In embodiments of the invention, upon receiving a voice instruction the intelligent device begins processing it and simultaneously plays a preset audio file; after the preset audio file finishes playing, the processing result of the voice instruction is played. Because the playing duration of the preset audio file is not less than the target duration required to obtain the processing result, the device plays the preset audio file throughout the processing of the voice instruction, and is thus prevented from appearing unresponsive during the processing interval.
Referring to fig. 3, another audio processing method according to an embodiment of the present invention is provided, as shown in fig. 3, the audio processing method may include the following steps:
301. The smart device receives a voice instruction.
302. The smart device performs semantic analysis on the content of the voice instruction to obtain voice content.
303. The smart device determines a preset audio file according to the voice content.
Semantic analysis means using various methods to learn and understand the meaning expressed by a piece of text or speech; it is one of the core tasks of natural language processing, which enables natural-language communication between humans and machines, and in embodiments of the invention, between the user and the smart device. Put simply, the user issues a voice instruction directing the smart device to perform some operation; the device performs semantic analysis on the instruction to obtain voice content it can recognize, and then performs the operation according to that content. In other words, in step 302 the smart device analyzes the voice instruction to determine its voice content, for example playing a certain song by Li, playing a certain cartoon, or calling a certain person. The device then selects a preset audio file according to the analyzed voice content: if the content is playing a certain song by Li, the device may select "will play it for you" as the preset audio file; if the content is turning up the volume, it may select "OK".
Optionally, determining the preset audio file according to the voice content includes: the smart device looks up the preset audio file corresponding to the analyzed voice content in at least one stored correspondence between voice content and audio files. That is, the device may store several such correspondences in its database in advance, for example: voice content "make a call" maps to the preset audio "got it"; voice content "query logistics information" maps to the preset audio "ding-dong"; and so on. After completing semantic analysis of the received voice instruction and obtaining its voice content, the device searches the database for the corresponding audio file and uses it as the preset audio file. This lets the device find and play the preset audio file quickly, saving power.
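The stored correspondence between voice content and preset audio can be modeled as a plain lookup table (a minimal sketch; the content keys and file names are invented for illustration):

```python
PRESET_AUDIO = {
    "make a call": "got_it.wav",
    "query logistics information": "ding_dong.wav",
    "play a song": "will_play_for_you.wav",
}

def preset_for(voice_content, default="mm_hmm.wav"):
    """Look up the preset audio file for the analyzed voice content,
    falling back to a neutral filler when no mapping exists."""
    return PRESET_AUDIO.get(voice_content, default)
```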
Optionally, the smart device determining the preset audio file according to the voice content may further include: the smart device selects the preset audio file from an audio database. That is, the smart device may store all audio files in a unified audio database in advance; after the voice content of the voice instruction is obtained by analysis, a piece of audio is selected at random from the audio database as the preset audio. In this way, even when the same voice instruction is received each time, the preset audio file can differ, which makes using the smart device more engaging and improves the user's experience.
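The random-selection variant can be sketched as drawing one file from the unified database; the database contents and function name below are assumptions for illustration:

```python
import random

# Hypothetical unified audio database holding all preset audio files.
AUDIO_DATABASE = ["one_moment.mp3", "right_away.mp3", "coming_up.mp3"]

def pick_random_preset(db=AUDIO_DATABASE, rng=random):
    """Pick a random preset audio file so repeated identical
    instructions can yield different filler audio."""
    return rng.choice(db)
```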
Optionally, the smart device may also determine the preset audio file in another way: acquiring the history play record of the smart device, where the history play record refers to processing results the smart device has already played, and determining the preset audio file according to that record. In other words, if the voice content obtained by semantic analysis is not clear, the smart device may choose to determine the preset audio file from the history play record. Specifically, the smart device may determine, from its history play record within a preset period, which audio and video it has played in that period, and select an audio file related to the record from the local database as the preset audio file. The preset period may be set by the smart device and may be several hours, weeks, or months, or any other length of time. For example, if the smart device receives a voice instruction whose voice content is still unclear after semantic analysis, it may select the preset audio file according to the history play record. Suppose the preset period is one week and the record acquired for that week is: songs played 50 times, music videos (MV) played 30 times, and logistics information queried once. It can then be determined that the smart device mostly played audio and video in the past week, so the smart device may select "Will play it for you" from the local database or the preset audio file database as the preset audio file.
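The history-based fallback in the worked example — songs 50 times, MVs 30 times, logistics once, so a playback-related filler is chosen — can be sketched as a frequency count over the play record. The category names and mapping are hypothetical:

```python
from collections import Counter

# Hypothetical mapping from the dominant played category to a preset
# audio file; names are illustrative assumptions.
CATEGORY_PRESET = {
    "song": "will_play_for_you.mp3",
    "mv": "will_play_for_you.mp3",
    "logistics": "ding_dong.mp3",
}

def preset_from_history(history):
    """history: list of played-category labels within the preset
    period. Choose the preset audio of the most frequent category."""
    if not history:
        return None
    top_category, _ = Counter(history).most_common(1)[0]
    return CATEGORY_PRESET.get(top_category)
```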
304. While processing the voice instruction, the smart device plays the preset audio file.
305. When the preset audio file finishes playing, the smart device plays the processing result of the voice instruction.
In step 304, optionally, if the preset audio file includes a first audio file and a second audio file, playing the preset audio file includes: playing the first audio file with a first tone, and, when the processing result of the voice instruction is acquired, playing the second audio file with a second tone. That is, when playing the preset audio file, the smart device may play the first and second audio files with different tones. For example, to sound natural, the smart device needs to shape the volume of the audio file: the first audio file may be processed so that the first 0.5 s is loud (a high decibel level) and the following 0.3 s is quiet or gradually quieter (a low decibel level); in other words, the first audio file is played with a first, gradually decreasing tone. Alternatively, the smart device may use some other tone as the first tone when playing the first audio file. Similarly, the second tone may be the default tone of the preset audio file, a gradually decreasing tone, or another tone set by the smart device.
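The volume shaping described above — full amplitude for the first 0.5 s, then a fade over the next 0.3 s — can be sketched as a gain envelope applied to the samples. The sample rate, split points, and linear ramp are assumptions for illustration:

```python
# Sketch of the "loud then gradually quieter" envelope: samples in the
# first loud_s seconds keep full amplitude, then fade linearly to zero
# over the next fade_s seconds. All parameters are illustrative.
def apply_fade(samples, sample_rate=8000, loud_s=0.5, fade_s=0.3):
    loud_n = int(loud_s * sample_rate)
    fade_n = int(fade_s * sample_rate)
    out = list(samples)
    for i in range(fade_n):
        idx = loud_n + i
        if idx >= len(out):
            break
        gain = 1.0 - (i + 1) / fade_n  # linear ramp down to 0
        out[idx] *= gain
    return out
```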
In step 304, optionally, playing the preset audio file includes: playing the preset audio file with a target tone. Optionally, playing the preset audio file with the target tone includes: acquiring the tone of the voice instruction through a deep learning model and using it as the target tone, then playing the preset audio file with the target tone. That is, when playing the preset audio file, the smart device need not use the file's own tone; it may instead use the tone of the voice instruction as the target tone, which increases interactivity and interest. For example, if the voice instruction the smart device receives is input by the user, the smart device obtains the tone of the user's voice instruction through the deep learning model and plays the preset audio file with that tone as the target tone, so that the user feels the interaction with the smart device and the experience is improved.
Optionally, before obtaining the tone of the voice instruction through the deep learning model, the method further includes: acquiring a sample audio file; and performing tone recognition training on a preset deep learning model using the sample audio file to obtain a deep learning model meeting preset conditions. That is, the deep learning model used to obtain the tone of the voice instruction is trained in advance with sample audio files. The smart device then inputs the voice instruction into the trained deep learning model to obtain the tone corresponding to the voice instruction.
After receiving the voice instruction, the smart device starts to process it: it performs semantic analysis on the content of the voice instruction to obtain the voice content, determines the preset audio file according to the voice content, and plays the preset audio file; after the preset audio file finishes playing, it plays the processing result of the voice instruction. Because the playing duration of the preset audio file is not less than the target duration required to obtain the processing result, the preset audio file is guaranteed to play throughout the time the smart device is processing the voice instruction, preventing the smart device from appearing unresponsive during that gap.
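The overall flow just summarized — process the instruction and play the filler audio concurrently, then announce the result — can be sketched with a background worker. The function names `process`, `play_preset`, and `play_result` are stand-ins for the device's real NLU and audio pipelines, not part of the patent:

```python
import threading, time

def handle_voice_instruction(process, play_preset, play_result):
    """Run processing in the background while the preset audio plays,
    then play the processing result. Relies on the preset audio's
    duration being at least the processing (target) duration."""
    result_box = {}

    def worker():
        result_box["result"] = process()  # e.g. fetch a song, place a call

    t = threading.Thread(target=worker)
    t.start()
    play_preset()   # filler audio; outlasts processing by design
    t.join()        # processing finished by the time playback ends
    play_result(result_box["result"])
```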
Referring to fig. 4, a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention, as shown in fig. 4, the audio processing apparatus may include a receiving unit 401, a processing unit 402, and a playing unit 403:
a receiving unit 401, configured to receive a voice instruction;
a processing unit 402 for processing voice instructions;
a playing unit 403, configured to play a preset audio file while the processing unit processes the voice instruction, where a playing duration of the preset audio file is not less than a target duration required for obtaining the processing result;
the playing unit 403 is further configured to play the processing result of the voice instruction when the preset audio file is played.
Optionally, the audio processing apparatus further includes a determining unit 404: the processing unit 402 is further configured to perform semantic analysis on the content in the voice instruction to obtain voice content; a determining unit 404, configured to determine a preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, and the specific way for playing the preset audio file by the playing unit 403 is as follows: playing a first audio file; and when the processing result of the voice instruction is obtained, stopping playing the first audio file and starting playing the second audio file.
Optionally, in the audio processing apparatus: the determining unit 404 is further configured to determine the target duration required for obtaining the processing result; and the processing unit 402 is further configured to process the first audio file so that the playing duration of the first audio file equals the target duration.
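Making the first audio file's playing duration equal the target duration can be sketched as looping the samples when the file is too short and truncating when it is too long. This loop-or-truncate strategy is an assumption; the patent does not specify how the duration is adjusted:

```python
# Hypothetical duration fitting: repeat the sample buffer until it
# covers the target duration, then cut it to exact length.
def fit_to_duration(samples, sample_rate, target_s):
    target_n = int(round(target_s * sample_rate))
    if not samples:
        return [0.0] * target_n  # pad with silence if nothing to loop
    out = []
    while len(out) < target_n:
        out.extend(samples)
    return out[:target_n]
```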
Optionally, the voice contents of the first audio file and the second audio file are the same or different.
Optionally, the specific way for playing the preset audio file by the playing unit is as follows: playing a first audio file with a first tone; and when the processing result of the voice instruction is acquired, playing a second audio file by using a second tone.
Optionally, the specific way that the playing unit is used for playing the preset audio file is as follows: and playing the preset audio file according to the target tone.
Optionally, playing the preset audio file with the target tone color includes: acquiring the tone of the voice command as a target tone through a deep learning model; and playing the preset audio file according to the target tone.
Optionally, before the tone of the voice instruction is obtained through the deep learning model as the target tone, the apparatus is further configured to: acquire a sample audio file; and perform tone recognition training on a preset deep learning model using the sample audio file to obtain a deep learning model meeting preset conditions.
After the receiving unit 401 receives the voice instruction, the processing unit 402 starts to process it while the playing unit 403 plays the preset audio file; when the preset audio file finishes playing, the playing unit 403 plays the processing result of the voice instruction. Playing the preset audio file solves the problem of the smart device appearing unresponsive during the gap in which the voice instruction is processed, achieving an intelligent response.
It can be understood that the functions of each functional module and unit of the processing apparatus in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Fig. 5 is a schematic block diagram of an intelligent device according to an embodiment of the present invention. The intelligent device in the present embodiment shown in fig. 5 may include: one or more processors 501, one or more input devices 502, one or more output devices 503, and memory 504. The processor 501, the input device 502, the output device 503, and the memory 504 are connected by a bus 505. The memory 504 is used for storing a computer program comprising program instructions, and the processor 501 is used for executing the program instructions stored by the memory 504. Wherein the processor 501 is configured to invoke program instructions to perform:
receiving a voice instruction;
playing a preset audio file while processing the voice instruction;
when the preset audio file is played, playing a processing result of the voice instruction;
the playing time of the preset audio file is not less than the target time required by the acquired processing result.
Optionally, while processing the voice instruction, the processor 501 is configured to call the program instruction to further perform:
performing semantic analysis on the content in the voice instruction to obtain voice content;
and determining a preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, the preset audio file is played, and the processor 501 is configured to call a program instruction to specifically execute:
playing a first audio file;
and when the processing result of the voice instruction is obtained, stopping playing the first audio file and starting playing the second audio file.
Optionally, before playing the first audio file, the processor 501 is configured to call the program instructions to further perform:
determining a target duration required for obtaining a processing result;
and processing the first audio file so that the playing time length of the first audio file is equal to the target time length.
Optionally, the semantic content of the first audio file and the semantic content of the second audio file are the same or different.
Optionally, the preset audio file is played, and the processor 501 is configured to call a program instruction to specifically execute:
playing a first audio file with a first tone;
and when the processing result of the voice instruction is acquired, playing a second audio file by using a second tone.
Optionally, the preset audio file is played, and the processor 501 is configured to call a program instruction to specifically execute: and playing the preset audio file according to the target tone.
Optionally, the processor 501 is configured to call a program instruction to specifically execute:
acquiring the tone of the voice command as a target tone through a deep learning model;
and playing the preset audio file according to the target tone.
Optionally, before obtaining the tone color of the voice command through the deep learning model, the processor 501 is configured to call the program command to further perform:
acquiring a sample audio file;
and performing tone recognition training on the preset deep learning model by using the sample audio file to obtain the deep learning model meeting the preset conditions.
It should be understood that, in the embodiment of the present invention, the Processor 501 may be a Central Processing Unit (CPU), and the Processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The input device 502 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone (for receiving voice instructions), etc., and the output device 503 may include a display (LCD, etc.), a speaker (for playing audio files), etc.
The memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 501. A portion of the memory 504 may also include non-volatile random access memory. For example, the memory 504 may also store device type information.
In a specific implementation, the processor 501, the input device 502, and the output device 503 described in this embodiment of the present invention may execute the implementation manners described in the embodiments of the audio processing method provided in fig. 1 and fig. 3 and the embodiments of the audio processing apparatus provided in fig. 4 of the present invention, which are not described herein again.
In an embodiment of the present invention, there is provided a medium storing a computer program comprising program instructions that when executed by a processor implement:
receiving a voice instruction;
playing a preset audio file while processing the voice instruction;
when the preset audio file is played, playing a processing result of the voice instruction;
the playing time of the preset audio file is not less than the target time required by the acquired processing result.
Optionally, while processing the voice instruction, the program instruction when executed by the processor further implements:
performing semantic analysis on the content in the voice instruction to obtain voice content;
and determining a preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, the preset audio file is played, and the program instruction is specifically implemented when executed by the processor:
playing a first audio file;
and when the processing result of the voice instruction is obtained, stopping playing the first audio file and starting playing the second audio file.
Optionally, before playing the first audio file, the program instructions when executed by the processor further implement:
determining a target duration required for obtaining a processing result;
and processing the first audio file so that the playing time length of the first audio file is equal to the target time length.
Optionally, the semantic content of the first audio file and the semantic content of the second audio file are the same or different.
Optionally, when the preset audio file is played, the program instructions are executed by the processor to implement:
playing a first audio file with a first tone;
and when the processing result of the voice instruction is acquired, playing a second audio file by using a second tone.
Optionally, when the preset audio file is played, the program instructions are executed by the processor to implement:
and playing the preset audio file according to the target tone.
Optionally, the preset audio file is played in the target tone, and the program instructions, when executed by the processor, further implement:
acquiring the tone of the voice command as a target tone through a deep learning model;
and playing the preset audio file according to the target tone.
Optionally, before the tone of the voice instruction is obtained through the deep learning model, the program instructions when executed by the processor further implement:
acquiring a sample audio file;
and performing tone recognition training on the preset deep learning model by using the sample audio file to obtain the deep learning model meeting the preset conditions.
In one embodiment of the invention, there is also provided an application program comprising program instructions that when executed perform:
receiving a voice instruction;
playing a preset audio file while processing the voice instruction;
when the preset audio file is played, playing a processing result of the voice instruction;
the playing time of the preset audio file is not less than the target time required by the acquired processing result.
Optionally, while processing the voice instruction, the program instruction when executed is configured to specifically perform:
performing semantic analysis on the content in the voice instruction to obtain voice content;
and determining a preset audio file according to the voice content.
Optionally, the preset audio file includes a first audio file and a second audio file, the preset audio file is played, and the program instruction is configured to specifically perform:
playing a first audio file;
and when the processing result of the voice instruction is obtained, stopping playing the first audio file and starting playing the second audio file.
Optionally, before playing the first audio file, the program instructions when executed are further configured to:
determining a target duration required for obtaining a processing result;
and processing the first audio file so that the playing time length of the first audio file is equal to the target time length.
Optionally, the semantic content of the first audio file and the semantic content of the second audio file are the same or different.
Optionally, a preset audio file is played, and when executed, the program instructions are configured to specifically perform:
playing a first audio file with a first tone;
and when the processing result of the voice instruction is acquired, playing a second audio file by using a second tone.
Optionally, a preset audio file is played, and when executed, the program instructions are configured to specifically perform:
and playing the preset audio file according to the target tone.
Optionally, the preset audio file is played in the target tone, and the program instructions, when executed, are further configured to:
acquiring the tone of the voice command as a target tone through a deep learning model;
and playing the preset audio file according to the target tone.
Optionally, before obtaining the tone of the voice command through the deep learning model, the program command when executed is further configured to:
acquiring a sample audio file;
and performing tone recognition training on the preset deep learning model by using the sample audio file to obtain the deep learning model meeting the preset conditions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware that is related to instructions of a computer program, and the program can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. An audio processing method, comprising:
receiving a voice instruction;
performing semantic analysis on the content in the voice instruction;
under the condition that the voice content of the voice instruction is obtained after semantic analysis, searching a preset audio file corresponding to the voice content obtained by the semantic analysis from the corresponding relation between at least one group of preset voice content and the audio file; under the condition that the voice content of the voice instruction cannot be accurately obtained after semantic analysis, obtaining a historical play record of the intelligent equipment, wherein the historical play record is a processing result played by the intelligent equipment, and determining the preset audio file according to the historical play record; the preset audio files comprise a first audio file and a second audio file;
processing the voice instruction and simultaneously playing the preset audio file; playing the preset audio file comprises playing the first audio file, stopping playing the first audio file when a processing result of the voice instruction is obtained, and starting playing the second audio file;
when the preset audio file is played, playing a processing result of the voice instruction;
and the playing time length of the preset audio file is not less than the target time length required for obtaining the processing result.
2. The method of claim 1, wherein prior to playing the first audio file, the method further comprises:
determining a target duration required for acquiring the processing result;
and processing the first audio file to enable the playing time length of the first audio file to be equal to the target time length.
3. The method of claim 1, wherein the semantic content of the first audio file and the second audio file is the same or different.
4. The method according to any one of claims 1-3, wherein the playing the preset audio file comprises:
playing the first audio file at a first tone;
and when the processing result of the voice instruction is acquired, playing the second audio file by using a second tone.
5. The method of claim 1, wherein the playing the preset audio file comprises:
obtaining the tone of the voice instruction through a deep learning model to be used as a target tone;
and playing the preset audio file according to the target tone.
6. The method of claim 5, wherein before obtaining the timbre of the voice command through the deep learning model, the method further comprises:
acquiring a sample audio file;
and performing tone recognition training on a preset deep learning model by using the sample audio file to obtain the deep learning model meeting preset conditions.
7. An audio processing apparatus, comprising:
a receiving unit for receiving a voice instruction;
the processing unit is used for carrying out semantic analysis on the content in the voice instruction;
a determination unit configured to:
under the condition that the voice content of the voice instruction is obtained after semantic analysis, searching a preset audio file corresponding to the voice content obtained by the semantic analysis from the corresponding relation between at least one group of preset voice content and the audio file; under the condition that the voice content of the voice instruction cannot be accurately obtained after semantic analysis, obtaining a historical play record of the intelligent equipment, wherein the historical play record is a processing result played by the intelligent equipment, and determining the preset audio file according to the historical play record; the preset audio files comprise a first audio file and a second audio file;
the processing unit is used for processing the voice instruction;
the playing unit is used for playing the preset audio file while the processing unit processes the voice instruction, wherein the playing of the preset audio file comprises playing of the first audio file, and when the processing result of the voice instruction is obtained, the playing of the first audio file is stopped, and the playing of the second audio file is started;
and the playing unit is also used for playing the processing result of the voice instruction when the preset audio file is played.
8. The apparatus of claim 7,
the determining unit is further configured to determine a target duration required for obtaining the processing result;
the processing unit is further configured to process the first audio file, so that the playing time length of the first audio file is equal to the target time length.
9. The apparatus of claim 7, wherein the semantic content of the first audio file and the second audio file is the same or different.
10. The apparatus according to any one of claims 7-9, wherein the playing unit is configured to play the preset audio file in a specific manner:
playing the first audio file at a first tone;
and when the processing result of the voice instruction is acquired, playing the second audio file by using a second tone.
11. The apparatus according to claim 7, wherein the playing unit is configured to play the preset audio file in a specific manner:
obtaining the tone of the voice instruction through a deep learning model to be used as a target tone;
and playing the preset audio file according to the target tone.
12. The apparatus according to claim 11, wherein the playback unit is specifically configured to:
acquiring a sample audio file;
and performing tone recognition training on a preset deep learning model by using the sample audio file to obtain the deep learning model meeting preset conditions.
13. A smart device comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program that enables the smart device to perform an audio processing method, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the audio processing method of any of claims 1-6.
14. A storage medium, characterized in that the medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the audio processing method according to any one of claims 1-6.
CN201810276931.2A 2018-03-30 2018-03-30 Audio processing method and device, intelligent equipment and medium Active CN108492826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810276931.2A CN108492826B (en) 2018-03-30 2018-03-30 Audio processing method and device, intelligent equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810276931.2A CN108492826B (en) 2018-03-30 2018-03-30 Audio processing method and device, intelligent equipment and medium

Publications (2)

Publication Number Publication Date
CN108492826A CN108492826A (en) 2018-09-04
CN108492826B true CN108492826B (en) 2021-05-04

Family

ID=63317160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810276931.2A Active CN108492826B (en) 2018-03-30 2018-03-30 Audio processing method and device, intelligent equipment and medium

Country Status (1)

Country Link
CN (1) CN108492826B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10721350B1 (en) * 2018-08-21 2020-07-21 Wells Fargo Bank, N.A. Fraud detection in contact centers using deep learning model
CN109377734A (en) * 2018-10-15 2019-02-22 深圳市道通智能航空技术有限公司 Phonetic prompt method, speech prompting system, mobile controlling terminal and voice prompting equipment
CN114664032A (en) * 2022-03-18 2022-06-24 上海商汤智能科技有限公司 Voice broadcasting method, system, device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201532764U (en) * 2009-11-20 2010-07-21 余超 Vehicle-mounted sound-control wireless broadband network audio player
JP4506004B2 (en) * 2001-03-01 2010-07-21 ソニー株式会社 Music recognition device
CN104424944A (en) * 2013-08-19 2015-03-18 联想(北京)有限公司 Information processing method and electronic device
CN105897531A (en) * 2016-06-21 2016-08-24 美的智慧家居科技有限公司 Mobile terminal and voice control system and voice control method for household appliances
CN107820111A (en) * 2016-09-12 2018-03-20 船井电机株式会社 Information equipment
CN107833574A (en) * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4506004B2 (en) * 2001-03-01 2010-07-21 ソニー株式会社 Music recognition device
CN201532764U (en) * 2009-11-20 2010-07-21 余超 Vehicle-mounted sound-control wireless broadband network audio player
CN104424944A (en) * 2013-08-19 2015-03-18 联想(北京)有限公司 Information processing method and electronic device
CN105897531A (en) * 2016-06-21 2016-08-24 美的智慧家居科技有限公司 Mobile terminal and voice control system and voice control method for household appliances
CN107820111A (en) * 2016-09-12 2018-03-20 船井电机株式会社 Information equipment
CN107833574A (en) * 2017-11-16 2018-03-23 百度在线网络技术(北京)有限公司 Method and apparatus for providing voice service

Also Published As

Publication number Publication date
CN108492826A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
JP6811758B2 (en) Voice interaction methods, devices, devices and storage media
KR102505597B1 (en) Voice user interface shortcuts for an assistant application
CN109326289B (en) Wake-up-free voice interaction method, device, equipment and storage medium
KR102437944B1 (en) Voice wake-up method and device
US10068573B1 (en) Approaches for voice-activated audio commands
TWI511125B (en) Voice control method, mobile terminal apparatus and voice controlsystem
US9983849B2 (en) Voice command-driven database
CN107766482B (en) Information pushing and sending method, device, electronic equipment and storage medium
CN111199732B (en) Emotion-based voice interaction method, storage medium and terminal equipment
JP6783339B2 (en) Methods and devices for processing audio
CN108492826B (en) Audio processing method and device, intelligent equipment and medium
CN110211589B (en) Awakening method and device of vehicle-mounted system, vehicle and machine readable medium
US20200265843A1 (en) Speech broadcast method, device and terminal
US11062708B2 (en) Method and apparatus for dialoguing based on a mood of a user
CN104123938A (en) Voice control system, electronic device and voice control method
WO2017160498A1 (en) Audio scripts for various content
JP2019133127A (en) Voice recognition method, apparatus and server
JP2023506087A (en) Voice Wakeup Method and Apparatus for Skills
CN109686372A (en) Resource control method for playing back and device
US10693944B1 (en) Media-player initialization optimization
CN113157240A (en) Voice processing method, device, equipment, storage medium and computer program product
CN111063356A (en) Electronic equipment response method and system, sound box and computer readable storage medium
CN110660393B (en) Voice interaction method, device, equipment and storage medium
CN108922523B (en) Position prompting method and device, storage medium and electronic equipment
CN112786031B (en) Man-machine conversation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant