CN113674739B - Time determination method, device, equipment and storage medium - Google Patents

Time determination method, device, equipment and storage medium Download PDF

Info

Publication number
CN113674739B
CN113674739B (application CN202110819212.2A)
Authority
CN
China
Prior art keywords
audio signal
audio
voice
time
segment
Prior art date
Legal status
Active
Application number
CN202110819212.2A
Other languages
Chinese (zh)
Other versions
CN113674739A (en)
Inventor
赵倩芸
姜银峰
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202110819212.2A
Publication of CN113674739A
Application granted
Publication of CN113674739B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L2015/0638: Interactive procedures
    • G10L2015/223: Execution procedure of a spoken command
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use

Abstract

In the time determination method provided by the embodiments of this application, a preset first voice can be played through a sound source, so that the voice interaction device responds to the preset first voice by playing a second voice. An audio acquisition module disposed between the sound source and the voice interaction device can collect the audio signals generated during the test, obtaining a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. From the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice instruction and issuing its response, i.e. the interaction waiting time, can then be determined. Therefore, by deploying a sound source and an audio acquisition module, automatic testing of the interaction waiting time of a voice interaction device can be realized.

Description

Time determination method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to a time determining method, apparatus, device, and storage medium.
Background
With the development of computer technology, voice interaction devices such as smart speakers have been widely adopted. A voice interaction device can collect the audio signal sent out by a user through an audio receiving module such as a microphone, analyze the audio signal, determine the instruction issued by the user, and execute the corresponding operation. Optionally, after collecting the audio signal sent out by the user, the voice interaction device can play feedback audio through a sound-producing module such as a loudspeaker, responding to the user's audio signal and completing the interaction process.
For a voice interaction device that interacts with a user through audio signals, the time interval from the moment the user finishes speaking to the moment the voice interaction device starts playing the feedback audio signal is called the interaction waiting time. During the interaction waiting time, the user has to wait for the device's response; clearly, the longer the interaction waiting time, the worse the user experience. In order to optimize a voice interaction device, it is necessary to determine its interaction waiting time. However, conventional methods of determining the interaction waiting time are inaccurate.
Disclosure of Invention
To solve the problems in the prior art, the embodiments of the present application provide a time determination method and apparatus.
In a first aspect, an embodiment of the present application provides a method for determining a time, where the method includes:
acquiring an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from a voice interaction device, and the second audio signal is an audio signal sent out by the voice interaction device after receiving the first audio signal;
determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and determining the interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
In a second aspect, an embodiment of the present application provides a time determining apparatus, including:
an acquisition module configured to acquire an audio signal, where the audio signal includes a first audio signal from a sound source and a second audio signal from a voice interaction device, and the second audio signal is an audio signal sent out by the voice interaction device after receiving the first audio signal;
a first processing unit for determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and a second processing unit configured to determine the interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
In a third aspect, an embodiment of the present application provides a time determining system, where the system includes a sound source, a voice interaction device, and a time determining device; the time determining device includes an acquisition module and a processing module, and the acquisition module is disposed between the sound source and the voice interaction device;
the sound source is configured to play a preset voice;
the voice interaction device is configured to receive the preset voice and send out a response voice according to the preset voice;
the acquisition module is configured to acquire an audio signal, where the audio signal includes a first audio signal corresponding to the preset voice and a second audio signal corresponding to the response voice, and the second audio signal is an audio signal sent out by the voice interaction device after receiving the first audio signal;
the processing module is configured to determine the time parameter of the first audio signal and the time parameter of the second audio signal, and to determine the interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal;
the time determining device is further configured to perform the time determining method according to any one of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time determination method as described in any of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the time determination method as described in any of the embodiments of the present application.
In the time determination method provided by the embodiments of the present application, a preset first voice can be played through a sound source, so that the voice interaction device responds to the preset first voice by playing a second voice. An audio acquisition module disposed between the sound source and the voice interaction device can collect the audio signals generated during the test, obtaining a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. The first audio signal thus represents the voice instruction issued by the sound source, and the second audio signal represents the voice interaction device's response to that instruction. From the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice instruction and issuing its response, i.e. the interaction waiting time, can then be determined. Therefore, by deploying a sound source and an audio acquisition module, automatic testing of the interaction waiting time of a voice interaction device can be realized. Compared with the traditional manual testing method, this improves test accuracy on the one hand, and saves manpower and material resources on the other.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of a scenario of a time determining system provided in an embodiment of the present application;
fig. 2 is an interaction diagram of voice and audio signals in a transmission process according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a time determining method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a device for determining time according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and examples of the present application are for illustrative purposes only and are not intended to limit its scope of protection.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "a" or "a plurality" in this application are illustrative rather than limiting; those of ordinary skill in the art will appreciate that, unless the context clearly indicates otherwise, such references should be interpreted as "one or more".
A voice interaction device may interact with the user through audio data. Specifically, the user may issue an instruction by voice; such an instruction is called a voice instruction. The voice interaction device can collect the audio signal corresponding to the voice instruction and determine from it the task it needs to execute, so as to perform the corresponding operation. In some possible implementations, to let the user know that the voice interaction device has received the instruction, the voice interaction device may play feedback audio after receiving the voice instruction.
For example, suppose the user speaks the voice instruction "play a song by singer A for me". The voice interaction device may collect the audio signal through a microphone and parse it to determine that the user wants to control the voice interaction device to play songs by singer A. The voice interaction device then plays preset feedback audio, for example playing "your instruction has been received" through a loudspeaker, to indicate to the user that the instruction has been received. The voice interaction device may then look up singer A's songs on the network and play them.
In this process, the time interval from the moment the user finishes speaking to the moment the voice interaction device first starts to speak is called the interaction waiting time. That is, in the example above, the interaction waiting time is the interval from the moment the user finishes saying "play a song by singer A for me" to the moment the voice interaction device starts playing "your instruction has been received". During the interaction waiting time, the voice interaction device does not interact with the user by voice, and the user is left waiting. Clearly, the longer the interaction waiting time, the longer the user waits, and the worse the user experience.
Therefore, a technician can determine the interaction waiting time of a voice interaction device and then optimize the device accordingly. At present, the interaction waiting time of a voice interaction device is mostly measured manually by a tester. For example, the tester may issue an instruction to the voice interaction device, start timing after issuing the instruction, and stop timing after hearing the voice fed back by the device; the recorded time is then taken as the interaction waiting time of the voice interaction device.
Because it relies on manual timing by a tester, the traditional method of testing the interaction waiting time is therefore inaccurate and labor-intensive.
In order to solve the problems in the prior art, embodiments of the present application provide a time determining method, apparatus, device and storage medium, which are described in detail below with reference to the accompanying drawings.
In order to facilitate understanding of the technical solution provided in the embodiments of the present application, first, description will be made with reference to the scenario example shown in fig. 1.
Referring to fig. 1, the diagram is a schematic frame diagram of an exemplary application scenario provided in an embodiment of the present application. Fig. 1 includes a sound source 10, a voice interaction device 20, an audio acquisition module 30 and a processing module 40. The sound source 10 can play voice, sending out a first audio signal; the voice interaction device 20 can collect the audio signal sent out by the sound source 10 and respond to it by sending out a second audio signal; the audio acquisition module 30 is located between the sound source 10 and the voice interaction device 20 and can collect both the first audio signal sent out by the sound source 10 and the second audio signal sent out by the voice interaction device 20; the processing module 40 is connected to the audio acquisition module 30 and can receive the first and second audio signals collected by the audio acquisition module 30 and determine the interaction waiting time of the voice interaction device from them. The time determination method provided by the embodiments of the present application may be performed by the processing module 40.
Alternatively, the sound source 10 may be an artificial mouth or a sound box equipped with a loudspeaker; the voice interaction device 20 under test may be a smart speaker, a smart desk lamp, or another terminal device with a voice interaction function (e.g. a mobile phone, a tablet computer, etc.); and the audio acquisition module 30 may include a microphone array, which may include one or more microphones.
It should be noted that, in the embodiment shown in fig. 1, the audio acquisition module 30 and the processing module 40 may be different devices, or may be different entity modules in the same device. For example, the audio acquisition module 30 may be a microphone or microphone array of a test device and the processing module 40 may be a processor of the same test device.
In the time determination method provided by the embodiments of the present application, the sound source simulates a user sending an instruction to the voice interaction device, reproducing an actual voice interaction scenario. The interaction waiting time of the voice interaction device can then be determined by calculating the time interval between the moment the sound source finishes playing its voice and the moment the voice interaction device starts playing the response voice.
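As a rough sketch of this calculation (the helper name and the fixed timestamps are illustrative, not from the patent), the interaction waiting time is simply the gap between the end of the sound source's voice and the start of the device's response, measured on any shared clock:

```python
def interaction_waiting_time(first_end: float, second_start: float) -> float:
    """Interaction waiting time: the gap between the moment the sound
    source finishes playing its voice (first_end) and the moment the
    voice interaction device starts its response (second_start).
    Both times are in seconds on the acquisition module's clock."""
    if second_start < first_end:
        raise ValueError("response started before the instruction ended")
    return second_start - first_end

# e.g. the instruction ends at 2.40 s and the response starts at 3.15 s
latency = interaction_waiting_time(2.40, 3.15)
```

Only the difference between the two timestamps matters, which is why a single acquisition module capturing both signals suffices.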
First, taking the embodiment shown in fig. 1 as an example, the propagation process of the voice and audio signals in the embodiment of the present application will be described. Referring to fig. 2, fig. 2 is an interaction diagram of voice and audio signals in a transmission process according to an embodiment of the present application, including:
S201: the sound source plays the first voice.
In the process of testing the interaction waiting time of the voice interaction device, the first voice can be played through the sound source. The first voice may be a preset section of speech, for example, may be a wake-up word of the voice interaction device, or a preset instruction of the voice interaction device.
S202: the voice interaction device and the audio acquisition module receive a first voice.
After the first voice is played by the sound source, the audio acquisition module located between the sound source and the voice interaction device can acquire the first voice and convert the first voice into an audio signal. Similarly, the voice interaction device may also collect the first voice and convert the first voice into an audio signal for subsequent processing.
S203: the voice interaction device generates and plays the second voice according to the received first voice.
After receiving the first voice, the voice interaction device may generate a second voice for responding to the first voice according to the first voice, and play the second voice through a speaker or the like of the voice interaction device. Specifically, the voice interaction device may recognize the meaning of the first voice, so as to determine an instruction to be completed according to the first voice, and then generate and play the second voice according to the instruction. For example, assuming that the first voice is a wake word of the voice interaction device, the second voice may be a preset response word for answering the wake word.
S204: the audio acquisition module receives the second voice.
After the voice interaction device plays the second voice, the audio acquisition module may receive the second voice and convert the second voice into an audio signal.
S205: the audio acquisition module sends the acquired audio signals to the processing module.
After converting the first voice and/or the second voice into an audio signal, the audio acquisition module may send the audio signal to the processing module. Optionally, the audio acquisition module may send an audio signal to the processing module as soon as the audio signal corresponding to the first voice has been collected, and send another once the audio signal corresponding to the second voice has been collected. Alternatively, the audio acquisition module may collect the audio signal corresponding to the first voice, store it, and record its time information; after the audio signal corresponding to the second voice has been collected, the audio acquisition module can then send the audio signals corresponding to the first and second voices, together with the related time information, to the processing module in one batch.
In some possible implementations, the audio acquisition module may include one or more units with an audio acquisition function. For example, the audio acquisition module may include a microphone array that may include one or more microphones that may each independently receive speech and convert the speech into an audio signal, and the audio signal sent by the audio acquisition module to the processing module may include multiple segments of audio signals. Alternatively, when the audio collection module includes a plurality of units having a function of collecting audio, different units having a function of collecting audio are disposed at different positions.
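As an illustration of the data the acquisition module might hand to the processing module (the field names below are assumptions, not the patent's), each captured stretch of sound can be modelled as a segment tagged with the microphone that recorded it and its timing:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AudioSegment:
    """One contiguous stretch of captured sound."""
    mic_id: int        # which microphone in the array captured it
    start_time: float  # onset in seconds on the shared test clock
    end_time: float    # offset in seconds
    samples: List[float] = field(default_factory=list)  # raw PCM samples

# With two microphones, one test run yields four segments: each
# microphone hears both the instruction and the device's response.
segments = [
    AudioSegment(mic_id=0, start_time=1.00, end_time=2.40),
    AudioSegment(mic_id=0, start_time=3.15, end_time=4.00),
    AudioSegment(mic_id=1, start_time=1.01, end_time=2.41),
    AudioSegment(mic_id=1, start_time=3.16, end_time=4.01),
]
```

Tagging each segment with its microphone is what later lets the processing module exploit microphone placement when telling the instruction apart from the response.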
The processing module may execute the time determining method provided in the embodiment of the present application after receiving the audio signal collected by the audio collecting module, and a specific execution process may be described below.
Fig. 3 is a schematic flow chart of a time determining method provided in the embodiment of the present application, and the embodiment may be suitable for a scenario of testing interaction waiting time of a voice interaction device. The method may be performed by any device having data processing capabilities and having or with a device having audio acquisition capabilities.
The method is described below as an example performed by a processing module of the test apparatus. As shown in fig. 3, the method specifically includes the following steps:
S301: The processing module acquires an audio signal.
To determine the interaction waiting time of the voice interaction device, the processing module may acquire the audio signal. As described above, the audio signal may be the signal collected by the audio acquisition module during the interaction between the sound source and the voice interaction device, and includes a first audio signal and a second audio signal. The first audio signal is the audio signal from the sound source, and the second audio signal is the audio signal sent out by the voice interaction device after receiving the first audio signal.
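The patent leaves open how the segments and their start times are extracted from the recording; one common, minimal approach (the threshold and frame size below are arbitrary assumptions) is to split the recording wherever the mean frame energy crosses a threshold:

```python
def find_segments(samples, rate, threshold=0.02, frame=256):
    """Split a recording into voiced segments by mean frame energy.
    Returns a list of (start_time, end_time) pairs in seconds."""
    segs, start = [], None
    for i in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[i:i + frame]) / frame
        if energy > threshold and start is None:
            start = i / rate            # onset of a voiced stretch
        elif energy <= threshold and start is not None:
            segs.append((start, i / rate))
            start = None
    if start is not None:               # recording ends while voiced
        segs.append((start, len(samples) / rate))
    return segs

# Toy recording: silence, a burst, silence, a second burst.
rate = 1000
sig = [0.0] * 1000 + [0.5] * 1000 + [0.0] * 1000 + [0.5] * 1000
bursts = find_segments(sig, rate)  # two voiced segments are detected
```

A real test harness would use a proper voice activity detector, but any method that yields per-segment start times feeds the selection logic described next.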
In the embodiments of the present application, the audio signal sent by the audio acquisition module to the processing module may include multiple segments, each with a different start time. Before determining the interaction waiting time, the processing module may determine the first audio signal and the second audio signal from the multiple segments collected by the audio acquisition module. As described above, the audio acquisition module in the embodiments of the present application may include either one unit with an audio acquisition function or several such units. The method by which the processing module determines the first and second audio signals is described below for each case.
If the audio acquisition module comprises only one unit with the function of acquiring audio, for example the audio acquisition module comprises only one microphone, the processing module may determine the first audio signal and the second audio signal based on the start time of the audio signals.
Specifically, while the sound source and the voice interaction device carry out their voice interaction, the audio acquisition module collects only two segments of voice. The voice with the earlier start time is the first voice played by the sound source, and the voice with the later start time is the voice interaction device's response to it, i.e. the second voice. Accordingly, when the audio signal sent by the audio acquisition module to the processing module includes two segments, the processing module may determine the segment with the earlier start time as the first audio signal and the segment with the later start time as the second audio signal.
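In code, this single-microphone rule reduces to ordering the two segments by start time (a sketch using illustrative (start, end) tuples):

```python
def classify_single_mic(seg_a, seg_b):
    """Single-microphone case: the segment with the earlier start time
    is the first audio signal (the sound source's instruction), and the
    other is the second audio signal (the device's response).
    Segments are (start_time, end_time) tuples in seconds."""
    first, second = sorted([seg_a, seg_b], key=lambda s: s[0])
    return first, second

# Order of arrival in the list does not matter; only start times do.
first_sig, second_sig = classify_single_mic((3.15, 4.00), (1.00, 2.40))
```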
If the audio acquisition module includes several units with an audio acquisition function, for example several microphones, each unit can independently collect audio signals. During the voice interaction between the sound source and the voice interaction device, each such unit collects two segments of audio signal, so the audio acquisition module may send more than two segments to the processing module. The processing module may then determine the first audio signal and the second audio signal by either of two methods, described separately below.
For illustration, assume the audio acquisition module includes a microphone array consisting of a first microphone and a second microphone. During the voice interaction between the sound source and the voice interaction device, the sound source plays the first voice and the voice interaction device plays the second voice. In this process, the first microphone may collect the first voice and the second voice, yielding a first segment and a second segment of the audio signal. Likewise, the second microphone may collect the first voice and the second voice, yielding a third segment and a fourth segment of the audio signal.
In a first possible implementation, the processing module may determine the first audio signal and the second audio signal based on the positions of the first and second microphones. Specifically, assume the first microphone is closer to the sound source than the second microphone, and the second microphone is closer to the voice interaction device than the first microphone. The processing module may then determine the first audio signal from the first and second segments, and the second audio signal from the third and fourth segments.
Because the first microphone is closer to the sound source, it captures the first voice better, so the processing module can determine the first audio signal from the first and second segments collected by the first microphone.
Optionally, the processing module may compare the intensity of the first segment with that of the second segment. If the first segment's intensity is greater, the voice corresponding to the first segment was louder at this microphone, so the processing module may conclude that the device that emitted it is the sound source, i.e. that the first segment corresponds to the first voice, and determine the first segment as the first audio signal.
Optionally, the processing module may compare the start time of the first segment with the start time of the second segment. If the first segment's start time is earlier, the first microphone collected the voice corresponding to the first segment before the voice corresponding to the second segment, meaning the first segment corresponds to the first voice. The processing module may therefore determine the first segment as the first audio signal.
Similarly, since the second microphone is closer to the voice interaction device, it captures the second voice better, so the processing module can determine the second audio signal from the third and fourth segments collected by the second microphone.
Optionally, the processing module may compare the intensity of the third segment with that of the fourth segment. If the third segment's intensity is greater, the voice corresponding to the third segment was louder at this microphone, so the processing module may conclude that the device that emitted it is the voice interaction device, i.e. that the third segment corresponds to the second voice, and determine the third segment as the second audio signal.
Optionally, the processing module may compare the start time of the third segment of audio signal with the start time of the fourth segment of audio signal. If the start time of the third segment of audio signal is earlier than the start time of the fourth segment of audio signal, it indicates that the second microphone collected the voice corresponding to the third segment of audio signal before collecting the voice corresponding to the fourth segment of audio signal, so the fourth segment of audio signal is the audio signal corresponding to the second voice. In this way, the processing module may determine the fourth segment of audio signal as the second audio signal.
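As an illustrative sketch (not part of the original disclosure), the two optional selection rules above could be expressed as follows. The `Segment` container, the use of RMS as the intensity measure, and all sample values are assumptions made for the example:

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    samples: list      # PCM samples as normalized floats
    start_time: float  # seconds from the start of the recording

def rms(seg):
    """Root-mean-square intensity of a segment (assumed intensity measure)."""
    return math.sqrt(sum(s * s for s in seg.samples) / len(seg.samples))

def pick_by_intensity(seg_a, seg_b):
    """Intensity rule: of two segments collected by the same microphone,
    the louder one is the one emitted by the nearer device."""
    return seg_a if rms(seg_a) > rms(seg_b) else seg_b

def pick_by_start_time(seg_a, seg_b):
    """Timing rule: the earlier segment corresponds to the first voice,
    because the voice command always precedes the response."""
    return seg_a if seg_a.start_time < seg_b.start_time else seg_b
```

For the first microphone, `pick_by_intensity` (or `pick_by_start_time`) would return the first audio signal; for the second microphone, the analogous comparison yields the second audio signal.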
In a second possible implementation, the processing module may determine the first audio signal and the second audio signal based on the start time and the intensity of each of the first, second, third, and fourth segments of audio signal.
Specifically, the processing module may compare the start times of two audio signals collected by the same microphone, and determine an audio signal corresponding to the first voice and an audio signal corresponding to the second voice from the two audio signals collected by the same microphone.
For example, the processing module may compare the respective start times of the first and second segments of audio signal, and the respective start times of the third and fourth segments. If the start time of the first segment is earlier than that of the second segment, the first segment was collected earlier and therefore captures the first voice, so the second segment can be determined to capture the second voice. Similarly, if the start time of the third segment is earlier than that of the fourth segment, the third segment was collected earlier and captures the first voice, so the fourth segment can be determined to capture the second voice.
Then, the processing module can compare the intensities of the two segments of audio signal corresponding to the same voice, and determine the segment with the higher intensity as the first audio signal or the second audio signal, respectively.
For example, the processing module may compare the intensities of the first and third segments of audio signal. Because both segments capture the first voice, the louder of the two is the one that captured the first voice better. If the intensity of the first segment is greater than that of the third segment, the first microphone captured the first voice better than the second microphone, and the processing module may determine the first segment of audio signal as the first audio signal.
Similarly, the processing module may compare the intensities of the second and fourth segments of audio signal. Because both segments capture the second voice, the louder of the two is the one that captured the second voice better. If the intensity of the fourth segment is greater than that of the second segment, the second microphone captured the second voice better than the first microphone, and the processing module may determine the fourth segment of audio signal as the second audio signal.
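A sketch of this second implementation (again illustrative, with the `Segment` container and RMS intensity measure as assumptions) might first order each microphone's pair of segments by start time and then keep the louder recording of each voice:

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    samples: list      # PCM samples as normalized floats
    start_time: float  # seconds from the start of the recording

def rms(seg):
    """Root-mean-square intensity of a segment (assumed intensity measure)."""
    return math.sqrt(sum(s * s for s in seg.samples) / len(seg.samples))

def classify_segments(first_mic, second_mic):
    """Second implementation: sort each microphone's two segments by start
    time, so the earlier one maps to the first voice (the command) and the
    later one to the second voice (the response); then, per voice, keep the
    louder of the two recordings."""
    cmd1, resp1 = sorted(first_mic, key=lambda s: s.start_time)
    cmd2, resp2 = sorted(second_mic, key=lambda s: s.start_time)
    first_audio = cmd1 if rms(cmd1) > rms(cmd2) else cmd2
    second_audio = resp2 if rms(resp2) > rms(resp1) else resp1
    return first_audio, second_audio
```

With the first microphone near the sound source and the second near the voice interaction device, this would keep the first microphone's command segment and the second microphone's response segment.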
S302: the processing module determines a time parameter of the first audio signal and a time parameter of the second audio signal.
After determining the first audio signal and the second audio signal, the processing module may obtain the time parameter of each. In the embodiment of the present application, the time parameter may include the start time and the end time of the audio signal. When determining the time parameter of the first audio signal and the time parameter of the second audio signal, the processing module may determine the start time and end time of the first audio signal and the start time and end time of the second audio signal by means of voice activity detection (VAD).
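The patent does not specify a particular VAD algorithm. As a toy illustration only, a minimal energy-threshold endpoint detector could recover the start and end times of an audio signal as follows; the frame length, threshold, and sample format are all assumptions:

```python
import math

def endpoint_times(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Toy energy-based VAD: return (start_time, end_time) in seconds of the
    span of fixed-length frames whose RMS energy exceeds the threshold, or
    None if no frame is active."""
    frame_len = int(sample_rate * frame_ms / 1000)
    active = []  # start indices of frames above the energy threshold
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = math.sqrt(sum(s * s for s in frame) / frame_len)
        if energy > threshold:
            active.append(i)
    if not active:
        return None
    start = active[0] / sample_rate
    end = (active[-1] + frame_len) / sample_rate
    return start, end
```

A production system would more likely use a trained VAD model or a tuned hangover scheme; this sketch only shows where the start and end times come from.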
S303: the processing module determines interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
As can be seen from the foregoing, the interaction waiting time of the voice interaction device may be the time interval from when the user issues a voice command to when the voice interaction device begins responding to it. After determining the end time of the first audio signal and the start time of the second audio signal, the processing module may take the end time of the first audio signal as the start of the interaction waiting time and the start time of the second audio signal as its end. The difference between the start time of the second audio signal and the end time of the first audio signal is the interaction waiting time of the voice interaction device.
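The computation itself is a single subtraction; for instance, if the command audio (as an assumed example) ends at 1.84 s and the response audio starts at 2.31 s, the interaction waiting time is 0.47 s:

```python
def interaction_latency(first_end, second_start):
    """Interaction waiting time: the gap between the end time of the
    command audio and the start time of the response audio, in seconds."""
    return second_start - first_end
```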
In the time determining method provided by the embodiment of the application, the preset first voice can be played through the sound source, so that the voice interaction device responds to the first voice by playing the second voice. The audio signal generated in the test process can be collected by the audio collection module disposed between the sound source and the voice interaction device, yielding a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. The first audio signal thus represents the voice command issued by the sound source, and the second audio signal represents the response of the voice interaction device to that command. Accordingly, from the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice command and issuing its response, namely the interaction waiting time, can be determined. Therefore, by deploying the sound source and the audio acquisition module, automatic testing of the interaction waiting time of the voice interaction device can be realized. Compared with the traditional manual testing method, this can improve testing accuracy on the one hand and save manpower and material resources on the other.
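Putting the steps together, a minimal end-to-end sketch over a single recorded track might look like the following. The frame size, threshold, and the assumption that exactly two active spans appear in the recording (command, then response) are simplifications for illustration, not part of the disclosure:

```python
import math

def frame_rms(samples, i, n):
    """RMS energy of the length-n frame starting at index i."""
    frame = samples[i:i + n]
    return math.sqrt(sum(s * s for s in frame) / n)

def measure_latency(recording, sample_rate, frame_ms=20, threshold=0.02):
    """Find the two active spans in one recording (command, then response)
    and return the silent gap between them in seconds, or None if fewer
    than two spans are detected."""
    n = int(sample_rate * frame_ms / 1000)
    runs, current = [], None  # contiguous runs of active frames
    for i in range(0, len(recording) - n + 1, n):
        if frame_rms(recording, i, n) > threshold:
            if current is None:
                current = [i, i + n]
            else:
                current[1] = i + n
        elif current is not None:
            runs.append(current)
            current = None
    if current is not None:
        runs.append(current)
    if len(runs) < 2:
        return None
    first_end = runs[0][1] / sample_rate      # end time of the command audio
    second_start = runs[1][0] / sample_rate   # start time of the response
    return second_start - first_end
```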
Fig. 4 is a schematic structural diagram of a time determining apparatus provided in the embodiment of the present application, which is applicable to scenarios in which a response time of a voice interaction device is determined. The time determining apparatus specifically includes: an acquisition unit 410, a first processing unit 420, and a second processing unit 430.
An obtaining unit 410, configured to obtain an audio signal, where the audio signal includes a first audio signal from a sound source and a second audio signal from a voice interaction device, where the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
the first processing unit 420 is configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal.
The second processing unit 430 is configured to determine an interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
The time determining device provided by the embodiment of the disclosure can execute the time determining method provided by any embodiment of the disclosure, and has the corresponding functional units and beneficial effects of executing the time determining method.
It should be noted that, in the above embodiment of the time determining apparatus, the units included are divided only according to functional logic, and the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for distinguishing them from each other and are not used to limit the protection scope of the present disclosure.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., a terminal device running a software program) 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device 500 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in fig. 3. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
The electronic device provided by the embodiment of the present disclosure and the time determining method provided by the foregoing embodiment belong to the same inventive concept, and technical details not described in detail in the embodiment of the present disclosure may be referred to the foregoing embodiment, and the embodiment of the present disclosure has the same beneficial effects as the foregoing embodiment.
The present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the time determination method provided by the above embodiments.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after receiving the first audio signal; determining a time parameter of the first audio signal and a time parameter of the second audio signal; and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including, but not limited to, object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The names of the units do not, in some cases, constitute a limitation on the units themselves; for example, the first processing unit may also be described as an "influence factor determination unit".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example one ], the method comprising:
acquiring an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after receiving the first audio signal;
determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example two ], the method further comprising: optionally, the determining the interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal includes:
and determining the interaction waiting time of the voice interaction equipment according to the ending time of the first audio signal and the starting time of the second audio signal.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example three ], the method further comprising: optionally, the interaction latency of the voice interaction device is a difference between a start time of the second audio signal and an end time of the first audio signal.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example four ], the method further comprising: optionally, the acquiring the audio signal includes:
an audio signal acquired by a microphone array is acquired, the microphone array comprising one or more microphones.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example five ], the method further comprising: optionally, after acquiring the audio signal acquired by the microphone array, the method further comprises:
determining the first audio signal from the sound source;
the second audio signal from the voice interaction device is determined.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example six ], the method further comprising: optionally, the microphone array includes a first microphone and a second microphone, a distance from the first microphone to the sound source is smaller than a distance from the second microphone to the sound source, and a distance from the first microphone to the voice interaction device is greater than a distance from the second microphone to the voice interaction device;
the acquiring the audio signal collected by the microphone array comprises the following steps:
Acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source includes:
determining the first segment of audio signal as a first audio signal in response to the signal strength of the first segment of audio signal being greater than the strength of the second segment of audio signal; or,
determining the first segment of audio signal as a first audio signal in response to the acquisition time of the first segment of audio signal being earlier than the acquisition time of the second segment of audio signal;
the determining the second audio signal from the voice interaction device comprises:
determining the third segment of audio signal as a second audio signal in response to the signal strength of the third segment of audio signal being greater than the strength of the fourth segment of audio signal; or,
and determining the fourth segment of audio signals as second audio signals in response to the acquisition time of the fourth segment of audio signals being later than the acquisition time of the third segment of audio signals.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example seven ], the method further comprising:
The acquiring the audio signal collected by the microphone array comprises the following steps:
acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source includes:
in response to the acquisition time of the first segment of audio signals being earlier than the acquisition time of the second segment of audio signals, the acquisition time of the third segment of audio signals being earlier than the acquisition time of the fourth segment of audio signals, and the signal strength of the first segment of audio signals being greater than the strength of the third segment of audio signals, determining the first segment of audio signals as first audio signals;
the determining the second audio signal from the voice interaction device comprises:
and determining the fourth section of audio signal as a second audio signal in response to the acquisition time of the first section of audio signal being earlier than the acquisition time of the second section of audio signal, the acquisition time of the third section of audio signal being earlier than the acquisition time of the fourth section of audio signal, and the signal strength of the fourth section of audio signal being greater than the strength of the second section of audio signal.
According to one or more embodiments of the present disclosure, there is provided a time determining method [ example eight ], the method further comprising: optionally, the sound source includes one or more of:
a manual mouth and a sound box.
According to one or more embodiments of the present disclosure, there is provided a time determination method [ example nine ], the method further comprising: optionally, the voice interaction device includes one or more of the following:
intelligent audio amplifier, intelligent desk lamp and mobile terminal.
According to one or more embodiments of the present disclosure, there is provided a time determining apparatus, comprising:
the voice interaction device comprises an acquisition unit, a voice interaction unit and a processing unit, wherein the acquisition unit is used for acquiring an audio signal, the audio signal comprises a first audio signal from a sound source and a second audio signal from the voice interaction device, and the second audio signal is an audio signal sent out after the voice interaction device receives the first audio signal;
a first processing unit for determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and the second processing unit is used for determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
According to one or more embodiments of the present disclosure, there is provided an electronic device [ example eleven ], the electronic device comprising: one or more processors; a memory for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time determination method as described in any of the embodiments of the present application.
According to one or more embodiments of the present disclosure, there is provided a time determination system [ example twelve ], the system comprising a sound source, a voice interaction device, and a time determination device, the time determination device comprising an acquisition module and a processing module, the acquisition module being disposed between the sound source and the voice interaction device;
the sound source is used for playing the first voice;
the voice interaction equipment is used for receiving the first voice and playing the second voice according to the first voice;
the acquisition module is configured to acquire an audio signal, where the audio signal includes a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
The processing module is used for determining the time parameter of the first audio signal and the time parameter of the second audio signal; determining interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal;
the time determining device is further configured to perform a time determining method according to any embodiment of the present application.
According to one or more embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a time determination method as described in any of the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by substituting the features described above with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (10)

1. A method of time determination, the method comprising:
acquiring an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after receiving the first audio signal;
Determining a time parameter of the first audio signal and a time parameter of the second audio signal;
determining interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal;
wherein the acquiring the audio signal comprises:
acquiring an audio signal acquired by a microphone array, wherein the microphone array comprises a plurality of microphones; the microphone array comprises a first microphone and a second microphone, the distance from the first microphone to the sound source is smaller than the distance from the second microphone to the sound source, and the distance from the first microphone to the voice interaction device is larger than the distance from the second microphone to the voice interaction device;
determining the first audio signal from the sound source;
determining the second audio signal from the voice interaction device;
wherein the acquiring the audio signal acquired by the microphone array comprises:
acquiring a first segment of audio signal and a second segment of audio signal acquired by the first microphone;
acquiring a third segment of audio signal and a fourth segment of audio signal acquired by the second microphone;
the determining the first audio signal from the sound source comprises:
determining the first segment of audio signal as the first audio signal in response to the signal strength of the first segment of audio signal being greater than the signal strength of the second segment of audio signal; or, in response to the acquisition time of the first segment of audio signal being earlier than the acquisition time of the second segment of audio signal, determining the first segment of audio signal as the first audio signal;
the determining the second audio signal from the voice interaction device comprises:
determining the third segment of audio signal as the second audio signal in response to the signal strength of the third segment of audio signal being greater than the signal strength of the fourth segment of audio signal; or, in response to the acquisition time of the fourth segment of audio signal being later than the acquisition time of the third segment of audio signal, determining the fourth segment of audio signal as the second audio signal.
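For illustration only (this sketch is not part of the claims), the segment-selection logic of claim 1 can be expressed in Python; the `Segment` type and the function names are hypothetical, with signal strength standing in for whatever intensity measure an implementation would use:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical captured audio segment."""
    strength: float      # e.g. mean amplitude of the segment
    acquired_at: float   # acquisition timestamp in seconds


def pick_first_audio(seg1: Segment, seg2: Segment) -> Segment:
    """Select the sound-source signal from the first microphone's two
    segments: the stronger one, or else the earlier-acquired one."""
    if seg1.strength > seg2.strength or seg1.acquired_at < seg2.acquired_at:
        return seg1
    return seg2


def pick_second_audio(seg3: Segment, seg4: Segment) -> Segment:
    """Select the device-response signal from the second microphone's two
    segments: the third if it is stronger, else the fourth if it is later."""
    if seg3.strength > seg4.strength:
        return seg3
    if seg4.acquired_at > seg3.acquired_at:
        return seg4
    return seg3
```

The intuition behind both rules is the same: the stimulus speech reaches the near microphone first and loudest, while the device's reply arrives last in the recording.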
2. The method of claim 1, wherein said determining the interaction latency of the voice interaction device based on the time parameter of the first audio signal and the time parameter of the second audio signal comprises:
determining the interaction waiting time of the voice interaction device according to an end time of the first audio signal and a start time of the second audio signal.
3. The method of claim 2, wherein the interaction latency of the voice interaction device is a difference between a start time of the second audio signal and an end time of the first audio signal.
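As a non-authoritative sketch of claims 2-3, the interaction waiting time reduces to a single subtraction over the two timestamps; the function name and timestamp convention (seconds) are assumptions:

```python
def interaction_latency(first_end: float, second_start: float) -> float:
    """Interaction waiting time: the gap between the end of the stimulus
    speech (first audio signal) and the start of the device's reply
    (second audio signal)."""
    return second_start - first_end
```

For example, if the stimulus ends at 1.5 s and the reply starts at 2.0 s, the device's interaction waiting time is 0.5 s.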
4. The method of claim 1, wherein:
the determining the first audio signal from the sound source further comprises:
in response to the acquisition time of the first segment of audio signal being earlier than the acquisition time of the second segment of audio signal, the acquisition time of the third segment of audio signal being earlier than the acquisition time of the fourth segment of audio signal, and the signal strength of the first segment of audio signal being greater than the signal strength of the third segment of audio signal, determining the first segment of audio signal as the first audio signal;
the determining the second audio signal from the voice interaction device further comprises:
determining the fourth segment of audio signal as the second audio signal in response to the acquisition time of the first segment of audio signal being earlier than the acquisition time of the second segment of audio signal, the acquisition time of the third segment of audio signal being earlier than the acquisition time of the fourth segment of audio signal, and the signal strength of the fourth segment of audio signal being greater than the signal strength of the second segment of audio signal.
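The cross-microphone refinement of claim 4 can also be sketched for illustration (not part of the claims); the dict keys `"t"` (acquisition time) and `"s"` (signal strength) and the function name are hypothetical:

```python
def cross_mic_assignment(seg1, seg2, seg3, seg4):
    """When both microphones heard the stimulus before the reply
    (seg1 before seg2, seg3 before seg4), compare strengths across
    microphones: the sound-source signal should be loudest at the
    first (near) microphone, the device reply loudest at the second.
    Returns (first_audio, second_audio); either may be None if its
    strength condition is not met."""
    first_audio = second_audio = None
    if seg1["t"] < seg2["t"] and seg3["t"] < seg4["t"]:
        if seg1["s"] > seg3["s"]:
            first_audio = seg1
        if seg4["s"] > seg2["s"]:
            second_audio = seg4
    return first_audio, second_audio
```

This uses the microphone geometry from claim 1: because the first microphone is nearer the sound source and the second nearer the device, relative strength across microphones disambiguates which segment belongs to which source.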
5. The method of claim 1, wherein the sound source comprises one or more of:
an artificial mouth and a loudspeaker.
6. The method of claim 1, wherein the voice interaction device comprises one or more of:
a smart speaker, a smart desk lamp, and a mobile terminal.
7. A time determining apparatus, comprising:
an acquisition unit configured to acquire an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from a voice interaction device, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
a first processing unit configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal;
a second processing unit configured to determine an interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal;
wherein the acquisition unit is specifically configured to: acquire a first segment of audio signal and a second segment of audio signal acquired by a first microphone; acquire a third segment of audio signal and a fourth segment of audio signal acquired by a second microphone; determine the first segment of audio signal as the first audio signal in response to the signal strength of the first segment of audio signal being greater than the signal strength of the second segment of audio signal, or, in response to the acquisition time of the first segment of audio signal being earlier than the acquisition time of the second segment of audio signal, determine the first segment of audio signal as the first audio signal; and determine the third segment of audio signal as the second audio signal in response to the signal strength of the third segment of audio signal being greater than the signal strength of the fourth segment of audio signal, or, in response to the acquisition time of the fourth segment of audio signal being later than the acquisition time of the third segment of audio signal, determine the fourth segment of audio signal as the second audio signal;
wherein the first microphone and the second microphone belong to a microphone array, the distance from the first microphone to the sound source is smaller than the distance from the second microphone to the sound source, and the distance from the first microphone to the voice interaction device is greater than the distance from the second microphone to the voice interaction device.
8. An electronic device, the electronic device comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time determination method of any one of claims 1-6.
9. A time determination system, comprising a sound source, a voice interaction device and a time determination device, wherein the time determination device comprises an acquisition module and a processing module, and the acquisition module is arranged between the sound source and the voice interaction device;
the sound source is configured to play a first voice;
the voice interaction device is configured to receive the first voice and play a second voice according to a preset voice;
the acquisition module is configured to acquire an audio signal, wherein the audio signal comprises a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
the processing module is configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal, and determine an interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal; and
the time determination device is further configured to perform the time determination method according to any one of claims 2-6.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the time determination method according to any one of claims 1-6.
CN202110819212.2A 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium Active CN113674739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110819212.2A CN113674739B (en) 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113674739A CN113674739A (en) 2021-11-19
CN113674739B true CN113674739B (en) 2023-12-19

Family

ID=78539644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110819212.2A Active CN113674739B (en) 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113674739B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261195A (en) * 2020-01-10 2020-06-09 Oppo广东移动通信有限公司 Audio testing method and device, storage medium and electronic equipment
CN111355995A (en) * 2020-03-03 2020-06-30 北京字节跳动网络技术有限公司 Method and device for determining sound delay time of Bluetooth device and terminal device
CN111372105A (en) * 2020-03-26 2020-07-03 深圳市微测检测有限公司 Video information transmission delay test method, device, terminal and storage medium
CN111402886A (en) * 2020-06-03 2020-07-10 星络智能科技有限公司 Storage medium, voice response apparatus and method, and voice interaction system
CN111402868A (en) * 2020-03-17 2020-07-10 北京百度网讯科技有限公司 Voice recognition method and device, electronic equipment and computer readable storage medium
CN111785268A (en) * 2020-06-30 2020-10-16 北京声智科技有限公司 Method and device for testing voice interaction response speed and electronic equipment
CN112672415A (en) * 2020-12-25 2021-04-16 之江实验室 Multi-sensor time synchronization method, device, system, electronic device and medium
CN112804121A (en) * 2021-01-08 2021-05-14 中国商用飞机有限责任公司北京民用飞机技术研究中心 TTE network transmission delay test system and method
CN112929240A (en) * 2021-03-23 2021-06-08 北京字节跳动网络技术有限公司 Method, device, terminal and non-transitory storage medium for acquiring communication delay time

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697981B (en) * 2019-01-02 2021-03-09 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and storage medium



Similar Documents

Publication Publication Date Title
CN108877770B (en) Method, device and system for testing intelligent voice equipment
US11270690B2 (en) Method and apparatus for waking up device
CN111785268A (en) Method and device for testing voice interaction response speed and electronic equipment
CN111343410A (en) Mute prompt method and device, electronic equipment and storage medium
CN108877779B (en) Method and device for detecting voice tail point
CN111629156A (en) Image special effect triggering method and device and hardware device
CN112259076B (en) Voice interaction method, voice interaction device, electronic equipment and computer readable storage medium
WO2021212985A1 (en) Method and apparatus for training acoustic network model, and electronic device
CN112863545B (en) Performance test method, device, electronic equipment and computer readable storage medium
CN111312274B (en) Voice signal processing method, device, system, electronic device and storage medium
CN113674739B (en) Time determination method, device, equipment and storage medium
CN112382266A (en) Voice synthesis method and device, electronic equipment and storage medium
CN113299285A (en) Device control method, device, electronic device and computer-readable storage medium
CN111176744A (en) Electronic equipment control method, device, terminal and storage medium
EP4276827A1 (en) Speech similarity determination method, device and program product
CN111312243B (en) Equipment interaction method and device
CN112017685B (en) Speech generation method, device, equipment and computer readable medium
CN110941455B (en) Active wake-up method and device and electronic equipment
CN112671966B (en) Ear-return time delay detection device, method, electronic equipment and computer readable storage medium
CN114615609B (en) Hearing aid control method, hearing aid device, apparatus, device and computer medium
CN111768762B (en) Voice recognition method and device and electronic equipment
CN115827415B (en) System process performance test method, device, equipment and computer medium
CN113327611B (en) Voice wakeup method and device, storage medium and electronic equipment
CN111292766B (en) Method, apparatus, electronic device and medium for generating voice samples
CN111768771B (en) Method and apparatus for waking up an electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant