CN113674739A - Time determination method, device, equipment and storage medium - Google Patents

Time determination method, device, equipment and storage medium

Info

Publication number
CN113674739A
CN113674739A
Authority
CN
China
Prior art keywords
audio signal
audio
voice
section
time
Prior art date
Legal status
Granted
Application number
CN202110819212.2A
Other languages
Chinese (zh)
Other versions
CN113674739B (en)
Inventor
赵倩芸
姜银峰
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202110819212.2A
Publication of CN113674739A
Application granted
Publication of CN113674739B
Legal status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

In the time determination method provided by the embodiments of the present application, a preset first voice can be played through a sound source, so that the voice interaction device responds to the preset first voice by playing a second voice. An audio acquisition module arranged between the sound source and the voice interaction device can collect the audio signals generated during the test, obtaining a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. In this way, according to the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice instruction and issuing a response, i.e. the interaction waiting time, can be determined. Thus, by deploying the sound source and the audio acquisition module, an automated test of the interaction waiting time of the voice interaction device can be realized.

Description

Time determination method, device, equipment and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method, an apparatus, a device, and a storage medium for determining time.
Background
With the development of computer technology, voice interaction devices such as smart speakers have become widely used. A voice interaction device can collect the audio signal sent by a user through an audio receiving module such as a microphone, analyze the audio signal, determine the instruction issued by the user, and execute the corresponding operation. Optionally, after collecting the audio signal sent by the user, the voice interaction device may play a feedback audio signal through a sound generation module such as a speaker, thereby completing the process of interacting with the user.
For a voice interaction device that interacts with a user through audio signals, the time interval from the moment the user's utterance ends to the moment the voice interaction device starts playing the fed-back audio signal is called the interaction waiting time. During the interaction waiting time, the user has to wait for the response of the voice interaction device. Obviously, the longer the interaction waiting time, the worse the user experience. In order to optimize a voice interaction device, its interaction waiting time must be determined. However, conventional methods for determining the interaction waiting time are inaccurate.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present application provide a time determination method and apparatus.
In a first aspect, an embodiment of the present application provides a time determination method, where the method includes:
acquiring audio signals, wherein the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal;
determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
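The three claimed steps can be illustrated with a minimal sketch (not part of the patent; representing each signal as a hypothetical (start, end) timestamp pair in seconds is an assumed simplification of the claimed "time parameters", and all names are illustrative):

```python
def determine_latency(acquire_audio):
    """Sketch of the claimed three-step method. `acquire_audio` stands in
    for the acquisition step and returns two (start, end) pairs in seconds:
    the first audio signal (from the sound source) and the second audio
    signal (the response from the voice interaction device)."""
    first, second = acquire_audio()   # step 1: acquire the audio signals
    first_end = first[1]              # step 2: extract the time parameters
    second_start = second[0]
    # step 3: the interaction waiting time is the gap between the end of
    # the command and the start of the response
    return second_start - first_end
```

For instance, with a command ending at 1.0 s and a response starting at 1.4 s, the sketch yields an interaction waiting time of 0.4 s.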
In a second aspect, an embodiment of the present application provides a time processing apparatus, including:
the acquisition module is used for acquiring audio signals, wherein the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal;
a first processing unit for determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and the second processing unit is used for determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
In a third aspect, an embodiment of the present application provides a time determination system, where the system includes an audio source, a voice interaction device, and a time determination device, where the time determination device includes an acquisition module and a processing module, and the acquisition module is disposed between the audio source and the voice interaction device;
the sound source is used for playing preset voice;
the voice interaction equipment is used for receiving preset voice and sending out response voice according to the preset voice;
the acquiring module is configured to acquire an audio signal, where the audio signal includes a first audio signal corresponding to the preset voice and a second audio signal corresponding to the response voice, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
the processing module is configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal; determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal;
the time determination device is further configured to execute the time determination method according to any one of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a time determination method as in any of the embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a time determination method according to any one of the embodiments of the present application.
In the time determination method provided by the embodiments of the present application, a preset first voice can be played through a sound source, so that the voice interaction device responds to the preset first voice by playing a second voice. An audio acquisition module arranged between the sound source and the voice interaction device can collect the audio signals generated during the test, obtaining a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. Here, the first audio signal corresponding to the first voice represents the voice instruction issued by the sound source, and the second audio signal corresponding to the second voice represents the response of the voice interaction device to that instruction. Therefore, according to the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice instruction and issuing a response, i.e. the interaction waiting time, can be determined. Thus, by deploying the sound source and the audio acquisition module, an automated test of the interaction waiting time of the voice interaction device can be realized. Compared with the traditional manual testing method, this improves the testing accuracy on the one hand, and saves manpower and material resources on the other.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic view of a scenario of a time determination system according to an embodiment of the present application;
FIG. 2 is an interaction diagram of voice and audio signals during transmission according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a time determination method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an apparatus for determining time according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.
It should be understood that the various steps recited in the method embodiments of the present application may be performed in a different order and/or in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present application is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this application are intended to be illustrative rather than limiting; those skilled in the art should understand them as meaning "one or more" unless the context clearly indicates otherwise.
The voice interaction device may interact with the user through audio data. Specifically, the user may issue an instruction by voice; such an instruction is referred to as a voice instruction. The voice interaction device can collect an audio signal corresponding to the voice instruction and, according to the audio signal, determine the task that the user requires it to perform, so as to carry out the corresponding operation. In some possible implementations, to let the user know that the instruction has been received, the voice interaction device may play feedback audio after receiving the voice instruction.
For example, assume the user speaks the command "put a song of singer A for me". The voice interaction device may collect the audio signal through its microphone and analyze it to determine that the user wants the device to play a song by singer A. The voice interaction device then plays a preset feedback audio through its speaker, for example "your instruction has been received", indicating to the user that the instruction has been received. The voice interaction device may then look up a song by singer A from the network and play it.
In this process, the time interval from the moment the user finishes speaking to the moment the voice interaction device first starts to speak is referred to as the interaction waiting time. That is, in the example above, the interaction waiting time is the interval from the moment the user finishes saying "put a song of singer A for me" to the moment the voice interaction device starts playing "your instruction has been received". During the interaction waiting time, the voice interaction device performs no voice interaction with the user, and the user is left waiting. Obviously, the longer the interaction waiting time, the longer the user waits, and the worse the user experience.
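As the example shows, the definition reduces to subtracting two timestamps; a minimal sketch (the function name and the second-based timestamps are illustrative, not from the patent):

```python
def interaction_waiting_time(user_speech_end, device_response_start):
    """Interval, in seconds, from the moment the user stops speaking to
    the moment the device starts playing its feedback audio."""
    return device_response_start - user_speech_end
```

If the user finishes the command at t = 3.2 s and the device starts playing "your instruction has been received" at t = 4.0 s, the interaction waiting time is 0.8 s.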
Therefore, a technician can determine the interaction waiting time of the voice interaction device and then optimize the device accordingly. Currently, the interaction waiting time of most voice interaction devices is tested manually by testers. For example, a tester may issue an instruction to the voice interaction device, start timing once the instruction has been spoken, and stop timing upon hearing the voice fed back by the device; the recorded time is taken as the interaction waiting time of the voice interaction device.
Because it relies on manual timing by technicians, the traditional method of testing the interaction waiting time is therefore both inaccurate and labor-intensive.
In order to solve the problems in the prior art, embodiments of the present application provide a time determination method, an apparatus, a device, and a storage medium, which are described in detail below with reference to the accompanying drawings.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, first, a description is made with reference to a scene example shown in fig. 1.
Referring to fig. 1, the figure is a schematic diagram of the framework of an exemplary application scenario provided in an embodiment of the present application. Fig. 1 includes a sound source 10, a voice interaction device 20, an audio acquisition module 30 and a processing module 40. The sound source 10 can play voice, emitting a first audio signal; the voice interaction device 20 can collect the audio signal emitted by the sound source 10 and, in response to that audio signal, emit a second audio signal; the audio acquisition module 30 is located between the sound source 10 and the voice interaction device 20, and can acquire the first audio signal emitted by the sound source 10 and the second audio signal emitted by the voice interaction device 20; the processing module 40 is connected to the audio acquisition module 30, receives the first audio signal and the second audio signal captured by the audio acquisition module 30, and determines the interaction waiting time of the voice interaction device from them. The time determination method provided by the embodiments of the present application may be executed by the processing module 40.
Alternatively, the sound source 10 may be an artificial mouth or a speaker box fitted with a loudspeaker; the tested voice interaction device 20 may be a smart speaker, a smart desk lamp, or another terminal device with voice interaction functions (e.g., a mobile phone or tablet computer); and the audio acquisition module 30 may include a microphone array comprising one or more microphones.
It should be noted that, in the embodiment shown in fig. 1, the audio acquisition module 30 and the processing module 40 may be separate devices, or may be distinct physical modules within the same device. For example, the audio acquisition module 30 may be the microphone or microphone array of a test device, and the processing module 40 may be the processor of that same test device.
In the time determination method provided by the embodiments of the present application, the sound source can simulate a user issuing an instruction to the voice interaction device, thereby simulating an actual voice interaction scenario. The interaction waiting time of the voice interaction device can then be determined by calculating the time interval from the moment the sound source finishes playing its voice to the moment the voice interaction device starts playing its response voice.
First, the propagation process of voice and audio signals in the embodiment of the present application will be described by taking the embodiment shown in fig. 1 as an example. Referring to fig. 2, fig. 2 is an interaction diagram of voice and audio signals in a transmission process according to an embodiment of the present application, including:
s201: the sound source plays the first voice.
In the process of testing the interaction waiting time of the voice interaction device, the first voice can be played through the sound source. The first voice may be a preset voice segment, for example a wake-up word of the voice interaction device, or a preset instruction for the voice interaction device.
S202: the voice interaction device and the audio acquisition module receive a first voice.
After the sound source plays the first voice, the audio acquisition module located between the sound source and the voice interaction device can collect the first voice and convert it into an audio signal. Similarly, the voice interaction device can also collect the first voice and convert it into an audio signal for subsequent processing.
S203: and the voice interaction equipment generates and plays the second voice according to the received first voice.
After receiving the first voice, the voice interaction device may generate a second voice for responding to the first voice according to the first voice, and play the second voice through a speaker of the voice interaction device or the like. Specifically, the voice interaction device may recognize the meaning of the first voice, determine an instruction to be completed according to the first voice, and generate and play the second voice according to the instruction. For example, assuming that the first voice is a wake-up word of the voice interaction device, the second voice may be a response word preset to answer the wake-up word.
S204: the audio acquisition module receives the second voice.
After the voice interaction device plays the second voice, the audio acquisition module may receive the second voice and convert the second voice into an audio signal.
S205: and the audio acquisition module sends the acquired audio signals to the processing module.
After converting the first voice and/or the second voice into audio signals, the audio acquisition module may send the audio signals to the processing module. Optionally, the audio acquisition module may send the audio signal to the processing module as soon as it has acquired the audio signal corresponding to the first voice, and send again after it has acquired the audio signal corresponding to the second voice. Alternatively, the audio acquisition module may store the audio signal corresponding to the first voice after acquiring it, recording its time information; then, after the audio signal corresponding to the second voice has been acquired, the audio acquisition module may send the audio signals corresponding to the first voice and the second voice, together with the related time information, to the processing module in one batch.
In some possible implementations, the audio acquisition module may include one or more units with an audio capture function. For example, the audio acquisition module may include a microphone array of one or more microphones, each of which may independently receive voice and convert it into an audio signal; the audio signal sent by the audio acquisition module to the processing module may therefore include multiple segments of audio signal. Optionally, when the audio acquisition module comprises a plurality of units with an audio capture function, the different units may be deployed at different locations.
After receiving the audio signal acquired by the audio acquisition module, the processing module may execute the time determination method provided in the embodiment of the present application, and the specific execution process may be described as follows.
Fig. 3 is a schematic flow chart of a time determination method according to an embodiment of the present application, applicable to the scenario of testing the interaction waiting time of a voice interaction device. The method may be performed by any device that has data processing capability and that has, or is in communication with, an audio capture capability.
The method is described as an example of the method being performed by a processing module of the test equipment. As shown in fig. 3, the method specifically includes the following steps:
s301: the processing module acquires an audio signal.
To determine the interaction waiting time of the voice interaction device, the processing module may acquire an audio signal. According to the foregoing description, the audio signal may be the audio signal collected by the audio acquisition module during the interaction between the sound source and the voice interaction device, and includes a first audio signal and a second audio signal. The first audio signal is the audio signal from the sound source, and the second audio signal is the audio signal emitted by the voice interaction device after receiving the first audio signal.
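The patent does not specify how the start and end times of the captured segments are obtained. Purely as an illustration, a naive energy-gate segmenter over raw samples might look like the following (the threshold, the per-sample representation, and the function name are all assumptions):

```python
def detect_segments(samples, sample_rate, threshold):
    """Return (start_time, end_time) pairs, in seconds, for contiguous
    runs of samples whose absolute amplitude reaches the threshold."""
    segments, start = [], None
    for i, x in enumerate(samples):
        if abs(x) >= threshold and start is None:
            start = i                                   # segment begins
        elif abs(x) < threshold and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None                                # segment ends
    if start is not None:                               # segment ran to the end
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments
```

A real system would more likely use windowed energy or a voice activity detector rather than per-sample gating, but the output, a list of timed segments, is what the following steps operate on.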
In this embodiment, the audio signal sent by the audio acquisition module to the processing module may include multiple segments of audio signal, where the start time of each segment is different. The processing module may therefore determine the first audio signal and the second audio signal from the multiple segments captured by the audio acquisition module before determining the interaction waiting time. As described above, the audio acquisition module in the embodiments of the present application may include one unit with an audio capture function, or a plurality of such units. The method by which the processing module determines the first audio signal and the second audio signal in each of these two cases is described below.
If the audio capture module comprises only one unit with audio capture functionality, e.g. the audio capture module comprises only one microphone, the processing module may determine the first audio signal and the second audio signal from the start time of the audio signal.
Specifically, during the voice interaction between the sound source and the voice interaction device, the audio acquisition module acquires only two segments of voice. The voice with the earlier start time is the first voice played by the sound source, and the voice with the later start time is the voice interaction device's response to the first voice, i.e. the second voice. Accordingly, when the audio signal sent by the audio acquisition module to the processing module includes two segments, the processing module may determine the segment with the earlier start time as the first audio signal, and the segment with the later start time as the second audio signal.
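The single-microphone rule above can be sketched as follows (the function name and the representation of each segment as a (start_time, end_time) pair are illustrative assumptions):

```python
def classify_two_segments(segments):
    """Given the two (start_time, end_time) segments captured by a single
    microphone, the earlier one is the first audio signal (the command from
    the sound source) and the later one is the second audio signal (the
    response from the voice interaction device)."""
    first_audio, second_audio = sorted(segments, key=lambda s: s[0])
    return first_audio, second_audio
```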
If the audio acquisition module includes a plurality of units with an audio capture function, for example a plurality of microphones, each unit can independently capture audio signals. During the voice interaction between the sound source and the voice interaction device, each such unit may capture two segments of audio signal, so the audio acquisition module may send more than two segments to the processing module. The processing module may then determine the first audio signal and the second audio signal by either of the following two methods, which are described separately below.
Assume, for example, that the audio acquisition module comprises a microphone array including a first microphone and a second microphone. During the voice interaction between the sound source and the voice interaction device, the sound source plays the first voice and the voice interaction device plays the second voice. In this process, the first microphone can collect the first voice and the second voice, obtaining a first segment of audio signal and a second segment of audio signal. Similarly, the second microphone can collect the first voice and the second voice, obtaining a third segment of audio signal and a fourth segment of audio signal.
In a first possible implementation, the processing module may determine the first audio signal and the second audio signal according to the positions of the first microphone and the second microphone. Specifically, assume that the first microphone is closer to the sound source than the second microphone is, and that the second microphone is closer to the voice interaction device than the first microphone is. The processing module may then determine the first audio signal from the first and second segments of audio signal, and the second audio signal from the third and fourth segments of audio signal.
Because the first microphone is closer to the sound source, the first microphone has a better acquisition effect on the first voice, and then the processing module can determine the first audio signal from the first section of audio signal and the second section of audio signal acquired by the first microphone.
Optionally, the processing module may compare the intensity of the first segment of audio signal with the intensity of the second segment of audio signal. If the intensity of the first segment is greater, this indicates that, at the position of the first microphone, the voice corresponding to the first segment is louder than the voice corresponding to the second segment. The processing module may thereby determine that the device emitting the voice corresponding to the first segment is the sound source, i.e. that the first segment is the audio signal corresponding to the first voice, and so determine the first segment as the first audio signal.
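This intensity comparison might be sketched as follows, using RMS amplitude over a segment's samples as the intensity measure (an assumption; the patent does not define how "intensity" is computed, and the function names are illustrative):

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a segment's samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def pick_first_audio_signal(segment_a, segment_b):
    """At the microphone nearer the sound source, the segment with the
    greater intensity is taken as the first audio signal (the command)."""
    return segment_a if rms(segment_a) > rms(segment_b) else segment_b
```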
Optionally, the processing module may compare the start times of the first and second segments of audio signal. If the start time of the first segment is earlier than the start time of the second segment, this indicates that the first microphone collected the voice corresponding to the first segment before the voice corresponding to the second segment, so the first segment is the audio signal corresponding to the first voice. The processing module may thus determine the first segment as the first audio signal.
Similarly, since the second microphone is closer to the voice interaction device, the second microphone has a better acquisition effect on the second voice, and then the processing module may determine the second audio signal from the third segment of audio signal and the fourth segment of audio signal acquired by the second microphone.
Optionally, the processing module may compare the intensities of the third and fourth segments of audio signals, and if the intensity of the third segment of audio signal is greater than the intensity of the fourth segment of audio signal, it indicates that the intensity of the voice corresponding to the third segment of audio signal is greater than the intensity of the voice corresponding to the fourth segment of audio signal at the position of the second microphone. In this way, the processing module may determine that the device emitting the voice corresponding to the third section of audio signal is the voice interaction device, and the third section of audio signal is the audio signal corresponding to the second voice, so as to determine the third section of audio signal as the second audio signal.
Optionally, the processing module may compare the start times of the third and fourth sections of audio signal. If the start time of the third section of audio signal is earlier than the start time of the fourth section of audio signal, it indicates that the second microphone first acquires the voice corresponding to the third section of audio signal and then acquires the voice corresponding to the fourth section of audio signal, so the fourth section of audio signal is the audio signal corresponding to the second voice. In this way, the processing module may determine the fourth section of audio signal as the second audio signal.
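The per-microphone selection logic described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the `Segment` structure, the mean-square "intensity" measure, and the sample values are all assumptions made for the example.

```python
# Sketch: at each microphone, pick the segment matching the nearer device,
# either by higher intensity or by start-time order. All names are assumptions.
from dataclasses import dataclass


@dataclass
class Segment:
    samples: list      # raw audio samples of this segment
    start_time: float  # seconds from the start of the recording


def intensity(seg: Segment) -> float:
    """Mean square amplitude as a stand-in for signal intensity."""
    return sum(s * s for s in seg.samples) / len(seg.samples)


def pick_by_intensity(a: Segment, b: Segment) -> Segment:
    """Return the stronger of two segments from the same microphone."""
    return a if intensity(a) > intensity(b) else b


def pick_by_start_time(a: Segment, b: Segment, want_earlier: bool) -> Segment:
    """Return the earlier (or later) of two segments from the same microphone."""
    earlier, later = (a, b) if a.start_time <= b.start_time else (b, a)
    return earlier if want_earlier else later


# First microphone (near the sound source): the first voice is the stronger segment.
seg1 = Segment([0.9, 0.8, 0.9], start_time=0.0)
seg2 = Segment([0.2, 0.1, 0.2], start_time=2.5)
first_audio = pick_by_intensity(seg1, seg2)

# Second microphone (near the voice interaction device): the second voice is the
# *later* of its two segments, since the command precedes the response.
seg3 = Segment([0.3, 0.2, 0.3], start_time=0.1)
seg4 = Segment([0.8, 0.9, 0.8], start_time=2.6)
second_audio = pick_by_start_time(seg3, seg4, want_earlier=False)
```

Either criterion alone suffices per the description; a practical tester might apply both and flag any disagreement.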
In a second possible implementation, the processing module may determine the first audio signal and the second audio signal according to the respective start times and intensities of the first segment of audio signal, the second segment of audio signal, the third segment of audio signal, and the fourth segment of audio signal.
Specifically, the processing module may compare start times of two audio signals collected by the same microphone, and determine an audio signal corresponding to the first voice and an audio signal corresponding to the second voice from the two audio signals collected by the same microphone.
For example, the processing module may compare the respective start times of the first and second sections of audio signal, and the respective start times of the third and fourth sections of audio signal. If the start time of the first section of audio signal is earlier than the start time of the second section, the first section was acquired earlier and is therefore obtained by collecting the first voice, so it can be determined that the second section of audio signal is obtained by collecting the second voice. Similarly, if the start time of the third section of audio signal is earlier than the start time of the fourth section, the third section is obtained by collecting the first voice, and it can be determined that the fourth section of audio signal is obtained by collecting the second voice.
Then, the processing module may compare the intensities of two audio signals corresponding to the same voice, and determine the audio signal with higher intensity as the first audio signal or the second audio signal.
For example, the processing module may compare the strength of the first and third segments of audio signals. Because the first section of audio signal and the third section of audio signal are obtained by collecting the first voice, the audio signal with higher intensity in the first section of audio signal and the third section of audio signal is the audio signal with better effect on collecting the first voice. Then if the intensity of the first segment of audio signal is greater than the intensity of the third segment of audio signal, which indicates that the first microphone is better for capturing the first speech than the second microphone, the processing module may determine the first segment of audio signal as the first audio signal.
Similarly, the processing module may compare the strength of the second segment of audio signal and the fourth segment of audio signal. Because the second section of audio signal and the fourth section of audio signal are obtained by collecting the second voice, the audio signal with higher intensity in the second section of audio signal and the fourth section of audio signal is the audio signal with better effect on collecting the second voice. Then if the intensity of the fourth segment of audio signal is greater than the intensity of the second segment of audio signal, which indicates that the second microphone is better for capturing the second speech than the first microphone, the processing module may determine the fourth segment of audio signal as the second audio signal.
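The second implementation — start times to assign each microphone's segments to the two voices, then intensities to keep the better recording of each voice — can be sketched as follows. The segment representation (a `(start_time, samples)` tuple) and the RMS intensity measure are assumptions for illustration.

```python
# Sketch of the second implementation: order by start time, then keep the
# stronger recording of each voice across the two microphones.

def rms(samples):
    """Root-mean-square amplitude, used here as the intensity measure."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def select(mic1_segments, mic2_segments):
    """Each argument holds two (start_time, samples) tuples from one microphone.
    Returns (first_audio, second_audio)."""
    # Step 1: at each microphone, the earlier segment comes from the first voice.
    m1_first, m1_second = sorted(mic1_segments, key=lambda s: s[0])
    m2_first, m2_second = sorted(mic2_segments, key=lambda s: s[0])

    # Step 2: for each voice, keep the stronger of the two recordings.
    first_audio = max(m1_first, m2_first, key=lambda s: rms(s[1]))
    second_audio = max(m1_second, m2_second, key=lambda s: rms(s[1]))
    return first_audio, second_audio


first, second = select(
    [(0.0, [0.9, 0.8]), (2.5, [0.2, 0.1])],   # first microphone
    [(0.1, [0.3, 0.2]), (2.6, [0.8, 0.9])],   # second microphone
)
```

Unlike the first implementation, this variant needs no assumption about which microphone is closer to which device; the intensity comparison discovers that on its own.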
S302: the processing module determines a time parameter of the first audio signal and a time parameter of the second audio signal.
After determining the first audio signal and the second audio signal, the processing module may obtain the time parameter of each. In the embodiment of the present application, the time parameter may include the start time and the end time of the audio signal. Then, in determining the time parameter of the first audio signal and the time parameter of the second audio signal, the processing module may determine the start time of the first audio signal, the end time of the first audio signal, the start time of the second audio signal, and the end time of the second audio signal using a Voice Activity Detection (VAD) method.
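A minimal energy-threshold detector of the kind sketched below could supply these start and end times; production VAD methods are considerably more robust, and the frame length, threshold, and sample rate here are assumptions made for the example.

```python
# Sketch: energy-threshold VAD returning the (start, end) times, in seconds,
# of the active speech region of a signal. Parameters are assumptions.

def vad_times(samples, sample_rate=16000, frame_len=160, threshold=0.01):
    """Return (start_time, end_time) of the active region, or None if silent."""
    active_offsets = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean frame energy
        if energy > threshold:
            active_offsets.append(i)
    if not active_offsets:
        return None
    start = active_offsets[0] / sample_rate
    end = (active_offsets[-1] + frame_len) / sample_rate
    return start, end


# 10 ms of silence, 10 ms of tone, 10 ms of silence at 16 kHz.
signal = [0.0] * 160 + [0.5] * 160 + [0.0] * 160
times = vad_times(signal)
```

In practice the threshold would be calibrated against the test room's noise floor before each run.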
S303: the processing module determines the interaction waiting time of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
As can be appreciated from the foregoing description, the interaction latency of the voice interaction device may be the time interval from when the user finishes issuing a voice command to when the voice interaction device begins to respond to it. After determining the end time of the first audio signal and the start time of the second audio signal, the processing module may take the end time of the first audio signal as the start of the interaction latency and the start time of the second audio signal as its end. The difference between the start time of the second audio signal and the end time of the first audio signal is then the interaction latency of the voice interaction device.
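Given the two time parameters, the computation of the step above is a single subtraction, sketched here; the `(start, end)` tuple layout is an assumption.

```python
# Sketch: interaction latency from the two time parameters.
# Each parameter is assumed to be a (start_time, end_time) tuple in seconds.

def interaction_latency(first_signal_times, second_signal_times):
    """Latency = start of device response minus end of user command."""
    _first_start, first_end = first_signal_times
    second_start, _second_end = second_signal_times
    return second_start - first_end


# Command occupies 0.5–2.0 s of the recording; the reply starts at 2.8 s.
latency = interaction_latency((0.5, 2.0), (2.8, 4.1))
```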
In the time determination method provided by the embodiment of the application, the preset first voice can be played through the sound source, so that the voice interaction device responds to the first voice and plays the second voice. The audio acquisition module arranged between the sound source and the voice interaction device can acquire the audio signals generated during the test, obtaining a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice. The first audio signal thus represents the voice instruction sent by the sound source, and the second audio signal represents the response of the voice interaction device to that instruction. Therefore, according to the time parameter of the first audio signal and the time parameter of the second audio signal, the waiting time between the voice interaction device receiving the voice instruction and sending out a response, namely the interaction waiting time, can be determined. By deploying the sound source and the audio acquisition module, automatic testing of the interaction waiting time of the voice interaction device can be realized. Compared with the traditional manual testing method, this can, on one hand, improve testing accuracy and, on the other hand, save manpower and material resources.
Fig. 4 is a schematic structural diagram of a time determination apparatus provided in an embodiment of the present application, where the embodiment may be applicable to determining a scene of a corresponding time of a voice interaction device, and the time determination apparatus specifically includes: an acquisition unit 410, a first processing unit 420 and a second processing unit 430.
An obtaining unit 410, configured to obtain an audio signal, where the audio signal includes a first audio signal from a sound source and a second audio signal from a voice interaction device, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
a first processing unit 420 for determining a time parameter of the first audio signal and a time parameter of the second audio signal.
A second processing unit 430, configured to determine an interaction latency of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
The time determination device provided by the embodiment of the disclosure can execute the time determination method provided by any embodiment of the disclosure, and has functional units corresponding to the method and the corresponding beneficial effects.
It should be noted that, in the embodiment of the time determination apparatus, the included units are merely divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present disclosure.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., a terminal device running a software program) 500 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in fig. 3. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
The electronic device provided by this embodiment of the present disclosure and the time determination method provided by the above embodiments belong to the same inventive concept; technical details not described in detail in this embodiment may be found in the above embodiments, and this embodiment has the same beneficial effects.
The disclosed embodiments provide a computer storage medium having a computer program stored thereon, which when executed by a processor implements the time determination method provided by the above embodiments.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring audio signals, wherein the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal; determining a time parameter of the first audio signal and a time parameter of the second audio signal; and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
Computer readable storage media may be written with computer program code for performing the operations of the present disclosure in one or more programming languages, including, but not limited to, an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first processing unit may also be described as an "influencing factor determination unit".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a time determination method, the method comprising:
acquiring audio signals, wherein the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal;
determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
According to one or more embodiments of the present disclosure, [ example two ] there is provided a time determination method, further comprising: optionally, the determining the interaction latency of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal includes:
and determining the interaction waiting time of the voice interaction equipment according to the end time of the first audio signal and the start time of the second audio signal.
According to one or more embodiments of the present disclosure, [ example three ] there is provided a time determination method, further comprising: optionally, the interaction latency of the voice interaction device is a difference between a start time of the second audio signal and an end time of the first audio signal.
According to one or more embodiments of the present disclosure, [ example four ] there is provided a time determination method, further comprising: optionally, the acquiring the audio signal comprises:
audio signals acquired by a microphone array are acquired, the microphone array including one or more microphones.
According to one or more embodiments of the present disclosure, [ example five ] there is provided a time determination method, the method further comprising: optionally, after acquiring the audio signal acquired by the microphone array, the method further comprises:
determining the first audio signal from the audio source;
determining the second audio signal from the voice interaction device.
According to one or more embodiments of the present disclosure, [ example six ] there is provided a time determination method, further comprising: optionally, the microphone array comprises a first microphone and a second microphone, the first microphone is less distant from the audio source than the second microphone, the first microphone is less distant from the voice interaction device than the second microphone;
the acquiring of the audio signal collected by the microphone array comprises:
acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source comprises:
determining the first segment audio signal as a first audio signal in response to the signal strength of the first segment audio signal being greater than the strength of the second segment audio signal; or,
determining the first section audio signal as a first audio signal in response to the acquisition time of the first section audio signal being earlier than the acquisition time of the second section audio signal;
the determining the second audio signal from the voice interaction device comprises:
determining the third section of audio signal as a second audio signal in response to the signal strength of the third section of audio signal being greater than the strength of the fourth section of audio signal; or,
and determining the fourth section of audio signal as a second audio signal in response to the acquisition time of the fourth section of audio signal being later than the acquisition time of the third section of audio signal.
According to one or more embodiments of the present disclosure, [ example seven ] there is provided a time determination method, the method further comprising: optionally, the microphone array comprises a first microphone and a second microphone;
the acquiring of the audio signal collected by the microphone array comprises:
acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source comprises:
in response to that the acquisition time of the first section of audio signals is earlier than that of the second section of audio signals, the acquisition time of the third section of audio signals is earlier than that of the fourth section of audio signals, and the signal intensity of the first section of audio signals is greater than that of the third section of audio signals, determining the first section of audio signals as first audio signals;
the determining the second audio signal from the voice interaction device comprises:
and determining the fourth section of audio signal as the second audio signal in response to the acquisition time of the first section of audio signal being earlier than the acquisition time of the second section of audio signal, the acquisition time of the third section of audio signal being earlier than the acquisition time of the fourth section of audio signal, and the signal intensity of the fourth section of audio signal being greater than the intensity of the second section of audio signal.
According to one or more embodiments of the present disclosure, [ example eight ] there is provided a time determination method, further comprising: optionally, the audio source comprises one or more of:
an artificial mouth and a loudspeaker.
According to one or more embodiments of the present disclosure, [ example nine ] there is provided a time determination method, the method further comprising: optionally, wherein the voice interaction device comprises one or more of:
a smart speaker, a smart desk lamp, and a mobile terminal.
According to one or more embodiments of the present disclosure, [ example ten ] there is provided a time determination apparatus comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring audio signals, the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal;
a first processing unit for determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and the second processing unit is used for determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
According to one or more embodiments of the present disclosure, [ example eleven ] there is provided an electronic device comprising: one or more processors; a memory for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time determination method as in any embodiment of the present application.
According to one or more embodiments of the present disclosure, [ example twelve ] there is provided a time determination system comprising an audio source, a voice interaction device, and a time determination device, the time determination device comprising an acquisition module and a processing module, the acquisition module being disposed between the audio source and the voice interaction device;
the sound source is used for playing a first voice;
the voice interaction device is used for receiving the first voice and playing the second voice according to the preset first voice;
the acquiring module is configured to acquire an audio signal, where the audio signal includes a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice, and the second audio signal is an audio signal sent by the voice interaction device after receiving the first audio signal;
the processing module is configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal; determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal;
the time determination device is further configured to execute the time determination method according to any embodiment of the present application.
According to one or more embodiments of the present disclosure, [ example thirteen ] there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a time determination method as described in any one of the embodiments of the present application.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (13)

1. A method for time determination, the method comprising:
acquiring audio signals, wherein the audio signals comprise a first audio signal from a sound source and a second audio signal from voice interaction equipment, and the second audio signal is an audio signal sent by the voice interaction equipment after the voice interaction equipment receives the first audio signal;
determining a time parameter of the first audio signal and a time parameter of the second audio signal;
and determining the interaction waiting time of the voice interaction equipment according to the time parameter of the first audio signal and the time parameter of the second audio signal.
2. The method of claim 1, wherein determining the interaction latency of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal comprises:
determining the interaction latency of the voice interaction device according to an end time of the first audio signal and a start time of the second audio signal.
3. The method of claim 2, wherein the interaction latency of the voice interaction device is a difference between a start time of the second audio signal and an end time of the first audio signal.
4. The method of claim 1, wherein the acquiring the audio signal comprises:
acquiring an audio signal collected by a microphone array, wherein the microphone array comprises one or more microphones.
5. The method of claim 4, wherein after acquiring the audio signal acquired by the microphone array, the method further comprises:
determining the first audio signal from the audio source;
determining the second audio signal from the voice interaction device.
6. The method of claim 5, wherein the microphone array comprises a first microphone and a second microphone, wherein the first microphone is located at a distance from the audio source that is less than the distance from the audio source to the second microphone, and wherein the distance from the first microphone to the voice interaction device is less than the distance from the voice interaction device to the second microphone;
the acquiring of the audio signal collected by the microphone array comprises:
acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source comprises:
determining the first section of audio signal as the first audio signal in response to a signal strength of the first section of audio signal being greater than a signal strength of the second section of audio signal; or
determining the first section of audio signal as the first audio signal in response to an acquisition time of the first section of audio signal being earlier than an acquisition time of the second section of audio signal;
the determining the second audio signal from the voice interaction device comprises:
determining the third section of audio signal as the second audio signal in response to a signal strength of the third section of audio signal being greater than a signal strength of the fourth section of audio signal; or
determining the fourth section of audio signal as the second audio signal in response to an acquisition time of the fourth section of audio signal being later than an acquisition time of the third section of audio signal.
7. The method of claim 6, wherein the microphone array further comprises a third microphone and a fourth microphone;
the acquiring of the audio signal collected by the microphone array comprises:
acquiring a first section of audio signal and a second section of audio signal acquired by the first microphone;
acquiring a third section of audio signal and a fourth section of audio signal acquired by the second microphone;
the determining the first audio signal from the audio source comprises:
determining the first section of audio signal as the first audio signal in response to the acquisition time of the first section of audio signal being earlier than that of the second section of audio signal, the acquisition time of the third section of audio signal being earlier than that of the fourth section of audio signal, and the signal strength of the first section of audio signal being greater than that of the third section of audio signal;
the determining the second audio signal from the voice interaction device comprises:
determining the fourth section of audio signal as the second audio signal in response to the acquisition time of the first section of audio signal being earlier than that of the second section of audio signal, the acquisition time of the third section of audio signal being earlier than that of the fourth section of audio signal, and the signal strength of the fourth section of audio signal being greater than that of the second section of audio signal.
8. The method of claim 1, wherein the audio source comprises one or more of:
an artificial mouth and a loudspeaker.
9. The method of claim 1, wherein the voice interaction device comprises one or more of:
a smart speaker, a smart desk lamp, and a mobile terminal.
10. A time determination device, comprising:
an acquiring unit, configured to acquire an audio signal, wherein the audio signal comprises a first audio signal from a sound source and a second audio signal from a voice interaction device, the second audio signal being an audio signal emitted by the voice interaction device after the voice interaction device receives the first audio signal;
a first processing unit, configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal; and
a second processing unit, configured to determine an interaction latency of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal.
11. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the time determination method according to any one of claims 1-9.
12. A time determination system, characterized in that the system comprises a sound source, a voice interaction device and a time determination device, the time determination device comprising an acquiring module and a processing module, wherein the acquiring module is disposed between the sound source and the voice interaction device;
the sound source is configured to play a first voice;
the voice interaction device is configured to receive the first voice and play a second voice according to a preset voice;
the acquiring module is configured to acquire an audio signal, wherein the audio signal comprises a first audio signal corresponding to the first voice and a second audio signal corresponding to the second voice, the second audio signal being an audio signal emitted by the voice interaction device after receiving the first audio signal;
the processing module is configured to determine a time parameter of the first audio signal and a time parameter of the second audio signal, and to determine an interaction latency of the voice interaction device according to the time parameter of the first audio signal and the time parameter of the second audio signal;
the time determination device is further configured to perform the time determination method according to any one of claims 2-9.
13. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the time determination method according to any one of claims 1 to 9.
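The measurement defined in claims 1-3 — taking the interaction latency as the gap between the end time of the stimulus audio and the start time of the device's reply audio — can be sketched as follows. This is an illustrative reconstruction, not part of the patent text: the frame length, the fixed RMS energy threshold, and the function names are all assumptions, and real implementations would use a proper voice activity detector rather than a single amplitude cutoff.

```python
import numpy as np

def find_voice_segments(signal, fs, frame_ms=20, threshold=0.05):
    """Return (start_s, end_s) pairs for contiguous runs of frames whose
    RMS energy exceeds `threshold` (a hypothetical fixed cutoff)."""
    frame = int(fs * frame_ms / 1000)
    n = len(signal) // frame
    active = [np.sqrt(np.mean(signal[i * frame:(i + 1) * frame] ** 2)) > threshold
              for i in range(n)]
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                      # segment opens on first active frame
        elif not a and start is not None:
            segments.append((start * frame / fs, i * frame / fs))
            start = None                   # segment closes on first silent frame
    if start is not None:                  # signal ended while still active
        segments.append((start * frame / fs, n * frame / fs))
    return segments

def interaction_latency(signal, fs):
    """Latency per claims 2-3: start of the second (reply) segment
    minus end of the first (stimulus) segment, in seconds."""
    segs = find_voice_segments(signal, fs)
    if len(segs) < 2:
        raise ValueError("need both a stimulus and a response segment")
    (_, end_first), (start_second, _) = segs[0], segs[1]
    return start_second - end_first
```

For example, a recording containing 0.2 s of tone, 0.3 s of silence, and 0.2 s of reply tone would yield a latency of 0.3 s under these assumptions.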
CN202110819212.2A 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium Active CN113674739B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110819212.2A CN113674739B (en) 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110819212.2A CN113674739B (en) 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113674739A true CN113674739A (en) 2021-11-19
CN113674739B CN113674739B (en) 2023-12-19

Family

ID=78539644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110819212.2A Active CN113674739B (en) 2021-07-20 2021-07-20 Time determination method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113674739B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261195A (en) * 2020-01-10 2020-06-09 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Audio testing method and device, storage medium and electronic equipment
CN111355995A (en) * 2020-03-03 2020-06-30 北京字节跳动网络技术有限公司 Method and device for determining sound delay time of Bluetooth device and terminal device
US20200211545A1 (en) * 2019-01-02 2020-07-02 Baidu Online Network Technology (Beijing) Co., Ltd. Voice interaction method, apparatus and device, and storage medium
CN111372105A (en) * 2020-03-26 2020-07-03 深圳市微测检测有限公司 Video information transmission delay test method, device, terminal and storage medium
CN111402868A (en) * 2020-03-17 2020-07-10 北京百度网讯科技有限公司 Voice recognition method and device, electronic equipment and computer readable storage medium
CN111402886A (en) * 2020-06-03 2020-07-10 星络智能科技有限公司 Storage medium, voice response apparatus and method, and voice interaction system
CN111785268A (en) * 2020-06-30 2020-10-16 北京声智科技有限公司 Method and device for testing voice interaction response speed and electronic equipment
CN112672415A (en) * 2020-12-25 2021-04-16 之江实验室 Multi-sensor time synchronization method, device, system, electronic device and medium
CN112804121A (en) * 2021-01-08 2021-05-14 中国商用飞机有限责任公司北京民用飞机技术研究中心 TTE network transmission delay test system and method
CN112929240A (en) * 2021-03-23 2021-06-08 北京字节跳动网络技术有限公司 Method, device, terminal and non-transitory storage medium for acquiring communication delay time

Also Published As

Publication number Publication date
CN113674739B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN108877770B (en) Method, device and system for testing intelligent voice equipment
US11270690B2 (en) Method and apparatus for waking up device
CN111343410A (en) Mute prompt method and device, electronic equipment and storage medium
CN112863545B (en) Performance test method, device, electronic equipment and computer readable storage medium
CN112259076B (en) Voice interaction method, voice interaction device, electronic equipment and computer readable storage medium
CN113144620A (en) Detection method, device, platform, readable medium and equipment for frame synchronization game
CN111312243B (en) Equipment interaction method and device
CN112382266A (en) Voice synthesis method and device, electronic equipment and storage medium
CN111312274A (en) Voice signal processing method, device, system, electronic device and storage medium
CN113674739B (en) Time determination method, device, equipment and storage medium
CN113299285A (en) Device control method, device, electronic device and computer-readable storage medium
US11960703B2 (en) Template selection method, electronic device and non-transitory computer-readable storage medium
CN113542785B (en) Switching method for input and output of audio applied to live broadcast and live broadcast equipment
CN112946576B (en) Sound source positioning method and device and electronic equipment
CN113223496A (en) Voice skill testing method, device and equipment
CN111784567B (en) Method, apparatus, electronic device, and computer-readable medium for converting image
JP2024507734A (en) Speech similarity determination method and device, program product
CN111145769A (en) Audio processing method and device
CN113382119B (en) Method, device, readable medium and electronic equipment for eliminating echo
CN111045635B (en) Audio processing method and device
CN111899764B (en) Audio monitoring method and device, computer equipment and storage medium
CN111768762B (en) Voice recognition method and device and electronic equipment
CN111145792B (en) Audio processing method and device
CN117953907A (en) Audio processing method and device and terminal equipment
CN114999454A (en) Performance test method, device and equipment of voice interaction equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant