CN112711484A - Recording method and device - Google Patents

Recording method and device

Info

Publication number
CN112711484A
CN112711484A
Authority
CN
China
Prior art keywords
application
recording
recording application
audio data
buffer area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911024439.7A
Other languages
Chinese (zh)
Inventor
张向党
安爱辉
薛向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Shanghai Xiaodu Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911024439.7A priority Critical patent/CN112711484A/en
Publication of CN112711484A publication Critical patent/CN112711484A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482Procedural

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Telephone Function (AREA)

Abstract

The application discloses a recording method and device, relating to the technical field of speech processing. The specific implementation scheme is as follows: in response to a request from each recording application, an audio channel and a first buffer are established for each recording application; the received audio data is sent to the first buffer corresponding to each recording application; and the audio data in each first buffer is sent to the corresponding recording application through the corresponding audio channel. According to the embodiment of the application, because an audio channel is established for each recording application, each recording application can independently acquire audio data and perform recognition and analysis, and the audio channels of different recording applications do not affect one another. In addition, because a first buffer is established independently for each recording application, each recording application can stably read data from its corresponding first buffer.

Description

Recording method and device
Technical Field
The application relates to the technical field of computers, in particular to the technical field of voice processing.
Background
In a native operating system (e.g., the Android operating system), the recording resource can support only one recording-application process at a time. If another recording-application process needs to record, the recording resource occupied by the previous process must first be released before it can be used, so recording applications cannot run in parallel. However, with the development of intelligent interactive devices, more and more applications have recording requirements, and a framework that supports only a single recording no longer suffices.
Disclosure of Invention
The embodiment of the application provides a recording method and a recording device, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a recording method, including:
responding to the request of each recording application, and respectively establishing an audio channel and a first buffer for each recording application;
sending the received audio data to the first buffer corresponding to each recording application;
and sending the audio data in each first buffer to the corresponding recording application through the corresponding audio channel.
According to the embodiment of the application, because an audio channel is established for each recording application, each recording application can independently acquire audio data and perform recognition and analysis, and the audio channels of different recording applications do not affect one another. In addition, because a first buffer is established independently for each recording application, each recording application can stably read data from its corresponding first buffer.
In one embodiment, the recording method further comprises:
and under the condition that the process of the first recording application is stopped and/or ended, deleting the audio channel and the first buffer area corresponding to the first recording application.
According to the embodiment of the application, the corresponding audio channel and the first buffer area are deleted under the condition that the process of the recording application is stopped and/or ended, so that system resources can be effectively saved. Meanwhile, the waste of system processing capacity caused by the fact that the algorithm of the recording application is continuously operated under the condition of no data is avoided. The system processing and computing power is improved.
In one embodiment, after sending the received audio data to the first buffer corresponding to each recording application, the recording method further includes:
and resampling the audio data through the first buffer area corresponding to each recording application according to the preset sampling frequency of each recording application.
According to the method and the device, the audio data are resampled according to the sampling frequency of each recording application, so that the audio data acquired from the corresponding first buffer area of each recording application can meet the recording requirement of the recording application, and the processing quality and accuracy of the audio data by the recording application are guaranteed.
In one embodiment, the recording method further comprises:
and setting a voice processing algorithm for each audio channel corresponding to the audio record application, wherein the voice processing algorithm is used for processing the audio data.
According to the embodiment of the application, the voice processing algorithm is respectively registered for each recording application, so that each recording application can respectively and independently process the audio data according to different identification requirements.
In one embodiment, sending the received audio data to a first buffer corresponding to each recording application includes:
sending the audio data acquired by the hardware in real time to a second buffer area;
and respectively sending the audio data to the first buffer area corresponding to each recording application through the second buffer area.
According to the embodiment of the application, the collected audio data can be managed through the second buffer area and can be uniformly sent to the first buffer areas.
In one embodiment, the recording application includes a voice wakeup application, a voice call application, and a third party interactive application.
According to the embodiment of the application, independent audio channels can be established for the native recording applications (the voice wakeup application and the voice call application) as well as for third-party interactive applications, improving the overall universality and extensibility of the audio framework.
In a second aspect, an embodiment of the present application provides a sound recording apparatus, including:
the establishing module is used for responding to the request of each recording application and respectively establishing an audio channel and a first buffer area for each recording application;
the first sending module is used for sending the received audio data to the first buffer corresponding to each recording application;
and the second sending module is used for sending the audio data in each first buffer to the corresponding recording application through the corresponding audio channel.
In one embodiment, the sound recording apparatus further comprises:
and the deleting module is used for deleting the audio channel and the first buffer area corresponding to the first sound recording application under the condition that the process of the first sound recording application is stopped and/or ended.
In one embodiment, the sound recording apparatus further comprises:
and the resampling module is used for resampling the audio data through the first buffer area corresponding to each recording application according to the preset sampling frequency of each recording application.
In one embodiment, the sound recording apparatus further comprises:
and the algorithm setting module is used for setting a voice processing algorithm for each audio channel corresponding to the audio recording application, and the voice processing algorithm is used for processing the audio data.
In one embodiment, the first sending module includes:
a first sending submodule, used for sending the audio data acquired by the hardware in real time to the second buffer;
and a second sending submodule, used for sending the audio data from the second buffer to the first buffer corresponding to each recording application.
In a third aspect, an embodiment of the present application provides an electronic device, where functions of the electronic device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the electronic device includes a processor and a memory, the memory storing a program that supports the electronic device in executing the above recording method, and the processor being configured to execute the program stored in the memory. The electronic device may also include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions for an electronic device, including a program for executing the above recording method.
One embodiment in the above application has the following advantages or benefits. Because an audio channel is established for each recording application, each recording application can independently acquire audio data and perform recognition and analysis, and the audio channels of different recording applications do not affect one another; because a first buffer is established independently for each recording application, each recording application can stably read data from its corresponding first buffer. By independently establishing an audio channel and a first buffer for each recording application, the received audio data can be sent to each recording application through its own first buffer and audio channel. This solves the technical problem that the recording resource can support only one recording-application process at a time.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a first embodiment of the present application;
FIG. 4 is a schematic diagram according to a first embodiment of the present application;
FIG. 5 is a schematic diagram according to a first embodiment of the present application;
FIG. 6 is an architectural block diagram according to a first embodiment of the present application;
FIG. 7 is a schematic diagram according to a second embodiment of the present application;
FIG. 8 is a schematic diagram according to a second embodiment of the present application;
FIG. 9 is a schematic diagram according to a second embodiment of the present application;
FIG. 10 is a schematic diagram according to a second embodiment of the present application;
FIG. 11 is a schematic diagram according to a second embodiment of the present application;
FIG. 12 is a block diagram of an electronic device for implementing the recording method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
According to a first embodiment of the present application, there is provided a recording method, as shown in fig. 1, the method including:
s100: and responding to the request of each recording application, and respectively establishing an audio channel and a first buffer area for each recording application.
The timing at which each recording application sends its request may differ depending on the application's characteristics. For example, if the recording application is the voice wakeup recognition application (voice wakeup recognition module) in an intelligent voice interaction device, it sends its request when the device starts. If the recording application is a voice call application or a third-party voice interaction application, it sends its request when its process receives a start instruction (for example, a user clicking the application or issuing a voice command).
The recording application may be an app (application program) or an application module, for example, any third-party app with recording requirements, or the native voice wakeup recognition module and voice call module provided in an intelligent voice interaction device.
The audio channel may be used to transfer data from the bottom layer of the audio framework to the recording application at the top layer. The first buffer may be used to temporarily store, at the lower layer of the audio framework, data to be sent to the upper-layer recording application.
In one example, a global variable may be maintained to record the number of recording-application instances, and the data in each first buffer may be maintained accordingly.
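As a rough, hypothetical sketch of step S100 (none of these names come from the patent, and the real implementation would live in the device's audio HAL rather than Python), per-application channels and first buffers could be tracked through a module-level registry, with the registry itself serving as the global variable that records the number of recording instances:

```python
import queue

# Hypothetical registry: recording-app name -> its own first buffer.
# Each queue stands in for the independent "first buffer" established
# per recording application; the dict doubles as the global variable
# that records how many recording instances currently exist.
_channels = {}

def open_recording(app_name, max_frames=256):
    """Respond to a recording app's request: create its first buffer."""
    if app_name not in _channels:
        _channels[app_name] = queue.Queue(maxsize=max_frames)
    return _channels[app_name]

def instance_count():
    """Number of currently established recording instances."""
    return len(_channels)
```

Each application would then read only from its own queue, so one slow reader cannot disturb the data another application receives.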
S200: and sending the received audio data to a first buffer corresponding to each sound recording application.
The storage size of each first buffer can be selected and adjusted as needed. The audio sampling rate of each first buffer may be determined by the corresponding recording application, to match different applications' requirements for the sampling rate of the audio data.
The audio data may be sent in real time to the first buffer corresponding to each recording application: as long as the hardware is collecting audio data and first buffers currently exist, the collected audio data is processed by the bottom layers of the audio framework and then sent to each first buffer. The bottom layers of the audio framework may include the hardware layer, the kernel (driver) layer, and the Tinyalsa (Tiny Advanced Linux Sound Architecture) layer.
S300: and sending the audio data in each first buffer area to a corresponding sound recording application through a corresponding audio channel.
The audio data may be processed while being transmitted to the recording application through the audio channel; for example, it may be preprocessed by a speech processing algorithm to remove noise. After obtaining the processed audio data, the recording application can send it to a cloud server for further speech recognition.
According to the embodiment of the application, because an audio channel is established for each recording application, each recording application can independently acquire audio data and perform recognition and analysis, and the audio channels of different recording applications do not affect one another. In addition, because a first buffer is established independently for each recording application, each recording application can stably read data from its corresponding first buffer.
In one embodiment, as shown in fig. 2, the recording method further includes:
S400: when the process of the first recording application stops and/or ends, deleting the audio channel and the first buffer corresponding to the first recording application. The first recording application may be any one of the recording applications mentioned in steps S100-S300 above.
In one example, if the first recording application is a voice call application, the audio channel and first buffer corresponding to it are deleted after the call ends. If the first recording application is a third-party app, its audio channel and first buffer are deleted when the app exits.
According to the embodiment of the application, deleting the corresponding audio channel and first buffer when a recording application's process stops and/or ends effectively saves system resources. It also avoids wasting processing capacity on running the recording application's algorithm when there is no data, thereby improving the system's processing and computing capability.
In one example, when the process of the first recording application stops and/or ends, it is first determined whether the first recording application is a special recording application. If so, the corresponding audio channel and first buffer are not deleted. A special recording application is one that must remain running whenever the device is on. For example, the voice wakeup recognition application may be special: because the intelligent voice interaction device must detect in real time whether the user has issued a voice command, audio data must always be sent to that application's audio channel and first buffer.
It should be noted that a stopped process of the first recording application can be understood as the application currently being in a state in which it does not need to process audio data.
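A minimal sketch of this teardown logic, including the special-application exemption described above (the registry shape and all names are assumptions of this sketch, not the patent's implementation):

```python
# Recording apps that must keep receiving audio while the device is on,
# e.g. the voice wakeup recognition application (an assumed name here).
SPECIAL_APPS = {"voice_wakeup"}

def close_recording(registry, app_name):
    """Delete an app's channel/buffer when its process stops or ends.

    Returns True if the entry was removed, False if the app is special
    and its channel must be kept alive.
    """
    if app_name in SPECIAL_APPS:
        return False  # keep sending audio to the wakeup channel
    registry.pop(app_name, None)  # free the buffer -> saves system resources
    return True
```

Dropping the entry also stops any per-app algorithm from being fed data, which is the resource saving the embodiment describes.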
In one embodiment, as shown in fig. 3, after sending the received audio data to the first buffer corresponding to each recording application, the recording method further includes:
s500: and resampling the audio data through the first buffer area corresponding to each recording application according to the preset sampling frequency of each recording application.
In one example, each recording application may use a sampling rate different from that at which the hardware acquires the initial audio data, depending on its recording requirements. The audio data may be resampled through each established first buffer to convert it to the sampling rate the recording application requires. For example, if the hardware acquires the initial audio data at 48k-32bit while the recording applications require 16k-16bit, 48k-16bit, and 16k-32bit respectively, each corresponding first buffer resamples the audio data to adjust its rate. The hardware may include a sound card, a microphone, and the like.
In voiceprint recognition, speech often needs to be resampled to meet different sampling-rate requirements. Resampling transforms the original sampling frequency into a new one. There are three conventional methods: first, if the original analog signal x(t) can be reproduced or recorded, it can simply be sampled again; second, the digital signal x(n) can be converted back to an analog signal x(t) by digital-to-analog (D/A) conversion, and x(t) can then be resampled by analog-to-digital (A/D) conversion; third, an L/M-times sampling-rate conversion algorithm can convert x(n) to the new sampling rate entirely in the digital domain.
Audio resampling divides into upsampling and downsampling, i.e., interpolation and decimation; rational-ratio resampling combines the two. In digital signal processing, the time-frequency duality between time-domain and frequency-domain signals is as follows: decimation in the time domain corresponds to extension in the frequency domain, and interpolation in the time domain corresponds to compression in the frequency domain. If the signal's frequency content is not band-limited, extension in the frequency domain can cause spectral aliasing, while compression in the frequency domain produces spectral images. Therefore, an anti-aliasing filter is applied before downsampling to prevent aliasing, and an anti-image filter is applied after upsampling to remove images.
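The rational-ratio conversion above is normally implemented with proper anti-aliasing and anti-image filters; the toy function below only illustrates the rate-mapping part (e.g. hardware 48 kHz down to an application's 16 kHz) using linear interpolation, and is not the patent's algorithm:

```python
def resample_linear(samples, src_rate, dst_rate):
    """Convert a block of samples from src_rate to dst_rate.

    Linear interpolation only -- a real resampler would low-pass filter
    before decimation (anti-aliasing) and after interpolation (anti-image).
    """
    if not samples:
        return []
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

In the patent's scheme, each first buffer would apply a conversion like this with its own application's target rate.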
According to the embodiment of the application, resampling the audio data according to each recording application's sampling frequency ensures that the audio data each application obtains from its corresponding first buffer meets that application's recording requirements, guaranteeing the quality and accuracy of the application's audio processing.
In one embodiment, as shown in fig. 4, the recording method further includes:
S600: setting a speech processing algorithm for the audio channel corresponding to each recording application, wherein the speech processing algorithm is used for processing the audio data.
In one example, if the recording application is a voice wakeup recognition application, the configured speech processing algorithm may include a VAD (Voice Activity Detection) algorithm. If the recording application is a voice call application, the configured speech processing algorithm may include a VoIP (Voice over Internet Protocol) algorithm.
It should be noted that the speech processing algorithm set for each audio channel may be configured according to the recording application, or may be one specified by the recording application itself.
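Setting a per-channel algorithm can be sketched as a simple registration table. The "VAD" and "VoIP" functions below are trivial stand-ins to show the wiring, not the actual processing the patent describes:

```python
# Per-channel algorithm table (all names are illustrative assumptions).
_algorithms = {}

def vad_stub(frame):
    """Crude voice-activity gate: drop frames with too little energy."""
    return frame if sum(abs(s) for s in frame) > 0.5 else None

def voip_stub(frame):
    """Stand-in for echo cancellation / gain control: scale the frame."""
    return [0.5 * s for s in frame]

def register_algorithm(channel, algorithm):
    """S600: attach a speech processing algorithm to one audio channel."""
    _algorithms[channel] = algorithm

def process(channel, frame):
    """Run the channel's own algorithm; pass audio through if none is set."""
    algo = _algorithms.get(channel)
    return algo(frame) if algo else frame
```

Because each channel looks up only its own entry, changing one application's algorithm never affects another channel, matching the independence the embodiment claims.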
According to the embodiment of the application, registering a speech processing algorithm separately for each recording application allows each application to process the audio data independently according to its own recognition requirements.
In one embodiment, as shown in fig. 5, sending the received audio data to the first buffer corresponding to each recording application includes:
s210: and sending the audio data acquired by the hardware in real time to a second buffer area.
S220: and respectively sending the audio data to the first buffer area corresponding to each recording application through the second buffer area.
In one example, after receiving the audio data collected in real time by the hardware of the intelligent voice interaction device, the bottom layer of the audio framework sends it to the second buffer through a data-acquisition thread established between the bottom layer and the second buffer. After receiving the audio data, the second buffer sends it to each currently established first buffer.
According to the embodiment of the application, the collected audio data can be managed through the second buffer and distributed uniformly to the first buffers.
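Steps S210/S220 can be sketched as a single capture buffer feeding every established first buffer, i.e. a fan-out. The deques and application names below are assumptions of this sketch:

```python
from collections import deque

second_buffer = deque()                    # fed by hardware capture (S210)
first_buffers = {"voice_wakeup": deque(),  # one first buffer per recording app
                 "voice_call": deque()}

def on_hardware_frame(frame):
    """S210: the bottom layer delivers a captured frame to the second buffer."""
    second_buffer.append(frame)

def fan_out():
    """S220: copy each pending frame into every current first buffer."""
    while second_buffer:
        frame = second_buffer.popleft()
        for buf in first_buffers.values():
            buf.append(frame)
```

Centralizing capture in one buffer means the hardware is opened once, while every recording application still receives the complete stream.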
In one embodiment, the recording applications include a voice wakeup application, a voice call application, and third-party interactive applications. A third-party interactive application may be any application with recording requirements, such as a voice chat app, a third-party video call app, or a social app.
According to the embodiment of the application, independent audio channels can be established for the native recording applications (the voice wakeup application and the voice call application) as well as for third-party interactive applications, improving the overall universality and extensibility of the audio framework.
In one example, the recording method of the above embodiments can be applied in an audio architecture as shown in fig. 6.
From the bottom layer up, the audio architecture comprises: a hardware layer, a kernel layer, an advanced Linux sound framework layer (Tinyalsa), a hardware abstraction layer (HAL), a framework layer (Framework), and an application layer.
The kernel layer includes drivers for communicating with the hardware. The advanced Linux sound framework layer provides an API for the upper layers to call. The hardware abstraction layer comprises a data acquisition and resampling module and an algorithm module for each recording application. The application layer includes the recording applications. The data acquisition and resampling module contains the second buffer and the first buffers.
In response to the requests sent by the application layer's voice wakeup recognition module, voice call module, and third-party interaction module, an audio channel and a first buffer are established for each of the three modules. When the hardware of the intelligent voice interaction device collects audio, the audio data is processed in turn by the hardware layer, the kernel layer, and the advanced Linux sound framework layer, and then sent to the second buffer. The second buffer sends the audio data to the three first buffers, which resample it according to the sampling rates required by the voice wakeup recognition module, the voice call module, and the third-party interaction module, respectively. The resampled audio data is then sent through the first buffers to the wakeup recognition algorithm module (VAD) of the voice wakeup recognition module, the VoIP module of the voice call module, and the speech processing algorithm module of the third-party interaction module for preprocessing. Finally, the preprocessed speech data is sent to the three modules through a recording thread (RecordThread) of the framework layer.
In one example, the recording method of the embodiments of the present application can be implemented through the following four modules.
The data acquisition and resampling module acquires the original audio data from the bottom layer. When the bottom-layer device handle is opened, it uses the highest sampling rate the device supports as the standard so as to satisfy the various upper-layer recording requirements, and when an upper-layer application records at a sampling rate different from the one this module provides, the module resamples to meet that requirement. The module also maintains a global variable to record the number of upper-layer recording instances and maintains the audio data buffer for each instance.
The voice recognition and wakeup module is a core module for the user's conversational interaction. Besides waking the system according to the wakeup model, it runs its own algorithm to perform VAD and sends audio data to the cloud for analysis. When the upper-layer voice recognition and wakeup recording instance needs to be stopped or closed, its algorithm invocation and data transmission channel can be stopped or closed independently without affecting the other recording channels.
The voice call module is another core module for voice interaction of the user. The module has the main functions of carrying out echo cancellation, noise suppression and automatic gain control on an original signal and then carrying out coding transmission. The module can be stopped or closed independently, and when the module is stopped or closed, the algorithm and the data transmission channel of the VOIP in the module can be stopped independently.
The third-party interaction module is reserved for use by third parties. It can be conveniently extended to support multiple recording instances, offers good extensibility, and greatly enriches the user's usage scenarios.
In one application scenario, after the smart speaker is turned on, an audio channel for the voice wake-up application is established. If the user then speaks the wake-up word, it is sent through this audio channel to the voice wake-up application for analysis. When the user starts a video call, an audio channel for the video call application is established, and the collected user voice is delivered into both the audio channel of the video call application and the audio channel of the voice wake-up application. The applications behind the two audio channels analyze the audio data independently; if the voice wake-up application recognizes a wake-up word in the user's speech, it can perform the corresponding instruction according to the wake-up word and the user's intent while the call is in progress, and the user's call state remains unaffected.
According to a second embodiment of the present application, there is provided a sound recording apparatus, as shown in fig. 7, the sound recording apparatus 100 including:
the establishing module 10 is configured to respectively establish an audio channel and a first buffer for each recording application in response to a request of each recording application.
And a first sending module 20, configured to send the received audio data to a first buffer corresponding to each sound recording application.
And the second sending module 30 is configured to send the audio data in each first buffer area to a corresponding recording application through a corresponding audio channel.
In one embodiment, as shown in fig. 8, the sound recording apparatus further includes:
and the deleting module 40 is configured to delete the audio path and the first buffer corresponding to the first recording application when the process of the first recording application is stopped and/or ended.
In one embodiment, as shown in fig. 9, the sound recording apparatus 100 further includes:
and the resampling module 50 is configured to resample the audio data through the first buffer corresponding to each recording application according to the preset sampling frequency of each recording application.
In one embodiment, as shown in fig. 10, the sound recording apparatus 100 further includes:
and an algorithm setting module 60, configured to set a voice processing algorithm for each audio channel corresponding to the recording application, where the voice processing algorithm is used to process the audio data.
In one embodiment, as shown in fig. 11, the first transmitting module 20 includes:
and the first sending submodule 21 is configured to send the audio data collected by the hardware in real time to the second buffer.
And the second sending submodule 22 is used for sending the audio data to the first buffer area corresponding to each sound recording application through the second buffer area respectively.
The functions of each module in each apparatus of the embodiments of the present application may refer to the corresponding description in the above method, and are not repeated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 12 is a block diagram of an electronic device according to the sound recording method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 12, the electronic apparatus includes: one or more processors 901, a memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 12 takes one processor 901 as an example.
Memory 902 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of recording provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of recording provided herein.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of recording in the embodiment of the present application (for example, the establishing module 10, the first sending module 20, and the second sending module 30 shown in fig. 7). The processor 901 executes various functional applications of the server and data processing, i.e., implements the method of sound recording in the above method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 902.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created from the use of the electronic device for sound recording, and the like. Further, the memory 902 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the electronic device for sound recording over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the sound recording method may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or other means; fig. 12 illustrates an example of connection by a bus.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device for sound recording, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, the audio channel is established for each recording application, so that each recording application can acquire audio data independently and perform identification and analysis, and the audio channels of the recording applications are not influenced mutually. In addition, the first buffer area is independently established for each recording application, so that each recording application can stably read data from the corresponding first buffer area.
As a general solution for supporting multi-channel recording on an intelligent platform, the technical scheme of the embodiments of the present application greatly lowers the development threshold for intelligent devices. By supporting multiple recording instances, it greatly enriches the application types, provides the capability to meet more service requirements, and makes interactive scenarios on the intelligent platform more convenient to use.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited herein, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. A method of recording a sound, comprising:
responding to the request of each recording application, and respectively establishing an audio channel and a first buffer area for each recording application;
sending the received audio data to a first buffer corresponding to each sound recording application;
and sending the audio data in each first buffer area to a corresponding sound recording application through a corresponding audio channel.
2. The method of claim 1, further comprising:
and under the condition that the process of the first recording application is stopped and/or ended, deleting the audio channel and the first buffer area corresponding to the first recording application.
3. The method of claim 1, wherein after sending the received audio data to the first buffer corresponding to each of the recording applications, further comprising:
and resampling the audio data through a first buffer area corresponding to each recording application according to the preset sampling frequency of each recording application.
4. The method of claim 1, further comprising:
and setting a voice processing algorithm for each audio channel corresponding to the recording application, wherein the voice processing algorithm is used for processing the audio data.
5. The method of claim 1, wherein sending the received audio data to a first buffer corresponding to each of the recording applications comprises:
sending the audio data acquired by the hardware in real time to a second buffer area;
and respectively sending the audio data to the first buffer area corresponding to each recording application through the second buffer area.
6. The method of any one of claims 1 to 5, wherein the recording application comprises a voice wakeup application, a voice call application, and a third party interactive application.
7. A sound recording apparatus, comprising:
the establishing module is used for responding to the request of each recording application and respectively establishing an audio channel and a first buffer area for each recording application;
the first sending module is used for sending the received audio data to a first buffer area corresponding to each sound recording application;
and the second sending module is used for sending the audio data in each first buffer area to the corresponding recording application through the corresponding audio channel.
8. The apparatus of claim 7, further comprising:
and the deleting module is used for deleting the audio channel and the first buffer area corresponding to the first sound recording application under the condition that the process of the first sound recording application is stopped and/or ended.
9. The apparatus of claim 7, further comprising:
and the resampling module is used for resampling the audio data through the first buffer area corresponding to each recording application according to the preset sampling frequency of each recording application.
10. The apparatus of claim 7, further comprising:
and the algorithm setting module is used for setting a voice processing algorithm for each audio channel corresponding to the recording application, and the voice processing algorithm is used for processing the audio data.
11. The apparatus of claim 7, the first sending module comprising:
the first sending submodule is used for sending the audio data acquired by the hardware in real time to the second buffer area;
and the second sending submodule is used for sending the audio data to the first buffer area corresponding to each recording application through the second buffer area.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
13. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN201911024439.7A 2019-10-25 2019-10-25 Recording method and device Pending CN112711484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911024439.7A CN112711484A (en) 2019-10-25 2019-10-25 Recording method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911024439.7A CN112711484A (en) 2019-10-25 2019-10-25 Recording method and device

Publications (1)

Publication Number Publication Date
CN112711484A true CN112711484A (en) 2021-04-27

Family

ID=75541582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911024439.7A Pending CN112711484A (en) 2019-10-25 2019-10-25 Recording method and device

Country Status (1)

Country Link
CN (1) CN112711484A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377329A (en) * 2021-07-01 2021-09-10 安徽文香科技有限公司 Virtual audio equipment, audio data processing method and device
CN113971969A (en) * 2021-08-12 2022-01-25 荣耀终端有限公司 Recording method, device, terminal, medium and product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1511309A (en) * 2001-06-01 2004-07-07 ���ɿƹ�˾ Language learning system and digital storage unit
CN105551512A (en) * 2015-12-17 2016-05-04 天翼爱音乐文化科技有限公司 Audio format conversion method and apparatus
CN107293316A (en) * 2016-04-13 2017-10-24 青岛海信电器股份有限公司 A kind of audio data processing method and device
CN109976696A (en) * 2017-12-28 2019-07-05 深圳市优必选科技有限公司 Method and device for acquiring audio data, equipment and computer-readable storage medium
CN110097897A (en) * 2019-04-02 2019-08-06 烽火通信科技股份有限公司 A kind of Android device recording multiplexing method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1511309A (en) * 2001-06-01 2004-07-07 ���ɿƹ�˾ Language learning system and digital storage unit
CN105551512A (en) * 2015-12-17 2016-05-04 天翼爱音乐文化科技有限公司 Audio format conversion method and apparatus
CN106653036A (en) * 2015-12-17 2017-05-10 天翼爱音乐文化科技有限公司 Audio mixing and transcoding method based on OTT box
CN107293316A (en) * 2016-04-13 2017-10-24 青岛海信电器股份有限公司 A kind of audio data processing method and device
CN109976696A (en) * 2017-12-28 2019-07-05 深圳市优必选科技有限公司 Method and device for acquiring audio data, equipment and computer-readable storage medium
CN110097897A (en) * 2019-04-02 2019-08-06 烽火通信科技股份有限公司 A kind of Android device recording multiplexing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zeng Xiangyang, Yang Honghui: "Fundamentals of Acoustic Signal Processing", 30 September 2015, Northwestern Polytechnical University Press, pages: 146 - 147 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377329A (en) * 2021-07-01 2021-09-10 安徽文香科技有限公司 Virtual audio equipment, audio data processing method and device
CN113377329B (en) * 2021-07-01 2024-04-26 安徽文香科技股份有限公司 Virtual audio equipment, audio data processing method and device
CN113971969A (en) * 2021-08-12 2022-01-25 荣耀终端有限公司 Recording method, device, terminal, medium and product

Similar Documents

Publication Publication Date Title
KR102178896B1 (en) Provides a personal auxiliary module with an optionally steerable state machine
KR102535338B1 (en) Speaker diarization using speaker embedding(s) and trained generative model
JP6553736B2 (en) Local Maintenance of Data for Selective Off-Line Voice Actions in Voice-Enabled Electronic Devices
CN109240107B (en) Control method and device of electrical equipment, electrical equipment and medium
CN107731231B (en) Method for supporting multi-cloud-end voice service and storage device
CN106687908A (en) Gesture shortcuts for invocation of voice input
CN104318924A (en) Method for realizing voice recognition function
CN111177453B (en) Method, apparatus, device and computer readable storage medium for controlling audio playing
US11721338B2 (en) Context-based dynamic tolerance of virtual assistant
KR20210106397A (en) Voice conversion method, electronic device, and storage medium
CN112118215A (en) Convenient real-time conversation based on topic determination
JP2021089438A (en) Selective adaptation and utilization of noise reduction technique in invocation phrase detection
CN110310657A (en) A kind of audio data processing method and device
CN112992171B (en) Display device and control method for eliminating echo received by microphone
CN112711484A (en) Recording method and device
JP2018521557A (en) System and method for improving speech quality
CN106228047B (en) A kind of application icon processing method and terminal device
CN110379406B (en) Voice comment conversion method, system, medium and electronic device
WO2021047209A1 (en) Optimization for a call that waits in queue
CN112102836A (en) Voice control screen display method and device, electronic equipment and medium
US20170206898A1 (en) Systems and methods for assisting automatic speech recognition
US20200320980A1 (en) Synthetic Narrowband Data Generation for Narrowband Automatic Speech Recognition Systems
CN111263100A (en) Video call method, device, equipment and storage medium
CN111754974B (en) Information processing method, device, equipment and computer storage medium
CN211742645U (en) Interaction device based on 5G communication technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210510

Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Applicant after: Shanghai Xiaodu Technology Co.,Ltd.

Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.
