CN115334195A

CN115334195A - Audio recording method, device, equipment and readable storage medium

Info

Publication number: CN115334195A
Application number: CN202110508675.7A
Authority: CN
Inventors: 许崇峰
Original assignee: Qiku Software Shenzhen Co Ltd
Current assignee: Qiku Software Shenzhen Co Ltd
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2022-11-11

Abstract

The invention discloses an audio recording method, apparatus, device and readable storage medium. The method comprises the following steps: when a preset communication application is detected to be in a call state, recording the sound of a microphone channel through a privileged application to generate a first audio stream; capturing a second audio stream sent by the peer end of the call during the call; and, after the call state is detected to have ended, mixing the first audio stream and the second audio stream to generate a target recorded audio. Compared with existing techniques that record calls by screen recording, the method records the local first audio stream through the privileged application and captures the peer's second audio stream directly, thereby avoiding the influence of background noise and improving recording quality.

Description

Audio recording method, device, equipment and readable storage medium

Technical Field

The present invention relates to the field of audio processing technologies, and in particular, to an audio recording method, apparatus, device, and readable storage medium.

Background

More and more users now communicate through WeChat voice or video calls, including for important matters such as borrowing money. However, such applications lack a built-in recording function and cannot meet users' recording needs, which degrades the user experience. At present, calls in communication applications such as WeChat and QQ are recorded by screen recording. Because this approach often records the call content after it has been played out through the speaker, the recording is easily disturbed by ambient noise and its quality is poor; moreover, the result is saved as an audio-video file, which occupies a large amount of storage space.

The above is only for the purpose of assisting understanding of the technical solution of the present invention, and does not represent an admission that the above is the prior art.

Disclosure of Invention

The main object of the invention is to provide an audio recording method, apparatus, device and readable storage medium, so as to solve the technical problems that existing call recording, which relies on screen recording, yields poor recording quality and produces audio-video files that occupy a large amount of storage space.

In order to achieve the above object, the present invention provides an audio recording method, including the steps of:

when a preset communication application is detected to be in a call state, recording the sound of a microphone channel through a privileged application to generate a first audio stream;

capturing a second audio stream sent by the peer end of the call during the call;

and after the call state is detected to have ended, mixing the first audio stream and the second audio stream to generate a target recorded audio.

Optionally, the step of recording the sound of the microphone channel through the privileged application to generate the first audio stream when the preset communication application is detected to be in a call state includes:

when the preset communication application is detected to be in a call state, determining whether a recording instruction has been received;

if the recording instruction has been received, recording the sound of the microphone channel through the privileged application to generate the first audio stream.

Optionally, the step of recording the sound of the microphone channel through the privileged application if the recording instruction has been received specifically includes:

if the recording instruction has been received, determining whether a privileged application meeting a preset recording requirement currently exists;

if such a privileged application currently exists, determining a target microphone channel occupied by the preset communication application;

and invoking the privileged application to occupy the target microphone channel concurrently, so as to record the sound of the microphone channel through the privileged application.

Optionally, after the step of determining whether a privileged application meeting the preset recording requirement currently exists, the method further includes:

if no privileged application meeting the preset recording requirement currently exists, sending a recording request to the peer end of the call to instruct the peer end to capture a third audio stream and send it back;

and receiving the third audio stream fed back by the peer end, and using the third audio stream as the first audio stream.

Optionally, the step of determining whether a privileged application meeting the preset recording requirement currently exists includes:

determining the privileged permission of each privileged application in the terminal;

if the privileged permission of a privileged application is in a preset whitelist, determining that a privileged application meeting the preset recording requirement exists;

and if the privileged permission of the privileged application is not in the preset whitelist, determining that no privileged application meeting the preset recording requirement exists.

Optionally, the step of mixing the first audio stream and the second audio stream to generate the target recorded audio includes:

obtaining first non-silent audio from the first audio stream;

obtaining second non-silent audio from the second audio stream;

and mixing the first non-silent audio and the second non-silent audio.
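The non-silent extraction and mixing steps can be illustrated with a minimal Python sketch. It rests on assumptions the text does not state: both streams are mono 16-bit PCM sample lists at the same sample rate, silence is judged per frame with a root-mean-square (RMS) threshold, and the frame size and threshold values are hypothetical. Each non-silent segment keeps its start position, so the two streams remain time-aligned when they are mixed.

```python
import math
from typing import List, Tuple

Segment = Tuple[int, List[int]]  # (start sample index, samples)

FRAME_SIZE = 160        # assumed: 20 ms frames at 8 kHz
SILENCE_RMS = 500.0     # assumed RMS threshold for 16-bit PCM


def frame_rms(frame: List[int]) -> float:
    """Root-mean-square amplitude of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def non_silent_segments(samples: List[int], frame_size: int = FRAME_SIZE,
                        threshold: float = SILENCE_RMS) -> List[Segment]:
    """Return the loud frames, merged into contiguous segments with their offsets."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) <= threshold:
            continue  # silent frame: dropped
        if segments and segments[-1][0] + len(segments[-1][1]) == start:
            segments[-1][1].extend(frame)  # extend the previous segment
        else:
            segments.append((start, list(frame)))
    return segments


def mix_segments(length: int, *tracks: List[Segment]) -> List[int]:
    """Sum the segments of all tracks on one timeline, clipping to 16-bit range."""
    out = [0] * length
    for segments in tracks:
        for start, data in segments:
            for i, s in enumerate(data):
                if start + i < length:
                    out[start + i] += s
    return [max(-32768, min(32767, v)) for v in out]
```

For example, mixing the segments of a local stream that is silent in its first frame with those of a peer stream that is silent in its second frame yields one continuous track in which each party occupies its own time slot.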

Optionally, after the step of mixing the first audio stream and the second audio stream to generate the target recorded audio, the method further includes:

determining whether the target recorded audio meets a preset storage requirement;

and if the target recorded audio meets the preset storage requirement, storing it in a preset storage mode and displaying a second preset prompt message.

Further, to achieve the above object, the present invention also provides an audio recording apparatus, including:

a monitoring module, configured to record the sound of the microphone channel through the privileged application to generate a first audio stream when the preset communication application is detected to be in a call state;

a capturing module, configured to capture a second audio stream sent by the peer end of the call during the call;

and a mixing module, configured to mix the first audio stream and the second audio stream after the call state is detected to have ended, so as to generate the target recorded audio.

Further, to achieve the above object, the present invention also provides an audio recording device, which includes a memory, a processor, and an audio recording program stored in the memory and executable on the processor; when executed by the processor, the audio recording program implements the steps of the audio recording method described above.

Further, to achieve the above object, the present invention also provides a readable storage medium storing an audio recording program which, when executed by a processor, implements the steps of the audio recording method described above.

Compared with existing audio recording approaches, the invention records the sound of the microphone channel through the privileged application to generate the first audio stream when the preset communication application is detected to be in a call state, captures the second audio stream sent by the peer end of the call, and mixes the two streams into the target recorded audio after the call state is detected to have ended. Unlike screen-recording-based techniques, it records the local first audio stream through the privileged application and captures the peer's second audio stream directly, which avoids the influence of background noise and improves recording quality.

Drawings

Fig. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the audio recording apparatus of the present invention;

Fig. 2 is a schematic flowchart of an audio recording method according to a first embodiment of the present invention;

Fig. 3 is a schematic flowchart of an audio recording method according to a second embodiment of the present invention;

Fig. 4 is a schematic flowchart of an audio recording method according to a third embodiment of the present invention;

Fig. 5 is a schematic flowchart of an audio recording method according to a fourth embodiment of the present invention;

Fig. 6 is a functional block diagram of an audio recording apparatus according to an embodiment of the present invention.

Detailed Description

It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it. The realization of the objects, the functional features and the advantages of the invention will be further explained with reference to the accompanying drawings.

The invention provides an audio recording device. Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the audio recording device of the invention.

As shown in Fig. 1, the audio recording device may include: a processor 1001 such as a CPU, a communication bus 1002, a user interface 1003, a network interface module 1004, and a memory 1005. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface module 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device independent of the processor 1001.

Those skilled in the art will appreciate that the hardware configuration of the audio recording device shown in fig. 1 does not constitute a limitation of the audio recording device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in Fig. 1, the memory 1005, as a readable storage medium, may contain an operating system, a network communication module, a user interface module, and an audio recording program. The operating system manages and controls the hardware and software resources of the audio recording device and supports the running of the network communication module, the user interface module, the audio recording program, and other programs or software; the network communication module manages and controls the network interface module 1004; and the user interface module manages and controls the user interface 1003.

In the hardware structure of the audio recording device shown in Fig. 1, the network interface module 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may call the audio recording program stored in the memory 1005 and perform the following operations:

when the preset communication application is detected to be in a call state, recording the sound of the microphone channel through the privileged application to generate a first audio stream;

capturing a second audio stream sent by the peer end of the call during the call;

and after the call state is detected to have ended, mixing the first audio stream and the second audio stream to generate the target recorded audio.

The invention also provides an audio recording method.

Referring to fig. 2, fig. 2 is a flowchart illustrating an audio recording method according to a first embodiment of the present invention.

Although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. Specifically, the audio recording method of this embodiment includes:

Step S10, when the preset communication application is detected to be in a call state, recording the sound of the microphone channel through the privileged application to generate a first audio stream;

It should be noted that this embodiment is executed by a mobile terminal equipped with a microphone, such as a tablet computer or a mobile phone.

Specifically, the preset communication application includes applications with a call function that are built into the mobile terminal's system, such as the phone application, as well as instant messaging applications with a call function that the user installs later, such as WeChat and QQ; this embodiment does not limit this.

It can be understood that when the preset communication application is detected to be in a call state, that application is currently occupying the microphone channel. Since, in general, only one ordinary application is allowed to occupy the microphone channel at a time, this embodiment invokes the privileged application to occupy the microphone channel concurrently, so that the sound of the current microphone channel can be recorded through the privileged application.

It should be noted that in this embodiment, ordinary applications and privileged applications are distinguished by the permissions granted to them or by their functions. Specifically, a privileged application in this embodiment is an application holding the CAPTURE_AUDIO_OUTPUT permission or an application holding the RoleManager.ROLE_ASSISTANT role.

It is easy to understand that, because of the special status of privileged applications, a privileged application is allowed to occupy the microphone channel while the preset communication application is also occupying it. While the privileged application occupies the microphone channel, it can record the sound the local user inputs through the microphone during the call, thereby obtaining the first audio stream.

In addition, in practice, when the sound of the microphone channel is recorded through the privileged application during a call, background noise or electrical noise inside the device may corrupt the recording. Therefore, to improve the quality of the recorded audio, the recorded sound is also denoised after it is recorded through the privileged application, yielding a denoised first audio stream.
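The text does not name a denoising algorithm. As one simple stand-in, a first-order DC-blocking high-pass filter, y[n] = x[n] - x[n-1] + r * y[n-1], removes a constant offset and low-frequency hum, which is one kind of electrical noise; it does not suppress broadband background noise, which would need a different (for example spectral) method. The pole value r = 0.995 is an assumption, and the input is assumed to be 16-bit PCM samples.

```python
from typing import List


def dc_block(samples: List[int], r: float = 0.995) -> List[int]:
    """First-order DC-blocking filter: y[n] = x[n] - x[n-1] + r * y[n-1].

    r is an assumed pole value; values closer to 1.0 give a lower cutoff.
    The filter state is kept in floating point; each output sample is
    rounded and clipped back to the 16-bit PCM range.
    """
    out: List[int] = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        prev_x, prev_y = x, y
        out.append(max(-32768, min(32767, int(round(y)))))
    return out
```

A constant offset passes through only momentarily and then decays toward zero, while rapidly changing speech components are largely preserved.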

Further, the microphone channel only carries the first audio stream input by the local user; the audio stream sent by the peer end of the call cannot be obtained from it. To facilitate subsequent mixing, in this embodiment, while the first audio stream of the microphone channel is recorded through the privileged application, its time information is marked at the same time, that is, a timestamp corresponding to the first audio stream is generated. During subsequent mixing, the first audio stream and the audio stream sent by the peer end can then be mixed based on this timestamp to obtain complete recorded audio.
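Timestamp-based alignment can be sketched as follows. This minimal Python example assumes that each stream is a list of mono PCM samples at a shared, hypothetical sample rate, and that each stream's timestamp is the capture time of its first sample in milliseconds. The later-starting stream is padded with leading silence so that index i in both lists refers to the same moment, which is what the mixing step needs.

```python
from typing import List, Tuple


def align_by_timestamp(first: List[int], first_start_ms: float,
                       second: List[int], second_start_ms: float,
                       sample_rate: int = 8000) -> Tuple[List[int], List[int]]:
    """Place two streams on a common timeline using their start timestamps."""
    # Number of samples by which the later stream starts after the earlier one.
    offset = round(abs(second_start_ms - first_start_ms) * sample_rate / 1000)
    if second_start_ms >= first_start_ms:
        second = [0] * offset + second
    else:
        first = [0] * offset + first
    # Pad the shorter result with trailing silence so both have equal length.
    length = max(len(first), len(second))
    first = first + [0] * (length - len(first))
    second = second + [0] * (length - len(second))
    return first, second
```

With a 4 kHz sample rate, a peer stream stamped 1 ms after the local stream is shifted by 4 samples.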

Further, because the terminal's storage is limited, to prevent the final recording from being too large to store, in this embodiment, while the first audio stream of the microphone channel is recorded through the privileged application, the accumulated data size of the audio stream is calculated at the same time. When the accumulated data size reaches the currently available maximum storage capacity, recording is stopped automatically, so that as much of the recording as possible is preserved. Optionally, when this condition is detected, a reminder message may be generated to prompt the user to adjust the recording policy, for example, to stop recording or to change the storage location of the recorded audio.
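The storage cap can be sketched as a small byte counter. In this minimal Python example, `RecordingBudget` and its `accept` method are hypothetical names; the maximum capacity is passed in rather than queried from the device. Each incoming chunk is trimmed to the remaining room so that as much audio as possible is kept, and the `stopped` flag marks the point at which the caller would stop recording and, optionally, remind the user.

```python
class RecordingBudget:
    """Tracks accumulated audio bytes against an assumed storage limit."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self.stopped = False

    def accept(self, chunk: bytes) -> bytes:
        """Return the part of `chunk` that still fits; stop once the limit is hit."""
        if self.stopped:
            return b""
        room = self.max_bytes - self.used
        kept = chunk[:room]
        self.used += len(kept)
        if self.used >= self.max_bytes:
            # The caller would stop recording here and may show a reminder.
            self.stopped = True
        return kept
```

The same counter could be applied unchanged to the second audio stream captured from the peer end.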

Step S20, capturing a second audio stream sent by the peer end of the call during the call;

Specifically, the peer end is the other terminal device participating in the call with the user. It is easy to understand that a call requires a call link, such as a Synchronous Connection-Oriented (SCO) link or an Extended Synchronous Connection-Oriented (eSCO) link, to be established between the two or more ends of the call. Based on the established call link, the local end sends its audio stream to the peer end, and the peer end sends its audio stream to the local end. To support a normal call, the call link is generally divided into an uplink and a downlink: the uplink carries audio data from the local end to the peer end, and the downlink carries audio data from the peer end to the local end. Therefore, in this embodiment, the second audio stream sent by the peer end can be captured directly from the downlink.

In a specific application scenario, the target call link used by the peer end during the call is determined and set as the target recording audio source, and a preset recording simulator is then called to capture the second audio stream sent by the peer end from that source. Specifically, the target call link is the link over which the peer end's audio stream is transmitted to the local end, such as the downlink described above.

In a specific implementation, the preset recording simulator is called to capture the second audio stream sent by the peer end from the peer end's call link. In this embodiment, the preset recording simulator is a recording component provided in the system, such as MediaRecorder or AudioRecord, through which the second audio stream is captured from the peer end's call link.

It should be noted that MediaRecorder is built on AudioRecord: when MediaRecorder is used for recording, it still creates an AudioRecord internally to interact with AudioFlinger. Therefore, when a recording request from the user is received, it is determined whether the request is part of a screen recording request (screen recording usually needs to capture both audio and video). If it is, MediaRecorder is called directly to capture the second audio stream sent by the peer end; if it is not, AudioRecord is called directly to capture the second audio stream.

Further, audio recorded by MediaRecorder is already encoded, so MediaRecorder cannot provide the raw audio, whereas AudioRecord directly provides frames of PCM data. In other words, a second audio stream captured with MediaRecorder cannot be post-processed directly, while one captured with AudioRecord can. Therefore, if the second audio stream sent by the peer end needs audio post-processing, AudioRecord is called first to capture it. Audio post-processing in this embodiment refers to processing that changes the characteristics of the audio signal, such as denoising, voice changing, and signal amplification.
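The recorder-selection rules above can be condensed into a small decision function. This is a minimal Python sketch, not Android code: the enum values are plain labels for the two Android classes, and the precedence is an assumption, since the text does not say which rule wins when both apply. Here the need for post-processing overrides the screen-recording rule, because MediaRecorder output is already encoded.

```python
from enum import Enum


class Recorder(Enum):
    MEDIA_RECORDER = "MediaRecorder"  # encoded output; suits audio+video screen recording
    AUDIO_RECORD = "AudioRecord"      # raw PCM frames; allows post-processing


def choose_recorder(part_of_screen_recording: bool,
                    needs_post_processing: bool) -> Recorder:
    """Pick the component used to capture the peer's second audio stream."""
    # Assumption: post-processing (denoise, voice change, gain) takes precedence,
    # since MediaRecorder cannot expose the raw audio.
    if needs_post_processing:
        return Recorder.AUDIO_RECORD
    if part_of_screen_recording:
        return Recorder.MEDIA_RECORDER
    return Recorder.AUDIO_RECORD
```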

Further, because the terminal's storage is limited, to prevent the final recording from being too large to store, in this embodiment, while the second audio stream sent by the peer end is being captured, the accumulated data size of the audio stream is calculated at the same time. When the accumulated data size reaches the currently available maximum storage capacity, recording is stopped automatically, so that as much of the recording as possible is preserved.

Further, before the step of calling the preset recording simulator to capture the second audio stream sent by the peer end from the target recording audio source, the method further includes: initializing the preset recording simulator and setting its recording parameters.

It can be understood that the preset recording simulator records from the microphone by default. In this embodiment, however, the sound of the microphone channel is recorded through the privileged application, that is, the local end's sound is already obtained via the privileged application. Therefore, to capture the second audio stream sent by the peer end through the preset recording simulator, the simulator needs to be initialized before this step is executed: its previously set recording parameters are cleared and then reset, and during the reset the target recording audio source is set as the simulator's audio acquisition source.

Specifically, the recording parameters include the audio acquisition source, the audio sampling rate, the audio sampling precision, and the size of the buffer that stores the acquired audio data. Since the preset recording simulator is invoked here to capture the second audio stream sent by the peer end, the target call link of the peer end is set as the audio acquisition source in this embodiment.
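The parameter reset can be modelled with a plain data class. In this minimal Python sketch, `RecordingParams` and `reinitialise` are hypothetical names, and the default values (8 kHz, 16-bit, mono, 20 ms buffer) are assumptions rather than values from the text. The source label "VOICE_DOWNLINK" mirrors the name of Android's `MediaRecorder.AudioSource.VOICE_DOWNLINK`, which on Android requires the CAPTURE_AUDIO_OUTPUT permission; here it is only a string. The buffer size follows from sample rate × channels × bytes per sample × buffer duration.

```python
from dataclasses import dataclass


@dataclass
class RecordingParams:
    source: str = "MIC"          # default acquisition source (the microphone)
    sample_rate_hz: int = 8000   # audio sampling rate (assumed default)
    bits_per_sample: int = 16    # audio sampling precision
    channels: int = 1
    buffer_ms: int = 20          # buffer length in milliseconds (assumed)

    def buffer_bytes(self) -> int:
        """Bytes needed to hold `buffer_ms` of PCM audio with these parameters."""
        bytes_per_sample = self.bits_per_sample // 8
        return (self.sample_rate_hz * self.channels * bytes_per_sample
                * self.buffer_ms // 1000)


def reinitialise(target_link: str, sample_rate_hz: int = 8000,
                 bits_per_sample: int = 16, buffer_ms: int = 20) -> RecordingParams:
    """Discard previously set parameters and point the recorder at the call link."""
    # Building a fresh object stands in for clearing the old parameters.
    return RecordingParams(source=target_link, sample_rate_hz=sample_rate_hz,
                           bits_per_sample=bits_per_sample, buffer_ms=buffer_ms)
```

With the assumed defaults, a 20 ms buffer holds 8000 × 1 × 2 × 20 / 1000 = 320 bytes.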

For ease of understanding, this embodiment takes AudioRecord as an example: when a recording request is received, the state of AudioRecord is obtained and used to check whether AudioRecord has acquired the appropriate hardware resources; once this is confirmed, an AudioRecord object is constructed, the second audio stream is collected from the target call link at the preset audio sampling rate and sampling precision, and the collected second audio stream is read into the buffer.

Step S30, after the call state is detected to have ended, mixing the first audio stream and the second audio stream to generate the target recorded audio.

Specifically, mixing refers to combining multiple audio or video streams into a single stream. It can be understood that in this embodiment, the first audio stream of the local end is recorded through the privileged application and the second audio stream of the peer end is captured from the call link, so they are two separate streams. To keep the recording consistent with what was heard during the actual call, after the call state is detected to have ended, the two streams are mixed into a single-stream target recorded audio. During mixing, the first and second audio streams are decoded separately, the decoded streams are mixed into a single target audio stream, and finally the target audio stream is encoded so that it can be played normally, yielding a playable target recorded audio. In a specific operation, mixing may be performed with the DirectSound component or with other audio mixing components, which this embodiment does not limit.
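The decode, mix, and encode sequence can be sketched for the simplest case. This minimal Python example assumes both streams are already in raw little-endian 16-bit mono PCM at the same sample rate, so "decoding" reduces to unpacking bytes into samples and "encoding" to packing them back; a real pipeline would also decode and re-encode a compressed format. Mixing is done by sample-wise addition with clipping to the 16-bit range, which is one common approach, not necessarily the one DirectSound or other components use.

```python
import struct
from typing import List


def pcm16_to_samples(data: bytes) -> List[int]:
    """Unpack little-endian 16-bit PCM bytes into integer samples."""
    n = len(data) // 2
    return list(struct.unpack(f"<{n}h", data[:2 * n]))


def samples_to_pcm16(samples: List[int]) -> bytes:
    """Pack integer samples into little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)


def mix_pcm16(first: bytes, second: bytes) -> bytes:
    """Mix two PCM16 streams into one, padding the shorter and clipping overflow."""
    a, b = pcm16_to_samples(first), pcm16_to_samples(second)
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    mixed = [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]
    return samples_to_pcm16(mixed)
```

Clipping keeps loud overlapping speech within range at the cost of some distortion; scaling each input down before summing is a common alternative when both parties often speak at once.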

It should further be noted that the end of the call state may be determined by a user action; for example, if it is detected that the user has tapped the hang-up button, the current call state is determined to have ended. In addition, in this embodiment, the call state may also be determined to have ended when it is detected that both parties have been silent for a specific duration, for example, when neither party has spoken for 3 minutes.
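Both end-of-call conditions can be combined in a small detector. In this minimal Python sketch, `CallEndDetector` and its methods are hypothetical names; per-frame "speaking" flags for each party are assumed to come from some voice activity check (for example the RMS threshold used for non-silent audio), and times are in seconds. The 180-second default corresponds to the 3-minute example.

```python
class CallEndDetector:
    """Declares the call ended on hang-up or after a long mutual silence."""

    def __init__(self, silence_limit_s: float = 180.0) -> None:
        self.silence_limit_s = silence_limit_s
        self.last_speech_s = 0.0
        self.hung_up = False

    def on_frame(self, t_s: float, local_speaking: bool, peer_speaking: bool) -> None:
        """Record the time of the latest frame in which either party spoke."""
        if local_speaking or peer_speaking:
            self.last_speech_s = t_s

    def on_hang_up(self) -> None:
        """Called when the user taps the hang-up button."""
        self.hung_up = True

    def ended(self, now_s: float) -> bool:
        return self.hung_up or now_s - self.last_speech_s >= self.silence_limit_s
```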

Compared with existing audio recording approaches, this embodiment records the sound of the microphone channel through the privileged application to generate the first audio stream when the preset communication application is detected to be in a call state, captures the second audio stream sent by the peer end of the call, and mixes the two streams into the target recorded audio after the call state is detected to have ended. Because the local first audio stream is recorded through the privileged application and the peer's second audio stream is captured directly, this approach avoids the background noise that affects screen-recording-based techniques and improves recording quality.

Further, based on the first embodiment of the audio recording method of the present invention, a second embodiment of the audio recording method of the present invention is provided.

Referring to fig. 3, fig. 3 is a flowchart illustrating an audio recording method according to a second embodiment of the present invention.

The second embodiment of the audio recording method differs from the first embodiment in that step S10 (when the preset communication application is detected to be in a call state, recording the sound of the microphone channel through the privileged application to generate a first audio stream) includes the following steps:

Step S101, when the preset communication application is detected to be in a call state, determining whether a recording instruction has been received;

Step S102, if the recording instruction has been received, recording the sound of the microphone channel through the privileged application to generate the first audio stream.

It should be noted that the recording instruction may be a recording control signal triggered by, for example, long-pressing the power key, shaking the device, long-pressing the volume-up key, long-pressing the volume-down key, pressing the two volume keys in sequence, or flipping the device from face-down to face-up twice in succession; it may also be an instruction triggered by the user tapping on the privileged application. This embodiment does not limit this.

It can be understood that, to protect user privacy, in this embodiment audio recording is performed only when a recording instruction from the user is received while the preset communication application is in a call state. The specific step of recording the sound of the microphone channel through the privileged application to generate the first audio stream is the same as in the first embodiment and is not repeated here.

Furthermore, the terminal is provided with at least two microphones, such as an external microphone and a built-in microphone. Therefore, when a recording instruction is received, the target microphone channel occupied by the preset communication application needs to be determined first, and the privileged application is then invoked to occupy the target microphone channel concurrently, so as to record the sound of the microphone channel through the privileged application.

In addition, it should be noted that a privileged application is an application holding the CAPTURE_AUDIO_OUTPUT permission or the RoleManager.ROLE_ASSISTANT role, and such an application can generally operate normally only when the terminal grants that permission or supports that role. To protect user privacy, a privileged application may occupy the microphone channel together with other applications for audio recording only when the recording requirement is currently met. Therefore, when a recording instruction is received, it is first determined whether a privileged application meeting the preset recording requirement currently exists; if so, the target microphone channel occupied by the preset communication application is determined, and finally the privileged application is invoked to occupy the target microphone channel concurrently, so as to record the sound of the microphone channel through the privileged application.

In addition, it should be noted that the preset recording requirement may be set by the user; for example, it may require that the privileged application not affect the normal operation of the terminal during recording, or that the audio file it records can be opened without relying on other third-party applications. For example, suppose the mobile phone of user A contains privileged application a and privileged application b. If privileged application b causes the phone to stutter while recording audio, or the audio file it produces cannot be opened directly by the user without a third-party tool, privileged application b is determined not to meet the preset recording requirement.

In addition, if the user revokes the permission corresponding to a privileged application, that privileged application may be unable to record. Therefore, in this embodiment, the step of determining whether a privileged application meeting the preset recording requirement currently exists further includes the following. The privileged authority of each privileged application in the terminal is determined. If the privileged authority of a privileged application is in a preset whitelist, it is determined that a privileged application meeting the preset recording requirement exists. If no privileged application's authority is in the preset whitelist, it is determined that no such privileged application exists. Specifically, the privileged authority refers to the permission to occupy the microphone channel concurrently with other applications, and the preset whitelist is the list of applications whose privileged authority has been approved by the user.

In a specific implementation, if no privileged application meeting the preset recording requirement currently exists, a recording request is sent to the opposite call end to notify it to capture a third audio stream and feed it back. The third audio stream fed back by the opposite call end is then received and used as the first audio stream.
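The decision described in the preceding paragraphs (whitelist check, then fallback to the opposite call end) can be sketched as below. This is a hypothetical illustration, not the patent's implementation. Applications are reduced to name strings, the whitelist is a plain set, and the return values are placeholder labels. A real Android implementation would query the platform's permission and role state instead.

```python
from typing import Iterable, Optional, Set


def find_qualified_app(apps: Iterable[str], approved: Set[str]) -> Optional[str]:
    """Return the first privileged app whose concurrent-capture privilege the user approved."""
    for app in apps:
        if app in approved:  # privileged authority is in the preset whitelist
            return app
    return None


def choose_first_stream_source(apps: Iterable[str], approved: Set[str]) -> str:
    """Decide where the local-side (first) audio stream should come from."""
    app = find_qualified_app(apps, approved)
    if app is not None:
        # Invoke this app to share the target microphone channel with the call app.
        return f"local:{app}"
    # No qualified privileged app: ask the opposite call end to capture a third
    # audio stream and feed it back; it is then used as the first audio stream.
    return "peer:third_stream"
```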

In addition, to protect the privacy of user data, the user may decide whether the data of an application program can be shared with other application programs. Accordingly, the step of recording the sound of the microphone channel through the privileged application further includes determining the recording permission parameter corresponding to the preset communication application program. If the parameter indicates a permitted state, the sound of the microphone channel is recorded through the privileged application. If the parameter indicates a prohibited state, a first preset prompt message is displayed to notify the user in the call to switch the recording permission state of the preset communication application program. In this way, the privileged application records the sound of the microphone channel only when the preset communication application program permits recording, which protects the privacy of user data.

It should be noted that the recording permission parameter includes a data attribute parameter of the communication application program or a recording permission setting of the communication application program. For example, when the data attribute of the communication application program is private data, data sharing is not allowed for that application, which means recording is prohibited. Likewise, when the recording permission of the communication application program is switched off, recording is prohibited.
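The permission check can be sketched as a pure function. The `CommAppSettings` fields and the returned action labels are hypothetical names chosen for illustration. The text specifies only that private data or a switched-off recording permission means recording is prohibited.

```python
from dataclasses import dataclass


@dataclass
class CommAppSettings:
    # Hypothetical per-application settings.
    data_is_private: bool          # data attribute: private data may not be shared
    recording_permission_on: bool  # user-facing recording switch


def recording_allowed(settings: CommAppSettings) -> bool:
    """Recording is prohibited if the data is private or the recording permission is off."""
    return not settings.data_is_private and settings.recording_permission_on


def on_recording_instruction(settings: CommAppSettings) -> str:
    if recording_allowed(settings):
        return "record"  # record the microphone channel via the privileged application
    return "show_first_prompt"  # ask the user to switch the recording permission state
```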

In this embodiment, when the preset communication application program is monitored to be in a call state, it is determined whether a recording instruction has been received. The sound of the microphone channel is recorded through the privileged application to generate the first audio stream only if the recording instruction has been received, which protects the privacy of user data.

Further, based on the first embodiment of the audio recording method of the present invention, a third embodiment of the audio recording method of the present invention is provided.

Referring to fig. 4, fig. 4 is a flowchart illustrating an audio recording method according to a third embodiment of the present invention.

The third embodiment of the audio recording method is different from the first embodiment of the audio recording method in that the step of mixing the first audio stream and the second audio stream to generate the target recorded audio further includes:

Step S301: obtaining first non-silent audio in the first audio stream;

Step S302: obtaining second non-silent audio in the second audio stream;

Step S303: mixing the first non-silent audio and the second non-silent audio.

Specifically, non-silent audio refers to an audio signal whose frequency is not lower than a certain threshold, and silent audio refers to an audio signal whose frequency is below that threshold. For example, the pitch frequency of normal human speech generally lies between 100 Hz and 1 kHz, and between 200 Hz and 1.1 kHz when a person speaks in a raised voice. The threshold may therefore be set to 100 Hz, so that any audio signal below 100 Hz is treated as silent audio. Silent audio generally contains nothing the user says and is of no use to the user. Accordingly, to improve the flexibility of audio recording, only the non-silent audio in each audio stream is mixed after the end of the call state is detected. In other words, the silent audio is cut from the audio streams to reduce the storage space occupied by the target recorded audio.
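The text defines silence by frequency rather than by loudness. A minimal sketch of that rule follows. It estimates each frame's frequency by counting zero crossings and applies the 100 Hz example threshold. Zero-crossing estimation is not specified in the text; it is swapped in here as an assumption, and the function names and frame length are illustrative.

```python
from typing import List, Sequence


def estimate_frequency(frame: Sequence[float], sample_rate: int) -> float:
    """Rough frequency estimate: a periodic signal crosses zero about twice per cycle."""
    if len(frame) == 0:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    duration_s = len(frame) / sample_rate
    return crossings / (2 * duration_s)


def is_non_silent(frame: Sequence[float], sample_rate: int, threshold_hz: float = 100.0) -> bool:
    """Per the text's definition: audio at or above the threshold frequency is non-silent."""
    return estimate_frequency(frame, sample_rate) >= threshold_hz


def classify_frames(samples: Sequence[float], sample_rate: int, frame_len: int) -> List[bool]:
    """Label each consecutive frame as non-silent (True) or silent (False)."""
    return [
        is_non_silent(samples[i:i + frame_len], sample_rate)
        for i in range(0, len(samples), frame_len)
    ]
```

Zero-crossing estimates are crude. Low-level broadband noise crosses zero often and would be labelled non-silent, which is why practical voice-activity detectors usually also gate on frame energy.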

It is easy to understand that, in a typical call, the two parties exchange information interactively, so their voice signals are not synchronized in time. For example, when Xiao Wang and Xiao Zhang are on a call, Xiao Wang says "Xiao Zhang, good morning", and after hearing this, Xiao Zhang replies "Xiao Wang, good morning". Alternatively, both parties may remain silent for a certain period. Therefore, over a complete call, the audio from each party consists of both silent audio and non-silent audio.

Therefore, to improve the flexibility of audio recording, the user may also issue an audio clipping instruction during mixing. In a specific implementation, while the audio streams of a call are being recorded with the preset communication application program in the call state, the terminal may display related prompt information in the status bar of the current display interface. For example, it may turn the status bar area red to remind the user that recording is in progress. After the end of the call state is detected, that is, when synthesis of the target recorded audio is about to start, the terminal may again display related prompt information in the status bar. For example, it may turn the status bar area green to indicate that audio synthesis is in progress. Within a preset period, the user may issue an audio clipping instruction to notify the terminal to clip the audio. For example, after the end of the call state is detected, a selection box asking whether to clip the recorded audio is displayed on the current interface, with a "Yes" key and a "No" key. When the user taps "Yes", the audio clipping instruction is determined to have been triggered. The instruction may also be triggered in other ways, such as by shaking the terminal.

Specifically, if an audio clipping instruction is received after the end of the call state is detected, the first audio stream and the second audio stream are clipped as they are mixed. The silent audio in each stream is cut out in sequence, so that the first audio stream is decomposed into at least one segment of first non-silent audio and the second audio stream into at least one segment of second non-silent audio. Finally, the first non-silent audio and the second non-silent audio are mixed into the target recorded audio.

For ease of understanding, take the first audio stream as an example. During mixing, the waveform of the first audio stream is output, the silent audio in the silent periods of the waveform is identified, and the start and end time nodes of each silent period are output. The first audio stream is then split into several segments at these start and end time nodes. For example, if the first audio stream lasts 3 minutes and its silent periods include (10 s, 37 s) and (89 s, 101 s), the stream is split into non-silent segments such as (0 s, 9 s), (38 s, 88 s), and (102 s, 134 s), with the remaining silent periods handled in the same way. The second audio stream is split in the same manner, and only the non-silent segments of the two streams are mixed, so that silent audio does not affect the mixed result.
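Once the silent periods are known, the split described above amounts to taking the complement of the silent intervals. The sketch below assumes half-open intervals [start, end), so the text's inclusive silent period (10 s, 37 s) is written (10, 38). The function names are hypothetical. The same helper works on sample indices, which is how the PCM data itself would be cut.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def non_silent_segments(duration: int, silences: List[Interval]) -> List[Interval]:
    """Return the parts of [0, duration) not covered by any silent interval."""
    segments: List[Interval] = []
    cursor = 0
    for start, end in sorted(silences):
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)  # also handles overlapping silent intervals
    if cursor < duration:
        segments.append((cursor, duration))
    return segments


def keep_segments(samples: Sequence[int], segments: List[Interval]) -> List[int]:
    """Concatenate the samples inside the given (sample-index) segments, dropping the rest."""
    kept: List[int] = []
    for start, end in segments:
        kept.extend(samples[start:end])
    return kept
```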

In this embodiment, the first non-silent audio in the first audio stream and the second non-silent audio in the second audio stream are obtained and mixed. As a result, the final target recorded audio contains only non-silent audio, which reduces the storage space it occupies.

Further, based on the first embodiment of the audio recording method of the present invention, a fourth embodiment of the audio recording method of the present invention is provided.

Referring to fig. 5, fig. 5 is a flowchart illustrating an audio recording method according to a fourth embodiment of the present invention.

The fourth embodiment of the audio recording method differs from the first embodiment of the audio recording method in that, after the step of mixing the first audio stream and the second audio stream to generate the target recorded audio, the method further includes:

Step S40: determining whether the target recorded audio meets a preset storage requirement;

Step S50: if the target recorded audio meets the preset storage requirement, storing the target recorded audio in a preset storage manner and displaying a second preset prompt message.

Specifically, the preset storage requirement may be any storage requirement set in advance, which is not limited in this embodiment. For example, the duration of the audio file must exceed a minimum duration, or the audio must be lossless, and the user may set the requirement flexibly according to actual needs. It should be understood that user misoperation may occur in practice. For example, if a request to end recording is received within 5 s of the user issuing the recording request, this implies that the user triggered the recording request by mistake. Therefore, to make the terminal more intelligent, this embodiment checks whether the current target recorded audio meets the preset storage requirement before storing it. In addition, the preset storage manner in this embodiment includes the file storage address, the file storage type, the file naming scheme, and the like, which are not limited in this embodiment.

For ease of understanding, the present embodiment specifically describes the above steps by way of example.

For example, suppose a user makes a call through an application program at 10:00 a.m. on 2021.01.16, and target recorded audio with a duration of 3 minutes is recorded. If the preset minimum duration for storing audio is 1 minute, the target recorded audio meets the preset storage requirement. The storage address preset by the user is then determined. For example, if the user has specified that files be stored in the media area, the 3-minute target recorded audio is stored in the media area in MP3 format. To make the file easy for the user to find, the MP3 file may be named after the recording time when it is stored, for example "2021.01.16" or "2021.01.16 10:00 a.m.". Files may also be named based on the audio source or on information about the application program that recorded the audio, which is not limited in this embodiment.

Further, in practice, the audio may be too short or damaged. To save storage space on the terminal, if the target recorded audio does not meet the preset storage requirement, it is deleted, and a third preset prompt message is displayed to alert the user that the current recording is invalid and has been deleted.
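The storage decision of steps S40 and S50, together with the deletion branch, can be sketched as follows. The 1-minute minimum and the recording-time naming come from the example above. The `RecordedAudio` type, the `corrupted` flag, and the exact file-name pattern are assumptions. The pattern uses a hyphen instead of a colon because colons are not allowed in file names on some file systems.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RecordedAudio:
    duration_s: float
    recorded_at: datetime
    corrupted: bool = False  # hypothetical flag for damaged audio


def meets_storage_requirement(audio: RecordedAudio, min_duration_s: float = 60.0) -> bool:
    """Example requirement from the text: not damaged and at least the minimum duration."""
    return not audio.corrupted and audio.duration_s >= min_duration_s


def file_name_for(audio: RecordedAudio) -> str:
    """Name the MP3 file after its recording time (colon avoided for file-system safety)."""
    return audio.recorded_at.strftime("%Y.%m.%d_%H-%M") + ".mp3"


def handle_target_audio(audio: RecordedAudio) -> str:
    if meets_storage_requirement(audio):
        return f"store:{file_name_for(audio)}"  # then show the second preset prompt
    return "delete"  # then show the third preset prompt
```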

In this embodiment, the target recorded audio is stored in the preset storage manner when it meets the preset storage requirement and is deleted when it does not. This makes audio storage more intelligent and further improves the user experience.

The invention also provides an audio recording device. Referring to fig. 6, the audio recording apparatus includes:

the monitoring module 10 is configured to record sound of a microphone channel through a privileged application to generate a first audio stream when a preset communication application program is monitored to be in a call state;

Specifically, the preset communication application program includes communication applications preinstalled in the system of the mobile terminal, such as the phone application. It also includes instant messaging applications with call functions that the user installs later, such as WeChat and QQ, which is not limited in this embodiment.

It can be understood that when the preset communication application program is monitored to be in a call state, it is occupying the microphone channel. In general, only one ordinary application may occupy the microphone channel at a time. To record the sound of the current microphone channel, this embodiment therefore invokes the privileged application to occupy the microphone channel concurrently, so that the sound of the microphone channel is recorded through the privileged application.

It should be noted that, in this embodiment, ordinary applications and privileged applications are distinguished by the permissions granted to the application or by the functions it has. Specifically, a privileged application refers to an application holding the CAPTURE_AUDIO_OUTPUT permission, or an application holding the RoleManager.ROLE_ASSISTANT role, such as Google Assistant preinstalled in the system. An ordinary application refers to an application that cannot obtain the CAPTURE_AUDIO_OUTPUT permission, such as a third-party recording application.

It is easy to understand that, because of the special status of privileged applications, a privileged application is allowed to occupy the microphone channel while the preset communication application program is also occupying it. The privileged application can therefore record the sound that the local user inputs through the microphone during the call, and the first audio stream is obtained.

In addition, in practice, when the sound of the microphone channel is recorded through the privileged application during a call, the recording may be affected by background noise or by electrical noise inside the device, making the recorded sound noisy. To improve the quality of the recorded audio, the recorded sound is therefore further denoised after being recorded through the privileged application, yielding a denoised first audio stream.
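The text does not say how denoising is done. As one simple stand-in, the sketch below applies a frame-energy noise gate that mutes frames whose RMS level falls below a threshold. This is a swapped-in technique with hypothetical parameters and is far cruder than the spectral noise suppression typically used on phones.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def noise_gate(samples: Sequence[float], frame_len: int, threshold: float) -> List[float]:
    """Zero out frames whose RMS is below the threshold; keep the others unchanged."""
    out: List[float] = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        if rms(frame) < threshold:
            out.extend([0.0] * len(frame))  # treat low-energy frame as background noise
        else:
            out.extend(frame)
    return out
```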

Further, the microphone channel carries only the first audio stream input by the local user, and the audio stream sent by the opposite call end cannot be obtained from it. To facilitate subsequent mixing, in this embodiment the time information of the first audio stream is marked while the stream is recorded through the privileged application; that is, timestamps corresponding to the first audio stream are generated. During subsequent mixing, the first audio stream and the audio stream sent by the opposite call end can then be combined into a complete recorded audio based on these timestamps.
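Timestamp-based alignment can be illustrated by placing each timestamped chunk at its sample offset on a shared timeline. After this step, two timelines of equal length can be mixed sample by sample. The chunk format and function name are hypothetical, and timestamps are assumed to be seconds measured from the start of the call.

```python
from typing import List, Sequence, Tuple

Chunk = Tuple[float, Sequence[int]]  # (timestamp in seconds from call start, PCM samples)


def place_on_timeline(chunks: Sequence[Chunk], sample_rate: int, total_samples: int) -> List[int]:
    """Write each timestamped chunk at its sample offset on a shared, zero-filled timeline."""
    timeline = [0] * total_samples
    for timestamp_s, samples in chunks:
        start = round(timestamp_s * sample_rate)
        for i, sample in enumerate(samples):
            pos = start + i
            if 0 <= pos < total_samples:  # drop anything outside the call window
                timeline[pos] = sample
    return timeline
```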

A capturing module 20, configured to capture a second audio stream sent in a call process of an opposite call end;

It is easy to understand that, for a call to take place, the two or more parties must establish a call link, such as an SCO (Synchronous Connection-Oriented) link or an eSCO (Extended Synchronous Connection-Oriented) link. The local end sends its audio stream to the opposite end over the established call link, and the opposite end sends its audio stream to the local end over the same link, thereby implementing the call.

In a specific application scenario, a target call link in a call process of an opposite call end is determined, the target call link is set as a target recording audio source, and then a preset recording simulator is called to capture a second audio stream sent by the opposite call end from the target recording audio source, specifically, the target call link refers to a link transmitted from the opposite call end to the local end, such as the downlink, which is not described in detail in this embodiment.

In a specific implementation, the preset recording simulator is invoked to capture the second audio stream sent by the opposite call end from the call link of the opposite call end. In this embodiment, the preset recording simulator refers to a recording component preinstalled in the system with recording or screen-recording capability, such as MediaRecorder or AudioRecord, through which the second audio stream is captured from the call link of the opposite call end.

It should be noted that MediaRecorder is an API built on AudioRecord: when MediaRecorder is used for recording, it still creates an AudioRecord internally to interact with AudioFlinger. Therefore, when a recording request is received from the user, it is determined whether the request is part of a screen-recording request, since screen recording usually requires capturing both audio and video.

Further, the audio recorded by MediaRecorder is already encoded, so MediaRecorder cannot provide the raw audio, whereas AudioRecord delivers raw frames of PCM data directly. Consequently, a second audio stream captured through MediaRecorder cannot be post-processed directly, while one captured through AudioRecord can. If the second audio stream sent by the opposite call end needs audio post-processing, AudioRecord is therefore preferentially invoked to capture it. Audio post-processing in this embodiment refers to processing that changes the characteristics of the audio signal, such as denoising, voice changing, and signal amplification.

Specifically, the recording parameters of the preset recording simulator include the audio source, the audio sampling rate, the audio sampling precision, and the size of the buffer that stores the captured audio data. Since in this embodiment the preset recording simulator is invoked to capture the second audio stream sent by the opposite call end, the target call link of the opposite call end is set as the audio source.
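The buffer-size parameter follows from simple arithmetic: bytes = sample rate × buffer duration × channels × bytes per sample. For example, 16 kHz mono 16-bit audio with a 20 ms buffer needs 16000 × 0.02 × 1 × 2 = 640 bytes. On Android, these parameters correspond to the arguments of the `AudioRecord` constructor, and `AudioRecord.getMinBufferSize` reports the smallest valid buffer. The Python sketch below reproduces only the arithmetic; it uses a hypothetical `RecordingParams` type and a placeholder source label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingParams:
    source: str           # placeholder label for the audio source (the peer's call link)
    sample_rate_hz: int   # audio sampling rate
    bits_per_sample: int  # audio sampling precision
    channels: int
    buffer_ms: int        # how much audio one buffer holds

    def buffer_size_bytes(self) -> int:
        """Bytes needed to hold buffer_ms of audio with these parameters."""
        bytes_per_frame = self.channels * self.bits_per_sample // 8
        frames = self.sample_rate_hz * self.buffer_ms // 1000
        return frames * bytes_per_frame
```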

A mixing module 30, configured to mix the first audio stream and the second audio stream after the end of the call state is detected, so as to generate the target recorded audio.

In particular, stream mixing refers to combining multiple audio or video streams into a single stream. In this embodiment, the local first audio stream and the opposite end's second audio stream are two separate streams. To keep the recording consistent with the actual call, they must be mixed into a single-stream target recorded audio after the end of the call state is detected. During mixing, the two streams are first decoded separately, and the decoded streams are mixed into a single target audio stream. Finally, the target audio stream is encoded so that the resulting target recorded audio can be played normally. In practice, mixing may be performed with third-party mixing software, for example based on the DirectSound component, or with other audio mixing techniques, which is not limited in this embodiment.
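The middle step of mixing, combining two already-decoded streams, can be sketched as additive mixing of 16-bit PCM samples with saturation at the int16 limits. This is a common simple technique, not the DirectSound-based mixing mentioned above. Decoding and re-encoding are omitted, and both streams are assumed to share the same sample rate and to be aligned in time.

```python
from itertools import zip_longest
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Mix two decoded 16-bit PCM streams by summing samples and clipping to the int16 range."""
    mixed: List[int] = []
    # The shorter stream is padded with silence (0) so the output covers both.
    for x, y in zip_longest(a, b, fillvalue=0):
        mixed.append(max(INT16_MIN, min(INT16_MAX, x + y)))
    return mixed
```

Summing keeps each voice at its original level but can clip when both parties speak loudly at once. Averaging the two samples avoids clipping at the cost of halving the level.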

It should further be noted that the end of the call state may be determined by a user action. For example, if it is detected that the user taps the hang-up key, the current call state is determined to have ended. In this embodiment, the call state may also be determined to have ended when both parties have been silent for a certain duration. For example, if neither party makes any sound for 3 minutes, the current call state is determined to have ended.
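The two end-of-call conditions can be combined into a single predicate. The names are hypothetical, the 3-minute timeout is the example value from the text, and times are assumed to be in seconds on a common clock.

```python
def call_ended(hang_up_pressed: bool, last_sound_at_s: float, now_s: float,
               silence_timeout_s: float = 180.0) -> bool:
    """The call ends on hang-up, or when neither party has made a sound for the timeout."""
    return hang_up_pressed or (now_s - last_sound_at_s) >= silence_timeout_s
```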

In addition, in an embodiment, the monitoring module 10 is further configured to determine whether a recording instruction is received when it is monitored that the preset communication application is in a call state;

the monitoring module 10 is further configured to record sound of the microphone channel through the privileged application to generate a first audio stream if the recording instruction is received.

In addition, in an embodiment, the monitoring module 10 is further configured to determine whether a privileged application meeting a preset recording requirement currently exists if the recording instruction is received;

the monitoring module 10 is further configured to determine a target microphone channel occupied by the preset communication application program if a privileged application meeting a preset recording requirement currently exists;

the monitoring module 10 is further configured to invoke a privileged application and occupy the target microphone channel at the same time, so as to record the sound of the microphone channel through the privileged application.

In addition, in an embodiment, the monitoring module 10 is further configured to send a recording request to a call peer end if there is no privileged application meeting a preset recording requirement, so as to notify the call peer end to capture a third audio stream, and feed back the third audio stream;

the monitoring module 10 is further configured to receive a third audio stream fed back by the opposite call end, and use the third audio stream as the first audio stream.

In addition, in an embodiment, the capturing module 20 is further configured to determine a target call link in the call of the opposite end, and set the target call link as a target recording audio source;

the capturing module 20 is further configured to invoke a preset recording simulator to capture a second audio stream sent by the opposite call end from the target recording audio source.

In addition, in an embodiment, the mixing module 30 is further configured to obtain first non-silent audio in the first audio stream;

the mixing module 30 is further configured to obtain second non-silent audio in the second audio stream;

the mixing module 30 is further configured to mix the first non-silent audio and the second non-silent audio.

The audio recording apparatus provided in this embodiment works as follows. When the preset communication application program is monitored to be in a call state, it records the sound of the microphone channel through a privileged application to generate a first audio stream. It captures a second audio stream sent by the opposite call end. After detecting the end of the call state, it mixes the first audio stream and the second audio stream to generate the target recorded audio. Because the local first audio stream is recorded through the privileged application and the opposite end's second audio stream is captured directly, the influence of background noise is avoided and recording quality is improved.

In addition, the embodiment of the invention also provides a readable storage medium.

The readable storage medium has stored thereon an audio recording program, which when executed by the processor implements the steps of the audio recording method as described above.

The readable storage medium of the present invention may be a computer readable storage medium, and the specific implementation manner thereof is substantially the same as that of each embodiment of the audio recording method, and is not described herein again.

The present invention is described in connection with the accompanying drawings, but the present invention is not limited to the above embodiments, which are only illustrative and not restrictive, and those skilled in the art can make various changes without departing from the spirit and scope of the invention as defined by the appended claims, and all changes that come within the meaning and range of equivalency of the specification and drawings that are obvious from the description and the attached claims are intended to be embraced therein.

Claims

1. An audio recording method, characterized in that the audio recording method comprises the following steps:

when the preset communication application program is monitored to be in a call state, recording sound of a microphone channel through a privileged application to generate a first audio stream;

2. The audio recording method according to claim 1, wherein the step of recording the sound of the microphone channel by the privileged application to generate the first audio stream when the preset communication application is monitored to be in a call state comprises:

when the preset communication application program is monitored to be in a call state, judging whether a recording instruction is received;

3. The audio recording method according to claim 2, wherein the step of recording the sound of the microphone channel through the privileged application if the recording instruction is received specifically includes:

if the privileged application meeting the preset recording requirement exists currently, determining a target microphone channel occupied by the preset communication application program;

invoking the privileged application while occupying the target microphone channel to record microphone channel sound through the privileged application.

4. The audio recording method of claim 3, wherein after the step of determining whether the privileged application meeting the preset recording requirement currently exists, the method further comprises:

and receiving a third audio stream fed back by the opposite call terminal, and taking the third audio stream as the first audio stream.

5. The method of claim 3, wherein the step of determining whether a privileged application meeting the preset recording requirement currently exists comprises:

determining the privilege right of each privileged application in the terminal;

6. The audio recording method of claim 1, wherein the step of mixing the first audio stream and the second audio stream to generate the target recorded audio comprises:

obtaining a first non-silent audio in the first audio stream;

obtaining a second non-silent audio in the second audio stream;

mixing the first non-silent audio and the second non-silent audio to generate a target recorded audio.

7. The audio recording method of any of claims 1 to 6, wherein the step of mixing the first audio stream and the second audio stream to generate the target recorded audio further comprises:

8. An audio recording apparatus, comprising:

and a mixing module, configured to mix the first audio stream and the second audio stream after the end of the call state is detected, so as to generate a target recorded audio.

9. An audio recording device, characterized in that the audio recording device comprises a memory, a processor and an audio recording program stored on the memory and executable on the processor, which audio recording program, when executed by the processor, implements the steps of the audio recording method according to any one of claims 1-7.

10. A readable storage medium, having stored thereon an audio recording program which, when executed by a processor, implements the steps of the audio recording method according to any one of claims 1-7.