CN113050918A

CN113050918A - Audio optimization method, device, equipment and storage medium based on remote double recording

Info

Publication number: CN113050918A
Application number: CN202110438130.3A
Authority: CN
Inventors: 余龙龙
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2021-06-29

Abstract

The invention relates to artificial intelligence and provides an audio optimization method, device, equipment and storage medium based on remote double recording. The method comprises: receiving a remote double recording request; determining a first double recording terminal and a second double recording terminal according to the remote double recording request; acquiring user audio in the first double recording terminal, and determining the acquisition time of the user audio; acquiring audio corresponding to the acquisition time from a voice library of the first double recording terminal as broadcast audio; performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio; and sending the target audio stream to the second double recording terminal. With the invention, after the user puts on an external device, the other party in the remote double recording can still hear the text content broadcast by the loudspeaker of the user terminal. The invention further relates to blockchain technology: the target audio stream may be stored in a blockchain.

Description

Audio optimization method, device, equipment and storage medium based on remote double recording

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an audio optimization method, device, equipment and storage medium based on remote double recording.

Background

Throughout an intelligent remote double recording session, a user usually wears an external device such as an earphone in order to hear the other party's audio more clearly. However, text broadcasting is played through the loudspeaker and sound is collected through the microphone, and both the loudspeaker and the microphone are integrated in the external device. As a result, when the user wears an external device such as an earphone, the microphone in the external device cannot pick up the audio played by the loudspeaker in the external device, and the other party in the remote double recording cannot hear the text content broadcast by the loudspeaker.

Therefore, how to optimize the audio content in remote double recording, so that the other party can still hear the text content broadcast by the loudspeaker after the user puts on an external device, is a problem that urgently needs to be solved.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an audio optimization method, apparatus, device and storage medium based on remote double recording, wherein after a user wears an external device, another party in the remote double recording can still receive text content broadcasted by a speaker at a user end.

On one hand, the invention provides an audio optimization method based on remote double recording, which comprises the following steps:

receiving a remote double recording request;

determining a first double recording terminal and a second double recording terminal according to the remote double recording request;

acquiring user audio in the first double-recording terminal, and determining the acquisition time of the user audio;

acquiring audio corresponding to the acquisition time from a voice library of the first double-recording terminal as broadcast audio;

performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio;

and sending the target audio stream to the second double recording terminal.

According to a preferred embodiment of the present invention, the determining of the first double recording terminal and the second double recording terminal according to the remote double recording request includes:

analyzing the message of the remote double-recording request to obtain message information carried by the message;

acquiring information indicating a terminal from the message information as a terminal identifier;

searching a target identifier which is the same as the terminal identifier from a preset terminal list, and determining a terminal pointed by the target identifier as the first double-recording terminal;

determining the terminal identification except the target identification as a characteristic identification;

and determining the terminal pointed by the characteristic identifier as the second double recording terminal.

According to a preferred embodiment of the present invention, the sending of the target audio stream to the second double recording terminal includes:

acquiring an audio format of the second double recording terminal;

coding and compressing the target audio stream according to the audio format to obtain a file to be sent;

and determining an address corresponding to the feature identifier as a terminal address of the second double recording terminal, and sending the file to be sent to the terminal address.

According to a preferred embodiment of the present invention, the acquiring of the user audio in the first double recording terminal includes:

monitoring an audio input module in the first double-recording terminal;

and when the audio input module is monitored to receive audio, extracting the user audio from the audio input module.

According to a preferred embodiment of the present invention, the determining of the acquisition time of the user audio includes:

acquiring an audio number of the user audio from the audio input module;

acquiring a log list from a preset log library, and extracting a log containing the audio number from the log list as a log of the user audio;

and extracting information indicating time from the log as the acquisition time.

According to a preferred embodiment of the present invention, the acquiring of the audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio includes:

calculating the time after a preset time period by taking the acquisition time as the starting time to be taken as the termination time;

and intercepting the audio broadcasted between the starting time and the ending time from the voice library as the broadcast audio.

According to a preferred embodiment of the present invention, the performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream including the user audio and the broadcast audio includes:

transcoding the user audio to obtain a first audio, and transcoding the broadcast audio to obtain a second audio;

sequentially extracting first node information from the audio head of the first audio according to a configuration time interval, and sequentially extracting second node information from the audio head of the second audio according to the configuration time interval;

and performing audio mixed flow processing on the first node information and the second node information by adopting a weighted average algorithm to obtain the target audio stream.

In another aspect, the present invention further provides an audio optimization apparatus based on remote double recording, where the audio optimization apparatus based on remote double recording includes:

a receiving unit, configured to receive a remote double-recording request;

a determining unit, configured to determine a first double recording terminal and a second double recording terminal according to the remote double recording request;

the determining unit is further configured to acquire user audio in the first double recording terminal and determine the acquisition time of the user audio;

the acquisition unit is used for acquiring the audio corresponding to the acquisition time from the voice library of the first double-recording terminal as broadcast audio;

the processing unit is used for carrying out mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio;

and the sending unit is used for sending the target audio stream to the second double recording terminal.

In another aspect, the present invention further provides an electronic device, including:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the remote double recording-based audio optimization method.

In another aspect, the present invention further provides a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the remote double-recording-based audio optimization method.

According to the above technical solution, determining the acquisition time of the user audio reduces the amount of broadcast audio to be processed, which improves the efficiency of mixing the user audio with the broadcast audio, so that the target audio stream can be generated quickly. Performing mixed flow processing on the user audio and the broadcast audio yields a target audio stream containing both, and the target audio stream is sent to the second double recording terminal, so that the other party in the remote double recording can still receive the text content broadcast by the loudspeaker at the user end after the user puts on an external device.

Drawings

Fig. 1 is a flow chart of a preferred embodiment of the remote double-recording-based audio optimization method of the present invention.

FIG. 2 is a flow chart of a preferred embodiment of the present invention for generating a target audio stream.

Fig. 3 is a functional block diagram of a preferred embodiment of the remote double-recording-based audio optimization device according to the present invention.

Fig. 4 is a schematic structural diagram of an electronic device implementing a remote double-recording-based audio optimization method according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flow chart of a preferred embodiment of the remote double-recording-based audio optimization method according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.

The audio optimization method based on remote double recording is applied to one or more electronic devices, where the electronic devices are devices capable of automatically performing numerical calculation and/or information processing according to preset or stored computer readable instructions, and hardware thereof includes, but is not limited to, microprocessors, Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), embedded devices, and the like.

The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), a smart wearable device, and the like.

The electronic device may include a network device and/or a user device. Wherein the network device includes, but is not limited to, a single network electronic device, an electronic device group consisting of a plurality of network electronic devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network electronic devices.

The network in which the electronic device is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.

S10, receiving the remote double recording request.

In at least one embodiment of the invention, the remote double recording request may be triggered by an agent. The information carried in the remote double recording request includes a terminal identifier, a tag indicating the terminal identifier, and the like.

In at least one embodiment of the invention, the method further comprises:

the electronic equipment detects whether the remote double recording request is legal or not, and if the remote double recording request is legal, the electronic equipment receives the remote double recording request.

Detecting the remote double recording request improves the security of the electronic device and protects it from intrusion by malicious code.

S11, determining the first double recording terminal and the second double recording terminal according to the remote double recording request.

In at least one embodiment of the present invention, the first double recording terminal may refer to a terminal device of an agent, and the second double recording terminal may refer to a terminal device of an applicant.

In at least one embodiment of the present invention, the electronic device determining the first double recording terminal and the second double recording terminal according to the remote double recording request includes:

The remote double recording request is a piece of code; following the way such code is written, the content between { } in the request is called the message.

The terminal identification comprises the identification of the binding terminal of the agent, and the terminal identification also comprises the identification of the binding terminal of the applicant.

The preset terminal list stores terminal identification codes of a plurality of agents.

Parsing only the message, rather than the whole remote double recording request, improves the efficiency of obtaining the message information. The target identifier can be accurately determined from the terminal identifier and the preset terminal list, and the first double recording terminal can then be accurately determined from the mapping between identifiers and terminals. In addition, because each terminal identifier does not need to be compared with all identifiers in the preset terminal list, the second double recording terminal is determined more efficiently.
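The identifier matching described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the message between { } is JSON, and the field name `terminal_ids`, the contents of `PRESET_TERMINAL_LIST`, and the function `split_terminals` are all hypothetical.

```python
import json

# Hypothetical preset terminal list: agent terminal identifiers mapped to terminals.
PRESET_TERMINAL_LIST = {"agent-001": "agent terminal A", "agent-002": "agent terminal B"}


def split_terminals(request: str, preset=PRESET_TERMINAL_LIST):
    """Return (target_identifier, feature_identifiers) for one request."""
    # Only the content between the outermost braces is parsed as the message.
    start, end = request.index("{"), request.rindex("}") + 1
    message = json.loads(request[start:end])  # assumes a JSON message
    terminal_ids = message["terminal_ids"]    # assumed field name

    # Target identifier: the one that also appears in the preset (agent) list.
    targets = [t for t in terminal_ids if t in preset]
    if len(targets) != 1:
        raise ValueError("expected exactly one agent terminal identifier")
    # Feature identifiers: every remaining identifier (the applicant side).
    features = [t for t in terminal_ids if t not in preset]
    return targets[0], features
```

For a request such as `'REQ {"terminal_ids": ["applicant-77", "agent-001"]} END'`, the sketch treats `agent-001` as the target identifier and `applicant-77` as the characteristic identifier.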

In at least one embodiment of the invention, the method further comprises:

determining the terminal identification different from all the identifications in the preset terminal list as the characteristic identification;

and determining the terminal corresponding to the feature identifier as the second double recording terminal.

Through the implementation mode, the second double recording terminal can be accurately determined.

S12, acquiring the user audio in the first double recording terminal, and determining the acquisition time of the user audio.

In at least one embodiment of the present invention, the user audio refers to audio generated by the user on the first double recording terminal side.

The acquisition time refers to the time at which the user audio is acquired.

In at least one embodiment of the present invention, the electronic device acquiring the user audio in the first double recording terminal includes:

monitoring an audio input module in the first double-recording terminal;

Wherein the audio input module is capable of receiving audio, for example, the audio input module may be a microphone.

Through the embodiment, the user audio can be directly acquired from the audio input module, so that the acquisition efficiency of the user audio is improved.

In at least one embodiment of the present invention, the electronic device determining the acquisition time of the user audio includes:

acquiring an audio number of the user audio from the audio input module;

The log list stores a plurality of logs corresponding to the audio input module. Every log in the log list records an action of collecting audio in the audio input module.

The target log can be quickly acquired from the log list through the audio number, and the acquisition time of the user audio can be accurately acquired through the target log.
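A minimal sketch of this lookup is given below. The log record layout (`audio_number`, `time`, `action`), the sample entries in `LOG_LIST`, and the function name `acquisition_time` are assumptions for illustration; the source only states that each log records a collection action and carries information indicating time.

```python
from datetime import datetime

# Hypothetical log records for the audio input module.
LOG_LIST = [
    {"audio_number": "A-1001", "time": "2021-04-22T07:00:00", "action": "capture"},
    {"audio_number": "A-1002", "time": "2021-04-22T07:03:10", "action": "capture"},
]


def acquisition_time(audio_number: str, log_list=LOG_LIST) -> datetime:
    """Find the log carrying the audio number and return its time field."""
    for log in log_list:
        if log["audio_number"] == audio_number:
            return datetime.fromisoformat(log["time"])
    raise LookupError(f"no log found for audio number {audio_number}")
```

A linear scan is enough for a sketch; a real log library would more likely index logs by audio number.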

S13, acquiring the audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio.

In at least one embodiment of the present invention, the broadcast audio refers to the audio generated when the first double recording terminal performs text voice broadcasting.

The voice library stores a plurality of audios and the broadcasting time of each audio.

In at least one embodiment of the present invention, the electronic device acquiring the audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio includes:

The preset time period may be determined according to a network delay time of the electronic device, and the specific time of the preset time period is not limited in the present invention.

For example, if the acquisition time is 7:00 and the preset time period is 10 minutes, the starting time is 7:00 and the termination time is 7:10.

Because a delay error exists between the time at which the user audio is generated and the time at which it is acquired, the preset time period allows the broadcast audio played by the first double recording terminal when the user audio was generated to be acquired accurately and comprehensively.
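The window calculation in the 7:00 example can be sketched as follows. The representation of the voice library as a list of `(broadcast_time, audio)` pairs and the function names `broadcast_window` and `intercept` are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def broadcast_window(acquisition: datetime, preset: timedelta):
    """Start at the acquisition time; end one preset period later."""
    return acquisition, acquisition + preset


def intercept(voice_library, start: datetime, end: datetime):
    """Keep the entries whose broadcast time falls inside [start, end]."""
    return [audio for broadcast_time, audio in voice_library
            if start <= broadcast_time <= end]
```

With an acquisition time of 7:00 and a preset period of 10 minutes, `broadcast_window` returns 7:00 and 7:10, and `intercept` keeps only the audio broadcast between those two times.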

S14, performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio.

It is emphasized that, to further ensure the privacy and security of the target audio stream, the target audio stream may also be stored in a node of a blockchain.

In at least one embodiment of the present invention, the target audio stream includes the user audio and the broadcast audio.

Fig. 2 is a flow chart of a preferred embodiment of generating a target audio stream according to the present invention. As shown in Fig. 2, in at least one embodiment of the present invention, the electronic device performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio includes:

S140, transcoding the user audio to obtain a first audio, and transcoding the broadcast audio to obtain a second audio;

S141, sequentially extracting first node information from the audio head of the first audio according to a configuration time interval, and sequentially extracting second node information from the audio head of the second audio according to the configuration time interval;

and S142, performing audio mixed flow processing on the first node information and the second node information by adopting a weighted average algorithm to obtain the target audio stream.

It should be noted that the smaller the configuration time interval, the more accurate the first node information and the second node information acquired by the electronic device, but the lower the efficiency of acquiring them. The configuration time interval can therefore be customized according to actual needs; its value is not specifically limited.

The user audio and the broadcast audio can be synchronously processed by sequentially acquiring first node information from the audio head of the first audio and sequentially acquiring second node information from the audio head of the second audio, and the target audio stream can be accurately generated by performing audio mixed flow processing on the first node information and the second node information.
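One way to picture the weighted average step is the sketch below. It assumes that the node information extracted at each configuration time interval is a signed 16-bit PCM sample, that both audios share one sample rate after transcoding, and that equal weights are used by default; the source specifies none of these, and the function name `mix_weighted` is hypothetical.

```python
def mix_weighted(first, second, w_first=0.5, w_second=0.5):
    """Mix two sequences of 16-bit PCM samples by weighted average."""
    length = max(len(first), len(second))
    mixed = []
    for i in range(length):
        # Pad the shorter stream with silence so both are read in step.
        a = first[i] if i < len(first) else 0
        b = second[i] if i < len(second) else 0
        value = (w_first * a + w_second * b) / (w_first + w_second)
        # Clamp to the signed 16-bit range before storing the sample.
        mixed.append(max(-32768, min(32767, int(round(value)))))
    return mixed
```

Dividing by the sum of the weights keeps the result within the range of the inputs, so the mix does not clip even when both streams are loud at the same moment.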

S15, sending the target audio stream to the second double recording terminal.

In at least one embodiment of the present invention, the electronic device sending the target audio stream to the second double recording terminal includes:

acquiring an audio format of the second double recording terminal;

Wherein the audio format may be a WAV format.

Encoding and compressing the target audio stream in that audio format improves the efficiency with which the second double recording terminal parses the file to be sent.
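As an illustration of preparing the file to be sent when the audio format is WAV, the sketch below packs mixed samples into an in-memory WAV file with Python's standard `wave` module. The mono channel layout, 16-bit sample width, 16 kHz sample rate, and the function name `encode_wav` are assumptions; a WAV file written this way holds uncompressed PCM, so any further compression and the actual transmission to the terminal address are left out.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate=16000):
    """Pack signed 16-bit mono samples into an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)  # assumed sample rate
        # Little-endian signed 16-bit values, as WAV PCM expects.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()
```

The returned bytes could then be written to a socket or an HTTP body addressed to the second double recording terminal.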

In at least one embodiment of the present invention, after the target audio stream is sent to the second double recording terminal, the method further includes:

acquiring the number of the target audio stream as a target number;

generating feedback information according to the target number;

and sending the feedback information to the first double recording terminal.

By the embodiment, the feedback information can be sent to the first double recording terminal in time after the target audio stream is sent to the second double recording terminal.

Fig. 3 is a functional block diagram of a preferred embodiment of the remote double-recording-based audio optimization device according to the present invention. The remote double-recording-based audio optimization device 11 includes a receiving unit 110, a determining unit 111, an obtaining unit 112, a processing unit 113, a transmitting unit 114, and a generating unit 115. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13 and perform a fixed function and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.

The receiving unit 110 receives a remote double recording request.

In at least one embodiment of the present invention, the receiving unit 110 detects whether the remote double recording request is legal, and if the remote double recording request is legal, the receiving unit 110 receives the remote double recording request.

Detecting the remote double recording request improves the security of the electronic device and protects it from intrusion by malicious code.

The determining unit 111 determines the first double recording terminal and the second double recording terminal according to the remote double recording request.

In at least one embodiment of the present invention, the determining unit 111 determining the first double recording terminal and the second double recording terminal according to the remote double recording request includes:

In at least one embodiment of the present invention, the determining unit 111 determines the terminal identifier different from all the identifiers in the preset terminal list as the feature identifier;

The determining unit 111 acquires user audio in the first double recording terminal and determines the acquisition time of the user audio.

The acquisition time refers to the time at which the user audio is acquired.

In at least one embodiment of the present invention, the determining unit 111 acquiring the user audio in the first double recording terminal includes:

monitoring an audio input module in the first double-recording terminal;

In at least one embodiment of the present invention, the determining unit 111 determining the acquisition time of the user audio includes:

acquiring an audio number of the user audio from the audio input module;

The obtaining unit 112 obtains the audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio.

In at least one embodiment of the present invention, the obtaining unit 112 obtaining the audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio includes:

The processing unit 113 performs mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio.

In at least one embodiment of the present invention, the processing unit 113 performs mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream including the user audio and the broadcast audio includes:

It should be noted that the smaller the configuration time interval, the more accurate the first node information and the second node information acquired by the processing unit 113, but the lower the efficiency of acquiring them. The configuration time interval can therefore be customized according to actual needs; its value is not specifically limited.

The sending unit 114 sends the target audio stream to the second double recording terminal.

In at least one embodiment of the present invention, the sending unit 114 sending the target audio stream to the second double recording terminal includes:

acquiring an audio format of the second double recording terminal;

Wherein the audio format may be a WAV format.

In at least one embodiment of the present invention, after the target audio stream is sent to the second double recording terminal, the obtaining unit 112 obtains a number of the target audio stream as a target number;

the generating unit 115 generates feedback information according to the target number;

the sending unit 114 sends the feedback information to the first double recording terminal.

In one embodiment of the present invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions stored in the memory 12 and executable on the processor 13, such as a remote double-recording based audio optimization program.

It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of the electronic device 1, and that it may comprise more or less components than shown, or some components may be combined, or different components, e.g. the electronic device 1 may further comprise an input output device, a network access device, a bus, etc.

The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the electronic device 1, and is connected to each part of the whole electronic device 1 by various interfaces and lines, and executes an operating system of the electronic device 1 and various installed application programs, program codes, and the like.

Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be divided into a receiving unit 110, a determining unit 111, an obtaining unit 112, a processing unit 113, a transmitting unit 114, and a generating unit 115.

The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the electronic device, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.

The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.

The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by computer readable instructions instructing the relevant hardware; the instructions may be stored in a computer readable storage medium, and when they are executed by a processor, the steps of the method embodiments are implemented.

Wherein the computer readable instructions comprise computer readable instruction code which may be in source code form, object code form, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying said computer readable instruction code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM).

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
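The idea of data blocks associated by a cryptographic method can be illustrated with a toy hash chain. This is only a sketch of the linking principle: it has no consensus mechanism, peer-to-peer transmission, or platform layers, and the block fields (`payload_digest`, `previous_hash`, `hash`) and function names are hypothetical. A digest of the target audio stream is used as the payload here purely as an example.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(payload_digest: str, previous_hash: str) -> dict:
    """Create a block that commits to its payload and to the previous block."""
    body = {"payload_digest": payload_digest, "previous_hash": previous_hash}
    return {**body, "hash": _digest(body)}


def verify_chain(chain) -> bool:
    """Check every block's own hash and its link to the preceding block."""
    for index, block in enumerate(chain):
        body = {"payload_digest": block["payload_digest"],
                "previous_hash": block["previous_hash"]}
        if block["hash"] != _digest(body):
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True
```

Because each block stores the hash of the one before it, changing a stored digest invalidates that block and breaks the chain check, which is the anti-counterfeiting property the paragraph refers to.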

With reference to fig. 1, the memory 12 in the electronic device 1 stores computer readable instructions to implement a remote double-recording-based audio optimization method, and the processor 13 can execute the computer readable instructions to implement:

receiving a remote double recording request;

and sending the target audio stream to the second double recording terminal.

Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer readable instructions, which is not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.

The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:

receiving a remote double recording request;

and sending the target audio stream to the second double recording terminal.

The modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist physically on its own, or two or more modules may be integrated into one unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional module.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices may also be implemented by a single unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of those technical solutions.

Claims

1. An audio optimization method based on remote double recording, characterized by comprising the following steps:

receiving a remote double recording request;

and sending the target audio stream to the second double recording terminal.

2. The remote double recording-based audio optimization method of claim 1, wherein determining the first double recording terminal and the second double recording terminal according to the remote double recording request comprises:

3. The remote double recording-based audio optimization method of claim 2, wherein the sending the target audio stream to the second double recording terminal comprises:

acquiring an audio format of the second double recording terminal;

4. The remote double recording-based audio optimization method of claim 1, wherein the acquiring the user audio in the first double recording terminal comprises:

monitoring an audio input module in the first double-recording terminal;

5. The remote double recording-based audio optimization method of claim 4, wherein the determining the acquisition time of the user audio comprises:

acquiring an audio number of the user audio from the audio input module;

6. The remote double recording-based audio optimization method of claim 1, wherein the acquiring audio corresponding to the acquisition time from the voice library of the first double recording terminal as the broadcast audio comprises:

7. The remote double recording-based audio optimization method of claim 1, wherein the performing mixed flow processing on the user audio and the broadcast audio to obtain a target audio stream containing the user audio and the broadcast audio comprises:

8. An apparatus for remote double-recording-based audio optimization, the apparatus comprising:

a receiving unit, configured to receive a remote double-recording request;

9. An electronic device, characterized in that the electronic device comprises:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the remote double recording-based audio optimization method of any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer readable instructions which are executed by a processor in an electronic device to implement the remote double recording-based audio optimization method according to any one of claims 1 to 7.