WO2021179321A1 - Audio data processing method, electronic device and computer-readable storage medium - Google Patents

Audio data processing method, electronic device and computer-readable storage medium

Info

Publication number
WO2021179321A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
segments
video
data
multiple audio
Prior art date
Application number
PCT/CN2020/079342
Other languages
French (fr)
Chinese (zh)
Inventor
周事成
薛政
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN202080004659.8A priority Critical patent/CN112771880A/en
Priority to PCT/CN2020/079342 priority patent/WO2021179321A1/en
Publication of WO2021179321A1 publication Critical patent/WO2021179321A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording

Definitions

  • the present disclosure relates to the field of computer technology, and more specifically, to an audio data processing method, an electronic device, and a computer-readable storage medium.
  • the recorded audio and video files are usually saved in segments. Segmented storage can ensure that during the audio and video recording process, even if the audio and video files are damaged, the damaged file is only one segment, not all the audio and video files.
  • when segmented audio and video files are used, for example during editing or playback, they need to be synthesized to obtain a complete audio and video file.
  • in the synthesis process, the audio files and the video files need to be synthesized separately.
  • during synthesis of the audio files, the audio decoding algorithm can introduce discontinuities into the synthesized audio file, which makes the synthesized audio stutter or sound abnormal.
  • an audio data processing method is provided, including: acquiring multiple audio coding segments, where the multiple audio coding segments are recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  • by splicing the multiple audio coding segments into spliced data and decoding the spliced data, the method avoids the loss of decoded data that occurs when the segments are decoded separately, because the decoding algorithm depends on the characteristics of the preceding and following frame data; this in turn avoids the audio distortion caused by discontinuities of the obtained audio synthesis data at the splice points.
  • an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor, where the processor is configured to execute the executable instructions to perform an audio data processing method.
  • the audio data processing method includes: acquiring multiple audio coding segments, where the multiple audio coding segments are recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  • a computer-readable storage medium is provided, on which a computer program is stored, where, when the computer program is executed by a processor, the audio data processing method described in the first aspect of the embodiments of the present disclosure is implemented.
  • Fig. 1 is an architecture diagram of an audio data processing system according to an exemplary embodiment of the present disclosure
  • Fig. 2 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure
  • Fig. 3 is a sub-flow chart of step S22 in an exemplary embodiment of the present disclosure.
  • Fig. 4 is a sub-flow chart of step S21 in an exemplary embodiment of the present disclosure.
  • Fig. 5 is a sub-flow chart of step S22 in an exemplary embodiment of the present disclosure.
  • Fig. 6 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure
  • Fig. 7 is a sub-flow chart of step S211 in an exemplary embodiment of the present disclosure.
  • Fig. 8 is a sub-flow chart of step S211 in an exemplary embodiment of the present disclosure.
  • Fig. 9 is a schematic diagram of an audio data processing method according to an exemplary embodiment of the present disclosure.
  • Fig. 10 is an effect comparison diagram of an audio data processing method according to another exemplary embodiment of the present disclosure.
  • FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • Fig. 1 is an architectural diagram of an audio data processing system according to an exemplary embodiment of the present disclosure.
  • the terminal device 101 and the terminal device 102 are connected to the server 105 through a network 104, and the audio and video device 103 is connected to the terminal device 101 and the terminal device 102 through the network 104.
  • the terminal devices 101 and 102 may be, for example, but not limited to, mobile phones, computers, tablet computers, handheld terminals, and the like.
  • the server 105 may be a server that provides various services, for example, a background management server that provides support for the audio data processing system operated by the user using the terminal devices 101 and 102 (just an example).
  • the back-end server can analyze and process the received multiple audio coding segments and other data, and feed back the processing result (for example, audio synthesis data—just an example) to the terminal device.
  • the terminal device 101 (or the terminal device 102) may, for example, obtain multiple audio coding segments, which are recorded by the audio and video device 103 and saved in segments; the terminal device 101 may, for example, splice the multiple audio coding segments to obtain spliced data; and the terminal device 101 may, for example, decode the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  • the terminal device 101 (may also be the terminal device 102) can obtain multiple audio and video mixed stream data segments from the audio and video device 103, and perform audio extraction on the multiple audio and video mixed stream data segments respectively to obtain the multiple audio coded segments.
  • the terminal device 101 (or the terminal device 102) can receive an editing instruction sent by a target object, where the editing instruction includes editing object information, and acquire, from the audio and video device 103, multiple audio and video mixed stream data segments corresponding to the editing object information; the multiple audio and video mixed stream data segments are recorded by the audio and video device 103 and saved in segments.
  • the terminal device 101 (may also be the terminal device 102) can receive the download instruction sent by the target object; download multiple audio and video mixed stream data segments corresponding to the download instruction from the audio and video device 103 and save them locally.
  • the server 105 may be a single physical server or may be composed of multiple servers. A part of the server 105 may, for example, serve as the audio data processing task submission system of the present disclosure, for obtaining tasks that will execute audio data processing commands; another part of the server 105 may, for example, serve as the audio data processing system of the present disclosure, for obtaining multiple audio coding segments that are recorded by the audio and video device and saved in segments, splicing the multiple audio coding segments to obtain spliced data, and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  • the terminal device 101/terminal device 102 and the audio and video device 103 can transmit data through wireless transmission, such as WiFi, Bluetooth, zigbee, and so on.
  • the terminal device 101/the terminal device 102 and the server 105 can communicate with each other through traditional 4G, 5G, WiFi, or the Internet.
  • Fig. 2 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure.
  • the audio data processing method provided by the embodiments of the present disclosure can be executed by any electronic device with computing and processing capabilities, such as the terminal devices 101 and 102 and/or the server 105.
  • the audio data processing method 20 provided by the embodiment of the present disclosure may include:
  • Step S21 Obtain multiple audio coding segments, the multiple audio coding segments are recorded by the audio and video equipment and stored in segments.
  • Step S22 splicing multiple audio coding segments to obtain spliced data.
  • Step S23 Decoding the spliced data to obtain audio synthesis data of multiple audio coding segments.
  • multiple audio coding segments can be obtained by downloading them after communicating with the audio and video equipment.
  • the process of communicating with the audio and video equipment and downloading can be triggered by a local designated command, by a clock cycle, or by a download instruction actively sent by the audio and video equipment, which is not specifically limited in the present disclosure.
  • the audio and video equipment can be, for example, but not limited to, video recording equipment, unmanned aerial vehicles (equipped with cameras), and the like.
  • the encoding format of the audio coding segments may be one of the Adaptive Multi-Rate (AMR) format, the Advanced Audio Coding (AAC) format, the OPUS format, and the like, which is not specifically limited in the present disclosure.
  • step S22 the splicing method for splicing multiple audio coding segments may be end-to-end splicing.
  • the decoding mode may be an audio decoding mode, wherein the audio decoding mode corresponding to the audio coding format can be selected according to the different audio coding format.
  • by splicing the multiple audio coding segments to obtain spliced data and decoding the spliced data, the audio data processing method avoids the loss of decoded data that occurs when the segments are decoded separately because the decoding algorithm depends on the characteristics of the preceding and following frame data, and thereby avoids the audio distortion caused by discontinuities of the obtained audio synthesis data at the splice points.
  • Fig. 3 is a sub-flow chart of step S22 in an exemplary embodiment of the present disclosure.
  • step S22 may include:
  • Step S221 Sort the multiple audio coding segments according to the arrangement information of the multiple audio coding segments.
  • in step S222, the multiple audio coding segments are spliced end to end according to the sorting result to obtain the spliced data.
  • the arrangement information of the multiple audio coding segments may be one of recording time information and number information of the multiple audio coding segments.
  • the recording time information of the audio coding segment refers to the recording time information of the start position, the end position, or the designated middle position of the audio coding segment during the recording process.
  • the number information of the audio code segment refers to the number information added by the recording device according to the recording order of the audio code segment during the recording process.
  • in an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is recording time information, and step S221 may include: sorting the multiple audio coding segments in chronological order according to the recording time information of the multiple audio coding segments.
  • in an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is number information, and step S221 may include: sorting the multiple audio coding segments in number order according to the number information of the multiple audio coding segments.
  • Fig. 4 is a sub-flow chart of step S21 in an exemplary embodiment of the present disclosure.
  • step S21 may include:
  • Step S211 Obtain multiple audio and video mixed stream data segments.
  • Step S212 Perform audio extraction on multiple audio and video mixed stream data segments respectively to obtain multiple audio coded segments.
  • multiple audio and video mixed stream data segments may be recorded by audio and video equipment and saved in segments.
  • for example, for audio and video data that occupies a large storage space, in order to reduce video file damage caused by factors such as recording equipment failure, the data can be segmented during recording to obtain multiple audio and video mixed stream data segments.
  • the multiple audio and video mixed stream data segments can be downloaded after communicating with the audio and video equipment.
  • the process of communicating with the audio and video equipment and downloading can be triggered by a local designated command, by a clock cycle, or by a download instruction actively sent by the audio and video equipment, which is not specifically limited in the present disclosure.
  • the audio and video equipment can be, for example, but not limited to, video recording equipment, unmanned aerial vehicles (equipped with cameras), and the like.
  • step S212 after audio extraction is performed on each audio-video mixed stream data segment, the audio coding segment of each audio-video mixed stream data segment can be obtained.
  • the multiple audio coding segments may be, for example, x1(t), x2(t), x3(t), etc., where 0 < t < T and T is the segment period.
  • Fig. 9 is a schematic diagram of an audio data processing method according to an exemplary embodiment of the present disclosure.
  • steps S22 and S23 based on this embodiment, multiple audio coding segments x1(t), x2(t), x3(t), etc. are spliced to obtain spliced data x(t),
  • the spliced data x(t) is decoded to obtain audio synthesis data y(t) of multiple audio coding segments x1(t), x2(t), x3(t), etc.
  • Fig. 5 is a sub-flow chart of step S22 in an exemplary embodiment of the present disclosure.
  • step S22 may include:
  • Step S51 Determine the arrangement information of the multiple audio coding segments according to the arrangement information of the multiple audio and video mixed stream data segments.
  • Step S52 Sort the multiple audio coding segments according to the arrangement information of the multiple audio coding segments.
  • in step S53, the multiple audio coding segments are spliced end to end according to the sorting result to obtain the spliced data.
  • the arrangement information of multiple audio and video mixed stream data segments is one of recording time information and number information.
  • the recording time information of the audio and video mixed stream data segment refers to the recording time information of the start position, the end position, or a designated position in the middle of the audio and video mixed stream data segment during the recording process.
  • the number information of the audio and video mixed stream data segment refers to the number information added by the recording device according to the recording order of the audio and video mixed stream data segment during the recording process.
  • the arrangement information of each audio and video mixed stream data segment may be used as the arrangement information of the audio coding segment obtained by audio extraction from that segment. For example, if audio extraction is performed on the audio and video mixed stream data segment A to obtain the audio coding segment a, the arrangement information of the audio and video mixed stream data segment A is used as the arrangement information of the audio coding segment a.
  • in step S52, when the arrangement information is recording time information, the multiple audio and video mixed stream data segments may be sorted in chronological order according to the recording time information of the multiple audio and video mixed stream data segments.
  • when the arrangement information is number information, the multiple audio and video mixed stream data segments can be sorted in number order according to the number information of the multiple audio and video mixed stream data segments.
  • Fig. 6 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure.
  • the audio data processing method based on the foregoing embodiment may further include:
  • Step S61 Perform video extraction on multiple audio and video mixed stream data segments respectively to obtain multiple video coded segments.
  • Step S62 Generate video synthesis data according to multiple video coding segments.
  • Step S63 Generate audio and video synthesized data according to the audio synthesized data and the video synthesized data.
  • in step S62, the multiple video coding segments may be spliced to generate the video synthesis data; a sketch of steps S61 to S63 is given below.
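As a rough illustration of steps S61 to S63, the following Python sketch demuxes the video track of each mixed-stream segment, splices the video segments, and muxes the video synthesis data with the audio synthesis data. It assumes the ffmpeg command-line tool is available and that the segments are stored in a container such as MP4; neither assumption comes from the disclosure, and the final audio codec (AAC) is illustrative only.

```python
import subprocess
from pathlib import Path
from typing import List

def extract_video_tracks(av_segments: List[Path], out_dir: Path) -> List[Path]:
    # Step S61: demux the video track of each segment (-an drops the audio track).
    out_dir.mkdir(parents=True, exist_ok=True)
    video_segments = []
    for segment in av_segments:
        out = out_dir / (segment.stem + "_video.mp4")
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(segment), "-an", "-vcodec", "copy", str(out)],
            check=True,
        )
        video_segments.append(out)
    return video_segments

def splice_video(video_segments: List[Path], merged: Path) -> None:
    # Step S62: splice the video segments with ffmpeg's concat demuxer.
    listing = merged.with_suffix(".txt")
    listing.write_text("".join(f"file '{p.resolve()}'\n" for p in video_segments))
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(listing),
         "-c", "copy", str(merged)],
        check=True,
    )

def mux_audio_video(video: Path, audio: Path, out: Path) -> None:
    # Step S63: combine the video synthesis data and the audio synthesis data.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(video), "-i", str(audio),
         "-c:v", "copy", "-c:a", "aac", str(out)],
        check=True,
    )
```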
  • Fig. 7 is a sub-flow chart of step S211 in an exemplary embodiment of the present disclosure.
  • step S211 may include:
  • Step S2111 Receive an editing instruction sent by the target object, where the editing instruction includes editing object information.
  • Step S2112 Obtain multiple audio and video mixed stream data segments corresponding to the editing object information from the audio and video equipment.
  • the multiple audio and video mixed stream data segments are recorded by the audio and video equipment and stored in segments.
  • the target object may be, for example, an operation object of the execution device of the audio data processing method of the embodiment of the present disclosure.
  • when the execution device is the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102, and the terminal device 101 or 102 may generate the editing instruction according to a preset operation of the user.
  • when the execution device is the server 105, the target object may still be the operating user of the terminal device 101 or 102, and the terminal device 101 or 102 can generate the editing instruction according to the preset operation and send it to the server 105 via the network 104.
  • the editing object information is used to determine the multiple audio and video mixed stream data segments, and may be identification information, storage address information, etc. of the multiple audio and video mixed stream data segments. For example, if the audio and video equipment records and saves multiple audio and video mixed stream data segments z1(t), z2(t), z3(t) ... whose identification information is z, the editing object information can include the identification information z. For another example, if the storage address of the multiple audio and video mixed stream data segments z1(t), z2(t), z3(t) ... is C:\download, the editing object information may include the storage address information C:\download. A sketch of resolving such editing object information to segment files is given below.
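A minimal sketch of resolving the editing object information to local files might look as follows; the naming convention (segments stored as "<identifier><index>.mp4" under the storage address) and the use of MP4 files are illustrative assumptions, not details given by the disclosure.

```python
from pathlib import Path
from typing import List

def resolve_segments(storage_dir: Path, identifier: str) -> List[Path]:
    # Collect the mixed-stream segments named "<identifier><index>.mp4"
    # (e.g. z1.mp4, z2.mp4, z3.mp4 ...) under the storage address carried by
    # the editing object information, ordered by their index.
    return sorted(
        storage_dir.glob(f"{identifier}*.mp4"),
        key=lambda p: int(p.stem[len(identifier):] or 0),
    )

# Example: resolve_segments(Path(r"C:\download"), "z")
```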
  • the audio and video equipment may be, for example, but not limited to, an unmanned aerial vehicle (equipped with a camera), a video recorder, and the like.
  • the execution device of the audio data processing method of the embodiment of the present disclosure may communicate with the audio and video device to obtain multiple audio and video mixed stream data segments corresponding to the editing object information through the communication interface.
  • for step S2112, when the audio and video equipment records and saves the audio and video in segments, the recorded audio and video can be segmented according to the segment period to obtain the multiple audio and video mixed stream data segments.
  • the audio data processing method may further include: performing an editing operation on the audio and video synthesized data in response to the editing instruction to obtain the edited audio and video synthesized data.
  • the editing instructions are used to perform audio and video editing operations on the audio and video synthesized data.
  • the editing operation indicated by the editing instruction may be the audio data processing method of the embodiment of the present disclosure, for example splicing.
  • when the editing instruction is a splicing instruction, the terminal device obtains, from the audio and video equipment, the multiple audio and video mixed stream data segments corresponding to the editing object information input by the user.
  • the audio data processing method of the embodiment of the present disclosure can then be executed on the multiple audio and video mixed stream data segments according to the editing instruction to obtain the spliced audio and video synthesis data.
  • the audio and video synthesis data can also be processed according to other editing operations input by the user, such as color correction, cutting, speed change, reverse playback, copying, and so on.
  • for example, the audio and video synthesis data can be color-corrected, cut, speed-changed, reversed, or copied according to the other editing instructions to obtain the edited audio and video synthesis data.
  • Fig. 8 is a sub-flow chart of step S211 in an exemplary embodiment of the present disclosure.
  • step S211 may include:
  • Step S81 receiving a download instruction sent by the target object.
  • Step S82 Download multiple audio and video mixed stream data segments corresponding to the download instruction from the audio and video equipment and save them locally.
  • the target object may be, for example, an operation object of the execution device of the audio data processing method of the embodiment of the present disclosure.
  • when the execution device of the audio data processing method of the embodiment of the present disclosure is the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102, and the terminal device 101 or 102 may generate the download instruction according to a preset operation of the user.
  • when the execution device is the server 105, the target object may still be the operating user of the terminal device 101 or 102, and the terminal device 101 or 102 may generate the download instruction according to the preset operation and send it to the server 105 via the network 104.
  • multiple audio and video mixed stream data segments may be obtained by downloading after communicating with audio and video equipment, which is not specifically limited in the present disclosure.
  • the audio and video equipment includes, but is not limited to, an unmanned aerial vehicle (equipped with a camera), a video recorder, and the like.
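For step S82, one possible sketch of downloading the segments and saving them locally is shown below; it assumes the device exposes the segment files over HTTP and that the requests library is available, which the disclosure does not specify.

```python
import requests
from pathlib import Path
from typing import List

def download_segments(segment_urls: List[str], local_dir: Path) -> List[Path]:
    # Step S82: fetch each mixed-stream segment advertised by the audio and
    # video device and save it locally, preserving the file names.
    local_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in segment_urls:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        path = local_dir / url.rsplit("/", 1)[-1]
        path.write_bytes(response.content)
        saved.append(path)
    return saved
```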
  • the format of the multiple audio encoding segments is one of the AMR audio encoding format, the AAC audio encoding format, and the OPUS audio encoding format.
  • an editing instruction sent by the target object may also be received, and the editing instruction includes editing object information.
  • an editing operation is performed on the audio and video synthesized data corresponding to the editing object information to obtain the edited audio and video synthesized data.
  • Editing operations can be, for example, but not limited to, splicing, toning, cutting, shifting, rewinding, copying, etc.
  • Fig. 10 is an effect comparison diagram of an audio data processing method according to another exemplary embodiment of the present disclosure.
  • the original audio data is 16 kHz, 16-bit, single-channel data, the frame step is 20 ms, and the mode 8 configuration is adopted; that is, every 20 ms of original data is encoded into 61 bytes of binary data.
  • Fig. 10(a) is a time-frequency diagram of original audio data of an exemplary embodiment of the present disclosure.
  • the original audio data is single-frequency data with a duration of 1s, corresponding to 50 frames.
  • the audio synthesis data 101, obtained by independently decoding the audio coding segment amr1 and the audio coding segment amr2 into PCM (pulse-code modulation) data and then combining the decoded data, is shown in Fig. 10(b).
  • the audio coding segment amr1 and the audio coding segment amr2 are spliced according to the audio data processing method of the embodiment of the present disclosure and then decoded to obtain the audio synthesis data 102 shown in Fig. 10(c). Comparing Figs. 10(b) and 10(c), it can be seen that the audio synthesis data obtained by splicing and then decoding, shown in Fig. 10(c), is closer to the original audio data, while the audio synthesis data obtained by decoding separately and then splicing, shown in Fig. 10(b), exhibits severe distortion by comparison.
  • this is because the AMR codec uses historical information: the encoding and decoding of the current frame uses data of the previous frame or even earlier frames. Because the audio is stored in segments and the device decodes the audio of each segment separately, the data of the previous segment is not available when the first audio data of the next segment is decoded, so decoding errors occur at the beginning of each segment.
  • in the embodiments of the present disclosure, the audio segments are spliced first and decoded after splicing, which avoids the error problem of segment-by-segment decoding and makes decoding more reliable; a file-level sketch of such splicing for AMR-WB data is given below.
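The following sketch illustrates the splice-then-decode idea at the file level for AMR-WB segments such as amr1 and amr2. It assumes the segments are stored in the AMR file storage format of RFC 4867, whose single-channel wideband magic header is "#!AMR-WB\n"; the header is kept once and only frame data is appended, so a standard AMR-WB decoder can then decode the spliced file in one pass.

```python
from pathlib import Path
from typing import List

AMR_WB_MAGIC = b"#!AMR-WB\n"  # single-channel AMR-WB file header (RFC 4867)

def splice_amr_wb_files(segment_files: List[Path], out_file: Path) -> None:
    # Write the magic header once, then append only the frame data of every
    # segment, so the result is one continuous AMR-WB file that a decoder can
    # process in a single pass with its inter-frame history intact.
    with out_file.open("wb") as out:
        out.write(AMR_WB_MAGIC)
        for path in segment_files:
            data = path.read_bytes()
            if data.startswith(AMR_WB_MAGIC):
                data = data[len(AMR_WB_MAGIC):]
            out.write(data)

# Example: splice_amr_wb_files([Path("amr1.awb"), Path("amr2.awb")], Path("spliced.awb"))
```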
  • Electronic devices may include, but are not limited to, smart phones, tablet computers, portable computers, desktop computers, wearable devices, virtual reality devices, smart homes, etc., for example.
  • FIG. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • the electronic device 1100 may include: a processor 1110; and a memory 1120 configured to store executable instructions of the processor 1110.
  • the processor 1110 is configured to execute an audio data processing method by executing the executable instructions, where the audio data processing method includes: acquiring multiple audio coding segments, which are recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  • splicing the multiple audio coding segments to obtain the spliced data includes: sorting the multiple audio coding segments according to arrangement information of the multiple audio coding segments; and splicing the multiple audio coding segments end to end according to the sorting result to obtain the spliced data.
  • when the arrangement information of the multiple audio coding segments is recording time information, sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments includes: sorting the multiple audio coding segments in chronological order according to the recording time information of the multiple audio coding segments.
  • when the arrangement information of the multiple audio coding segments is number information, sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments includes: sorting the multiple audio coding segments in number order according to the number information of the multiple audio coding segments.
  • obtaining multiple audio coding segments includes: obtaining multiple audio and video mixed stream data segments; and respectively performing audio extraction on the multiple audio and video mixed stream data segments to obtain the multiple audio coding segments.
  • splicing the multiple audio coding segments to obtain the spliced data includes: determining arrangement information of the multiple audio coding segments according to arrangement information of the multiple audio and video mixed stream data segments; sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments; and splicing the multiple audio coding segments end to end according to the sorting result to obtain the spliced data.
  • the arrangement information of the multiple audio and video mixed stream data segments is one of recording time information and number information.
  • the audio data processing method further includes: respectively performing video extraction on the multiple audio and video mixed stream data segments to obtain multiple video coding segments; generating video synthesis data according to the multiple video coding segments; and generating audio and video synthesis data according to the audio synthesis data and the video synthesis data.
  • acquiring the multiple audio and video mixed stream data segments includes: receiving, through a user interface, an editing instruction sent by a target object, where the editing instruction includes editing object information; and obtaining, through a communication interface, the multiple audio and video mixed stream data segments corresponding to the editing object information from the audio and video equipment, where the multiple audio and video mixed stream data segments are recorded by the audio and video equipment and saved in segments.
  • the user interface may be, for example, but not limited to, a touch screen, a physical button, a microphone, and so on.
  • the audio data processing method further includes: performing an editing operation on the audio and video synthesized data in response to the editing instruction to obtain the edited audio and video synthesized data.
  • acquiring the multiple audio and video mixed stream data segments includes: pre-downloading the multiple audio and video mixed stream data segments from the audio and video device and saving them locally.
  • the format of the plurality of audio encoding segments is one of an AMR audio encoding format, an AAC audio encoding format, and an OPUS audio encoding format.
  • the processor obtains the spliced data by splicing the multiple audio coding segments and decodes the spliced data, which avoids the loss of decoded data that occurs when the multiple audio coding segments are decoded separately because the decoding algorithm depends on the characteristics of the preceding and following frame data, and thereby avoids audio distortion at the splice points.
  • although modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • the example embodiments described here can be implemented by software, or by combining software with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present disclosure.
  • a computer-readable storage medium on which is stored a program product capable of implementing the above-mentioned method of this specification.
  • various aspects of the present invention may also be implemented in the form of a program product, which includes program code; when the program product runs on a terminal device, the program code is used to enable the terminal device to execute the steps according to the various exemplary embodiments of the present invention described in the above "Exemplary Method" section of this specification.
  • the disclosed electronic device, computer-readable storage medium, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present disclosure.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of the present disclosure, in essence or in the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

An audio data processing method, an electronic device and a computer-readable storage medium. The audio data processing method comprises: acquiring a plurality of audio coding segments, the plurality of audio coding segments being recorded and stored in segments by an audio and video device (S21); splicing the plurality of audio coding segments to obtain spliced data (S22); and decoding the spliced data to obtain audio synthesis data of the plurality of audio coding segments (S23). By splicing the plurality of audio coding segments to obtain the spliced data and decoding the spliced data, the method can avoid the loss of decoded data that occurs when the plurality of audio coding segments are decoded separately, because the decoding algorithm depends on the characteristics of the preceding and following frame data, and can thus avoid audio distortion caused by discontinuity of the obtained audio synthesis data at the splicing positions.

Description

Audio data processing method, electronic device and computer-readable storage medium

Technical Field

The present disclosure relates to the field of computer technology, and more specifically, to an audio data processing method, an electronic device, and a computer-readable storage medium.

Background

During audio and video recording, in order to reduce video file damage caused by factors such as recording equipment failure, the recorded audio and video files are usually saved in segments. Segmented storage ensures that, even if an audio and video file is damaged during recording, only one segment is damaged rather than all of the audio and video files.

When segmented audio and video files are used, for example during editing or playback, they need to be synthesized to obtain a complete audio and video file. In this synthesis process, the audio files and the video files need to be synthesized separately. During synthesis of the audio files, the audio decoding algorithm can introduce discontinuities into the synthesized audio file, which makes the synthesized audio stutter or sound abnormal.

Therefore, how to synthesize segmented audio files into a complete and lossless audio file is a problem that urgently needs to be solved.
Summary

According to a first aspect of the present disclosure, an audio data processing method is provided, including: acquiring multiple audio coding segments, where the multiple audio coding segments are recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.

In the embodiments of the present disclosure, by splicing the multiple audio coding segments into spliced data and decoding the spliced data, the loss of decoded data that occurs when the segments are decoded separately, because the decoding algorithm depends on the characteristics of the preceding and following frame data, is avoided, which in turn avoids the audio distortion caused by discontinuities of the obtained audio synthesis data at the splice points.

According to a second aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor, where the processor is configured to execute the executable instructions to perform an audio data processing method. The audio data processing method includes: acquiring multiple audio coding segments, where the multiple audio coding segments are recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where, when the computer program is executed by a processor, the audio data processing method described in the first aspect of the embodiments of the present disclosure is implemented.
Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.

Fig. 1 is an architecture diagram of an audio data processing system according to an exemplary embodiment of the present disclosure;

Fig. 2 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure;

Fig. 3 is a sub-flowchart of step S22 in an exemplary embodiment of the present disclosure;

Fig. 4 is a sub-flowchart of step S21 in an exemplary embodiment of the present disclosure;

Fig. 5 is a sub-flowchart of step S22 in an exemplary embodiment of the present disclosure;

Fig. 6 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure;

Fig. 7 is a sub-flowchart of step S211 in an exemplary embodiment of the present disclosure;

Fig. 8 is a sub-flowchart of step S211 in an exemplary embodiment of the present disclosure;

Fig. 9 is a schematic diagram of an audio data processing method according to an exemplary embodiment of the present disclosure;

Fig. 10 is an effect comparison diagram of an audio data processing method according to another exemplary embodiment of the present disclosure;

Fig. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures or characteristics can be combined in one or more embodiments in any suitable way. In the following description, many specific details are provided to give a sufficient understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. can be used. In other cases, well-known technical solutions are not shown or described in detail in order to avoid obscuring aspects of the present disclosure.

In addition, the drawings are only schematic illustrations of the present disclosure, and the same reference numerals in the drawings denote the same or similar parts, so their repeated description will be omitted. Some of the blocks shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 is an architecture diagram of an audio data processing system according to an exemplary embodiment of the present disclosure.

As shown in Fig. 1, the terminal device 101 and the terminal device 102 are connected to the server 105 through a network 104, and the audio and video device 103 is connected to the terminal device 101 and the terminal device 102 through the network 104. The terminal devices 101 and 102 may be, for example but not limited to, mobile phones, computers, tablet computers, handheld terminals, and the like. The server 105 may be a server that provides various services, for example, a background management server that provides support for the audio data processing system operated by the user using the terminal devices 101 and 102 (just an example). The background server can analyze and otherwise process the received data, such as multiple audio coding segments, and feed back the processing result (for example, audio synthesis data, just an example) to the terminal device.

The terminal device 101 (or the terminal device 102) may, for example, obtain multiple audio coding segments, which are recorded by the audio and video device 103 and saved in segments; the terminal device 101 may, for example, splice the multiple audio coding segments to obtain spliced data; and the terminal device 101 may, for example, decode the spliced data to obtain audio synthesis data of the multiple audio coding segments.

The terminal device 101 (or the terminal device 102) can obtain multiple audio and video mixed stream data segments from the audio and video device 103, and perform audio extraction on the multiple audio and video mixed stream data segments respectively to obtain the multiple audio coding segments.

The terminal device 101 (or the terminal device 102) can receive an editing instruction sent by a target object, where the editing instruction includes editing object information, and acquire, from the audio and video device 103, multiple audio and video mixed stream data segments corresponding to the editing object information; the multiple audio and video mixed stream data segments are recorded by the audio and video device 103 and saved in segments.

The terminal device 101 (or the terminal device 102) can receive a download instruction sent by the target object, download the multiple audio and video mixed stream data segments corresponding to the download instruction from the audio and video device 103, and save them locally.

The server 105 may be a single physical server or may be composed of multiple servers. A part of the server 105 may, for example, serve as the audio data processing task submission system of the present disclosure, for obtaining tasks that will execute audio data processing commands; another part of the server 105 may, for example, serve as the audio data processing system of the present disclosure, for obtaining multiple audio coding segments that are recorded by the audio and video device and saved in segments, splicing the multiple audio coding segments to obtain spliced data, and decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.

The terminal device 101/terminal device 102 and the audio and video device 103 can transmit data through wireless transmission, such as WiFi, Bluetooth, ZigBee, and so on. The terminal device 101/terminal device 102 and the server 105 can communicate with each other through 4G, 5G, WiFi, or the Internet.
Fig. 2 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure. The audio data processing method provided by the embodiments of the present disclosure can be executed by any electronic device with computing and processing capabilities, such as the terminal devices 101 and 102 and/or the server 105. As shown in Fig. 2, the audio data processing method 20 provided by the embodiment of the present disclosure may include:

Step S21: obtain multiple audio coding segments, where the multiple audio coding segments are recorded by the audio and video device and saved in segments.

Step S22: splice the multiple audio coding segments to obtain spliced data.

Step S23: decode the spliced data to obtain audio synthesis data of the multiple audio coding segments.

In the embodiments of the present disclosure, the multiple audio coding segments can be obtained by downloading them after communicating with the audio and video device. The process of communicating with the audio and video device and downloading can be triggered by a local designated command, by a clock cycle, or by a download instruction actively sent by the audio and video device, which is not specifically limited in the present disclosure. The audio and video device may be, for example but not limited to, a video recording device, an unmanned aerial vehicle (equipped with a camera), and the like. The encoding format of the audio coding segments may be one of the Adaptive Multi-Rate (AMR) format, the Advanced Audio Coding (AAC) format, the OPUS format, and the like, which is not specifically limited in the present disclosure.

In step S22, the splicing method for splicing the multiple audio coding segments may be end-to-end splicing.

In step S23, the decoding mode may be an audio decoding mode, where the audio decoding mode corresponding to the audio coding format can be selected according to the audio coding format.

In the audio data processing method of the embodiment of the present disclosure, by splicing the multiple audio coding segments to obtain spliced data and decoding the spliced data, the loss of decoded data that occurs when the multiple audio coding segments are decoded separately, because the decoding algorithm depends on the characteristics of the preceding and following frame data, is avoided, which in turn avoids the audio distortion caused by discontinuities of the obtained audio synthesis data at the splice points.
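To make the difference between the two approaches concrete, the following Python sketch contrasts per-segment decoding with splice-then-decode; decode_audio stands in for whatever decoder matches the segment format (AMR, AAC or OPUS) and is a placeholder assumption, not an API defined by the present disclosure.

```python
from typing import Callable, List

def decode_separately(segments: List[bytes],
                      decode_audio: Callable[[bytes], bytes]) -> bytes:
    # Each encoded segment is decoded on its own, so the decoder has no
    # history from the previous segment; frames near each segment boundary
    # may be reconstructed incorrectly and the joined PCM can be
    # discontinuous at the former boundaries.
    return b"".join(decode_audio(segment) for segment in segments)

def splice_then_decode(segments: List[bytes],
                       decode_audio: Callable[[bytes], bytes]) -> bytes:
    # Steps S22/S23: splice the encoded segments end to end first, then
    # decode once, so the decoder keeps its inter-frame state across the
    # former segment boundaries.
    spliced = b"".join(segments)
    return decode_audio(spliced)
```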
Fig. 3 is a sub-flowchart of step S22 in an exemplary embodiment of the present disclosure.

As shown in Fig. 3, in one embodiment, step S22 may include:

Step S221: sort the multiple audio coding segments according to arrangement information of the multiple audio coding segments.

Step S222: splice the multiple audio coding segments end to end according to the sorting result to obtain the spliced data.

In the embodiments of the present disclosure, the arrangement information of the multiple audio coding segments may be one of recording time information and number information of the multiple audio coding segments. The recording time information of an audio coding segment refers to the recording time of the start position, the end position, or a designated middle position of the audio coding segment during recording. The number information of an audio coding segment refers to the number added to the audio coding segment by the recording device according to the recording order during recording.

In an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is recording time information, and step S221 may include: sorting the multiple audio coding segments in chronological order according to the recording time information of the multiple audio coding segments.

In an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is number information, and step S221 may include: sorting the multiple audio coding segments in number order according to the number information of the multiple audio coding segments.
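A minimal sketch of steps S221 and S222 might look as follows; the EncodedSegment record and its fields are illustrative assumptions for carrying the arrangement information, not structures defined by the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EncodedSegment:
    data: bytes                          # encoded audio payload
    number: Optional[int] = None         # number assigned in recording order
    record_time: Optional[float] = None  # recording time of the start position

def sort_segments(segments: List[EncodedSegment]) -> List[EncodedSegment]:
    # Step S221: prefer number information when every segment carries it,
    # otherwise fall back to recording time information.
    if all(seg.number is not None for seg in segments):
        return sorted(segments, key=lambda seg: seg.number)
    return sorted(segments, key=lambda seg: seg.record_time)

def splice_segments(segments: List[EncodedSegment]) -> bytes:
    # Step S222: splice the sorted segments end to end to obtain spliced data.
    return b"".join(seg.data for seg in sort_segments(segments))
```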
图4是本公开的一个示例性实施例中步骤S21的子流程图。Fig. 4 is a sub-flow chart of step S21 in an exemplary embodiment of the present disclosure.
如图4所示,在一个实施例中,步骤S21可以包括:As shown in Fig. 4, in one embodiment, step S21 may include:
步骤S211,获取多个音视频混流数据段。Step S211: Obtain multiple audio and video mixed stream data segments.
步骤S212,分别对多个音视频混流数据段进行音频提取,获得多个音频编码段。Step S212: Perform audio extraction on multiple audio and video mixed stream data segments respectively to obtain multiple audio coded segments.
In the embodiments of the present disclosure, the multiple audio and video mixed stream data segments may be recorded by an audio and video device and saved in segments. For example, for audio and video data that occupies a large amount of storage space, the data may be segmented during recording to obtain multiple audio and video mixed stream data segments, so as to reduce damage to the video file caused by factors such as recording device failure. When processing the audio and video mixed stream data segments, it is usually necessary to separate the video track and the audio track in each segment so that the video track and the audio track can be decoded and spliced separately. The multiple audio and video mixed stream data segments may be downloaded after communicating with the audio and video device; the communication and download process may be triggered by a locally issued command, by a clock cycle, or by a download instruction actively sent by the audio and video device, which is not specifically limited in the present disclosure. The audio and video device may be, for example but not limited to, a video recording device, an unmanned aerial vehicle (equipped with a camera), or the like.
In step S212, after audio extraction is performed on each audio and video mixed stream data segment, the audio coding segment of that segment is obtained. The multiple audio coding segments may be denoted, for example, x1(t), x2(t), x3(t), and so on, where 0&lt;t&lt;T and T is the segmentation period.
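As one possible sketch of step S212 (assuming ffmpeg is available and the mixed stream segments are MP4 files carrying an AAC audio track; the file names are hypothetical and the choice of tools is not limited by the present disclosure), the audio track may be extracted without re-encoding:

```python
import subprocess

def extract_audio(segment_path: str, out_path: str) -> str:
    # -vn drops the video track, -c:a copy keeps the audio track without re-encoding.
    subprocess.run(
        ["ffmpeg", "-y", "-i", segment_path, "-vn", "-c:a", "copy", out_path],
        check=True,
    )
    return out_path

# e.g. audio coding segments x1(t), x2(t), x3(t) extracted from three MP4 segments
audio_segments = [extract_audio(f"seg_{i}.mp4", f"seg_{i}.aac") for i in range(1, 4)]
```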
Fig. 9 is a schematic diagram of an audio data processing method according to an exemplary embodiment of the present disclosure. As shown in Fig. 9, in steps S22 and S23 of this embodiment, the multiple audio coding segments x1(t), x2(t), x3(t), etc. are spliced to obtain spliced data x(t), and the spliced data x(t) is decoded to obtain the audio synthesis data y(t) of the multiple audio coding segments.
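A minimal sketch of this splice-then-decode flow is shown below. It assumes the spliced file is a self-describing stream (for example ADTS AAC, or an AMR file once the magic header of every segment except the first has been stripped before concatenation) and that ffmpeg is available for decoding; these are illustration assumptions, not requirements of the present disclosure.

```python
import subprocess

def splice(segment_paths, spliced_path):
    # Step S22: head-to-tail concatenation of the encoded segments x1(t), x2(t), ...
    with open(spliced_path, "wb") as out:
        for path in segment_paths:
            with open(path, "rb") as f:
                out.write(f.read())
    return spliced_path

def decode_to_pcm(spliced_path, pcm_path):
    # Step S23: decode the spliced data x(t) once to obtain the synthesis data y(t)
    # as 16 kHz, 16-bit mono PCM.
    subprocess.run(
        ["ffmpeg", "-y", "-i", spliced_path, "-f", "s16le", "-ar", "16000", "-ac", "1", pcm_path],
        check=True,
    )
    return pcm_path
```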
Fig. 5 is a sub-flowchart of step S22 in an exemplary embodiment of the present disclosure.
As shown in Fig. 5, in one embodiment, step S22 may include:
Step S51: determining the arrangement information of the multiple audio coding segments according to the arrangement information of the multiple audio and video mixed stream data segments.
Step S52: sorting the multiple audio coding segments according to their arrangement information.
Step S53: splicing the multiple audio coding segments head to tail according to the sorting result to obtain the spliced data.
In some embodiments of the present disclosure, the arrangement information of the multiple audio and video mixed stream data segments is one of recording time information and number information.
The recording time information of an audio and video mixed stream data segment refers to the recording time, during the recording process, of the start position, the end position, or a designated middle position of that segment. The number information of an audio and video mixed stream data segment refers to the number assigned to that segment by the recording device in recording order during the recording process.
In step S51, the arrangement information of each audio and video mixed stream data segment may be used as the arrangement information of the audio coding segment obtained by audio extraction from that segment. For example, if audio extraction is performed on an audio and video mixed stream data segment A to obtain an audio coding segment a, the arrangement information of segment A is used as the arrangement information of the audio coding segment a.
In step S52, when the arrangement information is recording time information, the multiple audio and video mixed stream data segments may be sorted in chronological order according to their recording time information; when the arrangement information is number information, the multiple audio and video mixed stream data segments may be sorted in number order according to their number information.
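The following is an illustrative sketch of steps S51 and S52 only (the dictionary keys are assumptions introduced for illustration): each extracted audio coding segment inherits the arrangement information of the mixed stream segment it came from, and the list is then sorted on that inherited field.

```python
def inherit_and_sort(mixed_segments, audio_paths, by="record_time"):
    # Step S51: each extracted audio coding segment inherits the arrangement
    # information of the mixed-stream segment it came from.
    paired = [
        {"audio": audio, "record_time": mixed["record_time"], "index": mixed["index"]}
        for mixed, audio in zip(mixed_segments, audio_paths)
    ]
    # Step S52: sort on the inherited recording time or number information.
    return sorted(paired, key=lambda p: p[by])
```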
Fig. 6 is a flowchart of an audio data processing method according to an exemplary embodiment of the present disclosure.
As shown in Fig. 6, the audio data processing method based on the foregoing embodiments may further include:
Step S61: performing video extraction on the multiple audio and video mixed stream data segments respectively to obtain multiple video coding segments.
Step S62: generating video synthesis data according to the multiple video coding segments.
Step S63: generating audio and video synthesis data according to the audio synthesis data and the video synthesis data.
In the embodiments of the present disclosure, in step S62, the multiple video coding segments may be spliced to generate the video synthesis data.
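A sketch of the video path and the final muxing step is given below, again assuming ffmpeg and MP4 segments for illustration. The sketch extracts each video track without re-encoding and combines a concatenated video file with the audio synthesis data; the concatenation of the video segments (step S62) is assumed to have been done beforehand and is not shown.

```python
import subprocess

def extract_video(segment_path, out_path):
    # Step S61: -an drops the audio track, -c:v copy keeps the video track unchanged.
    subprocess.run(["ffmpeg", "-y", "-i", segment_path, "-an", "-c:v", "copy", out_path], check=True)
    return out_path

def mux(video_path, audio_path, out_path):
    # Step S63: combine the video synthesis data with the audio synthesis data.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-i", audio_path, "-c:v", "copy", "-c:a", "aac", out_path],
        check=True,
    )
    return out_path
```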
Fig. 7 is a sub-flowchart of step S211 in an exemplary embodiment of the present disclosure.
As shown in Fig. 7, in one embodiment, step S211 may include:
Step S2111: receiving an editing instruction sent by a target object, the editing instruction including editing object information.
Step S2112: obtaining, from the audio and video device, the multiple audio and video mixed stream data segments corresponding to the editing object information, the multiple audio and video mixed stream data segments being recorded by the audio and video device and saved in segments.
In the embodiments of the present disclosure, in step S2111, the target object may be, for example, an operator of the device that executes the audio data processing method of the embodiments of the present disclosure. For example, when the executing device is the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102; when that user performs a preset operation on the terminal device 101 or 102, the terminal device 101 or 102 may generate an editing instruction according to the preset operation. For another example, when the executing device is the server 105, and the server 105 is a background management server that provides support for the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102; when that user performs a preset operation on the terminal device 101 or 102, the terminal device 101 or 102 may generate an editing instruction according to the preset operation and send it to the server 105 via the network 104.
The editing object information is used to determine the multiple audio and video mixed stream data segments, and may be, for example, identification information or storage address information of the multiple audio and video mixed stream data segments. For example, if the identification information of the multiple audio and video mixed stream data segments z1(t), z2(t), z3(t), ... recorded and saved in segments by the audio and video device is z, the editing object information may include the identification information z. For another example, if the storage address information of the multiple audio and video mixed stream data segments z1(t), z2(t), z3(t), ... is C:\download, the editing object information may include the storage address information C:\download.
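As a sketch only, resolving the editing object information to concrete segment files might look as follows; the dictionary keys, file extension, and glob pattern are assumptions for illustration and do not limit how the information is actually encoded.

```python
import glob
import os

def resolve_segments(edit_object_info):
    # Editing object information may carry a storage address (e.g. C:\download)
    # or an identifier (e.g. "z" for z1, z2, z3, ...).
    if "storage_path" in edit_object_info:
        pattern = os.path.join(edit_object_info["storage_path"], "*.mp4")
    else:
        pattern = f"{edit_object_info['identifier']}*.mp4"
    return sorted(glob.glob(pattern))
```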
In step S2112, the audio and video device may be, for example but not limited to, an unmanned aerial vehicle (equipped with a camera), a video recorder, or the like. The device executing the audio data processing method of the embodiments of the present disclosure may communicate with the audio and video device and obtain, through a communication interface, the multiple audio and video mixed stream data segments corresponding to the editing object information.
In step S2112, when the audio and video device records and saves the data in segments to obtain the multiple audio and video mixed stream data segments, the recorded audio and video may be segmented according to a segmentation period.
In an exemplary embodiment of the present disclosure, the audio data processing method may further include: performing an editing operation on the audio and video synthesis data in response to the editing instruction to obtain edited audio and video synthesis data. The editing instruction is used to perform audio and video editing operations on the audio and video synthesis data, and the editing operation included in the editing instruction may be the audio data processing method of the embodiments of the present disclosure. For example, if the editing instruction is a splicing instruction, the terminal device obtains, from the audio and video device according to the splicing instruction input by the user, the audio and video mixed stream data segments corresponding to the editing object information input by the user; after obtaining the segments corresponding to the editing object information, it may execute the audio data processing method of the embodiments of the present disclosure on the multiple segments according to the editing instruction to obtain the spliced audio and video synthesis data. After the splicing processing of the present application is completed, the audio and video synthesis data may also be processed according to other editing operations input by the user, such as color grading, cutting, speed change, reverse playback, and copying. For example, after the multiple audio and video mixed stream data segments are spliced, operations such as color grading, cutting, speed change, reverse playback, and copying may be performed on the audio and video synthesis data according to other editing instructions to obtain the edited audio and video synthesis data, as illustrated in the sketch below.
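The sketch below only shows how such editing operations could be dispatched after splicing; the handlers are placeholders introduced for illustration and are not an actual editing implementation of the present disclosure.

```python
def apply_edits(av_data, operations):
    # Placeholder handlers for the editing operations mentioned above; a real
    # implementation would transform the audio and video synthesis data.
    handlers = {
        "grade":   lambda data, args: data,  # color grading
        "cut":     lambda data, args: data,  # cutting
        "speed":   lambda data, args: data,  # speed change
        "reverse": lambda data, args: data,  # reverse playback
        "copy":    lambda data, args: data,  # duplication
    }
    for name, args in operations:
        av_data = handlers[name](av_data, args)
    return av_data
```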
Fig. 8 is a sub-flowchart of step S211 in an exemplary embodiment of the present disclosure.
As shown in Fig. 8, in one embodiment, step S211 may include:
Step S81: receiving a download instruction sent by a target object.
Step S82: downloading, from the audio and video device, the multiple audio and video mixed stream data segments corresponding to the download instruction and saving them locally.
In the embodiments of the present disclosure, in step S81, the target object may be, for example, an operator of the device that executes the audio data processing method of the embodiments of the present disclosure. For example, when the executing device is the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102; when that user performs a preset operation on the terminal device 101 or 102, the terminal device 101 or 102 may generate a download instruction according to the preset operation. For another example, when the executing device is the server 105, and the server 105 is a background management server that provides support for the terminal device 101 or 102, the target object may be the operating user of the terminal device 101 or 102; when that user performs a preset operation on the terminal device 101 or 102, the terminal device 101 or 102 may generate a download instruction according to the preset operation and send it to the server 105 via the network 104. The multiple audio and video mixed stream data segments may be downloaded after communicating with the audio and video device, which is not specifically limited in the present disclosure. In the embodiments of the present disclosure, in step S81, the audio and video device includes, but is not limited to, an unmanned aerial vehicle (equipped with a camera), a video recorder, or the like.
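A sketch of steps S81–S82 is given below; the `device_client` object, its `list_segments` and `download` methods, and the local directory are hypothetical placeholders, since the actual transport depends on the audio and video device.

```python
import os

def handle_download(device_client, download_instruction, local_dir="segments"):
    # Steps S81–S82: fetch every mixed-stream segment named by the download
    # instruction from the audio and video device and save it locally.
    os.makedirs(local_dir, exist_ok=True)
    local_paths = []
    for name in device_client.list_segments(download_instruction):
        dst = os.path.join(local_dir, name)
        device_client.download(name, dst)
        local_paths.append(dst)
    return local_paths
```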
In an exemplary embodiment of the present disclosure, the format of the multiple audio coding segments is one of the AMR audio coding format, the AAC audio coding format, and the OPUS audio coding format.
In an exemplary embodiment of the present disclosure, an editing instruction sent by the target object may also be received, the editing instruction including editing object information. In response to the editing instruction, an editing operation is performed on the audio and video synthesis data corresponding to the editing object information to obtain edited audio and video synthesis data. The editing operation may be, for example but not limited to, splicing, color grading, cutting, speed change, reverse playback, copying, and the like.
Fig. 10 is an effect comparison diagram of an audio data processing method according to another exemplary embodiment of the present disclosure.
As shown in Fig. 10, in one embodiment, taking audio coding segments in the AMR audio coding format as an example, the original audio data is 16 kHz, 16-bit single-channel data, encoded with a step size of 20 ms in mode 8; that is, every 20 ms of original data is encoded once into a frame of 61 bytes. Fig. 10(a) is a time-frequency diagram of the original audio data in an exemplary embodiment of the present disclosure; the original audio data is single-frequency data with a duration of 1 s, corresponding to 50 frames. After the original audio data is encoded into the AMR audio coding format, it is recorded as amr0; the encoded data of the first 0.5 s and the encoded data of the last 0.5 s are split apart and recorded as audio coding segment amr1 and audio coding segment amr2, respectively. The audio synthesis data 101, obtained by decoding amr1 and amr2 independently and merging the decoded pulse-code modulation (PCM) data, is shown in Fig. 10(b). The audio synthesis data 102, obtained by splicing amr1 and amr2 according to the audio data processing method of the embodiments of the present disclosure and then decoding, is shown in Fig. 10(c). Comparing Fig. 10(b) and Fig. 10(c), the audio synthesis data obtained by splicing first and then decoding, shown in Fig. 10(c), is closer to the original audio data, whereas the audio synthesis data obtained by decoding first and then splicing, shown in Fig. 10(b), is severely distorted. This is because the AMR codec relies on historical information: encoding or decoding the current frame uses data from the previous frame, or even earlier. Since the audio is stored in segments and the device decodes each segment separately, the last frame of the preceding segment is not available when the first audio data of a later segment is decoded, so a decoding error occurs at the start of each later segment. With the embodiments of the present disclosure, the audio segments are spliced first and only then decoded, so the error introduced by segment-wise decoding does not occur and decoding is more reliable.
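The frame figures quoted above are consistent with AMR-WB mode 8 at 23.85 kbit/s in the common octet-aligned storage format; the following back-of-envelope check is added only as an illustration under that assumption.

```python
# 1 s of 16 kHz speech at a 20 ms step size, AMR-WB mode 8 (23.85 kbit/s),
# octet-aligned storage format (1 table-of-contents byte per frame).
frame_ms = 20
frames = int(1000 / frame_ms)                    # 50 frames per second
payload_bits = 23.85e3 * frame_ms / 1000         # 477 bits of speech payload per frame
frame_bytes = 1 + (int(payload_bits) + 7) // 8   # 1 + 60 = 61 bytes per stored frame
print(frames, frame_bytes)                       # 50 61
```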
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided. The electronic device may include, for example but not limited to, a smartphone, a tablet computer, a portable computer, a desktop computer, a wearable device, a virtual reality device, a smart home device, and the like.
Those skilled in the art can understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
Fig. 11 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
As shown in Fig. 11, the electronic device 1100 may include:
a processor 1110; and
a memory 1120 for storing executable instructions of the processor 1110;
wherein the processor 1110 is configured to execute an audio data processing method by executing the executable instructions, the audio data processing method including: acquiring multiple audio coding segments, the multiple audio coding segments being recorded by an audio and video device and saved in segments; splicing the multiple audio coding segments to obtain spliced data; and decoding the spliced data to obtain audio synthesis data of the audio coding segments.
In an exemplary embodiment of the present disclosure, splicing the multiple audio coding segments to obtain spliced data includes: sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments; and splicing the multiple audio coding segments head to tail according to the sorting result to obtain the spliced data.
In an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is recording time information, and sorting the multiple audio coding segments according to their arrangement information includes: sorting the multiple audio coding segments in chronological order according to their recording time information.
In an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio coding segments is number information, and sorting the multiple audio coding segments according to their arrangement information includes: sorting the multiple audio coding segments in number order according to their number information.
In an exemplary embodiment of the present disclosure, acquiring multiple audio coding segments includes: obtaining multiple audio and video mixed stream data segments; and performing audio extraction on the multiple audio and video mixed stream data segments respectively to obtain the multiple audio coding segments.
In an exemplary embodiment of the present disclosure, splicing the multiple audio coding segments to obtain spliced data includes: determining the arrangement information of the multiple audio coding segments according to the arrangement information of the multiple audio and video mixed stream data segments; sorting the multiple audio coding segments according to their arrangement information; and splicing the multiple audio coding segments head to tail according to the sorting result to obtain the spliced data.
In an exemplary embodiment of the present disclosure, the arrangement information of the multiple audio and video mixed stream data segments is one of recording time information and number information.
In an exemplary embodiment of the present disclosure, the audio data processing method further includes: performing video extraction on the multiple audio and video mixed stream data segments respectively to obtain multiple video coding segments; generating video synthesis data according to the multiple video coding segments; and generating audio and video synthesis data according to the audio synthesis data and the video synthesis data.
In an exemplary embodiment of the present disclosure, obtaining multiple audio and video mixed stream data segments includes: receiving, through a user interface, an editing instruction sent by a target object, the editing instruction including editing object information; and communicating with the audio and video device to obtain, through a communication interface, the multiple audio and video mixed stream data segments corresponding to the editing object information, the multiple audio and video mixed stream data segments being recorded by the audio and video device and saved in segments.
The user interface may be, for example but not limited to, a touch screen, a physical button, a microphone, or the like.
In an exemplary embodiment of the present disclosure, the audio data processing method further includes: performing an editing operation on the audio and video synthesis data in response to the editing instruction to obtain edited audio and video synthesis data.
In an exemplary embodiment of the present disclosure, obtaining multiple audio and video mixed stream data segments includes: downloading the multiple audio and video mixed stream data segments from the audio and video device in advance and saving them locally.
In an exemplary embodiment of the present disclosure, the format of the multiple audio coding segments is one of the AMR audio coding format, the AAC audio coding format, and the OPUS audio coding format.
In the electronic device of the embodiments of the present disclosure, the processor splices the multiple audio coding segments to obtain spliced data and decodes the spliced data. This avoids the loss of decoded data that occurs when the multiple audio coding segments are decoded separately, which arises because the decoding algorithm depends on data from preceding and following frames, and thereby avoids the audio distortion caused by discontinuities at the splice points of the resulting audio synthesis data.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the methods according to the embodiments of the present disclosure.
In the exemplary embodiments of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the above methods of this specification is stored. In some possible implementations, various aspects of the present invention may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed electronic device, computer-readable storage medium, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative; the division of the units is only a logical function division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present disclosure.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present disclosure, and these modifications or replacements shall be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (25)

  1. An audio data processing method, characterized in that it comprises:
    acquiring multiple audio coding segments, the multiple audio coding segments being recorded by audio and video equipment and saved in segments;
    splicing the multiple audio coding segments to obtain spliced data;
    decoding the spliced data to obtain audio synthesis data of the multiple audio coding segments.
  2. The method according to claim 1, characterized in that splicing the multiple audio coding segments to obtain spliced data comprises:
    sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments;
    splicing the multiple audio coding segments end to end according to the sorting result to obtain spliced data.
  3. The method according to claim 2, characterized in that the arrangement information of the multiple audio coding segments is recording time information, and sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments comprises:
    sorting the multiple audio coding segments in a time sequence according to the recording time information of the multiple audio coding segments.
  4. The method according to claim 2, characterized in that the arrangement information of the multiple audio coding segments is number information, and sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments comprises:
    sorting the multiple audio coding segments in a number order according to the number information of the multiple audio coding segments.
  5. The method according to claim 1, characterized in that acquiring multiple audio coding segments comprises:
    obtaining multiple audio and video mixed stream data segments;
    performing audio extraction on the multiple audio and video mixed stream data segments respectively to obtain the multiple audio coding segments.
  6. The method according to claim 5, characterized in that splicing the multiple audio coding segments to obtain spliced data comprises:
    determining the arrangement information of the multiple audio coding segments according to the arrangement information of the multiple audio and video mixed stream data segments;
    sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments;
    splicing the multiple audio coding segments end to end according to the sorting result to obtain spliced data.
  7. The method according to claim 6, characterized in that the arrangement information of the multiple audio and video mixed stream data segments is one of recording time information and number information.
  8. The method according to claim 5, characterized in that it further comprises:
    performing video extraction on the multiple audio and video mixed stream data segments respectively to obtain multiple video coding segments;
    generating video synthesis data according to the multiple video coding segments;
    generating audio and video synthesis data according to the audio synthesis data and the video synthesis data.
  9. The method according to claim 8, characterized in that obtaining multiple audio and video mixed stream data segments comprises:
    receiving an editing instruction sent by a target object, the editing instruction including editing object information;
    obtaining, from the audio and video equipment, the multiple audio and video mixed stream data segments corresponding to the editing object information, the multiple audio and video mixed stream data segments being recorded by the audio and video equipment and stored in segments.
  10. The method according to claim 9, characterized in that it further comprises:
    performing an editing operation on the audio and video synthesis data in response to the editing instruction to obtain the edited audio and video synthesis data.
  11. The method according to claim 5, characterized in that obtaining multiple audio and video mixed stream data segments comprises:
    receiving a download instruction sent by a target object;
    downloading the multiple audio and video mixed stream data segments corresponding to the download instruction from the audio and video equipment and saving them locally.
  12. The method according to any one of claims 1-11, characterized in that the format of the multiple audio coding segments is one of an AMR audio coding format, an AAC audio coding format, and an OPUS audio coding format.
  13. An electronic device, characterized in that it comprises:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute an audio data processing method by executing the executable instructions, the audio data processing method comprising:
    acquiring multiple audio coding segments, the multiple audio coding segments being recorded by audio and video equipment and saved in segments;
    splicing the multiple audio coding segments to obtain spliced data;
    decoding the spliced data to obtain audio synthesis data of the audio coding segments.
  14. The electronic device according to claim 13, characterized in that splicing the multiple audio coding segments to obtain spliced data comprises:
    sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments;
    splicing the multiple audio coding segments end to end according to the sorting result to obtain spliced data.
  15. The electronic device according to claim 14, characterized in that the arrangement information of the multiple audio coding segments is recording time information, and sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments comprises:
    sorting the multiple audio coding segments in a time sequence according to the recording time information of the multiple audio coding segments.
  16. The electronic device according to claim 14, characterized in that the arrangement information of the multiple audio coding segments is number information, and sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments comprises:
    sorting the multiple audio coding segments in a number order according to the number information of the multiple audio coding segments.
  17. The electronic device according to claim 13, characterized in that acquiring multiple audio coding segments comprises:
    obtaining multiple audio and video mixed stream data segments;
    performing audio extraction on the multiple audio and video mixed stream data segments respectively to obtain the multiple audio coding segments.
  18. The electronic device according to claim 17, characterized in that splicing the multiple audio coding segments to obtain spliced data comprises:
    determining the arrangement information of the multiple audio coding segments according to the arrangement information of the multiple audio and video mixed stream data segments;
    sorting the multiple audio coding segments according to the arrangement information of the multiple audio coding segments;
    splicing the multiple audio coding segments end to end according to the sorting result to obtain spliced data.
  19. The electronic device according to claim 18, characterized in that the arrangement information of the multiple audio and video mixed stream data segments is one of recording time information and number information.
  20. The electronic device according to claim 17, characterized in that the audio data processing method further comprises:
    performing video extraction on the multiple audio and video mixed stream data segments respectively to obtain multiple video coding segments;
    generating video synthesis data according to the multiple video coding segments;
    generating audio and video synthesis data according to the audio synthesis data and the video synthesis data.
  21. The electronic device according to claim 20, characterized in that obtaining multiple audio and video mixed stream data segments comprises:
    receiving, through a user interface, an editing instruction sent by a target object, the editing instruction including editing object information;
    communicating with the audio and video equipment, and obtaining, from the audio and video equipment through a communication interface, the multiple audio and video mixed stream data segments corresponding to the editing object information, the multiple audio and video mixed stream data segments being recorded by the audio and video equipment and saved in segments.
  22. The electronic device according to claim 21, characterized in that the audio data processing method further comprises:
    performing an editing operation on the audio and video synthesis data in response to the editing instruction to obtain the edited audio and video synthesis data.
  23. The electronic device according to claim 17, characterized in that obtaining multiple audio and video mixed stream data segments comprises:
    receiving, through a user interface, a download instruction sent by a target object;
    communicating with the audio and video equipment, and downloading, from the audio and video equipment through a communication interface, the multiple audio and video mixed stream data segments corresponding to the download instruction and saving them locally.
  24. The electronic device according to any one of claims 13-23, characterized in that the format of the multiple audio coding segments is one of an AMR audio coding format, an AAC audio coding format, and an OPUS audio coding format.
  25. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the audio data processing method according to any one of claims 1-12 is implemented.
PCT/CN2020/079342 2020-03-13 2020-03-13 Audio data processing method, electronic device and computer-readable storage medium WO2021179321A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080004659.8A CN112771880A (en) 2020-03-13 2020-03-13 Audio data processing method, electronic device and computer readable storage medium
PCT/CN2020/079342 WO2021179321A1 (en) 2020-03-13 2020-03-13 Audio data processing method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/079342 WO2021179321A1 (en) 2020-03-13 2020-03-13 Audio data processing method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021179321A1 true WO2021179321A1 (en) 2021-09-16

Family

ID=75699506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/079342 WO2021179321A1 (en) 2020-03-13 2020-03-13 Audio data processing method, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112771880A (en)
WO (1) WO2021179321A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1992944A (en) * 2005-12-27 2007-07-04 三星电子株式会社 Device and method for transmitting and receiving audio data in wireless terminal
CN101167127A (en) * 2005-04-28 2008-04-23 杜比实验室特许公司 Method and system for operating audio encoders in parallel
US20120232908A1 (en) * 2011-03-07 2012-09-13 Terriberry Timothy B Methods and systems for avoiding partial collapse in multi-block audio coding
CN102768844A (en) * 2012-03-31 2012-11-07 新奥特(北京)视频技术有限公司 Method for splicing audio code stream
CN105187896A (en) * 2015-09-09 2015-12-23 北京暴风科技股份有限公司 Multi-segment media file playing method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101861941B1 (en) * 2014-02-10 2018-07-02 돌비 인터네셔널 에이비 Embedding encoded audio into transport stream for perfect splicing
CN106534971B (en) * 2016-12-05 2019-04-02 腾讯科技(深圳)有限公司 A kind of audio-video clipping method and device
CN107800988A (en) * 2017-11-08 2018-03-13 青岛海信移动通信技术股份有限公司 A kind of method and device of video record, electronic equipment
CN108718395B (en) * 2018-06-08 2021-05-11 深圳市云智易联科技有限公司 Segmented video recording method and automobile data recorder


Also Published As

Publication number Publication date
CN112771880A (en) 2021-05-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923925

Country of ref document: EP

Kind code of ref document: A1