CN113986191B

CN113986191B - Audio playing method and device, terminal equipment and storage medium

Info

Publication number: CN113986191B
Application number: CN202111609669.7A
Authority: CN
Inventors: Liu Chunyu (刘春宇); Qi Yuan (漆原); Liu Jiaze (刘佳泽); Xu Chun (徐春); Dang Zhengjun (党正军)
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2022-06-07
Anticipated expiration: 2041-12-27
Also published as: CN113986191A

Abstract

The embodiments of the present application provide an audio playing method and device, a terminal device, and a storage medium, and relate to the technical fields of application program development and audio. The method includes: displaying a playing interface of target audio, the target audio being audio that contains a plurality of audio tracks; displaying, in the playing interface, identification information corresponding to a plurality of audio track combinations, where each audio track combination contains at least one of the plurality of audio tracks and the identification information corresponding to an audio track combination displays the identification information of each audio track it contains; and, in response to an operation on the identification information corresponding to a target audio track combination among the plurality of audio track combinations, when the target audio track combination contains a plurality of audio tracks, playing the audio corresponding to the target audio track combination, which is obtained by mixing the audio of the audio tracks it contains. The technical solution provided by the embodiments of the present application improves the flexibility of audio playback.

Description

Audio playing method and device, terminal equipment and storage medium

Technical Field

The embodiments of the present application relate to the technical fields of application program development and audio, and in particular to an audio playing method, an audio playing device, a terminal device, and a storage medium.

Background

An audio playing application has an audio playing function, through which a user can play audio such as songs, crosstalk comedy, and audiobooks.

In the related art, an audio playing application can only start playback, stop playback, and adjust the playback volume, so the audio playing mode is limited and inflexible.

Disclosure of Invention

The embodiments of the present application provide an audio playing method, an audio playing device, a terminal device, and a storage medium that improve the flexibility of the audio playing mode. The technical solution is as follows.

According to an aspect of an embodiment of the present application, there is provided an audio playing method, including:

displaying a playing interface of target audio, the target audio being audio that contains a plurality of audio tracks;

displaying, in the playing interface, identification information corresponding to a plurality of audio track combinations, wherein each audio track combination contains at least one of the plurality of audio tracks, and the identification information corresponding to an audio track combination displays the identification information of each audio track contained in that combination;

in response to an operation on the identification information corresponding to a target audio track combination among the plurality of audio track combinations, when the target audio track combination contains a plurality of audio tracks, playing the audio corresponding to the target audio track combination, wherein that audio is obtained by mixing the audio of the plurality of audio tracks contained in the target audio track combination.

According to an aspect of an embodiment of the present application, there is provided an audio playback apparatus, including:

an interface display module, configured to display a playing interface of target audio, the target audio being audio that contains a plurality of audio tracks;

an identification display module, configured to display, in the playing interface, identification information corresponding to a plurality of audio track combinations, wherein each audio track combination contains at least one of the plurality of audio tracks, and the identification information corresponding to an audio track combination displays the identification information of each audio track contained in that combination;

and an audio playing module, configured to, in response to an operation on the identification information corresponding to a target audio track combination among the plurality of audio track combinations, play the audio corresponding to the target audio track combination when it contains a plurality of audio tracks, wherein that audio is obtained by mixing the audio of the plurality of audio tracks contained in the target audio track combination.

According to an aspect of the embodiments of the present application, there is provided a terminal device including a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the above audio playing method.

According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored therein, the computer program being loaded and executed by a processor to implement the above-mentioned audio playing method.

According to an aspect of the embodiments of the present application, there is provided a computer program product, which is loaded and executed by a processor to implement the above-mentioned audio playing method.

The technical solution provided by the embodiments of the present application can have the following beneficial effects:

the target audio contains a plurality of audio tracks, and the playing interface of the target audio displays a plurality of selectable audio track combinations that can be combined in various ways, so a user can choose to play only some of the audio tracks of the target audio as desired, which improves the flexibility and richness of the audio playing mode.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of an implementation environment of an embodiment of the present application;

FIG. 2 is a flowchart of an audio playing method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a playback interface provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a playback interface provided by another embodiment of the present application;

FIG. 5 is a flowchart of an audio playing method according to another embodiment of the present application;

FIG. 6 is a schematic diagram of a playback interface provided by yet another embodiment of the present application;

FIG. 7 is a block diagram of an audio playback device provided by an embodiment of the present application;

FIG. 8 is a block diagram of an audio playback device according to another embodiment of the present application;

FIG. 9 is a block diagram of a terminal device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods consistent with aspects of the present application.

Refer to FIG. 1, which illustrates a schematic diagram of an implementation environment of an embodiment of the present application. The implementation environment may be implemented as an audio playback system 10. Optionally, the system 10 includes a terminal device 11.

A target application, for example a client of the target application, is installed and runs on the terminal device 11. Optionally, a user account is logged in to the client. The terminal device is an electronic device with data computation, processing, and storage capabilities, such as a smartphone, a tablet computer, a PC (Personal Computer), or a wearable device, which is not limited in the embodiments of the present application. Optionally, the terminal device has a touch display screen through which the user performs human-computer interaction. The target application is an application with audio processing and playing functions; it may be an audio playing application used to play and adjust audio. The target application may also provide video playing, instant messaging, social networking, gaming, payment, shopping, image browsing, and other functions, which is not specifically limited in the embodiments of the present application. In the method provided by the embodiments of the present application, each step may be executed by the terminal device 11, for example by a client running on the terminal device 11.

In some embodiments, the system 10 further includes a server 12, which establishes a communication connection (e.g., a network connection) with the terminal device 11 and provides backend services for the target application. The server may be an independent physical server, a server cluster or distributed system composed of a plurality of physical servers, or a cloud server providing cloud computing services.

The technical solution of the present application will be described below with reference to several examples.

Refer to FIG. 2, which shows a flowchart of an audio playing method according to an embodiment of the present application. For illustration, this embodiment mainly describes the method as applied to the client described above. The method may include the following steps (201-203).

Step 201, displaying a playing interface of a target audio, wherein the target audio is an audio containing a plurality of audio tracks.

In some embodiments, the playing interface is the user interface presented while audio is being played. Optionally, the information displayed in the playing interface includes, but is not limited to, at least one of the following: the name, producer, and release time of the target audio, related pictures, related videos, lyrics, and a volume adjustment control.

Optionally, the target audio is multi-track audio, and the terminal device can parse and split a plurality of audio tracks from the file corresponding to the target audio. An audio track carries audio within the target audio that can be displayed and adjusted independently and is not mixed with the other audio. Optionally, audio tracks are categorized according to the source of their audio, and at least the following categorizations exist.

In some embodiments, the plurality of tracks includes a person track and an accompaniment track, the person track containing only person (vocal) audio data and the accompaniment track containing only accompaniment audio data. For example, a song may include the singer's voice (i.e., the human voice) and the accompaniment.

In some embodiments, the plurality of audio tracks includes a person audio track and a background audio track, the background audio track containing only background sound. For example, recitations, audiobooks, radio dramas, and the like may include, in addition to the vocal track, background sound that sets the atmosphere and enhances the presentation of the audio.

In some embodiments, the plurality of tracks includes tracks corresponding to a plurality of instruments, respectively. For target audio containing multiple instrument sounds, sounds emitted by different instruments may be divided into different tracks. For example, for classical symphony, the plurality of tracks may include a piano track, a violin track, a cello track, a trombone track, or the like; as another example, for band music, the plurality of tracks may include a keyboard track, a guitar track, a bass track, a drum track, a vocal track, and so forth.

In some embodiments, the plurality of audio tracks includes a plurality of vocal audio tracks, the plurality of vocal audio tracks respectively corresponding to audio uttered by different persons. Alternatively, for target audio containing the voices of multiple persons, the audio of different persons may be separated into different tracks.

In some possible implementations, the target audio may be multi-track audio obtained by separating single-track audio. Optionally, after single-track audio is input into a trained AI (Artificial Intelligence) model, the AI model outputs target audio containing a plurality of tracks. Optionally, the AI model is a neural network model.

Step 202, displaying identification information corresponding to the plurality of audio track combinations in a playing interface.

Optionally, each track combination contains at least one of the plurality of tracks; that is, any track combination may contain a single track or a plurality of tracks.

In some embodiments, different track combinations may contain the same track. For example, track combination A contains tracks 1, 2, and 3 of the plurality of tracks, and track combination B contains tracks 2 and 4, so track combinations A and B share the same track (track 2).
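Purely as an illustration of this data relationship (the source prescribes no data model), a track combination can be sketched as a label plus a set of track IDs, so that a track shared by two combinations shows up as a set intersection. The class, method, and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrackCombination:
    """Hypothetical model: identification label plus the IDs of the contained tracks."""
    label: str
    tracks: Set[str] = field(default_factory=set)

    def shares_tracks_with(self, other: "TrackCombination") -> Set[str]:
        """Return the tracks contained in both combinations."""
        return self.tracks & other.tracks


# Combinations A and B from the example above; track 2 belongs to both.
combo_a = TrackCombination("A", {"track1", "track2", "track3"})
combo_b = TrackCombination("B", {"track2", "track4"})
shared = combo_a.shares_tracks_with(combo_b)
print(sorted(shared))  # ['track2']
```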

In some embodiments, each audio track combination has identification information that distinguishes it from the other combinations. Optionally, as shown in FIG. 3, in the playing interface 30 the identification information of a track combination may be its label, such as track combination A, track combination B, track combination C, and track combination D.

In some embodiments, the identification information corresponding to an audio track combination displays the identification information of each audio track contained in the combination. In some embodiments, the identification information of an audio track visually indicates the sound source of the corresponding audio, such as the corresponding musical instrument (e.g., piano, violin, cello, guitar, bass) or human voice (e.g., female, male, child). Optionally, the identification information of a track may be displayed as the name of the sound source or as a pattern depicting the sound source. As shown in FIG. 4, the identification information 41 of the violin track may be displayed as a violin-shaped pattern. Displaying the identification information of each contained track within the identification information of the combination lets the user see at a glance which tracks each combination contains, making it easy to select the combination to be played.

Optionally, the identification information of audio tracks in the same combination is displayed as a cluster, for example in the same region. As shown in FIG. 4, the identification information 42 of the person track and the identification information 43 of the piano track are displayed in the same display area (e.g., within one circle). Optionally, the more audio tracks a combination contains, the larger its display area; correspondingly, the fewer audio tracks it contains, the smaller its display area.
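The text only states that the display area grows with the number of contained tracks. One hypothetical way to realize this, assuming circular display areas as in FIG. 4 and an arbitrary per-track area constant, is to make the circle's area proportional to the track count:

```python
import math

# Hypothetical layout constant: screen area (px^2) reserved per track icon.
AREA_PER_TRACK = 4000.0


def display_radius(track_count: int, area_per_track: float = AREA_PER_TRACK) -> float:
    """Radius of a circular display area whose area grows linearly with track count."""
    if track_count < 1:
        raise ValueError("a track combination contains at least one track")
    return math.sqrt(track_count * area_per_track / math.pi)


for n in (1, 2, 4):
    r = display_radius(n)
    print(f"{n} track(s): radius {r:.1f} px, area {math.pi * r * r:.0f} px^2")
```

A linear area rule keeps icon density roughly constant as tracks are added; a fixed radius increment per track would satisfy the text equally well.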

In some possible implementations, after step 202, the method further includes: in response to a merging operation on at least two of the plurality of audio track combinations, generating a merged audio track combination from the audio tracks contained in the at least two combinations, and displaying the identification information corresponding to the merged combination in the playing interface, the identification information of the merged combination displaying the identification information of the audio tracks contained in the at least two combinations.

In the above implementation, the merging operation may be an operation that moves the identification information of at least two audio track combinations. Optionally, when the distance between the identification information of the at least two combinations is smaller than a first threshold, or when the area of the overlapping region between their identification information is larger than a second threshold, the combinations are merged into a new audio track combination, i.e., a merged audio track combination. A merged combination may in turn be merged with other combinations to generate further merged combinations.
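A minimal sketch of the two merge conditions, assuming (as a hypothetical model) that each combination's identification information is a circle and that "distance" means the distance between circle centers. The overlap area uses the standard circle-circle intersection (lens) formula; `Badge`, `try_merge`, and the threshold values are illustrative and not taken from the source.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Badge:
    """Hypothetical model: circular identification area of a track combination."""
    x: float
    y: float
    r: float
    tracks: Set[str] = field(default_factory=set)


def center_distance(a: Badge, b: Badge) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def overlap_area(a: Badge, b: Badge) -> float:
    """Area of the region where the two circles overlap (lens formula)."""
    d = center_distance(a, b)
    if d >= a.r + b.r:  # disjoint or merely touching
        return 0.0
    if d <= abs(a.r - b.r):  # the smaller circle lies inside the larger one
        return math.pi * min(a.r, b.r) ** 2
    # Clamp acos arguments against floating-point drift just outside [-1, 1].
    cos_a = max(-1.0, min(1.0, (d * d + a.r * a.r - b.r * b.r) / (2 * d * a.r)))
    cos_b = max(-1.0, min(1.0, (d * d + b.r * b.r - a.r * a.r) / (2 * d * b.r)))
    product = (-d + a.r + b.r) * (d + a.r - b.r) * (d - a.r + b.r) * (d + a.r + b.r)
    kite = 0.5 * math.sqrt(max(0.0, product))
    return a.r ** 2 * math.acos(cos_a) + b.r ** 2 * math.acos(cos_b) - kite


def try_merge(a: Badge, b: Badge, first_threshold: float,
              second_threshold: float) -> Optional[Set[str]]:
    """Return the merged combination's tracks if either merge condition holds."""
    if center_distance(a, b) < first_threshold or overlap_area(a, b) > second_threshold:
        return a.tracks | b.tracks  # union of tracks forms the merged combination
    return None


a = Badge(0, 0, 50, {"vocal"})
b = Badge(60, 0, 50, {"piano"})
merged = try_merge(a, b, first_threshold=20, second_threshold=500)
print(sorted(merged) if merged else "no merge")  # ['piano', 'vocal']
```

Because the result is again a set of tracks, a merged combination can be fed back into `try_merge` with further combinations, matching the chained merging the text allows.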

In some possible implementations, after step 202, the method further includes: in response to a movement operation on the identification information of a third audio track in a first audio track combination, if the end position of the movement operation is located at the identification information corresponding to a second audio track combination, additionally displaying the identification information of the third audio track in the identification information corresponding to the second combination; that is, the third audio track is added to the second audio track combination.

Optionally, in the above implementation, after the movement operation, the identification information of the third audio track remains displayed in the first audio track combination, that is, the first combination still contains the third audio track; alternatively, after the movement operation, the identification information of the third audio track is added to the identification information corresponding to the second combination and is no longer displayed in the identification information of the first combination, that is, the third audio track is removed from the first audio track combination.
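The two variants above differ only in whether the source combination keeps the moved track. A hedged sketch over a hypothetical mapping from combination labels to track sets; `move_track` and `keep_in_source` are illustrative names:

```python
from typing import Dict, Set


def move_track(combos: Dict[str, Set[str]], track: str, source: str, target: str,
               keep_in_source: bool = True) -> None:
    """Add `track` from combination `source` to combination `target`.

    keep_in_source=True models the first variant (the source keeps the track);
    keep_in_source=False models the second (the track leaves the source).
    """
    if track not in combos[source]:
        raise KeyError(f"{track!r} is not in combination {source!r}")
    combos[target].add(track)
    if not keep_in_source:
        combos[source].discard(track)


combos = {"first": {"vocal", "piano"}, "second": {"drums"}}
move_track(combos, "piano", "first", "second", keep_in_source=False)
print(combos)
```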

In some possible implementations, after step 202, the method further includes: in response to a removal operation on the identification information of a fourth track in a third track combination, deleting the fourth track from the third track combination and no longer displaying the identification information of the fourth track in the identification information corresponding to the third track combination.

In the above implementation, the removal operation on the identification information of the fourth track removes the fourth track from the third track combination. In some embodiments, the removal operation means moving the identification information of the fourth track out of the identification information of the third track combination. Optionally, once it has been moved out, the identification information of the fourth track is no longer displayed; or, once it has been moved out, if the playing interface does not already contain the identification information of a track combination consisting only of the fourth track, a new track combination is generated whose identification information displays only that of the fourth track, i.e., the new track combination contains only the fourth track.
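Similarly, the two removal variants can be sketched as follows; the `-only` labeling scheme for the newly generated single-track combination and the `spawn_single` flag are hypothetical choices.

```python
from typing import Dict, Optional, Set


def remove_track(combos: Dict[str, Set[str]], track: str, source: str,
                 spawn_single: bool = False) -> Optional[str]:
    """Remove `track` from combination `source`.

    spawn_single=False: the track's identification information simply disappears.
    spawn_single=True: a combination containing only `track` is created unless
    one already exists. Returns the new combination's label, or None.
    """
    combos[source].discard(track)
    if not spawn_single or any(t == {track} for t in combos.values()):
        return None
    label, n = f"{track}-only", 1  # hypothetical labeling scheme
    while label in combos:
        n += 1
        label = f"{track}-only-{n}"
    combos[label] = {track}
    return label


combos = {"third": {"vocal", "bass"}}
new_label = remove_track(combos, "bass", "third", spawn_single=True)
print(new_label, combos)
```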

In the above implementations, existing audio track combinations are modified by merging combinations, moving the identification information of a track from one combination to another, removing the identification information of a track from a combination, and so on. This increases the variety of selectable combinations and further improves the flexibility and richness of the audio playing mode.

After the audio track combinations are adjusted, the subsequent steps can be performed on the adjusted combinations.

Step 203, in response to an operation on the identification information corresponding to a target audio track combination among the plurality of audio track combinations, when the target audio track combination contains a plurality of audio tracks, playing the audio corresponding to the target audio track combination.

In some embodiments, the audio corresponding to the target audio track combination, i.e., the audio of the tracks it contains, is played in response to a click or press operation on its identification information. Optionally, the click operation may be a single, double, or triple click on the identification information. Optionally, the press operation is an operation in which the identification information is pressed for longer than a third threshold.

In some embodiments, when the target track combination contains one track, the audio of that track is played. In some embodiments, when the target track combination contains a plurality of tracks, the audio corresponding to the combination is obtained by mixing the audio of those tracks. In some embodiments, the client mixes the audio of the plurality of tracks in the target combination and then plays the mixed audio. In other embodiments, the server mixes the audio of the tracks and sends the mixed audio to the client (or the client downloads it from the server), and the client plays it.
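The source does not specify the mixing algorithm. A common baseline, used here only as an assumption, is sample-wise summation of float PCM tracks at a common sample rate with hard clipping; production mixers often normalize gain or apply a limiter instead of clipping. `mix_tracks` is a hypothetical name.

```python
from typing import List, Sequence


def mix_tracks(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Mix float PCM tracks (samples in [-1.0, 1.0], same sample rate).

    Samples at the same index are summed; a shorter track contributes silence
    past its end, and each sum is hard-clipped to [-1.0, 1.0].
    """
    if not tracks:
        return []
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        total = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


vocal = [0.25, 0.5, -0.25]
piano = [0.5, 0.75]
print(mix_tracks([vocal, piano]))  # [0.75, 1.0, -0.25]
```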

In some embodiments, when the target audio track combination contains a plurality of tracks, the playback control parameters of the tracks of the target audio are adjusted so that the parameters of the tracks contained in the target combination indicate that they are to be played and the parameters of all other tracks indicate that they are not to be played; the audio of the tracks whose playback control parameter indicates playback is then mixed to obtain the mixed audio.
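A minimal sketch of this parameter-based variant, assuming the playback control parameter is a simple per-track play/do-not-play flag (the source does not define its form) and using sample-wise summation with clipping as the hypothetical mixing rule. `select_and_mix` is an illustrative name.

```python
from typing import Dict, List, Set, Tuple


def select_and_mix(target_audio: Dict[str, List[float]],
                   combination: Set[str]) -> Tuple[Dict[str, bool], List[float]]:
    # Step 1: adjust the playback control parameter of every track of the target
    # audio: True (play) inside the combination, False (do not play) outside it.
    play = {name: name in combination for name in target_audio}
    # Step 2: mix only the tracks whose parameter indicates playback.
    selected = [target_audio[name] for name, on in play.items() if on]
    length = max((len(t) for t in selected), default=0)
    mixed = [max(-1.0, min(1.0, sum(t[i] for t in selected if i < len(t))))
             for i in range(length)]
    return play, mixed


target_audio = {"vocal": [0.25, -0.25], "piano": [0.125, 0.125], "drums": [0.5, 0.5]}
flags, mixed = select_and_mix(target_audio, {"vocal", "drums"})
print(flags, mixed)
```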

In summary, in the technical solution provided by the embodiments of the present application, the target audio contains a plurality of audio tracks, and the playing interface of the target audio displays a plurality of selectable audio track combinations that can be combined in various ways, so a user can choose to play only some of the tracks of the target audio as desired, improving the flexibility and richness of the audio playing mode.

Refer to FIG. 5, which shows a flowchart of an audio playing method according to another embodiment of the present application. For illustration, this embodiment mainly describes the method as applied to the client described above. As shown in FIG. 5, step 203 may include the following steps (2031-2032).

Step 2031, in response to a drag operation on the identification information corresponding to the target audio track combination, if the end position of the drag operation is within the area to be played, displaying the identification information corresponding to the target audio track combination in the area to be played.

In some embodiments, the area to be played is displayed in the playing interface, and the identification information of each of the plurality of audio track combinations is displayed outside it. Optionally, each audio track combination contains only one audio track, that is, the identification information of each combination displays the identification information of only one track; in that case, the drag operation on the target track combination is a drag operation on the identification information of its track. In some embodiments, the identification information of the target audio track combination follows the user's current click or touch position as it moves. The user can drag the identification information of a track to be played into the area to be played; conversely, identification information of a combination or track already displayed in the area to be played may be dragged out of it to indicate that its audio is not to be played.
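The drop logic can be sketched as a rectangle hit test on the end position of the drag, assuming a rectangular area to be played in screen coordinates. `Rect`, `AreaToBePlayed`, and `on_drag_end` are hypothetical names; the list also records the order in which combinations entered the area, which later steps can use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class AreaToBePlayed:
    """Combinations currently inside the area, in the order they were dropped in."""
    bounds: Rect
    queue: List[str] = field(default_factory=list)

    def on_drag_end(self, combo: str, x: float, y: float) -> None:
        inside = self.bounds.contains(x, y)
        if inside and combo not in self.queue:
            self.queue.append(combo)  # dragged in: will be played
        elif not inside and combo in self.queue:
            self.queue.remove(combo)  # dragged out: no longer played


area = AreaToBePlayed(Rect(0, 400, 360, 600))
area.on_drag_end("vocal", 100, 500)
area.on_drag_end("piano", 200, 450)
area.on_drag_end("vocal", 100, 100)
print(area.queue)  # ['piano']
```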

Step 2032, playing the audio corresponding to the target audio track combination contained in the region to be played.

In some embodiments, the audio of the tracks of the target audio track combinations displayed in the area to be played is mixed, and the mixed audio is played.

In some embodiments, the area to be played contains identification information of a preset track, a preset track being a track other than the tracks of the target audio. There may be one or more preset tracks, which may be set by the user or recommended by the client or the backend server. Optionally, the audio of the tracks of the target audio track combination is mixed with the audio of the preset track(s) to obtain the mixed audio.

In some embodiments, step 2032 further includes the following steps (1-2):

1. when the area to be played contains a plurality of target audio track combinations, determining the start playing time of each target audio track combination according to the order in which their identification information was dragged into the area to be played;

2. playing the audio corresponding to each target audio track combination at its start playing time.

In this embodiment, the earlier the identification information of a target audio track combination is dragged into the area to be played, the earlier its start playing time, and the later it is dragged, the later its start playing time. Alternatively, the earlier it is dragged, the later its start playing time, and the later it is dragged, the earlier its start playing time. The embodiments of the present application do not specifically limit this.

In some embodiments, the time difference between the start playing times of two target audio track combinations that are adjacent in the order of being dragged into the area to be played may be preset by a relevant technician or set by the user, which is not specifically limited in the embodiments of the present application.

Illustratively, suppose the order of dragging into the area to be played is: target track combination 1 before target track combination 2, and target track combination 2 before target track combination 3. Then all tracks in combination 1 share one start playing time, which is earlier than that of the tracks in combination 2; all tracks in combination 2 share one start playing time, which is earlier than that of the tracks in combination 3; and all tracks in combination 3 share one start playing time. For instance, if combination 1 contains only track 1, combination 2 contains only track 2, and combination 3 contains tracks 3 and 4, the start playing time of track 1 is set to 0 min 0 s, and the time difference between adjacent target track combinations is set to 2 s, then track 2 starts at 0 min 2 s, and tracks 3 and 4 both start at 0 min 4 s.
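This worked example can be reproduced with a small helper that ranks combinations by drag order and multiplies the rank by the configured time difference. The function name and the `reverse` flag (for the alternative ordering in which earlier-dragged combinations start later) are illustrative assumptions; the sketch also assumes each track belongs to only one dragged combination.

```python
from typing import Dict, List, Sequence


def start_times_by_drag_order(drag_order: Sequence[str], members: Dict[str, List[str]],
                              interval_s: float, first_start_s: float = 0.0,
                              reverse: bool = False) -> Dict[str, float]:
    """Give every track the start time of the combination that contains it."""
    order = list(reversed(drag_order)) if reverse else list(drag_order)
    starts: Dict[str, float] = {}
    for rank, combo in enumerate(order):
        for track in members[combo]:
            starts[track] = first_start_s + rank * interval_s
    return starts


members = {"combo1": ["track1"], "combo2": ["track2"], "combo3": ["track3", "track4"]}
result = start_times_by_drag_order(["combo1", "combo2", "combo3"], members, interval_s=2.0)
print(result)  # {'track1': 0.0, 'track2': 2.0, 'track3': 4.0, 'track4': 4.0}
```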

In some embodiments, the start playing time of an audio track is adjusted by prepending a silent audio clip to the track before mixing. In the above example, all tracks of the target audio originally start at 0 min 0 s; track 1 is left unchanged, a 2 s silent clip is prepended to track 2 so that its audio is heard 2 s later than track 1, and a 4 s silent clip is prepended to tracks 3 and 4 so that their audio is heard 4 s later than track 1. This adjusts the start playing times and further improves the flexibility and richness of the audio playing mode.
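A sketch of the silence-padding approach, assuming float PCM tracks at a common sample rate; a toy rate of 2 samples per second keeps the numbers readable. The function names and the hard-clipping mix rule are hypothetical.

```python
from typing import Dict, List


def prepend_silence(samples: List[float], delay_s: float, sample_rate: int) -> List[float]:
    """Delay a track by prepending round(delay_s * sample_rate) zero samples."""
    return [0.0] * round(delay_s * sample_rate) + list(samples)


def mix_with_start_times(tracks: Dict[str, List[float]], starts: Dict[str, float],
                         sample_rate: int) -> List[float]:
    """Pad each track to its start time, then sum sample-wise with hard clipping."""
    padded = [prepend_silence(s, starts.get(name, 0.0), sample_rate)
              for name, s in tracks.items()]
    length = max((len(p) for p in padded), default=0)
    mixed = [0.0] * length
    for p in padded:
        for i, sample in enumerate(p):
            mixed[i] += sample
    return [max(-1.0, min(1.0, s)) for s in mixed]


# Toy example at 2 samples per second: track2 starts 1 s (2 samples) after track1.
tracks = {"track1": [0.5, 0.5, 0.5], "track2": [0.25, 0.25]}
mixed = mix_with_start_times(tracks, {"track2": 1.0}, sample_rate=2)
print(mixed)  # [0.5, 0.5, 0.75, 0.25]
```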

In some embodiments, the step 2032 further comprises the following steps (1.1-1.2):

1.1. When the region to be played contains a plurality of target track combinations, determine the play start time of each target track combination according to the order in which their identification information is arranged in the region to be played.

1.2. Play the audio corresponding to the plurality of target track combinations according to their respective play start times.

In this embodiment, the identification information of the plurality of target track combinations is displayed in a certain order in the region to be played, and the user can also rearrange it with a drag operation. In some embodiments, the higher and the further to the left an item of identification information is displayed, the earlier it is in the arrangement order. Optionally, the earlier the identification information is arranged, the earlier the play start time of the corresponding track combination; the later it is arranged, the later the play start time. For the relationship between arrangement order and play start time, refer to the description above of the order in which identification information is dragged into the region to be played and the corresponding play start times; details are not repeated here.
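One possible way to turn the on-screen arrangement into start times is sketched below. It is an illustration only: it assumes screen coordinates with y growing downward, treats icons whose y values fall into the same band of height `row_height` (a hypothetical parameter) as one row, orders rows top to bottom and icons within a row left to right, and uses hypothetical function names.

```python
from typing import Dict, List, Tuple


def arrangement_order(positions: Dict[str, Tuple[float, float]],
                      row_height: float = 40.0) -> List[str]:
    """Order combinations top-to-bottom, then left-to-right (screen y grows downward).

    Icons whose y values fall into the same band of height row_height count as one
    row, so small vertical jitter does not reorder them.
    """
    return sorted(positions,
                  key=lambda name: (int(positions[name][1] // row_height), positions[name][0]))


def start_times_by_arrangement(positions: Dict[str, Tuple[float, float]],
                               gap_s: float) -> Dict[str, float]:
    """Earlier in the arrangement means an earlier play start time."""
    return {name: k * gap_s for k, name in enumerate(arrangement_order(positions))}


icons = {"A": (200.0, 50.0), "B": (50.0, 60.0), "C": (100.0, 300.0)}
print(arrangement_order(icons))
print(start_times_by_arrangement(icons, 2.0))
```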

To sum up, in the technical solution provided by the embodiments of the present application, the audio tracks to be played are determined by dragging the identification information of track combinations into or out of the region to be played. This makes the operation more convenient and intuitive for the user and further improves the flexibility and richness of the audio playing mode.

In some possible implementations, the embodiments of the present application further include the following steps (2.1-2.3).

2.1. Acquire the audio parameters of the audio tracks contained in the target track combination.

In some embodiments, a multi-track file of the target audio is obtained, and the audio parameters of the plurality of tracks in the target audio, including those of the tracks in the target track combination, are read from that multi-track file.

2.2. According to the audio parameters of the audio tracks contained in the target track combination, determine the rendering parameters of the effect animation corresponding to each of those tracks.

Optionally, the audio parameters include a volume parameter and a rhythm change parameter, and the rendering parameters include a size parameter and a display rhythm parameter.

In some embodiments, the size parameter of the animation frame of the effect animation corresponding to a first audio track at a first playing time is determined according to the volume parameter of the first audio track at the first playing time in the target track combination. In some embodiments, as shown in fig. 4, the size parameter is the maximum radius 44 of the animation frame. Optionally, the size parameter of the animation frame at each playing time is positively correlated with the volume of the first audio track at that playing time: the louder the first audio track is at a given playing time, the larger the animation frame at that time; the quieter it is, the smaller the frame.
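A minimal sketch of the volume-to-size mapping follows. It assumes that the volume at a playing time is measured as the RMS level of the samples covering one animation frame (for example `frame_size = sample_rate / fps`), that samples lie in [-1.0, 1.0], and that a linear map onto a hypothetical radius range is acceptable; the text only requires a positive correlation, so any monotonically increasing map would also fit.

```python
import math
from typing import List


def frame_rms(samples: List[float], frame_size: int) -> List[float]:
    """RMS level of each non-overlapping frame (one value per animation frame)."""
    levels = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return levels


def radius_for_volume(level: float, min_radius: float = 20.0,
                      max_radius: float = 60.0) -> float:
    """Map a level in [0, 1] linearly onto [min_radius, max_radius]: louder is larger."""
    clamped = max(0.0, min(1.0, level))
    return min_radius + (max_radius - min_radius) * clamped


# Hypothetical track: a quiet frame followed by a loud frame.
track = [0.1, -0.1, 0.1, -0.1, 0.8, -0.8, 0.8, -0.8]
print([radius_for_volume(level) for level in frame_rms(track, 4)])
```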

In some embodiments, the display rhythm parameter of the effect animation corresponding to a second audio track is determined according to the rhythm change parameter of the second audio track in the target track combination. Illustratively, as shown in fig. 4, the display rhythm of a track's effect animation may be the rhythm at which the waveform 45 changes. In some embodiments, the display rhythm of the effect animation of the second audio track matches the rhythm of the second audio track: the faster the tempo of the second audio track, the faster its effect animation changes; the slower the tempo, the slower the animation changes.
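The rhythm matching can be sketched as follows, assuming (hypothetically) that the rhythm change parameter is available as a tempo map of `(start_time_s, bpm)` segments and that the waveform completes one oscillation per beat. The phase is accumulated frame by frame so that a tempo change alters the animation speed without making the waveform jump; all names and the amplitude value are illustrative.

```python
import math
from typing import List, Tuple


def bpm_at(t: float, segments: List[Tuple[float, float]]) -> float:
    """Tempo in effect at time t; segments are (start_time_s, bpm) sorted by start time."""
    current = segments[0][1]
    for start, bpm in segments:
        if start > t:
            break
        current = bpm
    return current


class RhythmAnimator:
    """Drives a waveform that completes one oscillation per beat of the current tempo."""

    def __init__(self, amplitude: float = 10.0) -> None:
        self.amplitude = amplitude
        self.phase = 0.0  # position within the current beat, in [0, 1)

    def step(self, dt: float, bpm: float) -> float:
        # Accumulate phase: a faster tempo advances it faster, with no discontinuity.
        self.phase = (self.phase + dt * bpm / 60.0) % 1.0
        return self.amplitude * math.sin(2.0 * math.pi * self.phase)


tempo_map = [(0.0, 90.0), (30.0, 120.0)]  # hypothetical: 90 BPM, then 120 BPM from 30 s
animator = RhythmAnimator()
for frame in range(3):
    t = frame / 60.0  # 60 fps
    print(round(animator.step(1 / 60.0, bpm_at(t, tempo_map)), 3))
```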

In the above embodiment, the rendering parameters are matched to the audio parameters, so that the effect animation of the target track combination follows the audio being played. The user can thus perceive changes in the played audio more intuitively, which makes the experience more engaging.

In some embodiments, the rendering parameters include color parameters, and the effect animation of each audio track in the target track combination is rendered and displayed according to that track's color parameter. For example, if the target track combination contains a violin track, a piano track, and a male vocal track, the main color of the violin track's effect animation may be blue, that of the piano track may be black and white, and that of the male vocal track may be pink. Optionally, the color parameters of the tracks may be preset by a technician or set by the user, which is not specifically limited in the embodiments of the present application.
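The preset-or-user-set color rule can be expressed as a small lookup, sketched below. The track-type keys and hex values are hypothetical stand-ins for the blue, black-and-white, and pink examples above, and the rule that a user setting overrides the preset is an assumption about how the two sources might be combined.

```python
from typing import Dict, Optional, Tuple

# Hypothetical preset palette (main colors as hex strings) keyed by track type.
PRESET_COLORS: Dict[str, Tuple[str, ...]] = {
    "violin": ("#1E5AFF",),            # blue
    "piano": ("#000000", "#FFFFFF"),   # black and white
    "male_vocal": ("#FF8FC8",),        # pink
}
FALLBACK_COLORS: Tuple[str, ...] = ("#888888",)  # for track types without a preset


def track_colors(track_type: str,
                 user_colors: Optional[Dict[str, Tuple[str, ...]]] = None) -> Tuple[str, ...]:
    """Return the main colors for a track type; a user setting overrides the preset."""
    if user_colors and track_type in user_colors:
        return user_colors[track_type]
    return PRESET_COLORS.get(track_type, FALLBACK_COLORS)


print(track_colors("piano"))
print(track_colors("violin", {"violin": ("#00C853",)}))
```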

Illustratively, as shown in fig. 4, the waveforms 45 and 46 are the effect animations of two different tracks, and their size parameters, shapes, rhythm changes, colors, and the like may differ.

2.3. While the audio corresponding to the target track combination is playing, render and display the effect animation of the target track combination according to the rendering parameters of the effect animations of the audio tracks it contains.

In some embodiments, each audio track in the target audio track combination corresponds to a respective effect animation, and the effect animation corresponding to each audio track in the target audio track combination is rendered according to the generated rendering parameters.

In some embodiments, rendering and displaying the effect animation of the target track combination according to the rendering parameters of the effect animations of its audio tracks includes the following steps (2.3.1-2.3.2).

2.3.1. When the target track combination contains a plurality of audio tracks, determine the display order of those tracks.

When the target track combination contains a plurality of audio tracks, each track has its own effect animation, and these animations must be displayed overlapping one another; the display order therefore determines the layering. The effect animation of a track earlier in the display order is drawn first, and the effect animation of a track later in the order is drawn on the layer above it.

In some embodiments, determining the display order of the plurality of audio tracks included in the target audio track combination includes at least the following implementations (mode 1-mode 3).

Mode 1: the display order of the plurality of audio tracks included in the target audio track combination is determined in accordance with the display priority of the plurality of audio tracks included in the target audio.

In some embodiments, a technician or the user presets a display priority for each of the plurality of audio tracks, and the display order of the tracks in the target track combination is determined from these priorities: the higher a track's display priority, the earlier it is in the display order; the lower its priority, the later it is.

Mode 2: the display order of the plurality of audio tracks included in the target audio track combination is determined in accordance with an order setting operation for the plurality of audio tracks included in the target audio track combination.

In some embodiments, after the target track combination is determined, the user may set or adjust the display order of its audio tracks through an order setting operation.

Mode 3: the display order of the plurality of audio tracks included in the target audio track combination is determined according to the respective volumes of the plurality of audio tracks included in the target audio track combination.

In some embodiments, the display order of the audio tracks in the target track combination is determined according to the average volume of each track. Generally, the track with the highest average volume, such as the vocal track, is the most important track in the audio. Giving display priority to the effect animations of tracks with a higher average volume better matches the user's perception and improves the user experience.
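The three modes for determining the display order can be sketched as three ordering functions. This is an illustration with hypothetical names: mode 1 assumes integer priorities where a larger number means higher priority, mode 2 assumes the user's order arrives as a list of track names, and mode 3 assumes "average volume" means the mean absolute amplitude of a track's samples. Ties keep the input order because Python's sort is stable.

```python
from typing import Dict, List, Sequence


def order_by_priority(tracks: Sequence[str], priority: Dict[str, int]) -> List[str]:
    """Mode 1: higher preset display priority comes earlier (unlisted tracks count as 0)."""
    return sorted(tracks, key=lambda t: -priority.get(t, 0))


def order_by_user(tracks: Sequence[str], user_order: Sequence[str]) -> List[str]:
    """Mode 2: follow the user's order; tracks the user did not place keep their input order."""
    placed = [t for t in user_order if t in tracks]
    return placed + [t for t in tracks if t not in placed]


def order_by_average_volume(tracks: Sequence[str],
                            samples: Dict[str, List[float]]) -> List[str]:
    """Mode 3: tracks with a higher mean absolute amplitude come earlier."""
    def average_volume(track: str) -> float:
        s = samples[track]
        return sum(abs(x) for x in s) / len(s) if s else 0.0
    return sorted(tracks, key=average_volume, reverse=True)


combo = ["drums", "vocal", "bass"]
print(order_by_priority(combo, {"vocal": 3, "bass": 2}))
print(order_by_user(combo, ["bass", "drums"]))
print(order_by_average_volume(combo, {"drums": [0.3], "vocal": [0.6], "bass": [0.2]}))
```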

2.3.2. Display the effect animations of the audio tracks in the target track combination superimposed on one another, according to the rendering parameters and the display order of each track.

In some embodiments, the display layer of each track's effect animation, that is, which track's animation is drawn on the upper layer and which on the lower layer, is determined from the display order of the audio tracks in the target track combination, and the effect animations are then displayed superimposed according to their respective rendering parameters.
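The layering rule amounts to a painter's algorithm: draw the animations in display order so that later ones cover earlier ones. The sketch below is only an illustration; it stands in for each effect animation with a filled circle described by hypothetical `(center_x, center_y, radius, color)` rendering parameters and draws into a small character grid rather than a real graphics surface.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical rendering parameters for one animation frame: (center_x, center_y, radius, color)
Circle = Tuple[float, float, float, str]


def composite(order: Sequence[str], params: Dict[str, Circle],
              width: int, height: int) -> List[List[Optional[str]]]:
    """Draw each track's circle in display order; later tracks land on upper layers."""
    canvas: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    for track in order:  # earlier in the order is drawn first, i.e. on a lower layer
        cx, cy, r, color = params[track]
        for y in range(height):
            for x in range(width):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    canvas[y][x] = color
    return canvas


frame_params = {"vocal": (2, 2, 2, "pink"), "piano": (3, 2, 1, "black")}
for row in composite(["vocal", "piano"], frame_params, 6, 5):
    print(" ".join((cell or ".")[0] for cell in row))
```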

In this implementation, an effect animation is displayed for each audio track, and its rendering parameters are determined from that track's audio parameters, so that each effect animation better matches the audio of its track.

In some possible implementations, the embodiments of the present application further include the following steps (4.1-4.4).

4.1. Display a playing interface of the target audio, where the target audio is audio containing a plurality of audio tracks.

This step is the same as or similar to the content of step 201 in the embodiment of fig. 2, and is not described here again.

4.2. Display, in the playing interface, the identification information corresponding to each of the plurality of audio track combinations.

Each audio track combination contains at least one of the plurality of audio tracks, and different combinations do not share tracks; that is, each audio track belongs to at most one track combination. For example, suppose there are 3 track combinations (track combination A, track combination B, and track combination C) and 4 tracks (track 1, track 2, track 3, and track 4). If track combination A contains track 1, then neither track combination B nor track combination C can contain track 1; if track combination B contains track 1 and track 2, then neither track combination A nor track combination C contains track 1 or track 2. Optionally, each track combination contains exactly one track; correspondingly, the identification information of each track combination contains only the identification information of that one track, or, put differently, each track combination is displayed as the identification information of its track. For example, with 3 track combinations (A, B, and C) and 3 tracks (track 1, track 2, and track 3), the combinations correspond one-to-one with the tracks: track combination A contains track 1, track combination B contains track 2, and track combination C contains track 3.
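The two constraints above (every combination is non-empty, and no track belongs to two combinations) can be checked with a short validation routine. The sketch assumes combinations are given as a mapping from a combination name to a list of track names; the function name and error messages are hypothetical.

```python
from typing import Dict, Iterable, List


def validate_combinations(combos: Dict[str, List[str]], all_tracks: Iterable[str]) -> None:
    """Raise ValueError unless every combination is non-empty and the combinations are disjoint."""
    known = set(all_tracks)
    owner: Dict[str, str] = {}  # track name -> combination that already holds it
    for combo, tracks in combos.items():
        if not tracks:
            raise ValueError(f"combination {combo!r} contains no track")
        for track in tracks:
            if track not in known:
                raise ValueError(f"unknown track {track!r} in {combo!r}")
            if track in owner:
                raise ValueError(f"track {track!r} is in both {owner[track]!r} and {combo!r}")
            owner[track] = combo


# Valid: each of the 4 tracks appears at most once.
validate_combinations({"A": ["track1"], "B": ["track2", "track3"], "C": ["track4"]},
                      ["track1", "track2", "track3", "track4"])
```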

Other contents of this step are the same as or similar to those of step 202 in the embodiment of fig. 2, and are not described herein again.

4.3. Control the display position of the identification information of at least one of the plurality of audio track combinations to change dynamically in the playing interface.

In some embodiments, the display positions of the identification information of some or all of the audio track combinations in the playing interface may change. Such changes include at least the following two cases (case 1 and case 2).

Case 1: the display position of the identification information of at least one audio track combination changes automatically.

In some embodiments, a target rule is preset, and the display position of the identification information of at least one audio track combination is automatically and dynamically changed according to the target rule. Optionally, the identification information of at least one track combination changes its display position continuously over a period of time. For example, the identification information may move slowly from one position to another, or move cyclically along a movement trajectory defined in the target rule. In some embodiments, the identification information moves up and down or left and right. In some embodiments, it rotates continuously, for example along a circular track. Illustratively, as shown in fig. 6, the identification information of track combinations 61, 62, and 63 in the playing interface 30 rotates continuously along circular tracks 64, 65, and 66, respectively. The identification information of each such track combination may rotate clockwise, counterclockwise, or alternately clockwise and counterclockwise.
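A circular-track target rule can be sketched as a function of elapsed time. Everything here is an assumption for illustration: screen coordinates with y growing downward (so an increasing angle moves clockwise on screen), one lap per `period_s`, and an "alternate" mode in which every odd lap runs backwards so the icon reverses direction without jumping.

```python
import math
from typing import Tuple


def orbit_position(t: float, center: Tuple[float, float], radius: float, period_s: float,
                   direction: str = "clockwise",
                   start_angle: float = 0.0) -> Tuple[float, float]:
    """Icon position at time t >= 0 on a circular track (screen y grows downward)."""
    if direction not in ("clockwise", "counterclockwise", "alternate"):
        raise ValueError(f"unknown direction {direction!r}")
    cycles = t / period_s
    progress = cycles % 1.0
    if direction == "alternate" and int(cycles) % 2 == 1:
        progress = 1.0 - progress  # odd laps run backwards, so the motion stays continuous
    sign = -1.0 if direction == "counterclockwise" else 1.0
    angle = start_angle + sign * 2.0 * math.pi * progress
    # With y pointing down, an increasing angle moves clockwise on screen.
    return center[0] + radius * math.cos(angle), center[1] + radius * math.sin(angle)


for t in (0.0, 1.0, 2.0):
    x, y = orbit_position(t, (160.0, 240.0), 80.0, period_s=4.0)
    print(f"t={t:.1f}s -> ({x:.1f}, {y:.1f})")
```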

Optionally, the target rule may be preset by a relevant technician, or may be preset by a user, which is not specifically limited in this embodiment of the present application.

Case 2: the display position of the identification information of each audio track combination is manually controlled by a user's trigger operation.

For example, in response to a displacement operation on the identification information of a fourth track combination among the plurality of track combinations, the display position of that identification information changes from a first position to a second position. In some embodiments, the displacement operation is a sliding operation from the first position to the second position; optionally, during the slide, the identification information of the fourth track combination follows the touch position of the touch body (such as the user's finger or a stylus). In some embodiments, the displacement operation consists of click operations: a first click at the first position, where the identification information of the fourth track combination is located, selects it; then, in response to a second click at the second position, the identification information is updated and displayed at the second position.
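Both kinds of displacement operation can be modelled with a small state holder, sketched below under stated assumptions: positions are screen points, a click selects the nearest icon within a hypothetical `hit_radius`, a second click moves the selected icon, and a slide is represented as the sequence of touch positions the platform reports. The class and method names are hypothetical, not a real UI framework API.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Point = Tuple[float, float]


class IconMover:
    """Holds the icon position of each track combination and applies displacement operations."""

    def __init__(self, positions: Dict[str, Point], hit_radius: float = 30.0) -> None:
        self.positions = dict(positions)
        self.hit_radius = hit_radius
        self.selected: Optional[str] = None

    def _hit(self, point: Point) -> Optional[str]:
        """Nearest icon within hit_radius of the point, if any."""
        best, best_d = None, self.hit_radius
        for name, pos in self.positions.items():
            d = math.dist(point, pos)
            if d <= best_d:
                best, best_d = name, d
        return best

    def drag(self, name: str, touch_path: Iterable[Point]) -> None:
        """Sliding operation: the icon follows each reported touch position."""
        for point in touch_path:
            self.positions[name] = point

    def click(self, point: Point) -> None:
        """First click selects the icon under the point; the second click moves it there."""
        if self.selected is None:
            self.selected = self._hit(point)
        else:
            self.positions[self.selected] = point
            self.selected = None


mover = IconMover({"combo4": (100.0, 100.0), "combo1": (300.0, 100.0)})
mover.click((105.0, 98.0))   # select combo4
mover.click((200.0, 250.0))  # move it to the second position
print(mover.positions["combo4"])
```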

4.4. In some embodiments, play the combined audio corresponding to the plurality of audio track combinations, where the spatial sound effect of the combined audio is related to the display positions of the identification information of the track combinations.

The spatial sound effect, also called stereo sound, simulates a playing effect with a sense of space, so that while listening to the combined audio the user can distinguish the acoustic position of each audio element it simulates. Note that an acoustic position is only the source of an audio element as perceived by the ear and brain; it does not represent the actual position at which the sound is produced.

In some embodiments, the audio of the tracks in the plurality of audio track combinations is mixed according to the direction information and distance information of each track to obtain the combined audio, and the combined audio is then played. The acoustic position information of a track includes the direction (for example left, right, above, or below) and the distance of the sound source to be simulated for that track, and it corresponds to the position of the identification information of the track combination containing the track relative to the listening-center marker in the playing interface. For example, if the identification information of a track combination is displayed to the left of, and close to, the listening-center marker, its tracks are heard to the left of and close to the user during playback; if it is displayed above, and far from, the listening-center marker, its tracks are heard in front of and far from the user.
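A simplified version of this position-to-sound mapping is sketched below. It is not the method of the patent, which does not specify a spatialisation algorithm: it swaps in plain constant-power stereo panning driven by the icon's horizontal offset, plus a hypothetical `ref_distance / (ref_distance + d)` distance roll-off. Plain stereo panning cannot separate "in front" from "behind"; an HRTF-based binaural renderer would be needed for that. Screen y is assumed to grow downward, so "above the listening center" maps to "in front of the user".

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def acoustic_position(icon: Point, center: Point) -> Tuple[float, float]:
    """Azimuth (radians; 0 = front, +pi/2 = right) and distance of an icon from the listening center."""
    dx = icon[0] - center[0]
    dy = center[1] - icon[1]  # screen y grows downward, so "above" becomes "in front"
    return math.atan2(dx, dy), math.hypot(dx, dy)


def stereo_gains(azimuth: float, distance: float,
                 ref_distance: float = 100.0) -> Tuple[float, float]:
    """Constant-power left/right gains with a simple distance roll-off."""
    pan = math.sin(azimuth)              # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    roll_off = ref_distance / (ref_distance + distance)
    return math.cos(theta) * roll_off, math.sin(theta) * roll_off


def spatial_mix(combos: Dict[str, List[str]], icons: Dict[str, Point], center: Point,
                tracks: Dict[str, List[float]]) -> Tuple[List[float], List[float]]:
    """Mix every track into a stereo pair placed according to its combination's icon."""
    length = max(len(s) for s in tracks.values())
    left, right = [0.0] * length, [0.0] * length
    for combo, names in combos.items():
        gain_l, gain_r = stereo_gains(*acoustic_position(icons[combo], center))
        for name in names:  # all tracks of one combination share the same acoustic position
            for i, sample in enumerate(tracks[name]):
                left[i] += gain_l * sample
                right[i] += gain_r * sample
    return left, right


l, r = spatial_mix({"A": ["vocal"]}, {"A": (40.0, 100.0)}, (100.0, 100.0), {"vocal": [0.5, 0.5]})
print(round(l[0], 3), round(r[0], 3))  # icon left of center, so the left channel is louder
```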

In some embodiments, because the display position of the identification information of at least one audio track combination changes dynamically in the playing interface, the spatial sound effect of each track in the combined audio also changes in real time during playback. For example, as the identification information of a track combination moves from the left side of the listening-center marker to the right side, the acoustic position of its tracks moves in real time from the user's left to the user's right; that is, the user hears the sound source of those tracks move from left to right.
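When the gains change in real time as an icon moves, applying each new value abruptly can produce audible clicks. A common remedy, sketched here as an assumption rather than anything the text prescribes, is to recompute a channel's target gain once per audio block (for example from the icon position at the start of the block) and ramp linearly from the previous target. The function name and block layout are hypothetical.

```python
from typing import List, Sequence


def render_moving_source(blocks: Sequence[Sequence[float]], target_gains: Sequence[float],
                         initial_gain: float) -> List[float]:
    """Apply a per-block gain that ramps linearly from the previous block's target to this one's."""
    out: List[float] = []
    previous = initial_gain
    for block, target in zip(blocks, target_gains):
        n = len(block)
        for i, sample in enumerate(block):
            gain = previous + (target - previous) * i / n
            out.append(sample * gain)
        previous = target  # the next block starts exactly where this ramp was heading
    return out


# Left-channel gain falling from 1.0 to 0.0 as an icon moves from the left side to the right.
print(render_moving_source([[1.0] * 4, [1.0] * 4], [1.0, 0.0], initial_gain=1.0))
```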

Optionally, when a track combination contains a plurality of tracks, those tracks share the same acoustic position information, that is, the same spatial sound effect. If the display position of the track combination changes, the spatial sound effect parameters (acoustic position information) of all its tracks change in the same way, so that their spatial sound effects always remain consistent.

In the above implementation, the display position of a track combination's identification information can change dynamically during audio playback, and the spatial sound effect of the combined audio changes in real time with it, which improves the flexibility of audio playback.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Referring to fig. 7, a block diagram of an audio playing apparatus according to an embodiment of the present application is shown. The apparatus has the function of implementing the above audio playing method examples; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the terminal device described above, or may be provided in the terminal device. The apparatus 700 may include an interface display module 710, an identification display module 720, and an audio playing module 730.

The interface display module 710 is configured to display a playing interface of a target audio, where the target audio is an audio including a plurality of audio tracks.

The identification display module 720 is configured to display, in the playing interface, identification information corresponding to a plurality of audio track combinations respectively, where each of the audio track combinations includes at least one audio track of the plurality of audio tracks, and the identification information corresponding to the audio track combination displays identification information of each audio track included in the audio track combination.

The audio playing module 730 is configured to, in response to an operation on identification information corresponding to a target audio track combination in the multiple audio track combinations, play audio corresponding to the target audio track combination when multiple audio tracks are included in the target audio track combination, where the audio corresponding to the target audio track combination is obtained by mixing audio of multiple audio tracks included in the target audio track combination.

In an exemplary embodiment, the audio playing module 730 is configured to play the audio corresponding to the target audio track combination in response to a click or press operation on the identification information corresponding to the target audio track combination.

In an exemplary embodiment, a to-be-played area is displayed in the playing interface, and the identification information corresponding to each of the plurality of audio track combinations is displayed outside the to-be-played area. As shown in fig. 8, the audio playing module 730 includes: an identification display sub-module 731 and an audio play sub-module 732.

The identification display sub-module 731 is configured to, in response to a drag operation on the identification information corresponding to the target audio track combination, display that identification information in the to-be-played area if the end position of the drag operation is located within the to-be-played area.

The audio playing sub-module 732 is configured to play an audio corresponding to the target audio track combination included in the region to be played.

In an exemplary embodiment, when the to-be-played area contains a plurality of target audio track combinations, as shown in fig. 8, the audio playing sub-module 732 is configured to: determine the playing start times of the target audio track combinations according to the order in which their identification information was dragged into the to-be-played area, and play the audio of the target audio track combinations according to those start times; or determine the playing start times according to the arrangement order of their identification information in the to-be-played area, and play the audio of the target audio track combinations according to those start times.

In an exemplary embodiment, as shown in fig. 8, the apparatus 700 further comprises: a parameter acquisition module 740, a parameter determination module 750, and an animation display module 760.

The parameter obtaining module 740 is configured to obtain audio parameters of audio tracks included in the target audio track combination.

The parameter determining module 750 is configured to determine, according to the audio parameters of the audio tracks included in the target audio track combination, rendering parameters of the effect animations corresponding to the audio tracks included in the target audio track combination, respectively.

The animation display module 760 is configured to render and display the effect animation corresponding to the target audio track combination according to the rendering parameters of the effect animation corresponding to the audio tracks included in the target audio track combination, respectively, in the process of playing the audio corresponding to the target audio track combination.

In an exemplary embodiment, the audio parameters include a volume parameter and a rhythm change parameter, and the rendering parameters include a size parameter and a display rhythm parameter. As shown in fig. 8, the parameter determining module 750 is configured to: determine the size parameter of the animation frame of the effect animation corresponding to a first audio track at a first playing time according to the volume parameter of the first audio track at the first playing time in the target audio track combination; and/or determine the display rhythm parameter of the effect animation corresponding to a second audio track according to the rhythm change parameter of the second audio track in the target audio track combination.

In an exemplary embodiment, as shown in fig. 8, the animation display module 760 includes: an order determination sub-module 761 and an animation display sub-module 762.

The order determination sub-module 761 is configured to determine a display order of a plurality of audio tracks included in the target audio track combination when the plurality of audio tracks are included in the target audio track combination.

The animation display sub-module 762 is configured to display, in an overlapping manner, an effect animation of the plurality of audio tracks included in the target audio track combination according to the rendering parameters and the display order respectively corresponding to the plurality of audio tracks included in the target audio track combination.

In an exemplary embodiment, as shown in fig. 8, the order determination sub-module 761 is configured to:

determine the display order of the plurality of audio tracks included in the target audio track combination according to the display priorities of the plurality of audio tracks included in the target audio; or determine the display order according to an order setting operation on the plurality of audio tracks included in the target audio track combination; or determine the display order according to the respective volumes of the plurality of audio tracks included in the target audio track combination.

In an exemplary embodiment, as shown in fig. 8, the apparatus 700 further includes a combination generating module 770 and an identification adding module 780.

The combination generating module 770 is configured to, in response to a merging operation for at least two audio track combinations of the multiple audio track combinations, generate a merged audio track combination according to audio tracks included in the at least two audio track combinations, and display identification information corresponding to the merged audio track combination in the playing interface, where the identification information corresponding to the merged audio track combination displays identification information of audio tracks included in the at least two audio track combinations.

The identification adding module 780 is configured to, in response to a moving operation for the identification information of the third audio track in the first audio track combination, add and display the identification information of the third audio track in the identification information corresponding to the second audio track combination if an end position of the moving operation is located at an identification information position corresponding to the second audio track combination.

In an exemplary embodiment, as shown in fig. 8, the apparatus 700 further includes an audio track deleting module 790.

The audio track deleting module 790 is configured to, in response to a removal operation on the identification information of a fourth audio track in a third audio track combination, delete the fourth audio track from the third audio track combination and stop displaying the identification information of the fourth audio track within the identification information corresponding to the third audio track combination.

In an exemplary embodiment, the identification display module 720 is further configured to control the display position of the identification information of at least one of the plurality of audio track combinations to change dynamically in the playing interface.

In an exemplary embodiment, the audio playing module 730 is further configured to play combined audio corresponding to the plurality of audio track combinations; wherein the spatial sound effect of the combined audio is related to the display position of the identification information of the plurality of audio track combinations.
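One way to relate a spatial sound effect to display positions is stereo panning by the horizontal position of each combination's identification information. The sketch below swaps in a standard constant-power (sine/cosine) pan law; the text does not specify the spatialization technique. It also assumes each combination is mixed down to mono by averaging its equal-length tracks, and `mix_tracks`, `pan_gains` and `spatial_mix` are hypothetical names.

```python
import math
from typing import Sequence


def mix_tracks(tracks: Sequence[Sequence[float]]) -> list[float]:
    """Mix equal-length mono tracks by summing samples and scaling by 1/N.

    Averaging is a simple way to avoid clipping; real mixers use per-track
    gains or limiters instead.
    """
    n = len(tracks)
    return [sum(samples) / n for samples in zip(*tracks)]


def pan_gains(x: float, width: float) -> tuple[float, float]:
    """Constant-power pan: x = 0 is full left, x = width is full right."""
    pos = min(max(x / width, 0.0), 1.0)  # normalize position to 0..1
    theta = pos * math.pi / 2
    return math.cos(theta), math.sin(theta)


def spatial_mix(combos: Sequence[Sequence[Sequence[float]]],
                xs: Sequence[float],
                width: float) -> tuple[list[float], list[float]]:
    """Pan each combination's mono mix by its on-screen x and sum to stereo."""
    length = len(combos[0][0])
    left = [0.0] * length
    right = [0.0] * length
    for tracks, x in zip(combos, xs):
        mono = mix_tracks(tracks)
        gain_l, gain_r = pan_gains(x, width)
        for i, sample in enumerate(mono):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right
```

Because the display positions may change dynamically, a player built this way would recompute the pan gains for each audio buffer from the current positions.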

It should be noted that the division into the functional modules described above is merely an example of how the apparatus provided in the foregoing embodiment implements its functions. In practical applications, these functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for their specific implementation processes, refer to the method embodiments, which are not repeated here.

Referring to fig. 9, a block diagram of a terminal device 900 according to an embodiment of the present application is shown. The terminal device 900 may be an electronic device such as a mobile phone, a tablet computer, a game console, an electronic book reader, a multimedia player, a wearable device, or a PC. The terminal device is configured to implement the audio playing method described above, and may be the terminal device 11 in the implementation environment shown in fig. 1. Specifically:

Generally, the terminal device 900 includes a processor 901 and a memory 902.

The processor 901 may include one or more processing cores, such as a 4-core processor or a 9-core processor. The processor 901 may be implemented in at least one of the following hardware forms: a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 902 stores a computer program that is configured to be executed by one or more processors to implement the audio playing method described above.

In some embodiments, the terminal device 900 may optionally further include a peripheral interface 903 and at least one peripheral. The processor 901, the memory 902, and the peripheral interface 903 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 903 via a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of radio frequency circuitry 904, a display screen 905, a camera assembly 906, audio circuitry 907, and a power supply 908.

Those skilled in the art will appreciate that the structure shown in fig. 9 does not constitute a limitation on the terminal device 900, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.

In an exemplary embodiment, a computer-readable storage medium is also provided, storing a computer program which, when executed by a processor, implements the audio playing method described above.

In an exemplary embodiment, a computer program product is also provided, which, when loaded and executed by a processor, implements the audio playing method described above.

It should be understood that "a plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.

The above descriptions are merely exemplary embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. An audio playing method, the method comprising:

displaying a playing interface of target audio, wherein the target audio is audio containing a plurality of audio tracks;

displaying identification information respectively corresponding to a plurality of audio track combinations in the playing interface, wherein each audio track combination comprises at least one audio track in the plurality of audio tracks, and the identification information corresponding to the audio track combination displays the identification information of each audio track contained in the audio track combination;

in response to an operation on the identification information corresponding to a target audio track combination of the plurality of audio track combinations, playing audio corresponding to the target audio track combination when the target audio track combination includes a plurality of audio tracks, wherein the audio corresponding to the target audio track combination is obtained by mixing audio of the plurality of audio tracks included in the target audio track combination;

determining rendering parameters of effect animations respectively corresponding to the audio tracks contained in the target audio track combination according to the audio parameters of the audio tracks contained in the target audio track combination;

and in the process of playing the audio corresponding to the target audio track combination, rendering and displaying the effect animations corresponding to the audio tracks contained in the target audio track combination in a layered overlapping manner according to the rendering parameters of the effect animations corresponding to the audio tracks contained in the target audio track combination.

2. The method of claim 1, wherein playing, in response to the operation on the identification information corresponding to the target audio track combination of the plurality of audio track combinations, the audio corresponding to the target audio track combination comprises:

playing the audio corresponding to the target audio track combination in response to a click or press operation on the identification information corresponding to the target audio track combination.

3. The method according to claim 1, wherein a region to be played is displayed in the playing interface, and identification information corresponding to each of the plurality of audio track combinations is displayed outside the region to be played;

wherein playing, in response to the operation on the identification information corresponding to the target audio track combination of the plurality of audio track combinations, the audio corresponding to the target audio track combination comprises:

in response to a drag operation on the identification information corresponding to the target audio track combination, if the end position of the drag operation is located in the region to be played, displaying the identification information corresponding to the target audio track combination in the region to be played;

and playing the audio corresponding to the target audio track combination contained in the region to be played.

4. The method according to claim 3, wherein in a case where a plurality of target audio track combinations are included in the region to be played, the playing audio corresponding to the target audio track combinations included in the region to be played comprises:

determining playing start times corresponding to the plurality of target audio track combinations according to the order in which the identification information corresponding to the target audio track combinations was dragged into the region to be played; and playing the audio corresponding to each target audio track combination according to its playing start time;

or,

determining playing start times corresponding to the plurality of target audio track combinations according to the arrangement order of the identification information corresponding to the target audio track combinations in the region to be played; and playing the audio corresponding to each target audio track combination according to its playing start time.

5. The method of claim 1, wherein the audio parameters comprise a volume parameter and a rhythm change parameter, and the rendering parameters comprise a size parameter and a display rhythm parameter;

determining, according to the audio parameters of the audio tracks contained in the target audio track combination, the rendering parameters of the effect animations respectively corresponding to those audio tracks comprises at least one of the following:

determining a size parameter of an animation frame of an effect animation corresponding to a first audio track at a first playing time according to a volume parameter corresponding to the first audio track at the first playing time in the target audio track combination;

and determining a display rhythm parameter of an effect animation corresponding to a second audio track according to the rhythm change parameter corresponding to the second audio track in the target audio track combination.

6. The method according to claim 1, wherein rendering and displaying, in a layered overlapping manner, the effect animations corresponding to the audio tracks contained in the target audio track combination according to their respective rendering parameters comprises:

determining a display order of a plurality of audio tracks included in the target audio track combination in a case where the plurality of audio tracks are included in the target audio track combination;

and superposing and displaying the effect animations of the plurality of audio tracks contained in the target audio track combination according to the rendering parameters and the display order corresponding to those audio tracks.

7. The method of claim 6, wherein determining the display order of the plurality of audio tracks included in the target audio track combination comprises:

determining a display order of a plurality of audio tracks included in the target audio track combination according to display priorities of the plurality of audio tracks included in the target audio;

or,

determining a display order of a plurality of audio tracks included in the target audio track combination in accordance with an order setting operation for the plurality of audio tracks included in the target audio track combination;

or,

determining the display order of the plurality of audio tracks contained in the target audio track combination according to the volumes corresponding to the plurality of audio tracks contained in the target audio track combination.

8. The method according to claim 1, wherein after displaying the identification information respectively corresponding to the plurality of audio track combinations in the playing interface, the method further comprises at least one of the following:

in response to a merging operation for at least two audio track combinations of the plurality of audio track combinations, generating a merged audio track combination according to audio tracks contained in the at least two audio track combinations, and displaying identification information corresponding to the merged audio track combination in the playing interface, wherein the identification information corresponding to the merged audio track combination is displayed with the identification information of the audio tracks contained in the at least two audio track combinations;

in response to a moving operation for the identification information of a third audio track in a first audio track combination, if the end position of the moving operation is located at the position of the identification information corresponding to a second audio track combination, additionally displaying the identification information of the third audio track in the identification information corresponding to the second audio track combination.

9. The method according to claim 1, further comprising, after displaying the identification information respectively corresponding to the plurality of audio track combinations in the playing interface:

in response to a removal operation for identification information of a fourth track in a third track combination, deleting the fourth track in the third track combination, and canceling display of the identification information of the fourth track in identification information corresponding to the third track combination.

10. The method according to claim 1, further comprising, after displaying the identification information respectively corresponding to the plurality of audio track combinations in the playing interface:

controlling the display position of the identification information of at least one of the plurality of audio track combinations to change dynamically in the playing interface.

11. The method of claim 10, further comprising:

playing combined audio corresponding to the plurality of audio track combinations;

wherein the spatial sound effect of the combined audio is related to the display position of the identification information of the plurality of audio track combinations.

12. An audio playback apparatus, comprising:

an audio playing module, configured to play, in response to an operation on identification information corresponding to a target audio track combination in the multiple audio track combinations, audio corresponding to the target audio track combination when multiple audio tracks are included in the target audio track combination, where the audio corresponding to the target audio track combination is obtained by mixing audio of multiple audio tracks included in the target audio track combination;

a parameter determining module, configured to determine rendering parameters of effect animations respectively corresponding to the audio tracks contained in the target audio track combination according to the audio parameters of the audio tracks contained in the target audio track combination;

and in the process of playing the audio corresponding to the target audio track combination, rendering and displaying the effect animations corresponding to the audio tracks contained in the target audio track combination in a layered and superposed manner according to the rendering parameters of the effect animations corresponding to the audio tracks contained in the target audio track combination.

13. A computer device, comprising a processor and a memory, wherein the memory stores a computer program that is loaded and executed by the processor to implement the audio playing method according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a computer program that is loaded and executed by a processor to implement the audio playing method according to any one of claims 1 to 11.