WO2023185425A1

WO2023185425A1 - Music matching method and apparatus, electronic device, storage medium, and program product

Info

Publication number: WO2023185425A1
Application number: PCT/CN2023/080987
Authority: WO
Inventors: 胡建丰; 黄鸣晨; 张依依
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2022-04-01
Filing date: 2023-03-13
Publication date: 2023-10-05
Also published as: CN116939323A

Abstract

Disclosed in embodiments of the present application are a music matching method and apparatus, an electronic device, a computer readable storage medium, and a computer program product. The method comprises: displaying a music matching interface of music to be matched, the music matching interface comprising: a matching control; and displaying an audio and video interface in response to a trigger operation on the matching control, the audio and video interface comprising a target track of the music to be matched and at least one target video matching the target track.

Description

Music matching method, device, electronic equipment, storage medium and program product

Cross-references to related applications

This application is filed based on the Chinese patent application with application number 202210348876.

Technical field

The present application relates to the field of music processing technology, and in particular to a music matching method, device, electronic device, storage medium and program product.

Background

With the development of science and technology, electronic devices are becoming more and more popular, and the functions of electronic devices are becoming more and more abundant. For example, users can listen to music through electronic devices, use fragmented time to watch short videos, etc.

Short videos generally include video frames and music. Because music is not visible, a user who wants to see which videos include a piece of music of interest must check the videos manually one by one, which makes viewing videos that match the music of interest inefficient.

Summary of the invention

Embodiments of the present application provide a music matching method, device, electronic device, computer-readable storage medium, and computer program product, which can improve viewing efficiency of videos that match the music of interest.

The embodiment of the present application provides a music matching method, including:

Displaying a music matching interface for music to be matched, the music matching interface including a matching control; and

In response to a triggering operation on the matching control, displaying an audio and video interface, the audio and video interface including a target audio track of the music to be matched and at least one target video matching the target audio track.

An embodiment of the present application also provides a music matching device, including:

The first display module is configured to display a music matching interface for music to be matched, and the music matching interface includes matching controls;

The second display module is configured to display an audio and video interface in response to a triggering operation on the matching control. The audio and video interface includes a target audio track of the music to be matched and at least one target video matching the target audio track.

In addition, an embodiment of the present application further provides an electronic device, including a processor and a memory. The memory stores a computer program, and the processor is configured to implement the music matching method provided by the embodiments of the present application when running the computer program in the memory.

In addition, embodiments of the present application also provide a computer-readable storage medium that stores a computer program. The computer program is suitable for loading by the processor to execute the music matching method provided by the embodiments of the present application.

In addition, embodiments of the present application also provide a computer program product, including a computer program. When the computer program is executed by a processor, the music matching method provided by the embodiment of the present application is implemented.

The embodiments of this application have the following beneficial effects:

In the embodiments of the present application, the music matching interface of the music to be matched includes a matching control, which provides the user with a video matching function for music of interest. When the user triggers the matching control, the target audio track of the music to be matched and the target video matching the target audio track are displayed in the audio and video interface. Displaying the target audio track visualizes the music to be matched, and the target videos matching the target audio track are searched for and displayed automatically for the visualized music, which improves the efficiency of viewing videos that match the music of interest.

Description of drawings

Figure 1A is a schematic architectural diagram of a music matching system provided by an embodiment of the present application;

Figure 1B is a schematic scene diagram of the music matching process provided by the embodiment of the present application;

Figure 2 is a schematic flow chart of the music matching method provided by the embodiment of the present application;

Figure 3 is a schematic diagram of a music interface provided by an embodiment of the present application;

Figure 4 is a schematic diagram of the music interface and music matching interface provided by the embodiment of the present application;

Figure 5 is a schematic diagram of the music extraction interface provided by the embodiment of the present application;

Figure 6 is a schematic diagram of the first uploading process of extracting music provided by the embodiment of the present application;

Figure 7 is a schematic diagram of another music interface provided by an embodiment of the present application;

Figure 8 is a schematic diagram of a music matching interface for music to be matched provided by an embodiment of the present application;

Figure 9 is a schematic diagram of another music interface provided by an embodiment of the present application;

Figure 10 is a schematic diagram of the audio and video interface provided by the embodiment of the present application;

Figure 11 is a schematic diagram of another audio and video interface provided by an embodiment of the present application;

Figure 12 is a schematic diagram of the process of separating tracks of music to be matched provided by an embodiment of the present application;

Figure 13 is a schematic diagram of the waveforms of different musical instruments provided by the embodiment of the present application;

Figure 14 is a schematic diagram of the sound waveform provided by the embodiment of the present application;

Figure 15 is a schematic diagram of another process of track separation of music to be matched provided by an embodiment of the present application;

Figure 16 is a schematic diagram of target drum track data provided by an embodiment of the present application;

Figure 17 is a schematic diagram of the playback process provided by the embodiment of the present application;

Figure 18 is a schematic flow chart of another music matching method provided by an embodiment of the present application;

Figure 19 is a schematic diagram of the music matching interface provided by the embodiment of the present application;

Figure 20 is a schematic diagram of the process of obtaining a video to be matched provided by an embodiment of the present application;

Figure 21 is a schematic structural diagram of a music matching device provided by an embodiment of the present application;

Figure 22 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without making creative efforts fall within the scope of protection of this application.

“Multiple” in the embodiments of this application refers to two or more than two. “First”, “second”, etc. in the embodiments of this application are used to differentiate the description and should not be understood as implying relative importance.

Before describing the embodiments of the present application in detail, the nouns and terms involved in the embodiments of the present application will be described. The nouns and terms involved in the embodiments of the present application are subject to the following explanations.

1) Client, an application running in the terminal to provide various services, such as instant messaging client and video playback client.

2) In response to, used to represent the condition or state on which a performed operation depends. When the condition or state is met, the one or more operations may be performed in real time or with a set delay; unless otherwise specified, there is no restriction on the execution order of multiple operations performed.

Based on the above explanation of the nouns and terms involved in the embodiments of the present application, the following describes the music matching system provided by the embodiments of the present application. In practical applications, the music matching method provided by the embodiments of the present application can be implemented by the terminal or the server alone, or by the terminal and the server collaboratively. Taking collaborative implementation by the terminal and the server as an example, see Figure 1A, which is a schematic diagram of the architecture of the music matching system 100 provided by an embodiment of the present application. To support an exemplary application, a terminal (terminal 400 is shown as an example) is connected to the server 200 through the network 300. The network 300 can be a wide area network, a local area network, or a combination of the two, and data transmission is achieved using wireless or wired links.

Terminal 400 is configured to display a music matching interface for music to be matched, and the music matching interface includes matching controls;

And, in response to the triggering operation of the matching control, send a matching request carrying the music to be matched to the server 200;

The server 200 is configured to obtain the target audio track of the music to be matched, perform video matching based on the target audio track to obtain at least one target video that matches the target audio track, and return the audio track information of the music to be matched and the at least one target video matching the target audio track to the terminal 400;

The terminal 400 is also configured to display an audio and video interface, which includes: a target audio track of the music to be matched, and at least one target video matching the target audio track.
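The division of work between terminal 400 and server 200 described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the class names, the `video_index` lookup table, and the placeholder `separate_tracks` helper are all assumptions introduced here, and the real track separation and video matching are not specified at this point in the text.

```python
from dataclasses import dataclass


@dataclass
class MatchResponse:
    """What the server returns to the terminal (field names are hypothetical)."""
    target_tracks: dict
    target_videos: list


class Server:
    """Stand-in for server 200; both steps are placeholders for unspecified algorithms."""

    def __init__(self, video_index):
        # Hypothetical index: video id -> set of track names the video matches.
        self.video_index = video_index

    def separate_tracks(self, music):
        # Placeholder separation: the toy "music" already carries per-track data,
        # so only non-empty tracks are kept.
        return {name: data for name, data in music.items() if data}

    def handle_match_request(self, music):
        tracks = self.separate_tracks(music)
        videos = sorted(
            vid for vid, names in self.video_index.items() if names & set(tracks)
        )
        return MatchResponse(tracks, videos)


class Terminal:
    """Stand-in for terminal 400."""

    def __init__(self, server):
        self.server = server

    def on_matching_control_triggered(self, music):
        # Send the matching request, then describe the audio and video interface.
        response = self.server.handle_match_request(music)
        return {
            "tracks": list(response.target_tracks),
            "videos": response.target_videos,
        }
```

When the method is implemented by the terminal alone, the same two steps would simply run locally instead of behind a request.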

The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (Content Delivery Network, CDN) services, and big data and artificial intelligence platforms.

Moreover, multiple servers can be composed into a blockchain, and the servers are nodes on the blockchain.

The terminal can be a smartphone, tablet, laptop, desktop computer, smart speaker, smart watch, etc., but is not limited to this. The terminal and the server can be connected directly or indirectly through wired or wireless communication methods, which is not limited in this application.

In some embodiments, the music matching method provided by the embodiments of the present application can also be implemented by a terminal alone. As shown in Figure 1, the terminal can display a music matching interface for music to be matched, the music matching interface including a matching control, and, in response to a triggering operation on the matching control, display the audio and video interface, which includes the target audio track of the music to be matched and the target video matching the target audio track.

Based on the above description of the music matching system and music matching scenarios, the music matching method, device, electronic device, computer readable storage medium and computer program product of the embodiments of the present application will be described in detail. It should be noted that the order of description of the following embodiments does not limit the preferred order of the embodiments.

In this embodiment, description will be made from the perspective of a music matching device. The music matching device can be integrated in an electronic device such as a server or terminal. In order to facilitate the description of the music matching method of the present application, the detailed description will be given below with the music matching device integrated in the terminal, that is, with the terminal as the execution subject.

Please refer to Figure 2, which is a schematic flow chart of a music matching method provided by an embodiment of the present application. The music matching method may include:

S201. The terminal displays a music matching interface for the music to be matched, and the music matching interface includes matching controls.

In actual applications, the terminal is equipped with a client, which can be a music matching client dedicated to music matching, or another client with a music matching function, such as a video playback client, a live broadcast client, or an instant messaging client. When the terminal receives the client's startup instruction, it starts the client and displays the client's interface. What is displayed may be the homepage of the client, the music interface of the client, or another interface of the client.

When what is displayed is not the music interface of the client, the terminal displays the music interface in response to the target object's triggering operation on the control of the music page.

The music interface may include a music identification of at least one piece of music. In response to a first selection operation by the target object on a music identification, the terminal displays a music matching interface of the first music identification corresponding to the first selection operation. The music matching interface includes a matching control, and the first music corresponding to the first music identification is the music to be matched.

For example, the music interface may be as shown in Figure 3, and the terminal displays the music matching interface of the first music identification in response to the target object's selection operation on the first music identification.

In practical applications, the music interface may also include a search control, and the terminal may display a search result interface in response to the target object's input operation on the search control. The search result interface includes a music identification of the music result corresponding to the input operation.

It should be understood that the client can exist in the form of an application program, a web page, or a small program. As for the existence form of the client, the user can choose according to the actual situation, which is not limited in this embodiment.

It should be noted that the music matching interface of the music to be matched can be a sub-interface of the music interface, that is, the music matching interface of the music to be matched is displayed in a certain area of the music interface, for example, as shown in Figure 4. Alternatively, the music matching interface can be a separate interface, and the terminal jumps to the separate music matching interface in response to the target object's selection operation on the music identification. This embodiment is not limited here.

When the music interface includes a music identification, the music corresponding to the music identification is music that already exists in the client. That is, the music to be matched is music that already exists in the client, or in other words music that already exists in the server corresponding to the client.

In some embodiments, the music to be matched may also be extracted music uploaded by the target object. In this case, the music interface may also include an extraction control. The terminal displays a music selection interface in response to a triggering operation on the extraction control. The music selection interface includes extracted music or extracted videos.

In response to the target object's initial selection operation on extracted music or an extracted video, the terminal uploads the first extracted music or the first extracted video corresponding to the initial selection operation to the server corresponding to the client. If the initial selection operation corresponds to a video, the terminal can first extract the music from the first extracted video to obtain the first extracted music, and then upload the first extracted music to the server corresponding to the client. When the upload succeeds, the terminal displays the music extraction interface. The music extraction interface includes the music matching interface of the first extracted music; that is, the music matching interface of the music to be matched is, in this case, a sub-interface of the music extraction interface.

The music extraction interface may also include an extraction control, so that the terminal can continue to display the music selection interface in response to the target object's triggering operation on the extraction control. For example, the music extraction interface can be shown in Figure 5.

Referring to Figure 6, Figure 6 is a schematic diagram of the uploading process of the first extracted music provided by the embodiment of the present application. The process of the terminal uploading the first extracted music to the server corresponding to the client may include:

Step 601: The terminal sends the first extracted music and permission package to the upload center through the client;

Step 602: The upload center unpacks the permission package through the business-side upload module to obtain the client's permission information;

Step 603: The upload center then sends the permission information to the login center through the business-side upload module;

Step 604: The login center verifies the client's permissions based on the permission information, and returns the verification result to the upload center;

Step 605: If the verification result is successful, the upload center generates the file identifier of the first extracted music through the business side upload module, and sends the first extracted music and file identifier to the cloud database for storage;

Step 606: The upload center returns the file identifier and the storage address of the first extracted music in the cloud database to the terminal;

Step 607: The terminal plays the first extracted music.
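Steps 601 to 606 can be sketched as below. This is only an illustration under stated assumptions: the `UploadCenter` and `LoginCenter` class names, the dictionary used as the permission package, the in-memory dictionary standing in for the cloud database, and the `cloud://` address scheme are all hypothetical and do not come from the application.

```python
import uuid


class LoginCenter:
    """Stand-in for the login center that verifies client permissions."""

    def __init__(self, authorized_clients):
        self.authorized_clients = set(authorized_clients)

    def verify(self, permission_info):
        # Step 604: verify the client's permissions from the permission information.
        return permission_info.get("client_id") in self.authorized_clients


class UploadCenter:
    """Stand-in for the upload center and its business-side upload module."""

    def __init__(self, login_center, cloud_db):
        self.login_center = login_center
        self.cloud_db = cloud_db  # hypothetical: a dict standing in for the cloud database

    def upload(self, music, permission_package):
        # Step 602: unpack the permission package (the package format is an assumption).
        permission_info = dict(permission_package)
        # Steps 603-604: forward the permission information and get the verification result.
        if not self.login_center.verify(permission_info):
            return None
        # Step 605: generate a file identifier and store the music under it.
        file_id = uuid.uuid4().hex
        self.cloud_db[file_id] = music
        # Step 606: return the file identifier and a (hypothetical) storage address.
        return file_id, "cloud://" + file_id
```

With the returned storage address, the terminal could then fetch and play the first extracted music as in step 607.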

If the music matching interface of the music to be matched is a sub-interface of the music interface, or a sub-interface of the music extraction interface, the terminal can display the music matching interfaces of multiple pieces of music to be matched in the music interface or the music extraction interface.

That is, after the terminal displays the music matching interface of the music to be matched, because the music interface also includes music identifications, the terminal can also, in response to the target object's second selection operation on a music identification in the music interface, display the music matching interface of the second music identification corresponding to the second selection operation. The second music corresponding to the second music identification is also music to be matched. The second music identification is a music identification in the music interface that does not yet have a corresponding music matching interface.

In other words, the target object can select multiple music identifications in the music interface, each corresponding to a piece of music, so that the music matching interfaces corresponding to the multiple music identifications are displayed in the music interface.

Alternatively, because the music extraction interface also includes an extraction control, the terminal can display the music selection interface again in response to the target object's triggering operation on the extraction control, and then continue to display the music extraction interface in response to the target object's target selection operation on extracted music or an extracted video. The music extraction interface then includes the music matching interface of the first extracted music and the music matching interface of the second extracted music corresponding to the target selection operation.

In addition, the music matching interface of the first music may also include an audition area, in which a playback progress bar of the music being auditioned and an adjustment control for adjusting the playback progress may be displayed. Based on the audition area, the user can audition the music. When the terminal responds to the second selection operation on a music identification and selects the second music, the music matching interface of the first music may no longer display the audition area, but it may still include the matching control.

Alternatively, the music matching interface of the first extracted music can also include an audition area, in which a playback progress bar of the music being auditioned and an adjustment control for adjusting the playback progress can be displayed. Based on the audition area, the user can audition the music. When the terminal responds to a target selection operation on extracted music or an extracted video, the audition area may no longer be displayed on the music matching interface of the first extracted music, but that interface may still include the matching control.

For example, take the case where the music matching interface of the music to be matched is displayed on the music interface. In response to the target object's first selection operation on a music identification, the terminal displays, on the music interface, the music matching interface of the first music identification corresponding to the first selection operation. At this time, the music interface may be as shown in 701 in Figure 7. In response to the target object's second selection operation on a music identification, the terminal displays the music matching interface of the second music identification corresponding to the second selection operation, and no longer displays the audition area on the music matching interface of the first music. At this time, the music matching interface may be as shown in 702 in Figure 7.
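The selection behaviour in this example can be modelled as a small piece of interface state. The sketch below is an assumption-laden illustration: the `MusicInterface` class, its attribute names, and the dictionary returned by `render` are hypothetical, and the rule that only the most recently selected music shows the audition area is one reading of the "may not display" wording above.

```python
class MusicInterface:
    """Toy model of a music interface hosting several music matching sub-interfaces."""

    def __init__(self, music_ids):
        self.music_ids = list(music_ids)
        self.matching_interfaces = []  # music ids in the order they were selected
        self.audition_owner = None     # music whose audition area is currently shown

    def select(self, music_id):
        if music_id not in self.music_ids:
            raise KeyError(music_id)
        if music_id not in self.matching_interfaces:
            self.matching_interfaces.append(music_id)
        # Assumption: the newest selection takes over the audition area.
        self.audition_owner = music_id

    def render(self):
        # Every matching interface keeps its matching control;
        # only one of them shows the audition area.
        return [
            {
                "music": music_id,
                "matching_control": True,
                "audition_area": music_id == self.audition_owner,
            }
            for music_id in self.matching_interfaces
        ]
```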

It should be noted that the terminal can display more music matching interfaces corresponding to music. The implementation method may refer to the foregoing embodiments, and this embodiment will not be described in detail here.

In other embodiments, the music matching interface may also include shooting controls. The terminal may display a shooting interface of the music to be matched in response to the first triggering operation of the shooting control by the target object, so that the terminal can shoot the video according to the music to be matched.

In other embodiments, the music matching interface may also include a collection control for collecting music. The terminal can display a collection page in response to the target object's second triggering operation on the collection control. The collection page includes the music to be matched. This facilitates the user's search for the collected music and improves the efficiency of music search.

The process by which the terminal displays the collection page in response to the target object's second triggering operation on the collection control may be as follows. In response to the second triggering operation, the terminal verifies the login status of the target account corresponding to the target object. If the target account is logged in, the terminal displays the collection page, which includes the music to be matched. If the target account is not logged in, the terminal displays a login interface that includes a login control, and displays the collection page in response to the target object's confirmation operation on the login control. In this way, the music to be matched is collected only when the target account of the target object is logged in, which ties each piece of collected music to a specific account and allows the target object to view the music he or she has collected on the collection page under the target account.
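The login check that guards the collection page can be sketched as a single function. The account dictionary, its keys, and the `confirm_login` callback (standing for the target object's confirmation operation on the login control) are hypothetical names chosen for this illustration.

```python
def handle_collection_trigger(account, music_id, confirm_login):
    """Return the name of the interface shown after the collection control is triggered."""
    if not account.get("logged_in"):
        # Not logged in: the login interface is displayed first. The callback
        # returns True once the target object confirms on the login control.
        if not confirm_login():
            return "login_interface"
        account["logged_in"] = True
    # Logged in: collect the music under this account and show the collection page.
    collection = account.setdefault("collection", [])
    if music_id not in collection:
        collection.append(music_id)
    return "collection_page"
```

Because nothing is added to the collection before the login check passes, every collected item belongs to a known account.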

For example, the music matching interface can be as shown in Figure 8. The music matching interface includes a playback control, a shooting control, a collection control, a matching control, and an audition area. It should be understood that the terminal can play the music to be matched when it receives a play instruction, and display the audition area on the music matching interface of the music to be matched. When the terminal detects that the music to be matched is paused, the music matching interface of the music to be matched need not display the audition area.

It should be noted that if the music matching interface is a sub-interface and includes a playback control, a shooting control, a collection control, a matching control, and an audition area, then when the terminal responds to the second selection operation, the music matching interface of the first music can include the playback control, the shooting control, the collection control, and the matching control. Alternatively, when the terminal responds to the target selection operation, the music matching interface of the first extracted music may also include the playback control, the shooting control, the collection control, and the matching control.

For example, when the music matching interface is a sub-interface of the music interface, the music matching interface of the first music can be as shown in Figure 9.

S202. In response to the triggering operation of the matching control, display the audio and video interface. The audio and video interface includes the target audio track of the music to be matched and at least one target video matching the target audio track.

After the terminal displays the music matching interface of the music to be matched, the target object can trigger the matching control, so that the terminal displays an audio and video interface in response to the triggering operation on the matching control. The audio and video interface includes the target audio track of the music to be matched and the target video matching the target audio track, where there is at least one target video. In some embodiments, the at least one target video may be presented in the form of a target video set.

Here, after the target object triggers the matching control, a matching instruction for the music to be matched is generated. The matching instruction instructs the acquisition of videos that match the music to be matched. In response to the triggering operation of the target object, that is, in response to the matching instruction, the terminal separates the audio tracks of the music to be matched to obtain the target audio track, and obtains the target video matching the target audio track based on the obtained target audio track. In actual implementation, the audio track separation and the target video acquisition can be performed by the terminal or by the server. There can be one or more target audio tracks, and each target audio track can correspond to one music attribute of the music to be matched. For example, the target audio track can be at least one of: a target vocal track corresponding to the vocal attribute of the music to be matched, a target drum track corresponding to the drum beat attribute, a target accompaniment track corresponding to the accompaniment attribute, a target bass track corresponding to the bass attribute, and a target sound effect track corresponding to the sound effect attribute.

In practical applications, the audio and video interface may include a first display area and a second display area. In response to the triggering operation of the matching control, the terminal displays the target audio track of the music to be matched in the first display area and displays the target video matching the target audio track in the second display area according to the preset display order.

For example, when the target audio track includes the target vocal track, the target drum track, the target accompaniment track, and the target bass track, the audio and video interface can be as shown in Figure 10. In actual applications, when there are multiple target audio tracks, the display order of the target audio tracks can correspond to the importance of the music attributes corresponding to the target audio tracks, and the importance of the music attributes can be set by the user according to the user's own needs.
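The layout rule above, with tracks in a first display area ordered by user-set importance and videos in a second display area, can be sketched as follows. The `TrackType` enum, the numeric importance scores, and the dictionary layout are assumptions made for this illustration; the text only says that the order corresponds to importance.

```python
from enum import Enum


class TrackType(Enum):
    """Music attributes named in the text, one per possible target audio track."""
    VOCAL = "vocal"
    DRUM = "drum"
    ACCOMPANIMENT = "accompaniment"
    BASS = "bass"
    SOUND_EFFECT = "sound_effect"


def build_av_interface(tracks, videos, importance):
    """Order tracks by importance (highest first) and place them with the videos."""
    # Tracks without a user-set importance default to 0; sorted() is stable,
    # so such tracks keep their original relative order.
    ordered = sorted(tracks, key=lambda t: importance.get(t, 0), reverse=True)
    return {"first_display_area": ordered, "second_display_area": list(videos)}
```

For instance, a user who cares most about rhythm could give `TrackType.DRUM` the highest score so that the drum track is listed first.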

That is, pitch separation is performed on the target vocal data of the music to be matched (spanning 120 pitches) to obtain the data for each pitch, that is, the target vocal data corresponding to the target vocal track is separated. The pitch data is then normalized so that it is displayed on the audio and video interface as a 24-level pitch map. This pitch map is the target vocal track, which shows how the music to be matched changes as the pitch of the vocals rises and falls.
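One way to reduce 120 pitches to 24 display levels is a proportional mapping, so that every 5 adjacent pitches share a level. The text does not say which normalization is used, so the linear mapping below, along with the function names and the zero-based pitch index, is an assumption.

```python
def normalize_pitch(pitch_index, source_pitches=120, levels=24):
    """Map a zero-based pitch index onto one of `levels` zero-based display levels."""
    if not 0 <= pitch_index < source_pitches:
        raise ValueError("pitch index out of range")
    # Proportional mapping with integer arithmetic: 120 pitches to 24 levels
    # puts 5 adjacent pitches on each level.
    return pitch_index * levels // source_pitches


def vocal_track_levels(pitch_indices):
    """Normalize a sequence of per-frame pitch indices for display."""
    return [normalize_pitch(p) for p in pitch_indices]
```

Drawing one level per time step then gives the stepped outline that rises and falls with the vocals.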

The target drum track data corresponding to the target drum track includes target drum track data of a heavy drum type and target drum track data of a light drum type. In some embodiments, when the terminal displays the target drum track of the music to be matched on the audio and video interface, the two types of data can be displayed differently in the target drum track, that is, heavy drum beats and light drum beats are drawn differently. For example, the drum beats corresponding to the heavy-drum-type data can be drawn using one graphic (such as a big blue circle), and the drum beats corresponding to the light-drum-type data can be drawn using another graphic (such as a small green circle). The drum beats corresponding to each piece of target drum track data are drawn on the target drum track in the time order in which the data appears. Moreover, the terminal can use CALayer technology when drawing drum beats. Compared with UIView technology, CALayer technology can improve rendering performance.
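The drawing rule for drum beats can be expressed as a platform-neutral layout step that produces one shape description per beat. The sketch below does not use CALayer; the `DrumBeat` type, the radius values, and the pixels-per-second scale are hypothetical, and only the blue and green circle example comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrumBeat:
    time_s: float  # when the beat occurs in the music, in seconds
    kind: str      # "heavy" or "light"


# Heavy beats as big blue circles, light beats as small green circles.
# The radii are illustrative values, not taken from the text.
STYLES = {
    "heavy": {"color": "blue", "radius": 8},
    "light": {"color": "green", "radius": 4},
}


def layout_drum_track(beats, px_per_second=50.0):
    """Return shape descriptions for the drum track, in time order."""
    shapes = []
    for beat in sorted(beats, key=lambda b: b.time_s):
        shapes.append({"x": beat.time_s * px_per_second, **STYLES[beat.kind]})
    return shapes
```

On iOS, each resulting description could be turned into one layer; that mapping is left out here.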

In addition, the terminal can dynamically enlarge and display the drum beat reached by the playback progress bar (that is, the drum beat corresponding to the current playback position in the progress bar), so that the target object can more clearly understand the rhythm of the drum beat reached by the playback progress bar.

In some embodiments, when the terminal displays the target accompaniment track of the music to be matched on the audio and video interface, the pitch of the accompaniment can be displayed in the target accompaniment track. That is, the pitch of the target accompaniment data of the music to be matched is drawn on the target accompaniment track, so that the target object can visually understand the ups and downs of the accompaniment of the music to be matched.

Because bass is low-frequency audio, it is difficult for the target object to perceive the bass. Therefore, this embodiment draws the presence or absence of the target bass data of the music to be matched, which visualizes the bass data and makes it easier for the target object to understand the composition of the music to be matched.

Displaying the target track of the music to be matched on the audio and video interface can not only present the information of the music to be matched more accurately, but also allow the target object to see the music to be matched while hearing it. This makes the music to be matched easier to understand, including for non-professional target objects.

In some embodiments, after the terminal displays at least one target video on the audio and video interface, the target object can select one of the at least one target video. For example, when there are multiple target videos and the multiple target videos constitute a target video collection, the target object can select a target video in the target video collection, and the terminal, in response to the selection operation of the target object, plays the target video in the target video collection that corresponds to the selection operation.

In actual applications, the terminal can play the target video corresponding to the selection operation with an enlarged animation effect. Alternatively, the second display area includes a first sub-display area and a second sub-display area, and the target video corresponding to the selection operation is played in the second sub-display area. In this case, displaying the target video matching the target audio track in the second display area includes:

The target video matching the target audio track is displayed in the first sub-display area; the playback video is displayed in the second sub-display area, and the playback video is the target video in the selected state in the first sub-display area. In practical applications, multiple target videos can be displayed in the first sub-display area, and one of them is in the selected state. The selected target video is played in the second sub-display area. When the user switches which target video is in the selected state, the target video played in the second sub-display area is switched synchronously. In this way, the user can browse the content of each target video in the second sub-display area by switching the selected target video in the first sub-display area.

The target video in the selected state is the target video corresponding to the selected operation.

For example, the audio and video interface can be as shown in Figure 11. At this time, the target video 1 is a playback video, and the terminal displays the target video 1 in the second sub-display area.

It should be noted that Figure 10 and Figure 11 are only examples of the audio and video interface. In the process of actual application, the audio and video interface can also be in other forms.

If there are multiple target videos, displaying the set of target videos matching the target audio track in the first sub-display area includes:

Obtain the playback volume (that is, the number of times the video has been played) of each target video in the target video collection that matches the target audio track;

According to the playback volume, display the target videos in the target video collection in the first sub-display area in order.

Here, the target videos may be displayed in the second display area in descending order of playback volume.

Alternatively, you can also obtain the release time of the target video, and then display the target video in the second display area in the order of the release time of the target video.

In addition, if the preset number of target videos that can be displayed in the second display area is less than the number of target videos matching the target audio track, the terminal can first display the preset number of target videos in the second display area, and then, in response to a sliding operation of the target object, display the target videos that have not yet been displayed in the second display area.
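The ordering and paging behaviour described above can be sketched as follows. This is only an illustration: the `TargetVideo` fields, the function names, the sample data, and the choice of newest-first for release time are assumptions, and the play count stands for what the text calls the playback volume of a video.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class TargetVideo:
    video_id: str       # hypothetical identifier
    play_count: int     # number of plays of the video
    release_date: date  # release time of the video


def order_by_play_count(videos: List[TargetVideo]) -> List[TargetVideo]:
    """Largest play count first, as described for the display order."""
    return sorted(videos, key=lambda v: v.play_count, reverse=True)


def order_by_release(videos: List[TargetVideo], newest_first: bool = True) -> List[TargetVideo]:
    """Order by release time; the direction is an assumption, not stated in the text."""
    return sorted(videos, key=lambda v: v.release_date, reverse=newest_first)


def visible_videos(ordered: List[TargetVideo], preset_number: int, offset: int = 0) -> List[TargetVideo]:
    """Return the slice currently shown; a sliding operation would increase `offset`."""
    return ordered[offset:offset + preset_number]


videos = [
    TargetVideo("v1", 120, date(2023, 1, 5)),
    TargetVideo("v2", 900, date(2022, 11, 2)),
    TargetVideo("v3", 450, date(2023, 3, 1)),
]
by_plays = order_by_play_count(videos)
first_screen = visible_videos(by_plays, preset_number=2)
after_slide = visible_videos(by_plays, preset_number=2, offset=2)
```

Here the display area holds two videos, so the third one only becomes visible after the offset is advanced by a sliding operation.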

In practical applications, the audio and video interface can also include adjustment controls for the target audio track. After the audio and video interface is displayed in response to the triggering operation on the matching control, the method further includes:

In response to the triggering operation on the adjustment control, obtain the current playback volume of the audio file of the adjustment target track corresponding to the adjustment control;

When the current playback volume exceeds the mute volume, adjust the current playback volume to the mute volume, and add a mask layer to the adjustment target audio track to hide the adjustment target audio track in the audio and video interface and obtain the adjusted music.

After the tracks of the music to be matched are separated and each target track is obtained, the data of each target track is equivalent to a separate audio file in the m4a format. In response to the triggering operation on the adjustment control, the terminal can play, or stop playing, the audio file of the target track.

Therefore, when the terminal responds to the triggering operation on the adjustment control, it obtains the current playback volume of the audio file of the adjustment target track corresponding to the adjustment control, and stores the current playback volume as the historical playback volume. When the current playback volume exceeds the mute volume, the terminal adjusts the current playback volume to the mute volume to obtain the adjusted music, and adds a mask layer to the adjustment target audio track to hide the adjustment target audio track in the audio and video interface.

When the current playback volume does not exceed the mute volume, adjust the playback volume of the audio file corresponding to the adjustment target audio track to the historical playback volume, and remove the mask layer on the adjustment target audio track to display the adjustment target audio track on the audio and video interface.

That is, after the terminal has adjusted the current playback volume to the mute volume, obtained the adjusted music, and added a mask layer to hide the adjustment target audio track in the audio and video interface, it can also respond to a further triggering operation on the adjustment control. In that case it removes the mask layer on the adjustment target audio track so that the track is displayed on the audio and video interface again, and restores the playback volume of the corresponding audio file to the playback volume stored before muting. This realizes both playback volume recovery and visual recovery of the adjustment target audio track.

The mute volume can be 0 or other volume thresholds. The user can set it according to the actual situation, which is not limited in this embodiment.
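The mute and restore behaviour of an adjustment control can be sketched as a small state holder. The class `TrackControl`, its field names, and the volume scale from 0.0 to 1.0 are hypothetical. The sketch stores the historical volume only at the moment of muting, which is one possible reading of the description.

```python
from dataclasses import dataclass

MUTE_VOLUME = 0.0  # the text allows 0 or another user-defined threshold


@dataclass
class TrackControl:
    name: str
    volume: float = 1.0          # current playback volume of the track's audio file
    history_volume: float = 1.0  # volume remembered before muting
    masked: bool = False         # True when a mask layer hides the track

    def toggle(self, mute_volume: float = MUTE_VOLUME) -> None:
        """Handle one trigger operation on the adjustment control."""
        if self.volume > mute_volume:
            # Audible: remember the volume, mute, and hide the track.
            self.history_volume = self.volume
            self.volume = mute_volume
            self.masked = True
        else:
            # Already muted: restore the remembered volume and show the track.
            self.volume = self.history_volume
            self.masked = False


vocal = TrackControl("vocal", volume=0.8)
vocal.toggle()
muted_state = (vocal.volume, vocal.masked)
vocal.toggle()
restored_state = (vocal.volume, vocal.masked)
```

Because each track has its own control object, muting one track leaves the volume and visibility of the others untouched.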

The adjustment control can be an identification of the target audio track, or the adjustment control can also be an additionally set control. Each target audio track has a corresponding adjustment control, so that the terminal can respond to a trigger operation on the adjustment control and mute the single target audio track corresponding to the trigger operation. Moreover, when the terminal adjusts the current playback volume of the target audio track to a mute volume, the terminal can hide the target audio track, other target audio tracks can still be displayed normally, and the audio files of other target audio tracks can still be played normally.

In addition, the terminal can adjust the current playback volume of multiple target audio tracks to the mute volume, so that only the audio file of one target audio track is ultimately played. The target object can then hear the sound effect of a single target audio track in the music to be matched, which helps the target object understand the music to be matched layer by layer.

In some embodiments, after the current playback volume is adjusted to the mute volume when it exceeds the mute volume, and a mask layer is added to the adjustment target audio track to hide it in the audio and video interface and obtain the adjusted music, the method further includes:

Based on the adjusted music, update the target videos in the target video collection to obtain the updated video collection;

Display the updated video collection on the audio and video interface.

Because the audio tracks of the adjusted music, obtained after a mask layer is added to the adjustment target audio track, differ from the target audio tracks of the music to be matched, the videos matching the audio tracks of the adjusted music may also change. Therefore, after obtaining the adjusted music, the terminal can update the target videos in the target video collection based on the adjusted music to obtain the updated video collection, and then display the updated video collection on the audio and video interface, so that the music and the videos displayed on the audio and video interface remain consistent.

Here, updating the target videos in the target video collection based on the adjusted music to obtain the updated video collection includes: if the adjustment target audio track is a preset target audio track, determining the target pattern string corresponding to the adjusted music according to the audio tracks of the adjusted music; and updating the target videos in the target video collection according to the target pattern string to obtain the updated video collection.

The preset target audio track refers to a target audio track used for video matching. For example, if the video to be matched whose initial music matches the target drum track of the music to be matched is used as the target video, then the target drum track is the preset target audio track.

If the adjustment target audio track is a target audio track used for video matching, the audio tracks of the adjusted music no longer include it, so the videos obtained by matching based on the audio tracks of the adjusted music differ from the target videos in the target video collection. The target video collection is therefore updated.

If the adjustment target audio track is not a preset target audio track, there is no need to update the target videos in the target video collection.

For example, the target audio tracks are the target drum track, the target vocal track, and the target bass track of the music to be matched, and the adjustment target audio track is the target vocal track, that is, the adjusted music does not include the target vocal track. The target drum track and the target bass track are the target audio tracks used for video matching, that is, the video to be matched whose initial music matches both the target drum track and the target bass track is used as the target video. Since the adjusted music still includes the target drum track and the target bass track, matching again with these two tracks would yield the same video set as the target video set. Therefore, there is no need to update the target video set.
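The decision in this example can be reduced to a membership check. The function name `needs_video_update` and the track labels are hypothetical; the sketch only captures the rule that muting a track triggers re-matching when that track is one of the preset tracks used for video matching.

```python
from typing import Iterable


def needs_video_update(adjusted_track: str, matching_tracks: Iterable[str]) -> bool:
    """Return True when the muted track is one of the tracks used for video matching."""
    return adjusted_track in set(matching_tracks)


# Tracks used for video matching in the example above: drum and bass.
preset_tracks = ["drum", "bass"]
mute_vocal = needs_video_update("vocal", preset_tracks)
mute_drum = needs_video_update("drum", preset_tracks)
```

Muting the vocal track leaves the matching inputs unchanged, so the target video set is kept, while muting the drum track would require a new target pattern string and an updated video set.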

For the process of determining the target pattern string corresponding to the adjusted music according to the audio tracks of the adjusted music, refer to the process of determining the pattern string of the music to be matched. For the process of obtaining the updated video collection according to the target pattern string, refer to the process of determining the target video collection. These processes are not described again in this embodiment.

In summary, when the adjustment target audio track is a preset target audio track, the audio tracks used for matching change once the mask layer is added, and so do the matching videos. After obtaining the adjusted music, the terminal can therefore update the target videos in the target video collection to obtain the updated video collection and display it on the audio and video interface, so that the music and the videos displayed on the audio and video interface remain consistent.

In some embodiments, in response to a triggering operation on the matching control, the process of displaying the audio and video interface may be:

In response to the triggering operation of the matching control, track separation is performed on the music to be matched, and the target track data corresponding to the music to be matched is obtained;

Format the target audio track data and obtain the pattern string corresponding to the target audio track data;

According to the pattern string, determine the target video that matches the target audio track, and the target audio track is the audio track corresponding to the target audio track data;

Display the audio and video interface.

Referring to Figure 12, Figure 12 is a schematic diagram of the process of track separation of music to be matched provided by an embodiment of the present application. The process of track separation of music to be matched and obtaining the target track corresponding to the music to be matched may include:

Step 1: In response to the triggering operation of the matching control, the terminal sends the file identification of the music to be matched and the target account of the target object to the matching server.

Step 2: The matching server verifies the login status of the target account.

Step 3: If the login status of the target account is logged in and the file identifier exists in the cloud database, the matching server searches for the audio track separation pipeline of the file identifier from the cache.

Step 4: If the file identifier has not been matched to the video, the matching server creates the audio track separation pipeline for the file identifier, and then stores the audio track separation pipeline in the cache.

Step 5: The matching server sends the file identification to the audio server.

Step 6: The audio server creates an audio track separation task corresponding to the file identification, and runs the audio track separation task to separate the audio tracks of the music to be matched corresponding to the file identification. At the same time, the identification of the audio track separation task is returned to the matching server.

Step 7: The matching server stores the identifier of the track separation task in the cache.

Step 8: When the audio server completes the separation of the audio tracks of the music to be matched, the audio server then sends the target audio track data of the target audio track to the matching server and the cloud database.

Step 9: The matching server then sends the target audio track data of the target audio track to the terminal.

Step 10: The terminal draws the target audio track based on the target audio track data.

Moreover, the audio server's process of separating the tracks of the music to be matched includes:

Step 61: Create a corresponding step sub-pipeline for each step, that is, when a step is run, a step sub-pipeline corresponding to that step is created.

Step 62: Send the step sub-pipeline to the matching server.

Step 63: The matching server then sends the step sub-pipeline and the audio track separation pipeline to the wormhole.

In response to the triggering operation of the matching control, the terminal sends the file identification of the music to be matched to the pipeline server, so that the pipeline server can create a pipeline task corresponding to the file identification. Then the following steps are performed.

Step 64: The pipeline server runs the pipeline task and sends a pipeline acquisition request to the wormhole (the wormhole refers to the channel connecting the pipeline server and the matching server).

Step 65: Based on the pipeline acquisition request, the wormhole sends the audio track separation pipeline and the step sub-pipeline to the pipeline server. The pipeline server then sends the audio track separation pipeline and the step sub-pipeline to the pipeline database, and ends the pipeline task when the matching of the music to be matched is completed.

The track separation of the music to be matched can be performed through a trained neural network model or an independent component analysis algorithm. The vibration of a sound source does not produce a sound wave of a single frequency, but a composite sound composed of a fundamental tone and overtones of different frequencies. For example, Figure 13 shows the waveforms of different musical instruments, and it can also be seen from Figure 14 that a sound waveform is composed of different waveforms.

Therefore, the tracks of the music to be matched can be separated to obtain the waveform of each target track, and the target track data of each target track is then determined based on the amplitude and frequency of each waveform. Moreover, when performing audio track separation, a Fourier transform can first be applied to the music to be matched to obtain its matrix in the frequency domain, and the matrix is then divided to obtain the sub-matrix of each target audio track. The sub-matrix of a target audio track represents the waveform of that target audio track.
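The idea of transforming to the frequency domain and then dividing the result can be illustrated with a deliberately simplified sketch. It is not the separation method of this embodiment, which relies on a trained neural network model or independent component analysis. The sketch assumes a single short frame instead of a full time-frequency matrix, a naive discrete Fourier transform, and a fixed split at a hypothetical bin boundary. All function names are illustrative.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def split_bins(spectrum, boundaries):
    """Divide the positive-frequency bins into consecutive groups at the given bin indices."""
    half = spectrum[: len(spectrum) // 2]
    edges = [0, *boundaries, len(half)]
    return [half[a:b] for a, b in zip(edges, edges[1:])]


def peak_bin(band):
    """Index of the strongest bin inside one band."""
    return max(range(len(band)), key=lambda k: abs(band[k]))


# Toy frame: a strong low component (bin 2) plus a weaker high component (bin 10).
n = 32
signal = [
    math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 10 * t / n)
    for t in range(n)
]
low_band, high_band = split_bins(dft(signal), boundaries=[6])
```

In this toy case the low band contains the bin-2 component and the high band contains the bin-10 component, which mirrors how a sub-matrix would carry the content of one track.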

If the matching server finds the audio track separation pipeline of the file identification in the cache, it obtains the target audio track data of the target audio track of the file identification from the cloud database, and returns the target audio track data of the target audio track to the terminal (refer to Figure 15).

Figure 15 is a schematic diagram of another process of track separation of music to be matched provided by an embodiment of the present application. Referring to Figure 15, the process of track separation of music to be matched includes:

Step 151: In response to the triggering operation of the matching control, the terminal sends the file identification of the music to be matched and the target account of the target object to the matching server.

Step 152: The matching server verifies the login status of the target account.

Step 153: If the login status of the target account is logged in and the file identifier exists in the cloud database, the matching server searches for the audio track separation pipeline of the file identifier from the cache.

Step 154: If the audio track separation pipeline of the file identification exists in the cache, the matching server sends the file identification to the cloud database.

Step 155: The cloud database returns the target audio track data of the file identification to the matching server.

Step 156: The matching server returns the target audio track data of the file identification to the terminal.

Step 157: The terminal draws the target audio track according to the target audio track data.

A corresponding step sub-pipeline is created for each step in the process of separating the tracks of the music to be matched, so that when a problem occurs during track separation, the problematic step can be determined quickly, without separating the tracks of the music to be matched from scratch.
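The two flows of Figure 12 and Figure 15 differ only in whether the audio track separation pipeline is already in the cache. A minimal in-memory sketch of the matching server's decision is shown below. The class `MatchingServer`, its attributes, and the `separate_tracks` callable that stands in for the audio server are all hypothetical, and networking, task identifiers, and step sub-pipelines are omitted.

```python
class MatchingServer:
    def __init__(self, known_files, separate_tracks):
        self.known_files = set(known_files)     # file identifiers present in the cloud database
        self.track_store = {}                   # stand-in for track data kept in the cloud database
        self.pipeline_cache = {}                # file identifier -> separation pipeline record
        self.logged_in_accounts = set()
        self.separate_tracks = separate_tracks  # stand-in for the audio server

    def handle_match(self, file_id, account):
        # Verify the login status of the target account.
        if account not in self.logged_in_accounts:
            raise PermissionError("account is not logged in")
        if file_id not in self.known_files:
            raise KeyError(file_id)
        # Cache hit: the tracks were separated before, so read the stored data.
        if file_id in self.pipeline_cache:
            return self.track_store[file_id]
        # Cache miss: create the pipeline, run separation, and store the result.
        self.pipeline_cache[file_id] = {"file_id": file_id}
        track_data = self.separate_tracks(file_id)
        self.track_store[file_id] = track_data
        return track_data


separation_calls = []


def fake_audio_server(file_id):
    separation_calls.append(file_id)
    return {"drum": [], "vocal": []}


server = MatchingServer({"song-1"}, fake_audio_server)
server.logged_in_accounts.add("alice")
first = server.handle_match("song-1", "alice")
second = server.handle_match("song-1", "alice")
```

The second request is answered from stored data, so the expensive separation runs only once per file identifier.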

The target audio track data corresponding to the target audio track is in the form of a JSON list. For example, when the target audio track is the target drum track, the target drum track data can be as shown in Figure 16 (where SlowRhythm represents the heavy drum type, and PuckingDrum represents the light drum type). In order to speed up the matching, the terminal can format the target audio track data corresponding to the target audio track to obtain the pattern string corresponding to the target audio track, and then determine the target video matching the target audio track based on the pattern string.

Among them, according to the pattern string, the target video matching the target audio track is determined, including:

Obtain the video to be matched and obtain the main string of the initial music in each video to be matched;

Filter out the initial music corresponding to the main string matching the pattern string to obtain the target music;

The video to be matched corresponding to the target music is used as the target video matching the target audio track. When the number of target videos matching the target audio track is multiple, multiple target videos are constructed to obtain a target video set.

In this embodiment, the initial music in each video to be matched can first be separated into tracks to obtain the initial track data of each initial track, and each item of initial track data can then be formatted to obtain the main string corresponding to the initial track. After obtaining the pattern string, the terminal can match the pattern string with the main string, and then use the initial music corresponding to the main string that matches the pattern string as the target music. Finally, the terminal uses the video to be matched that contains the target music as the target video matching the target audio track.

It should be noted that when the music to be matched includes multiple target audio tracks and the pattern string is matched with the main string, the pattern string is matched with the main string of the same type of audio track. For example, when the target track is a drum track corresponding to the drum beat attribute of the music to be matched, the initial track is also a drum track, and the pattern string of the drum track of the music to be matched is matched with the main string of the drum track of the initial music.

For another example, when the target audio track is a vocal track corresponding to the vocal attribute of the music to be matched, the initial audio track is also a vocal track, and the pattern string of the target vocal track of the music to be matched is matched with the main string of the vocal track of the initial music.
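Same-type matching can be sketched with dictionaries keyed by track type. The function names, the track-type labels, and the sample strings are hypothetical, and substring containment is used here as the simplest stand-in for the matching step.

```python
from typing import Dict, List


def matches_same_type(pattern_by_type: Dict[str, str], main_by_type: Dict[str, str]) -> bool:
    """Each pattern string must be found in the main string of the same track type."""
    return all(
        track_type in main_by_type and pattern in main_by_type[track_type]
        for track_type, pattern in pattern_by_type.items()
    )


def find_target_videos(pattern_by_type: Dict[str, str],
                       videos: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the identifiers of videos whose initial music matches every pattern string."""
    return [vid for vid, mains in videos.items() if matches_same_type(pattern_by_type, mains)]


patterns = {"drum": "P520P520P"}
candidate_videos = {
    "video-a": {"drum": "S0P520P520P520P", "vocal": "S300P300S"},
    "video-b": {"drum": "S300P300S", "vocal": "P520P520P"},
}
targets = find_target_videos(patterns, candidate_videos)
```

In this sample, video-b contains the pattern only in its vocal main string, so it is not selected: a drum pattern string is compared with drum main strings only.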

If the music to be matched includes multiple target audio tracks, the pattern string of each target audio track can be matched with the main string of each initial audio track, or the pattern string of one of the target audio tracks can be matched with the main string of one of the initial audio tracks. Alternatively, the pattern strings of the target audio tracks can be fused to obtain a fused pattern string, the main strings of the initial audio tracks can be fused to obtain a fused main string, and the fused pattern string can then be matched with the fused main string. This is not limited in this embodiment.

When the pattern string of one of the target audio tracks is matched with the main string of one of the initial audio tracks, the target audio track can be the target drum track of the music to be matched, and the initial audio track can be the drum track of the target music. In this case, when the target track data includes the target drum track data of the music to be matched, formatting the target track data to obtain the pattern string corresponding to the target track data includes:

Sort the target drum track data according to the time corresponding to each target drum track data to obtain the first target drum sequence;

The target drum track data is formatted according to the first target drum sequence to obtain a pattern string corresponding to the target drum track, and the target drum track is a track corresponding to the target drum track data.

In order to obtain more comprehensive information about the music to be matched, in this embodiment, each target drum track data is sorted according to the time corresponding to each target drum track data, to obtain the first target drum sequence, and then according to the first target drum sequence Format the target drum track data to obtain the pattern string corresponding to the target drum track.

When sorting the target drum track data according to the time corresponding to each target drum track data, the target drum track data can be arranged in ascending order, or the target drum track data can be arranged in descending order. This embodiment is not limited here.

Among them, the target drum track data is formatted according to the first target drum sequence to obtain a pattern string corresponding to the target drum track, including:

Calculate the target time interval of the target drum track data pair in the first target drum sequence, where the target drum track data pair includes two adjacent target drum track data;

For each item of target drum track data included in the first target drum sequence, format the target drum track data according to the drum type of the target drum track data in the target drum track data pair that contains it, and the target time interval of that target drum track data pair;

Based on the formatting result of data of each target drum track included in the first target drum sequence, a pattern string corresponding to the target drum track is obtained.

The drum type of the target drum track data may include a heavy drum type and a light drum type. For a target drum track data pair, the drum types of its two items of target drum track data together with the target time interval of the pair form the result of formatting the two adjacent items of target drum track data, and the pattern string is then determined based on the formatting result of each item of target drum track data.

For example, the pattern string can be:

S0P520P520P520P520P520P520P520P520P520P520PS0P, where S represents the heavy drum type, P represents the light drum type, and the number between S and P represents the target time interval. Counting from the left, the target drum track data corresponding to the first S and the target drum track data corresponding to the first P form a target drum track data pair. The first S and the first 0 are the formatting result of the target drum track data corresponding to the first S, and the first 0 and the first P are the formatting result of the target drum track data corresponding to the first P. That is, the first S, the first 0, and the first P together are the result of formatting the target drum track data pair that contains the data corresponding to the first S and the data corresponding to the first P.
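The formatting steps (sort by time, compute the interval of each adjacent pair, emit drum type and interval) can be sketched as follows. The tuple layout `(time, drum_type)`, the use of milliseconds, and the function name are assumptions; the names SlowRhythm and PuckingDrum and the letters S and P follow the text. The sketch does not claim to reproduce the exact string shown above.

```python
from typing import List, Tuple

# Letters used in the pattern string for each drum type.
TYPE_CODES = {"SlowRhythm": "S", "PuckingDrum": "P"}


def build_pattern_string(beats: List[Tuple[int, str]]) -> str:
    """Format drum beats as type, interval, type, interval, ..., type."""
    ordered = sorted(beats, key=lambda b: b[0])  # ascending time; the sort is stable
    if not ordered:
        return ""
    parts = [TYPE_CODES[ordered[0][1]]]
    # Walk over adjacent pairs and append the interval followed by the later beat's type.
    for (prev_time, _), (time, drum_type) in zip(ordered, ordered[1:]):
        parts.append(str(time - prev_time))
        parts.append(TYPE_CODES[drum_type])
    return "".join(parts)


beats = [
    (520, "PuckingDrum"),
    (0, "SlowRhythm"),
    (0, "PuckingDrum"),
    (1040, "PuckingDrum"),
]
pattern = build_pattern_string(beats)
```

For this input the result is `S0P520P520P`: a heavy beat and a light beat at the same instant, followed by two light beats spaced 520 apart.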

As can be seen from the above example, the target time interval can be 0. A target time interval of 0 means that the two adjacent items of target drum track data contain invalid data. After obtaining the target time interval, the terminal can therefore delete one of the two adjacent items of target drum track data whose target time interval is 0.

Therefore, in other embodiments, for each item of target drum track data included in the first target drum sequence, formatting the target drum track data according to the drum type of the target drum track data in the target drum track data pair that contains it, and the target time interval of that target drum track data pair, includes:

Delete the target drum track data of the target drum track data pair corresponding to the target time interval that does not exceed the preset time interval in the first target drum sequence to obtain the second target drum sequence;

For each item of target drum track data included in the second target drum sequence, format the target drum track data according to the drum type of the target drum track data in the target drum track data pair that contains it, and the target time interval of that target drum track data pair;

Based on the formatting result of each target drum track data included in the first target drum sequence, a pattern string corresponding to the target drum track is obtained, including:

Based on the formatting result of each item of target drum track data included in the second target drum sequence, a pattern string corresponding to the target drum track is obtained.

The preset time interval may be 0 or other time intervals, and may be set according to the actual situation, which is not limited in this embodiment.

For example, the pattern string is:

S0P520P520P520P520P520P520P520P520P520P520PS0P, then counting from the left, the target time interval between the first S and the first P is 0, then the first S or the first P can be deleted.

In this embodiment, the target drum track data of the target drum track data pairs whose target time interval does not exceed the preset time interval is deleted from the first target drum sequence to obtain the second target drum sequence. Then, each item of target drum track data included in the second target drum sequence is formatted according to the drum type of the target drum track data in the target drum track data pair that contains it and the target time interval of that pair, and the pattern string corresponding to the target drum track is obtained based on the formatting results. In this way, invalid target drum track data is deleted and the amount of calculation for formatting the target drum track data in the second target drum sequence is reduced, so that the pattern string corresponding to the target track is obtained more quickly.
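Building the second target drum sequence can be sketched as a single pass over the time-ordered beats. The text does not say which item of a too-close pair is deleted, so dropping the later one is an assumption, as are the function name and the `(time, code)` tuple layout.

```python
from typing import List, Tuple


def drop_close_beats(ordered_beats: List[Tuple[int, str]],
                     preset_interval: int = 0) -> List[Tuple[int, str]]:
    """Remove beats whose interval to the previously kept beat does not exceed the preset."""
    kept: List[Tuple[int, str]] = []
    for beat in ordered_beats:
        if kept and beat[0] - kept[-1][0] <= preset_interval:
            continue  # assumption: the later beat of the pair is the one deleted
        kept.append(beat)
    return kept


first_sequence = [(0, "S"), (0, "P"), (520, "P"), (1040, "P")]
second_sequence = drop_close_beats(first_sequence)
```

The light beat that coincides with the heavy beat at time 0 is removed, so the interval of 0 no longer appears when the remaining beats are formatted.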

It should be understood that when there are multiple items of target drum track data for the music to be matched, it is not necessary to calculate all of them. Only the first target number of items of target drum track data in the target drum sequence may be calculated, which reduces the calculation amount of the target time interval and the subsequent calculation amount of matching the pattern string with the main string.

The target quantity can be set according to the duration of the music to be matched. For example, when the duration of the music to be matched is 3 minutes, the target number can be set to 90.

Alternatively, when the music to be matched includes many items of target drum track data, the pattern string will be long, and the main string has the same problem. Therefore, after the pattern string corresponding to the target drum track is obtained based on the formatting result of each item of target drum track data included in the second target drum sequence, the method further includes:

Encode the pattern string to obtain the encoded pattern string;

Obtain the main string of the initial music in each video to be matched, including:

Obtain the encoded main string of the initial music in each video to be matched;

Filter out the initial music corresponding to the main string matching the pattern string and obtain the target music, including:

Filter out the initial music corresponding to the encoded main string that matches the encoded pattern string to obtain the target music.

The method of matching the pattern string against the main string can be selected according to the actual situation. For example, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer-Moore (BM) algorithm, or the Sunday algorithm may be chosen as the matching method, which is not limited in this embodiment.
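As an illustration of one of the named options, the following is a standard KMP search that looks for the pattern string inside a main string. The function names and the sample strings are hypothetical; the sample strings use an assumed "type letter plus gap in milliseconds" layout and are not taken from the application.

```python
def build_failure_table(pattern):
    """table[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(main, pattern):
    """Return the index of the first occurrence of pattern in main,
    or -1 if pattern does not occur."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(main):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without re-reading main
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


main_string = "H0L520L520H520"   # hypothetical main string of an initial music
pattern_string = "L520H520"      # hypothetical pattern string
position = kmp_find(main_string, pattern_string)
```

KMP never moves backwards in the main string, so the search runs in time proportional to the combined length of the two strings, which matters when many main strings in a video library have to be scanned.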

When the pattern string of one of the target audio tracks is matched against the main string of one of the initial audio tracks, the target audio track can also be the target bass track of the music to be matched, and the initial audio track can be the bass track of the initial music. For the process of formatting the target bass track, reference may be made to the process of formatting the target drum beat track, which will not be described again in this embodiment.

In other embodiments, the initial music corresponding to the main string matching the pattern string is filtered out to obtain the target music, including:

Match the pattern string with the main string;

Determine at least one initial music corresponding to the main string whose matching degree is greater than the preset matching threshold as candidate music;

Screen out at least one target music from the candidate music according to the target audio track data.

In this embodiment, the at least one initial music corresponding to a main string whose matching degree is greater than the preset matching threshold is used as candidate music rather than directly as target music. At least one target music is then filtered out from the candidate music based on the target audio track data, thereby obtaining target music with a higher matching degree to the music to be matched.
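The candidate stage can be sketched as a simple threshold filter. The application does not define how the matching degree is computed, so this sketch substitutes the similarity ratio from Python's `difflib.SequenceMatcher` purely as a stand-in; the function names, the music identifiers, and the threshold value are hypothetical.

```python
from difflib import SequenceMatcher


def matching_degree(pattern, main):
    """Stand-in similarity in [0, 1]; the real metric is not specified."""
    return SequenceMatcher(None, pattern, main).ratio()


def select_candidates(pattern, main_strings, threshold):
    """Return the identifiers of initial music whose main string has a
    matching degree strictly greater than the threshold."""
    return [
        music_id
        for music_id, main in main_strings.items()
        if matching_degree(pattern, main) > threshold
    ]


main_strings = {
    "music_a": "H0L520H520",  # identical to the pattern below
    "music_b": "zzzz",        # shares no characters with the pattern
}
candidates = select_candidates("H0L520H520", main_strings, threshold=0.8)
```

The candidates returned here would then go through the time-data comparison before any of them is accepted as target music.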

Here, screening out at least one target music from the candidate music according to the target audio track data includes:

Filter out the target drum track data from the target track data;

Extract the first time data of the target drum beat from the target drum beat track data, and determine the second time data of the candidate music based on the drum beat track data corresponding to the candidate music;

According to the first time data and the second time data, at least one target music is screened out from the candidate music.

After the audio tracks of the music to be matched are separated, the target audio track data corresponding to each target audio track of the music to be matched can be obtained. The target drum beat track data can then be filtered out from the target audio track data. The target drum beat track data includes the target drum beats and the first time data corresponding to each target drum beat. The drum beat track data corresponding to the candidate music includes the drum beats of the candidate music and the second time data corresponding to each of those drum beats.

After the first time data and the second time data are obtained, they are compared. If the first time data is the same as the second time data, the target drum beat corresponding to the first time data is the same as the drum beat corresponding to the second time data. When every drum beat of the candidate music is the same as the corresponding target drum beat, the candidate music is the target music. Alternatively, when the number of drum beats of the candidate music that are the same as target drum beats exceeds a preset number, the candidate music can also be used as the target music.
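The comparison above can be sketched as follows. This is a minimal illustration that assumes time data are integer millisecond timestamps, that timestamps within one piece of music are unique, and that "the same" means exact equality; the function names and the `preset_count` parameter are hypothetical.

```python
def count_same_beats(first_times, second_times):
    """Count target drum beats whose time also occurs in the candidate."""
    second = set(second_times)
    return sum(1 for t in first_times if t in second)


def is_target_music(first_times, second_times, preset_count=None):
    """Accept the candidate if every beat coincides, or, when a preset
    count is given, if the number of coinciding beats exceeds it."""
    same = count_same_beats(first_times, second_times)
    if preset_count is None:
        # Strict rule: the two sets of beats coincide one to one.
        return same == len(first_times) == len(second_times)
    return same > preset_count


exact = is_target_music([0, 520, 1040], [0, 520, 1040])
partial = is_target_music([0, 520, 1040], [0, 520, 1100], preset_count=1)
```

A real implementation would probably need a tolerance window rather than exact equality, since track separation rarely yields identical timestamps; that choice is outside what the text specifies.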

In practical applications, the first time data of the target drum beat is extracted from the target drum beat track data, including:

Identify the time data corresponding to each drum beat in the target drum beat track data to obtain a time data set;

Obtain the initial position of the target character in the pattern string, where the target character is the character corresponding to the target drum beat in the target drum beat track data;

According to the initial position, the first time data corresponding to the target drum beat is filtered out from the time data set.

The time data corresponding to each drum beat has a corresponding position in the target drum beat track data. After the time data set is obtained, the first time data corresponding to the target drum beat can be filtered out from the time data set according to the initial position, in the pattern string, of the target character corresponding to the target drum beat.

For example, suppose the pattern string corresponding to the target drum beat track is S0P520P520P520P520P520P520P520P520P520P520PS0P and the target drum beat is the first drum beat. The target character corresponding to the first drum beat is then the first S in the pattern string, the initial position of that S in the pattern string is the first position, and the first item in the time data set is the first time data corresponding to the first drum beat.
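The lookup from a character position back to a time value can be sketched as below. The token layout of the application's pattern string is not spelled out here, so this sketch assumes a hypothetical layout in which every beat is written as one letter followed by digits (for example "H0L520H520"); under that assumption, the number of letters before a position gives the index of the beat in the time data set. All names are illustrative.

```python
def beat_index_at(pattern, char_pos):
    """Map the position of a beat's letter in the pattern string to the
    index of that beat (assumes one letter per beat token)."""
    if not pattern[char_pos].isalpha():
        raise ValueError("char_pos must point at the letter that starts a beat token")
    return sum(1 for ch in pattern[:char_pos] if ch.isalpha())


def first_time_data(pattern, char_pos, time_data_set):
    """Pick the time value of the beat whose token starts at char_pos."""
    return time_data_set[beat_index_at(pattern, char_pos)]


pattern = "H0L520H520"           # hypothetical pattern string
time_data_set = [0, 520, 1040]   # one time value per beat, in order
t_first = first_time_data(pattern, 0, time_data_set)
t_third = first_time_data(pattern, 6, time_data_set)
```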

The process of determining the second time data of the candidate music based on the drum beat track data corresponding to the candidate music may refer to the process of extracting the first time data of the target drum beat from the target drum beat track data, which will not be described again in this embodiment.

After displaying the audio and video interface in response to the triggering operation on the matching control, the terminal can directly play the music to be matched and the playback video. Alternatively, the audio and video interface can include a playback control, and the terminal can play the music to be matched and the playback video in response to the target object's triggering operation on the playback control.

In addition, when the terminal plays the music to be matched, it can play dynamic effects (i.e. dynamic special effects) on the target audio track according to the playback progress bar, that is, dynamically play the pattern in the target audio track corresponding to the position where the playback progress bar reaches.

It should be noted that at least one target audio track on the audio and video interface can be dynamically played according to the playback progress bar, and the dynamic playback method of each target audio track can be the same or different, which is not limited in this embodiment.

When the target audio track is the target drum beat track, the pattern in the target audio track can refer to the drum beats in the target audio track. Therefore, in some embodiments, dynamic playback is performed on the target audio track according to the playback progress bar, including:

Filter out the target drum beat track in the target audio track, and identify the currently playing target drum beat in the target drum beat track according to the playback progress bar;

According to the drum type of the target drum beat, animate the target drum beat on the target drum track.

Drum beat types include heavy drum types and light drum types. The dynamic effect types corresponding to different drum beat types, that is, the dynamic effect playback methods corresponding to different drum beat types can be different or the same. The dynamic effect playback method can be in the form of dynamic amplification or static amplification. As for the dynamic effect playback method, the user can choose according to the actual situation, and this embodiment is not limited here.

Among them, according to the playback progress bar, the target drum beat currently being played is identified in the target drum beat track, including:

Obtain the position information of the playback progress bar in the target drum track and the position interval of each drum beat in the target drum track;

Match the position information with the position interval, and use the drum beat corresponding to the position interval matching the position information as the target drum beat currently being played.
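The interval lookup in the two steps above can be sketched as a linear scan. The representation of a position interval as a half-open `(start, end)` pair in milliseconds and the function name are assumptions made for illustration.

```python
def beats_at(position, intervals):
    """Return the indices of all drum beats whose [start, end) interval
    contains the progress-bar position."""
    return [
        index
        for index, (start, end) in enumerate(intervals)
        if start <= position < end
    ]


intervals = [(0, 100), (80, 180), (300, 400)]  # hypothetical position intervals
current = beats_at(50, intervals)
overlapping = beats_at(90, intervals)
```

When intervals overlap, as the first two do here, a single position can match more than one drum beat, so the caller has to be prepared for a list with several entries.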

However, if two drum beats overlap on the target drum beat track, the playback progress bar reaches two drum beats at the same moment. In that case two target drum beats are filtered out and the terminal plays both of them at once, which causes the earlier of the two drum beats to be played repeatedly and results in an error.

To solve this technical problem, in other embodiments, referring to Figure 17, which is a schematic diagram of the playback process provided by the embodiment of the present application, playing the dynamic effect of the target drum beat on the target drum beat track according to the drum beat type of the target drum beat includes:

Step 171: Obtain the position information of the playback progress bar in the target drum track and the position interval of each drum beat in the target drum track, and use the drum beat corresponding to the position interval matching the position information as the currently played target drum beat.

Step 172: Determine the storage status of the target drum beat in the played array;

Step 173: If the storage status is unstored, obtain the drum beat type of the target drum beat, and determine the dynamic effect type of the target drum beat based on the drum beat type;

Step 174: Based on the motion effect type, play the target drum beat with motion effect on the target drum beat track, and store the target drum beat in the played array.

If the storage status is unstored, it means that the terminal has not played the target drum beat. The terminal can play the target drum beat based on the dynamic effect type of the target drum beat and store the target drum beat in the played array.

If the storage status is stored, the target drum beat has already been played, and it can be deleted from the played array once the playback progress bar has left its position interval. Therefore, in other embodiments, after the storage status of the target drum beat in the played array is determined, the method further includes:

Step 175: If the storage status is the stored status, obtain the current position information of the playback progress bar on the target drum track;

Step 176: Determine whether the current position information matches the position interval of the target drum beat. If so, play the music to be matched. If not, perform step 177.

Step 177: When the current position information does not match the position interval of the target drum beat, delete the target drum beat in the played array.

In this embodiment, a played array is set up and each target drum beat that has been played is stored in it, so that the terminal can determine from the played array whether a target drum beat has already been played and will not play an already played target drum beat again.
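Steps 171 to 177 can be sketched as a small state machine driven by progress-bar updates. This is an illustration only: the class name, the effect names in `EFFECTS`, the use of a Python set in place of the "played array", and the recording of effects in a log instead of drawing them are all assumptions.

```python
# Hypothetical mapping from drum beat type to dynamic effect type.
EFFECTS = {"heavy": "scale_large", "light": "scale_small"}


class DrumBeatAnimator:
    def __init__(self, intervals, kinds):
        self.intervals = intervals  # (start, end) position interval per beat
        self.kinds = kinds          # drum beat type per beat
        self.played = set()         # stands in for the played array
        self.log = []               # (beat index, effect) pairs that were played

    def on_progress(self, position):
        """Handle one progress-bar update."""
        # Steps 175-177: forget stored beats the progress bar has left.
        for index in list(self.played):
            start, end = self.intervals[index]
            if not (start <= position < end):
                self.played.discard(index)
        # Steps 171-174: play each reached beat that is not stored yet.
        for index, (start, end) in enumerate(self.intervals):
            if start <= position < end and index not in self.played:
                self.log.append((index, EFFECTS[self.kinds[index]]))
                self.played.add(index)


animator = DrumBeatAnimator([(0, 100), (80, 180)], ["heavy", "light"])
for position in (10, 50, 90, 120, 200):
    animator.on_progress(position)
```

At position 90 both intervals contain the progress bar, but only the second beat is animated because the first is already stored. Removing a beat once the bar leaves its interval lets the same beat animate again after the user seeks backwards.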

In some embodiments, when the terminal displays the music matching interface, it is in the inspiration mode by default. The inspiration mode is used to automatically match music with videos. The terminal can display the audio and video interface in response to the triggering operation on the matching control in the following manner:

The terminal displays an audio and video interface in response to a triggering operation on the matching control in the inspiration mode;

Correspondingly, the user can switch the inspiration mode based on the mode switching control. In actual implementation, the method also includes:

A mode switching control for switching modes is displayed in the audio and video interface. In response to the triggering operation on the mode switching control, the terminal switches the inspiration mode to the editing mode, where the editing mode is used to edit the music to be matched. Thus, in the editing mode, the user can edit the music to be matched and then match the edited music with related videos.

In some embodiments, after the terminal controls switching the inspiration mode to the editing mode, the method further includes:

In response to the editing operation on the music to be matched in the editing mode, the terminal displays each audio track obtained by separating the audio tracks of the edited music to be matched, and updates and displays at least one target video that matches each audio track.

In some embodiments, to help the user understand how to edit the music to be matched, the user may be guided through editing. Accordingly, the method further includes:

The terminal displays editing guidance information in response to the triggering operation of the mode switching control. The editing guidance information is used to guide the editing object to edit the music to be matched in the editing mode.

In the embodiment of the present application, the music matching interface of the music to be matched includes a matching control, which provides the user with a video matching function for the music of interest. When the user triggers the matching control, the target audio track of the music to be matched and the target video matching the target audio track are displayed in the audio and video interface. Displaying the target audio track visualizes the music to be matched, and for the visualized music, target videos matching the target audio track are automatically searched for and displayed, which improves the efficiency of viewing videos that match the music of interest. That is, in this application, the music matching interface includes a matching control; in response to the triggering operation on the matching control, the target video matching the target audio track of the music to be matched can be found automatically, and the target audio track and the target video can be displayed on the audio and video interface, so videos do not need to be viewed manually one by one, which is more convenient.

According to the methods described in the above embodiments, examples will be given for detailed description below.

Please refer to FIG. 18 , which is a schematic flowchart of a music matching method provided by an embodiment of the present application. The music matching method process may include:

S1801. The terminal displays a music matching interface of the first music in the music interface. The music matching interface of the first music includes a matching control and an audition area, and the first music is the music to be matched.

Here, the playback progress bar of the auditioned music and the adjustment control for adjusting the playback progress can be displayed in the audition area. Based on the audition area, the user can audition the music.

S1802. The terminal displays the music matching interface of the second music and the music matching interface of the first music in the music interface. The music matching interface of the second music includes a matching control and an audition area, the music matching interface of the first music includes a matching control, and the second music is the music to be matched.

S1803. The terminal determines one piece of music as the music to be matched and, in response to the triggering operation on the matching control of the music to be matched, separates the audio tracks of the music to be matched to obtain the target audio track data corresponding to the target audio tracks of the music to be matched. The target audio tracks include a target vocal track, a target accompaniment track, a target bass track, and a target drum beat track.

At this time, the music to be matched may be the first music or the second music.

S1804. The terminal sorts the target drum beat track data according to the time corresponding to the target drum beat track data, and obtains the first target drum beat sequence.

S1805. The terminal calculates the target time interval of each target drum beat track data pair in the first target drum beat sequence, and deletes from the first target drum beat sequence the target drum beat track data corresponding to each target drum beat track data pair whose target time interval does not exceed the preset time interval, to obtain a second target drum beat sequence. A target drum beat track data pair includes two adjacent pieces of target drum beat track data.

S1806. For each target drum beat track data included in the second target drum beat sequence, the terminal formats the target drum beat track data according to the drum beat type of the target drum beat track data and the target time interval of the target drum beat track data pair that contains it, and obtains the pattern string corresponding to the target drum beat track based on the formatting result of each target drum beat track data included in the second target drum beat sequence.

S1807. The terminal obtains the initial music in the video to be matched, separates the tracks of the initial music, and obtains the initial drum track data of the initial music.

When acquiring the video to be matched, the terminal may extract the initial music in the video to be matched, and separate the audio tracks of the initial music. Alternatively, the terminal may also extract the initial music and separate the tracks of the initial music when receiving the separation instruction after acquiring the video to be matched. This embodiment is not limited here.

S1808. The terminal sorts the initial drum beat track data according to the time corresponding to the initial drum beat track data of the initial drum beat track, to obtain the first initial drum beat sequence.

S1809. The terminal calculates the initial time interval of each initial drum beat track data pair in the first initial drum beat sequence, and deletes from the first initial drum beat sequence the initial drum beat track data corresponding to each initial drum beat track data pair whose initial time interval does not exceed the preset time interval, to obtain a second initial drum beat sequence. An initial drum beat track data pair includes two adjacent pieces of initial drum beat track data.

S18010. For each initial drum beat track data included in the second initial drum beat sequence, the terminal formats the initial drum beat track data according to the drum beat type of the initial drum beat track data and the initial time interval of the initial drum beat track data pair that contains it, and obtains the main string corresponding to the initial drum beat track based on the formatting result of each initial drum beat track data included in the second initial drum beat sequence.

S18011. The terminal matches the pattern string against the main string, uses at least one initial music whose matching degree is greater than the preset matching degree as candidate music, and identifies the time data corresponding to each drum beat in the target drum beat track data to obtain the time data set.

S18012. The terminal obtains the initial position, in the pattern string, of the target character corresponding to the target drum beat in the target drum beat track data, and filters out the first time data corresponding to the target drum beat from the time data set based on the initial position.

S18013. The terminal determines the second time data of the candidate music based on the drum beat track data corresponding to the candidate music, selects at least one target music from the candidate music based on the first time data and the second time data, and uses the video to be matched that contains the target music as the target video.

In this embodiment, the target drum track data of the music to be matched is matched with the initial drum track data of the initial music in the video to be matched, thereby determining the target video containing the music to be matched.

For example, referring to Figure 19, the terminal separates the audio tracks of the music to be matched through the audio track separation task to obtain the target audio track data, and formats the target audio track data to obtain the pattern string. Through the video preprocessing task, the terminal extracts the initial music from the video to be matched, separates the audio tracks of the initial music to obtain the initial audio track data, formats the initial audio track data to obtain the main string, and stores the video to be matched in the video library in association with the main string. Finally, the terminal matches the pattern string against the main string through the video matching task. When the matching degree between the pattern string and a main string is greater than the preset threshold, the at least one initial music corresponding to that main string is used as candidate music.

Then, according to the initial position, in the pattern string, of the target character corresponding to the target drum beat in the target drum beat track data, the first time data corresponding to the target drum beat is filtered out from the time data set, and the second time data of the candidate music is determined from the drum beat track data corresponding to the candidate music. Based on the first time data and the second time data, at least one target music is selected from the candidate music, and the video to be matched that contains the target music is used as the target video. In other words, the pattern string is restored.

The video to be matched may be a video shot by the target object using the initial music. For example, as shown in Figure 20, after acquiring videos to be matched, the terminal stores the videos to be matched that contain the same initial music into the video library. The same initial music may refer to initial music with the same music identifier. Because only the music identifiers need to be the same, the same initial music may include exactly identical initial music or partially identical initial music. For example, initial music a and an edited version of initial music a are partially identical and count as the same initial music.

Then, after the terminal obtains the music to be matched, it obtains the video to be matched from the video library based on the music to be matched, and matches the main string of the video to be matched with the pattern string of the music to be matched.
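The video library described above can be sketched as an index keyed by music identifier, so that only videos sharing the identifier of the music to be matched have their main strings compared. The class name, method names, and identifiers below are hypothetical, and an in-memory dictionary stands in for whatever storage the terminal or server actually uses.

```python
from collections import defaultdict


class VideoLibrary:
    """Stores (video id, main string) pairs grouped by music identifier."""

    def __init__(self):
        self._by_music = defaultdict(list)

    def add(self, music_id, video_id, main_string):
        """Store a video to be matched in association with its main string."""
        self._by_music[music_id].append((video_id, main_string))

    def videos_for(self, music_id):
        """Return the stored videos whose initial music has this identifier."""
        return list(self._by_music.get(music_id, []))


library = VideoLibrary()
library.add("music_a", "video_1", "H0L520H520")
library.add("music_a", "video_2", "H0L520")      # e.g. an edited version of music a
library.add("music_b", "video_3", "L0L300")
to_match = library.videos_for("music_a")
```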

S18014. The terminal displays the target vocal track, the target accompaniment track, the target bass track and the target drum beat track in the first display area in the preset display order, displays the target videos in the first sub-display area, and displays the playback video in the second sub-display area, where the playback video is the target video in the selected state. The first display area and the second display area form the audio and video interface, and the audio and video interface includes playback controls and adjustment controls.

S18015. In response to the triggering operation on the adjustment control, the terminal obtains the current playback volume of the audio file of the adjustment target track corresponding to the adjustment control.

S18016. When the current playback volume exceeds the mute volume, adjust the current playback volume to the mute volume, and add a mask layer to the adjustment target audio track to hide the adjustment target audio track in the audio and video interface and obtain the adjusted music.

When the current playback volume does not exceed the mute volume, remove the mask on the adjustment target audio track to display the adjustment target audio track on the audio and video interface, and adjust the playback volume of the audio file corresponding to the adjustment target audio track to the historical playback volume.
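The toggle behaviour of S18015 and S18016 can be sketched as follows. The `TrackView` type, its field names, and the choice of 0.0 as the mute volume are assumptions made for illustration; the `masked` flag stands in for adding or removing the mask layer in the interface.

```python
from dataclasses import dataclass

MUTE_VOLUME = 0.0  # assumed value of the mute volume


@dataclass
class TrackView:
    name: str
    volume: float = 1.0          # current playback volume
    history_volume: float = 1.0  # volume to restore when unmuting
    masked: bool = False         # True when the mask layer hides the track


def toggle_track(track):
    """Handle one trigger of the adjustment control for this track."""
    if track.volume > MUTE_VOLUME:
        # Audible: remember the volume, mute, and hide the track.
        track.history_volume = track.volume
        track.volume = MUTE_VOLUME
        track.masked = True
    else:
        # Muted: remove the mask and restore the historical volume.
        track.volume = track.history_volume
        track.masked = False
    return track


drums = TrackView("drums", volume=0.7)
toggle_track(drums)  # mutes and hides the drum beat track
```

A second call to `toggle_track(drums)` would restore the volume of 0.7 and clear the mask.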

S18017. If the adjustment target audio track is the preset target audio track, determine the target pattern string corresponding to the adjusted music based on the audio tracks of the adjusted music, and update the target videos in the target video collection according to the target pattern string to obtain an updated video collection.

For other implementable methods and corresponding beneficial effects in this embodiment, reference can be made to the above music matching method, which will not be described again in this embodiment.

In order to facilitate better implementation of the music matching method provided by the embodiment of the present application, the embodiment of the present application also provides a device based on the above music matching method. The meanings of the nouns are the same as in the above music matching method. For implementation details, please refer to the description in the method embodiment.

For example, as shown in Figure 21, the music matching device may include:

The first display module 2101 is configured to display a music matching interface for music to be matched, and the music matching interface includes matching controls.

The second display module 2102 is configured to display an audio and video interface in response to the triggering operation of the matching control. The audio and video interface includes the target audio track of the music to be matched and the target video matching the target audio track.

In practical applications, the second display module 2102 is also configured to perform:

In response to the triggering operation of the matching control, track separation is performed on the music to be matched, and the target track data of the music to be matched is obtained;

Determine at least one target video matching the target audio track according to the pattern string, and the target audio track is the audio track corresponding to the target audio track data;

Display an audio and video interface including the target audio track and the at least one target video.

Obtain the main string of the initial music in each video to be matched;

The video to be matched corresponding to the target music is determined as the target audio and video that matches the target audio track, and the target video is constructed.

Match the pattern string with the main string to obtain the matching degree between the pattern string and the main string;

Filter out the target drum beat track data from the target audio track data;

According to the first time data and the second time data of each candidate music, at least one target music is screened out from the candidate music.

Obtain the initial position of the target character in the pattern string, wherein the target character is the character corresponding to the target drum beat in the target drum beat track data;

According to the initial position, the time data corresponding to the target drum beat is filtered out from the time data set as the first time data.

The target track data includes target drum track data of the music to be matched.

Correspondingly, the second display module 2102 is also configured to perform:

Each target drum track data is formatted according to the first target drum sequence to obtain a pattern string corresponding to the target drum track, and the target drum track is a track corresponding to the target drum track data.

For each target drum beat track data in the first target drum beat sequence, format the target drum beat track data according to the drum beat type of the target drum beat track data and the target time interval of the target drum beat track data pair that contains it, to obtain the formatting result corresponding to each target drum beat track data;

Based on the formatting result corresponding to each target drum track data, a pattern string corresponding to the target drum track is determined.

Based on the formatting result of data of each target drum track included in the second target drum sequence, a pattern string corresponding to the target drum track is obtained.

In response to the triggering operation of the matching control, display the target audio track of the music to be matched in the first display area according to a preset display order;

Display the target video set matching the target audio track in the second display area;

The audio and video interface includes a first display area and a second display area.

In some embodiments, the number of the target audio tracks is at least two, and each target audio track corresponds to a music attribute of the music to be matched. The second display module 2102 is also configured to display the at least two target audio tracks in the first display area of the audio and video interface according to a preset display order.

In practical applications, the second display area includes a first sub-display area and a second sub-display area, the number of the target videos is multiple, and the multiple target videos constitute a target video set, and the target video set includes playback video.

Correspondingly, the second display module 2102 is also configured to perform:

Display the target video set matching the target audio track in the first sub-display area;

The playback video is displayed in the second sub-display area, and the playback video is the selected target video in the target video collection.

The target videos in the target video collection are displayed in the first sub-display area in descending order of play count.

In practical applications, the target audio track is obtained by separating the audio tracks of the music to be matched; the audio and video interface also includes adjustment controls for the target audio track.

Correspondingly, the music matching device also includes:

Silent hidden processing module, configured to execute:

When the current playback volume exceeds the mute volume, adjust the current playback volume to the mute volume, and add a mask layer to the adjustment target audio track to hide the above adjustment target audio track in the audio and video interface, and obtain the adjusted music.

In practical applications, music matching devices also include:

Update module, configured to execute:

If the adjusted target audio track is a preset target audio track, determine the target pattern string corresponding to the adjusted music based on the adjusted music track;

According to the target pattern string, the target videos in the target video collection are updated to obtain the updated video collection.

In some embodiments, the second display module is further configured to display an audio and video interface in response to a triggering operation on the matching control in the inspiration mode;

And display a mode switching control in the audio and video interface;

The device further includes a switching control configured to control switching of the inspiration mode to an editing mode in response to a triggering operation of the mode switching control, and the editing mode is used to edit the music to be matched.

In some embodiments, the second display module is further configured to, in response to an editing operation on the music to be matched in the editing mode, display each audio track obtained by separating the audio tracks of the edited music to be matched;

Update and display at least one target video that matches each audio track.

In some embodiments, the second display module is further configured to display editing guidance information in response to the triggering operation on the mode switching control, where the editing guidance information is used to guide the editing object to edit the music to be matched in the editing mode.

During specific implementation, each of the above modules can be implemented as an independent entity, or the modules can be combined in any way and implemented as one or several entities. The specific implementation methods and corresponding beneficial effects of each of the above modules can be found in the previous method embodiments and will not be repeated here.

An embodiment of the present application also provides an electronic device, which may be a server, a terminal, or the like. Figure 22 shows a schematic structural diagram of the electronic device involved in the embodiment of the present application. Specifically:

The electronic device may include components such as a processor 2201 with one or more processing cores, a memory 2202 with one or more computer-readable storage media, a power supply 2203, and an input unit 2204. Those skilled in the art can understand that the structure shown in FIG. 22 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components. Specifically:

The processor 2201 is the control center of the electronic device. It uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing computer programs and/or modules stored in the memory 2202 and calling data stored in the memory 2202. Optionally, the processor 2201 may include one or more processing cores; preferably, the processor 2201 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It can be understood that the above modem processor may also not be integrated into the processor 2201.

The memory 2202 may be configured to store computer programs and modules, and the processor 2201 executes various functional applications and data processing by running the computer programs and modules stored in the memory 2202. The memory 2202 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, a computer program required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; the data storage area may store data created through use of the electronic device, etc. In addition, memory 2202 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 2202 may also include a memory controller to provide the processor 2201 with access to the memory 2202.

The electronic device also includes a power supply 2203 that supplies power to various components. Preferably, the power supply 2203 can be logically connected to the processor 2201 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system. The power supply 2203 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components.

The electronic device may also include an input unit 2204 that may be configured to receive input numeric or character information and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

Although not shown, the electronic device may also include a display unit and the like, which will not be described again here. Specifically, in this embodiment, the processor 2201 in the electronic device loads the executable files corresponding to the processes of one or more computer programs into the memory 2202 according to the following instructions, and the processor 2201 runs the computer programs stored in the memory 2202 to implement various functions, such as:

In response to the triggering operation of the matching control, an audio and video interface is displayed, and the audio and video interface includes a target audio track of the music to be matched, and at least one target video matching the target audio track.

The specific implementation of each of the above operations and the corresponding beneficial effects can be found in the detailed description of the music matching method above, and will not be described again here.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a computer program, or by a computer program controlling relevant hardware. The computer program can be stored in a computer-readable storage medium and loaded and executed by the processor.

To this end, embodiments of the present application provide a computer-readable storage medium in which a computer program is stored, and the computer program can be loaded by a processor to execute the steps in any music matching method provided by the embodiments of the present application. For example, the computer program can perform the following steps:

The specific implementation of each of the above operations and the corresponding beneficial effects can be found in the previous embodiments, and will not be described again here.

The computer-readable storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.

Since the computer program stored in the computer-readable storage medium can execute the steps in any music matching method provided by the embodiments of the present application, it can achieve the beneficial effects of any music matching method provided by the embodiments of the present application. These are detailed in the previous embodiments and will not be described again here.

According to one aspect of the present application, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the above music matching method.

The music matching method, device, electronic equipment and computer-readable storage medium provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method and core ideas of the present application; at the same time, those skilled in the art may make changes to the specific implementation and application scope based on the ideas of the present application. In summary, the content of this description should not be understood as a limitation of this application.

Claims

A music matching method, the method is executed by an electronic device, including:

Display a music matching interface for music to be matched, and the music matching interface includes matching controls;

In response to the triggering operation of the matching control, an audio and video interface is displayed, and the audio and video interface includes: the target audio track of the music to be matched, and at least one target video matching the target audio track.
The music matching method according to claim 1, wherein the display of an audio and video interface in response to a triggering operation of the matching control includes:

In response to the triggering operation of the matching control, track separation is performed on the music to be matched, and the target track data of the music to be matched is obtained;

Format the target audio track data to obtain a pattern string corresponding to the target audio track data;

Determine at least one target video that matches the target audio track according to the pattern string, and the target audio track is the audio track corresponding to the target audio track data;

Display an audio and video interface including the target audio track and the at least one target video.
The music matching method according to claim 2, wherein determining, according to the pattern string, at least one target video matching the target audio track includes:

Obtain the main string of the initial music in each video to be matched;

Filter out the initial music corresponding to the main string matching the pattern string to obtain the target music;

The video to be matched corresponding to the target music is determined as the target video matching the target audio track.
The music matching method according to claim 3, wherein the filtering out the initial music corresponding to the main string matching the pattern string to obtain the target music includes:

Match the pattern string with the main string to obtain the matching degree between the pattern string and the main string;

Determine at least one initial music corresponding to the main string whose matching degree is greater than the preset matching threshold as candidate music;

Target music is filtered out from the at least one candidate music according to the target track data.
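The first two steps above, computing a matching degree and keeping the initial music whose matching degree exceeds the preset matching threshold, can be sketched as follows. This is a minimal illustration only: the text does not define how the matching degree is computed, so the similarity ratio of Python's `difflib.SequenceMatcher` is used as a stand-in, and the function names, example strings, and threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def matching_degree(pattern: str, main: str) -> float:
    """Similarity in [0, 1]; a stand-in measure, not the claimed one."""
    return SequenceMatcher(None, pattern, main).ratio()


def select_candidates(
    pattern: str, main_strings: Dict[str, str], threshold: float
) -> List[str]:
    """Return ids of initial music whose main string exceeds the threshold."""
    return [
        music_id
        for music_id, main in main_strings.items()
        if matching_degree(pattern, main) > threshold
    ]


# Hypothetical main strings keyed by an id of the initial music.
main_strings = {"music_a": "K2S2K2S2", "music_b": "H1H1H1H1"}
candidates = select_candidates("K2S2K2S2", main_strings, threshold=0.8)
```

The final step, filtering the target music out of the candidates according to the target track data, is not covered by this sketch.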
The music matching method according to claim 4, wherein filtering out the target music from the at least one candidate music according to the target track data includes:

Filter out target drum beat track data from the target track data;

Extract the first time data of the target drum beat from the target drum beat track data, and determine the second time data of each of the candidate music based on the drum beat track data corresponding to each of the candidate music;

Target music is filtered out from the at least one candidate music according to the first time data and the second time data of each candidate music.
The music matching method according to claim 5, wherein the extracting the first time data of the target drum beat from the target drum beat track data includes:

Identify the time data corresponding to each drum beat in the target drum beat track data to obtain a time data set;

Obtain the initial position of the target character in the pattern string, wherein the target character is the character corresponding to the target drum beat in the target drum beat track data;

According to the initial position, the time data corresponding to the target drum beat is filtered out from the time data set as the first time data.
The music matching method according to claim 2, wherein the target track data includes target drum track data corresponding to each drum beat in the music to be matched;

Formatting the target audio track data to obtain a pattern string corresponding to the target audio track data includes:

Sorting the target drum beat track data according to the time corresponding to each target drum beat track data to obtain the first target drum beat sequence;

According to the first target drum sequence, each target drum track data is formatted to obtain a pattern string corresponding to the target drum track, and the target drum track is the track corresponding to the target drum track data.
The music matching method according to claim 7, wherein the formatting of each target drum track data according to the first target drum sequence to obtain a pattern string corresponding to the target drum track includes:

Calculate the target time interval of the target drum track data pair in the first target drum sequence, where the target drum track data pair includes two adjacent target drum track data;

For each target drum beat track data in the first target drum beat sequence, format the target drum beat track data according to the drum beat type of the target drum beat track data pair and the target time interval of the target drum beat track data pair, to obtain formatting results corresponding to each target drum beat track data;

Based on the formatting result corresponding to each target drum track data, a pattern string corresponding to the target drum track is determined.
The music matching method according to claim 8, wherein, for each target drum beat track data in the first target drum beat sequence, formatting the target drum beat track data according to the drum beat type of the target drum beat track data pair and the target time interval of the target drum beat track data pair includes:

In the first target drum beat sequence, delete the target drum beat track data pairs whose target time interval does not exceed the preset time interval to obtain the second target drum beat sequence;

For each target drum beat track data in the second target drum beat sequence, format the target drum beat track data according to the drum beat type corresponding to the target drum beat track data pair containing the target drum beat track data and the target time interval of that target drum beat track data pair;

Determining the pattern string corresponding to the target drum track based on the formatting result corresponding to each target drum track data includes:

Based on the formatting result corresponding to each target drum track data in the second target drum sequence, a pattern string corresponding to the target drum track is determined.
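The sorting, interval filtering, and formatting steps above can be sketched in Python as follows. The concrete encoding is an assumption made purely for illustration: each drum beat is written as a one-letter drum beat type followed by the interval to the next kept beat, quantized to a hypothetical `step`; the text does not specify the character set or the quantization. Deleting a too-close pair is interpreted here as dropping the later beat of that pair, which is also an assumption.

```python
from typing import List, Tuple

Hit = Tuple[float, str]  # (time in seconds, drum beat type such as "K" or "S")


def build_pattern_string(
    hits: List[Hit], min_gap: float = 0.05, step: float = 0.25
) -> str:
    """Encode drum beats as a pattern string (hypothetical encoding)."""
    # First target drum beat sequence: beats ordered by time.
    ordered = sorted(hits, key=lambda h: h[0])

    # Second target drum beat sequence: drop the later beat of any adjacent
    # pair whose time interval does not exceed the preset interval `min_gap`.
    kept: List[Hit] = []
    for hit in ordered:
        if kept and hit[0] - kept[-1][0] <= min_gap:
            continue
        kept.append(hit)

    # Format each beat from its type and the interval of the pair it starts.
    tokens = []
    for (t0, kind), (t1, _) in zip(kept, kept[1:]):
        units = round((t1 - t0) / step)  # interval quantized to `step`
        tokens.append(f"{kind}{units}")
    if kept:
        tokens.append(kept[-1][1])  # the last beat has no following interval
    return "".join(tokens)


pattern = build_pattern_string([(0.5, "S"), (0.0, "K"), (0.51, "S"), (1.0, "K")])
```

Quantizing the interval keeps the string short and makes it tolerant of small timing differences, at the cost of losing fine timing detail; other encodings would serve equally well for string matching.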
The music matching method according to claim 1, wherein the display of an audio and video interface in response to a triggering operation of the matching control includes:

In response to the triggering operation of the matching control, display the target track of the music to be matched in the first display area;

Display the target video matching the target audio track in the second display area;

The audio and video interface includes the first display area and the second display area.
The music matching method according to claim 10, wherein the number of the target audio tracks is at least two, each of the target audio tracks corresponds to a music attribute of the music to be matched, and displaying the target audio track of the music to be matched in the first display area includes:

In the first display area of the audio and video interface, at least two of the target audio tracks are displayed in a preset display order.
The music matching method according to claim 10, wherein the second display area includes a first sub-display area and a second sub-display area, the number of the target videos is multiple, a plurality of the target videos constitute a target video collection, and the target video collection includes a playback video;

The target video matching the target audio track is displayed in the second display area, including:

Display each target video in the target video set in the first sub-display area;

The playback video is displayed in the second sub-display area, and the playback video is a target video in a selected state in the target video set.
The music matching method according to claim 12, wherein displaying each target video in the target video set in the first sub-display area includes:

Obtain the playback volume of each target video in the target video collection;

Each target video in the target video set is displayed in the first sub-display area in order from high to low playback volume.
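The ordering step above amounts to a descending sort on the per-video playback volume. In the minimal sketch below, the dictionary layout and the `playback_volume` key are hypothetical and stand for whatever numeric playback volume is obtained for each target video.

```python
from typing import Dict, List


def order_for_display(videos: List[Dict]) -> List[Dict]:
    """Return the target videos sorted from high to low playback volume."""
    return sorted(videos, key=lambda v: v["playback_volume"], reverse=True)


# Hypothetical target video set.
target_videos = [
    {"id": "v1", "playback_volume": 120},
    {"id": "v2", "playback_volume": 950},
    {"id": "v3", "playback_volume": 430},
]
display_order = [v["id"] for v in order_for_display(target_videos)]
```

Because `sorted` is stable, videos with equal playback volume keep their original relative order.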
The music matching method according to claim 1, wherein the target audio track is obtained by separating the audio tracks of the music to be matched; the audio and video interface further includes an adjustment control for the target audio track;

After the audio and video interface is displayed in response to the triggering operation of the matching control, the method further includes:

In response to a triggering operation on the adjustment control, obtain the current playback volume of the audio file of the adjustment target audio track corresponding to the adjustment control;

When the current playback volume exceeds the mute volume, adjust the current playback volume to the mute volume, and add a mask layer to the adjustment target audio track;

Wherein, the mask layer is used to hide the adjustment target audio track to obtain the adjusted music.
The music matching method according to claim 14, wherein after adding a mask layer to the adjustment target audio track, it further includes:

If the adjusted target audio track is a preset target audio track, determine the target pattern string corresponding to the adjusted music according to the audio track of the adjusted music;

According to the target pattern string, the target videos in the target video set are updated to obtain an updated video set.
The music matching method according to claim 1, wherein displaying an audio and video interface in response to a triggering operation of the matching control includes:

In response to a triggering operation on the matching control in the inspiration mode, display an audio and video interface;

The method also includes:

Display a mode switching control in the audio and video interface;

In response to a triggering operation on the mode switching control, the inspiration mode is controlled to be switched to an editing mode, and the editing mode is used to edit the music to be matched.
The music matching method according to claim 16, wherein after the control switches the inspiration mode to the editing mode, the method further includes:

In response to an editing operation on the music to be matched in the editing mode, displaying each audio track obtained by separating the tracks of the edited music to be matched;

Update the display of at least one target video matching the respective audio tracks.
The music matching method according to claim 16, wherein the method further includes:

In response to the triggering operation of the mode switching control, editing guidance information is displayed, and the editing guidance information is used to guide the editing object to edit the music to be matched in the editing mode.
A music matching device including:

The first display module is configured to display a music matching interface for music to be matched, and the music matching interface includes matching controls;

The second display module is configured to display an audio and video interface in response to a triggering operation on the matching control. The audio and video interface includes: the target audio track of the music to be matched, and at least one target video matching the target audio track.
An electronic device including a processor and a memory, the memory stores a computer program, the processor is configured to run the computer program in the memory to perform the music matching method according to any one of claims 1 to 18.
A computer-readable storage medium stores a computer program, and the computer program is suitable for loading by a processor to execute the music matching method described in any one of claims 1 to 18.
A computer program product, the computer program product stores a computer program, the computer program is suitable for loading by a processor to execute the music matching method described in any one of claims 1 to 18.