CN109841224B - Multimedia playing method, system and electronic equipment - Google Patents

Multimedia playing method, system and electronic equipment

Info

Publication number
CN109841224B
CN109841224B (application CN201711210892.8A)
Authority
CN
China
Prior art keywords
user
multimedia file
volume
voice
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711210892.8A
Other languages
Chinese (zh)
Other versions
CN109841224A (en)
Inventor
侯会满
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201711210892.8A priority Critical patent/CN109841224B/en
Publication of CN109841224A publication Critical patent/CN109841224A/en
Application granted granted Critical
Publication of CN109841224B publication Critical patent/CN109841224B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The present disclosure provides a multimedia playing method, including: acquiring parameters of a user's voice, the parameters including volume and/or pitch; and controlling the playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, where controlling the playing mode includes adjusting the first multimedia file to match the volume and/or pitch of the user's voice and outputting audio according to the adjusted first multimedia file.

Description

Multimedia playing method, system and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a multimedia playing method, system and electronic device.
Background
With the development of the mobile internet and the popularization of smartphones, users have more ways to socialize and entertain themselves. For example, a user may use an application installed on a smartphone to sing songs interactively with other users. However, in the process of implementing the present inventive concept, the inventor found that the prior art has at least the following problem: when a user sings interactively with other users through an application installed on a smartphone, the user must operate the application manually. For example, the user may need to manually switch the original vocal track of a song on or off, manually adjust the volume and/or pitch of the song, and so on.
Disclosure of Invention
In view of the above, the present disclosure provides a multimedia playing method, a multimedia playing system and an electronic device.
One aspect of the present disclosure provides a multimedia playing method, including: acquiring parameters of a user's voice, the parameters including volume and/or pitch; and controlling the playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, where controlling the playing mode includes adjusting the first multimedia file to match the volume and/or pitch of the user's voice and outputting audio according to the adjusted first multimedia file.
According to an embodiment of the present disclosure, controlling the playing mode of the first multimedia file includes outputting audio according to both the first multimedia file and a second multimedia file when the volume of the user's voice is lower than a predetermined threshold.
According to an embodiment of the present disclosure, when the method is performed by an electronic device connected to an external device, outputting the audio includes outputting the audio through the external device.
According to an embodiment of the present disclosure, adjusting the first multimedia file to match the volume of the user's voice includes adjusting the volume of audio in the first multimedia file to be the same as or similar to the volume of the user's voice; and/or adjusting the first multimedia file to match the pitch of the user's voice comprises adjusting the pitch of the audio in the first multimedia file to be the same as or similar to the pitch of the user's voice.
According to an embodiment of the present disclosure, the second multimedia file includes audio associated with the first multimedia file or audio of a person associated with the user.
According to an embodiment of the present disclosure, the audio of the person associated with the user is synthesized from the timbre of the person associated with the user and/or content associated with the first multimedia file.
According to an embodiment of the present disclosure, the method is performed by a server connected to an electronic device; the electronic device receives the user's voice; and the server controls the playing mode of the first multimedia file through the electronic device.
Another aspect of the present disclosure provides a multimedia playing system, including: an acquisition module configured to acquire parameters of a user's voice, the parameters including volume and/or pitch; and a control module configured to control the playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, where controlling the playing mode includes adjusting the first multimedia file to match the volume and/or pitch of the user's voice and outputting audio according to the adjusted first multimedia file.
According to an embodiment of the present disclosure, the controlling the playing manner of the first multimedia file includes outputting audio according to the first multimedia file and the second multimedia file when the volume of the user sound is lower than a predetermined threshold.
According to an embodiment of the present disclosure, when the system runs on an electronic device connected to an external device, outputting the audio includes outputting the audio through the external device.
According to an embodiment of the present disclosure, adjusting the first multimedia file to match the volume of the user's voice includes adjusting the volume of audio in the first multimedia file to be the same as or similar to the volume of the user's voice; and/or adjusting the first multimedia file to match the pitch of the user's voice comprises adjusting the pitch of the audio in the first multimedia file to be the same as or similar to the pitch of the user's voice.
According to an embodiment of the present disclosure, the second multimedia file includes audio associated with the first multimedia file or audio of a person associated with the user.
According to an embodiment of the present disclosure, the system further includes a speech synthesis module for synthesizing the audio of the person associated with the user according to the timbre of the person associated with the user and/or the content associated with the first multimedia file.
According to an embodiment of the present disclosure, the system runs on a server connected to an electronic device; the electronic device receives the user's voice; and the server controls the playing mode of the first multimedia file through the electronic device.
According to an embodiment of the present disclosure, the system further includes a voice recognition module for recognizing a volume and/or a tone of a user's voice.
Another aspect of the present disclosure provides an electronic device including: one or more processors; and one or more memories storing executable instructions that, when executed by the processor, cause the processor to perform the method as described above.
Another aspect of the present disclosure provides a readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, the problem that a user must rely heavily on manual operation when singing interactively with other users through an application installed on a smartphone is at least partially solved, and the user experience is improved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically shows an exemplary system architecture to which the multimedia playing method and electronic device of the present disclosure may be applied;
FIG. 2 schematically shows a flow chart for a multimedia playback method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a flow chart of a method for multimedia playback according to another embodiment of the present disclosure;
FIG. 4 schematically shows a block diagram of a multimedia playback system according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a multimedia playback system according to another embodiment of the present disclosure; and
fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The words "a", "an" and "the" as used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" should be understood to include the possibility of "A", "B", or "A and B".
An embodiment of the present disclosure provides a multimedia playing method, including: acquiring parameters of a user's voice, the parameters including volume and/or pitch; and controlling the playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, where controlling the playing mode includes adjusting the first multimedia file to match the volume and/or pitch of the user's voice and outputting audio according to the adjusted first multimedia file.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which the multimedia playback method and electronic device of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, 104, a network 105, and a server 106. The network 105 serves as a medium for providing communication links between the terminal devices 101, 102, 103, 104 and the server 106. Network 105 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102, 103, 104 to interact with the server 106 via the network 105 to receive or send messages and the like. Various communication and entertainment client applications may be installed on the terminal devices 101, 102, 103, 104, such as shopping applications, web browsers, search applications, instant messaging tools, mail clients, social platform software, and karaoke applications such as Kuwo Karaoke and Quanmin K-Song (by way of example only).
The terminal devices 101, 102, 103, 104 may be various electronic devices having a voice recognition function and supporting voice recognition, including but not limited to smart speakers, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 106 may be a server providing various services, for example a background management server (for example only) that supports a website from which users load songs using the terminal devices 101, 102, 103, 104. The background management server may analyze and otherwise process received data such as a user request, and feed back a processing result (e.g., a song, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the multimedia playing method provided by the embodiment of the present disclosure may be generally executed by the server 106, and may also be executed by the terminal device. Accordingly, the multimedia playing apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 106. The multimedia playing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 106 and capable of communicating with the terminal devices 101, 102, 103, 104 and/or the server 106. Accordingly, the multimedia playing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 106 and capable of communicating with the terminal devices 101, 102, 103, 104 and/or the server 106.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a multimedia playing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 and S202.
In operation S201, parameters of a user' S voice, including volume and/or pitch, are acquired.
In operation S202, a play mode of the first multimedia file is controlled according to a volume and/or a tone of a user' S voice. According to the embodiment of the disclosure, controlling the playing mode of the first multimedia file comprises adjusting the first multimedia file to match with the volume and/or tone of the user's voice, and outputting audio according to the adjusted first multimedia file.
According to the embodiment of the disclosure, by acquiring the volume and/or pitch of the user's voice and controlling the playing mode of the first multimedia file accordingly, the method reduces the number of manual operations the user must perform on the first multimedia file, saving the user time.
According to the embodiment of the present disclosure, the parameters of the user's voice may be, for example, the volume of the user's voice, the pitch of the user's voice, or both.
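The patent does not specify how the volume and pitch parameters are extracted from the user's voice. As an illustrative sketch only (not part of the disclosure), the volume of one audio frame can be measured as its RMS level in dBFS, and a rough pitch estimate obtained by autocorrelation, assuming the frame is a list of floating-point samples in [-1, 1]:

```python
import math

def frame_volume_db(samples, eps=1e-12):
    """RMS level of one audio frame in dBFS (0 dB = full scale)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, eps))

def frame_pitch_hz(samples, sample_rate):
    """Crude pitch estimate by autocorrelation: pick the lag with the
    strongest self-similarity and convert it to a frequency in Hz."""
    n = len(samples)
    best_lag, best_corr = 0, 0.0
    for lag in range(20, n // 2):  # skip very short lags (implausibly high pitch)
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0
```

A production implementation would use a windowed, vectorized estimator (e.g., the YIN algorithm) rather than this O(n²) loop, but the two functions illustrate what "acquiring parameters of the user's voice" can mean concretely.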
By the embodiment of the present disclosure, it is possible to adjust the first multimedia file to match the volume and/or tone of the user's voice according to the volume and/or tone of the user's voice, and output audio according to the adjusted first multimedia file.
For example, the first multimedia file may be an accompaniment of a song, the content of the user's voice may be lyrics corresponding to the accompaniment of the song, and the volume and/or pitch of the user's voice may correspond to the volume and/or pitch emitted by the user when singing the lyrics. Accordingly, at this time, the electronic device may adjust the accompaniment of the song according to the volume and/or tone of sound emitted by the user when singing the lyrics.
According to an embodiment of the present disclosure, adjusting the first multimedia file to match the volume of the user's voice includes adjusting the volume of audio in the first multimedia file to be the same as or similar to the volume of the user's voice; and/or adjusting the first multimedia file to match the pitch of the user's voice comprises adjusting the pitch of the audio in the first multimedia file to be the same as or similar to the pitch of the user's voice.
For example, the volume of the song accompaniment may be adjusted to be the same as or similar to the volume of the user's voice, or the pitch of the accompaniment may be adjusted to be the same as or similar to the pitch of the user's voice, but the adjustment is not limited thereto. The adjusted accompaniment is then output by the electronic device. Because the volume and/or pitch of the first multimedia file then match those of the user's voice, the user does not need to spend time adjusting the accompaniment while singing, which further improves the user experience.
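The disclosure does not give an algorithm for this adjustment; as a hedged sketch, "adjusting the volume of the accompaniment to match the user's voice" can be realized by scaling the accompaniment's samples so its RMS level equals that of the user's frame:

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_volume(accompaniment, user_frame):
    """Scale the accompaniment so its RMS level matches the user's voice.
    A silent accompaniment is returned unchanged. Matching pitch would
    additionally require a time-stretching pitch shifter, omitted here."""
    current, target = rms(accompaniment), rms(user_frame)
    if current == 0.0:
        return list(accompaniment)
    gain = target / current
    return [s * gain for s in accompaniment]
```

In practice the gain would be smoothed across frames to avoid audible pumping; the single-frame version above is only meant to make the matching step concrete.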
Fig. 3 schematically shows a flow chart of a method for multimedia playback according to another embodiment of the present disclosure.
According to an embodiment of the present disclosure, when the parameters include volume, controlling the playing mode of the first multimedia file further includes outputting audio according to the first multimedia file and a second multimedia file when the volume of the user's voice is lower than a predetermined threshold, as shown in fig. 3.
In operation S301, a volume of a user' S voice is acquired.
In operation S302, when the volume of the user' S voice is lower than a predetermined threshold, audio is output according to the first and second multimedia files.
The volume of the user's voice may be divided into 10 levels according to intensity, for example 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, where 0 may be regarded as silence. Of course, the volume may also be expressed in other forms, for example using letters of the English alphabet in sequence, but the disclosure is not limited thereto.
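The 10-level scale above can be sketched as a simple mapping from a measured level in dBFS onto levels 0 through 9 (the floor and ceiling values here are illustrative assumptions, not specified by the patent):

```python
def volume_level(volume_db, floor_db=-60.0, ceil_db=0.0):
    """Quantize a dBFS measurement onto the 10 discrete levels 0..9,
    where level 0 is treated as silence."""
    if volume_db <= floor_db:
        return 0
    if volume_db >= ceil_db:
        return 9
    span = ceil_db - floor_db
    return min(9, int(10 * (volume_db - floor_db) / span))
```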
According to an embodiment of the present disclosure, the predetermined threshold may be set to 1. When the volume of the user's voice is lower than 1, the volume may be regarded as 0. In this case, the first multimedia file and the second multimedia file are output together as audio and played. For example, the first multimedia file may be a song accompaniment and the second multimedia file may be the original recording of the song. When the volume of the user's voice is 0, the original recording and the accompaniment are played together, so the whole song proceeds smoothly without intermittent breaks, implementing a duet between the user's voice and the original recording.
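This duet behavior can be sketched per frame (assuming the hypothetical 0–9 level scale above): when the user's level drops below the threshold, the original vocal track is mixed into the accompaniment; otherwise the accompaniment plays alone.

```python
def output_frame(user_level, accompaniment, original, threshold=1):
    """Mix the original vocal track into the accompaniment when the user
    has gone quiet (level below threshold), so the song continues
    seamlessly; otherwise output only the accompaniment."""
    if user_level < threshold:
        return [a + o for a, o in zip(accompaniment, original)]
    return list(accompaniment)
```

A real implementation would cross-fade over a few frames instead of switching abruptly, and clamp the mixed samples to avoid clipping; those refinements are omitted to keep the sketch minimal.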
According to an embodiment of the present disclosure, when the above method is performed by an electronic device connected with an external device, outputting the audio includes outputting the audio through the external device.
The external device may be, for example, the same device as the electronic device or a device different from the electronic device. According to an embodiment of the present disclosure, the electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a smart speaker, or the like, but is not limited thereto. Likewise, the external device may be a mobile phone, a tablet computer, a notebook computer, a smart speaker, or the like, but is not limited thereto. The method is described below taking a mobile phone as the electronic device and a smart speaker as the external device.
Specifically, the mobile phone acquires the parameters of the user's voice, which include volume and/or pitch. According to the volume and/or pitch of the user's voice, the mobile phone adjusts the first multimedia file to match, sends the adjusted first multimedia file to the external device connected to it, and the external device then plays the audio of the adjusted first multimedia file. It should be noted that when the volume of the user's voice is lower than the predetermined threshold, the first multimedia file and the second multimedia file are sent together to the external device and played.
According to an embodiment of the present disclosure, the second multimedia file includes audio associated with the first multimedia file or audio of a person associated with the user. The audio of the person associated with the user is synthesized from the timbre of the person associated with the user and/or the content associated with the first multimedia file.
For example, when the first multimedia file is a song accompaniment, a second multimedia file whose audio is associated with the first multimedia file is the original recording of the song. The audio of the person associated with the user may be synthesized from that person's timbre and/or content associated with the song accompaniment (e.g., the lyrics of the song). The synthesized second multimedia file is then played together with the first multimedia file, implementing a personalized duet with the user.
According to the embodiment of the disclosure, the method is executed by a server connected to the electronic device; the electronic device receives the user's voice; and the server controls the playing mode of the first multimedia file through the electronic device.
When the method is executed by a server connected to the electronic device, the operation steps of the method are similar to those of the method executed by the electronic device, and are not described herein again.
Fig. 4 schematically shows a block diagram of a multimedia playback system according to an embodiment of the present disclosure.
As shown in fig. 4, the system 400 includes an acquisition module 410 and a control module 420.
An obtaining module 410 is configured to obtain parameters of the user's voice, where the parameters include volume and/or pitch.
The control module 420 is configured to control a playing mode of the first multimedia file according to the volume and/or tone of the user sound. According to the embodiment of the disclosure, controlling the playing mode of the first multimedia file comprises adjusting the first multimedia file to match with the volume and/or tone of the user's voice, and outputting audio according to the adjusted first multimedia file.
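The acquisition/control split can be sketched as two cooperating classes (the class names and the volume-only parameter set below are illustrative assumptions, not the patent's implementation):

```python
class AcquisitionModule:
    """Extracts parameters (here, only RMS volume) from a voice frame."""
    def get_parameters(self, samples):
        rms = (sum(s * s for s in samples) / len(samples)) ** 0.5
        return {"volume": rms}

class ControlModule:
    """Adjusts the first multimedia file to match the acquired parameters."""
    def control(self, params, accompaniment):
        target = params["volume"]
        current = (sum(s * s for s in accompaniment) / len(accompaniment)) ** 0.5
        gain = target / current if current else 1.0
        return [s * gain for s in accompaniment]

class MultimediaPlaybackSystem:
    """Wires the two modules together, mirroring system 400 in fig. 4."""
    def __init__(self):
        self.acquisition = AcquisitionModule()
        self.controller = ControlModule()

    def play(self, user_frame, accompaniment):
        params = self.acquisition.get_parameters(user_frame)
        return self.controller.control(params, accompaniment)
```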
According to the embodiment of the present disclosure, the detailed processes related to obtaining the user sound parameters and controlling the playing mode of the first multimedia file can be referred to the description above with reference to fig. 2 to 3, and are not repeated here.
Fig. 5 schematically shows a block diagram of a multimedia playback system according to another embodiment of the present disclosure.
As shown in fig. 5, in addition to the acquisition module 410 and the control module 420 in the fig. 4 embodiment, the system 500 includes a speech recognition module 510 and a speech synthesis module 520.
Specifically, the speech recognition module 510 is used for recognizing the volume and/or pitch of the user's voice.
A speech synthesis module 520 for synthesizing audio of the person associated with the user from the timbre of the person associated with the user and/or the content associated with the first multimedia file.
For example, when the first multimedia file is a song accompaniment, a second multimedia file whose audio is associated with the first multimedia file is the original recording of the song. The audio of the person associated with the user may be synthesized from that person's timbre and/or content associated with the song accompaniment (e.g., the lyrics of the song). The synthesized second multimedia file is then played together with the first multimedia file, implementing a personalized duet with the user.
Fig. 6 schematically shows a block diagram of an electronic device suitable for implementing the multimedia playing method and system according to an embodiment of the present disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, an electronic device 600 according to an embodiment of the present disclosure includes a processor 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The processor 601 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), among others. The processor 601 may also include onboard memory for caching purposes. The processor 601 may include a single processing unit or multiple processing units for performing different actions of the method flows described with reference to figs. 2-3 in accordance with embodiments of the disclosure.
In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are stored. The processor 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. The processor 601 performs various operations of the methods and systems described above with reference to fig. 2-3 by executing programs in the ROM 602 and/or RAM 603. It is to be noted that the programs may also be stored in one or more memories other than the ROM 602 and RAM 603. The processor 601 may also perform various operations of the methods and systems described above with reference to fig. 2-3 by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 600 may also include an input/output (I/O) interface 605, which is likewise connected to the bus 604. The electronic device 600 may also include one or more of the following components connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read from it is installed into the storage section 608 as needed.
According to an embodiment of the present disclosure, the method described above with reference to the flow chart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the processor 601, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing. 
According to embodiments of the present disclosure, a computer-readable medium may include the ROM 602 and/or RAM 603 described above and/or one or more memories other than the ROM 602 and RAM 603.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the method of the present disclosure: acquiring the volume and/or pitch of the user's voice; adjusting the first multimedia file to match the volume and/or pitch of the user's voice; and outputting audio according to the adjusted first multimedia file.
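The adjustment step described above can be sketched as follows. This is a minimal illustrative sketch only: the function and parameter names (`adjust_playback`, `file_volume`, and so on) are assumptions for illustration, not identifiers from the patent, and the "matching" here is simplified to adopting the user's measured volume and pitch.

```python
# Hypothetical sketch of matching a multimedia file's playback to the
# user's voice. Volumes are in dB and pitches in semitones relative to a
# reference; neither unit is specified by the patent.

def adjust_playback(file_volume: float, file_pitch: float,
                    user_volume: float, user_pitch: float) -> tuple:
    """Return (volume gain, pitch shift) that match playback to the user.

    The gain is the dB change that brings the file's volume to the
    user's volume; the shift is the semitone transposition that brings
    the file's pitch to the user's pitch.
    """
    volume_gain = user_volume - file_volume
    pitch_shift = user_pitch - file_pitch
    return volume_gain, pitch_shift

# A quiet, low-voiced singer: play quieter and transpose down.
gain, shift = adjust_playback(file_volume=-12.0, file_pitch=0.0,
                              user_volume=-18.0, user_pitch=-2.0)
# gain = -6.0 dB, shift = -2.0 semitones
```

A real player would then apply the gain and pitch shift with an audio library before output; the sketch only derives the parameters.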
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (11)

1. A multimedia playback method, comprising:
acquiring parameters of a user's voice, wherein the parameters comprise volume and/or pitch; and
controlling a playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, wherein controlling the playing mode of the first multimedia file comprises outputting audio according to the first multimedia file and a second multimedia file when the volume of the user's voice is lower than a preset threshold;
wherein the first multimedia file is a song accompaniment and the second multimedia file comprises audio of a person associated with the user, so that the output audio sings antiphonally with the user;
wherein the audio of the person associated with the user is synthesized from the timbre of the person associated with the user and lyrics associated with the first multimedia file.
2. The method of claim 1, wherein when the method is performed by an electronic device connected with an external device, the outputting of audio comprises outputting the audio through the external device.
3. The method of claim 1, wherein:
adjusting the first multimedia file to match the volume of the user's voice comprises adjusting the volume of the audio in the first multimedia file to be the same as or similar to the volume of the user's voice; and/or
adjusting the first multimedia file to match the pitch of the user's voice comprises adjusting the pitch of the audio in the first multimedia file to be the same as or similar to the pitch of the user's voice.
4. The method of claim 1, wherein:
the method is performed by a server connected with an electronic device;
the electronic device receives the user's voice; and
the server controls the playing mode of the first multimedia file through the electronic device.
5. A multimedia playback system, comprising:
an acquisition module configured to acquire parameters of a user's voice, wherein the parameters comprise volume and/or pitch;
a control module configured to control a playing mode of a first multimedia file according to the volume and/or pitch of the user's voice, wherein controlling the playing mode of the first multimedia file comprises outputting audio according to the first multimedia file and a second multimedia file when the volume of the user's voice is lower than a preset threshold,
wherein the first multimedia file is a song accompaniment and the second multimedia file comprises audio of a person associated with the user; and
a speech synthesis module configured to synthesize the audio of the person associated with the user from the timbre of the person associated with the user and lyrics associated with the first multimedia file.
6. The system of claim 5, wherein when the system is implemented in an electronic device connected with an external device, the outputting of audio comprises outputting the audio through the external device.
7. The system of claim 5, wherein:
adjusting the first multimedia file to match the volume of the user's voice comprises adjusting the volume of the audio in the first multimedia file to be the same as or similar to the volume of the user's voice; and/or
adjusting the first multimedia file to match the pitch of the user's voice comprises adjusting the pitch of the audio in the first multimedia file to be the same as or similar to the pitch of the user's voice.
8. The system of claim 5, wherein:
the system is implemented in a server connected with an electronic device;
the electronic device receives the user's voice; and
the server controls the playing mode of the first multimedia file through the electronic device.
9. The system of claim 5, further comprising:
a voice recognition module configured to recognize the volume and/or pitch of the user's voice.
10. An electronic device, comprising:
one or more processors; and
one or more memories storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1-4.
11. A readable storage medium having stored thereon instructions which, when executed, perform the method of any one of claims 1-4.
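The threshold-based control of claim 1 can be sketched as follows. This is an illustrative sketch under stated assumptions: the names (`select_output`, `THRESHOLD`, the file names) are hypothetical, the companion voice is assumed to be pre-synthesized, and real playback would mix and stream the selected files rather than return their names.

```python
# Hypothetical sketch of the claim-1 control flow: when the user's voice
# drops below a preset threshold, the synthesized voice of a person
# associated with the user joins the accompaniment, so the output sings
# antiphonally (call-and-response) with the user.

THRESHOLD = 0.1  # preset minimum user volume, normalized to [0, 1]

def select_output(user_volume: float, accompaniment: str,
                  companion_audio: str) -> list:
    """Choose which audio streams to output for the current passage."""
    if user_volume < THRESHOLD:
        # User has gone quiet: output accompaniment plus companion voice.
        return [accompaniment, companion_audio]
    # User is singing: output the accompaniment alone.
    return [accompaniment]

# The user stops singing, so the companion voice takes over the line.
streams = select_output(0.02, "accompaniment.mp3", "companion.wav")
```

On a server-based deployment (claims 4 and 8), this selection would run on the server, which then directs the electronic device to output the chosen streams.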
CN201711210892.8A 2017-11-27 2017-11-27 Multimedia playing method, system and electronic equipment Active CN109841224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711210892.8A CN109841224B (en) 2017-11-27 2017-11-27 Multimedia playing method, system and electronic equipment


Publications (2)

Publication Number Publication Date
CN109841224A CN109841224A (en) 2019-06-04
CN109841224B true CN109841224B (en) 2021-12-31

Family

ID=66879394


Country Status (1)

Country Link
CN (1) CN109841224B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110459196A (en) * 2019-09-05 2019-11-15 长沙市回音科技有限公司 A kind of method, apparatus and system adjusting singing songs difficulty

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR950009380B1 (en) * 1992-02-28 1995-08-21 삼성전자주식회사 Karaoke system of ldp
US5583652A (en) * 1994-04-28 1996-12-10 International Business Machines Corporation Synchronized, variable-speed playback of digitally recorded audio and video
US5641927A (en) * 1995-04-18 1997-06-24 Texas Instruments Incorporated Autokeying for musical accompaniment playing apparatus
CN103209370A (en) * 2012-01-16 2013-07-17 联想(北京)有限公司 Electronic equipment and method for adjusting file sound parameters output by sound playing device
CN105989823B (en) * 2015-02-03 2020-02-18 中国移动通信集团四川有限公司 Automatic following and shooting accompaniment method and device
CN105702271A (en) * 2015-12-31 2016-06-22 联想(北京)有限公司 Multimedia playing method and electronic equipment
CN105740394B (en) * 2016-01-27 2019-02-26 广州酷狗计算机科技有限公司 Song generation method, terminal and server
CN106558299A (en) * 2016-12-01 2017-04-05 广州酷狗计算机科技有限公司 The mode switching method and device of audio rendition



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant