CN112669849A - Method, apparatus, device and storage medium for outputting information

Info

Publication number
CN112669849A
Authority
CN
China
Prior art keywords
audio
information
reading
processing
outputting
Prior art date
2020-12-18
Legal status
Pending
Application number
CN202011502744.5A
Other languages
Chinese (zh)
Inventor
张宏伟
Current Assignee
Baidu International Technology Shenzhen Co ltd
Original Assignee
Baidu International Technology Shenzhen Co ltd
Priority date
2020-12-18
Filing date
2020-12-18
Publication date
2021-04-16
Application filed by Baidu International Technology Shenzhen Co ltd
Priority to CN202011502744.5A
Publication of CN112669849A

Abstract

The present application discloses a method, an apparatus, a device, and a storage medium for outputting information, relating to the field of speech technology. The specific implementation scheme is as follows: acquire spoken audio; acquire accompaniment audio; process the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio; and generate and output playback information based on the processed audio. This implementation does not require the singer to have music theory knowledge or singing skill, and improves the quality of rap music performed by ordinary users.

Description

Method, apparatus, device and storage medium for outputting information
Technical Field
The present application relates to the field of artificial intelligence, in particular to speech technology, and more particularly to a method, an apparatus, a device, and a storage medium for outputting information.
Background
At present, rap is a popular form of music, characterized by rapidly reciting a series of rhyming words over a mechanical rhythmic backing. As users pursue personalized elements, they increasingly hope to perform rap music themselves rather than only listen to it.
Because rap music usually requires certain music theory knowledge and singing skill to perform, it is difficult for ordinary users, and the results when ordinary users perform rap music are often poor.
Disclosure of Invention
A method, apparatus, device, and storage medium for outputting information are provided.
According to a first aspect, there is provided a method for outputting information, comprising: acquiring spoken audio; acquiring accompaniment audio; processing the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio; and generating and outputting playback information based on the processed audio.
According to a second aspect, there is provided an apparatus for outputting information, comprising: a first acquisition unit configured to acquire spoken audio; a second acquisition unit configured to acquire accompaniment audio; an audio processing unit configured to process the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio; and an audio output unit configured to generate and output playback information based on the processed audio.
According to a third aspect, there is provided an electronic device for outputting information, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method described in the first aspect.
The technique according to the present application does not require the singer to have music theory knowledge or singing skill, and improves the quality of rap music performed by ordinary users.
It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting information, in accordance with the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for outputting information according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for outputting information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for outputting information according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for outputting information according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other in the absence of conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Speech technology refers to key technologies in the computer field, including automatic speech recognition (ASR) and speech synthesis (TTS). The earliest speech technologies stemmed from the "automatic translation telephone" project, which involved three major technologies: speech recognition, natural language understanding, and speech synthesis. Research on speech recognition dates back to the Audrey system of AT&T Bell Laboratories in the 1950s; since then, researchers have broken through three major hurdles: large vocabulary, continuous speech, and speaker independence. Making a computer speak requires speech synthesis, whose core is text-to-speech conversion. Speech synthesis has even been applied to in-car information systems, where a car owner can have text files, e-mails, network news, or novels downloaded into the system computer converted into speech to listen to in the car.
In the present application, spoken audio and accompaniment audio are acquired, the spoken audio is processed according to the two, and playback information is generated based on the resulting processed audio. In this way, the singer does not need music theory knowledge or singing skill, and the quality of rap music performed by ordinary users is improved.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for outputting information or apparatus for outputting information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as a voice playing application, may be installed on the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may also be equipped with microphone arrays to capture the user's voice.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, e-book readers, car computers, laptop portable computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed herein.
The server 105 may be a server that provides various services, such as a background server that processes audio transmitted by the terminal devices 101, 102, 103. The background server may process the audio sent by the user according to the accompaniment audio to obtain processed audio, and feed the processed audio back to the terminal devices 101, 102, and 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for outputting information provided in the embodiment of the present application is generally performed by the server 105. Accordingly, a device for outputting information is generally provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting information in accordance with the present application is shown. The method for outputting information of this embodiment comprises the following steps:
Step 201, acquiring spoken audio.
In this embodiment, the execution subject of the method for outputting information (e.g., the server 105 shown in fig. 1) may acquire spoken audio in various ways. For example, a user may read lyrics aloud into the microphone array of a terminal to obtain spoken audio, and the terminal sends the spoken audio to the execution subject. Alternatively, the execution subject may first obtain a piece of text, generate speech for the text using a speech generation algorithm, and use that speech as the spoken audio.
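For illustration only, a minimal sketch of the second route follows, assuming gTTS as the speech generator (the application does not name a particular TTS engine):

```python
# Sketch: generate spoken audio from lyric text via TTS.
# gTTS is an assumed stand-in for the unspecified speech generation algorithm.
from gtts import gTTS

def text_to_spoken_audio(lyrics: str, out_path: str = "spoken.mp3") -> str:
    """Synthesize the lyric text into spoken audio and save it to disk."""
    tts = gTTS(text=lyrics, lang="zh-CN")  # Mandarin lyrics; use "en" for English
    tts.save(out_path)
    return out_path

spoken_path = text_to_spoken_audio("这是一段说唱歌词")
```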
Step 202, acquiring accompaniment audio.
In this embodiment, the execution subject may also acquire the accompaniment audio in various ways. For example, the execution subject may randomly select one accompaniment from a preset accompaniment audio library. Alternatively, the execution subject may first perform speech recognition on the spoken audio to obtain the corresponding text, and then look up the audio corresponding to that text in the accompaniment library to use as the accompaniment audio. Alternatively, the user may upload the accompaniment audio via the terminal.
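A minimal sketch of the first two selection strategies, with a hypothetical in-memory library (the application does not specify how the library is stored):

```python
# Sketch: pick accompaniment audio either at random or by a text lookup.
# The library contents and keys are illustrative placeholders.
import random

ACCOMPANIMENT_LIBRARY = {
    "demo lyrics one": "backing_track_1.wav",
    "demo lyrics two": "backing_track_2.wav",
}

def pick_accompaniment(recognized_text: str | None = None) -> str:
    if recognized_text and recognized_text in ACCOMPANIMENT_LIBRARY:
        return ACCOMPANIMENT_LIBRARY[recognized_text]  # text-matched choice
    return random.choice(list(ACCOMPANIMENT_LIBRARY.values()))  # random fallback
```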
Step 203, processing the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio.
After obtaining the spoken audio and the accompaniment audio, the execution subject may process the spoken audio according to the two to obtain processed audio. Specifically, the execution subject may lengthen or shorten the spoken audio according to the duration of the spoken audio and the duration of the accompaniment audio. Alternatively, the execution subject may first determine a matching relationship between each character in the spoken audio and the notes in the accompaniment audio, and then adjust the duration of each character according to the matching relationship and the durations of the notes in the accompaniment audio.
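A sketch of the duration-matching route, assuming librosa for time-stretching (the application does not prescribe a signal-processing library):

```python
# Sketch: stretch or compress the spoken audio so its total length
# matches the accompaniment audio.
import librosa
import soundfile as sf

def match_duration(spoken_path: str, accompaniment_path: str, out_path: str) -> None:
    voice, sr = librosa.load(spoken_path, sr=None)
    backing, sr_b = librosa.load(accompaniment_path, sr=None)
    # rate > 1 shortens the spoken audio, rate < 1 lengthens it.
    rate = (len(voice) / sr) / (len(backing) / sr_b)
    stretched = librosa.effects.time_stretch(voice, rate=rate)
    sf.write(out_path, stretched, sr)
```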
Step 204, generating and outputting playback information based on the processed audio.
After obtaining the processed audio, the execution subject may generate playback information and output it. Specifically, the execution subject may directly output the processed audio as the playback information. Alternatively, the execution subject may also generate identification information (e.g., song title, song author) for the processed audio, and output the processed audio together with the identification information as the playback information.
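The playback information could be packaged as a small structure like the following sketch; the field names are illustrative, not taken from the application:

```python
# Sketch: processed audio plus optional identification metadata.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlaybackInfo:
    audio_path: str               # path to the processed (rap) audio
    title: Optional[str] = None   # e.g., a generated song title
    author: Optional[str] = None  # e.g., the performing user

info = PlaybackInfo(audio_path="rap_out.wav", title="My First Rap", author="user_123")
```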
With continued reference to fig. 3, a schematic illustration of one application scenario of the method for outputting information according to the present application is shown. In the application scenario of fig. 3, the user inputs spoken audio through a karaoke application installed on the mobile phone 301 and selects accompaniment audio. The phone 301 sends the user's spoken audio and the selected accompaniment audio to the server 302. After receiving the spoken audio and the accompaniment audio, the server 302 processes the spoken audio; the resulting processed audio is the rap music. The server 302 returns the rap music to the phone 301, which can play it directly for the user to enjoy.
According to the method for outputting information provided by the above embodiment of the present application, the user does not need any music theory basis or singing experience: the spoken audio can be processed using only the user's spoken audio and the accompaniment audio to obtain a processed musical work, improving the quality of musical works performed by ordinary users.
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for outputting information in accordance with the present application is shown. As shown in fig. 4, the method of the present embodiment may include the following steps:
step 401, receiving target lyric information sent by a terminal; and generating reading audio according to the target lyric information.
In this embodiment, the execution main body may receive target lyric information sent by the terminal. The terminal here may be a terminal used by a user. The target lyric information may be the lyric information of a certain published song, or may be the lyric information created by the user himself. The target lyric information can be Chinese lyrics or foreign language lyrics. The execution subject may generate the reading audio based on the target lyric information. Specifically, the execution subject may generate audio using an existing TTS (Text To Speech) algorithm, and use the obtained audio as the reading audio. Or, the execution main body may adjust the fundamental frequency of the obtained audio according to the fundamental frequency of the sound of the user, so that the tone color in the processed audio is similar to the tone color of the user.
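A sketch of the fundamental-frequency adjustment, assuming librosa's pyin estimator and pitch shifter for the unspecified f0 estimation and shifting steps:

```python
# Sketch: shift the TTS audio so its median f0 matches the user's voice.
import numpy as np
import librosa
import soundfile as sf

def median_f0(path: str):
    y, sr = librosa.load(path, sr=None)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"))
    return float(np.nanmedian(f0)), y, sr

def match_user_f0(tts_path: str, user_sample_path: str, out_path: str) -> None:
    f0_tts, y_tts, sr = median_f0(tts_path)
    f0_user, _, _ = median_f0(user_sample_path)
    n_steps = 12 * np.log2(f0_user / f0_tts)  # semitone offset between the voices
    shifted = librosa.effects.pitch_shift(y_tts, sr=sr, n_steps=float(n_steps))
    sf.write(out_path, shifted, sr)
```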
In some optional implementations of this embodiment, the execution subject may also determine the target lyric information through the following steps, not shown in fig. 4: in response to a preset condition being satisfied, outputting lyric information from a preset lyric library; and in response to receiving a selection instruction sent by the terminal for the output lyric information, taking the lyric information targeted by the selection instruction as the target lyric information.
In this implementation, the execution subject may detect in real time whether a preset condition is satisfied; if so, it may output lyric information from a preset lyric library. The preset condition here may be any condition by which the execution subject recognizes that the user needs lyric information for reference. The preset conditions may include, but are not limited to: the user's lyric-writing time exceeding a preset duration, receiving a lyric-viewing request sent by the user through the terminal, and the similarity between the lyrics written by the user and at least one set of lyrics in the lyric library exceeding a preset threshold. After outputting the lyric information from the lyric library, the execution subject may receive a selection instruction sent by the terminal for the output lyric information, and take the lyric information targeted by the selection instruction as the target lyric information. The selection instruction may be sent by the user through the terminal and indicates that the user has selected a particular piece of lyric information displayed by the terminal.
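The similarity-based condition could be checked as in the following sketch; difflib is an assumed similarity measure, since the application does not specify one:

```python
# Sketch: flag library lyrics that are sufficiently similar to the user's draft.
import difflib

def similar_lyrics(draft: str, lyric_library: list[str],
                   threshold: float = 0.6) -> list[str]:
    """Return library lyrics whose similarity to the draft exceeds the threshold."""
    return [lyrics for lyrics in lyric_library
            if difflib.SequenceMatcher(None, draft, lyrics).ratio() > threshold]
```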
Step 402, acquiring accompaniment audio.
Step 403, determining a matching relationship between the characters of the spoken audio and the notes in the accompaniment audio according to the spoken audio and the accompaniment audio, and processing the spoken audio according to the matching relationship to obtain processed audio.
In this embodiment, after obtaining the spoken audio and the accompaniment audio, the execution subject may determine a matching relationship between the characters of the spoken audio and the notes in the accompaniment audio. Specifically, the execution subject may divide the spoken audio into audio units by character, divide the accompaniment audio into audio units by note, and match the two sets of audio units to obtain the matching relationship. Then, according to the matching relationship, duration or frequency processing is performed on each audio unit of the spoken audio to obtain the processed audio.
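A sketch of the per-unit duration processing, under the simplifying assumption that the i-th character segment is matched one-to-one with the i-th note (the application leaves the matching procedure open):

```python
# Sketch: stretch each character segment to the duration of its matched note.
import numpy as np
import librosa

def align_units(voice: np.ndarray, sr: int,
                char_bounds: list[tuple[float, float]],
                note_durations: list[float]) -> np.ndarray:
    """char_bounds: (start, end) in seconds for each character segment;
    note_durations: target duration in seconds of the matched note."""
    out = []
    for (start, end), target in zip(char_bounds, note_durations):
        seg = voice[int(start * sr):int(end * sr)]
        rate = (end - start) / target  # >1 shortens the segment, <1 lengthens it
        out.append(librosa.effects.time_stretch(seg, rate=rate))
    return np.concatenate(out)
```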
In some optional implementations of this embodiment, the execution subject may also process the spoken audio through the following steps, not shown in fig. 4: determining, according to the matching relationship, the stressed characters in the spoken audio whose positions match the accent positions; and processing the stressed characters to obtain the processed audio.
In this implementation, the accompaniment audio may also include accent positions. The execution subject may determine, according to the matching relationship, the characters in the spoken audio that match the accent positions, and may adjust the frequency or pitch of those characters so that they are stressed in the processed audio. This can further improve the quality of the resulting musical work.
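A sketch of stressing the matched characters; the two-semitone pitch boost and slight gain increase are illustrative choices, not values from the application:

```python
# Sketch: raise the pitch (and gain) of segments matched to accent positions.
import numpy as np
import librosa

def stress_segments(voice: np.ndarray, sr: int,
                    accent_bounds: list[tuple[float, float]]) -> np.ndarray:
    out = voice.copy()
    for start, end in accent_bounds:
        lo, hi = int(start * sr), int(end * sr)
        seg = librosa.effects.pitch_shift(out[lo:hi], sr=sr, n_steps=2.0)
        out[lo:hi] = 1.2 * seg[:hi - lo]  # slight gain boost for emphasis
    return out
```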
Step 404, determining a cover and a display effect corresponding to the processed audio, and outputting the cover and the display effect for display.
In this embodiment, the execution subject may further determine a cover and a display effect corresponding to the processed audio. The cover is the picture displayed by the terminal while the audio is playing; it may be an image or a video. Specifically, the execution subject may determine the cover and the display effect in various ways. For example, it may accept a cover and a display effect selected by the user. Alternatively, it may use the user's avatar as the cover and randomly assign a display effect. Alternatively, it may select a cover and a display effect according to the music style of the processed audio. The execution subject may output the cover and the display effect to the terminal so that, while playing the processed audio, the terminal displays the cover with the display effect.
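A sketch of the style-based selection; the style labels and asset names are illustrative placeholders:

```python
# Sketch: choose a cover and display effect from the music style.
import random

COVERS = {"old_school": ["boombox.png", "vinyl.png"], "trap": ["neon.png"]}
EFFECTS = {"old_school": "film_grain", "trap": "strobe"}

def pick_cover_and_effect(music_style: str) -> tuple[str, str]:
    covers = COVERS.get(music_style, ["default.png"])
    return random.choice(covers), EFFECTS.get(music_style, "fade")
```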
The method for outputting information provided by the above embodiment of the present application can provide lyric information to the user and generate spoken audio from it, improving the user experience; it can process the spoken audio according to the matching relationship and the accent positions, improving the quality of the musical work; and it can display the cover with the display effect during playback, enriching the presentation.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting information of the present embodiment includes: a first acquisition unit 501, a second acquisition unit 502, an audio processing unit 503, and an audio output unit 504.
A first acquisition unit 501 configured to acquire spoken audio.
A second acquisition unit 502 configured to acquire accompaniment audio.
An audio processing unit 503 configured to process the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio.
An audio output unit 504 configured to generate and output playback information based on the processed audio.
In some optional implementations of this embodiment, the first acquisition unit 501 may be further configured to: receive target lyric information sent by a terminal; and generate the spoken audio according to the target lyric information.
In some optional implementations of this embodiment, the first acquisition unit 501 may be further configured to: in response to a preset condition being satisfied, output lyric information from a preset lyric library; and in response to receiving a selection instruction sent by the terminal for the output lyric information, take the lyric information targeted by the selection instruction as the target lyric information.
In some optional implementations of this embodiment, the audio processing unit 503 may be further configured to: determine, according to the spoken audio and the accompaniment audio, a matching relationship between the characters of the spoken audio and the notes in the accompaniment audio; and process the spoken audio according to the matching relationship to obtain the processed audio.
In some optional implementations of this embodiment, the accompaniment audio includes accent positions, and the audio processing unit 503 may be further configured to: determine, according to the matching relationship, the stressed characters in the spoken audio that match the accent positions; and process the stressed characters to obtain the processed audio.
In some optional implementations of this embodiment, the audio output unit 504 may be further configured to: determine a cover and a display effect corresponding to the processed audio; and output the cover and the display effect for display.
It should be understood that the units 501 to 504 described in the apparatus 500 for outputting information correspond respectively to the steps of the method described with reference to fig. 2. Thus, the operations and features described above for the method for outputting information are equally applicable to the apparatus 500 and the units included therein, and are not repeated here.
According to embodiments of the present application, the application further provides an electronic device, a readable storage medium, and a computer program product.
Fig. 6 shows a block diagram of an electronic device 600 that performs a method for outputting information according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 performs the respective methods and processes described above, such as a method for outputting information. For example, in some embodiments, the method for outputting information may be implemented as a computer software program tangibly embodied in a machine-readable storage medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for outputting information described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for outputting information.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. The program code may be packaged as a computer program product. The program code or computer program product may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the computing unit 601, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this application, a machine-readable storage medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host; it is a host product in the cloud computing service system that overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solution of the present application can be achieved; no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (15)

1. A method for outputting information, comprising:
acquiring spoken audio;
acquiring accompaniment audio;
processing the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio;
and generating and outputting playback information based on the processed audio.
2. The method of claim 1, wherein the acquiring spoken audio comprises:
receiving target lyric information sent by a terminal;
and generating the spoken audio according to the target lyric information.
3. The method of claim 2, wherein the receiving target lyric information sent by the terminal comprises:
in response to a preset condition being satisfied, outputting lyric information from a preset lyric library;
and in response to receiving a selection instruction sent by the terminal for the output lyric information, taking the lyric information targeted by the selection instruction as the target lyric information.
4. The method of claim 1, wherein the processing the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio comprises:
determining, according to the spoken audio and the accompaniment audio, a matching relationship between characters of the spoken audio and notes in the accompaniment audio;
and processing the spoken audio according to the matching relationship to obtain the processed audio.
5. The method of claim 4, wherein the accompaniment audio comprises accent positions; and
the processing the spoken audio according to the matching relationship to obtain the processed audio comprises:
determining, according to the matching relationship, the stressed characters in the spoken audio that match the accent positions;
and processing the stressed characters to obtain the processed audio.
6. The method of claim 1, wherein the generating and outputting playback information based on the processed audio comprises:
determining a cover and a display effect corresponding to the processed audio;
and outputting the cover and the display effect for display.
7. An apparatus for outputting information, comprising:
a first acquisition unit configured to acquire spoken audio;
a second acquisition unit configured to acquire accompaniment audio;
an audio processing unit configured to process the spoken audio according to the spoken audio and the accompaniment audio to obtain processed audio;
and an audio output unit configured to generate and output playback information based on the processed audio.
8. The apparatus of claim 7, wherein the first acquisition unit is further configured to:
receive target lyric information sent by a terminal;
and generate the spoken audio according to the target lyric information.
9. The apparatus of claim 8, wherein the first acquisition unit is further configured to:
in response to a preset condition being satisfied, output lyric information from a preset lyric library;
and in response to receiving a selection instruction sent by the terminal for the output lyric information, take the lyric information targeted by the selection instruction as the target lyric information.
10. The apparatus of claim 7, wherein the audio processing unit is further configured to:
determine, according to the spoken audio and the accompaniment audio, a matching relationship between characters of the spoken audio and notes in the accompaniment audio;
and process the spoken audio according to the matching relationship to obtain the processed audio.
11. The apparatus of claim 10, wherein the accompaniment audio comprises accent positions; and
the audio processing unit is further configured to:
determine, according to the matching relationship, the stressed characters in the spoken audio that match the accent positions;
and process the stressed characters to obtain the processed audio.
12. The apparatus of claim 7, wherein the audio output unit is further configured to:
determine a cover and a display effect corresponding to the processed audio;
and output the cover and the display effect for display.
13. An electronic device for outputting information, comprising:
at least one computing unit; and
a storage unit communicatively coupled to the at least one computing unit; wherein
the storage unit stores instructions executable by the at least one computing unit to enable the at least one computing unit to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a computing unit, implements the method of any one of claims 1-6.
CN202011502744.5A 2020-12-18 2020-12-18 Method, apparatus, device and storage medium for outputting information Pending CN112669849A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011502744.5A CN112669849A (en) 2020-12-18 2020-12-18 Method, apparatus, device and storage medium for outputting information

Publications (1)

Publication Number Publication Date
CN112669849A true CN112669849A (en) 2021-04-16

Family

ID=75406401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011502744.5A Pending CN112669849A (en) 2020-12-18 2020-12-18 Method, apparatus, device and storage medium for outputting information

Country Status (1)

Country Link
CN (1) CN112669849A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101399036A (en) * 2007-09-30 2009-04-01 三星电子株式会社 Device and method for conversing voice to be rap music
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
US20180032610A1 (en) * 2016-07-29 2018-02-01 Paul Charles Cameron Systems and methods for automatic-creation of soundtracks for speech audio
WO2018121368A1 (en) * 2016-12-30 2018-07-05 阿里巴巴集团控股有限公司 Method for generating music to accompany lyrics and related apparatus
CN109949783A (en) * 2019-01-18 2019-06-28 苏州思必驰信息科技有限公司 Song synthetic method and system
CN111402843A (en) * 2020-03-23 2020-07-10 北京字节跳动网络技术有限公司 Rap music generation method and device, readable medium and electronic equipment
CN111862913A (en) * 2020-07-16 2020-10-30 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for converting voice into rap music

Similar Documents

Publication Publication Date Title
US11727914B2 (en) Intent recognition and emotional text-to-speech learning
CN108962219B (en) method and device for processing text
CN108831437B (en) Singing voice generation method, singing voice generation device, terminal and storage medium
JP6633153B2 (en) Method and apparatus for extracting information
JP6507316B2 (en) Speech re-recognition using an external data source
WO2020177190A1 (en) Processing method, apparatus and device
US20180374461A1 (en) System and method for automatically generating media
US20150356967A1 (en) Generating Narrative Audio Works Using Differentiable Text-to-Speech Voices
US20240021202A1 (en) Method and apparatus for recognizing voice, electronic device and medium
US20140046667A1 (en) System for creating musical content using a client terminal
CN112309365B (en) Training method and device of speech synthesis model, storage medium and electronic equipment
CN107705782B (en) Method and device for determining phoneme pronunciation duration
CN111402842A (en) Method, apparatus, device and medium for generating audio
CN112908292B (en) Text voice synthesis method and device, electronic equipment and storage medium
US20140236597A1 (en) System and method for supervised creation of personalized speech samples libraries in real-time for text-to-speech synthesis
KR20200027331A (en) Voice synthesis device
EP3646315A1 (en) System and method for automatically generating media
CN112382267A (en) Method, apparatus, device and storage medium for converting accents
CN111477210A (en) Speech synthesis method and device
US20090177473A1 (en) Applying vocal characteristics from a target speaker to a source speaker for synthetic speech
CN112383721B (en) Method, apparatus, device and medium for generating video
CN112381926A (en) Method and apparatus for generating video
CN112669849A (en) Method, apparatus, device and storage medium for outputting information
CN114999440A (en) Avatar generation method, apparatus, device, storage medium, and program product
CN114999441A (en) Avatar generation method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination