WO2023011050A1 - Method, system, device and storage medium for performing connected-mic chorus - Google Patents

Method, system, device and storage medium for performing connected-mic chorus (连麦合唱)

Info

Publication number
WO2023011050A1
Authority
WO
WIPO (PCT)
Prior art keywords
multimedia data
singing
terminal
live multimedia
live
Prior art date
Application number
PCT/CN2022/101609
Other languages
English (en)
French (fr)
Inventor
冯涛
黄斯亮
王玉奎
王磊
刘腾飞
欧阳金凯
管振航
文绍斌
雷勇
杜擎
李扬
Original Assignee
腾讯音乐娱乐科技(深圳)有限公司 (Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯音乐娱乐科技(深圳)有限公司
Publication of WO2023011050A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
    • H04N21/2393Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • The present application relates to the field of Internet technology, and in particular to a method, system, device, and storage medium for performing connected-mic chorus (连麦合唱, a chorus performed by two anchors whose live streams are linked).
  • In a live broadcast, two anchors can perform a connected-mic chorus; that is, the two anchors sing the same song according to a preset singing order, and users can watch the live video of the two anchors singing the song together by entering the live broadcast room of either anchor.
  • The embodiments of the present application provide a method, system, device, and storage medium for performing connected-mic chorus, which can solve the problem that the two anchors' singing appears incoherent on the audience terminal when they sing together. The technical scheme is as follows:
  • In a first aspect, a method for performing connected-mic chorus is provided. The method is applied to a first terminal and includes: sending a chorus request for a target song to the server; receiving a start-singing command for the target song sent by the server, and determining at least one local singing time period according to the segmentation information of the target song; and starting to play the accompaniment of the target song and entering a first processing state;
  • the processing in the first processing state includes: adding, to the locally generated first live multimedia data, the accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and sending the processed first live multimedia data to the server;
  • switching to a second processing state when the accompaniment playback progress reaches the end time point of the current singing time period, where the processing in the second processing state includes: when second live multimedia data carrying the second terminal's non-singing tag is received, adding a delay tag to the first live multimedia data currently generated locally and sending the processed first live multimedia data to the server; and when second live multimedia data carrying the second terminal's singing tag and accompaniment playback progress is received, adding the received accompaniment playback progress of the second terminal and a non-singing tag to the first live multimedia data currently generated locally, and sending the processed first live multimedia data to the server.
  • Optionally, determining the at least one local singing time period according to the segmentation information of the target song includes:
  • determining, according to the segmentation information of the target song, the singing paragraphs respectively corresponding to the first terminal and the second terminal in the target song;
  • for any singing paragraph corresponding to the first terminal, if that singing paragraph is not the last of its corresponding singing paragraphs in the target song, determining the singing end time point of that paragraph based on the target playback end time point of that paragraph in the target song and the target playback start time point of the next adjacent singing paragraph;
  • determining the singing end time point of the previous singing paragraph adjacent to that paragraph as the singing start time point of that paragraph;
  • and determining, based on the singing start time point and the singing end time point, the singing time period corresponding to that paragraph.
  • Optionally, determining the singing end time point of a singing paragraph based on the target playback end time point of that paragraph in the target song and the target playback start time point of the next adjacent singing paragraph includes:
  • if the time interval between the target playback end time point and the target playback start time point is greater than a preset time interval threshold, determining the midpoint between the two time points as the singing end time point of that paragraph;
  • if the time interval is less than or equal to the preset time interval threshold, determining, based on a preset division ratio, a target time point between the target playback end time point and the target playback start time point as the singing end time point of that paragraph, where the ratio of the first time interval, between the target time point and the target playback end time point, to the second time interval, between the target time point and the target playback start time point, satisfies the division ratio, and the first time interval is greater than the second time interval.
  • Optionally, the method further includes: when the received accompaniment playback progress reaches the singing start time point of any singing time period, starting to play the accompaniment from that singing start time point and switching to the first processing state.
  • In a second aspect, a method for performing connected-mic chorus is provided. The method is applied to a server and includes: receiving a chorus request for a target song sent by the first terminal and the second terminal; sending a start-singing command for the target song to the first terminal and the second terminal; and receiving the first live multimedia data sent by the first terminal in the first processing state and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress;
  • when the received second live multimedia data carries a delay tag, deleting the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesizing the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by each, to obtain synthesized live multimedia data; and sending the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • Optionally, synthesizing the first live multimedia data and the second live multimedia data to obtain the synthesized live multimedia data includes:
  • performing video alignment processing on the video frames in the first live multimedia data and the video frames in the second live multimedia data to obtain video-aligned video data.
  • Optionally, the method further includes:
  • when it is determined that the first terminal enters the second processing state, determining a first data packet count, i.e., the number of data packets of the first live multimedia data received from the first terminal while it was in the first processing state, and a second data packet count, i.e., the number of data packets of the second live multimedia data received from the second terminal while it was in the second processing state;
  • if the first data packet count is greater than the second data packet count, performing packet supplementation on the received second live multimedia data based on the difference between the two counts; if the first data packet count is smaller than the second data packet count, performing packet deletion on the second live multimedia data based on the difference.
  • In a third aspect, a device for performing connected-mic chorus is provided. The device is applied to a first terminal and includes:
  • a sending module, configured to send a chorus request for a target song to the server;
  • a determining module, configured to receive the start-singing command for the target song sent by the server and determine at least one local singing time period according to the segmentation information of the target song;
  • a processing module, configured to start playing the accompaniment of the target song and enter the first processing state, where the processing in the first processing state includes: adding, to the locally generated first live multimedia data, the accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and sending the processed first live multimedia data to the server;
  • a switching module, configured to switch to the second processing state when the accompaniment playback progress reaches the end time point of the current singing time period;
  • the processing module is further configured to, in the second processing state, when second live multimedia data carrying the second terminal's non-singing tag is received from the server, add a delay tag to the first live multimedia data currently generated locally and send the processed first live multimedia data to the server;
  • Optionally, the switching module is further configured to: when the received accompaniment playback progress reaches the singing start time point of any singing time period, start playing the accompaniment from that singing start time point and switch to the first processing state.
  • Optionally, the determining module is configured to:
  • determine, according to the segmentation information of the target song, the singing paragraphs respectively corresponding to the first terminal and the second terminal in the target song;
  • for any singing paragraph corresponding to the first terminal, if that singing paragraph is not the last of its corresponding singing paragraphs in the target song, determine the singing end time point of that paragraph based on the target playback end time point of that paragraph in the target song and the target playback start time point of the next adjacent singing paragraph;
  • determine the singing end time point of the previous singing paragraph adjacent to that paragraph as the singing start time point of that paragraph;
  • and determine, based on the singing start time point and the singing end time point, the singing time period corresponding to that paragraph.
  • Optionally, the determining module is configured to:
  • if the time interval between the target playback end time point and the target playback start time point is greater than a preset time interval threshold, determine the midpoint between the two time points as the singing end time point of that paragraph;
  • if the time interval is less than or equal to the preset time interval threshold, determine, based on a preset division ratio, a target time point between the target playback end time point and the target playback start time point as the singing end time point of that paragraph, where the ratio of the first time interval, between the target time point and the target playback end time point, to the second time interval, between the target time point and the target playback start time point, satisfies the division ratio, and the first time interval is greater than the second time interval.
  • In a fourth aspect, a device for performing connected-mic chorus is provided. The device is applied to a server and includes:
  • a receiving module, configured to receive a chorus request for a target song sent by the first terminal and the second terminal;
  • a sending module, configured to send a start-singing command for the target song to the first terminal and the second terminal;
  • the receiving module is further configured to receive the first live multimedia data sent by the first terminal in the first processing state and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress;
  • a processing module, configured to delete the second live multimedia data when the received second live multimedia data carries a delay tag, and, when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, to synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by each, obtaining the synthesized live multimedia data;
  • the sending module is further configured to send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • Optionally, the processing module is configured to:
  • perform video alignment processing on the video frames in the first live multimedia data and the video frames in the second live multimedia data to obtain video-aligned video data.
  • Optionally, the processing module is further configured to:
  • when it is determined that the first terminal enters the second processing state, determine a first data packet count, i.e., the number of data packets of the first live multimedia data received from the first terminal while it was in the first processing state, and a second data packet count, i.e., the number of data packets of the second live multimedia data received from the second terminal while it was in the second processing state;
  • if the first data packet count is greater than the second data packet count, perform packet supplementation on the received second live multimedia data based on the difference between the two counts; if the first data packet count is smaller than the second data packet count, perform packet deletion on the second live multimedia data based on the difference.
  • In a fifth aspect, a system for performing connected-mic chorus is provided, including a first terminal, a second terminal, and a server, where:
  • the first terminal is configured to: send a chorus request for a target song to the server; receive the start-singing command for the target song sent by the server, and determine at least one local singing time period according to the segmentation information of the target song; and start playing the accompaniment of the target song and enter the first processing state, where the processing in the first processing state includes adding, to the locally generated first live multimedia data, the accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and sending the processed first live multimedia data to the server;
  • the processing in the second processing state includes: when second live multimedia data carrying the second terminal's non-singing tag is received from the server, adding a delay tag to the first live multimedia data currently generated locally and sending the processed first live multimedia data to the server; and when second live multimedia data carrying the second terminal's singing tag and accompaniment playback progress is received from the server, adding the received accompaniment playback progress of the second terminal and a non-singing tag to the first live multimedia data currently generated locally, and sending the processed first live multimedia data to the server;
  • the server is configured to: receive the chorus request for the target song sent by the first terminal and the second terminal; send the start-singing command for the target song to the first terminal and the second terminal; receive the first live multimedia data sent by the first terminal in the first processing state and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress; when the received second live multimedia data carries a delay tag, delete the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by each, obtaining the synthesized live multimedia data; and send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • In a sixth aspect, a terminal is provided. The terminal includes a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the connected-mic chorus method described in the first aspect.
  • In a seventh aspect, a server is provided. The server includes a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the connected-mic chorus method described in the second aspect.
  • In an eighth aspect, a computer-readable storage medium is provided. At least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the connected-mic chorus methods described above.
  • In the scheme provided by the embodiments of the present application, the connected-mic chorus process is divided into two processing states. When the first terminal is in the first processing state, it can send to the server live multimedia data carrying the accompaniment playback progress; when the second terminal is in the second processing state, it can send to the server live multimedia data carrying the accompaniment playback progress received from the first terminal. The server synthesizes the live multimedia data, each carrying the accompaniment playback progress, sent by the two terminals. In this way, the live multimedia data sent to the audience terminal is organized according to the accompaniment progress, which solves the problem of incoherent singing at the audience terminal when the two anchors sing together.
  • Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • Fig. 2 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 3 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 4 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 5 is a schematic diagram of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 6 is a schematic diagram of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 7 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a connected-mic chorus device provided by an embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a connected-mic chorus device provided by an embodiment of the present application.
  • Fig. 10 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Fig. 11 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The connected-mic chorus method provided by the embodiments of the present application can be implemented jointly by a terminal and a server.
  • The terminal can run an application that supports live broadcasting.
  • The terminal can have components such as a camera, a microphone, and earphones; it has communication functions and can access the Internet.
  • The terminal can be a mobile phone, a tablet computer, a smart wearable device, a desktop computer, a notebook computer, etc.
  • The server may be a background server of the above application, and the server can establish communication with the terminal.
  • The server can be a single server or a server group. If it is a single server, it can be responsible for all of the processing in the following schemes; if it is a server group, different servers in the group can each be responsible for different parts of that processing, and the specific allocation can be set arbitrarily by technicians according to actual needs, which will not be repeated here.
  • Fig. 1 is a schematic diagram of the implementation environment provided by an embodiment of the present application. The first anchor corresponding to the first terminal can perform a connected-mic live broadcast with the second anchor corresponding to the second terminal.
  • During the connected-mic live broadcast, the first terminal can send the first live multimedia data, including the first anchor's live video and audio, to the server, and the second terminal can send the second live multimedia data, including the second anchor's live video and audio, to the server. The server can then synthesize the first live multimedia data and the second live multimedia data, and send the synthesized live multimedia data to the audience terminals watching the live broadcast of the first anchor and the second anchor.
  • The first terminal can also send the first live multimedia data to the second terminal, and the second terminal can also send the second live multimedia data to the first terminal, so that the first anchor and the second anchor can interact through the live multimedia data.
  • When the first terminal sends the first live multimedia data to the server and the second terminal, there is already a delay by the time the second terminal and the server receive it; likewise, when the second anchor reacts to the first live multimedia data and the second terminal sends the second live multimedia data to the first terminal and the server, there is a further delay.
  • For example, in a chorus, the second anchor sings the second line of lyrics only after receiving and playing the audio of the first anchor singing the first line. The second terminal then sends the second live multimedia data of the second anchor singing the second line to the first terminal and the server; due to network delay, the time at which the first terminal and the server receive the audio of the second line is later than the time at which the second anchor actually sang it.
  • On the server side, there is therefore a delay between receiving the live multimedia data corresponding to the first line of lyrics from the first terminal and receiving the live multimedia data corresponding to the second line from the second terminal (during this delay, the server receives only the second anchor's live multimedia data from before singing the second line). As a result, after the server sends the two anchors' live multimedia data to the audience terminal, the audience sees a gap between the first anchor singing the first line and the second anchor singing the second line, which makes the chorus displayed on the audience terminal incoherent.
  • Fig. 2 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application. The method is applied to a first terminal.
  • The method includes:
  • Step 201: Send a chorus request for the target song to the server.
  • Step 202: Receive the start-singing command for the target song sent by the server, and determine at least one local singing time period according to the segmentation information of the target song.
  • Step 203: Start playing the accompaniment of the target song and enter the first processing state. In the first processing state, add to the locally generated first live multimedia data the accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and send the processed first live multimedia data to the server.
  • Step 204: Switch to the second processing state when the accompaniment playback progress reaches the end time point of the current singing time period.
  • Step 205: In the second processing state, when second live multimedia data carrying the second terminal's non-singing tag is received from the server, add a delay tag to the first live multimedia data currently generated locally and send the processed first live multimedia data to the server; when second live multimedia data carrying the second terminal's singing tag and accompaniment playback progress is received, add the received accompaniment playback progress of the second terminal and a non-singing tag to the first live multimedia data currently generated locally, and send the processed first live multimedia data to the server. (A sketch of this two-state tagging logic is given after this list.)
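  • The following Python sketch is a rough illustration, not the patent's implementation, of how a terminal might tag each locally captured frame in the two processing states of steps 203-205. The Frame structure, tag strings, and function name are hypothetical; the patent only requires that tags be preset identifying characters and that the accompaniment playback progress be a playback time point.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative tag values; the patent only requires preset identifying characters.
SINGING, NON_SINGING, DELAY = "singing", "non_singing", "delay"

@dataclass
class Frame:
    payload: bytes
    tag: Optional[str] = None
    accompaniment_progress: Optional[float] = None  # seconds into the accompaniment

def tag_local_frame(state: str, frame: Frame, local_progress: Optional[float],
                    peer_frame: Optional[Frame]) -> Frame:
    """Tag a locally captured frame according to the current processing state."""
    if state == "first":
        # First processing state: attach our own accompaniment progress and a singing tag.
        frame.tag = SINGING
        frame.accompaniment_progress = local_progress
    elif peer_frame is None or peer_frame.tag != SINGING:
        # Second processing state, peer data is not (yet) singing data: delay tag.
        frame.tag = DELAY
    else:
        # Second processing state, peer is singing: reuse the peer's accompaniment
        # progress and mark our own frame as non-singing.
        frame.tag = NON_SINGING
        frame.accompaniment_progress = peer_frame.accompaniment_progress
    return frame
```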
  • Fig. 3 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application. The method is applied to a server.
  • The method includes:
  • Step 301: Receive a chorus request for the target song sent by the first terminal and the second terminal.
  • Step 302: Send a start-singing command for the target song to the first terminal and the second terminal.
  • Step 303: Receive the first live multimedia data sent by the first terminal in the first processing state, and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress.
  • Step 304: When the received second live multimedia data carries a delay tag, delete the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by each, to obtain the synthesized live multimedia data.
  • Step 305: Send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal. (A sketch of this server-side loop is given after this list.)
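  • Under the same hypothetical Frame structure as above, the server-side handling of steps 303-305 might look like the following sketch: delay-tagged data is dropped, and frames from the two terminals are paired by the accompaniment progress they carry before synthesis. The names and buffering policy are assumptions for illustration, not from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Frame:  # same hypothetical structure as in the terminal-side sketch
    payload: bytes
    tag: Optional[str] = None
    accompaniment_progress: Optional[float] = None

buffers = defaultdict(dict)  # "first"/"second" -> {accompaniment_progress: Frame}

def on_frame_received(side: str, frame: Frame) -> Optional[Tuple[Frame, Frame]]:
    """Buffer one received frame; return a (first, second) pair ready for synthesis."""
    if frame.tag == "delay":
        return None  # step 304: delay-tagged live multimedia data is deleted
    buffers[side][frame.accompaniment_progress] = frame
    # Pair frames from the two terminals that carry the same accompaniment progress.
    ready = buffers["first"].keys() & buffers["second"].keys()
    if not ready:
        return None
    t = min(ready)
    return buffers["first"].pop(t), buffers["second"].pop(t)
```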
  • Fig. 4 is a flow chart of a connected-mic chorus method provided by an embodiment of the present application. The method is applied to the interaction between the first terminal and the server. Referring to Fig. 4, this embodiment includes:
  • Step 401: The first terminal sends a chorus request for the target song to the server.
  • When the first anchor wants to sing together with the second anchor, the first anchor can operate the live broadcast application on the first terminal, select the target song to be sung, and then send a chorus request for the target song to the second terminal and the server.
  • After the second terminal receives the chorus request for the target song sent by the first terminal, the second anchor can choose to accept the chorus of the target song with the first anchor, and the second terminal can likewise send a chorus request for the target song to the server.
  • Step 402: The first terminal receives the start-singing command for the target song sent by the server.
  • After receiving the chorus requests for the target song sent by the first terminal and the second terminal, the server can simultaneously send the start-singing command for the target song to the first terminal and the second terminal.
  • Step 403: The first terminal determines at least one local singing time period according to the segmentation information of the target song.
  • The segmentation information of the target song may be pre-stored in the first terminal, or may be sent to the first terminal by the server.
  • The segmentation information of the target song may record the parts of the lyrics to be sung by the lead singer and by the joining singer. The lead singer can be the initiator of the connected-mic chorus, i.e., the first anchor, and the joining singer can be the recipient of the chorus request, i.e., the second anchor.
  • The first terminal can therefore determine the singing time periods during which it (i.e., the local terminal) needs to sing the target song from the lyric parts recorded in the segmentation information; that is, after the chorus starts, the first anchor needs to sing the target song within those singing time periods.
  • The process of determining at least one local singing time period according to the segmentation information of the target song is described in detail below with reference to Fig. 7.
  • Step 404: The first terminal starts playing the accompaniment of the target song and enters the first processing state.
  • The first terminal and the second terminal each have two processing states: the processing state while the anchor is within a singing time period, i.e., the first processing state, and the processing state while the anchor is within a non-singing time period, i.e., the second processing state. When the first terminal is in the first processing state, the second terminal is in the second processing state.
  • The scheme is described in detail by taking the case where the first terminal sings the target song first as an example; that is, after the connected-mic chorus starts, the first terminal first enters the first processing state.
  • In the first processing state, the first terminal adds, to the locally generated first live multimedia data, the accompaniment playback time point at which the first live multimedia data was captured and a singing tag, and sends the processed first live multimedia data to the server.
  • In this application, the live multimedia data captured by the first terminal is called the first live multimedia data, and the live multimedia data captured by the second terminal is called the second live multimedia data.
  • After entering the first processing state, the first terminal can play the accompaniment audio of the target song, the first anchor can sing the target song along with the played accompaniment, and the first terminal can capture the first anchor's performance. The first live multimedia data, such as video of the first anchor singing the target song, can be captured by the camera of the first terminal, and the audio can be recorded by the microphone.
  • After capturing the first live multimedia data of the first anchor singing the target song, the first terminal can add to it the playback time point of the currently playing accompaniment audio and a singing tag. For example, when the microphone captures an audio frame of the first anchor singing, the current accompaniment playback time point can be obtained and added to the audio frame, for instance in metadata attached to the frame, and the singing tag can be added to the metadata as well. The first live multimedia data with the accompaniment playback time point and singing tag added can then be sent to the server and the second terminal; upon receiving it, they can determine from the singing tag that the data is of the first anchor singing the target song, and can determine from the accompaniment playback progress how far the first anchor has sung.
  • The accompaniment playback progress can be the accompaniment playback time point, and the singing tag can be any character with an identifying function, preset by a technician. (One possible way of carrying such metadata is sketched below.)
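  • As one way such per-frame metadata could be carried, the toy container below prepends a JSON header to each raw audio frame. Real implementations would more likely use the metadata facilities of their streaming container or transport protocol; the format, field names, and functions here are assumptions for illustration only.

```python
import json
import time

def tag_audio_frame(pcm: bytes, accompaniment_started_at: float) -> bytes:
    """Attach an accompaniment playback time point and a singing tag to one PCM frame."""
    meta = {
        "capture_time": time.time(),  # wall-clock time the frame was captured
        # Accompaniment playback progress: elapsed time since the accompaniment started.
        "accompaniment_time": time.time() - accompaniment_started_at,
        "tag": "singing",
    }
    # Toy container: JSON metadata header, a newline, then the raw payload.
    return json.dumps(meta).encode() + b"\n" + pcm

def parse_frame(blob: bytes):
    """Split a tagged frame back into (metadata dict, payload bytes)."""
    header, _, pcm = blob.partition(b"\n")
    return json.loads(header), pcm
```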
  • While the first terminal is in the first processing state, the second terminal is in the second processing state. The processing of the second terminal in the second processing state is as follows:
  • when the received first live multimedia data carries no singing tag (for example, it carries a delay tag or a non-singing tag), a delay tag is added to the second live multimedia data currently generated locally, and the processed second live multimedia data is sent to the server and the first terminal;
  • when the received first live multimedia data carries the accompaniment playback progress and a singing tag, the received accompaniment playback progress and a non-singing tag are added to the second live multimedia data currently generated locally, and the processed second live multimedia data is sent to the server and the first terminal.
  • The second terminal may enter the second processing state after receiving the start-singing command for the target song sent by the server.
  • Due to network delay, the first live multimedia data that the second terminal receives may not have been sent by the first terminal while the first terminal was in the first processing state. Therefore, while in the second processing state, the second terminal can check whether the received first live multimedia data carries a singing tag, and thereby determine whether the data was sent during a time period in which the first anchor was singing.
  • If the first live multimedia data sent by the first terminal does not carry a singing tag, then the second anchor's reactions and interactions to the played live multimedia data were not produced while seeing or hearing the first anchor sing the target song. A delay tag can therefore be added to the second live multimedia data currently captured by the second terminal, and the tagged data sent to the server and the first terminal. The delay tag indicates that the second anchor's live multimedia data was not captured while seeing or hearing the first anchor sing the target song.
  • The processing of adding a delay tag to the second live multimedia data is similar to the processing of adding a singing tag to the first live multimedia data described above, and will not be repeated here.
  • If the first live multimedia data sent by the first terminal carries a singing tag, then it is live multimedia data of the first anchor singing the target song, and the second anchor's reactions and interactions to the played data were produced while seeing or hearing the first anchor sing the target song.
  • In this case, the accompaniment playback progress added to the received first live multimedia data can be obtained, and the obtained accompaniment playback time point can be added to the second live multimedia data captured by the second terminal, together with a non-singing tag; the second live multimedia data with the accompaniment playback time point and non-singing tag added can then be sent to the first terminal and the server.
  • The accompaniment playback progress taken from the received first live multimedia data indicates that the second live multimedia data was captured while the second anchor was watching the first anchor sing the target song at that accompaniment playback progress.
  • The processing of adding the accompaniment playback time point and the non-singing tag to the second live multimedia data is similar to the processing of adding the accompaniment playback time point and the singing tag to the first live multimedia data described above, and will not be repeated here.
  • Since the first live multimedia data that the second terminal receives from the first terminal in the first processing state already includes the accompaniment audio of the target song, the second terminal does not need to play the accompaniment of the target song while in the second processing state.
  • Step 405: The server receives the first live multimedia data sent by the first terminal and the second live multimedia data sent by the second terminal.
  • Both the first live multimedia data and the second live multimedia data carry the accompaniment playback progress. The first live multimedia data is the live multimedia data sent by the first terminal while in the first processing state, and the accompaniment playback progress it carries may be the accompaniment playback time point at which the first terminal captured the data. The second live multimedia data is the live multimedia data sent by the second terminal while in the second processing state, and the accompaniment playback progress it carries is the accompaniment playback progress carried in the first live multimedia data at the time the second terminal received that data.
  • Step 406: The server synthesizes the first live multimedia data and the second live multimedia data to obtain the synthesized live multimedia data.
  • During the connected-mic chorus, the server continuously receives the first live multimedia data sent by the first terminal and the second live multimedia data sent by the second terminal. The received first live multimedia data includes data carrying a singing tag, sent by the first terminal in the first processing state, and data carrying a delay tag or a non-singing tag, sent in the second processing state; likewise, the received second live multimedia data includes data carrying a singing tag, sent by the second terminal in the first processing state, and data carrying a delay tag or a non-singing tag, sent in the second processing state.
  • When synthesizing, the server can cache preset amounts of the first live multimedia data and the second live multimedia data respectively. Live multimedia data carrying a delay tag is not cached; that is, first or second live multimedia data carrying a delay tag is deleted. In this way, the first and second live multimedia data cached by the server are either live multimedia data captured within the corresponding singing time period, or live multimedia data captured while receiving the peer terminal's singing-period data.
  • Fig. 6 is a schematic diagram of the synthesized live multimedia data received by the audience terminal.
  • The first live multimedia data and the second live multimedia data are synthesized to obtain the synthesized live multimedia data; the processing can be as follows:
  • Step 4061: Perform audio synthesis processing on the audio frames in the first live multimedia data and the second live multimedia data that carry the same accompaniment playback progress, to obtain synthesized audio frames.
  • The audio frames in the first live multimedia data may be called first audio frames, and the audio frames in the second live multimedia data may be called second audio frames. When synthesizing, the accompaniment playback time point carried by each first audio frame and by each second audio frame can be determined, and first and second audio frames carrying the same accompaniment playback time point can then be synthesized; that is, audio frames with the same accompaniment progress are mixed to obtain the synthesized audio frames, i.e., the live audio of the first anchor and the second anchor singing together. (A sketch of such mixing follows.)
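  • The patent does not prescribe a particular mixing algorithm. A minimal sketch, assuming 16-bit mono PCM frames that have already been paired by accompaniment playback time point, using simple additive mixing with clipping:

```python
import numpy as np

def mix_frames(first_pcm: np.ndarray, second_pcm: np.ndarray) -> np.ndarray:
    """Mix two int16 audio frames that carry the same accompaniment time point."""
    n = min(len(first_pcm), len(second_pcm))
    mixed = first_pcm[:n].astype(np.int32) + second_pcm[:n].astype(np.int32)
    return np.clip(mixed, -32768, 32767).astype(np.int16)  # avoid wrap-around

def synthesize_audio(first_frames: dict, second_frames: dict) -> dict:
    """first_frames / second_frames map accompaniment time point -> PCM frame."""
    shared = sorted(first_frames.keys() & second_frames.keys())
    return {t: mix_frames(first_frames[t], second_frames[t]) for t in shared}
```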
  • Step 4062: Based on the synthesized audio frames, perform video frame splicing on the first video frames included in the first live multimedia data and the second video frames included in the second live multimedia data, to obtain spliced video frames.
  • The video frames in the first live multimedia data may be called first video frames, and the video frames in the second live multimedia data may be called second video frames. The first video frames and the second video frames may be aligned according to the audio frames used in the audio synthesis processing, and the alignment may be performed according to the capture time points carried in the first video frames, the second video frames, the first audio frames, and the second audio frames.
  • For example, suppose the two audio frames being synthesized are first audio frame A from the first live multimedia data and second audio frame B from the second live multimedia data. The capture time point a carried in first audio frame A can be determined, and the first video frame whose capture time is the same as or closest to a can be selected from the first live multimedia data; similarly, the capture time point b carried in second audio frame B can be determined, and the second video frame whose capture time is the same as or closest to b can be selected from the second live multimedia data. The two selected video frames are then spliced into one video frame that contains both the picture of the first video frame and the picture of the second video frame. In this way, video-aligned video frames are obtained.
  • After the synthesis processing, the corresponding audio frames and video frames can be sent to the audience terminal. When doing so, the capture time points of the audio frames and video frames can also be uniformly rewritten, so that the capture time points of the synthesized audio frames and the aligned video frames are consistent and continuous; after receiving the synthesized live multimedia data, the audience terminal can then play the audio frames and video frames according to their capture time points. (A sketch of the alignment and splicing follows.)
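  • A minimal sketch of the nearest-capture-time selection and splicing described above, assuming video frames are numpy arrays of shape (H, W, 3) tagged with capture times; the side-by-side layout is an assumption, since the patent only requires that both pictures appear in one spliced frame:

```python
import numpy as np

def nearest_video_frame(video_frames, target_time):
    """video_frames: list of (capture_time, frame); pick the one closest in time."""
    return min(video_frames, key=lambda tf: abs(tf[0] - target_time))[1]

def splice_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Put the two anchors' pictures into one frame, left and right."""
    h = min(left.shape[0], right.shape[0])
    return np.hstack([left[:h], right[:h]])

def align_and_splice(time_a, time_b, first_video, second_video) -> np.ndarray:
    """time_a / time_b: capture times of the paired first / second audio frames."""
    frame_a = nearest_video_frame(first_video, time_a)
    frame_b = nearest_video_frame(second_video, time_b)
    return splice_side_by_side(frame_a, frame_b)
```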
  • The numbers of data packets that the server receives for the live multimedia data sent by the first terminal and by the second terminal may differ. Therefore, this application also provides a method for correcting the number of live multimedia data packets, as follows:
  • When the first terminal and the second terminal switch to the next processing state, the server can determine the number of data packets each terminal sent while in the previous processing state. If the server determines that the first live multimedia data carries a singing tag, it can determine that the first terminal is in the first processing state; if the data carries a non-singing tag or a delay tag, it can determine that the first terminal is in the second processing state. When the server observes that the tag carried in the received first live multimedia data switches from a singing tag to a delay tag or a non-singing tag, it can determine that the first terminal has entered the second processing state from the first processing state, and it can then determine the number of data packets sent by the first terminal in the first processing state (the first data packet count) and the number of data packets sent by the second terminal in the second processing state (the second data packet count).
  • If the first data packet count is greater than the second data packet count, packet supplementation is performed on the received second live multimedia data based on the difference between the two counts; if the first data packet count is smaller than the second data packet count, packet deletion is performed on the second live multimedia data based on the difference.
  • That is, if the second terminal sent fewer packets, supplementary packets can be added to the second live multimedia data, for example by appending empty packets after the last data packet sent by the second terminal in the second processing state, where the number of appended empty packets equals the difference between the first and second data packet counts. If the second terminal sent more packets of the second live multimedia data, the data packets sent by the second terminal just before the processing-state switch can be deleted, and the number deleted can equal the difference between the first and second data packet counts. (A sketch of this correction follows.)
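  • A minimal sketch of this packet-count correction, assuming packets are byte strings and that an empty packet can be represented as b""; both representations are illustrative:

```python
from typing import List

def correct_packet_count(first_count: int, second_packets: List[bytes]) -> List[bytes]:
    """Equalize the second terminal's packet count against the first terminal's."""
    diff = first_count - len(second_packets)
    if diff > 0:
        # Fewer packets from the second terminal: append `diff` empty packets
        # after the last packet it sent in the second processing state.
        return second_packets + [b""] * diff
    if diff < 0:
        # More packets from the second terminal: drop the excess from the tail,
        # i.e. the packets sent just before the processing-state switch.
        return second_packets[:first_count]
    return second_packets
```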
  • Step 407: Send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • After obtaining the synthesized live multimedia data, the server can send it to the audience terminals watching the first anchor and the second anchor perform the connected-mic chorus.
  • In the scheme of the present application, when the first terminal and the second terminal perform a connected-mic chorus, the chorus process is divided into two processing states. When the first terminal is in the first processing state, it can send to the server live multimedia data carrying the accompaniment playback progress; when the second terminal is in the second processing state, it can send to the server live multimedia data carrying the accompaniment playback progress received from the first terminal. The server synthesizes the live multimedia data, each carrying the accompaniment playback progress, sent by the two terminals; in this way, the live multimedia data sent to the audience terminal is organized according to the accompaniment progress, which solves the problem of incoherent singing at the audience terminal when the two anchors sing together.
  • The foregoing describes the processing when the first terminal is in the first processing state and the second terminal is in the second processing state, together with the corresponding processing at the server.
  • This application also provides a method of switching from the first processing state to the second processing state: when the accompaniment playback progress reaches the singing end time point of the current singing time period, switch to the second processing state.
  • That is, when the first terminal detects that the accompaniment has played to the singing end time point of the current singing time period, it can enter the second processing state and perform the processing corresponding to that state.
  • In the second processing state, when the first terminal receives second live multimedia data carrying a non-singing tag sent by the second terminal, it can add a delay tag to the first live multimedia data currently generated locally and send the processed first live multimedia data to the server; when it receives second live multimedia data carrying the accompaniment playback progress and a singing tag sent by the second terminal, it adds the received accompaniment playback progress and a non-singing tag to the first live multimedia data currently generated locally, and sends the processed first live multimedia data to the server.
  • When the first terminal enters the second processing state, it can stop playing the accompaniment audio of the target song. Due to network delay, for a period of time after the first terminal enters the second processing state, the second live multimedia data it receives is still data sent by the second terminal while the second terminal was in the second processing state. Therefore, after entering the second processing state, when the first terminal receives second live multimedia data carrying a non-singing tag, it adds a delay tag to the first live multimedia data currently generated locally and sends the tagged first live multimedia data to the server and the second terminal. The delay tag indicates that the first anchor's live multimedia data was not captured while seeing or hearing the second anchor sing the target song. The processing of adding a delay tag to the first live multimedia data is the same as the processing of adding a singing tag to the first live multimedia data described above, and will not be repeated here.
  • If the second live multimedia data sent by the second terminal carries a singing tag, then it is live multimedia data of the second anchor singing the target song, and the first anchor's reactions and interactions to the played live multimedia data were produced while seeing or hearing the second anchor sing the target song.
  • In this case, the accompaniment playback time point added to the received second live multimedia data can be obtained and added to the first live multimedia data captured by the first terminal, together with a non-singing tag; the first live multimedia data with the accompaniment playback time point and non-singing tag added can then be sent to the second terminal and the server.
  • The accompaniment playback time point taken from the received second live multimedia data indicates that the first live multimedia data was captured while the first anchor was watching the second anchor sing the target song at that accompaniment playback time point.
  • The processing of adding the accompaniment playback time point and the non-singing tag to the first live multimedia data is similar to the processing of adding the accompaniment playback time point and the singing tag described above, and will not be repeated here.
  • Correspondingly, this application provides a method of switching from the second processing state to the first processing state: when the received accompaniment playback time point reaches the singing start time point of any singing time period, start playing the accompaniment from that singing start time point and switch to the first processing state.
  • That is, when the accompaniment playback time point carried in the first live multimedia data that the second terminal receives from the first terminal is the singing start time point of any singing time period corresponding to the second terminal, the second terminal can enter the first processing state.
  • The processing of the second terminal in the first processing state is the same as that of the first terminal: it can add, to the locally generated second live multimedia data, the accompaniment playback time point at the time of recording and a singing tag, and send the processed second live multimedia data to the server and the first terminal.
  • Throughout the connected-mic chorus, the first terminal and the second terminal keep switching between the first processing state and the second processing state: when the first terminal is in the first processing state, the second terminal is in the second processing state, and when the second terminal is in the first processing state, the first terminal is in the second processing state. (A sketch of the switching conditions follows.)
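  • The two switching rules can be summarized in a small decision function. The period representation, tolerance value, and function name are assumptions for illustration:

```python
from typing import List, Optional, Tuple

def next_state(state: str, local_progress: Optional[float],
               received_progress: Optional[float],
               singing_periods: List[Tuple[float, float]]) -> str:
    """Return 'first' or 'second' for this terminal's next processing state."""
    if state == "first":
        # First -> second: the locally played accompaniment reached the singing
        # end time point of the current singing time period.
        if local_progress is not None:
            for start, end in singing_periods:
                if start <= local_progress < end:
                    return "first"  # still inside a singing period
        return "second"
    # Second -> first: the accompaniment progress received from the peer reached
    # the singing start time point of one of this terminal's singing periods
    # (the terminal then also resumes playing the accompaniment from `start`).
    if received_progress is not None:
        for start, _ in singing_periods:
            if abs(received_progress - start) < 0.05:  # tolerance is illustrative
                return "first"
    return "second"
```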
  • The above steps 405-407 describe how the server processes the first live multimedia data sent by the first terminal and the second live multimedia data sent by the second terminal while the first terminal is in the first processing state. When the second terminal is in the first processing state, the server's processing of the second live multimedia data sent by the second terminal is the same as the server's processing of the first live multimedia data in steps 405-407, and the server's processing of the first live multimedia data sent by the first terminal is the same as the server's processing of the second live multimedia data in steps 405-407, so neither is repeated here.
  • In the embodiments of this application, the process of performing a connected-mic chorus by the first terminal and the second terminal is divided into two processing states. When the first terminal is in the first processing state, it can send the server live multimedia data carrying the accompaniment playback progress; when the second terminal is in the second processing state, it can send the server live multimedia data carrying the accompaniment playback progress it received from the first terminal. The server can then synthesize the live multimedia data carrying the accompaniment playback progress sent separately by the first terminal and the second terminal. In this way, the live multimedia data sent to the audience terminals follows the accompaniment playback progress, which can solve the problem that, at the audience terminals, the singing sounds incoherent when two anchors take turns singing in a connected-mic chorus.
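  • As a server-side counterpart to the terminal sketch above, the filter below shows the ingestion rule the embodiments describe: delay-tagged data is deleted before buffering, and everything else is buffered under its accompaniment progress for later synthesis. The buffer shape is an assumption, and the frame reuses the illustrative Frame fields from the earlier sketch.

        def ingest(buffer: dict[float, list], frame) -> None:
            """Buffer an incoming frame keyed by its accompaniment progress,
            dropping delay-tagged frames so stale reactions are never mixed
            into the stream sent to the audience."""
            if frame.tag == "delay":
                return  # delete: not collected while the peer anchor was singing
            if frame.accompaniment_progress is not None:
                buffer.setdefault(frame.accompaniment_progress, []).append(frame)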
  • Fig. 7 is a flow chart of determining the singing time periods provided by an embodiment of the present application. Referring to Fig. 7, this embodiment includes:
  • Step 701: according to the segmentation information of the target song, determine the singing paragraphs in the target song that correspond to the first terminal and to the second terminal respectively.
  • In implementation, the segmentation information of the target song records the singer of each line of lyrics in the target song; for example, it may record that the first line is sung by the lead singer, the second line by the joining singer, and the third line by the lead singer. The lead singer may be the party who initiates the connected-mic chorus, for example the first anchor corresponding to the first terminal. In this way, the part of the lyrics to be sung by the first terminal and the part to be sung by the second terminal can be determined from the segmentation information of the target song.
  • Step 702: for any singing paragraph corresponding to the first terminal, if that singing paragraph is not the last singing paragraph corresponding to the first terminal in the target song, determine the singing end time point of that paragraph based on the target playback end time of that paragraph in the target song and the target playback start time point of the next adjacent singing paragraph.
  • Here, any singing paragraph corresponding to the first terminal may be any part of the lyrics that the first terminal needs to sing. For example, the singing paragraphs corresponding to the first terminal may be the first to third lines of lyrics, the seventh to ninth lines, and the thirteenth to fifteenth lines. The target song may correspond to a lyrics file that records the playback start time point and playback end time point of each line of lyrics in the target song, so the playback start and end time points of each singing paragraph in the target song can be determined from the lyrics file, as illustrated by the sketch below.
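  • As a concrete illustration of such a lyrics file, the sketch below models each lyric line with its singer and playback start/end times, and groups consecutive lines by the same singer into singing paragraphs. The field names and the simple grouping rule are assumptions made for this example, not a format defined by this application.

        from dataclasses import dataclass

        @dataclass
        class LyricLine:
            text: str
            singer: str    # e.g. "lead" or "joining"
            start: float   # playback start time point of the line in the song (s)
            end: float     # playback end time point of the line in the song (s)

        def group_paragraphs(lines: list[LyricLine]) -> list[list[LyricLine]]:
            """Group consecutive lyric lines sung by the same singer into
            singing paragraphs."""
            paragraphs: list[list[LyricLine]] = []
            for line in lines:
                if paragraphs and paragraphs[-1][-1].singer == line.singer:
                    paragraphs[-1].append(line)
                else:
                    paragraphs.append([line])
            return paragraphs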
  • Based on the target playback end time of a singing paragraph in the target song and the target playback start time point of the next adjacent singing paragraph, the singing end time point of that paragraph is determined as follows. First, determine the time interval between the target playback end time and the target playback start time point. If the time interval is greater than a preset time interval threshold, determine the intermediate time point between the target playback end time and the target playback start time point as the singing end time point of that singing paragraph.
  • The preset time interval threshold may be set in advance by a technician and is not limited here; for example, it may be 3 seconds, 4 seconds, or 5 seconds.
  • When the time interval is greater than the preset time interval threshold, the accompaniment time between the last line of the previous singing paragraph and the first line of the following singing paragraph is relatively long, and the two anchors can split this accompaniment duration evenly; therefore, the intermediate time point between the target playback end time and the target playback start time point can be determined as the singing end time point of the singing paragraph.
  • If the time interval is less than or equal to the preset time interval threshold, a target time point is determined between the target playback end time and the target playback start time point based on a preset division ratio, where the ratio of the first time interval, between the target time point and the target playback end time point, to the second time interval, between the target time point and the target playback start time point, satisfies the division ratio, and the first time interval is greater than the second time interval.
  • When the time interval is less than or equal to the preset time interval threshold, the accompaniment time between the last line of the previous singing paragraph and the first line of the following singing paragraph is relatively short. When an anchor sings the lyrics of the target song, the notes may be elongated, so the time the anchor takes to sing a line may exceed the playback duration set for that line in the lyrics file. Therefore, when the time interval between two adjacent singing paragraphs is short, more time can be reserved for the singer of the previous singing paragraph, and the target time point can be determined between the target playback end time and the target playback start time point based on the preset division ratio, as shown in the sketch below. The target time point determined according to this preset ratio satisfies the division ratio between the first time interval (between the target time point and the target playback end time point) and the second time interval (between the target time point and the target playback start time point), with the first time interval greater than the second.
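  • The following sketch implements this boundary rule, assuming the division ratio is expressed as first interval : second interval (for example 2:1); the threshold and ratio values are illustrative, not values fixed by this application.

        def singing_end_time(prev_end: float, next_start: float,
                             threshold: float = 4.0, ratio: float = 2.0) -> float:
            """Place the singing end time point of the previous paragraph inside
            the accompaniment gap [prev_end, next_start].

            If the gap exceeds `threshold` seconds, split it evenly; otherwise
            split it ratio:1 so the first interval (after the previous
            paragraph's nominal end) is larger, leaving more time for that
            paragraph's singer."""
            gap = next_start - prev_end
            if gap > threshold:
                return prev_end + gap / 2.0                # intermediate time point
            return prev_end + gap * ratio / (ratio + 1.0)  # first interval > second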
  • Step 703: if the singing paragraph is not the first singing paragraph corresponding to the first terminal in the target song, determine the singing end time point of the previous adjacent singing paragraph as the singing start time point of this singing paragraph.
  • In implementation, after the lyrics parts to be sung by the first terminal and by the second terminal have been determined, the singing start time point and singing end time point of each singing paragraph can be determined. For each singing paragraph corresponding to the first terminal, if it is the first singing paragraph in the target song, the start time point of the target song, that is, zero minutes and zero seconds, can be determined as its singing start time; if it is not the first singing paragraph in the target song, its singing start time point may be the singing end time point of the previous singing paragraph. That is, for two adjacent singing paragraphs in the target song, the singing end time point of the previous paragraph is the singing start time point of the next paragraph.
  • Step 704: based on the singing start time point and singing end time point of each singing paragraph, determine the singing time period corresponding to that singing paragraph.
  • In implementation, after the singing start time point and singing end time point of each singing paragraph are obtained, the time period between them is the singing time period corresponding to that singing paragraph; the sketch below assembles steps 701-704 end to end.
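  • Putting steps 701-704 together, a hypothetical end-to-end helper might look as follows. It reuses LyricLine, group_paragraphs and singing_end_time from the earlier sketches, and the treatment of the final paragraph (running to song_end) is an added assumption, since step 702 only defines boundaries between adjacent paragraphs.

        def determine_singing_periods(lines: list[LyricLine],
                                      song_end: float
                                      ) -> dict[str, list[tuple[float, float]]]:
            """Assign a singing time period to every paragraph (steps 701-704)."""
            paragraphs = group_paragraphs(lines)  # step 701
            boundaries = [0.0]                    # first paragraph starts at 0:00 (step 703)
            for prev, nxt in zip(paragraphs, paragraphs[1:]):
                # Step 702: place the boundary inside the gap between paragraphs.
                boundaries.append(singing_end_time(prev[-1].end, nxt[0].start))
            boundaries.append(song_end)           # last paragraph runs to the end

            periods: dict[str, list[tuple[float, float]]] = {}
            for i, paragraph in enumerate(paragraphs):  # step 704
                periods.setdefault(paragraph[0].singer, []).append(
                    (boundaries[i], boundaries[i + 1]))
            return periods

  • For lyrics that alternate between the lead and joining singers, this returns two interleaved lists of (start, end) periods whose shared boundaries sit inside the accompaniment gaps, matching the allocation described above.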
  • The embodiment of this application thus provides a method for determining singing time periods: according to the segmentation information of the target song, the singing time periods corresponding to the first terminal and to the second terminal in the target song can be determined, so that a reasonable singing duration can be allocated to each singing part corresponding to the first terminal and to the second terminal.
  • In the embodiments of this application, the process of performing a connected-mic chorus by the first terminal and the second terminal is divided into two processing states. When the first terminal is in the first processing state, it can send the server live multimedia data carrying the accompaniment playback progress; when the second terminal is in the second processing state, it can send the server live multimedia data carrying the accompaniment playback progress it received from the first terminal. The server can then synthesize the live multimedia data carrying the accompaniment playback progress sent separately by the first terminal and the second terminal. In this way, the live multimedia data sent to the audience terminals follows the accompaniment playback progress, which can solve the problem that, at the audience terminals, the singing sounds incoherent when two anchors take turns singing in a connected-mic chorus.
  • Fig. 8 shows an apparatus for performing a connected-mic chorus provided by an embodiment of the present application. The apparatus may be the first terminal or the second terminal in the above embodiments, and the apparatus includes:
  • a sending module 810, configured to send a connected-mic chorus request for the target song to the server;
  • a determining module 820, configured to receive the start-singing command for the target song sent by the server, and determine at least one local singing time period according to the segmentation information of the target song;
  • a processing module 830, configured to start playing the accompaniment of the target song and enter the first processing state, add, to the locally generated first live multimedia data, the accompaniment playback progress at which the first live multimedia data is recorded together with a singing tag, and send the processed first live multimedia data to the server;
  • a switching module 840, configured to switch to the second processing state when the accompaniment playback progress reaches the end time point of the current singing time period;
  • the processing module 830 is further configured to: when receiving second live multimedia data sent by the server that carries the second terminal's non-singing tag, add a delay tag to the first live multimedia data currently generated locally and send the processed first live multimedia data to the server; and when receiving second live multimedia data sent by the server that carries the second terminal's singing tag and accompaniment playback progress, add the received second-terminal accompaniment playback progress and a non-singing tag to the first live multimedia data currently generated locally, and send the processed first live multimedia data to the server.
  • Optionally, the switching module 840 is further configured to: when the second-terminal accompaniment playback progress carried in the received second live multimedia data is the singing start time point of any singing time period, start playing the accompaniment from that singing start time point and switch to the first processing state.
  • Optionally, the determining module 820 is configured to: determine, according to the segmentation information of the target song, the singing paragraphs in the target song corresponding to the first terminal and to the second terminal respectively; for any singing paragraph corresponding to the first terminal, if that singing paragraph is not the last corresponding singing paragraph in the target song, determine the singing end time point of that paragraph based on the target playback end time of that paragraph in the target song and the target playback start time point of the next adjacent singing paragraph; if that singing paragraph is not the first corresponding singing paragraph in the target song, determine the singing end time point of the previous adjacent singing paragraph as the singing start time point of that paragraph; and determine the singing time period of that paragraph based on its singing start time point and singing end time point.
  • Optionally, the determining module 820 is configured to: determine the time interval between the target playback end time and the target playback start time point; if the time interval is greater than a preset time interval threshold, determine the intermediate time point between the target playback end time and the target playback start time point as the singing end time point of the singing paragraph; and if the time interval is less than or equal to the preset time interval threshold, determine a target time point between the target playback end time and the target playback start time point based on a preset division ratio, where the ratio of the first time interval, between the target time point and the target playback end time point, to the second time interval, between the target time point and the target playback start time point, satisfies the division ratio, and the first time interval is greater than the second time interval.
  • Optionally, the processing module 830 is further configured to stop playing the accompaniment of the target song.
  • Fig. 9 shows an apparatus for performing a connected-mic chorus provided by an embodiment of the present application. The apparatus may be the server in the above embodiments, and the apparatus includes:
  • a receiving module 910, configured to receive the connected-mic chorus requests for the target song sent by the first terminal and the second terminal;
  • a sending module 920, configured to send a start-singing command for the target song to the first terminal and the second terminal;
  • the receiving module 910 is further configured to receive the first live multimedia data sent by the first terminal in the first processing state, and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress;
  • a processing module 930, configured to delete the second live multimedia data when the received second live multimedia data carries a delay tag, and, when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by the second live multimedia data and the accompaniment playback progress carried by the first live multimedia data, to obtain synthesized live multimedia data;
  • the sending module 920 is further configured to send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • Optionally, the processing module 930 is configured to: perform audio synthesis processing on the audio frames in the first live multimedia data and the second live multimedia data that carry the same accompaniment playback progress, to obtain audio frames after audio synthesis processing; and, based on the audio frames undergoing audio synthesis processing, perform video alignment processing on the video frames in the first live multimedia data and the video frames in the second live multimedia data, to obtain video data after video alignment processing.
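  • A minimal sketch of this synthesis step is given below. It assumes audio frames are stored per stream as a mapping from accompaniment progress to sample arrays, mixes frames that share the same progress by simple averaging, and aligns, to each mixed audio frame, the video frame from each stream whose capture time is closest; treating the accompaniment progress as the common clock, the averaging mix, and the non-empty frame lists are all assumptions for illustration.

        def synthesize(first_audio: dict[float, list[float]],
                       second_audio: dict[float, list[float]],
                       first_video: list[tuple[float, bytes]],
                       second_video: list[tuple[float, bytes]]):
            """Mix audio frames sharing the same accompaniment progress, then
            pick one video frame from each stream per mixed audio frame."""
            def nearest(frames: list[tuple[float, bytes]], t: float) -> bytes:
                # Video alignment: the frame whose capture time is closest to t.
                return min(frames, key=lambda f: abs(f[0] - t))[1]

            merged = []
            for progress in sorted(set(first_audio) & set(second_audio)):
                a, b = first_audio[progress], second_audio[progress]
                mixed = [(x + y) / 2.0 for x, y in zip(a, b)]  # naive average mix
                merged.append((progress, mixed,
                               nearest(first_video, progress),
                               nearest(second_video, progress)))
            return merged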
  • Optionally, the processing module 930 is further configured to: when it is determined that the first terminal enters the second processing state, determine the number of first data packets corresponding to the first live multimedia data received from the first terminal in the first processing state, and determine the number of second data packets corresponding to the second live multimedia data received from the second terminal in the second processing state; if the number of first data packets is greater than the number of second data packets, perform packet-padding processing on the received second live multimedia data based on the difference between the two numbers; and if the number of first data packets is smaller than the number of second data packets, perform packet-deletion processing on the second live multimedia data based on the difference.
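  • The padding/deletion rule can be sketched as follows. Appending empty packets after the last packet of the state and trimming packets sent just before the state switch follow the description above; representing packets as raw byte strings is an assumption for this example.

        def correct_packet_count(first_count: int,
                                 second_packets: list[bytes]) -> list[bytes]:
            """Make the second stream's packet count for the elapsed state
            match the first stream's, by padding or deleting packets."""
            diff = first_count - len(second_packets)
            if diff > 0:
                # Fewer packets from the second terminal: append empty packets
                # after its last packet of the state.
                return second_packets + [b""] * diff
            if diff < 0:
                # Surplus packets: delete ones sent just before the switch.
                return second_packets[:diff]  # drops the last |diff| packets
            return second_packets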
  • It should be noted that the apparatus for performing a connected-mic chorus provided by the above embodiments is illustrated, when performing a connected-mic chorus, only by the division of the above functional modules. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for performing a connected-mic chorus provided by the above embodiments belongs to the same concept as the method embodiments for performing a connected-mic chorus; its specific implementation process is detailed in the method embodiments and is not repeated here.
  • This application also provides a system for performing a connected-mic chorus, and the system includes a first terminal, a second terminal and a server, wherein:
  • the first terminal is configured to: send a connected-mic chorus request for the target song to the server; receive the start-singing command for the target song sent by the server, and determine at least one local singing time period according to the segmentation information of the target song; start playing the accompaniment of the target song and enter the first processing state, add, to the locally generated first live multimedia data, the accompaniment playback progress at which the first live multimedia data is recorded together with a singing tag, and send the processed first live multimedia data to the server; switch to the second processing state when the accompaniment playback progress reaches the end time point of the current singing time period; when receiving second live multimedia data sent by the server that carries the second terminal's non-singing tag, add a delay tag to the first live multimedia data currently generated locally and send the processed first live multimedia data to the server; and when receiving second live multimedia data sent by the server that carries the second terminal's singing tag and accompaniment playback progress, add the received second-terminal accompaniment playback progress and a non-singing tag to the first live multimedia data currently generated locally, and send the processed first live multimedia data to the server;
  • the server is configured to: receive the connected-mic chorus requests for the target song sent by the first terminal and the second terminal; send the start-singing command for the target song to the first terminal and the second terminal; receive the first live multimedia data sent by the first terminal in the first processing state and the second live multimedia data sent by the second terminal in the second processing state, where the first live multimedia data carries the accompaniment playback progress; when the received second live multimedia data carries a delay tag, delete the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by the second live multimedia data and the accompaniment playback progress carried by the first live multimedia data, to obtain synthesized live multimedia data; and send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  • Fig. 10 shows a structural block diagram of a computer device 1000 provided by an exemplary embodiment of the present application. The computer device 1000 may be the first terminal or the second terminal in the above embodiments, and may be a portable mobile terminal such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop or a desktop computer. The computer device 1000 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
  • Generally, the computer device 1000 includes a processor 1001 and a memory 1002.
  • The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one of the hardware forms of a DSP (digital signal processor), an FPGA (field-programmable gate array) and a PLA (programmable logic array). The processor 1001 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (central processing unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1001 may be integrated with a GPU (graphics processing unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (artificial intelligence) processor configured to handle computing operations related to machine learning.
  • The memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1002 is used to store at least one instruction, and the at least one instruction is executed by the processor 1001 to implement the method for performing a connected-mic chorus provided by the method embodiments of this application.
  • In some embodiments, the computer device 1000 may optionally further include a peripheral device interface 1003 and at least one peripheral device. The processor 1001, the memory 1002 and the peripheral device interface 1003 may be connected through buses or signal lines, and each peripheral device may be connected to the peripheral device interface 1003 through a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1004, a display screen 1005, a camera assembly 1006, an audio circuit 1007, a positioning assembly 1008 and a power supply 1009. The peripheral device interface 1003 may be used to connect at least one I/O (input/output) related peripheral device to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002 and the peripheral device interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002 and the peripheral device interface 1003 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • The radio frequency circuit 1004 is used to receive and transmit RF (radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices through electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1004 can communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (wireless fidelity) networks. In some embodiments, the radio frequency circuit 1004 may also include circuits related to NFC (near field communication), which is not limited in this application.
  • The display screen 1005 is used to display a UI (user interface), which can include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, it also has the ability to collect touch signals on or above its surface; such a touch signal can be input to the processor 1001 as a control signal for processing, and the display screen 1005 can then also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1005, arranged on the front panel of the computer device 1000; in other embodiments, there may be at least two display screens 1005, arranged on different surfaces of the computer device 1000 or in a folded design; in still other embodiments, the display screen 1005 may be a flexible display screen arranged on a curved or folded surface of the computer device 1000. The display screen 1005 can even be set as a non-rectangular irregular figure, that is, a special-shaped screen, and can be made of materials such as LCD (liquid crystal display) and OLED (organic light-emitting diode).
  • The camera assembly 1006 is used to capture images or videos. Optionally, the camera assembly 1006 includes a front camera and a rear camera; generally, the front camera is set on the front panel of the terminal and the rear camera is set on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to realize a background blur function by fusing the main camera and the depth-of-field camera, panoramic and VR (virtual reality) shooting by fusing the main camera and the wide-angle camera, or other fused shooting functions. In some embodiments, the camera assembly 1006 may also include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
  • The audio circuit 1007 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals input to the processor 1001 for processing, or input to the radio frequency circuit 1004 to realize voice communication; for stereo collection or noise reduction, there may be multiple microphones arranged at different parts of the computer device 1000, and the microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves, and may be a conventional membrane speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1007 may also include a headphone jack.
  • The positioning assembly 1008 is used to locate the current geographic location of the computer device 1000 to implement navigation or LBS (location-based services). The positioning assembly 1008 may be a positioning assembly based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • The power supply 1009 is used to supply power to the various components in the computer device 1000. The power supply 1009 can be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil; the rechargeable battery can also be used to support fast-charging technology.
  • In some embodiments, the computer device 1000 also includes one or more sensors 1010, including but not limited to an acceleration sensor 1011, a gyroscope sensor 1012, a pressure sensor 1013, a fingerprint sensor 1014, an optical sensor 1015 and a proximity sensor 1016.
  • The acceleration sensor 1011 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the computer device 1000. For example, the acceleration sensor 1011 can be used to detect the components of gravitational acceleration on the three coordinate axes, and the processor 1001 may control the display screen 1005 to display the user interface in a landscape or portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 can also be used to collect motion data of a game or of the user.
  • The gyroscope sensor 1012 can detect the body direction and rotation angle of the computer device 1000, and can cooperate with the acceleration sensor 1011 to collect the user's 3D actions on the computer device 1000. Based on the data collected by the gyroscope sensor 1012, the processor 1001 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
  • The pressure sensor 1013 may be disposed on the side frame of the computer device 1000 and/or the lower layer of the display screen 1005. When the pressure sensor 1013 is disposed on the side frame, it can detect the user's grip signal on the computer device 1000, and the processor 1001 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is disposed on the lower layer of the display screen 1005, the processor 1001 controls the operable controls on the UI according to the user's pressure operation on the display screen 1005; the operable controls include at least one of button controls, scroll bar controls, icon controls and menu controls.
  • The fingerprint sensor 1014 is used to collect the user's fingerprint, and the user's identity is recognized either by the processor 1001 according to the fingerprint collected by the fingerprint sensor 1014, or by the fingerprint sensor 1014 itself according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 1001 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings. The fingerprint sensor 1014 may be disposed on the front, back or side of the computer device 1000; when the computer device 1000 is provided with a physical button or a manufacturer's logo, the fingerprint sensor 1014 may be integrated with the physical button or the manufacturer's logo.
  • The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015: when the ambient light intensity is high, the display brightness is increased, and when it is low, the display brightness is decreased. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
  • The proximity sensor 1016, also known as a distance sensor, is usually disposed on the front panel of the computer device 1000 and is used to capture the distance between the user and the front of the computer device 1000. In one embodiment, when the proximity sensor 1016 detects that the distance between the user and the front of the computer device 1000 gradually decreases, the processor 1001 controls the display screen 1005 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1016 detects that the distance gradually increases, the processor 1001 controls the display screen 1005 to switch from the off-screen state to the bright-screen state.
  • Those skilled in the art can understand that the structure shown in Fig. 10 does not constitute a limitation on the computer device 1000, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different component arrangement.
  • Fig. 11 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 1100 may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPUs) 1101 and one or more memories 1102, where at least one instruction is stored in the memory 1102 and is loaded and executed by the processor 1101 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for realizing device functions, which are not repeated here.
  • In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory including instructions that can be executed by a processor in the terminal to complete the method for performing a connected-mic chorus in the above embodiments. The computer-readable storage medium may be non-transitory; for example, it may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments can be completed by a program instructing relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

This application discloses a method, system, device and storage medium for performing a connected-mic chorus, and belongs to the field of Internet technologies. The method includes: receiving first live multimedia data sent by a first terminal and second live multimedia data sent by a second terminal, the first live multimedia data carrying an accompaniment playback progress; when the received second live multimedia data carries a delay tag, deleting the second live multimedia data; and when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesizing the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried by the second live multimedia data and the accompaniment playback progress carried by the first live multimedia data, to obtain synthesized live multimedia data. This application can solve the problem that, at audience terminals, the singing sounds incoherent when two anchors take turns singing in a connected-mic chorus.

Description

进行连麦合唱的方法、系统、设备及存储介质
本申请要求于2021年08月06日提交的申请号为202110902528.8、发明名称为“进行连麦合唱的方法、系统、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及互联网技术领域,特别涉及一种进行连麦合唱的方法、系统、设备及存储介质。
背景技术
随着互联网技术的发展,人们通过网络观看主播直播已经是日常生活中常见的一种娱乐方式。
随着视频直播行业的发展,主播进行直播的形式越来越多,例如两个主播可以进行连麦唱歌。即两个主播可以按照预设的演唱顺序接唱同一首歌曲。用户可以通过进入任一主播的直播间,观看到两个主播共同演唱一首歌曲的直播视频。
在实现本申请的过程中,发明人发现相关技术至少存在以下问题:
在连麦唱歌时,两个主播都是在听到对方主播演唱完属于对方的演唱部分后,才开始演唱自己的演唱部分。但是由于网络传输存在延迟问题,主播在听到对方主播演唱完属于对方的演唱部分后再进行接唱,会导致主播接唱歌曲的接唱时间点落后,如此观众终端可能会出现两个主播接唱歌曲不连贯的问题。
发明内容
本申请实施例提供了一种进行连麦合唱的方法、系统、设备及存储介质,能够解决观众终端出现的两个主播在连麦合唱时接唱不连贯的问题。所述技术方案如下:
第一方面,提供了一种进行连麦合唱的方法,所述方法应用于第一终端,所述方法包括:
向服务器发送目标歌曲的连麦合唱请求;
接收所述服务器发送的所述目标歌曲的开始演唱命令,根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段;
开始播放所述目标歌曲的伴奏并进入第一处理状态,所述第一处理状态中的处理包括:在本地生成的第一直播多媒体数据中,添加录制所述第一直播多媒体数据时的伴奏播放进度以及演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;
当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态,所述第二处理状态中的处理包括:当接收到所述服务器发送的携带有所述第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加延迟标签,并向所述服务器发送添加处理后的第一直播多媒体数据;当接收到所述服务器发送的携带有所述第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据。
可选的,所述方法还包括:
当接收到的第二直播多媒体数据中携带的第二终端伴奏播放进度为任一演唱时间段的演唱开始时间点时,由所述演唱开始时间点开始播放所述伴奏,并切换到所述第一处理状态。
可选的,所述根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段,包括:
根据所述目标歌曲的分段信息,确定所述第一终端和第二终端分别在目标歌曲中分别对应的演唱段落;
对于所述第一终端对应的任一演唱段落,如果所述任一演唱段落非所述目标歌曲中对应的最后一个演唱段落,则基于所述任一演唱段落在所述目标歌曲中对应的目标播放结束时间,以及所述任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定所述任一演唱段落对应的演唱结束时间点;
如果所述任一演唱段落非所述目标歌曲中对应的第一个演唱段落,则将所述任一演唱段落相邻的上一个演唱段落的演唱结束时间点,确定为所述任一演唱段落的演唱开始时间点;
基于所述任一演唱段落对应的演唱开始时间点和演唱结束时间点,确定所述任一演唱段落对应的演唱时间段。
可选的,所述基于所述任一演唱段落在所述目标歌曲中对应的目标播放结束时间,以及所述任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定所述任一演唱段落对应的演唱结束时间点,包括:
确定所述目标播放结束时间与所述目标开始播放时间点之间的时间间隔;
如果所述时间间隔大于预设的时间间隔阈值,则将所述目标播放结束时间与所述目标开始播放时间点之间的中间时间点,确定为所述任一演唱段落对应的演唱结束时间点;
如果所述时间间隔小于等于预设的时间间隔阈值,则基于预设的划分比例,在所述目标播放结束时间与所述目标开始播放时间点之间,确定目标时间点,其中,所述目标时间点与所述目标播放结束时间点之间的第一时间间隔与所述目标时间点与所述目标开始结束时间点之间的第二时间间隔的比值,满足所述划分比例,且所述第一时间间隔大于所述第二时间间隔。
可选的,所述当伴奏播放时间点达到当前的演唱时间段的演唱结束时间点时,切换为第二处理状态之后,所述方法还包括:
停止播放所述目标歌曲的伴奏。
第二方面、提供了一种进行连麦合唱的方法,所述方法应用于服务器,所述方法包括:
接收第一终端以及第二终端发送的目标歌曲的连麦合唱请求;
向所述第一终端以及第二终端发送所述目标歌曲的开始演唱命令;
接收所述第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度;
当接收到的所述第二直播多媒体数据中携带有延迟标签时,删除所述第二直播多媒体数据;当接收到的所述第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于所述第二直播多媒体数据携带的伴奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据;
将所述合成处理后的直播多媒体数据发送至所述第一终端和所述第二终端 对应的观众终端。
可选的,所述基于所述第二直播多媒体数据携带的伴奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据,包括:
将所述第一直播多媒体数据以及所述第二直播多媒体数据中,携带有相同伴奏播放进度的音频帧进行音频合成处理,得到音频合成处理后的音频帧;
基于进行音频合成处理的音频帧,对所述第一直播多媒体数据中的视频帧和所述第二直播多媒体数据中的视频帧,进行视频对齐处理,得到视频对齐处理后的视频数据。
可选的,所述方法还包括:
当确定所述第一终端进入所述第二处理状态时,确定接收所述第一终端在所述第一处理状态时发送的所述第一直播多媒体数据对应的第一数据包数目,并确定接收所述第二终端在所述第二处理状态时发送的所述第二直播多媒体数据对应的第二数据包数目;
如果所述第一数据包数目大于所述第二数据包数目,则基于所述第一数据包数目与所述第二数据包数目的差值,对已接收的第二直播多媒体数据进行补包处理,如果所述第一数据包数目小于所述第二数据包数目,则基于所述差值对所述第二直播多媒体数据进行删包处理。
第三方面、提供了一种进行连麦合唱的装置,所述装置应用于第一终端,所述装置包括:
发送模块,用于向服务器发送目标歌曲的连麦合唱请求;
确定模块,用于接收所述服务器发送的所述目标歌曲的开始演唱命令,根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段;
处理模块,用于开始播放所述目标歌曲的伴奏并进入第一处理状态,所述第一处理状态中的处理包括:在本地生成的第一直播多媒体数据中,添加录制所述第一直播多媒体数据时的伴奏播放进度以及演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;
切换模块,用于当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态;
处理模块,用于在所述第二处理状态中,当接收到所述服务器发送的携带有所述第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加延迟标签,并向所述服务器发送添加处理后的第一直播多媒体数据;
当接收到所述服务器发送的携带有所述第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据。
可选的,所述切换模块,还用于:
当接收到的第二直播多媒体数据中携带的第二终端伴奏播放进度为任一演唱时间段的演唱开始时间点时,由所述演唱开始时间点开始播放所述伴奏,并切换到所述第一处理状态。
可选的,所述确定模块,用于:
根据所述目标歌曲的分段信息,确定所述第一终端和第二终端分别在目标歌曲中分别对应的演唱段落;
对于所述第一终端对应的任一演唱段落,如果所述任一演唱段落非所述目标歌曲中对应的最后一个演唱段落,则基于所述任一演唱段落在所述目标歌曲中对应的目标播放结束时间,以及所述任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定所述任一演唱段落对应的演唱结束时间点;
如果所述任一演唱段落非所述目标歌曲中对应的第一个演唱段落,则将所述任一演唱段落相邻的上一个演唱段落的演唱结束时间点,确定为所述任一演唱段落的演唱开始时间点;
基于所述任一演唱段落对应的演唱开始时间点和演唱结束时间点,确定所述任一演唱段落对应的演唱时间段。
可选的,所述确定模块,用于:
确定所述目标播放结束时间与所述目标开始播放时间点之间的时间间隔;
如果所述时间间隔大于预设的时间间隔阈值,则将所述目标播放结束时间与所述目标开始播放时间点之间的中间时间点,确定为所述任一演唱段落对应的演唱结束时间点;
如果所述时间间隔小于等于预设的时间间隔阈值,则基于预设的划分比例, 在所述目标播放结束时间与所述目标开始播放时间点之间,确定目标时间点,其中,所述目标时间点与所述目标播放结束时间点之间的第一时间间隔与所述目标时间点与所述目标开始结束时间点之间的第二时间间隔的比值,满足所述划分比例,且所述第一时间间隔大于所述第二时间间隔。
可选的,所述处理模块还用于:
停止播放所述目标歌曲的伴奏。
第四方面、提供了一种进行连麦合唱的装置,其特征在于,所述装置应用于服务器,所述装置包括:
接收模块,用于接收第一终端以及第二终端发送的目标歌曲的连麦合唱请求;
发送模块,用于向所述第一终端以及第二终端发送所述目标歌曲的开始演唱命令;
所述接收模块,用于接收所述第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度;
处理模块,用于当接收到的所述第二直播多媒体数据中携带有延迟标签时,删除所述第二直播多媒体数据;当接收到的所述第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于所述第二直播多媒体数据携带的伴奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据;
所述发送模块,用于将所述合成处理后的直播多媒体数据发送至所述第一终端和所述第二终端对应的观众终端。
可选的,所述处理模块,用于:
将所述第一直播多媒体数据以及所述第二直播多媒体数据中,携带有相同伴奏播放进度的音频帧进行音频合成处理,得到音频合成处理后的音频帧;
基于进行音频合成处理的音频帧,对所述第一直播多媒体数据中的视频帧和所述第二直播多媒体数据中的视频帧,进行视频对齐处理,得到视频对齐处理后的视频数据。
可选的,所述处理模块,还用于:
当确定所述第一终端进入所述第二处理状态时,确定接收所述第一终端在所述第一处理状态时发送的所述第一直播多媒体数据对应的第一数据包数目,并确定接收所述第二终端在所述第二处理状态时发送的所述第二直播多媒体数据对应的第二数据包数目;
如果所述第一数据包数目大于所述第二数据包数目,则基于所述第一数据包数目与所述第二数据包数目的差值,对已接收的第二直播多媒体数据进行补包处理,如果所述第一数据包数目小于所述第二数据包数目,则基于所述差值对所述第二直播多媒体数据进行删包处理。
第五方面、提供了一种进行连麦合唱的系统,所述系统包括第一终端、第二终端和服务器,其中:
所述第一终端,用于向所述服务器发送目标歌曲的连麦合唱请求;接收所述服务器发送的所述目标歌曲的开始演唱命令,根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段;开始播放所述目标歌曲的伴奏并进入第一处理状态,所述第一处理状态中的处理包括:在本地生成的第一直播多媒体数据中,添加录制所述第一直播多媒体数据时的伴奏播放进度以及演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态,所述第二处理状态中的处理包括:当接收到所述服务器发送的携带有所述第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加延迟标签,并向所述服务器发送添加处理后的第一直播多媒体数据;当接收到所述服务器发送的携带有所述第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;
所述服务器,用于接收所述第一终端以及第二终端发送的目标歌曲的连麦合唱请求;向所述第一终端以及第二终端发送所述目标歌曲的开始演唱命令;接收所述第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度;当接收到的所述第二直播多媒体数据中携带有延迟标签时,删除所述第二直播多媒体数据;当接收到的所述第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于所述第二直播多媒体数据携带的 伴奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据;将所述合成处理后的直播多媒体数据发送至所述第一终端和所述第二终端对应的观众终端。
第六方面、提供了一种终端,所述终端包括处理器和存储器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如第一方面所述的进行连麦合唱的方法所执行的操作。
第七方面、提供了一种服务器,所述服务器包括处理器和存储器,所述存储器中存储有至少一条指令,所述至少一条指令由所述处理器加载并执行以实现如第二方面所述的进行连麦合唱的方法所执行的操作。
第八方面、提供了一种计算机可读存储介质,所述存储介质中存储有至少一条指令,所述至少一条指令由处理器加载并执行以实现如上所述的进行连麦合唱的方法所执行的操作。
本申请实施例提供的技术方案带来的有益效果是:
在本申请实施例,将第一终端和第二终端在进行连麦合唱时,将进行连麦合唱的过程分为了两种处理状态,第一终端在第一处理状态时,可以向服务器发送携带伴奏播放进度的直播多媒体数据,第二终端在第二处理状态时,可以向服务器发送携带接收的第一终端发送的伴奏播放进度的直播多媒体数据,如此,服务器可以根据第一终端和第二终端分别发送的携带伴奏播放进度的直播多媒体数据,进行合成处理。这样发送至观众终端的直播多媒体数据是按照伴奏播放进度发送的,可以解决两个主播在连麦合唱时在观众终端出现接唱不连贯的问题。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种实施环境示意图;
图2是本申请实施例提供的一种进行连麦合唱的方法流程图;
图3是本申请实施例提供的一种进行连麦合唱的方法流程图;
图4是本申请实施例提供的一种进行连麦合唱的方法流程图;
图5是本申请实施例提供的一种进行连麦合唱的方法示意图;
图6是本申请实施例提供的一种进行连麦合唱的方法示意图;
图7是本申请实施例提供的一种进行连麦合唱的方法流程图;
图8是本申请实施例提供的一种进行连麦合唱的装置结构示意图;
图9是本申请实施例提供的一种进行连麦合唱的装置结构示意图;
图10是本申请实施例提供的一种终端结构示意图;
图11是本申请实施例提供的服务器结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
本申请提供的进行连麦合唱的方法可以由终端和服务器共同实现。终端可以运行有用于直播应用程序,终端可以具备摄像头、麦克风、耳机等部件,终端具有通信功能,可以接入互联网,终端可以是手机、平板电脑、智能穿戴设备、台式计算机、笔记本电脑等。服务器可以是上述应用程序的后台服务器,服务器可以与终端建立通信。该服务器可以是一个单独的服务器也可以是一个服务器组,如果是单独的服务器,该服务器可以负责下述方案中的所有处理,如果是服务器组,服务器组中的不同服务器分别可以负责下述方案中的不同处理,具体的处理分配情况可以由技术人员根据实际需求任意设置,此处不再赘述。
如图1所示,图1是本申请实施例提供的一种实施环境的示意图。其中,第一终端对应的第一主播可以与第二终端对应的第二主播可以进行连麦直播。在连麦直播的过程中,第一终端可以将包括第一主播直播视频和音频的第一直播多媒体数据发送至服务器,第二终端可以将包括第二主播直播视频和音频的第二直播多媒体数据发送至服务器,然后服务器可以对第一直播多媒体数据和第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据,并发送至观看第一主播和第二主播连麦直播的观众终端。其中,第一终端也可以将第一直播多媒体数据发送至第二终端,第二终端也可以将第一直播多媒体数据 发送至第一终端,进而第一主播和第二主播可以通过直播多媒体数据进行互动。
由于第一终端和第二终端的直播多媒体数据是通过网络传输的,存在不可避免的网络延时。第一终端将第一直播多媒体数据发送至服务器和第二终端,第二终端和服务器接收到第一终端发送的第一直播多媒体数据时,就已经存在了延时,而第二终端将第二主播根据第一直播多媒体数据进行互动过程中的第二直播多媒体数据发送到第一终端和服务器时,也会存在一定的延时。
例如在第一主播和第二主播在直播间中进行连麦合唱时,第二主播在接收并播放第一主播演唱完第一歌词的音频后,接唱了第二句歌词,并且第二终端将第二主播接唱第二句歌词过程中的第二直播多媒体数据发送至第一终端和服务器。由于存在网络延迟的原因,第一终端和服务器接收到第二句歌词的音频的时间点,相对于第二主播的接唱第二句歌词的时间点会靠后。这样在服务器中,接收到第一终端发送的第一句歌词对应的直播多媒体数据与第二终端发送的第二句歌词对应的直播多媒体数据之间会存在一段延迟时间(在该段延迟时间内服务器接收的为第二主播在演唱第二句歌词之前的直播多媒体数据),如此服务器将第一主播和第二主播对应的直播多媒体数据发送至观众终端后,观众终端看到的第一主播和第二主播在接唱两句歌词之间会存在一段延迟时间,从而导致观众终端显示的两个主播接唱歌曲的过程不连贯。
图2是本申请实施例提供的一种进行连麦合唱的方法的流程图,该方法应用于第一终端。参见图2,该方法包括:
步骤201、向服务器发送目标歌曲的连麦合唱请求。
步骤202、接收服务器发送的目标歌曲的开始演唱命令,根据目标歌曲的分段信息,确定本地的至少一个演唱时间段。
步骤203、开始播放目标歌曲的伴奏并进入第一处理状态,在第一处理状态中,在本地生成的第一直播多媒体数据中,添加录制第一直播多媒体数据时的伴奏播放进度以及演唱标签,向服务器发送添加处理后的第一直播多媒体数据。
步骤204、当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态。
其中,在第二处理状态中,当接收到服务器发送的携带有第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加 延迟标签,并向服务器发送添加处理后的第一直播多媒体数据;当接收到服务器发送的携带有第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向服务器发送添加处理后的第一直播多媒体数据。
图3是本申请实施例提供的一种进行连麦合唱的方法的流程图,该方法应用于第一终端。参见图3,该方法包括:
步骤301、接收第一终端以及第二终端发送的目标歌曲的连麦合唱请求。
步骤302、向第一终端以及第二终端发送目标歌曲的开始演唱命令。
步骤303、接收第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度。
步骤304、当接收到的第二直播多媒体数据中携带有延迟标签时,删除第二直播多媒体数据;当接收到的第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于第二直播多媒体数据携带的伴奏播放进度,以及第一直播多媒体数据携带的伴奏播放进度,对第一直播多媒体数据和第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据。
步骤305、将合成处理后的直播多媒体数据发送至第一终端和第二终端对应的观众终端。
图4是本申请实施例提供的一种进行连麦合唱的方法的流程图。该方法应用于第一终端和服务器之间的交互。参见图4,该实施例包括:
步骤401、第一终端向服务器发送目标歌曲的连麦合唱请求。
在实施中,第一主播想和第二主播连麦合唱时,第一主播可以操作第一终端中的直播应用程序,选择想要演唱的目标歌曲,然后向第二主播对应的第二终端和服务器发送目标歌曲的连麦合唱请求。在第二终端接收到第一终端发送的目标歌曲的连麦合唱请求后,第二主播可以选择接受与第一主播进行连麦合唱目标歌曲,并同样可以向服务器发送目标歌曲的连麦合唱请求。
步骤402、第一终端接收服务器发送的目标歌曲的开始演唱命令。
服务器在接收第一终端和第二终端发送的目标歌曲的连麦合唱请求后,可 以向第一终端和第二终端同时发送目标歌曲的开始演唱命令。
步骤403、第一终端根据目标歌曲的分段信息,确定本地的至少一个演唱时间段。
第一终端和第二终端在接收到目标歌曲的开始演唱命令后,可以开始执行连麦合唱功能。目标歌曲的分段信息可以预先存储在第一终端,也可以由服务器发送至第一终端。在目标歌曲的分段信息中,可以记录有领唱方和接唱方各自需要演唱的歌词部分。其中,领唱方可以为发起连麦合唱的发起端,即第一主播,接唱方可以为接受连麦合唱的接受端,即第二主播。第一终端可以根据目标歌曲的分段信息中记录的领唱方和接唱方各自需要演唱的歌词部分,确定第一终端(即本地)需要演唱目标歌曲的演唱时间段,即开始连麦合唱后,需要第一主播在演唱时间段内演唱目标歌曲。其中,确定根据目标歌曲的分段信息确定本地的至少一个演唱时间段的处理,此处先不做详细介绍。
步骤404、第一终端开始播放目标歌曲的伴奏,并进入第一处理状态。
在实现连麦合唱的功能中,第一终端和第二终端可分别包括两个处理状态,分别是主播处于演唱时间段时的处理状态,即第一处理状态,以及主播处于非演唱时间段时的处理状态,即第二处理状态,其中,在第一终端处于第一处理状态时,第二终端处于第二处理状态。
在本申请中,以第一终端先对目标歌曲进行演唱为例,对方案进行详细说明,即在连麦合唱开始后,第一终端先进入第一处理状态。在第一处理状态下,在本地生成的第一直播多媒体数据中,添加采集第一直播多媒体数据时的伴奏播放时间点以及演唱标签,向服务器发送添加处理后的第一直播多媒体数据。
在本申请中为了便于区分第一终端采集的直播多媒体数据和第二终端采集的直播多媒体数据,可以将第一终端采集的直播多媒体数据称为第一直播多媒体数据,将第二终端采集的直播多媒体数据称为第二直播多媒体数据。
在实施中,在第一终端进入第一处理状态中,可以播放目标歌曲的伴奏音频,第一主播可以随着播放的伴奏音频演唱目标歌曲,第一终端可以采集第一主播演唱目标歌曲时的第一直播多媒体数据,如第一终端的可以通过摄像头拍摄第一主播演唱目标歌曲的视频数据,通过麦克风可以录制第一主播演唱的目标歌曲的音频数据。
第一终端在采集的第一主播演唱目标歌曲时的第一直播多媒体数据后,可 以在第一直播多媒体数据中添加当前播放的目标歌曲的伴奏音频的伴奏播放时间点以及演唱标签。例如可以在麦克风采集到第一主播演唱目标歌曲的音频帧时,可以获取当前的伴奏播放时间点,将对应的伴奏播放时间点添加到音频帧中,如添加到音频帧中的元数据,并可以将演唱标签添加到元数据中。之后可以将添加伴奏播放时间点和演唱标签的第一直播多媒体数据发送至服务器以及第二终端,如此服务器和第二终端在接收到第一直播多媒体数据后,可以根据第一直播多媒体数据中添加的演唱标签确定接收到的第一终端发送的第一直播多媒体数据为第一主播演唱目标歌曲的第一直播多媒体数据,且可以根据伴奏播放进度,确定第一主播对目标歌曲的演唱进度。其中,伴奏播放进度可以伴奏播放的时间点,演唱标签可以任一具有标识作用的字符,可由技术人员预先设定。
另外,在第一终端处于第一处理状态时,第二终端处于第二处理状态,其中,第二终端在第二处理状态中的处理如下:当接收到第一终端发送的携带有非演唱标签的第一直播多媒体数据,或者未添加演唱标签的第一直播多媒体数据时,在本地当前生成的第二直播多媒体数据中,添加延迟标签,向服务器和第一终端发送添加处理后的第二直播多媒体数据,当接收到第一终端发送的携带有伴奏播放进度以及演唱标签的第一直播多媒体数据时,在本地当前生成的第二直播多媒体数据中,添加接收到的伴奏播放进度以及非演唱标签,向服务器和第一终端发送添加处理后的第二直播多媒体数据。
在实施中,第二终端在接收到服务器发送的目标歌曲的开始演唱命令后,可以进入第二处理状态。在刚进入第二处理状态的一段时间内,由于网络延迟问题,第二终端接收的第一终端发送的第一直播多媒体数据可能并不是第一终端在第一处理状态时发送的直播多媒体数据。因此,可以通过确定第一终端发送的第一直播多媒体数据中是否携带有演唱标签,进而确定在第二终端在第二处理状态时,接收到的第一直播多媒体数据是否为第一主播在演唱时间段发送的直播多媒体数据。
如果确定第一终端发送的第一直播多媒体数据中,没有携带演唱标签时,则说明当前第一终端发送的第一直播多媒体数据并不是第一主播演唱目标歌曲的直播多媒体数据,所以此时第二主播根据播放的直播多媒体数据做出的反应和互动,不是在看到或听到第一主播演唱目标歌曲是产生的。因此可以在当前 第二终端采集的第二直播多媒体数据中添加延迟标签,并将添加延迟标签的第二直播多媒体数据发送至服务器和第一终端。这样,延迟标签可用于表示第二终端发送的第二主播的第二直播多媒体数据并不是在看到或听到第一主播演唱目标歌曲时采集的。另外在第二直播多媒体中添加延迟标签的处理,与上述在第一直播多媒体中添加演唱标签的处理类似,此处不再赘述。
如果确定第一终端发送的第一直播多媒体数据中,携带有演唱标签时,则说明当前第一终端发送的第一直播多媒体数据是第一主播演唱目标歌曲的直播多媒体数据,所以此时第二主播根据播放的直播多媒体数据做出的反应和互动,是在看到或听到第一主播演唱目标歌曲是产生的。此时,可以获取接收到的第一直播多媒体数据中添加的伴奏播放进度,将获取的伴奏播放时间点添加到第二终端采集的第二直播多媒体数据中,同时可以在第二直播多媒体数据中添加非演唱标签,然后可以将添加伴奏播放时间点以及非演唱标签的第二直播多媒体数据发送至第一终端和服务器。这样将第二直播多媒体数据中添加接收到的第一直播多媒体中添加的伴奏播放进度,可用于表示第二直播多媒体数据,是在第二主播观看到对应的第一主播在演唱到伴奏播放进度对应的目标歌曲时采集到的。其中,在第二直播多媒体中添加伴奏播放时间点和非演唱标签的处理,与上述在第一直播多媒体中添加伴奏播放时间点和演唱标签的处理类似,此处不再赘述。
另外,由于第二终端接收的第一终端在第一处理状态中发送的第一直播多媒体中包括了目标歌曲伴奏音频,因此第二终端可以在第二处理状态时,可以不对目标歌曲的伴奏进行播放。
步骤405、服务器接收第一终端发送的第一直播多媒体数据,以及第二终端发送的第二直播多媒体数据。
其中,在第一直播多媒体数据和第二多媒体数据中都携带有伴奏播放进度,其中,第一直播多媒体数据是第一终端在第一状态时,发送的直播多媒体数据,其中携带的伴奏播放进度可以为第一终端在采集第一直播多媒体数据时,伴奏的播放时间点。第二直播多媒体数据是第二终端在第二状态时,发送的直播多媒体数据,其中携带的伴奏播放进度为第二终端接收到第一直播多媒体数据时,在第一直播多媒体数据中携带的伴奏播放进度。
步骤406、服务器对第一直播多媒体数据和第二直播多媒体数据进行合成处 理,得到合成处理后的直播多媒体数据。
在实施中,服务器接收第一终端发送的第一直播多媒体数据和第二终端的发送的第二直播多媒体数据。如图5所示,接收的第一直播多媒体数据包括第一终端在第一处理状态时发送的携带演唱标签的第一直播多媒体数据,以及在第二处理状态时发送的携带延迟标签或者非演唱标签的第一直播多媒体数据;接收的第二直播多媒体数据包括第二终端在第一处理状态时发送的携带演唱标签的第二直播多媒体数据,以及在第二处理状态时发送的携带延迟标签或者非演唱标签的第二直播多媒体数据。
在服务器接收到第一直播多媒体数据和第二直播多媒体数据之后,可以分别缓存预设数据量的第一直播多媒体数据和第二直播多媒体数据。其中,在分别缓存第一直播多媒体数据和第二直播多媒体数据时,如果确定第一直播多媒体数据或第二直播多媒体数据中携带有延迟标签,则可以不对携带有延迟标签的直播多媒体数据进行缓存,即删除携带延迟标签的第一直播多媒体数据或第二直播多媒体数据。这样服务器分别缓存的第一直播多媒体数据和第二直播多媒体数据中均为在对应的演唱时间段内采集的直播多媒体数据,或者是在接收到对端发送的在演唱时间段内的直播多媒体数据时,采集的直播多媒体数据。
然后可以分别根据第一直播多媒体数据和第二直播多媒体数据携带的伴奏播放进度,对第一直播多媒体数据和第二直播多媒体数据进行合成处理,将合成处理后的直播多媒体数据发送至观看第一主播和第二主播对应的观众终端。如图6所示,图6为观众终端接收到的合成处理后的直播多媒体数据的示意图。
其中,基于第二直播多媒体数据携带的伴奏播放进度,以及第一直播多媒体数据携带的伴奏播放进度,对第一直播多媒体数据和第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据的处理可以如下:
步骤4061、将第一直播多媒体数据以及第二直播多媒体数据中,携带有相同伴奏播放进度的音频帧进行音频合成处理,得到音频合成处理后的音频帧。
在本申请的实施例中,可以将第一直播多媒体数据中的音频帧称为第一音频帧,将第二直播多媒体数据中的音频帧称为第二音频帧。
对于缓存的第一直播多媒体数据和第二直播多媒体数据,可以分别确定第一直播多媒体数据中的各个第一音频帧携带的伴奏播放时间点,以及第二音频帧携带的伴奏播放时间点,然后可以将具有相同伴奏播放时间点的第一音频帧 和第二音频帧进行音频合成处理,也就是将具有相同伴奏进度的第一音频帧和第二音频帧进行合成,得到音频合成处理后的音频帧,即得到第一主播和第二主播在连麦合唱时的直播音频。
步骤4062、基于进行音频合成处理的音频帧,对第一直播多媒体数据中包括的第一视频帧和第二直播多媒体数据中包括的第二视频帧,进行视频帧拼接处理,得到视频帧拼接处理后的视频帧。
在本申请的实施例中,可以将第一直播多媒体数据中的视频帧称为第一视频帧,将第二直播多媒体数据中的视频帧称为第二视频帧。
在实施中,可以根据进行音频合成处理的音频帧,将第一直播多媒体数据中的第一视频帧和第二直播多媒体数据中的第二视频帧,与进行音频合成处理的音频帧进行对齐处理。其中,对齐处理可以根据第一视频帧、第二视频帧、第一音频帧、第二音频帧中携带的采集时间点实现。例如,进行音频拼接处理的两个音频帧分别为第一直播多媒体数据中的第一音频帧A、第二直播多媒体数据中的第二音频帧B,则可以确定第一音频帧A中携带的采集时间点a,和第二音频帧B中携带的采集时间点b,然后可以在第一直播多媒体数据中确定在采集时间与采集时间点a相同或最接近的第一视频帧A,并在第二直播多媒体数据中确定在采集时间与采集时间点b相同或最接近的第二视频帧B,将第一视频帧A和第二视频帧B拼接为一个视频帧,在该视频帧中既包括第一视频帧A的画面,也包括第二视频帧B的画面。如此,可以得到进行视频对齐处理后的视频帧。
在得到音频合成处理后的音频帧,以及进行视频对齐处理后的视频帧后,可以将对应的音频帧和视频帧发送至观众终端。另外,在将对应的音频帧和视频帧发送至观众终端之前,还可以对音频帧和视频帧的采集时间点进行统一修改,使得音频合成处理后的音频帧和视频对齐处理后的视频帧中携带的采集时间点一致且连续,如此,观众终端在接收到合成处理后的直播多媒体数据后,可以根据对应的采集时间点对合成处理后的直播多媒体数据中的音频帧和视频帧进行播放。
另外,由于不同的第一终端和第二终端在计算精度上可能存在不同,或者第一终端或第二终端在进行直播时出现了意外卡顿等情况,导致服务器在同一段时间内接收到的第一终端和第二终端发送的直播多媒体数据对应的数据包数 量不同。因此在本申请中还提供了一种对直播多媒体数据数据包数量进行校正的方法,如下:
1)当确定第一终端进入第二处理状态时,确定接收第一终端在第一处理状态时发送的第一直播多媒体数据对应的第一数据包数目,并确定接收第二终端在第二处理状态时发送的第二直播多媒体数据对应的第二数据包数目;
在实施中,服务器可以在第一终端和第二终端切换到下一个处理状态时,分别确定第一终端和第二终端在上一个处理状态时,分别发送的数据包的数量。其中,服务器如果确定第一直播多媒体数据中携带有演唱标签时,可以确定第一终端处于第一处理状态中,如果确定第一直播多媒体数据中携带有非演唱标签或延迟标签时,可以确定第一终端处于第二处理状态中。当服务器确定接收的第一直播多媒体数据中携带的演唱标签切换为延迟标签或非演唱标签时,可以确定第一终端由第一处理状态进入第二处理状态,此时可以确定第一终端在第一处理状态时发送的数据包的数目(即第一数据包数目),同理可以确定第二终端在第二处理状态时发送的数据包的数目(即第二数据包数目)。
2)如果第一数据包数目大于第二数据包数目,则基于第一数据包数目与第二数据包数目的差值,对已接收的第二直播多媒体数据进行补包处理,如果第一数据包数目小于第二数据包数目,则基于差值对第二直播多媒体数据进行删包处理。
在实施中,如果确定第一数据包数目大于第二数据包数目,则可以确定第二终端发送的第二直播多媒体对应的数据包较少,则可以对第二终端发送的第二直播多媒体数据进行补包处理,例如在第二终端在第二处理状态中发送的最后一个数据包后面添加空包,其中添加空包的个数可以等于第一数据包数目与第二数据包数目的差值。
如果确定第一数据包数目小于第二数据包数目,则可以确定第二终端发送的第二直播多媒体对应的数据包较多,则可以对删除第二终端在切换处理状态之前发送的数据包,对应删除的数目可以等于第一数据包数目与第二数据包数目的差值。
步骤407、将合成处理后的直播多媒体数据发送至第一终端和第二终端对应的观众终端。
在实施中,可以将合成处理后的直播多媒体数据发送至观看第一主播和第 二主播进行连麦合唱的观众终端。在本申请实施例,将第一终端和第二终端在进行连麦合唱时,将进行连麦合唱的过程分为了两种处理状态,第一终端在第一处理状态时,可以向服务器发送携带伴奏播放进度的直播多媒体数据,第二终端在第二处理状态时,可以向服务器发送携带接收的第一终端发送的伴奏播放进度的直播多媒体数据,如此,服务器可以根据第一终端和第二终端分别发送的携带伴奏播放进度的直播多媒体数据,进行合成处理。这样发送至观众终端的直播多媒体数据是按照伴奏播放进度发送的,可以解决两个主播在连麦合唱时,在观众终端出现接唱不连贯的问题。
其中,在上述步骤401-407中,对第一终端处于第一处理状态时,第二终端处于第二处理状态时的处理进行了介绍,并对应的介绍了服务器的处理。在本申请中,还提供了一种由第一处理状态转为第二处理状态的方法:当伴奏播放进度达到当前的演唱时间段的演唱结束时间点时,切换为第二处理状态。
在实施中,第一终端在第一处理状态的过程中,当检测到伴奏播放时间点播放到当前的演唱时间段的演唱结束时间点,第一终端可以进入第二处理状态,执行第二处理状态对应的处理。
其中,第一终端在第二处理状态下,当接收到第二终端发送的携带有非演唱标签的第二直播多媒体数据时,可以在本地当前生成的第一直播多媒体数据中,添加延迟标签,向服务器发送添加处理后的第一直播多媒体数据,当接收到第二终端发送的携带有伴奏播放进度以及演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的伴奏播放进度以及非演唱标签,向服务器发送添加处理后的第一直播多媒体数据。
在实施中,当第一终端在进入第二处理状态时,可以停止对目标歌曲的伴奏音频的播放。由于网络延迟的原因,第一终端在进入第二处理状态的一段时间内,接收到的第二终端发送的第二直播多媒体数据,仍然是第二终端在第二处理状态时,发送的直播多媒体数据。所以第一终端在进入第二处理状态时,当接收到第二终端发送的携带有非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加延迟标签,向服务器和第二终端发送添加延迟标签的第一直播多媒体数据。其中,延迟标签用于表示第一终端发送的第一主播的第一直播多媒体数据并不是在看到或听到第二主播演唱目标歌曲时采集的。另外在第一直播多媒体中添加延迟标签的处理,与上述在第一直播 多媒体中添加演唱标签的处理相同,此处不再赘述。
如果确定第二终端发送的第二直播多媒体数据中,携带有演唱标签时,则说明当前第二终端发送的直播多媒体数据是第二主播演唱目标歌曲的直播多媒体数据,所以此时第一主播根据播放的直播多媒体数据做出的反应和互动,是在看到或听到第二主播演唱目标歌曲是产生的。此时,可以获取接收到的第二直播多媒体数据中添加的伴奏播放时间点,并将获取的伴奏播放时间点添加到第一终端采集的第一直播多媒体数据中,同时可以在第一直播多媒体数据中添加非演唱标签,然后可以将添加伴奏播放时间点以及非演唱标签的直播多媒体数据发送至第二终端和服务器。这样将第一直播多媒体数据中添加接收到的第一直播多媒体中添加的伴奏播放时间点,可用于表示第一直播多媒体数据,是在第一主播观看到对应的第二主播在演唱到伴奏播放时间点对应的目标歌曲时采集到的。其中,在第一直播多媒体中添加伴奏播放时间点和非演唱标签的处理,与上述在第一直播多媒体中添加伴奏播放时间点和演唱标签的处理类似,此处不再赘述。
另外,本申请提供了一种由第二处理状态转为第一处理状态的方法:当接收到的伴奏播放时间点为任一演唱时间段的演唱开始时间点时,由演唱开始时间点开始播放伴奏,切换到第一处理状态。
在实施中,当第二终端接收到的第一终端发送的第一直播多媒体数据中携带的时间点为,第二终端对应的任一演唱时间段的演唱开始时间点时,则第二终端可以进入第一处理状态。其中,在第一处理状态中,第二终端与第一终端的处理相同,可以在本地生成的直播多媒体数据中,添加录制第二直播多媒体数据时的伴奏播放时间点以及演唱标签,向服务器和第一终端发送添加处理后的第二直播多媒体数据。
这样在第一终端和第二终端在接唱目标歌曲的过程中,第一终端和第二终端不断的在第一处理状态和第二处理状态中切换,当第一终端在第一处理状态时,第二终端应该处于第二处理状态,当第二终端在第一处理状态时,第一终端应该处于第一处理状态。其中,上述步骤405-407为第一终端在第一处理状态时,服务器对第一终端发送的第一直播多媒体数据以及第二终端发送的第二直播多媒体数据进行处理的过程。而对于第二终端在第一处理状态时,服务器对第二终端发送的第二直播多媒体数据的处理,与上述步骤405-407中服务器对第 一直播多媒体数据的处理相同;服务器对第一终端发送的第一直播多媒体数据的处理,与上述步骤405-407中服务器对第二直播多媒体数据的处理相同,此处不再赘述。
在本申请实施例,将第一终端和第二终端在进行连麦合唱时,将进行连麦合唱的过程分为了两种处理状态,第一终端在第一处理状态时,可以向服务器发送携带伴奏播放进度的直播多媒体数据,第二终端在第二处理状态时,可以向服务器发送携带接收的第一终端发送的伴奏播放进度的直播多媒体数据,如此,服务器可以根据第一终端和第二终端分别发送的携带伴奏播放进度的直播多媒体数据,进行合成处理。这样发送至观众终端的直播多媒体数据是按照伴奏播放进度发送的,可以解决两个主播在连麦合唱时,在观众终端出现接唱不连贯的问题。
上述所有可选技术方案,可以采用任意结合形成本公开的可选实施例,在此不再一一赘述。
图7是本申请实施例提供的一种确定演唱时间段的流程图。参见图7,该实施例包括:
步骤701、根据目标歌曲的分段信息,确定第一终端和第二终端分别在目标歌曲中分别对应的演唱段落。
在实施中,目标歌曲的分段信息中包括目标歌曲中每一句歌词对应的演唱者,例如记录第一句歌词是领唱者演唱,第二句歌词是接唱者演唱,第三句歌词是领唱者演唱。其中,领唱者可以是发起连麦合唱的一方,例如可以是第一终端对应的第一主播。如此,便可以根据目标歌曲的分段信息,确定第一终端需要演唱的歌词部分以及第二终端需要演唱的歌词部分。
步骤702、对于第一终端对应的任一演唱段落,如果任一演唱段落不为目标歌曲中对应的最后一个演唱段落,则基于任一演唱段落在目标歌曲中对应的目标播放结束时间,以及任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定任一演唱段落对应的演唱结束时间点。
其中,第一终端对应的任一演唱段落,可以是第一终端需要演唱的任一歌词部分。例如,第一终端对应的演唱段落可以时第一至第三句歌词,第七至第九句歌词,第十三至第十五句歌词。
目标歌曲可对应有歌词文件,在歌词文件中记录有每句歌词在目标歌曲中的播放开始时间点和播放结束时间点。如此,便可以根据目标歌曲对应的歌词文件,确定目标歌曲各个演唱段落在目标歌曲中的播放开始时间点和播放结束时间点。
根据演唱段落在目标歌曲中对应的目标播放结束时间,以及任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定任一演唱段落对应的演唱结束时间点的处理如下:
确定目标播放结束时间与目标开始播放时间点之间的时间间隔。如果时间间隔大于预设的时间间隔阈值,则将目标播放结束时间与目标开始播放时间点之间的中间时间点,确定为任一演唱段落对应的演唱结束时间点。
其中,预设的时间间隔阈值可以由技术人员预先设置,此处不进行限定。例如可以为3秒、4秒、5秒等。
在目标播放结束时间与目标开始播放时间点之间的时间间隔时,可以说明两个演唱段落中,前一个演唱段落最后一句歌词与后一个演唱段落第一句歌词之间的伴奏时间较长,两个主播可以平分该段伴奏时长。因此可以将将目标播放结束时间与目标开始播放时间点之间的中间时间点,确定为任一演唱段落对应的演唱结束时间点。
如果时间间隔小于等于预设的时间间隔阈值,则基于预设的划分比例,在目标播放结束时间与目标开始播放时间点之间,确定目标时间点,其中,目标时间点与目标播放结束时间点之间的第一时间间隔与目标时间点与目标开始结束时间点之间的第二时间间隔的比值,满足划分比例,且第一时间间隔大于第二时间间隔。
在时间间隔小于等于预设的时间间隔阈值,可以说明两个演唱段落中,前一个演唱段落最后一句歌词与后一个演唱段落第一句歌词之间的伴奏时间较短。由于主播在演唱目标歌曲对应歌词时,可能存在拉长音的情况,导致主播对一句歌词的演唱时长,可能超出了歌词文本中设定的对应歌词的播放时长,因此,在两个演唱段落对应的时间间隔较短时,可以多留出一些时间给前一个演唱段落的演唱者,因此可以基于预设的划分比例,在目标播放结束时间与目标开始播放时间点之间,确定目标时间点。其中,根据该预设的比例确定的目标时间点与目标播放结束时间点之间的第一时间间隔与目标时间点与目标开始 结束时间点之间的第二时间间隔的比值,满足划分比例,且第一时间间隔大于第二时间间隔。
步骤703、如果任一演唱段落不为目标歌曲中对应的第一个演唱段落,则将任一演唱段落相邻的上一个演唱段落的演唱结束时间点,确定为任一演唱段落的演唱开始时间点。
在实施中,在确定第一终端需要演唱的歌词部分以及第二终端需要演唱的歌词部分之后,可以确定每个演唱段落对应的演唱开始时间点和演唱结束时间点。其中,对于第一终端的对应各个演唱段落,如果第一终端的对应演唱段落为目标歌曲中的第一个演唱段落,则可以将目标歌曲的开始时间点,即零分零秒确定为该演唱段落的演唱开始时间。而如果第一终端的对应演唱段落不是目标歌曲中的第一个演唱段落,则演唱段落对应的开始时间点,可以是该演唱段落对应的上一个演唱段落的结束时间点。即在目标歌曲中相邻的两个演唱段落中,前一个演唱段落的演唱结束时间点,即为后一个演唱段落的演唱开始时间点。
步骤704、基于任一演唱段落对应的演唱开始时间点和演唱结束时间点,确定任一演唱段落对应的演唱时间段。
在实施中,在得到每个演唱段落对应的演唱开始时间点和演唱结束时间点后,每个演唱段落对应的演唱开始时间点和演唱结束时间点对应的时间段即为每个演唱段落对应的演唱时间段。
本申请实施例提供了一种演唱时间段的方法,可以根据目标歌曲的分段信息,确定第一终端和第二终端分别在目标歌曲中分别对应的演唱时间段,能够为分别为第一终端和第二终端对应的每个演唱部分分配合理的演唱时长。
在本申请实施例,将第一终端和第二终端在进行连麦合唱时,将进行连麦合唱的过程分为了两种处理状态,第一终端在第一处理状态时,可以向服务器发送携带伴奏播放进度的直播多媒体数据,第二终端在第二处理状态时,可以向服务器发送携带接收的第一终端发送的伴奏播放进度的直播多媒体数据,如此,服务器可以根据第一终端和第二终端分别发送的携带伴奏播放进度的直播多媒体数据,进行合成处理。这样发送至观众终端的直播多媒体数据是按照伴奏播放进度发送的,可以解决两个主播在连麦合唱时,在观众终端出现接唱不连贯的问题。
图8是本申请实施例提供的一种进行连麦合唱的装置,该装置可以是上述实施例中的第一终端或是第二终端,该装置包括:
发送模块810,用于向服务器发送目标歌曲的连麦合唱请求;
确定模块820,用于接收所述服务器发送的所述目标歌曲的开始演唱命令,根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段;
处理模块830,用于开始播放所述目标歌曲的伴奏并进入第一处理状态,在本地生成的第一直播多媒体数据中,添加录制所述第一直播多媒体数据时的伴奏播放进度以及演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;
切换模块840,用于当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态;
处理模块830,用于当接收到所述服务器发送的携带有所述第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加延迟标签,并向所述服务器发送添加处理后的第一直播多媒体数据;当接收到所述服务器发送的携带有所述第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据。
可选的,所述切换模块840,还用于:
当接收到的第二直播多媒体数据中携带的第二终端伴奏播放进度为任一演唱时间段的演唱开始时间点时,由所述演唱开始时间点开始播放所述伴奏,并切换到所述第一处理状态。
可选的,所述确定模块820,用于:
根据所述目标歌曲的分段信息,确定所述第一终端和第二终端分别在目标歌曲中分别对应的演唱段落;
对于所述第一终端对应的任一演唱段落,如果所述任一演唱段落非所述目标歌曲中对应的最后一个演唱段落,则基于所述任一演唱段落在所述目标歌曲中对应的目标播放结束时间,以及所述任一演唱段落相邻的下一个演唱段落的目标播放开始时间点,确定所述任一演唱段落对应的演唱结束时间点;
如果所述任一演唱段落非所述目标歌曲中对应的第一个演唱段落,则将所述任一演唱段落相邻的上一个演唱段落的演唱结束时间点,确定为所述任一演唱段落的演唱开始时间点;
基于所述任一演唱段落对应的演唱开始时间点和演唱结束时间点,确定所述任一演唱段落对应的演唱时间段。
可选的,所述确定模块820,用于:
确定所述目标播放结束时间与所述目标开始播放时间点之间的时间间隔;
如果所述时间间隔大于预设的时间间隔阈值,则将所述目标播放结束时间与所述目标开始播放时间点之间的中间时间点,确定为所述任一演唱段落对应的演唱结束时间点;
如果所述时间间隔小于等于预设的时间间隔阈值,则基于预设的划分比例,在所述目标播放结束时间与所述目标开始播放时间点之间,确定目标时间点,其中,所述目标时间点与所述目标播放结束时间点之间的第一时间间隔与所述目标时间点与所述目标开始结束时间点之间的第二时间间隔的比值,满足所述划分比例,且所述第一时间间隔大于所述第二时间间隔。
可选的,所述处理模块830还用于:
停止播放所述目标歌曲的伴奏。
图9是本申请实施例提供的一种进行连麦合唱的装置,该装置可以是上述实施例中的服务器,该装置包括:
接收模块910,用于接收第一终端以及第二终端发送的目标歌曲的连麦合唱请求;
发送模块920,用于向所述第一终端以及第二终端发送所述目标歌曲的开始演唱命令;
所述接收模块910,用于接收所述第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度;
处理模块930,用于当接收到的所述第二直播多媒体数据中携带有延迟标签时,删除所述第二直播多媒体数据;当接收到的所述第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于所述第二直播多媒体数据携带的伴 奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据;
所述发送模块920,用于将所述合成处理后的直播多媒体数据发送至所述第一终端和所述第二终端对应的观众终端。
可选的,所述处理模块930,用于:
将所述第一直播多媒体数据以及所述第二直播多媒体数据中,携带有相同伴奏播放进度的音频帧进行音频合成处理,得到音频合成处理后的音频帧;
基于进行音频合成处理的音频帧,对所述第一直播多媒体数据中的视频帧和所述第二直播多媒体数据中的视频帧,进行视频对齐处理,得到视频对齐处理后的视频数据。
可选的,所述处理模块930,还用于:
当确定所述第一终端进入所述第二处理状态时,确定接收所述第一终端在所述第一处理状态时发送的所述第一直播多媒体数据对应的第一数据包数目,并确定接收所述第二终端在所述第二处理状态时发送的所述第二直播多媒体数据对应的第二数据包数目;
如果所述第一数据包数目大于所述第二数据包数目,则基于所述第一数据包数目与所述第二数据包数目的差值,对已接收的第二直播多媒体数据进行补包处理,如果所述第一数据包数目小于所述第二数据包数目,则基于所述差值对所述第二直播多媒体数据进行删包处理。
需要说明的是:上述实施例提供的进行连麦合唱的装置在进行连麦合唱时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的进行连麦合唱的装置与进行连麦合唱的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在本申请中还提供了一种进行连麦合唱的系统,所述系统包括第一终端、第二终端和服务器,其中:
所述第一终端,用于向所述服务器发送目标歌曲的连麦合唱请求;接收所 述服务器发送的所述目标歌曲的开始演唱命令,根据所述目标歌曲的分段信息,确定本地的至少一个演唱时间段;开始播放所述目标歌曲的伴奏并进入第一处理状态,在本地生成的第一直播多媒体数据中,添加录制所述第一直播多媒体数据时的伴奏播放进度以及演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;当伴奏播放进度达到当前的演唱时间段结束时间点时,切换为第二处理状态,当接收到所述服务器发送的携带有所述第二终端非演唱标签的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中添加延迟标签,并向所述服务器发送添加处理后的第一直播多媒体数据;当接收到所述服务器发送的携带有所述第二终端演唱标签及伴奏播放进度的第二直播多媒体数据时,在本地当前生成的第一直播多媒体数据中,添加接收到的第二终端伴奏播放进度以及非演唱标签,向所述服务器发送添加处理后的第一直播多媒体数据;
所述服务器,用于接收所述第一终端以及第二终端发送的目标歌曲的连麦合唱请求;向所述第一终端以及第二终端发送所述目标歌曲的开始演唱命令;接收所述第一终端在第一处理状态时发送的第一直播多媒体数据,以及第二终端在第二处理状态时发送的第二直播多媒体数据,其中,第一直播多媒体数据中携带有伴奏播放进度;当接收到的所述第二直播多媒体数据中携带有延迟标签时,删除所述第二直播多媒体数据;当接收到的所述第二直播多媒体数据中携带有非演唱标签以及伴奏播放进度时,基于所述第二直播多媒体数据携带的伴奏播放进度,以及所述第一直播多媒体数据携带的伴奏播放进度,对所述第一直播多媒体数据和所述第二直播多媒体数据进行合成处理,得到合成处理后的直播多媒体数据;将所述合成处理后的直播多媒体数据发送至所述第一终端和所述第二终端对应的观众终端。
图10示出了本申请一个示例性实施例提供的计算机设备1000的结构框图。该计算机设备1000可以是上述实施例中的第一终端或第二终端,该计算机设备1000可以是便携式移动终端,比如:智能手机、平板电脑、MP3播放器(moving picture experts group audio layer III,动态影像专家压缩标准音频层面3)、MP4(moving picture experts group audio layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。计算机设备1000还可能被称为用户设备、 便携式终端、膝上型终端、台式终端等其他名称。
通常,计算机设备1000包括有:处理器1001和存储器1002。
处理器1001可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1001可以采用DSP(digital signal processing,数字信号处理)、FPGA(field-programmable gate array,现场可编程门阵列)、PLA(programmable logic array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1001也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(central processing unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1001可以集成有GPU(graphics processing unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器1001还可以包括AI(artificial intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1002可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1002还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1002中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器1001所执行以实现本申请中方法实施例提供的进行连麦合唱的方法。
在一些实施例中,计算机设备1000还可选包括有:外围设备接口1003和至少一个外围设备。处理器1001、存储器1002和外围设备接口1003之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1003相连。具体地,外围设备包括:射频电路1004、显示屏1005、摄像头组件1006、音频电路1007、定位组件1008和电源1009中的至少一种。
外围设备接口1003可被用于将I/O(input/output,输入/输出)相关的至少一个外围设备连接到处理器1001和存储器1002。在一些实施例中,处理器1001、存储器1002和外围设备接口1003被集成在同一芯片或电路板上;在一些其他实施例中,处理器1001、存储器1002和外围设备接口1003中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
The radio-frequency circuit 1004 is used to receive and transmit RF (radio frequency) signals, also called electromagnetic signals. The radio-frequency circuit 1004 communicates with communication networks and other communication devices via electromagnetic signals: it converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio-frequency circuit 1004 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio-frequency circuit 1004 may communicate with other terminals through at least one wireless communication protocol, including but not limited to the World Wide Web, metropolitan area networks, intranets, all generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (wireless fidelity) networks. In some embodiments, the radio-frequency circuit 1004 may also include NFC (near field communication) circuitry, which is not limited in this application.
The display 1005 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1005 is a touch display, it can also capture touch signals on or above its surface; the touch signals may be input to the processor 1001 as control signals for processing. In this case, the display 1005 may further provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display 1005, set on the front panel of the computer device 1000; in other embodiments, there may be at least two displays 1005, set on different surfaces of the computer device 1000 or in a folded design; in still other embodiments, the display 1005 may be a flexible display set on a curved or folded surface of the computer device 1000. The display 1005 may even be set in a non-rectangular irregular shape, that is, a shaped screen. The display 1005 may be made of materials such as LCD (liquid crystal display) or OLED (organic light-emitting diode).
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, the front camera is set on the front panel of the terminal and the rear camera on its back. In some embodiments, there are at least two rear cameras, each being any of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to fuse the main camera and the depth-of-field camera for background blurring, fuse the main camera and the wide-angle camera for panoramic and VR (virtual reality) shooting, or achieve other fused shooting functions. In some embodiments, the camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; the latter is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment and converts them into electrical signals, which are input to the processor 1001 for processing or to the radio-frequency circuit 1004 for voice communication. For stereo capture or noise reduction, there may be multiple microphones set at different parts of the computer device 1000. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1001 or the radio-frequency circuit 1004 into sound waves. The speaker may be a traditional membrane speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 1007 may also include a headphone jack.
The positioning assembly 1008 is used to locate the current geographic position of the computer device 1000 for navigation or LBS (location based service). The positioning assembly 1008 may be based on the United States' GPS (global positioning system), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.
The power supply 1009 is used to power the components of the computer device 1000. The power supply 1009 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the battery may support wired or wireless charging: a wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the computer device 1000 further includes one or more sensors 1010, including but not limited to an acceleration sensor 1011, a gyroscope sensor 1012, a pressure sensor 1013, a fingerprint sensor 1014, an optical sensor 1015, and a proximity sensor 1016.
The acceleration sensor 1011 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the computer device 1000. For example, the acceleration sensor 1011 can detect the components of gravitational acceleration on the three axes. The processor 1001 can control the display 1005 to display the user interface in landscape or portrait view based on the gravitational acceleration signals collected by the acceleration sensor 1011. The acceleration sensor 1011 can also be used to collect motion data for games or users.
The gyroscope sensor 1012 can detect the body orientation and rotation angle of the computer device 1000, and can cooperate with the acceleration sensor 1011 to capture the user's 3D actions on the computer device 1000. From the data collected by the gyroscope sensor 1012, the processor 1001 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operations), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1013 may be set on the side frame of the computer device 1000 and/or under the display 1005. When set on the side frame, it can detect the user's grip signal on the computer device 1000, and the processor 1001 performs left/right-hand recognition or shortcut operations based on the collected grip signal. When set under the display 1005, the processor 1001 controls operable controls on the UI according to the user's pressure operations on the display 1005. The operable controls include at least one of a button control, a scrollbar control, an icon control, and a menu control.
The fingerprint sensor 1014 is used to collect the user's fingerprint. The processor 1001 identifies the user's identity from the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user's identity from the collected fingerprint. When the identity is recognized as trusted, the processor 1001 authorizes the user to perform sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1014 may be set on the front, back, or side of the computer device 1000. When the computer device 1000 has a physical button or a manufacturer logo, the fingerprint sensor 1014 may be integrated with it.
The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 can control the display brightness of the display 1005 according to the ambient light intensity collected by the optical sensor 1015: when the ambient light intensity is high, the display brightness is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 1001 can also dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity.
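As a toy illustration of that control rule (the thresholds and the 0-to-1 brightness scale are assumptions of this sketch):

```python
def adjust_brightness(ambient_lux, low=50.0, high=300.0):
    """Map ambient light intensity to a display brightness in [0, 1]."""
    if ambient_lux >= high:
        return 1.0   # bright surroundings: raise display brightness
    if ambient_lux <= low:
        return 0.3   # dim surroundings: lower display brightness
    # Interpolate between the two levels in ordinary light.
    return 0.3 + 0.7 * (ambient_lux - low) / (high - low)
```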
The proximity sensor 1016, also called a distance sensor, is usually set on the front panel of the computer device 1000 and collects the distance between the user and the front of the device. In one embodiment, when the proximity sensor 1016 detects that this distance is gradually decreasing, the processor 1001 controls the display 1005 to switch from the screen-on state to the screen-off state; when it detects that the distance is gradually increasing, the processor 1001 controls the display 1005 to switch from the screen-off state back to the screen-on state.
Those skilled in the art can understand that the structure shown in FIG. 10 does not limit the computer device 1000, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
FIG. 11 is a schematic structural diagram of a server provided by an embodiment of this application. The server 1100 may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPUs) 1101 and one or more memories 1102, wherein the memory 1102 stores at least one instruction that is loaded and executed by the processor 1101 to implement the methods provided by the method embodiments above. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example, a memory including instructions executable by a processor in a terminal to complete the method for linked-mic chorus in the above embodiments. The computer-readable storage medium may be non-transitory, for example, a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall be included in its scope of protection.

Claims (12)

  1. A method for linked-mic chorus, wherein the method is applied to a first terminal and comprises:
    sending a linked-mic chorus request for a target song to a server;
    receiving a start-singing command for the target song sent by the server, and determining at least one local singing time segment according to segmentation information of the target song;
    starting to play an accompaniment of the target song and entering a first processing state, the processing in the first processing state comprising: adding, to locally generated first live multimedia data, an accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and sending the tagged first live multimedia data to the server;
    when the accompaniment playback progress reaches an end time point of a current singing time segment, switching to a second processing state, the processing in the second processing state comprising: when second live multimedia data carrying a non-singing tag of a second terminal is received from the server, adding a delay tag to the first live multimedia data currently generated locally, and sending the tagged first live multimedia data to the server; and when second live multimedia data carrying a singing tag and an accompaniment playback progress of the second terminal is received from the server, adding the received second-terminal accompaniment playback progress and a non-singing tag to the first live multimedia data currently generated locally, and sending the tagged first live multimedia data to the server.
  2. The method according to claim 1, wherein the method further comprises:
    when the second-terminal accompaniment playback progress carried in received second live multimedia data is the singing start time point of any singing time segment, starting to play the accompaniment from that singing start time point, and switching to the first processing state.
  3. The method according to claim 1, wherein determining at least one local singing time segment according to the segmentation information of the target song comprises:
    determining, according to the segmentation information of the target song, the singing sections respectively corresponding to the first terminal and the second terminal in the target song;
    for any singing section corresponding to the first terminal, if the singing section is not the last singing section corresponding to the first terminal in the target song, determining the singing end time point corresponding to the singing section based on the target playback end time point of the singing section in the target song and the target playback start time point of the next adjacent singing section;
    if the singing section is not the first singing section corresponding to the first terminal in the target song, determining the singing end time point of the previous adjacent singing section as the singing start time point of the singing section;
    determining the singing time segment corresponding to the singing section based on its singing start time point and singing end time point.
  4. The method according to claim 3, wherein determining the singing end time point corresponding to the given singing section based on the target playback end time point of the singing section in the target song and the target playback start time point of the next adjacent singing section comprises:
    determining the time interval between the target playback end time point and the target playback start time point;
    if the time interval is greater than a preset time interval threshold, determining the midpoint between the target playback end time point and the target playback start time point as the singing end time point corresponding to the given singing section;
    if the time interval is less than or equal to the preset time interval threshold, determining a target time point between the target playback end time point and the target playback start time point based on a preset division ratio, wherein the ratio of a first time interval, between the target time point and the target playback end time point, to a second time interval, between the target time point and the target playback start time point, satisfies the division ratio, and the first time interval is greater than the second time interval.
  5. The method according to claim 1, wherein after the switching to the second processing state when the accompaniment playback progress reaches the singing end time point of the current singing time segment, the method further comprises:
    stopping playing the accompaniment of the target song.
  6. A method for linked-mic chorus, wherein the method is applied to a server and comprises:
    receiving linked-mic chorus requests for a target song sent by a first terminal and a second terminal;
    sending a start-singing command for the target song to the first terminal and the second terminal;
    receiving first live multimedia data sent by the first terminal in a first processing state, and second live multimedia data sent by the second terminal in a second processing state, wherein the first live multimedia data carries an accompaniment playback progress;
    when the received second live multimedia data carries a delay tag, deleting the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesizing the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried in the second live multimedia data and the accompaniment playback progress carried in the first live multimedia data, to obtain synthesized live multimedia data;
    sending the synthesized live multimedia data to the first terminal and the second terminal.
  7. The method according to claim 6, wherein synthesizing the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried in the second live multimedia data and the accompaniment playback progress carried in the first live multimedia data, to obtain synthesized live multimedia data, comprises:
    performing audio synthesis on the audio frames in the first live multimedia data and the second live multimedia data that carry the same accompaniment playback progress, to obtain synthesized audio frames;
    based on the audio frames used for audio synthesis, performing video alignment on the video frames in the first live multimedia data and the video frames in the second live multimedia data, to obtain aligned video data.
  8. The method according to claim 6, wherein the method further comprises:
    when it is determined that the first terminal enters the second processing state, determining a first packet count for the first live multimedia data received while the first terminal was in the first processing state, and a second packet count for the second live multimedia data received while the second terminal was in the second processing state;
    if the first packet count is greater than the second packet count, performing packet padding on the received second live multimedia data based on the difference between the first packet count and the second packet count; if the first packet count is less than the second packet count, performing packet deletion on the second live multimedia data based on the difference.
  9. A system for linked-mic chorus, wherein the system comprises a first terminal, a second terminal, and a server, wherein:
    the first terminal is configured to: send a linked-mic chorus request for a target song to the server; receive a start-singing command for the target song sent by the server, and determine at least one local singing time segment according to segmentation information of the target song; start playing an accompaniment of the target song and enter a first processing state, the processing in the first processing state comprising: adding, to locally generated first live multimedia data, an accompaniment playback progress at the time the first live multimedia data was recorded and a singing tag, and sending the tagged first live multimedia data to the server; when the accompaniment playback progress reaches an end time point of a current singing time segment, switch to a second processing state, the processing in the second processing state comprising: when second live multimedia data carrying the second terminal's non-singing tag is received from the server, adding a delay tag to the first live multimedia data currently generated locally, and sending the tagged first live multimedia data to the server; and when second live multimedia data carrying the second terminal's singing tag and accompaniment playback progress is received from the server, adding the received second-terminal accompaniment playback progress and a non-singing tag to the first live multimedia data currently generated locally, and sending the tagged first live multimedia data to the server;
    the server is configured to: receive the linked-mic chorus requests for the target song sent by the first terminal and the second terminal; send the start-singing command for the target song to the first terminal and the second terminal; receive the first live multimedia data sent by the first terminal in the first processing state, and the second live multimedia data sent by the second terminal in the second processing state, wherein the first live multimedia data carries an accompaniment playback progress; when the received second live multimedia data carries a delay tag, delete the second live multimedia data; when the received second live multimedia data carries a non-singing tag and an accompaniment playback progress, synthesize the first live multimedia data and the second live multimedia data based on the accompaniment playback progress carried in each, to obtain synthesized live multimedia data; and send the synthesized live multimedia data to the audience terminals corresponding to the first terminal and the second terminal.
  10. A terminal, wherein the terminal comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed in the method for linked-mic chorus according to any one of claims 1 to 5.
  11. A server, wherein the server comprises a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the operations performed in the method for linked-mic chorus according to any one of claims 6 to 8.
  12. A computer-readable storage medium, wherein the storage medium stores at least one instruction that is loaded and executed by a processor to implement the operations performed in the method for linked-mic chorus according to any one of claims 1 to 8.
PCT/CN2022/101609 2021-08-06 2022-06-27 Method, system, device, and storage medium for linked-mic chorus WO2023011050A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110902528.8A CN113596516B (zh) 2021-08-06 2021-08-06 Method, system, device, and storage medium for linked-mic chorus
CN202110902528.8 2021-08-06

Publications (1)

Publication Number Publication Date
WO2023011050A1 true WO2023011050A1 (zh) 2023-02-09

Family

ID=78255888

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101609 WO2023011050A1 (zh) 2021-08-06 2022-06-27 Method, system, device, and storage medium for linked-mic chorus

Country Status (2)

Country Link
CN (1) CN113596516B (zh)
WO (1) WO2023011050A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596516B (zh) * 2021-08-06 2023-02-28 腾讯音乐娱乐科技(深圳)有限公司 Method, system, device, and storage medium for linked-mic chorus
CN114125480B (zh) * 2021-11-17 2024-07-26 广州方硅信息技术有限公司 Live-streaming chorus interaction method, system, apparatus, and computer device
CN115942066B (zh) * 2022-12-06 2024-09-03 腾讯音乐娱乐科技(深圳)有限公司 Audio live-streaming method, electronic device, and computer-readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120265859A1 (en) * 2011-04-14 2012-10-18 Audish Ltd. Synchronized Video System
CN108769772A (zh) * 2018-05-28 2018-11-06 广州虎牙信息科技有限公司 Display method, apparatus, device, and storage medium for a live-streaming room
CN109413469A (zh) * 2018-08-31 2019-03-01 北京潘达互娱科技有限公司 Delay control method, apparatus, electronic device, and storage medium for linked-mic live streaming
CN111524494A (zh) * 2020-04-27 2020-08-11 腾讯音乐娱乐科技(深圳)有限公司 Method, apparatus, and storage medium for real-time remote chorus
CN112040267A (zh) * 2020-09-10 2020-12-04 广州繁星互娱信息科技有限公司 Chorus video generation method, chorus method, apparatus, device, and storage medium
CN112489611A (zh) * 2020-11-27 2021-03-12 腾讯音乐娱乐科技(深圳)有限公司 Implementation method for an online karaoke room, electronic device, and computer-readable storage medium
WO2021050902A1 (en) * 2019-09-11 2021-03-18 John Nader System and method for distributed musician synchronized performances
CN113596516A (zh) * 2021-08-06 2021-11-02 腾讯音乐娱乐科技(深圳)有限公司 Method, system, device, and storage medium for linked-mic chorus

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9131016B2 (en) * 2007-09-11 2015-09-08 Alan Jay Glueckman Method and apparatus for virtual auditorium usable for a conference call or remote live presentation with audience response thereto
EP2088518A1 (en) * 2007-12-17 2009-08-12 Sony Corporation Method for music structure analysis
WO2011112640A2 (en) * 2010-03-08 2011-09-15 Vumanity Media Llc Generation of composited video programming
CN106572358B (zh) * 2016-11-11 2022-03-08 青岛海信宽带多媒体技术有限公司 Live-streaming time-shift method and client
CN110491358B (zh) * 2019-08-15 2023-06-27 广州酷狗计算机科技有限公司 Method, apparatus, device, system, and storage medium for audio recording
CN112533037B (zh) * 2019-09-19 2022-02-11 聚好看科技股份有限公司 Generation method for linked-mic chorus works and display device
CN111028818B (zh) * 2019-11-14 2022-11-22 北京达佳互联信息技术有限公司 Chorus method, apparatus, electronic device, and storage medium
CN111261133A (zh) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Singing processing method, apparatus, electronic device, and storage medium
CN111726670A (zh) * 2020-06-30 2020-09-29 广州繁星互娱信息科技有限公司 Information interaction method, apparatus, terminal, server, and storage medium


Also Published As

Publication number Publication date
CN113596516A (zh) 2021-11-02
CN113596516B (zh) 2023-02-28


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22851766; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN EP: public notification in the EP bulletin as the address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/05/2024))
122 EP: PCT application non-entry in European phase (Ref document number: 22851766; Country of ref document: EP; Kind code of ref document: A1)