WO2021164585A1 - Encoding method for live streaming data and electronic device - Google Patents

Encoding method for live streaming data and electronic device

Info

Publication number
WO2021164585A1
Authority
WO
WIPO (PCT)
Prior art keywords
data frame
state information
encoding
initial state
code rate
Prior art date
Application number
PCT/CN2021/075612
Other languages
English (en)
French (fr)
Inventor
邢文浩
张晨
Original Assignee
北京达佳互联信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Priority to EP21757025.8A priority Critical patent/EP3993430A4/en
Publication of WO2021164585A1 publication Critical patent/WO2021164585A1/zh
Priority to US17/582,778 priority patent/US11908481B2/en

Classifications

    • G10L 19/005 — Speech or audio coding (G10L 19/00): correction of errors induced by the transmission channel, if related to the coding algorithm
    • H04N 21/2187 — Source of audio or video content: live feed
    • H04N 21/2343 — Processing of video elementary streams involving reformatting operations for distribution or compliance with end-user requests or end-user device requirements
    • H04L 43/0829 — Monitoring or testing data switching networks based on specific metrics: transmission errors, packet loss
    • H04L 65/70 — Network arrangements for real-time applications in data packet communication: media network packetisation
    • H04L 65/75 — Network streaming of media packets: media network packet handling
    • H04N 21/27 — Server based end-user applications
    • H04N 21/41407 — Specialised client platforms embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H04N 21/42203 — Input-only peripherals connected to specially adapted client devices: sound input device, e.g. microphone
    • H04N 21/4394 — Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/4398 — Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N 21/4402 — Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/442 — Monitoring of processes or resources, e.g. monitoring the downstream bandwidth
    • H04N 21/6582 — Transmission by the client directed to the server: data stored in the client, e.g. viewing habits, hardware capabilities

Definitions

  • The present disclosure relates to the field of live-streaming mic-connect (co-hosting) technology, and in particular to an encoding method for live streaming data and an electronic device.
  • Live streaming is now a very common form of commercial entertainment, and more and more people use mobile phones or computers to connect microphones (co-host) during a live broadcast.
  • That is, the host end connects with two or more people (guests) to realize a multi-party live broadcast.
  • During the mic-connect process, the live broadcast terminal and the guest terminal often need to exchange signal streams. For example, the guest terminal encodes the recorded live data and transmits it to the host terminal, and the host terminal decodes the encoded live data before playing it. Packet loss is inevitable in this signal flow, and when live data packets are lost, the live data received by the host is severely damaged; packet loss recovery during mic-connect is therefore a necessary technology.
  • the present disclosure provides a method for encoding live data and electronic equipment.
  • the technical solutions of the present disclosure are as follows:
  • A method for encoding live data, applied to an encoding end, includes: acquiring initial state information for a current data frame, the current data frame being a data frame in the live data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information is updated after this encoding is completed; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  • In some embodiments, the current data frame includes an audio frame, and obtaining the initial state information for the current data frame includes: obtaining at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients of the current data frame as the initial state information.
  • In some embodiments, the method further includes: obtaining a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live data; obtaining a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and sending the first target data frame and the second target data frame to the receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  • Obtaining the long-term prediction parameter includes: obtaining the packet loss rate for a set historical time period, and, in response to the packet loss rate being higher than a preset packet loss rate threshold, obtaining the long-term prediction parameter.
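  • The loss-rate-gated check described above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: the function name, the 5% threshold, and the reduced LTP value returned are all assumptions.

```python
# Hypothetical sketch: acquire a (small) long-term prediction parameter only
# when the measured packet loss rate over the history window is too high.
# Threshold and returned value are illustrative assumptions.

def get_ltp_parameter(loss_events, total_packets, loss_threshold=0.05,
                      reduced_ltp=0.0):
    """Return an LTP parameter below the parameter threshold when the
    packet loss rate for the set historical period exceeds the preset
    loss threshold; otherwise return None (keep encoder defaults)."""
    if total_packets == 0:
        return None  # no traffic observed in the history window
    loss_rate = loss_events / total_packets
    if loss_rate > loss_threshold:
        # A small LTP parameter weakens inter-frame dependence, so a lost
        # frame damages fewer successor frames at the decoder.
        return reduced_ltp
    return None
```

A lower LTP parameter trades a little coding efficiency for resilience, which matches the text's use of it only under high loss.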
  • Obtaining the first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame includes: integrating the first encoded data frame and the second encoded data frame, and using the result of the integration as the first target data frame.
  • An encoding device for live data, applied to an encoding end, includes: an initial information acquisition unit configured to acquire initial state information for a current data frame, the current data frame being a data frame in the live data; a backup information acquisition unit configured to back up the initial state information to obtain backup state information; a first data frame determination unit configured to encode the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; a state information determination unit configured to reset the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; a data frame encoding unit configured to encode the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and a second data frame determination unit configured to obtain a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  • In some embodiments, the current data frame includes an audio frame, and the initial information acquisition unit is further configured to acquire at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients of the current data frame as the initial state information.
  • In some embodiments, the device for encoding live data further includes: a data frame obtaining unit configured to obtain a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live data; a prediction parameter obtaining unit configured to obtain a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and a data frame sending unit configured to send the first target data frame and the second target data frame to the receiving end according to the long-term prediction parameter, so that the receiving end can decode the first target data frame and the second target data frame according to the long-term prediction parameter.
  • In some embodiments, the prediction parameter obtaining unit includes: a packet loss rate obtaining subunit configured to obtain the packet loss rate for a set historical time period; and a prediction parameter obtaining subunit configured to acquire the long-term prediction parameter in response to the packet loss rate being higher than the preset packet loss rate threshold.
  • In some embodiments, the second data frame determination unit is configured to integrate the first encoded data frame and the second encoded data frame, and use the result of the integration as the first target data frame.
  • A live data streaming system includes an encoding end and a decoding end. The encoding end is configured to: encode a current data frame according to reset state information and a compression code rate to obtain a compressed encoded data frame; acquire a fidelity encoded data frame; combine the compressed encoded data frame and the fidelity encoded data frame into a data composite packet; and send the data composite packet to the decoding end. The current data frame is a data frame in the live data; the reset state information is obtained by resetting with the backup state information; the backup state information is obtained by backing up the initial state information; the initial state information is used together with the fidelity code rate to encode the current data frame; and the fidelity encoded data frame is obtained by encoding according to the fidelity code rate and the initial state information of a historical data frame, the historical data frame being the data frame at a historical moment relative to the current data frame. The decoding end is configured to decode the compressed encoded data frame and the fidelity encoded data frame to obtain the corresponding live data.
  • In some embodiments, the decoding end is further configured, after obtaining the corresponding live data from the compressed data frame and the first fidelity data frame, to: output the first fidelity data frame in the fidelity code rate state, then switch to the compressed code rate state and output the compressed data frame.
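  • The decoding end's substitution of the compressed redundant copy for a lost fidelity frame can be sketched as below. All names and the dict-based packet bookkeeping are illustrative assumptions for exposition, not the patent's implementation.

```python
# Sketch of decoder-side packet loss recovery: when the fidelity (high-rate)
# frame for a slot is missing, fall back to the compressed redundant copy
# carried in a composite packet.

def recover_stream(fidelity_frames, compressed_frames):
    """Both arguments map frame index -> payload (None if lost in transit).
    Returns the recovered sequence of (index, payload) pairs."""
    out = []
    for idx in sorted(set(fidelity_frames) | set(compressed_frames)):
        frame = fidelity_frames.get(idx)
        if frame is None:                       # fidelity packet lost
            frame = compressed_frames.get(idx)  # substitute low-rate copy
        out.append((idx, frame))
    return out
```

Because both copies were encoded from the same starting state, switching to the compressed copy at a lost slot does not jolt the decoder's state, which is the noise-reduction point the text makes.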
  • An electronic device includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the above encoding method for live data.
  • a storage medium is provided, and instructions in the storage medium are executed by a processor of an electronic device, so that the electronic device can execute the above-mentioned method for encoding live data.
  • A computer program product includes a computer program stored in a storage medium; at least one processor of an electronic device reads the computer program from the storage medium and executes it, enabling the electronic device to execute the method for encoding live data described above.
  • Fig. 1 is a diagram showing an application environment of a method for encoding live data according to an embodiment.
  • Fig. 2 is a flowchart showing a method for encoding live data according to an embodiment.
  • Fig. 3 is a flow chart showing a method of communicating between the host end, the audience end, and the guest end according to an embodiment.
  • Fig. 4 is a flow chart showing a change of state information according to an embodiment.
  • Fig. 5 is a flowchart showing a method for encoding live data according to another embodiment.
  • Fig. 6 is a block diagram showing a device for encoding live data according to an embodiment.
  • Fig. 7 is a block diagram showing a system for streaming live data according to an embodiment.
  • Fig. 8 is a timing diagram of interaction between an encoding end and a decoding end according to an embodiment.
  • Fig. 9 is a schematic diagram showing the transmission of a target data frame according to an embodiment.
  • Fig. 10 is a schematic diagram showing the transmission of a target data frame according to another embodiment.
  • Fig. 11 is a block diagram showing an electronic device according to an embodiment.
  • the encoding method for live data is applied in the application environment as shown in FIG. 1.
  • the application environment includes an encoding terminal 101 and a decoding terminal 102, which communicate through the network.
  • The encoding terminal 101 uses the same state information to encode the current data frame at the first code rate and the second code rate respectively, obtaining a corresponding first encoded data frame and second encoded data frame, and sends both encoded data frames to the decoding terminal 102, which decodes them to obtain the corresponding live data.
  • Both the encoding terminal 101 and the decoding terminal 102 can be implemented by a client, as long as it has the corresponding encoding or decoding function.
  • The encoding terminal 101 is implemented by an encoder or an electronic device with an encoding function, where the encoder may be an Opus encoder, a CELT encoder, an AAC encoder, etc., and the electronic device may be a mobile phone, computer, digital broadcasting terminal, messaging device, game console, tablet, medical device, fitness device, personal digital assistant, etc. The decoding terminal 102 is likewise implemented by a decoder or an electronic device with a decoding function; the types of electronic devices are as above, without restriction.
  • The host and the guest communicate through their respective clients to realize the live mic-connect, and the host's client sends their communication data to the audience to realize the live broadcast.
  • Fig. 2 is a flowchart of a method for encoding live data according to an embodiment. As shown in Fig. 2, the method is used at the encoding end and includes steps S201 to S206, as follows:
  • the initial state information for the current data frame is acquired; the current data frame is the data frame in the live data.
  • the live broadcast data is audio recorded through a live broadcast platform, audio-related data, etc.
  • The live data corresponding to a certain time period can be split into multiple data frames, for example with each time period corresponding to one data frame. The length of the specific time period is determined according to the actual situation, and the present disclosure does not limit this.
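  • Splitting live audio into fixed-duration frames as described above can be illustrated as follows. The 20 ms frame length and 48 kHz sampling rate are common values (e.g. Opus defaults) assumed for illustration; the patent does not fix them.

```python
# Illustrative framing: split a run of PCM samples into fixed-duration
# data frames, one frame per time period.

def split_into_frames(samples, sample_rate=48000, frame_ms=20):
    """Return a list of frames, each frame_ms milliseconds of samples."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    return [samples[i:i + frame_len]
            for i in range(0, len(samples), frame_len)]
```

One second of 48 kHz audio thus yields 50 frames of 960 samples each.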
  • the live broadcast data refers to interaction data between the live broadcast terminal and the guest/audience terminal.
  • The live-broadcast and mic-connect (Lianmai) scenario is explained as follows:
  • the communication between the host and the audience and the guest is shown in Figure 3.
  • the host records the live data, and sends the recorded live data to the audience through the live network.
  • The host can run a live broadcast on its own, and can also establish a connection with a guest to realize mic-connect; that is, the host can communicate with the guest through the mic-connect network during the live broadcast and forward the integrated communication of host and guest to the audience.
  • the encoder is implemented by the host, guest, or audience.
  • the live broadcast network and Lianmai network are the same or different.
  • the current data frame is the data frame at the current running time (ie the current time) in the live data, and there are adjacent data frames before and after it.
  • The set historical moments before the current data frame also correspond to historical data frames.
  • the set historical moment is the moment before the current moment or several moments before. It should be noted that this moment can be considered as the sampling moment, that is, one moment corresponds to one data frame.
  • the status information (also called context) refers to the status information of the current data frame, or the status information obtained after the encoding end analyzes and integrates the status information of the historical data frame of the current data frame.
  • In some embodiments, obtaining the initial state information for the current data frame includes: obtaining at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients of the current data frame as the initial state information. This information characterizes the running state of the live data.
  • Encoding the current data frame in combination with this state information fully takes into account the running state of the current data frame and its requirements on the operating environment, so that the live data obtained at the decoding end is closer to the original live data, while also ensuring that the decoded data frame can be output normally at the decoding end, which effectively guarantees the live broadcast effect.
  • Here, "at least one" means one or more items; for example, at least one of sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients may include all of these items.
  • The voice activity detection information refers to information obtained by analysing the live audio with voice activity detection (VAD) technology. VAD detects the presence of a voice signal, so the voice activity detection information indicates whether voice is present or absent.
  • In some embodiments, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients of the historical data frames already processed by the encoder is integrated to obtain the corresponding values for the current data frame. For example, the average sampling rate of the historical data frames is computed and used as the sampling rate for the current data frame, that is, as part of the initial state information.
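  • The integration of historical frame state described above can be sketched as follows. The dictionary field names and the choice of which fields to average versus carry over are illustrative assumptions; the text only gives averaging the sampling rate as an example.

```python
# Illustrative derivation of initial state information (context) from the
# state of already-encoded historical data frames.

def derive_initial_state(history):
    """history: non-empty list of per-frame state dicts."""
    n = len(history)
    return {
        # averaged across historical frames, as in the text's example
        "sample_rate": sum(f["sample_rate"] for f in history) / n,
        "bitrate": sum(f["bitrate"] for f in history) / n,
        # carried over from the most recent frame (assumed choice)
        "channels": history[-1]["channels"],
    }
```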
  • the initial state information is backed up to obtain the backup state information.
  • the initial state information is copied, the initial state information obtained by the copy is used as the backup state information, and the backup state information is stored in the memory of the encoding terminal for use.
  • the initial state information is recorded as context_ori
  • the backup state information is recorded as context_copy.
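  • The backup step can be sketched with the context_ori/context_copy names used in the text. The dict layout of the state is an illustrative assumption; the point is that a deep copy is needed, since encoding later mutates the live state and a shallow reference would let those updates leak into the backup.

```python
import copy

# Back up the initial state information: context_copy must be independent
# of context_ori, so use a deep copy (the state contains nested objects).
context_ori = {"sample_rate": 48000, "lpc": [0.9, -0.4], "entropy": {}}
context_copy = copy.deepcopy(context_ori)

context_ori["lpc"][0] = 0.1           # encoding updates the live state...
assert context_copy["lpc"][0] == 0.9  # ...but the backup is untouched
```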
  • the current data frame is encoded according to the first code rate and the initial state information to obtain the first encoded data frame; wherein, the initial state information of the encoding end is updated after the encoding ends.
  • the code rate is also called bit rate, which refers to the number of bits transmitted per second. The higher the bit rate, the more data is transmitted per second, and the clearer the sound.
  • the bit rate in sound refers to the amount of binary data per unit time after the analog sound signal is converted into a digital sound signal. It is an indirect measure of audio quality.
  • the first code rate in this step is the code rate used when encoding the current data frame, and its size is determined according to actual requirements.
  • Encoding a new data frame (the current data frame) updates the context, because the signal is correlated over time: encoding a data frame draws on past historical data frames through the state information. The encoder obtains the initial state information from the previous encoding history (that is, from the state information of earlier data frames); after the current data frame has been encoded with this initial state information, the current data frame becomes a historical (already-encoded) data frame and the initial state information is updated.
  • the updated initial state information is reset through the backup state information to obtain the reset state information.
  • the reset state information is consistent with the initial state information before the update.
  • the current data frame is encoded according to the second code rate and the reset state information to obtain a second coded data frame; the second code rate is different from the first code rate.
  • The second code rate is the same kind of quantity as the first code rate, but the two rates differ in value.
  • the first code rate is greater than or less than the second code rate.
  • The encoder encodes the live data at different code rates and thereby obtains target data frames at different code rates. Target data frames obtained in this way enable packet loss recovery at the decoding end: for example, if the data frame corresponding to the first encoded data frame is lost, the data frame corresponding to the second encoded data frame can be substituted for it, realizing packet loss recovery.
  • After the initial state information is updated, it is recorded as context_temp. Since the current data frame needs to be encoded a second time (that is, encoded according to the second code rate), or even a third time, directly using the updated state context_temp together with the second code rate would give the second encoded data frame a different operating state from the first encoded data frame. The encoded data frames corresponding to the first and second code rates would then differ in state, causing a large change in the decoder's operating state when the code rate is switched and resulting in audible noise. Therefore, in this step the current data frame is encoded using the reset state information, which makes the state information of the second encoded data frame consistent with that of the first, that is, makes their operating states consistent.
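  • The backup → encode → reset → encode sequence described above can be sketched end to end. This is a minimal sketch under stated assumptions: the `encode` function stands in for a real codec (e.g. Opus) and merely mutates a toy state counter, which is enough to show why the reset is needed.

```python
import copy

# Sketch of dual-rate encoding with state backup and reset. The fake
# `encode` mutates the state it is given, as a real encoder updates its
# context after each frame.

def encode(frame, bitrate, state):
    packet = (frame, bitrate, state["step"])  # "encoded" payload + state tag
    state["step"] += 1                        # encoding updates the context
    return packet

def dual_rate_encode(frame, state, rate_hi, rate_lo):
    backup = copy.deepcopy(state)             # back up context_ori
    first = encode(frame, rate_hi, state)     # first pass uses context_ori
    state.clear(); state.update(backup)       # reset context_temp -> backup
    second = encode(frame, rate_lo, state)    # second pass, same start state
    return first, second

state = {"step": 0}
hi, lo = dual_rate_encode("frame0", state, 64000, 16000)
# both passes started from the same context (step 0)
assert hi[2] == lo[2] == 0
```

Without the reset, the second pass would start from the mutated state and the two encoded frames would diverge, which is exactly the rate-switch noise the text describes.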
  • the process of processing the context is shown in FIG. 4.
  • the initial state information is context_ori.
  • the backup state information context_copy is obtained.
  • the current data frame is encoded according to the first code rate and the initial state information context_ori to obtain the first encoded data frame.
  • the initial state The information context_ori is updated to context_temp, and context_temp is reset by backing up the status information context_copy, and the status information corresponding to the current data frame becomes the reset status information (also denoted as context_copy).
• The current data frame is then encoded according to the second code rate and the reset state information context_copy to obtain the second encoded data frame.
  • the process of encoding the current data frame by the encoder according to the first code rate and the second code rate is completed.
• Finally, the first target data frame is obtained: the first encoded data frame and the second encoded data frame are integrated, and the result of the integration is used as the first target data frame.
• S201-S206 realize the encoding of the current data frame, and the live data contains more than just the current data frame. Therefore, after the first target data frame is obtained, the data frames following the current data frame can continue to be encoded frame by frame in the same way to obtain the corresponding target data frames, until every frame in the live data has been encoded, or until the main frames that characterize the live data have been encoded.
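The backup, encode, reset, encode sequence described above can be sketched in Python. The `ToyEncoder` here is a hypothetical stand-in for a real stateful codec (encoding a frame both emits a packet and mutates the carried context); only the state-handling pattern is intended to match the disclosure.

```python
import copy

class ToyEncoder:
    """Hypothetical stand-in for a stateful codec: encoding a frame
    emits a packet and, as a side effect, updates the carried context."""
    def __init__(self):
        self.context = {"history": []}  # initial state, context_ori

    def encode(self, frame, bitrate):
        packet = (bitrate, frame, tuple(self.context["history"]))
        self.context["history"].append(frame)  # state update after encoding
        return packet

def encode_dual_rate(encoder, frame, low_rate, high_rate):
    # Back up the initial state (context_ori -> context_copy).
    context_copy = copy.deepcopy(encoder.context)
    # First encode (first code rate); context_ori becomes context_temp.
    packet_low = encoder.encode(frame, low_rate)
    # Reset the updated state from the backup so the second encode
    # starts from exactly the same operating state as the first.
    encoder.context = context_copy
    # Second encode (second code rate) against the reset state.
    packet_high = encoder.encode(frame, high_rate)
    # Integrating both encoded frames yields the first target data frame.
    return packet_low, packet_high
```

Because the state is reset between the two encodes, both packets record the same context, which is what keeps the decoder's operating state continuous when it later switches between the two bit rates.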
  • the decoder needs to switch the code rate when outputting data frames of different code rates.
• the first and second encoded data frames are produced by the same encoder according to the same state information, which effectively reduces the noise that occurs when the decoder switches between code rates.
• both the live broadcast terminal and the guest terminal can serve as the encoding terminal and the decoding terminal; for example, the host sends encoded audio to the guest, and the guest decodes it; likewise, the guest can send encoded audio to the host, and the host decodes it.
  • the number of connected guest terminals may be more than one. In this case, assuming that the live broadcast terminal is encoding, the live broadcast terminal needs to encode the live broadcast connection data of multiple guest terminals.
• the live broadcast terminal can encode the live broadcast connection data of each guest terminal separately and then integrate the target data frames of each guest terminal after encoding is complete; alternatively, it can proceed in time order, encoding and integrating the data frames of different guest terminals that share the same time slot into one target data frame, and then encoding and integrating the data frames of the next time slot.
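The second, time-ordered strategy can be sketched as follows; the `{guest_id: [frames...]}` input shape and the `encode_frame` callback are illustrative assumptions, with `encode_frame` standing in for the per-frame dual-rate encoding described earlier.

```python
def encode_interleaved(guest_streams, encode_frame):
    """Encode frames from several guest terminals in time order.

    guest_streams maps a guest id to its list of frames; frames that
    share the same time slot across guests are encoded and integrated
    into one target before moving on to the next slot.
    """
    n_slots = max(len(s) for s in guest_streams.values())
    targets = []
    for t in range(n_slots):
        slot = {gid: encode_frame(stream[t])
                for gid, stream in guest_streams.items()
                if t < len(stream)}
        targets.append(slot)  # one integrated target per time slot
    return targets
```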
• the method further includes: obtaining a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live data; obtaining a long-term prediction parameter, the long-term prediction parameter being less than a preset parameter threshold; and sending the first target data frame and the second target data frame to the receiving end according to the long-term prediction parameter, so that the receiving end can decode the first target data frame and the second target data frame according to the long-term prediction parameter.
  • the second target data frame is a target data frame obtained by encoding adjacent data frames of the current data frame.
  • the adjacent data frame is a historical data frame of the current data frame, or a subsequent data frame thereof (that is, a data frame that is later than the current moment in time sequence).
  • the number of adjacent data frames is one, two or even more.
• the adjacent data frame may be the previous data frame, the next data frame, both the previous and the next data frame, the previous N data frames, the next N data frames of the current data frame, and so on; the size of N can be determined according to the actual situation.
  • the adjacent data frames are all data frames in the live data except the current data frame, that is, the encoder encodes each frame of the live data and obtains the corresponding target data frames respectively.
  • the decoder can get the complete live data by decoding the target data frame.
• Long-term prediction parameters are referred to as LTP parameters for short; they are used to predict and remove redundant information within a time window. During encoding, the correlation between live data frames is predicted and the redundancy is removed. Roughly speaking, when the LTP parameter is not constrained, the relationship between adjacent data frames is fully exploited, so the coupling between adjacent data frames is relatively large.
  • the LTP parameter in the present disclosure is less than the preset parameter threshold.
• Such a parameter degrades sound quality when there is no packet loss (compared with an LTP configuration that does not reduce the correlation between frames). However, in the case of packet loss, lowering the LTP parameter value reduces the coupling between frames, so a lost audio frame has less impact on subsequent audio frames; this in turn reduces the jump when switching between packets of different bit rates, making the live broadcast effect better.
  • the above parameter threshold is determined according to actual conditions, and the present disclosure does not limit this.
  • the obtaining the long-term prediction parameter includes: obtaining a packet loss rate for a set historical time period; in response to the packet loss rate being higher than a preset packet loss rate threshold, obtaining the long-term prediction parameter .
  • the duration of the historical time period and the packet loss rate threshold are determined according to actual conditions, and the present disclosure does not limit this.
• When the packet loss rate is high, frequent bit rate switching may occur. For example, if every other master packet is lost, the effective master packets received are 1, 3, 5, 7, ..., and the lost master packets 2, 4, 6, 8, ... need to be replaced by slave packets. Since the bit rates of the master packet and the slave packet differ, bit rate switching then occurs continuously, which would cause serious noise interference. Reducing the number of bits allocated to the LTP parameter, i.e. lowering the weight of the long-term prediction parameter, reduces the coupling between frames, thereby reducing the jump when switching between packets of different bit rates and preventing noise during such switching.
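A minimal sketch of this decision: measure the packet loss rate over a set historical window and, when it exceeds a threshold, pick a reduced LTP weight. All numeric values (threshold and gains) are illustrative assumptions, not values from the disclosure.

```python
def packet_loss_rate(received_flags):
    """Loss rate over a set historical window; received_flags is a
    list of booleans, one per expected packet (True = arrived)."""
    if not received_flags:
        return 0.0
    lost = sum(1 for ok in received_flags if not ok)
    return lost / len(received_flags)

def choose_ltp_gain(loss_rate, loss_threshold=0.05,
                    reduced_gain=0.1, normal_gain=0.9):
    """Pick a long-term-prediction weight from the recent loss rate.

    A lower LTP weight weakens inter-frame coupling, trading a little
    clean-channel quality for robustness when lost high-rate packets
    are replaced by low-rate ones. The threshold and gain values are
    illustrative only.
    """
    if loss_rate > loss_threshold:
        return reduced_gain   # lossy network: decouple adjacent frames
    return normal_gain        # clean network: keep full prediction
```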
  • the first code rate is lower than the second code rate.
• the first bit rate is the compression bit rate used to compression-encode live data frames (also known as the low bit rate), and the second bit rate is the fidelity bit rate used to fidelity-encode live data frames (also known as the high bit rate).
  • the current data frame is respectively encoded according to the compression code rate and the fidelity code rate to obtain the corresponding compressed code data frame and the fidelity code rate data frame.
  • the fidelity rate data frame is sent to the decoding end as the main packet.
• In response to the decoding end being able to decode the corresponding current data frame from the fidelity code rate data frame, it proceeds to the next step; in response to the decoding end being unable to decode the corresponding current data frame from the fidelity code rate data frame, it decodes the corresponding current data frame from the compressed code rate data frame.
  • the number of the first code rate and the second code rate is one, two or even more.
  • One bit rate encoding obtains one target data frame. In this way, multiple target data frames can be obtained by encoding the current data frame.
• in addition to the first code rate and the second code rate, a third code rate, a fourth code rate, and so on may be used; that is, the current data frame can be encoded according to multiple code rates to obtain multiple corresponding target data frames, so that the decoder can perform operations such as packet loss recovery.
  • Fig. 5 is a flowchart showing a method for encoding live data according to an embodiment. As shown in Fig. 5, the method for encoding live data is used in an encoder. Taking the live broadcast data as the live broadcast continuous microphone data as an example, the encoding method of the live broadcast data includes the following steps:
  • At least one of the sampling rate, the number of channels, the bandwidth, the code rate, the voice activity detection information, the entropy coding information, the noise shaping gain, and the linear prediction coefficient for the current data frame is acquired as initial state information.
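For illustration, the state fields listed above could be held in a simple container, with the backup operation being a deep copy; field names, types, and defaults here are assumptions, not the codec's actual layout.

```python
import copy
from dataclasses import dataclass, field
from typing import List

@dataclass
class EncoderState:
    """Illustrative container for the listed state information;
    field names, types, and defaults are assumptions."""
    sample_rate: int = 48_000
    channels: int = 1
    bandwidth_hz: int = 20_000
    bitrate: int = 24_000
    vad_active: bool = False                   # voice activity detection info
    entropy_ctx: List[int] = field(default_factory=list)   # entropy coding info
    noise_shaping_gain: float = 1.0
    lpc_coeffs: List[float] = field(default_factory=list)  # linear prediction coefficients

def backup_state(state: EncoderState) -> EncoderState:
    """Deep-copy the state so later encoder updates cannot alter the backup."""
    return copy.deepcopy(state)
```

A deep copy matters here: the encoder mutates nested fields (such as the LPC coefficient list) during encoding, and a shallow copy would let those mutations leak into the backup.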
  • the initial state information is backed up to obtain the backup state information.
  • the current data frame is encoded according to the compression code rate and the initial state information to obtain a compressed encoded data frame; wherein the initial state information of the encoding end is updated by the current data frame after the encoding ends.
• the compressed code rate corresponds to the aforementioned first code rate, and the compressed coded data frame corresponds to the aforementioned first coded data frame.
  • the updated initial state information is reset through the backup state information to obtain the reset state information.
  • the current data frame is encoded according to the fidelity code rate and the reset state information to obtain the fidelity encoded data frame.
• the fidelity code rate corresponds to the aforementioned second code rate, and the fidelity coded data frame corresponds to the aforementioned second coded data frame.
  • a first target data frame corresponding to the current data frame is obtained according to the compressed encoded data frame and the fidelity encoded data frame.
  • the packet loss rate of the set historical time period is obtained; in response to the packet loss rate being higher than the preset packet loss rate threshold, the long-term prediction parameter is obtained.
  • the first target data frame and the second target data frame are sent to the receiving end, so that the receiving end performs processing on the first target data frame and the second target data frame according to the long-term prediction parameter. decoding.
  • the decoding end needs to switch the code rate when outputting data frames of different code rates.
  • the first and second encoded data frames in this solution are obtained by the same encoding end according to the same state information. This effectively reduces the noise that occurs when the decoder switches between code rates.
• Fig. 6 is a block diagram of a device for encoding live data according to an embodiment. As shown in Fig. 6, the device includes an initial information acquiring unit 601, a backup information acquiring unit 602, a first data frame determining unit 603, a state information determining unit 604, a data frame encoding unit 605, and a second data frame determining unit 606.
  • the initial information acquiring unit 601 is configured to acquire initial state information for a current data frame; the current data frame is a data frame in live data.
  • the backup information obtaining unit 602 is configured to back up the initial state information to obtain the backup state information.
  • the first data frame determining unit 603 is configured to encode the current data frame according to the first code rate and the initial state information to obtain a first encoded data frame; wherein, after the encoding ends, the initial The status information is updated.
  • the state information determining unit 604 is configured to reset the updated initial state information through the backup state information to obtain reset state information, and the reset state information is consistent with the initial state information before the update.
  • the data frame encoding unit 605 is configured to encode the current data frame according to the second code rate and the reset state information to obtain a second coded data frame; the second code rate is different from the first code rate.
  • the second data frame determining unit 606 is configured to obtain a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  • the decoding end needs to switch the code rate when outputting data frames of different code rates.
• the first and second coded data frames are encoded according to the same state information, which effectively reduces the noise that occurs when the decoder switches between code rates.
  • the current data frame includes an audio frame; the initial information acquisition unit is further configured to acquire the sampling rate, the number of channels, the bandwidth, the bit rate, and the voice activity detection for the current data frame. At least one of information, entropy coding information, noise shaping gain, and linear prediction coefficient is used as the initial state information.
• the device for encoding live data further includes: a data frame obtaining unit configured to obtain a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live data; a prediction parameter obtaining unit configured to obtain a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and a data frame sending unit configured to send, according to the long-term prediction parameter, the first target data frame and the second target data frame to the receiving end, so that the receiving end can decode the first target data frame and the second target data frame according to the long-term prediction parameter.
• the prediction parameter obtaining unit includes: a packet loss rate obtaining subunit configured to obtain a packet loss rate for a set historical time period; and a prediction parameter obtaining subunit configured to obtain the long-term prediction parameter in response to the packet loss rate being higher than a preset packet loss rate threshold.
• the second data frame determining unit is configured to integrate the first encoded data frame and the second encoded data frame, and use the result of the integration as the first target data frame.
  • a live data streaming system 700 including: an encoding terminal 701 and a decoding terminal 702; the two are connected in communication through a network.
• the encoding terminal 701 is configured to encode the current data frame according to the reset state information and the compression code rate to obtain a compressed encoded data frame, obtain the fidelity encoded data frame, combine the compressed encoded data frame and the fidelity encoded data frame into a data composite packet, and send the data composite packet to the decoding end; the current data frame is a data frame in the live data, and the reset state information is obtained by resetting with the backup state information.
  • the backup state information is obtained by backing up the initial state information.
• the initial state information is used, in combination with the fidelity code rate, to encode the corresponding data frame. The fidelity encoded data frame is obtained by encoding according to the fidelity code rate and the initial state information of the historical data frame, where the historical data frame is the data frame corresponding to a historical moment of the current data frame. The decoding end is configured to decode the data composite packet to obtain the compressed data frame and the first fidelity data frame, and, in response to receiving loss information of the second fidelity data frame, to replace the second fidelity data frame with the compressed data frame and obtain the corresponding live data according to the compressed data frame and the first fidelity data frame; the second fidelity data frame is a data frame obtained according to the historical data frame and the fidelity bit rate.
• In the following, the fidelity code rate is referred to as the high code rate, the compressed code rate as the low code rate, and the fidelity coded data frame and the compressed coded data frame as the high-encoded data frame and the low-encoded data frame, respectively. For example, the fidelity coded data frame may be 24K and the compressed coded data frame 16K.
• Let the current data frame be the Nth frame, and the historical data frames be the N-1th frame and the N-2th frame. The number of historical data frames can be more or fewer; in addition, the spacing between historical data frames may be greater than 1; for example, with a spacing of 2, the historical data frames are frames N-2, N-4, N-6, and so on.
  • the process of streaming live data between the encoding end and the decoding end is shown in FIG. 8.
  • the encoding end generates corresponding high-encoded data frames and low-encoded data frames for the N-2, N-1, and Nth frames respectively.
  • the high-encoded data frame of the Nth frame is marked as Packet_High[N]
• the low-encoded data frame of the Nth frame is marked as Packet_Low[N].
• If loss of the main packet Packet_High[N-1] is found (that is, the loss information of the second fidelity data frame is received, as shown in the dashed box in Figure 10), the decoding end waits for the next composite packet; after a period of time the encoding end sends the Nth composite packet to the decoding end: Packet_High[N]-Packet_Low[N-1].
• the decoding end decodes the received Nth composite packet Packet_High[N]-Packet_Low[N-1], uses the data frame obtained by decoding Packet_Low[N-1] as the data of the N-1th frame, and thus replaces Packet_High[N-1] with Packet_Low[N-1].
• In this way, the decoding end obtains the complete data of the N-1th frame and the Nth frame (i.e. Packet_Low[N-1]-Packet_High[N]); that is, the decoding end can obtain the entire live broadcast data. In the embodiment of the present disclosure, Packet_Low[N-1] and Packet_High[N] come from the same encoder and are both encoded based on the same state information; therefore, switching between the bit rates corresponding to Packet_Low[N-1] and Packet_High[N] will not produce noise, which effectively improves the audio output effect of the live broadcast connection.
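The decoder-side substitution can be sketched as follows, assuming each composite packet pairs the current high-rate packet with the previous frame's low-rate packet (the shapes and indexing here are illustrative).

```python
def recover_frames(composites, high_lost):
    """Rebuild the frame sequence from composite packets.

    composites[n] is a pair (High[n], Low[n-1]): High[n] is the
    fidelity packet of frame n and Low[n-1] is the low-rate copy of
    frame n-1 bundled with it (None for the first packet).
    high_lost is the set of frame indices whose fidelity packet was
    lost in transit.
    """
    out = {}
    for n, (high_n, low_prev) in enumerate(composites):
        if n not in high_lost:
            out[n] = high_n            # normal case: keep the fidelity frame
        if low_prev is not None and (n - 1) in high_lost:
            out[n - 1] = low_prev      # replace lost High[n-1] with Low[n-1]
    return out
```

Because the low-rate copy was encoded from the same state as the lost high-rate packet, the substituted frame keeps the decoder's operating state continuous across the bit rate switch.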
• the decoding end is further configured to execute, after obtaining the corresponding live data according to the compressed data frame and the first fidelity data frame: in the fidelity bit rate state, output the first fidelity data frame; then switch to the compressed code rate state and output the compressed data frame.
• the decoding end can output the decoded data frames; since packet loss occurred earlier and a high bit rate data frame was replaced by a low bit rate data frame, the decoding end needs to switch between the high bit rate and the low bit rate when outputting the data frames one by one.
  • the decoding end needs to switch the code rate when outputting data frames of different code rates.
• the fidelity coded data frame and the compressed coded data frame in this solution come from the same encoder and are both obtained by encoding according to the same state information. The decoder decodes each data composite packet on receipt; in response to finding that a certain fidelity data frame is lost, it recovers that frame from the compressed data frame carried in the next composite packet (or even the next few packets), and the live data can then be obtained from the restored data frames. When the live data is output, switching between the fidelity rate and the compression rate is required; since the state information corresponding to the fidelity data frame and the compressed data frame is the same, no noise appears when the decoder switches between bit rates.
• an electronic device 1100 is provided. Referring to FIG. 11, the electronic device 1100 includes one or more of the following components: a processing component 1101, a memory 1102, a power component 1103, a multimedia component 1104, an input/output (I/O) interface 1105, an audio component 1106, and a communication component 1107.
  • the processing component 1101 generally controls overall operations of the electronic device 1100, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations.
  • the processing component 1101 includes one or more processors 1108 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 1101 includes one or more modules to facilitate the interaction between the processing component 1101 and other components.
  • the processing component 1101 includes a multimedia module to facilitate the interaction between the multimedia component 1104 and the processing component 1101.
  • the memory 1102 is configured to store various types of data to support operations in the electronic device 1100. Examples of these data include instructions for any application or method operating on the electronic device 1100, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 1102 is implemented by any type of volatile or non-volatile storage device or their combination, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable Read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic or optical disk.
  • the power supply component 1103 provides power for various components of the electronic device 1100.
  • the power supply component 1103 includes a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 1100.
  • the multimedia component 1104 includes a screen that provides an output interface between the electronic device 1100 and the user.
  • the screen includes a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen is implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor not only senses the boundary of the touch or slide action, but also detects the duration and pressure related to the touch or slide operation.
  • the multimedia component 1104 includes a front camera and/or a rear camera. When the electronic device 1100 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera receive external multimedia data. Each front camera and rear camera is a fixed optical lens system or has focal length and optical zoom capabilities.
  • the I/O interface 1105 provides an interface between the processing component 1101 and a peripheral interface module.
• the above-mentioned peripheral interface module may be a keyboard, a click wheel, buttons, and the like. These buttons include but are not limited to: a home button, a volume button, a start button, and a lock button.
  • the audio component 1106 is configured to output and/or input audio signals.
  • the audio component 1106 includes a microphone (MIC).
• When the electronic device 1100 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode, the microphone is configured to receive external audio signals.
  • the received audio signal is further stored in the memory 1102 or transmitted via the communication component 1107.
  • the audio component 1106 further includes a speaker for outputting audio signals.
  • the live broadcast data is live microphone-linked audio, and the live microphone-linked audio is input to the electronic device through the audio component 1106.
  • the communication component 1107 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices.
  • the electronic device 1100 accesses a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof.
  • the communication component 1107 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component 1107 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
• the electronic device 1100 is implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to implement the above methods.
• the processor is configured to: obtain initial state information for the current data frame, the current data frame being a data frame in the live data; back up the initial state information to obtain backup state information; encode the current data frame according to the first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information of the encoding end is updated after the encoding ends; reset the updated initial state information through the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encode the current data frame according to the second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtain, according to the first encoded data frame and the second encoded data frame, the first target data frame corresponding to the current data frame.
  • the current data frame includes an audio frame; the processor is further configured to: obtain the sampling rate, the number of channels, the bandwidth, the bit rate, and the voice activity detection information for the current data frame, At least one of entropy coding information, noise shaping gain, and linear prediction coefficient is used as the initial state information.
  • the processor is further configured to: obtain a second target data frame corresponding to an adjacent data frame; the adjacent data frame is data adjacent to the current data frame in the live data Frame; acquiring long-term prediction parameters; the long-term prediction parameters are less than the parameter threshold; according to the long-term prediction parameters, the first target data frame and the second target data frame are sent to the receiving end, so that all The receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  • the processor is further configured to: obtain a packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, obtain the long-term prediction parameter.
  • the processor is further configured to integrate the first encoded data frame and the second encoded data frame, and use the result of the integration as the first target data frame.
• a non-transitory computer-readable storage medium including instructions, such as a memory 1102 including instructions, which can be executed by the processor 1108 of the electronic device 1100 to complete the foregoing method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • the instructions in the storage medium are executed by the processor of the electronic device to implement all or part of the steps of the foregoing method.
• the instructions in the storage medium are executed by the processor of the electronic device, so that the electronic device can perform the following steps: obtain initial state information for the current data frame, the current data frame being a data frame in the live data; back up the initial state information to obtain backup state information; encode the current data frame according to the first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information of the encoding end is updated after the encoding ends; reset the updated initial state information through the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encode the current data frame according to the second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtain, according to the first encoded data frame and the second encoded data frame, the first target data frame corresponding to the current data frame.
  • the current data frame includes an audio frame; the instructions in the storage medium are executed by the processor of the electronic device, so that the electronic device can perform the following steps: obtain the sampling rate and the audio frame for the current data frame. At least one of the number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficient is used as the initial state information.
  • the instructions in the storage medium are executed by the processor of the electronic device, so that the electronic device can perform the following steps: obtain the second target data frame corresponding to the adjacent data frame; The data frame adjacent to the current data frame in the live data; obtain long-term prediction parameters; the long-term prediction parameters are less than the parameter threshold; according to the long-term prediction parameters, the first target data frame and the The second target data frame is sent to the receiving end, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  • the instructions in the storage medium are executed by the processor of the electronic device, so that the electronic device can perform the following steps: obtain a packet loss rate for a set historical time period; Set a packet loss rate threshold to obtain the long-term prediction parameter.
  • a computer program product includes a computer program, the computer program is stored in a storage medium, and at least one processor of the electronic device reads from the storage medium and executes all the programs.
  • the computer program enables the electronic device to perform: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  • the current data frame includes an audio frame; at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
  • at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquiring a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  • at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring a packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Environmental & Geological Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present disclosure relates to a method for encoding live-streaming data and an electronic device. The method is applied to an encoding end and includes: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.

Description

Method for encoding live-streaming data and electronic device
This disclosure claims priority to Chinese patent application No. 202010099025.7, filed on February 18, 2020 and entitled "Method, apparatus and streaming system for encoding live-streaming data, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of co-hosted ("Lianmai") live streaming, and in particular to a method for encoding live-streaming data and an electronic device.
Background
Co-hosted live streaming is now a very common form of commercial entertainment, and more and more people use mobile phones or computers to co-host during live broadcasts. A host end connects with other participants (guests), realizing a live broadcast across multiple endpoints.
During a co-hosted session, the host end and the guest ends need to exchange signal streams; for example, a guest end encodes the live data it records and transmits it to the host end, and the host end decodes the encoded live data and plays it. Packet loss is inevitable during this signal transfer. When packets of the live data are lost, the live data received at the host end is severely damaged, so packet-loss recovery during co-hosting is a necessary technique.
Summary
The present disclosure provides a method for encoding live-streaming data and an electronic device. The technical solution of the present disclosure is as follows:
According to one aspect of the embodiments of the present disclosure, a method for encoding live-streaming data is provided, applied to an encoding end, including: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
In some embodiments, the current data frame includes an audio frame, and acquiring the initial state information for the current data frame includes: acquiring, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, the method further includes: acquiring a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquiring a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, acquiring the long-term prediction parameter includes: acquiring a packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.
In some embodiments, obtaining the first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame includes:
integrating the first encoded data frame and the second encoded data frame, and taking the result of the integration as the first target data frame.
According to another aspect of the embodiments of the present disclosure, an apparatus for encoding live-streaming data is provided, applied to an encoding end, including: an initial information acquisition unit configured to acquire initial state information for a current data frame, the current data frame being a data frame in live-streaming data; a backup information acquisition unit configured to back up the initial state information to obtain backup state information; a first data frame determination unit configured to encode the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; a state information determination unit configured to reset the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; a data frame encoding unit configured to encode the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and a second data frame determination unit configured to obtain a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
In some embodiments, the current data frame includes an audio frame, and the initial information acquisition unit is further configured to acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, the apparatus for encoding live-streaming data further includes: a data frame acquisition unit configured to acquire a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; a prediction parameter acquisition unit configured to acquire a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and a data frame sending unit configured to send the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, the prediction parameter acquisition unit includes: a packet loss rate acquisition subunit configured to acquire a packet loss rate for a set historical time period; and a prediction parameter acquisition subunit configured to acquire the long-term prediction parameter in response to the packet loss rate being higher than a preset packet loss rate threshold.
In some embodiments, the second data frame determination unit is configured to integrate the first encoded data frame and the second encoded data frame and take the result of the integration as the first target data frame.
According to another aspect of the embodiments of the present disclosure, a streaming system for live-streaming data is provided, including an encoding end and a decoding end. The encoding end is configured to encode a current data frame according to reset state information and a compression code rate to obtain a compression-encoded data frame; acquire a fidelity-encoded data frame; combine the compression-encoded data frame and the fidelity-encoded data frame into a composite data packet; and send the composite data packet to the decoding end. The current data frame is a data frame in live-streaming data; the reset state information is obtained by resetting with backup state information; the backup state information is obtained by backing up the initial state information; the initial state information is used, together with a fidelity code rate, to encode the current data frame; the fidelity-encoded data frame is obtained by encoding according to the fidelity code rate and the initial state information of a historical data frame; and the historical data frame is the data frame corresponding to a historical moment of the current data frame. The decoding end is configured to decode the compression-encoded data frame and the fidelity-encoded data frame in the composite data packet to obtain a compressed data frame and a first fidelity data frame; in response to receiving loss information of a second fidelity data frame, replace the second fidelity data frame with the compressed data frame; and obtain the corresponding live-streaming data according to the compressed data frame and the first fidelity data frame, wherein the second fidelity data frame is a data frame obtained according to the historical data frame and the fidelity code rate.
In some embodiments, the decoding end is further configured, after obtaining the corresponding live-streaming data according to the compressed data frame and the first fidelity data frame, to output the first fidelity data frame in the fidelity code rate state, and then to switch to the compression code rate state and output the compressed data frame.
According to another aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to execute the instructions to implement the method for encoding live-streaming data described above.
According to another aspect of the embodiments of the present disclosure, a storage medium is provided, where instructions in the storage medium are executed by a processor of an electronic device so that the electronic device can perform the method for encoding live-streaming data described above.
According to another aspect of the embodiments of the present disclosure, a computer program product is provided, the program product including a computer program stored in a storage medium; at least one processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform the method for encoding live-streaming data described above.
Brief Description of the Drawings
Fig. 1 is a diagram of an application environment of a method for encoding live-streaming data according to an embodiment.
Fig. 2 is a flowchart of a method for encoding live-streaming data according to an embodiment.
Fig. 3 is a flowchart of a method for communication between a host end, an audience end and a guest end according to an embodiment.
Fig. 4 is a flowchart of changes of state information according to an embodiment.
Fig. 5 is a flowchart of a method for encoding live-streaming data according to another embodiment.
Fig. 6 is a block diagram of an apparatus for encoding live-streaming data according to an embodiment.
Fig. 7 is a block diagram of a streaming system for live-streaming data according to an embodiment.
Fig. 8 is a sequence diagram of interaction between an encoding end and a decoding end according to an embodiment.
Fig. 9 is a schematic diagram of transmission of target data frames according to an embodiment.
Fig. 10 is a schematic diagram of transmission of target data frames according to another embodiment.
Fig. 11 is a block diagram of an electronic device according to an embodiment.
Detailed Description of the Embodiments
The method for encoding live-streaming data provided by the present disclosure is applied in the environment shown in Fig. 1. The application environment includes an encoding end 101 and a decoding end 102, which communicate over a network. The encoding end 101 encodes the current data frame with the same state information at a first code rate and at a second code rate respectively, obtains the corresponding first encoded data frame and second encoded data frame, and sends the two encoded data frames to the decoding end; the decoding end 102 decodes them to obtain the corresponding live-streaming data. In some embodiments, both the encoding end 101 and the decoding end 102 are implemented by clients, as long as they have the corresponding encoding or decoding capability.
In some embodiments, the encoding end 101 is implemented by an encoder or an electronic device with an encoding capability, where the encoder is an Opus encoder, a CELT encoder, an AAC encoder, etc., and the electronic device is a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.; the decoding end 102 is likewise implemented by a decoder or an electronic device with a decoding capability, the types of electronic devices being as above, and the present disclosure does not limit the type of the decoder. In some embodiments, in a co-hosted live-streaming scenario, the host end and the guest end each communicate through their own client to realize co-hosting, and the host end's client sends the communication data of both to the audience end, thereby realizing the live broadcast.
Fig. 2 is a flowchart of a method for encoding live-streaming data according to an embodiment. As shown in Fig. 2, the method is used at the encoding end and includes steps S201, S202, S203, S204, S205 and S206, as follows:
In S201, initial state information for the current data frame is acquired; the current data frame is a data frame in the live-streaming data.
In some embodiments, the live-streaming data is audio recorded through a live-streaming platform, data related to the audio, etc. The live-streaming data corresponding to a certain time period can be split into multiple data frames, for example one data frame per moment, where the length of the specific time period is determined according to the actual situation and is not limited by the present disclosure. In some embodiments, the live-streaming data refers to the interaction data between the host end and the guest/audience ends.
The live-streaming and co-hosting scenario is explained as follows: in a live-streaming platform, the communication among the host end, the audience end and the guest end is shown in Fig. 3. The host end records live data and sends it to the audience end through the live-streaming network to realize the broadcast; it can also establish a connection with a guest end to realize co-hosting, i.e. the host end can communicate with the guest end through the co-hosting network during the broadcast, integrate the communication information of the host end and the guest end, and send it to the audience end. In the embodiments of the present disclosure, the encoder is implemented at the host end, the guest end or the audience end. The live-streaming network and the co-hosting network may be the same or different.
In some embodiments, the current data frame is the data frame at the currently running moment (i.e. the current moment) in the live-streaming data, with adjacent data frames before and after it; in addition, a set historical moment before the current data frame corresponds to a historical data frame. In some embodiments, the set historical moment is the moment immediately before the current moment, or several moments before it. It should be noted that a moment can be regarded as a sampling moment, i.e. one moment corresponds to one data frame.
The state information (also called the context) refers to the state information of the current data frame, or state information obtained after the encoding end analyzes and integrates the state information of the historical data frames of the current data frame.
In the case where the live-streaming data is audio, i.e. the current data frame includes an audio frame, acquiring the initial state information for the current data frame includes: acquiring, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame. This information characterizes the running state of the live-streaming data; encoding the current data frame in combination with this state information fully takes into account the running state of the frame and the requirements on the running environment, so that the live-streaming data obtained by decoding at the decoding end is closer to the original live-streaming data, and the decoded data frames can be output normally at the decoding end, effectively guaranteeing the live-broadcast effect.
It should be noted that "at least one" means one or more; for example, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain and linear prediction coefficients includes: the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain and linear prediction coefficients together.
Here, the voice activity detection information is information obtained by detecting the live audio with voice activity detection (VAD) technology. The purpose of VAD is to detect whether a speech signal is present, so the voice activity detection information is "speech present" or "speech absent". In some embodiments, the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain and linear prediction coefficients of the historical data frames already processed by the encoder are determined, and at least one of these quantities is integrated to obtain the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain and/or linear prediction coefficients for the current data frame. For example, the average of the sampling rates of the historical data frames is computed and taken as the sampling rate for the current data frame, i.e. as the initial state information.
In S202, the initial state information is backed up to obtain backup state information.
In this step, a copy operation is performed on the initial state information, the copied initial state information is taken as the backup state information, and the backup state information is stored in the memory of the encoding end for later use.
Here, the initial state information is denoted context_ori and the backup state information is denoted context_copy.
In S203, the current data frame is encoded according to the first code rate and the initial state information to obtain a first encoded data frame; the initial state information at the encoding end is updated after the encoding ends.
Here, the code rate, also called the bit rate, refers to the number of bits transmitted per second. The higher the bit rate, the more data is transmitted per second and the clearer the sound. For sound, the bit rate is the amount of binary data per unit time after the analog sound signal is converted into a digital sound signal, and it is an indirect measure of audio quality. The first code rate in this step is the code rate used when encoding the current data frame, and its value is determined according to actual needs.
After encoding, the newly encoded data frame (the current data frame) updates the context. This is because the signal is correlated over time: encoding a data frame uses past historical data frames in combination with the state information. In some embodiments, when the current data frame is acquired, the encoder obtains the initial state information from the previous encoding history (i.e. by combining the state information of several preceding data frames); after the current data frame is encoded with the initial state information, it becomes a historical data frame (an already-encoded data frame) and updates the initial state information.
In S204, the updated initial state information is reset using the backup state information to obtain reset state information.
The reset state information is consistent with the initial state information before the update.
In S205, the current data frame is encoded according to the second code rate and the reset state information to obtain a second encoded data frame; the second code rate is different from the first code rate.
The second code rate is the same kind of quantity as the first code rate; the two simply differ in value. In some embodiments, the first code rate is greater than or less than the second code rate. In the present disclosure, the encoder encodes the live-streaming data at different code rates, yielding target data frames at different code rates; target data frames obtained in this way allow packet-loss recovery at the decoding end. For example, if the data frame corresponding to the first encoded data frame is lost, it can be replaced by the data frame corresponding to the second encoded data frame, realizing packet-loss recovery for the data frame corresponding to the first encoded data frame.
After the initial state information is updated, it is denoted context_temp. Since the current data frame needs to be encoded a second time (i.e. encoded according to the second code rate), or even a third time, if the updated initial state information context_temp were used directly together with the second code rate, the running state of the resulting second encoded data frame would differ from that of the first encoded data frame, i.e. the encoded data frame at the first code rate and the encoded data frame at the second code rate would have different running states. This would cause a large change of running state at the decoding end when switching code rates, producing audible noise. Therefore, this step encodes the current data frame using the reset state information, so that the state information corresponding to the second encoded data frame is consistent with that of the first encoded data frame, i.e. their running states are the same.
In some embodiments, the processing flow of the context is shown in Fig. 4. In Fig. 4, the initial state information is context_ori; copying it yields the backup state information context_copy. The current data frame is encoded at the first code rate with the initial state information context_ori to obtain the first encoded data frame, at which point context_ori is updated to context_temp. context_temp is then reset using the backup state information context_copy, so the state information corresponding to the current data frame becomes the reset state information (also denoted context_copy). The current data frame is then encoded at the second code rate with the reset state information context_copy to obtain the second encoded data frame. This completes the process of the encoder encoding the current data frame at both the first and the second code rate.
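The backup/reset flow of Fig. 4 can be sketched as follows. The encoder API here (`encoder.state` as the mutable context, `encoder.encode(frame, bitrate)` returning a packet and mutating the state) is an assumption for illustration, not a real codec interface; the point is that both passes observe an identical context.

```python
import copy

def encode_frame_dual_rate(encoder, frame, rate_a, rate_b):
    """Encode one frame twice at different bit rates from the same context:
    back up context_ori, encode (state becomes context_temp), restore the
    backup, then encode again so the second pass sees the original state."""
    context_copy = copy.deepcopy(encoder.state)   # back up context_ori
    packet_a = encoder.encode(frame, rate_a)      # state advances to context_temp
    encoder.state = context_copy                  # reset with the backup
    packet_b = encoder.encode(frame, rate_b)      # identical starting state
    return packet_a, packet_b
```

Because the two packets share a starting context, a decoder that later switches between them does not see a discontinuity in the predictor state, which is the noise-reduction argument made in the text.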
In S206, a first target data frame corresponding to the current data frame is obtained according to the first encoded data frame and the second encoded data frame.
In this step, the first target data frame is obtained after the first and second encoded data frames are obtained. In some embodiments, the first encoded data frame and the second encoded data frame are integrated, and the result of the integration is taken as the first target data frame.
Steps S201-S206 encode the current data frame, while the live-streaming data contains more than just the current data frame. Therefore, after the first target data frame is obtained, the subsequent data frames of the current data frame can be encoded frame by frame in the same way to obtain the corresponding target data frames, until every frame in the live-streaming data has been encoded, or until the main frames that characterize the live-streaming data have been encoded.
In the above method for encoding live-streaming data, the decoding end needs to switch code rates when outputting data frames of different code rates. In this scheme the first encoded data frame and the second encoded data frame are produced by the same encoder from the same state information, which effectively reduces the noise that appears at the decoding end when switching between code rates.
In some embodiments, for the case where the host end and guest ends conduct co-hosted live streaming (in which case the live-streaming data is called co-hosted live-streaming data), the host end and the guest ends can each act as both encoding end and decoding end. For example, the host end encodes audio and sends it to a guest end, which decodes the encoded audio; likewise, a guest end can encode audio and send it to the host end, which decodes it. A host end may be connected to more than one guest end; in that case, assuming the host end does the encoding, it needs to encode the co-hosted live-streaming data of multiple guest ends. In some embodiments, the host end can encode the data of each guest end separately and integrate the target data frames of all guest ends after encoding; it can also, in time order, encode the data frames of different guest ends at the same time instant and integrate them into one target data frame, and then encode and integrate the data frames of the next time instant.
In some embodiments, after obtaining the first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame, the method further includes: acquiring a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquiring a long-term prediction parameter, the long-term prediction parameter being less than a preset parameter threshold; and sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
Here, the second target data frame is the target data frame obtained by encoding the data frame adjacent to the current data frame.
In some embodiments, the adjacent data frame is a historical data frame of the current data frame, or one of its subsequent data frames (i.e. a data frame later in time than the current moment). The number of adjacent data frames is one, two or more; for example, the adjacent frames are the previous frame, the next frame, one frame on each side, the previous N frames, the next N frames, and so on, where N is determined according to the actual situation.
In other embodiments, the adjacent data frames are all data frames in the live-streaming data other than the current data frame, i.e. the encoder encodes every frame of the live-streaming data and obtains the corresponding target data frames; in that case, the decoding end can decode the target data frames to obtain the complete live-streaming data.
The long-term prediction parameter, abbreviated as the LTP parameter, is used to predict and remove redundant information over a time period. During encoding, the correlation between live data frames is predicted and the redundancy is removed. When the LTP parameter is not enabled, only the correlation between adjacent data frames is considered, so the coupling between adjacent frames is relatively large. In the present disclosure, the LTP parameter is less than a preset parameter threshold; in the absence of packet loss this parameter degrades the audio quality (relative to an LTP configuration that does not reduce inter-frame correlation), but under packet loss, lowering the LTP parameter value reduces inter-frame coupling, so a lost audio frame has less influence on subsequent audio frames, reducing the jumps when switching between packets of different code rates and improving the live-broadcast effect. The above parameter threshold is determined according to the actual situation and is not limited by the present disclosure.
In some embodiments, acquiring the long-term prediction parameter includes: acquiring the packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.
The length of the set historical time period and the packet loss rate threshold are determined according to the actual situation and are not limited by the present disclosure. A high packet loss rate may indicate frequent code rate switching. For example, if every other primary packet is lost, the valid primary packets received are 1, 3, 5, 7, ..., and the lost primary packets 2, 4, 6, 8, ... must be replaced by secondary packets. Since the primary and secondary packets have different code rates, code rate switching happens constantly, causing severe audible interference. In this case, the number of bits allocated to the LTP parameter is reduced, lowering the weight of long-term prediction and the coupling between frames, thereby reducing the jumps when switching between packets of different code rates and preventing noise at the switches.
In some embodiments, the first code rate is lower than the second code rate. The first code rate is a compression code rate (also called a low code rate) for compression-encoding the live data frames, and the second code rate is a fidelity code rate (also called a high code rate) for fidelity-encoding the live data frames. The current data frame is encoded at the compression code rate and at the fidelity code rate respectively, yielding a corresponding compression-encoded data frame and a fidelity-encoded data frame. When sending to the decoding end, the fidelity-encoded data frame is sent as the primary packet. If the decoding end can decode the corresponding current data frame from the fidelity-encoded data frame, the next operation proceeds; if it cannot, the corresponding current data frame is decoded from the compression-encoded data frame.
In some embodiments, there are one, two or more first code rates and second code rates. Each code rate produces one stream of target data frames, so in this way the current data frame can be encoded into multiple streams of target data frames.
In other embodiments, besides the first and second code rates, a third code rate, a fourth code rate, etc. are also used, i.e. the current data frame can be encoded at multiple code rates to obtain multiple corresponding target data frames, so that the decoding end can perform packet-loss recovery and related operations.
Fig. 5 is a flowchart of a method for encoding live-streaming data according to an embodiment. As shown in Fig. 5, the method is used in an encoder. Taking co-hosted live-streaming data as the live-streaming data, the method includes the following steps:
In S501, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain and linear prediction coefficients for the current data frame is acquired as the initial state information.
In S502, the initial state information is backed up to obtain backup state information.
In S503, the current data frame is encoded according to the compression code rate and the initial state information to obtain a compression-encoded data frame; the initial state information at the encoding end is updated by the current data frame after encoding.
In some embodiments, the compression code rate corresponds to the aforementioned first code rate, and the compression-encoded data frame corresponds to the aforementioned first encoded data frame.
In S504, the updated initial state information is reset using the backup state information to obtain reset state information.
In S505, the current data frame is encoded according to the fidelity code rate and the reset state information to obtain a fidelity-encoded data frame.
In some embodiments, the fidelity code rate corresponds to the aforementioned second code rate, and the fidelity-encoded data frame corresponds to the aforementioned second encoded data frame.
In S506, a first target data frame corresponding to the current data frame is obtained according to the compression-encoded data frame and the fidelity-encoded data frame.
In S507, a second target data frame corresponding to an adjacent data frame is acquired.
In S508, the packet loss rate for a set historical time period is acquired; in response to the packet loss rate being higher than a preset packet loss rate threshold, the long-term prediction parameter is acquired.
In S509, the first target data frame and the second target data frame are sent to the receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In the above method for encoding live-streaming data, the decoding end needs to switch code rates when outputting data frames of different code rates. In this scheme the first encoded data frame and the second encoded data frame are produced by the same encoding end from the same state information, which effectively reduces the noise that appears at the decoding end when switching between code rates.
Fig. 6 is a block diagram of an apparatus for encoding live-streaming data according to an embodiment. Referring to Fig. 6, the apparatus includes an initial information acquisition unit 601, a backup information acquisition unit 602, a first data frame determination unit 603, a state information determination unit 604, a data frame encoding unit 605 and a second data frame determination unit 606.
The initial information acquisition unit 601 is configured to acquire initial state information for a current data frame; the current data frame is a data frame in live-streaming data.
The backup information acquisition unit 602 is configured to back up the initial state information to obtain backup state information.
The first data frame determination unit 603 is configured to encode the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; the initial state information at the encoding end is updated after the encoding ends.
The state information determination unit 604 is configured to reset the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update.
The data frame encoding unit 605 is configured to encode the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame; the second code rate is different from the first code rate.
The second data frame determination unit 606 is configured to obtain a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
With the above apparatus for encoding live-streaming data, the decoding end needs to switch code rates when outputting data frames of different code rates. In this scheme the first encoded data frame and the second encoded data frame are encoded from the same state information, which effectively reduces the noise that appears at the decoding end when switching between code rates.
In some embodiments, the current data frame includes an audio frame, and the initial information acquisition unit is further configured to acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, the apparatus for encoding live-streaming data further includes: a data frame acquisition unit configured to acquire a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; a prediction parameter acquisition unit configured to acquire a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and a data frame sending unit configured to send the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, the prediction parameter acquisition unit includes: a packet loss rate acquisition subunit configured to acquire the packet loss rate for a set historical time period; and a prediction parameter acquisition subunit configured to acquire the long-term prediction parameter in response to the packet loss rate being higher than a preset packet loss rate threshold.
In some embodiments, the second data frame determination unit is configured to integrate the first encoded data frame and the second encoded data frame and take the result of the integration as the first target data frame.
Regarding the apparatus in the above embodiments, the specific way in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
In some embodiments, as shown in Fig. 7, a streaming system 700 for live-streaming data is also provided, including an encoding end 701 and a decoding end 702, which communicate over a network.
The encoding end 701 is configured to encode the current data frame according to reset state information and a compression code rate to obtain a compression-encoded data frame; acquire a fidelity-encoded data frame; combine the compression-encoded data frame and the fidelity-encoded data frame into a composite data packet; and send the composite data packet to the decoding end. The current data frame is a data frame in live-streaming data; the reset state information is obtained by resetting with backup state information; the backup state information is obtained by backing up the initial state information; the initial state information is used together with a fidelity code rate to encode the current data frame; the fidelity-encoded data frame is obtained by encoding according to the fidelity code rate and the initial state information of a historical data frame; and the historical data frame is the data frame corresponding to a historical moment of the current data frame. The decoding end is configured to decode the composite data packet to obtain a compressed data frame and a first fidelity data frame; in response to receiving loss information of a second fidelity data frame, replace the second fidelity data frame with the compressed data frame; and obtain the corresponding live-streaming data according to the compressed data frame and the first fidelity data frame, wherein the second fidelity data frame is a data frame obtained according to the historical data frame and the fidelity code rate.
In some embodiments, the fidelity code rate is called the high code rate and the compression code rate is called the low code rate; correspondingly, the fidelity-encoded data frame and the compression-encoded data frame are called the high-rate and low-rate encoded data frames respectively. In some embodiments, the fidelity-encoded data frame is 24K and the compression-encoded data frame is 16K.
In some embodiments, let the current data frame be frame N and the historical data frames be frames N-1 and N-2. In a practical application scenario, the number of historical data frames can be larger or smaller; moreover, the stride between historical data frames can be greater than 1, e.g. with a stride of 2 the historical data frames are frames N-2, N-4, N-6, etc.
In some embodiments, the flow of live-streaming data between the encoding end and the decoding end is shown in Fig. 8. The encoding end generates the corresponding high-rate and low-rate encoded data frames for frames N-2, N-1 and N, where the high-rate encoded data frame of frame N is denoted Packet_High[N] and the low-rate encoded data frame of frame N is denoted Packet_Low[N]. Thus, as shown in Fig. 9, the following encoded data frames are obtained: Packet_High[N-2], Packet_High[N-1], Packet_High[N], Packet_Low[N-2], Packet_Low[N-1], Packet_Low[N]. The encoding end first sends the (N-1)-th composite packet, Packet_High[N-1]-Packet_Low[N-2], to the decoding end. The decoding end decodes the received (N-1)-th composite packet Packet_High[N-1]-Packet_Low[N-2]; if it finds that the primary packet Packet_High[N-1] is lost (i.e. it receives the loss information of the second fidelity data frame, shown as the dashed box in Fig. 10), it waits for the next composite packet. After an interval, the encoding end sends the N-th composite packet to the decoding end: Packet_High[N]-Packet_Low[N-1].
The decoding end decodes the received N-th composite packet Packet_High[N]-Packet_Low[N-1], takes the data frame obtained by decoding Packet_Low[N-1] as the data of frame N-1, and replaces Packet_High[N-1] with Packet_Low[N-1]. At this point the decoding end has obtained complete data for frame N-1 and frame N (i.e. Packet_Low[N-1]-Packet_High[N]), i.e. the decoding end can obtain the entire live-streaming data.
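The decoder-side recovery of Figures 8-10 can be sketched as follows. The composite layout (each composite pairs Packet_High[N] with Packet_Low[N-1]) comes from the text; the tuple representation and the `decode` callback are assumptions for illustration.

```python
def recover_stream(composites, decode):
    """Reassemble frames from composite packets. Each composite is
    (high, low_prev), where high = (frame_index, payload) for the current
    frame's high-rate packet and low_prev = (frame_index, payload) for the
    previous frame's low-rate copy; None marks a lost packet. A lost
    high-rate frame is filled from the low-rate copy in the next composite."""
    frames = {}
    for high, low_prev in composites:
        if low_prev is not None:
            n_prev, payload = low_prev
            if n_prev not in frames:          # high-rate copy never arrived
                frames[n_prev] = decode(payload)
        if high is not None:
            n, payload = high
            frames[n] = decode(payload)
    return [frames[k] for k in sorted(frames)]
```

Because both copies of a frame were encoded from identical context at the sender, substituting the low-rate copy does not desynchronize the decoder state, which is why the scheme avoids switching noise.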
Packet_Low[N-1] and Packet_High[N] obtained in this way in the embodiments of the present disclosure come from the same encoder and are encoded from the same state information; therefore, no noise is produced when switching between the code rates corresponding to Packet_Low[N-1] and Packet_High[N], which effectively improves the audio output of co-hosted live streaming.
In some embodiments, the decoding end is further configured, after obtaining the corresponding live-streaming data according to the compressed data frame and the first fidelity data frame, to output the first fidelity data frame in the fidelity code rate state, and then to switch to the compression code rate state and output the compressed data frame.
The decoding end can output the decoded data frames; since packet loss occurred earlier and a high-rate data frame was replaced by a low-rate data frame, outputting the data frame by frame requires switching between the high and low code rates.
In the streaming system provided by the above embodiments, the decoding end needs to switch code rates when outputting data frames of different code rates. The fidelity-encoded data frame and the compression-encoded data frame in this scheme come from the same encoder and are encoded from the same state information. When the decoding end receives a composite data packet, it decodes it; in response to finding that a certain fidelity data frame is lost, it recovers the frame using the compressed data frame carried in the next composite packet (or even a later one). The live-streaming data can then be obtained from the recovered data frames. Outputting the live-streaming data requires switching between the fidelity code rate and the compression code rate, but since the state information corresponding to the fidelity data frames and the compressed data frames is the same, no noise appears at the decoding end when switching between code rates.
In some embodiments, an electronic device 1100 is provided. Referring to Fig. 11, the electronic device 1100 includes one or more of the following components: a processing component 1101, a memory 1102, a power component 1103, a multimedia component 1104, an input/output (I/O) interface 1105, an audio component 1106, and a communication component 1107.
The processing component 1101 generally controls the overall operation of the electronic device 1100, such as operations associated with display, phone calls, data communication, camera operation and recording. The processing component 1101 includes one or more processors 1108 to execute instructions to complete all or part of the steps of the method described above. In addition, the processing component 1101 includes one or more modules that facilitate interaction between the processing component 1101 and the other components; for example, the processing component 1101 includes a multimedia module to facilitate interaction between the multimedia component 1104 and the processing component 1101.
The memory 1102 is configured to store various types of data to support operation on the electronic device 1100. Examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, phonebook data, messages, pictures, videos, etc. The memory 1102 is implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The power component 1103 provides power to the various components of the electronic device 1100. The power component 1103 includes a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 1100.
The multimedia component 1104 includes a screen providing an output interface between the electronic device 1100 and the user. In some embodiments, the screen includes a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen is implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1104 includes a front camera and/or a rear camera. When the electronic device 1100 is in an operating mode, such as shooting mode or video mode, the front camera and/or rear camera receives external multimedia data. Each front or rear camera is a fixed optical lens system or has focal length and optical zoom capability.
The I/O interface 1105 provides an interface between the processing component 1101 and peripheral interface modules such as a keyboard, a click wheel, or buttons. These buttons include but are not limited to: a home button, volume buttons, a start button and a lock button.
The audio component 1106 is configured to output and/or input audio signals. For example, the audio component 1106 includes a microphone (MIC) configured to receive external audio signals when the electronic device 1100 is in an operating mode, such as call mode, recording mode or speech recognition mode. The received audio signals are further stored in the memory 1102 or sent via the communication component 1107. In some embodiments, the audio component 1106 also includes a loudspeaker for outputting audio signals. In some embodiments, the live-streaming data is co-hosted live audio, which is input into the electronic device through the audio component 1106.
The communication component 1107 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices. The electronic device 1100 accesses a wireless network based on a communication standard, such as WiFi, a carrier network (e.g. 2G, 3G, 4G or 5G), or a combination thereof. In some embodiments, the communication component 1107 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In some embodiments, the communication component 1107 also includes a near-field communication (NFC) module to facilitate short-range communication; for example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In some embodiments, the electronic device 1100 is implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements, for performing the method described above.
In some embodiments, the processor is configured to perform the following steps: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
In some embodiments, the current data frame includes an audio frame, and the processor is further configured to: acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, the processor is further configured to: acquire a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquire a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and send the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, the processor is further configured to: acquire the packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquire the long-term prediction parameter.
In some embodiments, the processor is further configured to: integrate the first encoded data frame and the second encoded data frame and take the result of the integration as the first target data frame.
In some embodiments, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 1102 including instructions, where the instructions are executable by the processor 1108 of the encoding end 1100 to complete the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc. The instructions in the storage medium are executed by the processor of the electronic device to implement all or part of the steps of the method described above.
In some embodiments, the instructions in the storage medium are executed by the processor of the electronic device so that the electronic device can perform the following steps: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
In some embodiments, the current data frame includes an audio frame, and the instructions in the storage medium are executed by the processor of the electronic device so that the electronic device can perform the following step: acquiring, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, the instructions in the storage medium are executed by the processor of the electronic device so that the electronic device can perform the following steps: acquiring a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquiring a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, the instructions in the storage medium are executed by the processor of the electronic device so that the electronic device can perform the following steps: acquiring the packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.
In some embodiments, a computer program product is also provided, the program product including a computer program stored in a storage medium; at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring initial state information for a current data frame, the current data frame being a data frame in live-streaming data; backing up the initial state information to obtain backup state information; encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame, wherein the initial state information at the encoding end is updated after the encoding ends; resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update; encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
In some embodiments, the current data frame includes an audio frame; at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
In some embodiments, at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring a second target data frame corresponding to an adjacent data frame, the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data; acquiring a long-term prediction parameter, the long-term prediction parameter being less than a parameter threshold; and sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
In some embodiments, at least one processor of the electronic device reads the computer program from the storage medium and executes it, so that the electronic device can perform: acquiring the packet loss rate for a set historical time period; and in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.

Claims (19)

  1. A method for encoding live-streaming data, applied to an encoding end, comprising:
    acquiring initial state information for a current data frame; the current data frame being a data frame in live-streaming data;
    backing up the initial state information to obtain backup state information;
    encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; wherein the initial state information at the encoding end is updated after the encoding ends;
    resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update;
    encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and
    obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  2. The method for encoding live-streaming data according to claim 1, wherein the current data frame comprises an audio frame;
    acquiring the initial state information for the current data frame comprises:
    acquiring, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
  3. The method for encoding live-streaming data according to claim 1 or 2, further comprising:
    acquiring a second target data frame corresponding to an adjacent data frame; the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data;
    acquiring a long-term prediction parameter; the long-term prediction parameter being less than a parameter threshold; and
    sending the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  4. The method for encoding live-streaming data according to claim 3, wherein acquiring the long-term prediction parameter comprises:
    acquiring a packet loss rate for a set historical time period; and
    in response to the packet loss rate being higher than a preset packet loss rate threshold, acquiring the long-term prediction parameter.
  5. The method for encoding live-streaming data according to claim 1, wherein obtaining the first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame comprises:
    integrating the first encoded data frame and the second encoded data frame, and taking the result of the integration as the first target data frame.
  6. An apparatus for encoding live-streaming data, applied to an encoding end, comprising:
    an initial information acquisition unit configured to acquire initial state information for a current data frame; the current data frame being a data frame in live-streaming data;
    a backup information acquisition unit configured to back up the initial state information to obtain backup state information;
    a first data frame determination unit configured to encode the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; wherein the initial state information at the encoding end is updated after the encoding ends;
    a state information determination unit configured to reset the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update;
    a data frame encoding unit configured to encode the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame; the second code rate being different from the first code rate; and
    a second data frame determination unit configured to obtain a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  7. The apparatus for encoding live-streaming data according to claim 6, wherein the current data frame comprises an audio frame;
    the initial information acquisition unit is further configured to acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
  8. The apparatus for encoding live-streaming data according to claim 6 or 7, further comprising:
    a data frame acquisition unit configured to acquire a second target data frame corresponding to an adjacent data frame; the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data;
    a prediction parameter acquisition unit configured to acquire a long-term prediction parameter; the long-term prediction parameter being less than a parameter threshold; and
    a data frame sending unit configured to send the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  9. The apparatus for encoding live-streaming data according to claim 8, wherein the prediction parameter acquisition unit comprises:
    a packet loss rate acquisition subunit configured to acquire a packet loss rate for a set historical time period; and
    a prediction parameter acquisition subunit configured to acquire the long-term prediction parameter in response to the packet loss rate being higher than a preset packet loss rate threshold.
  10. The apparatus for encoding live-streaming data according to claim 6, wherein the second data frame determination unit is configured to integrate the first encoded data frame and the second encoded data frame and take the result of the integration as the first target data frame.
  11. A streaming system for live-streaming data, comprising an encoding end and a decoding end, wherein:
    the encoding end is configured to encode a current data frame according to reset state information and a compression code rate to obtain a compression-encoded data frame; acquire a fidelity-encoded data frame; combine the compression-encoded data frame and the fidelity-encoded data frame into a composite data packet; and send the composite data packet to the decoding end; the current data frame is a data frame in live-streaming data; the reset state information is obtained by resetting with backup state information; the backup state information is obtained by backing up the initial state information; the initial state information is used, together with a fidelity code rate, to encode the current data frame; the fidelity-encoded data frame is obtained by encoding according to the fidelity code rate and the initial state information of a historical data frame; and the historical data frame is the data frame corresponding to a historical moment of the current data frame;
    the decoding end is configured to decode the compression-encoded data frame and the fidelity-encoded data frame in the composite data packet to obtain a compressed data frame and a first fidelity data frame; in response to receiving loss information of a second fidelity data frame, replace the second fidelity data frame with the compressed data frame; and obtain the corresponding live-streaming data according to the compressed data frame and the first fidelity data frame; wherein the second fidelity data frame is a data frame obtained according to the historical data frame and the fidelity code rate.
  12. The streaming system for live-streaming data according to claim 11, wherein the decoding end is further configured to output the first fidelity data frame in a fidelity code rate state, and to switch to a compression code rate state and output the compressed data frame.
  13. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the instructions to implement the following steps:
    acquiring initial state information for a current data frame; the current data frame being a data frame in live-streaming data;
    backing up the initial state information to obtain backup state information;
    encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; wherein the initial state information at the encoding end is updated after the encoding ends;
    resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update;
    encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and
    obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  14. The electronic device according to claim 13, wherein the current data frame comprises an audio frame;
    the processor is further configured to:
    acquire, as the initial state information, at least one of the sampling rate, number of channels, bandwidth, code rate, voice activity detection information, entropy coding information, noise shaping gain, and linear prediction coefficients for the current data frame.
  15. The electronic device according to claim 13 or 14, wherein the processor is further configured to:
    acquire a second target data frame corresponding to an adjacent data frame; the adjacent data frame being a data frame adjacent to the current data frame in the live-streaming data;
    acquire a long-term prediction parameter; the long-term prediction parameter being less than a parameter threshold; and
    send the first target data frame and the second target data frame to a receiving end according to the long-term prediction parameter, so that the receiving end decodes the first target data frame and the second target data frame according to the long-term prediction parameter.
  16. The electronic device according to claim 15, wherein the processor is further configured to:
    acquire a packet loss rate for a set historical time period; and
    in response to the packet loss rate being higher than a preset packet loss rate threshold, acquire the long-term prediction parameter.
  17. The electronic device according to claim 13, wherein the processor is further configured to:
    integrate the first encoded data frame and the second encoded data frame and take the result of the integration as the first target data frame.
  18. A storage medium, wherein instructions in the storage medium are executed by a processor of an electronic device so that the electronic device can perform the following steps:
    acquiring initial state information for a current data frame; the current data frame being a data frame in live-streaming data;
    backing up the initial state information to obtain backup state information;
    encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; wherein the initial state information at the encoding end is updated after the encoding ends;
    resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update;
    encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and
    obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
  19. A computer program product, wherein the program product comprises a computer program stored in a storage medium, and at least one processor of an electronic device reads the computer program from the storage medium and executes it, so that the electronic device performs the following steps:
    acquiring initial state information for a current data frame; the current data frame being a data frame in live-streaming data;
    backing up the initial state information to obtain backup state information;
    encoding the current data frame according to a first code rate and the initial state information to obtain a first encoded data frame; wherein the initial state information at the encoding end is updated after the encoding ends;
    resetting the updated initial state information with the backup state information to obtain reset state information, the reset state information being consistent with the initial state information before the update;
    encoding the current data frame according to a second code rate and the reset state information to obtain a second encoded data frame, the second code rate being different from the first code rate; and
    obtaining a first target data frame corresponding to the current data frame according to the first encoded data frame and the second encoded data frame.
PCT/CN2021/075612 2020-02-18 2021-02-05 Method for encoding live-streaming data and electronic device WO2021164585A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21757025.8A EP3993430A4 (en) 2020-02-18 2021-02-05 METHOD OF ENCODING LIVE BROADCAST DATA AND ELECTRONIC DEVICE
US17/582,778 US11908481B2 (en) 2020-02-18 2022-01-24 Method for encoding live-streaming data and encoding device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010099025.7A 2020-02-18 2020-02-18 Method, apparatus and streaming system for encoding live-streaming data, and electronic device
CN202010099025.7 2020-02-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/582,778 Continuation US11908481B2 (en) 2020-02-18 2022-01-24 Method for encoding live-streaming data and encoding device

Publications (1)

Publication Number Publication Date
WO2021164585A1 true WO2021164585A1 (zh) 2021-08-26

Family

ID=71000354

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075612 WO2021164585A1 (zh) 2020-02-18 2021-02-05 直播数据的编码方法及电子设备

Country Status (4)

Country Link
US (1) US11908481B2 (zh)
EP (1) EP3993430A4 (zh)
CN (1) CN111277864B (zh)
WO (1) WO2021164585A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660488A * 2021-10-18 2021-11-16 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for flow control of multimedia data and for training a flow-control model

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111277864B (zh) 2020-02-18 2021-09-10 Beijing Dajia Internet Information Technology Co., Ltd. Method, apparatus and streaming system for encoding live-streaming data, and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110075758A1 (en) * 2008-05-06 2011-03-31 Electronics And Telecommunications Research Institute Apparatus for transmitting layered data
CN103533386A * 2013-10-21 2014-01-22 Tencent Technology (Shenzhen) Co., Ltd. Live-broadcast control method and host device
CN105530449A * 2014-09-30 2016-04-27 Alibaba Group Holding Ltd. Encoding parameter adjustment method and apparatus
CN106169998A * 2016-07-13 2016-11-30 Tencent Technology (Shenzhen) Co., Ltd. Media file processing method and apparatus
CN108093257A * 2017-12-05 2018-05-29 Beijing Xiaomi Mobile Software Co., Ltd. Bit rate control method for video encoding, electronic device and storage medium
CN108769826A * 2018-06-22 2018-11-06 Guangzhou Kugou Computer Technology Co., Ltd. Live media stream acquisition method, apparatus, terminal and storage medium
CN111277864A * 2020-02-18 2020-06-12 Beijing Dajia Internet Information Technology Co., Ltd. Method, apparatus and streaming system for encoding live-streaming data, and electronic device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7885340B2 (en) * 1999-04-27 2011-02-08 Realnetworks, Inc. System and method for generating multiple synchronized encoded representations of media data
US7729386B2 (en) * 2002-09-04 2010-06-01 Tellabs Operations, Inc. Systems and methods for frame synchronization
CN102238179B (zh) * 2010-04-07 2014-12-10 Apple Inc. Real-time or near real-time streaming
US9183560B2 (en) * 2010-05-28 2015-11-10 Daniel H. Abelow Reality alternate
CN102307302B (zh) * 2011-07-06 2014-07-30 Hangzhou H3C Technologies Co., Ltd. Method and apparatus for maintaining continuity of video images
US9043201B2 (en) * 2012-01-03 2015-05-26 Google Technology Holdings LLC Method and apparatus for processing audio frames to transition between different codecs
US9106934B2 (en) * 2013-01-29 2015-08-11 Espial Group Inc. Distribution of adaptive bit rate live streaming video via hyper-text transfer protocol
CN103280222B (zh) * 2013-06-03 2014-08-06 Tencent Technology (Shenzhen) Co., Ltd. Audio encoding and decoding method and system
CN107369453B (zh) * 2014-03-21 2021-04-20 Huawei Technologies Co., Ltd. Method and apparatus for decoding a speech/audio bitstream
CN104837042B (zh) * 2015-05-06 2018-01-16 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for encoding digital multimedia data
CN107592540B (zh) * 2016-07-07 2020-02-11 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method and apparatus
US10291365B2 (en) * 2016-12-29 2019-05-14 X Development Llc Efficient automatic repeat request for free space optical communication
WO2018121775A1 (en) * 2016-12-30 2018-07-05 SZ DJI Technology Co., Ltd. System and methods for feedback-based data transmission
AU2018319228B2 (en) * 2017-08-22 2023-08-10 Dejero Labs Inc. System and method for assessing communication resources
CN109524015B (zh) * 2017-09-18 2022-04-15 Hangzhou Hikvision Digital Technology Co., Ltd. Audio encoding method, decoding method, apparatus and audio encoding/decoding system
CN108093268B (zh) * 2017-12-29 2020-11-10 Guangzhou Kugou Computer Technology Co., Ltd. Method and apparatus for live streaming
WO2020037501A1 (zh) * 2018-08-21 2020-02-27 SZ DJI Technology Co., Ltd. Bit rate allocation method, bit rate control method, encoder, and recording medium
CN109167965B (zh) * 2018-09-28 2020-12-04 VisionVera Information Technology Co., Ltd. Data processing method and apparatus
CN109587510B (zh) * 2018-12-10 2021-11-02 Guangzhou Huya Technology Co., Ltd. Live streaming method, apparatus, device and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110075758A1 (en) * 2008-05-06 2011-03-31 Electronics And Telecommunications Research Institute Apparatus for transmitting layered data
CN103533386A * 2013-10-21 2014-01-22 Tencent Technology (Shenzhen) Co., Ltd. Live-broadcast control method and host device
CN105530449A * 2014-09-30 2016-04-27 Alibaba Group Holding Ltd. Encoding parameter adjustment method and apparatus
CN106169998A * 2016-07-13 2016-11-30 Tencent Technology (Shenzhen) Co., Ltd. Media file processing method and apparatus
CN108093257A * 2017-12-05 2018-05-29 Beijing Xiaomi Mobile Software Co., Ltd. Bit rate control method for video encoding, electronic device and storage medium
CN108769826A * 2018-06-22 2018-11-06 Guangzhou Kugou Computer Technology Co., Ltd. Live media stream acquisition method, apparatus, terminal and storage medium
CN111277864A * 2020-02-18 2020-06-12 Beijing Dajia Internet Information Technology Co., Ltd. Method, apparatus and streaming system for encoding live-streaming data, and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3993430A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660488A * 2021-10-18 2021-11-16 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for flow control of multimedia data and for training a flow-control model
CN113660488B (zh) 2021-10-18 2022-02-11 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for flow control of multimedia data and for training a flow-control model

Also Published As

Publication number Publication date
EP3993430A4 (en) 2022-11-09
EP3993430A1 (en) 2022-05-04
US11908481B2 (en) 2024-02-20
US20220148603A1 (en) 2022-05-12
CN111277864A (zh) 2020-06-12
CN111277864B (zh) 2021-09-10

Similar Documents

Publication Publication Date Title
TWI502977B (zh) Audio/video playback apparatus, audio/video processing apparatus, system and method
WO2018077083A1 (zh) Audio frame loss recovery method and apparatus
JP5989797B2 (ja) メディア出力の選択的ミラーリング
WO2021164585A1 (zh) Method for encoding live-streaming data and electronic device
CN107277423B (zh) Method and apparatus for packet-loss retransmission
CN108932948B (zh) Audio data processing method and apparatus, computer device and computer-readable storage medium
US11202066B2 (en) Video data encoding and decoding method, device, and system, and storage medium
US10170127B2 (en) Method and apparatus for sending multimedia data
JP2012521718A5 (zh)
US11871075B2 (en) Audio playing and transmitting methods and apparatuses
WO2020108033A1 (zh) Transcoding method, transcoding apparatus, and computer-readable storage medium
JP2013533504A (ja) 選択的出力制御によってオーディオデータを復号するための方法とシステム
EP4007289A1 (en) Video uploading method and apparatus, electronic device, and storage medium
WO2020034780A1 (zh) Transmission control method and apparatus, electronic device, and storage medium
WO2023221527A1 (zh) Screen-casting method and apparatus, terminal device and computer-readable storage medium
US20160277737A1 (en) Method and system to control bit rate in video encoding
CN109120929B (zh) Video encoding and decoding method and apparatus, electronic device and system
CN107493478B (zh) Method and device for setting the encoding frame rate
CN112449208A (zh) Voice processing method and apparatus
US20140297720A1 (en) Client apparatus, server apparatus, multimedia redirection system, and method thereof
US20210400334A1 (en) Method and apparatus for loop-playing video content
TWI701922B (zh) Signal processing apparatus, signal processing method, and non-transitory computer-readable recording medium storing a program
CN113422997A (zh) Method and apparatus for playing audio data, and readable storage medium
CN111355996A (zh) Audio playing method and computing device
CN113261300B (zh) Audio sending and playing method, and smart television

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21757025

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 21757025.8

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2021757025

Country of ref document: EP

Effective date: 20220127

NENP Non-entry into the national phase

Ref country code: DE