CN116980075A - Data encoding method, device, electronic equipment and storage medium


Info

Publication number
CN116980075A
Authority
CN
China
Prior art keywords
data
coding
coding parameter
sample
data unit
Prior art date
Legal status
Pending
Application number
CN202310574455.3A
Other languages
Chinese (zh)
Inventor
梁俊斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310574455.3A
Publication of CN116980075A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 - Arrangements for detecting or preventing errors in the information received
    • H04L 1/0001 - Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L 1/0009 - Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the channel coding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm


Abstract

The embodiments of the application provide a data encoding method, a data encoding device, an electronic device and a storage medium. The method comprises the following steps: inputting a preset encoding code rate value, a current data unit and the respective packet-receiving waiting delays of a plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit, where the first coding parameter and the second coding parameter match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets; encoding the current data unit based on the first coding parameter to obtain first encoded data, and encoding the redundant data unit based on the second coding parameter to obtain second encoded data; and obtaining a data packet comprising the first encoded data and the second encoded data. With this method, coding parameters can be accurately allocated to the current data unit and its corresponding redundant data unit, improving the reliability of data transmission.

Description

Data encoding method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data encoding method, apparatus, electronic device, and storage medium.
Background
Data encoding and decoding play an important role in modern communication systems. In the related art, to ensure the real-time performance of data transmission, the data to be transmitted is generally divided into small data units that are transmitted in sequence. To guard against packet loss during transmission, each data unit is generally encoded and then packaged for transmission, and the data packet corresponding to the current data unit usually also carries information of the previous data unit. If the previous data unit is lost, its information can be recovered by parsing the previous-unit information carried in the data packet of the current data unit, thereby ensuring the reliability of data transmission.
In an exemplary voice-data encoding and transmission scenario in a voice call application, a voice signal is captured by a microphone, the analog voice signal is converted into a digital voice signal by an analog-to-digital conversion circuit, the digital signal is compressed by a speech encoder, and the result is packaged and sent to the receiving end according to the transmission format and protocol of the communication network. The receiving-end device receives and unpacks the data packet to output the compressed speech code stream, a speech decoder regenerates the digital voice signal, and the signal is finally played through a loudspeaker. Speech encoding and decoding effectively reduce the bandwidth required for voice transmission and are decisive both for saving the cost of storing and transmitting voice information and for preserving its integrity during transmission over the communication network. A speech encoder performs speech compression based on a speech model, time-frequency-domain signal compression and masking-effect compression; examples include the SILK and Opus encoders, which have a built-in in-band FEC (Forward Error Correction) module for resisting packet loss. In-band FEC resists and recovers from network packet loss by caching the code stream information of the previous frame: when a packet is lost, the data at the lost position can be recovered from the previous-frame code stream information carried in the next frame's speech coding code stream.
At present, when the above transmission mode is adopted, bits cannot be allocated reasonably between the current data unit and the previous data unit during encoding. As a result, either the encoding quality of the current data unit (for example, its audio coding quality) is reduced, or, when the data packet of the previous data unit is lost, the data recovered from the current data unit's packet is of poor quality. This leads to poor reliability of in-band-FEC-based data transmission in the related art.
Disclosure of Invention
In view of this, embodiments of the present application provide a data encoding method, apparatus, electronic device, and storage medium, which can improve the rationality of bit allocation for a current data unit and a previous data unit, thereby improving the data encoding quality.
In a first aspect, an embodiment of the present application provides a data encoding method, including: acquiring a current data unit of the data to be transmitted and the respective packet-receiving waiting delays of a plurality of data packets counted by a data receiving device, where the duration between the time at which the packet-receiving waiting delays were counted and the time at which the current data unit was acquired is within a preset duration range, and the current data unit and each of the data packets correspond to the same data transmission channel; inputting a preset encoding code rate value, the current data unit and the respective packet-receiving waiting delays of the plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit, where the first coding parameter and the second coding parameter match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets; the coding parameter distribution model is obtained by training an initial model on data samples, a data sample comprising a current sample data unit and a plurality of sample data packets corresponding respectively to a plurality of historical sample data units, the data sample carrying a sample label, the sample label comprising a first sample coding parameter and a second sample coding parameter, the first sample coding parameter and the second sample coding parameter being selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data, the target data being data obtained by decoding the sample data packets received by the data receiving device, the sample data packets received by the data receiving device being data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss, and each coding parameter combination mode comprising a first sample coding parameter for each of the plurality of sample data units in the sample data and a second sample coding parameter for the redundant sample data unit corresponding to each sample data unit; encoding the current data unit based on the first coding parameter to obtain first encoded data, and encoding the redundant data unit based on the second coding parameter to obtain second encoded data; and packaging the first encoded data and the second encoded data to obtain a data packet corresponding to the current data unit.
In a second aspect, an embodiment of the present application provides a data encoding apparatus, including a first data acquisition module, a bit determination module, an encoding module and a packaging module. The first data acquisition module is configured to acquire a current data unit of the data to be transmitted and the respective packet-receiving waiting delays of a plurality of data packets counted by the data receiving device, where the duration between the time at which the packet-receiving waiting delays were counted and the time at which the current data unit was acquired is within a preset duration range, and the current data unit and each of the data packets correspond to the same data transmission channel. The bit determination module is configured to input the preset encoding code rate value, the current data unit and the respective packet-receiving waiting delays of the plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit, where the first coding parameter and the second coding parameter match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets; the coding parameter distribution model is obtained by training an initial model on data samples, a data sample comprising a current sample data unit and a plurality of sample data packets corresponding respectively to a plurality of historical sample data units and carrying a sample label, the sample label comprising a first sample coding parameter and a second sample coding parameter selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data, the target data being data obtained by decoding the sample data packets received by the data receiving device, the sample data packets received by the data receiving device being data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss, and each coding parameter combination mode comprising a first sample coding parameter for each of the plurality of sample data units in the sample data and a second sample coding parameter for the redundant sample data unit corresponding to each sample data unit. The encoding module is configured to encode the current data unit based on the first coding parameter to obtain first encoded data, and to encode the redundant data unit based on the second coding parameter to obtain second encoded data. The packaging module is configured to package the first encoded data and the second encoded data to obtain the data packet corresponding to the current data unit.
In an embodiment, the apparatus further includes a second data acquisition module, a combination mode determining module, a sample constructing module and a model training module. The second data acquisition module is configured to acquire a plurality of coding combination modes corresponding to the initial data, and the packet-receiving waiting delays and target data of the plurality of data packets corresponding to each coding combination mode, where the packet-receiving waiting delays corresponding to each coding combination mode are obtained by encoding the initial data with that coding combination mode and then simulating packet-loss transmission; the combination mode determining module is configured to determine, based on the initial data and the target data corresponding to each coding combination mode, one coding parameter combination mode from the plurality of coding parameter combination modes as the target coding parameter combination mode of the initial data; the sample constructing module is configured to construct training samples based on the target coding parameter combination mode and the packet-receiving waiting delays of the plurality of data packets corresponding to the target coding parameter combination mode; and the model training module is configured to train the coding parameter distribution model based on a plurality of training samples, adjusting the model parameters until the training loss is minimized, so as to obtain the trained coding parameter distribution model.
In an embodiment, the coding parameter is a coding rate, the initial data is audio data, the data unit is an audio frame, and the combination mode determining module includes a mean value calculating sub-module, a first combination mode selecting sub-module, a quality evaluating sub-module, and a second combination mode selecting sub-module. The average value calculation sub-module is used for calculating the average value of the coding code rate corresponding to each coding parameter combination mode; the first combination mode selecting submodule is used for selecting a coding parameter combination mode with the average value of the coding rate smaller than a coding rate set threshold value from a plurality of coding parameter combination modes as a candidate coding parameter combination mode; the quality evaluation sub-module is used for carrying out objective quality evaluation on the initial data and the target data corresponding to each candidate coding parameter combination mode to obtain an evaluation result of the target data corresponding to each candidate coding parameter combination mode; the second combination mode selecting sub-module is used for selecting the target coding parameter combination mode from the candidate coding parameter combination modes based on the evaluation result of the target data corresponding to each candidate coding parameter combination mode and the coding rate average value corresponding to each candidate coding parameter combination mode.
In one embodiment, the second data acquisition module includes a combination mode acquisition sub-module, a coding sub-module, a data transmission sub-module and a data reception sub-module. The combination mode acquisition sub-module is configured to obtain a plurality of coding parameter combination modes for encoding the initial data according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundancy unit corresponding to the data unit, where N and M are each integers greater than or equal to 1; the coding sub-module is configured to encode the data units in the initial data according to each coding parameter combination mode to obtain the data packets under that coding parameter combination mode; the data transmission sub-module is configured to transmit the data packets under each coding parameter combination mode to the data receiving device; the data reception sub-module is configured to receive the packet-receiving waiting delays corresponding to each coding parameter combination mode fed back by the data receiving device, where the data receiving device simulates packet loss and counts the packet-receiving waiting delays while receiving the data packets corresponding to each coding parameter combination mode; and the data reception sub-module is further configured to receive the target data corresponding to each coding parameter combination mode fed back by the data receiving device.
In one embodiment, the coding combination mode obtaining submodule is further configured to obtain, according to N coding parameters for each data unit and M coding parameters for a redundancy unit corresponding to the data unit in the initial data, multiple coding combination modes for each data unit and the redundancy unit of the data unit; and obtaining a plurality of coding parameter combination modes for coding the initial data based on each data unit and the coding combination modes of the redundant units corresponding to the data unit.
In an embodiment, the initial data is initial audio data, the data unit is an audio frame, the coding parameters include a coding rate, and the coding combination mode obtaining submodule is further used for performing classification detection on each audio frame in the initial audio data to obtain an audio category of each audio frame; and obtaining a plurality of coding parameter combination modes for coding the initial audio data according to the preset code rate range corresponding to each audio type, the audio type of each audio frame, N coding code rates of each audio frame in the initial audio data and M coding code rates of redundant audio frames corresponding to the audio frames.
In one embodiment, the sample data is an audio sample, the data unit is an audio frame, and the bit determination module includes a feature extraction sub-module and a bit determination sub-module. The feature extraction submodule is used for carrying out feature extraction on the current data unit to obtain an audio feature of the current data unit, wherein the audio feature comprises at least one of a power spectrum feature and a Mel spectrum feature; the bit determining sub-module is used for inputting the preset coding code rate value, the audio characteristics of the current data unit and the packet receiving waiting time delay of each data packet into a pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of a redundant data unit corresponding to the current data unit.
In an embodiment, the apparatus further includes a data sending module, where the data sending module is configured to send, to the data receiving device, a data packet corresponding to the current data unit, so that the data receiving device decodes, when it is confirmed that a previous data unit of the current data unit is lost, second encoded data in the data packet corresponding to the current data unit, and decodes, when it is confirmed that the previous data unit of the current data unit is not lost, first encoded data in the data packet corresponding to the current data unit.
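For illustration, the receiving-side decision described above can be sketched as follows. This is a minimal sketch, not the application's implementation; the packet field names and the decoder object are assumptions.

```python
def handle_received_packet(packet, previous_unit_lost, decoder):
    """Receiver-side decision (illustrative): if the previous data unit was
    lost, decode the second encoded data carried in the current packet to
    recover it; otherwise decode the first encoded data of the current unit."""
    if previous_unit_lost:
        # The redundant payload re-encodes the previous data unit.
        return decoder.decode(packet["second_encoded_data"])
    return decoder.decode(packet["first_encoded_data"])
```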
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, wherein the program code, when executed by a processor, performs the method described above.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device obtains the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the method described above.
The embodiments of the application provide a data encoding method, apparatus, electronic device and storage medium. In the method, after the current data unit of the data to be transmitted and the respective packet-receiving waiting delays of a plurality of data packets counted by the data receiving device are acquired, the preset encoding code rate value, the current data unit and the respective packet-receiving waiting delays of the plurality of data packets are input into a pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit. The coding parameter distribution model is obtained by training an initial model on data samples; the sample label of a data sample comprises a first sample coding parameter and a second sample coding parameter, which are selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data; the target data are data obtained by decoding the sample data packets received by the data receiving device, and the sample data packets received by the data receiving device are data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss; and each coding parameter combination mode comprises a first sample coding parameter for each sample data unit in the sample data and a second sample coding parameter for the redundant sample data unit corresponding to each sample data unit. Therefore, the first coding parameter and the second coding parameter obtained with the coding parameter distribution model match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets, so that coding parameters are accurately allocated to the current data unit and the corresponding redundant data unit according to the packet-receiving waiting delays and the preset encoding code rate value, improving the reliability of data transmission.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows an application scenario diagram of a data encoding method provided by an embodiment of the present application;
fig. 2 is a schematic flow chart of a data encoding method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data encoding method according to an embodiment of the present application;
fig. 4 shows a schematic flow chart of step S210 in fig. 3;
fig. 5 shows an application scenario schematic diagram of a data encoding method according to an embodiment of the present application;
fig. 6 is a schematic diagram of another application scenario of a data encoding method according to an embodiment of the present application;
fig. 7 is a schematic diagram of another application scenario of a data encoding method according to an embodiment of the present application;
FIG. 8 is a block diagram illustrating a data encoding method according to an embodiment of the present application;
FIG. 9 is another flow chart of a data encoding method according to an embodiment of the present application;
fig. 10 shows a connection block diagram of a data encoding apparatus according to an embodiment of the present application;
fig. 11 shows a block diagram of an electronic device for performing the method of an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
With the development of communication technologies, data transmission, such as audio, video or text data transmission, is now ubiquitous. To ensure the reliability and real-time performance of information transmission, a commonly used transmission method is in-band FEC (Forward Error Correction) data transmission, where FEC refers to the capability of a communication system to still achieve error-free transmission under the influence of noise and other impairments.
Taking voice data transmission as an example, the in-band FEC of an existing speech encoder re-encodes the speech coding features of the previous frame and packages them together with the coded code stream of the current frame for sending to the receiving end for decoding. The main differences between the in-band FEC scheme of a speech encoder and a traditional out-of-band FEC scheme are as follows. Out-of-band FEC is implemented independently of the encoder and performs FEC encoding over one or more speech-encoder code streams; the FEC code stream and the speech coding code stream can be two different data streams with no mutual constraint and no bandwidth competition, so the FEC code stream size is not limited and the size of each frame's speech code stream depends only on the preset encoding code rate value. The in-band FEC code stream, by contrast, is mixed with the speech coding code stream of the current frame, and the size of the whole code stream determines the actual code rate of the final output, so the in-band FEC code stream is constrained by the preset code rate value: if the FEC code rate is higher and more bits are allocated to it, fewer bits remain for speech coding, and conversely, if fewer bits are allocated to FEC, more bits remain for speech coding. In the in-band FEC schemes of existing speech encoders, because the sum of the number of FEC coding bits and the number of coding bits of the current speech frame is limited by the preset encoding code rate, the two compete under a given preset rate: when the number of FEC coding bits is high, the number of coding bits for the current speech frame becomes small, so the speech quality after encoding and decoding is reduced, which harms the overall call experience. Experiments show that for the same audio signal with the same encoding code rate setting, comparing the PESQ objective quality MOS score with in-band FEC off and on, the score is 3.9 with in-band FEC off but only 3.0 with in-band FEC on; FEC reduces the MOS by 0.9, because once in-band FEC is enabled it occupies part of the audio coding bits and the audio quality drops significantly.
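The bit competition described above can be illustrated with simple arithmetic; all numbers below are hypothetical and only show how a fixed preset rate forces a trade-off between FEC bits and current-frame bits.

```python
# Illustrative bit-budget arithmetic for in-band FEC (hypothetical numbers).
preset_rate_bps = 24000                                      # preset encoding code rate value
frame_duration_s = 0.020                                     # one 20 ms speech frame
frame_budget_bits = int(preset_rate_bps * frame_duration_s)  # 480 bits per frame/packet

fec_bits = 180                                # bits spent re-encoding the previous frame
speech_bits = frame_budget_bits - fec_bits    # bits left for the current frame

print(frame_budget_bits, fec_bits, speech_bits)  # 480 180 300
# Raising fec_bits necessarily lowers speech_bits and vice versa, which is the
# competition that degrades current-frame quality once in-band FEC is enabled.
```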
The inventors found that the actual code stream size of current in-band FEC is related to the packet loss rate fed back by the receiving end. Taking the Opus encoder as an example, if the packet loss rate fed back by the receiving end is high, the prediction gain of the FEC prediction filter of the Opus encoder is reduced and the entropy of the quantization indices increases, so the number of FEC coding bits increases. Conversely, if the packet loss rate is low, the prediction gain of the FEC prediction filter increases and the entropy of the quantization indices decreases, so the number of FEC coding bits decreases. In other words, the number of in-band FEC redundancy bits of the Opus encoder increases as the receiving-end packet loss rate increases. However, in-band FEC is configured only according to the packet loss rate fed back by the receiving end. The packet loss rate is a statistic of the percentage of lost packets over a short period of time; it is usually very small and is a delayed measurement that only reflects the receiving end's packet-receiving condition during a period before the current moment, and it has no necessary causal relationship with the frame currently being sent. Therefore, the sizing of in-band FEC in the related art is not accurate enough.
An exemplary application executing the data encoding method provided by the embodiments of the present application is described below. The data encoding method provided by the embodiments of the present application may be applied to a server in an application environment as shown in fig. 1.
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application, and as shown in fig. 1, the application scenario includes a data transmitting end 10 and a data receiving end 20 communicatively connected to the data transmitting end 10 through a network.
The data transmitting end 10 and the data receiving end 20 may be a server, a mobile phone, a computer, a desktop computer, a tablet computer, a vehicle-mounted terminal, an intelligent television or the like, and the data transmitting end 10 and the data receiving end 20 may be provided with a client for displaying data, such as a content interaction client, an instant messaging client, an audio playing client, a video playing client or the like.
The network may be a wide area network or a local area network, or a combination of both.
Fig. 1 shows a schematic diagram in which the data transmitting end 10 and the data receiving end 20 each have a data interaction client installed, and the data transmitting end 10 transmits audio data to the data receiving end 20; the client on the data transmitting end 10 is identified as user A and the client on the data receiving end 20 is identified as user B. When user A of the data transmitting end 10 sends audio data to the data receiving end 20, the data transmitting end 10 encodes the captured audio data and sends the encoded audio data to the data receiving end 20. The specific encoding process may refer to the following steps: the data receiving end 20 may count the packet-receiving waiting delay when receiving the data packets sent by the data transmitting end 10 and feed it back to the data transmitting end 10; the data transmitting end 10 may acquire the current data unit (current audio frame) of the data to be transmitted and the respective packet-receiving waiting delays of the plurality of data packets counted by the data receiving end 20, and input the preset encoding code rate value, the current data unit and the respective packet-receiving waiting delays of the plurality of data packets into the pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit, where the redundant data unit is used to recover a data unit encoded before the current data unit; the current data unit is encoded based on the first coding parameter to obtain first encoded data, and the redundant data unit is encoded based on the second coding parameter to obtain second encoded data; finally, the first encoded data and the second encoded data are packaged to obtain the data packet corresponding to the current data unit, and the data packet is sent to the data receiving end 20, so that the data receiving end 20 again counts the packet-receiving waiting delay of the data packets and feeds it back to the data transmitting end 10, and the data transmitting end 10 again performs the steps of determining the coding parameters (bit numbers), encoding and transmitting accordingly, thereby completing the transmission of the data to be transmitted.
The coding method provided by the application can be realized based on artificial intelligence. Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar manner to human intelligence. Taking the application of artificial intelligence in machine learning as an example for illustration:
among them, machine Learning (ML) is a multi-domain interdisciplinary, and involves multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. The scheme of the application mainly utilizes machine learning to code data and code the data.
It should be understood that the above-described method steps, i.e. the application scenarios of the above-described method steps, are only illustrative and not limiting of the present solution.
To facilitate understanding of the present solution, the following concepts will be briefly described:
the coding parameters, which are parameters for data coding, determine the coding quality, and in general, the coding parameters may be bit rates, bit numbers, or other coding parameters, and in general, the larger the coding parameter values (i.e., the larger the values of the bit rates and bit numbers), the better the coding quality is indicated. Bit rate refers to the number of bits transmitted per unit time; the number of bits is defined in the present application as how many bits are needed to transmit the current data unit or the redundant data packet after encoding (compressing). Wherein, BITs (BIT) are the information volume units and are transliterated from English BIT. And the bit in the binary digit is also the measurement unit of the information quantity, which is the minimum unit of the information quantity.
A redundant data unit refers to part or all of the data in at least one data unit transmitted before the current data unit, typically the data unit immediately preceding it.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 2, fig. 2 shows a data encoding method provided by the present application, which may be applied to an electronic device, where the electronic device may be the data transmitting end, and the method includes:
step S110: and acquiring the current data unit of the data to be transmitted and the respective packet receiving waiting time delay of a plurality of data packets counted by the data receiving equipment.
The time length between the statistical time of the packet receiving waiting time delay and the acquisition time of the current data unit is within a preset time length range, and the current data unit corresponds to the same data transmission channel with each data packet.
The data to be transmitted may be video data, audio data, text data, or the like; correspondingly, the data transmitting end may be applied to live-broadcast scenarios, video or voice interaction scenarios and other real-time interaction scenarios, or to other scenarios in which data is transmitted online.
The packet-receiving waiting delay may be obtained as follows: the data transmitting end stamps each data packet with a timestamp when sending it, and when the data receiving end receives the packet it subtracts the timestamp from the current time, the difference being the packet-receiving waiting delay. Alternatively, the data receiving end stamps each data packet with a timestamp on arrival, and the packet-receiving waiting delay is the difference between the timestamp of the currently received packet and the timestamp of the previously received packet. It should be understood that the definition of the packet-receiving waiting delay may differ according to the actual situation, and the embodiments of the present application do not specifically limit it.
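A minimal sketch of the two ways of computing the packet-receiving waiting delay described above; the field names and the use of wall-clock time are assumptions for illustration only.

```python
import time

def wait_delay_from_send_timestamp(packet, now=None):
    """Variant 1: the sender stamps each packet; the delay is the receive time
    minus the send timestamp carried in the packet."""
    now = time.time() if now is None else now
    return now - packet["send_timestamp"]

def wait_delay_from_receive_timestamps(prev_receive_ts, curr_receive_ts):
    """Variant 2: the receiver stamps packets on arrival; the delay is the gap
    between the current packet's timestamp and the previous packet's timestamp."""
    return curr_receive_ts - prev_receive_ts
```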
The longer the packet-receiving waiting delay, the worse the network quality and the greater the possibility of packet loss; the shorter the delay, the better the network quality and the smaller the possibility of packet loss.
The preset time length can be 0.5 seconds, 0.8 seconds, 1 second or 1.2 seconds, etc., and the preset time length is set according to actual requirements.
The data packets received by the receiving device may be data packets obtained by the data sending device encoding data units in the data to be transmitted, or may be data packets that use the same data transmission channel as the current data unit, where using the same data transmission channel may mean that the data packets and the current data unit have the same data sending device and the same data receiving device.
Step S120: input the preset encoding code rate value, the current data unit and the respective packet-receiving waiting delays of the plurality of data packets into the pre-trained coding parameter distribution model to obtain a first coding parameter for the current data unit and a second coding parameter for the redundant data unit corresponding to the current data unit.
The first coding parameter and the second coding parameter match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets. The coding parameter distribution model is obtained by training an initial model on data samples. A data sample comprises a current sample data unit and the packet-receiving waiting delays corresponding to a plurality of sample data packets that correspond respectively to a plurality of historical sample data units, and the data sample carries a sample label. The sample label comprises a first sample coding parameter and a second sample coding parameter, which are selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data; the target data are data obtained by decoding the sample data packets received by the data receiving device, and the sample data packets received by the data receiving device are data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss. Each coding parameter combination mode comprises a first sample coding parameter for each of the plurality of sample data units in the sample data and a second sample coding parameter for the redundant sample data unit corresponding to each sample data unit.
The redundant data unit is used to recover the data unit encoded before the current data unit.
The first coding parameter and the second coding parameter are correlated with the preset encoding code rate value, and the preset encoding code rate is used to represent the actual code rate during in-band FEC data transmission.
Accordingly, the combination of the values of the first sample coding parameter and the second sample coding parameter corresponding to the same data unit can be used to represent the actual code rate adopted during transmission of the sample data packets (this actual code rate can represent the preset sample encoding code rate used in model training). That is, the sample data may further include a preset sample encoding code rate, and when the coding parameter is a coding rate, the sum of the first sample coding parameter and the second sample coding parameter can represent the preset sample encoding code rate.
The sum of the first sample coding parameter and the second sample coding parameter of each data unit corresponding to the same initial data may be a constant value. For example, if the coding parameter is a coding rate, the sum of the first sample coding rate and the second sample coding rate corresponding to the same data unit represents the actual coding rate during transmission of the sample data packet.
The pre-trained coding parameter distribution model may be a neural network model, such as a recurrent neural network, or a regression model.
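As a sketch of how the allocation step of step S120 might be wired up, assuming a trained model object exposing a predict method; none of the names below come from the application.

```python
def allocate_coding_parameters(model, preset_rate, current_unit_features, recent_wait_delays):
    """Illustrative inference call for the coding parameter distribution model.

    Inputs: the preset encoding code rate value, features of the current data
    unit, and the packet-receiving waiting delays of recent packets on the same
    channel. Outputs: the first coding parameter (current data unit) and the
    second coding parameter (corresponding redundant data unit)."""
    features = [preset_rate, *current_unit_features, *recent_wait_delays]
    first_param, second_param = model.predict(features)
    # When the coding parameter is a code rate, the two outputs are expected to
    # sum (approximately) to the preset rate, mirroring the sample labels above.
    return first_param, second_param
```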
Step S130: encode the current data unit based on the first coding parameter to obtain first encoded data, and encode the redundant data unit based on the second coding parameter to obtain second encoded data.
The coding parameters are parameters used for data encoding and determine the data coding quality. The types of coding parameters differ for different data to be transmitted; for example, when the data to be transmitted is audio data or video data, the coding parameter may be a code rate, a number of bits or a sampling rate, and when the data to be transmitted is text data, the coding parameter may be a code rate or a number of bits.
Specifically, taking the data to be transmitted as audio data as an example, the coding parameters may be a code rate or a sampling rate of speech coding, or other coding parameters. The sampling rate is the sampling precision used to sample the analog voice signal to obtain the digital voice signal, and the code rate is the data amount of the coded code stream transmitted per second. It will be appreciated that the higher the sampling rate, the more realistically the sound details in the original sound signal can be preserved, and the higher the speech quality of the decoded speech signal obtained after decoding. Similarly, the higher the bit number and code rate, the more speech details in the digital speech signal can be preserved, and the higher the speech quality of the decoded speech signal obtained after decoding. Accordingly, the first encoding parameter and the second encoding parameter may be different sampling rates, or may be different code rates or bit numbers.
Step S140: package the first encoded data and the second encoded data to obtain the data packet corresponding to the current data unit.
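Steps S130 and S140 can be sketched together as follows; the encoder object and packet field names are hypothetical, not the application's API.

```python
import time

def build_packet(encoder, current_unit, redundant_unit, first_param, second_param):
    """Encode the current data unit with the first coding parameter and the
    redundant data unit with the second coding parameter, then package both
    payloads into one data packet (illustrative sketch)."""
    first_encoded = encoder.encode(current_unit, rate=first_param)
    second_encoded = encoder.encode(redundant_unit, rate=second_param)
    return {
        "first_encoded_data": first_encoded,    # current data unit payload
        "second_encoded_data": second_encoded,  # in-band redundancy payload
        "send_timestamp": time.time(),          # used later for wait-delay statistics
    }
```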
With this method, the packet-receiving waiting delays of a plurality of data packets on the same transmission channel within a preset period before the current data unit is transmitted are acquired, and the preset encoding code rate value, the current data unit and those packet-receiving waiting delays are input into the pre-trained coding parameter distribution model to obtain the first coding parameter for the current data unit and the second coding parameter for the corresponding redundant data unit. This fully takes into account that network quality is generally related to the packet-receiving waiting delay and to the probability that the current packet is lost: the longer the packet-receiving waiting delay, the larger the packet loss probability, and the shorter the delay, the smaller the packet loss probability. Coding parameters can therefore be accurately allocated to the current data unit and the corresponding redundant data unit based on the packet-receiving waiting delays and the preset encoding code rate value. This avoids the problem in the related art of determining coding parameters from the packet loss rate alone: the packet loss rate is a delayed statistic that only reflects the receiving condition during a period before the current moment and has no necessary causal relationship with the frame currently being sent, so coding parameters set from it are not accurate enough. In addition, the pre-trained coding parameter distribution model is obtained by training an initial model on data samples, where a data sample comprises a current sample data unit and a plurality of sample data packets corresponding respectively to a plurality of historical sample data units and carries a sample label; the sample label comprises a first sample coding parameter and a second sample coding parameter selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data; the target data are data obtained by decoding the sample data packets received by the data receiving device, and the sample data packets received by the data receiving device are data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss; and each coding parameter combination mode comprises a first sample coding parameter for each sample data unit in the sample data and a second sample coding parameter for the corresponding redundant sample data unit. Therefore, the first and second coding parameters obtained with the coding parameter distribution model match the transmission conditions reflected by the packet-receiving waiting delays of the plurality of data packets, so that coding parameters are accurately allocated to the current data unit and the corresponding redundant data unit according to the packet-receiving waiting delays and the preset encoding code rate value, improving the reliability of data transmission.
Referring to fig. 3, another embodiment of the present application provides a data encoding method, which includes:
step S210: and acquiring a plurality of coding combination modes corresponding to the initial data and packet loss waiting time delay and target data of a plurality of data packets corresponding to each coding combination mode.
The packet loss time delay corresponding to each coding combination mode is obtained by adopting the coding combination mode to correspondingly code the initial data and then simulating packet loss transmission.
The initial data refers to any audio/video data which can simulate packet loss.
The various coding combinations corresponding to the initial data obtained in step S210 may be: and obtaining a plurality of coding parameter combination modes for coding the initial data according to N coding parameters of each data unit in the initial data and M coding parameters of the redundancy unit corresponding to the data unit, wherein N and M are integers greater than or equal to 1 respectively.
It should be understood that there are a plurality of pieces of initial data, and the number of data units corresponding to each piece of initial data may be the same or different. In one embodiment of the present application, if the numbers of data units corresponding to the pieces of initial data differ, the collected batch data is first divided according to a fixed window length, where the fixed window length may be 3 seconds, 5 seconds, 8 seconds or 10 seconds, so as to obtain a plurality of pieces of initial data having the same number of data units.
In one embodiment, after the initial data is obtained, the coding modes can be combined according to N coding parameters of each data unit in the initial data and M coding parameters of the redundancy unit corresponding to the data unit, so as to obtain multiple coding combination modes of each data unit and the redundancy unit of the data unit; and obtaining a plurality of coding parameter combination modes for coding the initial data based on each data unit and the coding combination modes of the redundant units corresponding to the data units.
For example, where the initial data is initial audio data and the data unit is an audio frame, step S210 may specifically obtain multiple coding-rate combination modes for encoding the audio data according to the N coding parameters (coding rates) of each audio frame in the initial audio data and the M coding parameters of the redundant frame corresponding to that audio frame, where the coding rates selectable for each audio frame and its redundant frame should be kept to a controllable number, and the sets of selectable coding rates for different audio frames may be the same or different.
For example, the coding parameters (coding rates) of the audio frames may each have N selectable values, and the coding parameters (coding rates) of the redundant frame corresponding to each audio frame may have M selectable values; since in the in-band FEC coding mode the sum of the coding rate of the current audio frame and the coding rate of its redundant frame is usually a fixed value, there may be N+M coding-rate combinations to choose from when encoding one audio frame, and when encoding an audio signal with L frames there are at most (N+M)^L coding-rate combination modes.
Illustratively, in this embodiment, for the initial audio data of one 8-second window (400 frames of signal), assuming the speech in-band FEC encoder has K selectable code rates, K = N + M, each frame has K possible code-rate configurations, so there are K^400 frame code-rate configurations (code rate combinations) in total.
It should be understood that if the types of coding rates corresponding to different audio frames are different, or the types of coding rates of redundant audio frames corresponding to different audio frames are different, the coding modes of different audio frames may be combined to obtain a final coding combination.
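A sketch of enumerating coding parameter combination modes for a short sequence of frames is given below; the candidate rates, the fixed-sum filter and the frame count are hypothetical, and the example mainly shows why the number of combinations grows exponentially with the number of frames.

```python
from itertools import product

def per_frame_pairs(primary_rates, redundant_rates, preset_rate=None):
    """Candidate (first, second) coding-rate pairs for one frame. If a preset
    total rate is given, keep only pairs whose sum matches it (the fixed-sum
    case mentioned above); otherwise keep the full grid."""
    pairs = [(p, r) for p in primary_rates for r in redundant_rates]
    if preset_rate is not None:
        pairs = [pair for pair in pairs if sum(pair) == preset_rate]
    return pairs

def all_combinations(pairs_per_frame):
    """Every coding parameter combination mode for the whole sequence: one
    (first, second) pair per frame; the count is the product of the per-frame
    choices, hence exponential in the number of frames."""
    return product(*pairs_per_frame)

# Hypothetical example: 3 frames, primary rates {16, 20, 24} kbps, redundant
# rates {4, 8} kbps, preset total rate 24 kbps.
pairs = per_frame_pairs([16000, 20000, 24000], [4000, 8000], preset_rate=24000)
combos = list(all_combinations([pairs] * 3))
print(len(pairs), len(combos))  # 2 pairs per frame -> 2**3 = 8 combinations
```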
In another embodiment, the plurality of coding combination modes corresponding to the initial data obtained in step S210 may be obtained as follows: classification detection is performed on each data unit in the initial data to obtain the class corresponding to each data unit, and a plurality of coding parameter combination modes for encoding the initial data are obtained according to the preset code rate range corresponding to each class, the class corresponding to each data unit, the N coding parameters corresponding to each data unit and the M coding parameters of the redundancy unit corresponding to that data unit.
In this manner, for each data unit, N1 first candidate coding parameters may be selected from the N coding parameters corresponding to the data unit according to the class of the data unit and the preset code rate range corresponding to that class, the values of the first candidate coding parameters lying within that preset code rate range; and M1 second candidate coding parameters may be selected from the M coding parameters of the corresponding redundancy unit according to the class of the redundancy unit and the preset code rate range corresponding to that class, the values of the second candidate coding parameters lying within that range. A plurality of coding combination modes for encoding the initial data are then obtained from the N1 first candidate coding parameters of each data unit and the M1 second candidate coding parameters of the redundancy unit corresponding to that data unit.
It should be appreciated that in such an embodiment, the types of the first candidate coding parameters corresponding to different audio frames may be different, and the types of the second candidate coding parameters of the redundant frames corresponding to different audio frames may be different.
Taking initial data as initial audio data and data units as audio frames as examples, classifying and detecting each audio frame in the initial audio data to obtain an audio class of each audio frame; and obtaining a plurality of coding rate combination modes for coding the audio data according to the preset code rate range corresponding to each audio type, the audio type of each audio frame, N coding rates of each audio frame in the initial audio data and M coding rates of the redundant audio frames corresponding to the audio frames.
Specifically, the classification detection may be chosen according to the type of initial data. For example, if the initial data is video data, frame-type detection may be performed to obtain frame types such as I frames, P frames and B frames, where different frame types may have different coding-rate value ranges. If the initial data is audio data, the audio frames may be classified by VAD detection, an audio signal classifier, or the like; for example, VAD detection can classify the audio input into speech and non-speech, or another classifier (an audio signal classifier) can classify the input signal into speech, music, noise, silence, and so on. A selectable coding-rate range can then be set for each class: for classes such as non-speech, noise or silence, the maximum coding rate may be limited to a small value, whereas the music class generally carries more information and is usually allocated a higher coding rate, so the maximum and minimum coding rates of that class are set to larger values. Limiting the code-rate range per class based on signal classification reduces the number of configurable code-rate combinations in a window.
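The class-based restriction of candidate code rates can be sketched as follows; the class names and rate ranges are assumptions chosen only to illustrate the pruning.

```python
# Hypothetical per-class code rate ranges in bit/s; the real ranges are a design choice.
CLASS_RATE_RANGE = {
    "speech":  (12000, 32000),
    "music":   (24000, 64000),   # music carries more information, so higher rates
    "noise":   (6000, 12000),
    "silence": (6000, 8000),
}

def restrict_candidates(audio_class, candidate_rates):
    """Keep only the candidate coding rates that fall inside the class's
    allowed range, shrinking the number of combinations per window."""
    lo, hi = CLASS_RATE_RANGE[audio_class]
    return [r for r in candidate_rates if lo <= r <= hi]

print(restrict_candidates("noise", [6000, 8000, 16000, 24000]))  # [6000, 8000]
```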
The packet receiving waiting delays and the target data of the plurality of data packets corresponding to each coding combination mode may specifically be obtained as follows:
For each coding parameter combination mode, the data units in the initial data are encoded according to that combination mode to obtain the data packets under that combination mode, and these data packets are sent to the data receiving device. The packet receiving waiting delays corresponding to each coding parameter combination mode, as fed back by the data receiving device, are then received; the data receiving device simulates packet loss and counts the packet receiving waiting delays while receiving the data packets of each coding parameter combination mode. The target data corresponding to each coding parameter combination mode, as fed back by the data receiving device, is received as well.
It should be understood that the simulated packet loss may be performed by the electronic device (the device that transmits the data packets), by a router, or jointly by the electronic device and the data receiving device; alternatively, the receiving device may simply feed the packets it received back to the electronic device, which then decodes them to obtain the target data.
Taking the initial data being initial audio data as an example, with the above steps each audio frame of the initial audio data can be encoded according to each coding rate combination mode and sent to the data receiving device, packet loss being simulated during transmission, and the data receiving device counts the packet receiving waiting delays of the audio data packets under each coding combination mode as well as the degraded audio data corresponding to each coding combination mode. Specifically, each audio frame and its redundant frame in the initial audio data are encoded with the corresponding coding parameters, the encoded data is sent to the data receiving device while packet loss is simulated, and when the data receiving device receives the data packets it counts the packet receiving waiting delay of each audio data packet under each coding combination mode and decodes the received packets to obtain the degraded audio data (i.e. the target data) of each coding combination mode. To approximate a real packet loss scenario as closely as possible, at least one of router packet loss and access device packet loss (at the data receiving device, the data sending device, etc.) may be used, and the packet receiving waiting delay values of the data receiving device are recorded.
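For illustration only, a minimal sketch of the loss-simulation and delay-statistics step, assuming a random-loss channel model and caller-supplied millisecond arrival times (both assumptions; the embodiment may instead use router-level or access-device-level loss):

```python
import random

def simulate_transmission(packets, loss_prob=0.1, seed=0):
    """Simulate packet loss and record, per received packet, the waiting delay
    since the previously received packet (a stand-in for the receiver's
    packet receiving waiting delay statistic)."""
    rng = random.Random(seed)
    received, waiting_delays = [], []
    last_arrival_ms = None
    for pkt in packets:                      # pkt: dict with "seq" and "arrival_ms"
        if rng.random() < loss_prob:         # packet dropped by the simulated channel
            continue
        if last_arrival_ms is not None:
            waiting_delays.append(pkt["arrival_ms"] - last_arrival_ms)
        last_arrival_ms = pkt["arrival_ms"]
        received.append(pkt)
    return received, waiting_delays
```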
Step S220: based on the initial data and the target data corresponding to each coding combination mode, one coding parameter combination mode is determined from a plurality of coding parameter combination modes as a target coding parameter combination mode of the initial data.
Step S220 may be: similarity calculation is performed between the target data of each coding combination mode and the initial data to obtain the similarity corresponding to each coding parameter combination mode, and the target coding parameter combination mode is determined from the plurality of coding combination modes based on these similarities; for example, the coding parameter combination mode with the maximum similarity is determined as the target coding parameter combination mode.
If the initial data is initial audio data and the coding parameters are coding rates, step S220 may also be carried out as follows: the coding rate corresponding to each coding combination mode is calculated; according to these coding rates, the coding rate combination modes whose coding rates satisfy a preset coding condition are selected from the plurality of coding rate combination modes as candidate coding parameter combination modes; the encoded initial audio data obtained under each candidate coding combination mode is decoded to obtain degraded audio data; quality assessment is performed between the degraded audio data (the decoded initial audio data) and the initial audio data; and the target coding rate combination mode is obtained from the quality assessment results.
It should be understood that the target coding rate combination mode may also be determined in other ways, and the specific selection manner may follow the manner in which the plurality of coding rate combination modes was obtained above.
Referring to fig. 4, in an embodiment of the present application, determining the target coding parameter combination mode from the plurality of coding parameter combination modes in step S220 may specifically include:
step S221: and calculating the average value of the coding rate corresponding to each coding parameter combination mode.
Step S222: and selecting the coding parameter combination mode with the average value of the coding rate smaller than the set threshold value of the coding rate from the multiple coding parameter combination modes as a candidate coding parameter combination mode.
The coding rate threshold may be set empirically, according to the code rate limits of the actual transmission, or according to actual requirements.
Step S223: objective quality evaluation is carried out on the initial data and the target data corresponding to each candidate coding parameter combination mode, and an evaluation result of the target data corresponding to each candidate coding parameter combination mode is obtained.
Specifically, in step S223, when the initial data is initial audio data, PESQ (Perceptual Evaluation of Speech Quality) or POLQA (Perceptual Objective Listening Quality Analysis) objective quality evaluation is performed between the initial audio data and the degraded audio data (target data) of each candidate coding parameter combination mode, giving an evaluation result for the degraded audio data of each candidate coding parameter combination mode. It should be understood that this manner of objective quality assessment is merely illustrative; other assessment methods are possible and this embodiment is not particularly limited in this respect.
Step S224: and selecting a target coding parameter combination mode from the candidate coding parameter combination modes based on the evaluation result of the target data corresponding to each candidate coding parameter combination mode and the average value of the coding code rate corresponding to each candidate coding parameter combination mode.
By taking as candidates only the coding rate combination modes whose average coding rate is below the coding rate threshold, excessive computing resources are not wasted on PESQ or POLQA objective quality evaluation between the initial audio signal and the degraded audio signal of every combination, which improves the efficiency of finding the combination that achieves the highest objective quality score at the lowest average code rate and configuring the corresponding coding rates accordingly.
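The filter-then-evaluate selection of steps S221-S224 might be sketched as follows; evaluate_quality stands in for a PESQ or POLQA implementation, and the field names are assumptions:

```python
def select_target_combination(combinations, rate_threshold, evaluate_quality):
    """combinations: list of dicts with per-frame "rates", the "reference_audio" and
    the decoded "degraded_audio". evaluate_quality(reference, degraded) -> score
    (higher is better); this callable is a placeholder for PESQ or POLQA."""
    candidates = [
        c for c in combinations
        if sum(c["rates"]) / len(c["rates"]) < rate_threshold    # step S222
    ]
    scored = [
        (evaluate_quality(c["reference_audio"], c["degraded_audio"]),
         -sum(c["rates"]) / len(c["rates"]),                     # prefer lower mean rate on ties
         c)
        for c in candidates                                      # step S223
    ]
    best = max(scored, key=lambda t: (t[0], t[1]))               # step S224
    return best[2]
```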
Step S230: and constructing a training sample based on the target coding parameter combination mode and the packet receiving waiting time delay corresponding to each of the plurality of data packets corresponding to the target coding parameter combination mode.
Specifically, a sample data unit may be selected from the data encoded with the target coding parameter combination mode and taken as the current sample data unit, and a training sample is constructed from the packet receiving waiting delays of a plurality of sample data packets preceding the current sample data unit. The training sample thus comprises the current sample data unit and the packet receiving waiting delays of a plurality of training sample data packets corresponding to a plurality of historical sample data units, where the coding positions of the historical sample data units precede the current sample data unit and the duration between the statistics time of their packet receiving waiting delays and the acquisition time of the current sample data unit lies within the preset duration range. The coding parameter assigned to the current sample data unit in the target combination mode is taken as the first sample coding parameter, and the coding parameter of the redundant sample data unit corresponding to the current sample data unit is taken as the second sample coding parameter.
Specifically, taking the training sample as an audio sample, the audio sample comprises a current audio sample frame and the packet receiving waiting delays of a plurality of audio sample data packets corresponding to a plurality of historical audio frames, where the coding positions of the historical audio frames precede the current audio sample frame and the duration between the statistics time of each historical audio frame's packet receiving waiting delay and the acquisition time of the current audio sample frame lies within the preset duration range; the sample label comprises a first coding rate corresponding to the current audio sample frame and a second coding rate of the redundant sample frame corresponding to the current audio sample frame.
When the audio samples are constructed from the target coding rate combination mode, the initial audio data and the packet receiving waiting delays of the audio data packets corresponding to the audio frames in the initial audio data, one piece of initial audio data may yield one or more audio samples, and the number of sample audio data packets in each audio sample may be the same. When one piece of initial audio data yields several audio samples, the current audio frames of those samples occupy different positions in the initial audio data.
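A sliding-window sample construction consistent with the description above might look like this sketch; the history length and field names are assumptions:

```python
def build_samples(frames, waiting_delays, target_rates, history=8):
    """frames[i]        : the i-th audio frame of the initial audio data
    waiting_delays[i]   : packet receiving waiting delay of the packet carrying frame i
    target_rates[i]     : (first_rate, second_rate) taken from the target combination mode
    history             : number of historical packets whose delays go into one sample."""
    samples = []
    for i in range(history, len(frames)):
        samples.append({
            "current_frame":  frames[i],
            "history_delays": waiting_delays[i - history:i],   # delays of earlier packets
            "label":          target_rates[i],                 # (first, second) sample coding rates
        })
    return samples
```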
It should be understood that when the data unit is video data or other data transmitted in a data stream manner, the data sample may also be constructed in a similar manner as described above, which is not described in detail in this embodiment.
Step S240: training the coding parameter allocation model based on a plurality of training samples, adjusting the model parameters so as to minimize the model loss, and obtaining the trained coding parameter allocation model.
The coding parameter allocation model may be built from one or more of the following units: CONV (convolutional blocks); LSTM (Long Short-Term Memory, a recurrent neural network suitable for sequence data); RNN (recurrent neural network, which takes sequence data as input, recurses along the evolution direction of the sequence and chains all of its recurrent units); GRU (Gated Recurrent Unit, a recurrent network of the same family as LSTM, proposed to address the gradient problems of long-term memory and back-propagation); and DENSE (fully connected layers that extract correlations between features through nonlinear transformations).
When the coding parameter allocation model is trained on a plurality of sample data, feature extraction may be performed on the current sample data unit and on the packet receiving waiting delays of the data packets corresponding to the plurality of historical data units in the sample; based on the extracted features, coding parameter allocation prediction is performed for the current sample data unit and its corresponding redundant sample data unit to obtain their respective predicted coding parameters; a loss is computed between these predictions and, respectively, the first coding rate of the current sample data unit and the second coding rate of its redundant sample data unit in the sample label, yielding the model loss; and the model parameters of the coding parameter allocation model are adjusted to minimize the model loss, giving the trained coding parameter allocation model.
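For illustration only, a minimal PyTorch-style sketch of such an allocation network: a GRU followed by two classification heads over a discrete rate table, one for the current unit and one for the redundant unit. The layer sizes, the use of PyTorch and the cross-entropy loss are assumptions; the embodiment only requires some combination of CONV/LSTM/RNN/GRU/DENSE units trained to minimize the model loss.

```python
import torch
import torch.nn as nn

class RateAllocator(nn.Module):
    def __init__(self, feat_dim, n_rates, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head_frame = nn.Linear(hidden, n_rates)       # rate choice for the current unit
        self.head_redundant = nn.Linear(hidden, n_rates)   # rate choice for the redundant unit

    def forward(self, x):            # x: (batch, time, feat_dim) features, delays, target rate
        _, h = self.rnn(x)
        h = h[-1]                    # last-layer hidden state, shape (batch, hidden)
        return self.head_frame(h), self.head_redundant(h)

def train_step(model, optimizer, x, y_frame, y_redundant):
    """y_*: index of the labelled coding rate within the discrete rate table."""
    criterion = nn.CrossEntropyLoss()
    logits_f, logits_r = model(x)
    loss = criterion(logits_f, y_frame) + criterion(logits_r, y_redundant)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```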
By adopting steps S210-S220, the combinations that do not meet the preset coding rate value (i.e. average coding rate) requirement can be filtered out, an optimal search is applied to the remaining combinations that do meet it, the optimal coding bit allocation combination (target coding rate combination) is extracted with an objective audio quality assessment tool such as PESQ or POLQA, and the result is used to train the deep-learning bit allocation network, the trained network being used for bit allocation control of in-band FEC in actual encoding. In addition, since the sample data contains the current sample data unit, the extracted features include the features of the current data unit, so during training the model can learn to allocate more reasonable coding parameters to the current data unit according to its own features, for example allocating it a higher coding rate. This avoids the situation in which, when the lost frame (the data unit carried redundantly with the current data unit) contains little information and its loss would hardly affect the decoded sound quality at the receiving end, in-band FEC is nevertheless performed to protect it at the expense of the current data unit.
Step S250: and acquiring the current data unit of the data to be transmitted and the respective packet receiving waiting time delay of a plurality of data packets counted by the data receiving equipment.
The time length between the statistical time of the packet receiving waiting time and the acquisition time of the current data unit is within the preset time length range, and the current data unit corresponds to the same data transmission channel with each data packet.
Step S260: inputting a preset coding code rate value, a current data unit and respective packet receiving waiting time delay of a plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of a redundant data unit corresponding to the current data unit.
In one embodiment, if the current data unit is an audio frame, step S260 may include: performing feature extraction on the current data unit to obtain the audio features of the current data unit, where the audio features comprise at least one of power spectrum features and Mel spectrum features; and inputting the preset coding rate value, the audio features of the current data unit and the packet receiving waiting delay of each data packet into the pre-trained coding parameter allocation model to obtain the first coding parameter of the current data unit and the second coding parameter of the redundant data unit corresponding to the current data unit.
Here, the Mel spectrum (MBF, Mel Bank Features) is obtained by passing the sound spectrum through a Mel-scale filter bank; because the raw sound spectrum is large, the filter bank reduces it to sound features of a suitable size.
The power spectrum is short for the power spectral density function, which is defined as the signal power per unit frequency band. It shows how the signal power varies with frequency, i.e. the distribution of the signal power in the frequency domain.
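As a purely illustrative sketch of the power spectrum feature mentioned above (the FFT size, the window choice and the reference to librosa are assumptions, not part of the embodiment):

```python
import numpy as np

def frame_power_spectrum(frame, n_fft=512):
    """Power spectrum of one audio frame (frame: 1-D float array of samples)."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return (np.abs(spectrum) ** 2) / n_fft        # power per frequency bin

# A Mel spectrum would additionally multiply this power spectrum by a Mel-scale
# filter bank (e.g. librosa.filters.mel), which is omitted here for brevity.
```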
It should be understood that if the data unit is a video unit, feature extraction may be performed on a current data unit (current video frame) to obtain a feature vector corresponding to the current data unit, so as to input a preset encoding code rate value, the feature vector of the current data unit, and a packet waiting delay of each data packet into a pre-trained encoding parameter allocation model to obtain a first encoding parameter of the current data unit and a second encoding parameter of a redundant data unit corresponding to the current data unit.
Step S270: and encoding the current data unit based on the first encoding parameter to obtain first encoding data, and encoding the redundant data unit based on the second encoding parameter to obtain second encoding data.
Step S280: packaging the first encoded data and the second encoded data to obtain the data packet corresponding to the current data unit.
For the specific description of steps S250-S280, reference may be made to the foregoing description of steps S110-S140, which is not repeated in this embodiment of the present application.
After the data packet corresponding to the current data unit is obtained, the method further comprises: transmitting the data packet corresponding to the current data unit to the data receiving device, so that, upon receiving this data packet, the data receiving device decodes the second encoded data in the packet when it confirms that the previous data unit of the current data unit is lost, and decodes the first encoded data in the packet when it confirms that the previous data unit of the current data unit is not lost.
It should be appreciated that, when it is confirmed that the previous data unit of the current data unit is lost, the first encoded data in the data packet corresponding to the current data unit may be decoded in addition to the second encoded data in the data packet corresponding to the current data unit.
Whether the previous data unit of the current data unit is lost may be confirmed, for example, according to the data receiving delay of the data receiving device, or according to the tag information in the current data unit and the tag information of the data units in the other received data packets. It should be understood that these manners of confirming whether the previous data unit is lost are merely illustrative; other confirmation manners are possible and are not described in detail in this embodiment.
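For illustration, the receiver-side behaviour described above could be sketched as follows; the packet fields, the sequence-number check and the decoder callable are hypothetical:

```python
def decode_packet(packet, previous_seq_received, decoder):
    """Receiver-side choice between the normal payload and the in-band FEC payload.
    packet: dict with "seq", "first_encoded" (current unit) and "second_encoded"
    (redundant copy of the previous unit); decoder(...) is an assumed codec call."""
    frames = []
    if not previous_seq_received(packet["seq"] - 1):
        # Previous unit lost: recover it first from the redundant (second) encoded data.
        frames.append(decoder(packet["second_encoded"]))
    frames.append(decoder(packet["first_encoded"]))
    return frames
```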
By obtaining a plurality of coding combination modes corresponding to the initial data together with the packet receiving waiting delays and target data of the plurality of data packets under each coding combination mode, determining one coding parameter combination mode from the plurality of modes as the target coding parameter combination mode of the initial data based on the initial data and the target data of each mode, and constructing training samples from the target coding parameter combination mode and the packet receiving waiting delays of the plurality of data packets corresponding to it, the coding parameter allocation model trained on these samples can reasonably allocate the first coding parameter and the second coding parameter to the current data unit and the redundant data unit of the data to be transmitted according to the packet receiving waiting delays of the plurality of data packets, so that the first and second coding parameters match the transmission condition reflected by those delays, thereby improving the reliability of data transmission.
In addition, in the present application the coding parameters are allocated on the basis of the packet receiving waiting delays of the data packets, which fully takes into account that network quality is normally related both to the packet receiving waiting delay and to the probability that the current packet is lost: the worse the network quality, the longer the packet receiving waiting delay and, correspondingly, the higher the probability of packet loss; the shorter the delay, the lower that probability. Coding parameters can therefore be accurately allocated to the current data unit and its redundant data unit according to the packet receiving waiting delays and the preset coding rate value. This solves the problem in the related art where coding parameters determined from the packet loss rate are inaccurate: the packet loss rate is generally unstable, it is merely a statistic of the percentage of packets lost over a recent period and hence a lagging indicator, it only reflects the receiving end's reception over a period before the current moment, and it has no causal relation to whether the frame currently being transmitted will actually be lost.
Furthermore, the above method avoids the unreasonable situation in which, although the lost frame carries little information and its loss would hardly affect the decoded sound quality at the receiving end, in-band FEC is still performed to protect it, thereby sacrificing the coding quality of the normal (possibly important) current frame.
By constructing multiple coding rate combination modes for the initial audio data, simulating packet loss, having the receiving device count the packet receiving waiting delays of the audio data packets under each coding parameter combination mode and the degraded audio data of each mode, and evaluating the degraded audio data of each coding rate combination mode with an objective quality metric such as PESQ (Perceptual Evaluation of Speech Quality) or POLQA (Perceptual Objective Listening Quality Analysis), audio samples with an optimal in-band FEC coding parameter allocation result are obtained. Training the coding parameter allocation model on these audio samples allows it to allocate coding parameters to the current audio frame and its redundant frame from the multiple packet receiving waiting delays, which improves the coding effect.
Taking the case where the data coding method is applied to the encoded transmission of audio data, such transmission may take place in scenarios such as VoIP calls, live streaming and audio broadcasting. VoIP (Voice over IP) is a voice call technology in which voice calls and multimedia conferences are carried over the Internet Protocol (IP), i.e. communication takes place over the internet. Other informal names are IP telephony, internet telephony, broadband telephony and broadband phone service.
When voice data is transmitted in a VoIP call, the transmission mode shown in fig. 5 may be adopted: after the user records a voice signal at the data sending end, the data sending end encodes the audio signal and transmits it to the data receiving end. Audio signals may be exchanged between two devices running the VoIP client as shown in fig. 6, or among multiple devices running the VoIP client as shown in fig. 7; it should be understood that, when audio signals are transmitted among at least two devices running the VoIP client, each device may act either as a data sending end or as a data receiving end.
Referring to fig. 8 and 9 in combination, a specific audio data transmission process is as follows:
Model training: first, the collected batch of audio training samples is segmented by a fixed window length, for example a window length of 8 seconds (with 20 ms per frame, one window contains 400 frames). The 400 frames of a window (the initial audio data) are encoded by a speech encoder and packet loss is then simulated, reproducing real packet loss scenarios as far as possible (for example router packet loss, access device packet loss, etc.), and the packet receiving waiting delay values of the receiving end are recorded. Assuming the normal-frame coding rate of the speech encoder has N choices and the in-band FEC redundancy coding part has M choices, each frame of data (each audio frame) has N+M coding rate choices to combine. For the initial audio data, the combinations that do not meet the preset coding rate value (i.e. average coding rate) requirement are filtered out, an optimal search is applied to the remaining combinations that do meet it, the optimal coding bit allocation combination is extracted with an objective audio quality assessment tool such as PESQ or POLQA, and training samples are constructed from the optimal coding bit allocation combination, the packet receiving waiting delays and the initial audio data. These training samples are used to train the coding parameter allocation model (a deep-learning bit allocation network), and the trained model is used for coding parameter allocation control (bit allocation control) of in-band FEC in actual encoding.
Two optimal search methods can be used: a global traversal method and a classification traversal method.
The global traversal method is as follows: taking 8 seconds as a window (a 400-frame signal), suppose the speech in-band FEC encoder has K selectable code rates, K=N1+N2. Under global traversal, each of the 400 frames may take any of the K code rate configurations. From all resulting configurations, the combinations that meet the given target coding rate (average code rate) requirement are filtered out, i.e. the sum of the code rates of the 400 frames divided by 400 gives the average code rate of the current configuration scheme, and this value must be less than or equal to the given preset coding rate (the coding rate value configured externally when the encoder is started).
The classification traversal method is as follows: the input frame signal is classified by a classification means (such as VAD detection or an audio signal classifier); for example, VAD detection divides the audio input into speech and non-speech, while other classifiers may divide the input signal into speech, music, noise, silence and so on. A selectable coding rate range is then set per class: classes without substantial content information, such as non-speech, noise or silence, may have the maximum coding rate of the class limited to a small value, whereas the music class carries more information and is usually assigned a higher coding rate, so its maximum and minimum coding rates are limited to larger values. By limiting the code rate range per class on the basis of signal classification, the total number of configurable code rate combinations in the window that meet the target code rate requirement is reduced compared with the combinations obtained by the global traversal method. Assuming the traversal finally yields N usable code rate combinations, for each group of coding rate configurations every audio input frame in the window undergoes simulated packet loss and audio encoding and decoding, the resulting degraded audio signal is output, PESQ or POLQA objective quality assessment is performed between the original input audio signal and the degraded audio signal, and based on the assessment results the configuration that gives the highest objective quality score at the lowest average code rate is taken as the optimal coding rate configuration (the target coding rate combination mode).
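As a rough illustration of the traversal idea (not the embodiment's implementation), the sketch below enumerates per-frame rate assignments and keeps those meeting the target average rate; exhaustive enumeration is shown only for a handful of frames, since the 400-frame case would require a smarter search strategy:

```python
from itertools import product

def feasible_combinations(per_frame_rates, target_avg_rate, max_frames=None):
    """per_frame_rates: one list per frame of the rates allowed for that frame
    (already restricted per class in the classification-traversal variant).
    Yields each assignment whose mean rate does not exceed the target average rate."""
    frames = per_frame_rates[:max_frames] if max_frames else per_frame_rates
    for combo in product(*frames):
        if sum(combo) / len(combo) <= target_avg_rate:
            yield combo
```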
An audio sample is then constructed from the optimal coding rate configuration, the initial audio data and the packet receiving waiting delays of the audio data packets corresponding to the audio frames in the initial audio data. The audio sample comprises a current audio sample frame and the packet receiving waiting delays of the audio sample data packets corresponding to a plurality of historical audio frames, where the coding positions of the historical audio frames precede the current audio sample frame and the duration between the statistics time of each historical audio frame's packet receiving waiting delay and the acquisition time of the current audio sample frame lies within the preset duration range; the sample label comprises a first coding rate corresponding to the current audio sample frame and a second coding rate of the redundant sample frame corresponding to the current audio sample frame. The audio sample is fed into the coding parameter allocation model, which, after Softmax processing, outputs the selection probabilities of the different coding rates, giving the first predicted coding rate of the current audio frame and the second predicted coding rate of its redundant frame under those probabilities. The loss is computed from the sample label, the selection probabilities and the first and second predicted coding rates, and the model parameters are adjusted to minimize the model loss, yielding the trained coding parameter allocation model.
After training of the coding parameter allocation model is completed, when the current audio frame and its redundant frame are to be encoded, the current data unit of the data to be transmitted and the packet receiving waiting delays of the plurality of data packets counted by the data receiving device are acquired, where the duration between the statistics time of the packet receiving waiting delays and the acquisition time of the current data unit lies within the preset duration range and the current data unit and each data packet correspond to the same data transmission channel. Feature extraction is performed on the current data unit to obtain its audio features, which comprise at least one of power spectrum features and Mel spectrum features. The preset coding rate value, the audio features of the current data unit and the packet receiving waiting delay of each data packet are input into the pre-trained coding parameter allocation model to obtain the first coding parameter of the current data unit and the second coding parameter of the redundant data unit corresponding to the current data unit, the redundant data unit being used to recover the data unit encoded before the current data unit. The current data unit is encoded based on the first coding parameter to obtain first encoded data, the redundant data unit is encoded based on the second coding parameter to obtain second encoded data, and the first and second encoded data are packed to obtain the data packet corresponding to the current data unit. The data packets are transmitted to the data receiving end in the form of a code stream.
It should be noted that, before the current audio frame is encoded with the allocated coding parameters, the audio features of the current audio frame may be extracted based on a speech model, quantized and stored, so that when the next audio frame is transmitted the stored frame can be encoded as its redundant frame.
After feature extraction and feature quantization of the current audio frame yield the quantized audio frame, the current audio frame can be encoded based on its first coding parameter to obtain the first encoded data, the redundant frame corresponding to the current audio frame (the previous audio frame) can be encoded based on the second coding parameter to obtain the second encoded data, and the first and second encoded data are packed into a data packet, which is then output; the data sending end may output the data packet in the form of a code stream.
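A minimal sketch of the sender-side packing described above, assuming a generic encoder(frame, rate) interface (hypothetical, not the embodiment's codec API):

```python
def build_packet(current_frame, stored_previous_frame, first_rate, second_rate, encoder):
    """encoder(frame, rate) -> bytes is an assumed codec interface.
    The previous frame (quantized and stored when it was sent) is re-encoded at the
    redundant rate and carried in-band together with the current frame."""
    first_encoded = encoder(current_frame, first_rate)
    second_encoded = encoder(stored_previous_frame, second_rate)
    return {"first_encoded": first_encoded, "second_encoded": second_encoded}
```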
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
Fig. 10 shows a data encoding apparatus 300 according to an embodiment. The data encoding apparatus 300 includes: a first data acquisition module 310, a bit determination module 320, an encoding module 330 and a packing module 340. The first data acquisition module 310 is configured to acquire the current data unit of the data to be transmitted and the packet receiving waiting delays of a plurality of data packets counted by the data receiving device, where the duration between the statistics time of the packet receiving waiting delays and the acquisition time of the current data unit lies within a preset duration range, and the current data unit and each data packet correspond to the same data transmission channel. The bit determination module 320 is configured to input a preset coding rate value, the current data unit and the packet receiving waiting delays of the plurality of data packets into a pre-trained coding parameter allocation model to obtain a first coding parameter of the current data unit and a second coding parameter of the redundant data unit corresponding to the current data unit, where the first and second coding parameters are adapted to the transmission condition reflected by the packet receiving waiting delays of the plurality of data packets. The coding parameter allocation model is obtained by training an initial model on data samples; a data sample comprises a current sample data unit and the packet receiving waiting delays of a plurality of sample data packets corresponding to a plurality of historical sample data units, and carries a sample label comprising a first sample coding parameter and a second sample coding parameter. The first and second sample coding parameters are selected from a plurality of coding parameter combination modes based on the data sample and a plurality of pieces of target data; the target data are data obtained by decoding the sample data packets received by the data receiving device, those packets being obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss; each coding parameter combination mode comprises a first sample coding parameter for each of the plurality of sample data units in the sample data and a second sample coding parameter for the redundant sample data unit corresponding to each sample data unit. The encoding module 330 is configured to encode the current data unit based on the first coding parameter to obtain first encoded data, and to encode the redundant data unit based on the second coding parameter to obtain second encoded data. The packing module 340 is configured to pack the first encoded data and the second encoded data to obtain the data packet corresponding to the current data unit.
In an embodiment, the apparatus 300 further includes a second data acquisition module, a combination mode determination module, a sample construction module and a model training module. The second data acquisition module is configured to acquire a plurality of coding combination modes corresponding to the initial data together with the packet receiving waiting delays and target data of the plurality of data packets under each coding combination mode, where the delays corresponding to each coding combination mode are obtained by encoding the initial data with that combination mode and then simulating packet loss during transmission. The combination mode determination module is configured to determine one coding parameter combination mode from the plurality of coding parameter combination modes as the target coding parameter combination mode of the initial data, based on the initial data and the target data of each coding combination mode. The sample construction module is configured to construct a training sample from the target coding parameter combination mode and the packet receiving waiting delays of the plurality of data packets corresponding to it, where the training sample comprises a current sample data unit and the packet receiving waiting delays of a plurality of training sample data packets corresponding to a plurality of historical sample data units, the coding positions of the historical sample data units precede the current sample data unit, the duration between the statistics time of each historical sample data unit's packet receiving waiting delay and the acquisition time of the current sample data unit lies within the preset duration range, and the sample label comprises a first sample coding parameter corresponding to the current sample data unit and a second sample coding parameter of the redundant sample data unit corresponding to the current sample data unit. The model training module is configured to train the coding parameter allocation model based on a plurality of training samples, adjusting its model parameters so as to minimize the model loss, to obtain the trained coding parameter allocation model.
In one embodiment, the coding parameter is a coding rate, the initial data is audio data, the data unit is an audio frame, and the combination mode determining module includes a mean value calculating sub-module, a first combination mode selecting sub-module, a quality evaluating sub-module, and a second combination mode selecting sub-module. The average value calculation sub-module is used for calculating the average value of the coding code rate corresponding to each coding parameter combination mode; the first combination mode selecting submodule is used for selecting a coding parameter combination mode with the average value of the coding rate smaller than a coding rate set threshold value from a plurality of coding parameter combination modes as a candidate coding parameter combination mode; the quality evaluation sub-module is used for carrying out objective quality evaluation on the initial data and the target data corresponding to each candidate coding parameter combination mode to obtain an evaluation result of the target data corresponding to each candidate coding parameter combination mode; the second combination mode selecting sub-module is used for selecting the target coding parameter combination mode from the candidate coding parameter combination modes based on the evaluation result of the target data corresponding to each candidate coding parameter combination mode and the coding rate average value corresponding to each candidate coding parameter combination mode.
In one embodiment, the second data acquisition module includes a combination mode acquisition sub-module, a coding sub-module, a data transmission sub-module and a data reception sub-module. The combination mode acquisition sub-module is configured to obtain a plurality of coding parameter combination modes for coding the initial data according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundancy unit corresponding to the data unit, where N and M are each integers greater than or equal to 1. The coding sub-module is configured to encode the data units in the initial data according to each coding parameter combination mode to obtain the data packets under that combination mode. The data transmission sub-module is configured to send the data packets under each coding parameter combination mode to the data receiving device. The data reception sub-module is configured to receive the packet receiving waiting delays corresponding to each coding parameter combination mode fed back by the data receiving device, the data receiving device simulating packet loss and counting the packet receiving waiting delays while receiving the data packets of each coding parameter combination mode; the data reception sub-module is further configured to receive the target data corresponding to each coding parameter combination mode fed back by the data receiving device.
In one embodiment, the combination mode acquisition sub-module is further configured to obtain, according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundancy unit corresponding to the data unit, a plurality of coding combination modes for each data unit and its redundancy unit, and to obtain a plurality of coding parameter combination modes for coding the initial data based on the coding combination modes of each data unit and its corresponding redundancy unit.
In one embodiment, the initial data is initial audio data, the data units are audio frames and the coding parameters include coding rates, and the combination mode acquisition sub-module is further configured to perform classification detection on each audio frame in the initial audio data to obtain the audio class of each audio frame, and to obtain a plurality of coding parameter combination modes for coding the initial audio data according to the preset code rate range corresponding to each audio class, the audio class of each audio frame, the N coding rates of each audio frame in the initial audio data and the M coding rates of the redundant audio frame corresponding to that audio frame.
In one embodiment, the sample data is an audio sample, the data unit is an audio frame, and the bit determination module 320 includes a feature extraction sub-module and a bit determination sub-module. The feature extraction sub-module is used for carrying out feature extraction on the current data unit to obtain an audio feature of the current data unit, wherein the audio feature comprises at least one of a power spectrum feature and a Mel spectrum feature; the bit determining sub-module is used for inputting a preset coding code rate value, the audio characteristics of the current data unit and the packet receiving waiting time delay of each data packet into the pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of the redundant data unit corresponding to the current data unit.
In an embodiment, the apparatus 300 further includes a data sending module, configured to send a data packet corresponding to the current data unit to the data receiving device, so that the data receiving device decodes the second encoded data in the data packet corresponding to the current data unit when it is confirmed that the previous data unit of the current data unit is lost, and decodes the first encoded data in the data packet corresponding to the current data unit when it is confirmed that the previous data unit of the current data unit is not lost.
An electronic device according to the present application will be described with reference to fig. 11.
Referring to fig. 11, based on the data encoding method provided by the foregoing embodiment, another electronic device 100 including a processor 102 capable of executing the foregoing method is provided in the embodiment of the present application, where the electronic device 100 may be a server or a terminal device, and the terminal device may be a smart phone, a tablet computer, a computer or a portable computer.
The electronic device 100 also includes a memory 104. The memory 104 stores therein a program capable of executing the contents of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
Processor 102 may include one or more cores for processing data and a message matrix unit, among other things. The processor 102 utilizes various interfaces and lines to connect various portions of the overall electronic device 100, perform various functions of the electronic device 100, and process data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 104, and invoking data stored in the memory 104. Alternatively, the processor 102 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 102 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for being responsible for rendering and drawing of display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 102 and may be implemented solely by a single communication chip.
The memory 104 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the various method embodiments described above, and the like. The storage data area may also store data acquired by the electronic device 100 in use (e.g., one or more of target write vectors, cluster vectors, sample data, etc.), and so forth.
The electronic device 100 may further include a network module and a screen, where the network module is configured to receive and transmit electromagnetic waves, and implement mutual conversion between the electromagnetic waves and the electrical signals, so as to communicate with a communication network or other devices, such as an audio playing device. The network module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and the like. The network module may communicate with various networks such as the internet, intranets, wireless networks, or with other devices via wireless networks. The wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. The screen may display interface content and perform data interaction.
In some embodiments, the electronic device 100 may further include: a peripheral interface 106 and at least one peripheral device. The processor 102, memory 104, and peripheral interface 106 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral interface 106 via a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of the radio frequency assembly 108, the positioning assembly 112, the camera 114, the audio assembly 116, the display screen 118, the power supply 122, and the like.
The peripheral interface 106 may be used to connect at least one Input/Output (I/O) related peripheral device to the processor 102 and the memory 104. In some embodiments, the processor 102, the memory 104, and the peripheral interface 106 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 102, the memory 104, and the peripheral interface 106 may be implemented on separate chips or circuit boards, as embodiments of the application are not limited in this respect.
The Radio Frequency (RF) component 108 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency component 108 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency component 108 converts electrical signals to electromagnetic signals for transmission or converts received electromagnetic signals to electrical signals. Optionally, the radio frequency assembly 108 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency component 108 can communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, generation mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency component 108 may also include NFC (Near Field Communication, short range wireless communication) related circuitry, which is not limiting of the application.
The location component 112 is used to locate the current geographic location of the electronic device to enable navigation or LBS (LocationBased Service, location-based services). The positioning component 112 may be a positioning component based on the united states GPS (GlobalPositioning System ), beidou system or galileo system.
The camera 114 is used to capture images or video. Optionally, the camera 114 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the electronic device 100, and the rear camera is disposed on the back of the electronic device 100. In some embodiments, the at least two rear cameras are any one of a main camera, a depth camera, a wide-angle camera and a tele camera, so as to realize that the main camera and the depth camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting and Virtual Reality (VR) shooting function or other fusion shooting functions. In some embodiments, camera 114 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The dual-color temperature flash lamp refers to a combination of a warm light flash lamp and a cold light flash lamp, and can be used for light compensation under different color temperatures.
The audio component 116 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 102 for processing, or inputting the electric signals to the radio frequency component 108 for voice communication. For purposes of stereo acquisition or noise reduction, the microphone may be multiple and separately disposed at different locations of the electronic device 100. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 102 or the radio frequency assembly 108 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio component 114 may also include a headphone jack.
The display screen 118 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 118 is a touch display screen, the display screen 118 also has the ability to collect touch signals at or above the surface of the display screen 118. The touch signal may be input to the processor 102 as a control signal for processing. At this point, the display screen 118 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 118 may be one, providing a front panel of the electronic device 100; in other embodiments, the display screen 118 may be at least two, respectively disposed on different surfaces of the electronic device 100 or in a folded design; in still other embodiments, the display screen 118 may be a flexible display screen disposed on a curved surface or a folded surface of the electronic device 100. Even more, the display screen 118 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display screen 118 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode), or other materials.
The power supply 122 is used to power the various components in the electronic device 100. The power source 122 may be alternating current, direct current, disposable or rechargeable. When the power source 122 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
The embodiment of the application also provides a structural block diagram of the computer readable storage medium. The computer readable medium has stored therein program code which is callable by a processor to perform the method described in the method embodiments described above.
The computer readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium comprises a non-volatile computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium has storage space for program code to perform any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code may be compressed, for example, in a suitable form.
Embodiments of the present application also provide a computer program product or computer program that includes computer instructions stored in a computer readable storage medium. A processor of a computer device reads the computer instructions from the computer readable storage medium and executes them, causing the computer device to perform the methods described in the various alternative implementations above.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A method of encoding data, the method comprising:
acquiring a current data unit of data to be transmitted and respective packet receiving waiting time delays of a plurality of data packets counted by a data receiving device, wherein the duration between the time at which the packet receiving waiting time delays are counted and the time at which the current data unit is acquired is within a preset duration range, and the current data unit and each of the data packets correspond to the same data transmission channel;
inputting a preset coding code rate value, the current data unit, and the respective packet receiving waiting time delays of the plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of a redundant data unit corresponding to the current data unit, wherein the first coding parameter and the second coding parameter are matched with a transmission condition reflected by the packet receiving waiting time delays of the plurality of data packets; the coding parameter distribution model is obtained by training an initial model based on data samples, each data sample comprising a current sample data unit and a plurality of sample data packets respectively corresponding to a plurality of historical sample data units, the data sample carries a sample label, the sample label comprises a first sample coding parameter and a second sample coding parameter, the first sample coding parameter and the second sample coding parameter are selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data, the target data are data obtained by decoding the sample data packets received by a data receiving device, the sample data packets received by the data receiving device are data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss, and each coding parameter combination mode comprises a first sample coding parameter of each of the plurality of sample data units in the sample data and a second sample coding parameter of the redundant data unit corresponding to that sample data unit;
encoding the current data unit based on the first coding parameter to obtain first encoded data, and encoding the redundant data unit based on the second coding parameter to obtain second encoded data;
and packetizing the first encoded data and the second encoded data to obtain a data packet corresponding to the current data unit.
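For illustration only, the following Python sketch walks through the flow of claim 1 under assumed names: the heuristic allocate_rates stands in for the pre-trained coding parameter distribution model, and encode stands in for a rate-controlled audio encoder; neither reflects the actual implementation in the application. The stand-in heuristic simply shifts more of the rate budget toward the redundant payload as the observed packet receiving waiting time delays grow, mirroring the matching of coding parameters to transmission conditions described in the claim.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Packet:
    primary: bytes      # first encoded data (the current data unit)
    redundant: bytes    # second encoded data (the redundant data unit)

def allocate_rates(target_kbps: float, wait_delays_ms: List[float]) -> Tuple[float, float]:
    """Stand-in for the pre-trained coding parameter distribution model:
    longer observed waits shift more of the rate budget to redundancy."""
    congestion = min(sum(wait_delays_ms) / (len(wait_delays_ms) * 100.0), 1.0)
    redundant_kbps = target_kbps * (0.2 + 0.3 * congestion)
    return target_kbps - redundant_kbps, redundant_kbps

def encode(frame: List[float], kbps: float) -> bytes:
    """Stand-in for a rate-controlled audio encoder."""
    return bytes(int(kbps))   # placeholder payload whose size tracks the rate

def build_packet(current_frame: List[float], redundant_frame: List[float],
                 target_kbps: float, wait_delays_ms: List[float]) -> Packet:
    first_rate, second_rate = allocate_rates(target_kbps, wait_delays_ms)
    return Packet(primary=encode(current_frame, first_rate),
                  redundant=encode(redundant_frame, second_rate))

pkt = build_packet([0.0] * 960, [0.0] * 960, target_kbps=32.0,
                   wait_delays_ms=[40, 55, 70, 90])
```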
2. The method according to claim 1, wherein the training process of the coding parameter distribution model comprises:
acquiring a plurality of coding parameter combination modes corresponding to initial data, and packet receiving waiting time delays of a plurality of data packets and target data corresponding to each coding parameter combination mode, wherein the packet receiving waiting time delays corresponding to each coding parameter combination mode are obtained by encoding the initial data with that coding parameter combination mode and then simulating lossy transmission;
determining, based on the initial data and the target data corresponding to each coding parameter combination mode, one coding parameter combination mode from the plurality of coding parameter combination modes as a target coding parameter combination mode of the initial data;
constructing a training sample based on the target coding parameter combination mode and the packet receiving waiting time delays of the plurality of data packets corresponding to the target coding parameter combination mode;
and training the initial model based on a plurality of training samples to adjust its model parameters, so as to obtain the trained coding parameter distribution model.
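A minimal sketch of the training loop in claim 2, assuming the label for each sample is a pair of code rates and using an ordinary least-squares regressor as a stand-in for the initial model and its parameter optimization; the real model architecture, features, and loss are not specified in the claim.

```python
import numpy as np

def build_sample(wait_delays_ms, target_first_kbps, target_second_kbps):
    """One training sample: observed waits -> the rates of the selected
    target coding parameter combination mode (the sample label)."""
    x = np.asarray(wait_delays_ms, dtype=float)
    y = np.array([target_first_kbps, target_second_kbps], dtype=float)
    return x, y

def train(samples):
    X = np.stack([x for x, _ in samples])        # (n_samples, n_delays)
    Y = np.stack([y for _, y in samples])        # (n_samples, 2)
    X = np.hstack([X, np.ones((len(X), 1))])     # bias term
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)    # minimizes squared error
    return W

samples = [build_sample([40, 50, 60], 24.0, 8.0),
           build_sample([90, 110, 95], 18.0, 14.0)]
W = train(samples)                               # stand-in "trained" parameters
```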
3. The method according to claim 2, wherein the coding parameter is a coding code rate, the initial data is audio data, the data unit is an audio frame, and the determining, based on the initial data and the target data corresponding to each coding parameter combination mode, one coding parameter combination mode from the plurality of coding parameter combination modes as the target coding parameter combination mode of the initial data comprises:
calculating an average value of the coding code rate corresponding to each coding parameter combination mode;
selecting, from the plurality of coding parameter combination modes, coding parameter combination modes whose average coding code rate is smaller than a set coding code rate threshold as candidate coding parameter combination modes;
performing objective quality evaluation on the initial data and the target data corresponding to each candidate coding parameter combination mode to obtain an evaluation result of the target data corresponding to each candidate coding parameter combination mode;
and selecting a target coding parameter combination mode from the candidate coding parameter combination modes based on the evaluation result of the target data corresponding to each candidate coding parameter combination mode and the average value of the coding code rate corresponding to each candidate coding parameter combination mode.
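The selection step of claim 3 can be illustrated as below; the rate threshold, the quality function, and the tie-breaking rule are assumptions for the sketch, not values taken from the application.

```python
def select_target_combination(combos, rate_threshold_kbps, quality_of):
    """combos: dicts with per-frame 'rates' (kbps) and the 'decoded' target data."""
    candidates = [c for c in combos
                  if sum(c["rates"]) / len(c["rates"]) < rate_threshold_kbps]
    # Prefer the best objective quality score; break ties with the lower average rate.
    return max(candidates, key=lambda c: (quality_of(c["decoded"]),
                                          -sum(c["rates"]) / len(c["rates"])))

combos = [{"rates": [24, 24, 8], "decoded": "a"},
          {"rates": [32, 32, 12], "decoded": "b"},
          {"rates": [16, 16, 6], "decoded": "c"}]
best = select_target_combination(combos, rate_threshold_kbps=22.0,
                                 quality_of=lambda d: {"a": 4.1, "b": 4.3, "c": 3.6}[d])
# "b" exceeds the rate threshold; "a" wins on quality among the candidates.
```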
4. The method according to claim 2, wherein the acquiring the plurality of coding parameter combination modes corresponding to the initial data, and the packet receiving waiting time delays of the plurality of data packets and the target data corresponding to each coding parameter combination mode comprises:
obtaining, according to N coding parameters of each data unit in the initial data and M coding parameters of the redundant unit corresponding to the data unit, a plurality of coding parameter combination modes for coding the initial data, wherein N and M are each an integer greater than or equal to 1;
for each coding parameter combination mode, coding the data units in the initial data according to the coding parameter combination mode to obtain data packets under that coding parameter combination mode;
transmitting the data packets under each coding parameter combination mode to the data receiving device;
receiving the packet receiving waiting time delays corresponding to each coding parameter combination mode fed back by the data receiving device, wherein the data receiving device simulates packet loss and counts the packet receiving waiting time delays in the process of receiving the data packets corresponding to each coding parameter combination mode;
and receiving the target data corresponding to each coding parameter combination mode fed back by the data receiving device.
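A toy sketch of the simulated lossy reception used in claim 4 to collect per-unit waiting statistics; the loss model (independent drops) and the assumption that a lost unit becomes available one frame interval later, from the redundant copy in the next packet, are illustrative only.

```python
import random

def simulate_receive_wait(num_units: int, loss_prob: float = 0.1,
                          frame_ms: int = 20, seed: int = 0) -> list:
    """For each data unit, return the extra time the receiver waited for it:
    0 ms when its own packet arrives, one frame interval when that packet is
    dropped and the unit is recovered from the redundant copy in the next packet."""
    rng = random.Random(seed)
    return [frame_ms if rng.random() < loss_prob else 0 for _ in range(num_units)]

print(simulate_receive_wait(10, loss_prob=0.3))  # lost units show one frame interval
```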
5. The method according to claim 4, wherein the obtaining, according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundant unit corresponding to the data unit, a plurality of coding parameter combination modes for coding the initial data comprises:
obtaining a plurality of coding combination modes of each data unit and its redundant unit according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundant unit corresponding to the data unit;
and obtaining the plurality of coding parameter combination modes for coding the initial data based on the coding combination modes of each data unit and its corresponding redundant unit.
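Claim 5's enumeration amounts to taking the N x M pairings of candidate parameters for a data unit and its redundant unit, and then combining the per-unit pairings across the sequence; a sketch with illustrative rate values:

```python
from itertools import product

unit_rates = [16.0, 24.0, 32.0]        # N candidate rates per data unit
redundant_rates = [6.0, 12.0]          # M candidate rates per redundant unit

per_unit_pairs = list(product(unit_rates, redundant_rates))      # N * M pairings
# For a short 3-unit sequence, each combination mode assigns one pairing per unit.
sequence_combinations = list(product(per_unit_pairs, repeat=3))
print(len(per_unit_pairs), len(sequence_combinations))           # 6, 216
```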
6. The method according to claim 4, wherein the initial data is initial audio data, the data units are audio frames, the coding parameters include coding code rates, and the obtaining, according to the N coding parameters of each data unit in the initial data and the M coding parameters of the redundant unit corresponding to the data unit, a plurality of coding parameter combination modes for coding the initial data comprises:
classifying and detecting each audio frame in the initial audio data to obtain the audio category of each audio frame;
and obtaining a plurality of coding parameter combination modes for coding the initial audio data according to a preset code rate range corresponding to each audio category, the audio category of each audio frame, N coding code rates of each audio frame in the initial audio data, and M coding code rates of the redundant audio frame corresponding to each audio frame.
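A sketch of the per-category constraint in claim 6: candidate code rates for a frame are restricted to the preset range of its audio category before combinations are formed. The category names, rate ranges, and the trivial classifier below are placeholders, not values from the application.

```python
CLASS_RATE_RANGE_KBPS = {"silence": (2, 8), "speech": (12, 32), "music": (24, 64)}

def classify(frame):
    """Stand-in for the audio-frame classifier; a real system would use an
    energy/spectral or learned detector."""
    return "speech"

def candidate_rates_for(frame, all_rates):
    lo, hi = CLASS_RATE_RANGE_KBPS[classify(frame)]
    return [r for r in all_rates if lo <= r <= hi]

print(candidate_rates_for([0.0] * 960, [4, 8, 16, 24, 32, 48]))  # [16, 24, 32]
```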
7. The method according to claim 1, wherein the sample data are audio samples, the data unit is an audio frame, and the inputting the preset coding code rate value, the current data unit, and the respective packet receiving waiting time delays of the plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of a redundant data unit corresponding to the current data unit comprises:
extracting features of the current data unit to obtain audio features of the current data unit, wherein the audio features comprise at least one of power spectrum features and Mel spectrum features;
inputting the preset coding code rate value, the audio features of the current data unit, and the packet receiving waiting time delay of each data packet into the pre-trained coding parameter distribution model to obtain the first coding parameter of the current data unit and the second coding parameter of the redundant data unit corresponding to the current data unit.
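Power-spectrum features of the kind mentioned in claim 7 can be computed as below; the window, FFT size, and frame length are illustrative choices, and Mel-spectrum features could be obtained analogously (for example with librosa.feature.melspectrogram).

```python
import numpy as np

def power_spectrum(frame, n_fft=1024):
    """Per-frame power spectrum: window the frame, take the real FFT,
    and return the normalized squared magnitudes."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return (np.abs(spectrum) ** 2) / n_fft

features = power_spectrum(np.zeros(960))   # one 20 ms frame at 48 kHz, for example
print(features.shape)                      # (513,)
```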
8. The method of claim 1, wherein after the first encoded data and the second encoded data are packetized to obtain the data packet corresponding to the current data unit, the method further comprises:
and sending the data packet corresponding to the current data unit to the data receiving device, so that the data receiving device decodes the second encoded data in the data packet corresponding to the current data unit when the data packet corresponding to the current data unit is received and the previous data unit of the current data unit is confirmed to be lost, and decodes the first encoded data in the data packet corresponding to the current data unit when it is confirmed that the data packet corresponding to the current data unit is not lost.
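A minimal sketch of the receiver-side behaviour described in claim 8, assuming the packet carries a primary payload for the current data unit and a redundant payload usable when the previous unit was lost; the field names and the decode callable are placeholders.

```python
from collections import namedtuple

Packet = namedtuple("Packet", ["primary", "redundant"])  # illustrative packet layout

def handle_packet(packet, prev_unit_lost, decode):
    """If the previous data unit was lost, also decode the redundant (second)
    payload of this packet; the primary (first) payload is decoded whenever
    the packet itself has arrived, i.e. is confirmed not lost."""
    out = []
    if prev_unit_lost:
        out.append(decode(packet.redundant))   # recover the lost previous unit
    out.append(decode(packet.primary))         # the current data unit
    return out

frames = handle_packet(Packet(b"cur", b"prev"), prev_unit_lost=True, decode=bytes.decode)
print(frames)   # ['prev', 'cur']
```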
9. A data encoding apparatus, the apparatus comprising:
a first data acquisition module, configured to acquire a current data unit of data to be transmitted and respective packet receiving waiting time delays of a plurality of data packets counted by a data receiving device, wherein the duration between the time at which the packet receiving waiting time delays are counted and the time at which the current data unit is acquired is within a preset duration range, and the current data unit and each of the data packets correspond to the same data transmission channel;
a bit determination module, configured to input a preset coding code rate value, the current data unit, and the respective packet receiving waiting time delays of the plurality of data packets into a pre-trained coding parameter distribution model to obtain a first coding parameter of the current data unit and a second coding parameter of a redundant data unit corresponding to the current data unit, wherein the first coding parameter and the second coding parameter are matched with a transmission condition reflected by the packet receiving waiting time delays of the plurality of data packets; the coding parameter distribution model is obtained by training an initial model based on data samples, each data sample comprising a current sample data unit and a plurality of sample data packets respectively corresponding to a plurality of historical sample data units, the data sample carries a sample label, the sample label comprises a first sample coding parameter and a second sample coding parameter, the first sample coding parameter and the second sample coding parameter are selected from a plurality of coding parameter combination modes based on the sample data and a plurality of pieces of target data, the target data are data obtained by decoding the sample data packets received by the data receiving device, the sample data packets received by the data receiving device are data packets obtained by encoding the sample data with a coding parameter combination mode and simulating packet loss, and each coding parameter combination mode comprises a first sample coding parameter of each of the plurality of sample data units in the sample data and a second sample coding parameter of the redundant data unit corresponding to that sample data unit;
an encoding module, configured to encode the current data unit based on the first coding parameter to obtain first encoded data, and encode the redundant data unit based on the second coding parameter to obtain second encoded data;
and a packaging module, configured to packetize the first encoded data and the second encoded data to obtain a data packet corresponding to the current data unit.
10. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the method of any of claims 1-9.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a program code, which is callable by a processor for performing the method according to any one of claims 1-9.
12. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1-9.
CN202310574455.3A 2023-05-19 2023-05-19 Data encoding method, device, electronic equipment and storage medium Pending CN116980075A (en)

Priority Application

Application Number: CN202310574455.3A
Priority Date / Filing Date: 2023-05-19
Title: Data encoding method, device, electronic equipment and storage medium

Publication

Publication Number: CN116980075A
Publication Date: 2023-10-31

Family ID: 88473869

Country: CN


Legal Events

Date Code Title Description
PB01 Publication