CN116189706A - Data transmission method, device, electronic equipment and computer readable storage medium - Google Patents


Publication number
CN116189706A
Authority
CN
China
Prior art keywords
data
audio
noise
environmental noise
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111435155.4A
Other languages
Chinese (zh)
Inventor
杨伟明
王少鸣
郭润增
洪哲鸣
唐惠忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111435155.4A priority Critical patent/CN116189706A/en
Publication of CN116189706A publication Critical patent/CN116189706A/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/18 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L25/24 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/30 Speech or voice analysis techniques characterised by the analysis technique, using neural networks
    • G10L25/51 Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques specially adapted for comparison or discrimination, for measuring the quality of voice signals
    • G10L25/84 Detection of presence or absence of voice signals, for discriminating voice from noise

Abstract

Embodiments of the invention disclose a data transmission method and device, an electronic device, and a computer-readable storage medium. After data to be transmitted and the current environmental noise are obtained, features are extracted from the current environmental noise to obtain its audio features; the environmental noise type of the current environmental noise is determined from those audio features; the noise type is added to the data to be transmitted to obtain target transmission data; and the target transmission data is converted into audio data and played, so that a receiving terminal can recover the data to be transmitted from the audio data according to the environmental noise type. This scheme can improve the success rate of data transmission.

Description

Data transmission method, device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular to a data transmission method and device, an electronic device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of Internet technology, data can be transmitted quickly between terminals over a serial connection. However, when the serial port or cable fails, data can no longer be transmitted. To address this problem with serial communication, existing data transmission methods can send the data to be transmitted to a receiving terminal as sound waves.
In researching and practicing the prior art, the inventors found that data transmitted as sound waves is easily disturbed by external environmental noise, so that the transmission data the receiving terminal recovers from the captured audio is inaccurate or cannot be parsed at all, causing the transmission to fail; the success rate of such data transmission is therefore low.
Disclosure of Invention
Embodiments of the invention provide a data transmission method and device, an electronic device, and a computer-readable storage medium, which can improve the success rate of data transmission.
A data transmission method, comprising:
acquiring data to be transmitted and current environmental noise, wherein the current environmental noise is sound collected in real time in the current transmission environment;
extracting the characteristics of the current environmental noise to obtain the audio characteristics of the current environmental noise;
determining an ambient noise type of the current ambient noise based on the audio characteristics;
adding the environmental noise type to the data to be transmitted to obtain target transmission data;
and converting the target transmission data into audio data and playing the audio data so that a receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type.
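The claimed steps can be sketched end to end in a few lines. Everything below is illustrative: the helper names (`classify_noise`, `encode_to_frequencies`), the amplitude threshold, and the byte-to-frequency mapping are assumptions for the sketch, not taken from the patent.

```python
# Hypothetical sketch of the claimed sender-side flow.

def classify_noise(noise_samples):
    """Stand-in noise classifier: loud average amplitude -> 'subway', else 'normal'."""
    avg = sum(abs(s) for s in noise_samples) / max(len(noise_samples), 1)
    return "subway" if avg > 0.5 else "normal"

def encode_to_frequencies(payload, base_hz=1000, step_hz=50):
    """Map each byte of the payload to one audio frequency point (a toy mapping)."""
    return [base_hz + b * step_hz for b in payload]

def build_target_transmission(data, noise_samples):
    """Prepend the detected ambient-noise type to the data, then encode it."""
    noise_type = classify_noise(noise_samples)
    target = noise_type.encode("ascii") + b"|" + data
    return noise_type, encode_to_frequencies(target)

noise_type, freqs = build_target_transmission(b"AB", [0.9, 0.8, 0.7])
```

A real implementation would synthesise a waveform from the frequency points and play it; the receiver strips the noise-type tag and uses it to judge channel quality before decoding the payload.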
Correspondingly, an embodiment of the present invention provides a data transmission device, including:
the acquisition unit is used for acquiring data to be transmitted and current environmental noise, wherein the current environmental noise is a sound in a current transmission environment acquired in real time;
the extraction unit is used for extracting the characteristics of the current environmental noise to obtain the audio characteristics of the current environmental noise;
a determining unit configured to determine an environmental noise type of the current environmental noise based on the audio feature;
an adding unit, configured to add the environmental noise type to the data to be transmitted, to obtain target transmission data;
and the playing unit is used for converting the target transmission data into audio data and playing the audio data so that a receiving terminal can acquire the data to be transmitted from the audio data according to the environmental noise type.
Optionally, in an embodiment, the data transmission device further includes an identification unit. The identification unit may be specifically configured to: when playback of the audio data is detected, record the played target audio data to obtain recording data; parse the recording data to obtain the current environmental noise type and initial audio data; and identify the transmission data in the initial audio data according to the current environmental noise type.
Optionally, in some embodiments, the identifying unit may be specifically configured to evaluate, according to the current environmental noise type, network transmission quality of the target audio data to obtain evaluation information; determining a processing mode for the target audio data based on the evaluation information; and identifying transmission data in the initial audio data when the processing mode is to receive the target audio data.
Optionally, in some embodiments, the identifying unit may be specifically configured to extract fundamental frequency data from the initial audio data, and perform header detection on the fundamental frequency data according to a data block identifier in the fundamental frequency data; when the message header exists in the fundamental frequency data, extracting sound wave fundamental frequency data from the initial audio data; extracting at least one data block identifier from the sound wave fundamental frequency data, and correcting the extracted data block identifier to obtain a data block identifier set; and acquiring a target data block corresponding to each data block identifier in the data block identifier set, and fusing the target data blocks to obtain transmission data.
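The header-detection and block-fusion steps just described can be illustrated with a toy parser. The frame layout used here (a `HEADER` marker followed by `(block_id, chunk)` pairs) is an assumption for illustration only; the patent's actual header format is the one shown in fig. 9.

```python
# Toy reconstruction of the receiver-side parsing: find a header marker in the
# decoded base-frequency symbol stream, collect data blocks by identifier, and
# fuse them in ascending block-id order.

HEADER = "HDR"

def parse_frames(symbols):
    """symbols: list like ['HDR', (0, b'he'), (1, b'll'), (2, b'o!')]."""
    if HEADER not in symbols:
        return None  # no message header -> nothing to recover
    start = symbols.index(HEADER) + 1
    blocks = {}
    for item in symbols[start:]:
        if isinstance(item, tuple):
            block_id, chunk = item
            blocks[block_id] = chunk  # a repeated id overwrites (crude "correction")
    return b"".join(blocks[i] for i in sorted(blocks))

msg = parse_frames(["HDR", (1, b"ll"), (0, b"he"), (2, b"o!")])
```

Note that blocks can arrive out of order and are reassembled by identifier, which matches the "fusing the target data blocks" language above.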
Optionally, in some embodiments, the identifying unit may be specifically configured to discard the initial audio data and generate a prompt message when the processing manner is that the target audio data is not received; and sending the prompt information to a sending terminal so that the sending terminal replays the target audio data based on the prompt information.
Optionally, in some embodiments, the extracting unit may be specifically configured to convert a channel of the current environmental noise to obtain the current environmental noise of the target channel; blocking the current environmental noise of the target sound channel to obtain a plurality of audio data blocks; and extracting the characteristics of the audio data block to obtain the audio characteristics corresponding to the audio data block.
Optionally, in some embodiments, the extracting unit may be specifically configured to extract spectral image information from the audio data block, and identify a spectral image value corresponding to the audio data block in the spectral image information; acquiring data sampling quantity corresponding to each audio feature dimension, and extracting initial audio features corresponding to each audio feature dimension from the spectrum image information according to the data sampling quantity; and splicing the initial audio features with the frequency spectrum image values to obtain the audio features corresponding to each audio data block.
Optionally, in some embodiments, the determining unit may be specifically configured to identify, in the audio feature, a candidate environmental noise type of the corresponding audio data block using a trained noise identification model; counting the number of types corresponding to each environmental noise type in the candidate environmental noise types; and screening the environmental noise type of the current environmental noise from the candidate environmental noise types based on the type number.
Optionally, in some embodiments, the determining unit may be specifically configured to extract a noise audio feature from the audio feature of each of the audio data blocks by using a trained noise type recognition model, to obtain a noise audio feature set of the current environmental noise; carrying out batch normalization processing on noise audio features in the noise audio feature set to obtain a basic noise audio feature set; and identifying the environment noise type corresponding to each basic noise audio feature in the basic noise audio feature set, and obtaining the candidate environment noise type of the audio data block.
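The batch-normalisation step above can be approximated per feature dimension as (x - mean) / std. A dependency-free sketch, without the learned scale and shift parameters a real batch-norm layer would add:

```python
def normalize_features(feature_set):
    """feature_set: list of equal-length feature vectors (lists of floats)."""
    dims = len(feature_set[0])
    n = len(feature_set)
    means = [sum(v[d] for v in feature_set) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((v[d] - means[d]) ** 2 for v in feature_set) / n
        stds.append(var ** 0.5 or 1.0)  # guard against a zero std

    # each dimension ends up zero-mean and unit-variance across the set
    return [[(v[d] - means[d]) / stds[d] for d in range(dims)] for v in feature_set]

base = normalize_features([[1.0, 10.0], [3.0, 30.0]])
```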
Optionally, in some embodiments, the determining unit may be specifically configured to obtain time information of the audio data block in the current environmental noise, and sort base noise audio features in the base noise audio feature set based on the time information; based on the ordering information, respectively converting the basic noise audio features in the basic noise audio feature set into noise type features; and screening out the environmental noise type corresponding to the noise type characteristic from a preset environmental noise type set to obtain the candidate environmental noise type of the audio data block.
Optionally, in some embodiments, the determining unit may be specifically configured to determine a base noise audio feature that needs to be converted currently in the base noise audio feature set, to obtain a current base noise audio feature; querying a target basic noise audio feature ranked in front of the current basic noise audio feature in the basic noise audio feature set based on ranking information; and respectively converting the basic noise audio features in the basic noise audio feature set into noise type features according to the query result of the target basic noise audio features.
Optionally, in some embodiments, the determining unit may be specifically configured to: when the target basic noise audio feature exists, convert the current basic noise audio feature into a noise type feature according to the target basic noise audio feature; when the target basic noise audio feature does not exist, take the current basic noise audio feature itself as the noise type feature; and return to the step of determining the basic noise audio feature that currently needs to be converted, until all basic noise audio features in the basic noise audio feature set have been converted into noise type features, obtaining a noise type feature for each basic noise audio feature.
Optionally, in some embodiments, the determining unit may be specifically configured to obtain a hidden state feature corresponding to the target basic noise audio feature, where the hidden state feature is used to indicate a hidden state transferred during a process of converting the target basic noise audio feature into a noise type feature; calculating the feature ratio of the current basic noise audio feature and the hidden state feature, and determining the target hidden state feature corresponding to the current basic noise audio feature based on the feature ratio; and carrying out dimension transformation on the target hidden state feature to obtain a noise type feature corresponding to the current basic noise audio feature.
Optionally, in some embodiments, the data transmission device may further include a training unit, where the training unit may specifically be configured to obtain an audio data sample, where the audio data sample includes audio data labeled with an environmental noise type; predicting the environmental noise type of the audio data sample by adopting a preset noise type recognition model to obtain a predicted environmental noise type; and converging the preset noise type recognition model according to the labeling environment noise type and the prediction environment noise type to obtain a trained noise type recognition model.
Optionally, in some embodiments, the playing unit may be specifically configured to encrypt the target transmission data, and generate a frequency mapping table based on the encrypted transmission data; performing audio coding on the target transmission data based on the frequency mapping table to obtain an audio frequency point corresponding to the target transmission data; and mapping a frequency value of each audio frequency point in the frequency mapping table, generating an audio waveform based on the frequency value, and taking the audio waveform as audio data.
Optionally, in some embodiments, the playing unit may be specifically configured to segment the target transmission data into a plurality of data blocks to obtain a first data block, and segment a data identifier of the target transmission data generated based on the current time into a plurality of data blocks to obtain a second data block; screening at least one message header from a preset message header set, and taking the message header as a third data block; performing error correction coding on the first data block, the second data block and the third data block, and dividing the error correction code into a plurality of data blocks to obtain a fourth data block; and extracting audio frequency points from the first data block, the second data block, the third data block and the fourth data block to obtain the audio frequency points corresponding to the target transmission data.
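A minimal, hypothetical sketch of the frequency-mapping and waveform-generation steps above: each 4-bit nibble is mapped to one tone via a frequency table and synthesised as a short sine burst. The 16-tone table, base frequency, and symbol duration are all assumptions; the patent does not fix a concrete mapping.

```python
import math

def nibble_freq_table(base_hz=2000.0, step_hz=100.0):
    """One tone per 4-bit nibble value (an assumed layout)."""
    return {n: base_hz + n * step_hz for n in range(16)}

def data_to_waveform(data, table, sample_rate=8000, symbol_sec=0.01):
    """Synthesise a sine burst per nibble, high nibble first."""
    samples = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            f = table[nibble]
            count = int(sample_rate * symbol_sec)
            samples.extend(math.sin(2 * math.pi * f * i / sample_rate)
                           for i in range(count))
    return samples

wave = data_to_waveform(b"\x12", nibble_freq_table())
```

The receiver would run the inverse lookup: detect the dominant frequency of each burst and map it back to a nibble through the same table.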
In addition, the embodiment of the invention also provides electronic equipment, which comprises a processor and a memory, wherein the memory stores application programs, and the processor is used for running the application programs in the memory to realize the data transmission method provided by the embodiment of the invention.
In addition, the embodiment of the invention also provides a computer readable storage medium, which stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor to execute the steps in any data transmission method provided by the embodiment of the invention.
After data to be transmitted and the current environmental noise are obtained, features are extracted from the current environmental noise to obtain its audio features; the environmental noise type of the current environmental noise is determined from those features; the noise type is added to the data to be transmitted to obtain target transmission data; and the target transmission data is converted into audio data and played, so that a receiving terminal can obtain the data to be transmitted from the audio data according to the environmental noise type. Because the receiving terminal can extract the environmental noise type from the received audio data, it can estimate the current transmission quality (QoS) from that noise type and recover the data to be transmitted accordingly, which improves the success rate of data transmission.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of a scenario of a data transmission method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a data transmission method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of extracting audio features from current environmental noise according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the format of the current ambient noise provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a network structure of a trained noise type recognition model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a bidirectional BGRU network according to an embodiment of the present invention;
FIG. 7 is a flow chart of determining the type of environmental noise of the current environmental noise provided by an embodiment of the present invention;
FIG. 8 is a schematic flow chart of identifying candidate environmental noise types using a trained noise type identification model according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of a format of a header according to an embodiment of the present invention;
fig. 10 is a schematic overall flow chart of a data transmission process according to an embodiment of the present invention;
fig. 11 is another flow chart of a data transmission method according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a data transmission device according to an embodiment of the present invention;
fig. 13 is another schematic structural diagram of a data transmission device according to an embodiment of the present invention;
fig. 14 is another schematic structural diagram of a data transmission device according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Embodiments of the invention provide a data transmission method and device, an electronic device, and a computer-readable storage medium. The data transmission device may be integrated in an electronic device, which may be a server or a terminal.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery network (CDN) services, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, and the like. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
For example, referring to fig. 1, taking the case where the data transmission device is integrated in an electronic device: after the electronic device obtains the data to be transmitted and the current environmental noise, it performs feature extraction on the current environmental noise to obtain its audio features, determines the environmental noise type of the current environmental noise based on those features, adds the environmental noise type to the data to be transmitted to obtain target transmission data, converts the target transmission data into audio data, and plays the audio data, so that a receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type, thereby improving the success rate of data transmission.
Detailed descriptions follow. The order of the following embodiments is not intended to indicate which embodiments are preferred.
This embodiment is described from the perspective of the data transmission device, which may be integrated in an electronic device; the electronic device may be a server, a terminal, or the like. The terminal may include a tablet computer, a notebook computer, a personal computer (PC), a wearable device, a virtual reality device, or another device capable of data transmission.
A data transmission method, comprising:
the method comprises the steps of obtaining data to be transmitted and current environmental noise, wherein the current environmental noise is sound in a current transmission environment collected in real time, extracting characteristics of the current environmental noise to obtain audio characteristics of the current environmental noise, determining the environmental noise type of the current environmental noise based on the audio characteristics, adding the environmental noise type to the data to be transmitted to obtain target transmission data, converting the target transmission data into audio data, and playing the audio data, so that a receiving terminal obtains the data to be transmitted in the audio data according to the environmental noise type.
As shown in fig. 2, the specific flow of the data transmission method is as follows:
101. Acquire data to be transmitted and the current environmental noise.
The current environmental noise is sound collected in real time in the current transmission environment. For example, if the current transmission environment is a subway, the current environmental noise may be sound recorded on the subway for a preset period of time.
The data to be transmitted may also be of various types, for example transaction data generated during a transaction by a face-recognition payment device and a desktop cashier device, or other data to be transmitted.
The manner of acquiring the data to be transmitted and the current environmental noise may be various, and may specifically be as follows:
For example, for the data to be transmitted, data input by the user may be obtained directly, or service data returned by a service server or another terminal may be received and used as the data to be transmitted. For the current environmental noise, after the data to be transmitted is acquired, the sound-collecting device of the data transmission apparatus may capture sound in the current environment for a preset duration to obtain the current environmental noise; alternatively, environmental sound may be captured continuously in real time, and when the data to be transmitted is acquired, the most recent preset-duration segment is extracted from the captured sound as the current environmental noise.
102. Extract features from the current environmental noise to obtain its audio features.
The audio features are features that characterise the audio information in the current environmental noise.
the feature extraction method for the current environmental noise may be various, and specifically may be as follows:
For example, the channel of the current environmental noise may be converted to obtain the current environmental noise of a target channel; the current environmental noise of the target channel is then divided into blocks to obtain a plurality of audio data blocks; and feature extraction is performed on each audio data block to obtain its corresponding audio features, as shown in fig. 3.
The format of the current environmental noise may vary: for example, dual-channel, 16-bit, 48 kHz; dual-channel, 16-bit, 32 kHz; or dual-channel, 16-bit, 16 kHz, as shown in fig. 4. The channel conversion can likewise be done in various ways, for example by detecting the channel layout of the current environmental noise and, when it is dual-channel (stereo), converting the current environmental noise to mono.
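The dual-channel-to-mono conversion mentioned here is, in the simplest case, a per-pair average of interleaved left/right samples. A minimal sketch (assuming the simple averaging strategy; the patent does not specify the downmix method):

```python
def stereo_to_mono(interleaved):
    """interleaved: [L0, R0, L1, R1, ...] -> mono by averaging each L/R pair."""
    return [(interleaved[i] + interleaved[i + 1]) / 2.0
            for i in range(0, len(interleaved) - 1, 2)]

mono = stereo_to_mono([0.2, 0.4, -1.0, 1.0])
```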
After channel conversion, the current environmental noise of the target channel can be divided into blocks in various ways, for example according to a preset time window, yielding a plurality of audio data blocks. The preset time window can be any duration, for example 30 s, and can be adjusted to suit the application.
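The time-window blocking reduces to slicing the sample stream into chunks of `sample_rate * window_sec` samples. A sketch, keeping the trailing partial chunk (one of several reasonable choices):

```python
def block_audio(samples, sample_rate, window_sec):
    """Split a sample list into fixed-size blocks of window_sec seconds each."""
    size = int(sample_rate * window_sec)
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 10 samples at 4 Hz with a 1 s window -> two full blocks plus a partial one
blocks = block_audio(list(range(10)), sample_rate=4, window_sec=1.0)
```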
After the current environmental noise of the target channel has been divided into blocks, feature extraction can be performed on each audio data block to obtain its corresponding audio features. This can be done in various ways: for example, spectral image information is extracted from the audio data block and the spectral image value corresponding to the block is identified in it; the data sampling amount corresponding to each audio feature dimension is obtained, and the initial audio features for each dimension are extracted from the spectral image information according to that sampling amount; finally, the initial audio features are concatenated with the spectral image values to obtain the audio features of each audio data block.
The spectral image information can be understood as the spectrogram of an audio data block; an audio feature dimension can be understood as an audio feature type; and the data sampling amount is the amount of data taken when the audio data block is Fourier-transformed, i.e. the amount of data in the block needed for the Fourier transform of each type of audio feature. The initial audio features for each dimension can be extracted from the spectral image information in various ways: for example, a Fourier transform with the given sampling amount extracts the basic audio features of each dimension from the spectrogram of the block, and the basic features are then normalised according to their type to obtain the initial audio features, specifically as follows:
For example, with a data sampling amount (FFT size) of 4096, 128-dimensional mel spectrogram features (mel 128) are extracted and standardized to obtain a first standardized audio feature; with an FFT size of 4096, mel-frequency cepstral coefficient features (mfcc) are extracted and standardized to obtain a second standardized audio feature; with an FFT size of 1024, zero-crossing-rate features are extracted and binary-encoded to obtain an encoded audio feature; spectral flatness features (flat) and spectral centroid features are further extracted and standardized respectively to obtain a third standardized audio feature and a fourth standardized audio feature. The first, second, third and fourth standardized audio features and the encoded audio feature are used as the initial audio features.
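As a hedged sketch of two of the features listed above, computed with plain NumPy (the mel-spectrogram and MFCC features would in practice come from an audio library such as librosa; the standardization here is simple zero-mean, unit-variance normalization):

```python
import numpy as np

def zero_crossing_rate(block):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(block)
    return np.mean(signs[:-1] != signs[1:])

def standardized_spectrum(block, fft_size=4096):
    """FFT magnitude spectrum, standardized to zero mean and unit variance."""
    mag = np.abs(np.fft.rfft(block, n=fft_size))
    return (mag - mag.mean()) / (mag.std() + 1e-8)
```
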
103. Based on the audio characteristics, an ambient noise type of the current ambient noise is determined.
The environmental noise type may be understood as the scene type corresponding to the current transmission environment, and may include, for example, common everyday scenes such as a normal environment, a subway, a supermarket, a school, a building site, and the like.
The manner of determining the environmental noise type of the current environmental noise based on the audio features may be various, and may specifically be as follows:
For example, a trained noise recognition model is adopted to identify, from the audio features, the candidate environmental noise type of each corresponding audio data block; the number of occurrences of each environmental noise type among the candidate environmental noise types is counted, and the environmental noise type of the current environmental noise is screened out from the candidate environmental noise types based on these counts.
The mode of identifying the candidate environmental noise types of the corresponding audio data blocks in the audio features by using the trained noise identification model can be various, for example, the trained noise type identification model can be used for extracting noise audio features from the audio features of each audio data block to obtain a noise audio feature set of current environmental noise, the noise audio features in the noise audio feature set are subjected to batch normalization processing to obtain a basic noise audio feature set, and the environmental noise types corresponding to each basic noise audio feature are identified in the basic noise audio feature set to obtain the candidate environmental noise types of the audio data blocks.
The method for extracting the noise sound audio features from the audio features of each audio data block by using the trained noise type recognition model may be various, for example, the audio features are fused, the fused audio features are converted into a Spectrogram (spectrum), and the Spectrogram is extracted by using the noise feature extraction network of the trained noise type recognition model, so as to obtain a noise audio feature set of the current environmental noise, where the noise audio feature set may include the noise audio features corresponding to each audio data block.
The network structure of the trained noise type recognition model may be various; for example, it may be a residual network model including a residual network sub-structure and a batch normalization (Batch Normalization, BN) layer, as shown in fig. 5, or another network structure including the two. The noise feature extraction network of the trained noise type recognition model may be composed of a plurality of network layers. For example, the second layer (the first after the input layer) may use 64 convolution kernels of size 3×3 with stride (step size) 1 and padding 1; the third layer is a max-pooling layer with a 2×2 window and stride 2; the fourth layer uses 128 convolution kernels of size 3×3 with stride 1; the fifth layer is a max-pooling layer with a 2×2 window and stride 2; the sixth layer uses 256 convolution kernels of size 3×3 with stride 1; and the seventh layer is a max-pooling layer with a 2×2 window and stride 2.
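The layer parameters above determine the feature-map shapes: a 3×3 convolution with stride 1 and padding 1 preserves height and width, and each 2×2 max-pooling with stride 2 halves them. A small shape-tracing sketch (the 128×128 input size is an illustrative assumption, not from the text):

```python
# Hedged sketch: trace (channels, height, width) through the described stack.
# Conv 3x3 / stride 1 / padding 1 keeps H x W; max-pool 2x2 / stride 2 halves it.
def trace_shapes(h, w):
    shapes = [(64, h, w)]        # 64 conv kernels, 3x3, stride 1, padding 1
    h, w = h // 2, w // 2        # max-pool 2x2, stride 2
    shapes.append((64, h, w))
    shapes.append((128, h, w))   # 128 conv kernels, 3x3, stride 1
    h, w = h // 2, w // 2        # max-pool 2x2, stride 2
    shapes.append((128, h, w))
    shapes.append((256, h, w))   # 256 conv kernels, 3x3, stride 1
    h, w = h // 2, w // 2        # max-pool 2x2, stride 2
    shapes.append((256, h, w))
    return shapes
```
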
After the noise audio features are extracted, the noise audio features in the noise audio feature set can be batch-normalized; for example, a BN layer can be used for the batch normalization, and a max-pooling layer can be used to pool the batch-normalized noise audio features, so as to obtain the basic noise audio feature corresponding to each audio data block, and the basic noise audio features are combined to obtain the basic noise audio feature set.
After the noise audio features in the noise audio feature set have been batch-normalized, the environmental noise type corresponding to each basic noise audio feature can be identified in the basic noise audio feature set in various ways. For example, the time information of each audio data block in the current environmental noise is acquired, the basic noise audio features in the basic noise audio feature set are ordered based on the time information, the basic noise audio features are respectively converted into noise type features based on the ordering information, and the environmental noise type corresponding to each noise type feature is screened out from a preset environmental noise type set, so as to obtain the candidate environmental noise types of the audio data blocks.
The time information may be understood as information of the time position of the audio data block in the current ambient noise, for example, 5-6s, or a specific time interval. There may be various manners of sorting the basic noise audio features in the basic noise audio feature set based on the time information, for example, sorting the audio data blocks in the current environmental noise from front to back according to the time sequence based on the time information to obtain sorting information, and then sorting the basic noise audio features corresponding to the audio data blocks according to the sorting information to obtain the sorting information of the basic noise audio features.
After the basic noise audio features are ordered, the basic audio features in the basic noise audio feature set can be respectively converted into noise type features based on the ordering information, for example, the basic noise audio features to be converted currently can be determined in the basic noise audio feature set to obtain the current basic noise audio features, the target basic noise audio features arranged in front of the current basic noise audio features are inquired in the basic noise audio feature set based on the ordering information, and the basic noise audio features in the basic noise audio feature set are respectively converted into noise type features according to the inquiring result of the target basic noise audio features.
The noise type feature may be feature information indicating the environmental noise type of the audio data block. According to the target basic noise audio feature, there may be various ways of converting the basic noise audio features in the basic noise audio feature set into noise type features. For example, when the target basic noise audio feature exists, the current basic noise audio feature is converted into the noise type feature according to the target basic noise audio feature; when the target basic noise audio feature does not exist, the current basic noise audio feature is taken as the noise type feature. The step of determining the basic noise audio feature to be converted in the basic noise audio feature set is then performed again, until all the basic noise audio features in the set have been converted into noise type features, so as to obtain the noise type feature corresponding to each basic noise audio feature.
The hidden state feature can be understood as an intermediate feature generated by the noise type feature conversion network when converting the target basic noise audio feature, and the intermediate feature is used for indicating the hidden state transferred in the process of converting the target basic noise feature into the noise type.
The noise type feature conversion network may be a bidirectional GRU (gated recurrent unit network, BGRU), and the number of hidden units may be 256, as shown in fig. 6. There may be multiple ways of converting the current basic noise audio feature into the noise type feature by using the noise type feature conversion network; for example, a feature ratio of the current basic noise audio feature to the hidden state feature is calculated, a target hidden state feature corresponding to the current basic noise audio feature is determined based on the feature ratio, and dimension conversion is performed on the target hidden state feature to obtain the noise type feature corresponding to the basic noise audio feature.
The dimension conversion of the target hidden state feature can be understood as converting the intermediate feature obtained after the noise type feature conversion network processes the current basic noise audio feature based on the hidden state feature into the feature which can be output, so that the output noise type feature can be obtained.
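The gating idea behind a GRU step — combining the current input feature with the previous hidden state through learned gates — can be sketched minimally in NumPy (a hedged illustration only; the patent's network is a bidirectional GRU with 256 hidden units, and the weight shapes here are small illustrative stand-ins):

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU cell step: x is the current feature, h the previous
    hidden state; returns the new hidden state."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # blend old state and candidate
```
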
After the basic noise audio features are converted into noise type features, the environmental noise type corresponding to each noise type feature can be screened out from the preset environmental noise type set in various ways. For example, a softmax network can map the noise type feature to a classification probability for each environmental noise type, and the environmental noise type corresponding to the noise type feature is screened out based on the classification probabilities, so as to obtain the candidate environmental noise type of the audio data block.
After the candidate environmental noise types of the audio data block are identified, the number of types corresponding to each environmental noise type can be counted in the candidate environmental noise types, then, based on the number of types, the environmental noise type of the current environmental noise is screened out of the candidate environmental noise types, for example, the environmental noise type with the largest number in the candidate environmental noise types is used as the environmental noise type of the current environmental noise, or a weighting parameter of each environmental noise type can be obtained, based on the weighting parameter, the number of types of each environmental noise type is weighted, and then, the environmental noise type with the largest number of weighted types is screened out from the candidate environmental noise types, so that the environmental noise type of the current environmental noise is obtained.
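The counting and (optionally weighted) selection step can be sketched as follows (an illustration, not the original implementation; the example scene names and weights are hypothetical):

```python
from collections import Counter

def vote_noise_type(candidates, weights=None):
    """Pick the environmental noise type with the largest (optionally
    weighted) count among the per-block candidate types."""
    counts = Counter(candidates)
    if weights:
        return max(counts, key=lambda t: counts[t] * weights.get(t, 1.0))
    return max(counts, key=counts.get)
```
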
In the process of extracting the characteristics of the current environmental noise and determining the environmental noise type of the current environmental noise, as shown in fig. 7, the current environmental noise is converted into a mono channel, then the current environmental noise is framed or blocked to obtain an audio data block, the candidate environmental noise type is predicted for the audio data block, statistics is performed on the prediction result, and the environmental noise type of the current environmental noise is screened out.
The trained noise recognition model can be set according to the requirements of practical applications, and in addition, it should be noted that the trained noise recognition model can be preset by maintenance personnel, and also can be trained by a data transmission device, that is, before the step of extracting noise audio features from the audio features of each audio data block by adopting the trained noise type recognition model, the data transmission method can further include:
An audio data sample is obtained, wherein the audio data sample comprises audio data with labeled environmental noise types; the environmental noise type of the audio data sample is predicted by a preset noise type recognition model to obtain a predicted environmental noise type, and the preset noise type recognition model is converged according to the labeled environmental noise type and the predicted environmental noise type to obtain the trained noise type recognition model.
The audio data samples may be obtained in various ways. For example, the audio waveform of the environmental noise in each scene is determined; then, audio data are collected in different time periods in the scenes, and the environmental noise type of the target scene is labeled in the audio data to obtain positive samples; the negative-sample audio types are determined, the same number of negative-sample audio data as positive samples are collected, and the environmental noise types of their scenes are labeled to obtain the negative-sample audio data. Data cleaning is performed on the positive-sample and negative-sample audio data, thereby obtaining the audio data samples.
After the audio data sample is obtained, the environmental noise type of the audio data sample can be predicted by adopting a preset noise type recognition model to obtain a predicted environmental noise type, and various prediction modes can be adopted, for example, the audio data sample is subjected to feature extraction by adopting the preset noise type recognition model to obtain sample audio features of the audio data sample, the sample environmental noise type of a data block of the audio data sample is recognized in the sample audio features, the number of sample types corresponding to each environmental noise type is counted in the sample environmental noise type, and the environmental noise type of the audio data sample is screened out from the sample environmental noise types based on the number of sample types to obtain the predicted environmental noise type.
After the predicted environmental noise type is obtained, the preset noise type recognition model can be converged according to the labeling environmental noise type and the predicted environmental noise type, and various convergence modes can be adopted, for example, the labeling environmental noise type and the predicted environmental noise type can be compared, so that loss information of an audio data sample is obtained, and based on the loss information, a gradient descent algorithm is adopted to update network parameters of the preset noise type recognition model so as to converge the preset noise type recognition model, so that the trained noise type recognition model is obtained.
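The convergence step — comparing predicted and labeled types to obtain a loss, then updating parameters by gradient descent — can be sketched on a toy model (a hedged stand-in: the real model is the CNN+GRU network described above, and a one-parameter logistic classifier is used here only to make the loss/update mechanics concrete):

```python
import numpy as np

def train_step(w, x, y, lr=0.1):
    """One gradient-descent step on binary cross-entropy loss.
    x: (n, d) features, y: (n,) 0/1 labels, w: (d,) weights."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))                      # predicted probability
    loss = -np.mean(y * np.log(p + 1e-12)
                    + (1 - y) * np.log(1 - p + 1e-12))      # cross-entropy
    grad = x.T @ (p - y) / len(y)                           # gradient of the loss
    return w - lr * grad, loss
```
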
The process of identifying candidate environmental noise types by using the trained noise type identification model for the audio features of the audio data block may be as shown in fig. 8, where the audio data samples are collected, then feature extraction is performed on the collected audio data samples through audio feature engineering, then model design is performed on the preset noise type identification model, then the preset noise type identification model is trained by using the extracted sample audio features, and then the candidate environmental noise types of the audio data block are identified in the audio features by using the trained noise type identification model.
104. And adding the environmental noise type to the data to be transmitted to obtain target transmission data.
For example, the environmental noise type may be directly added to the data to be transmitted, so as to obtain target transmission data, or a data tag corresponding to the environmental noise type may be screened out from a preset data tag set, and the data tag is added to the data to be transmitted, so as to obtain target transmission data, or the environmental noise type may be converted into the environmental noise type data, and the environmental noise type data may be added to the data to be transmitted, so as to obtain target transmission data.
105. And converting the target transmission data into audio data and playing the audio data so that the receiving terminal can acquire the data to be transmitted from the audio data according to the type of the environmental noise.
The manner of converting the target transmission data into the audio data may be various, and specifically may be as follows:
for example, the target transmission data may be encrypted, a frequency mapping table is generated based on the encrypted transmission data, audio encoding is performed on the target transmission data based on the frequency mapping table, so as to obtain audio frequency points corresponding to the target transmission data, a frequency value of each audio frequency point is mapped in the frequency mapping table, an audio waveform is generated based on the frequency value, and the audio waveform is used as audio data.
The encryption method for the target transmission data may be various, for example, the encryption may be performed by using an SE security chip, or the encryption may be performed on the target transmission data by using a hash algorithm or an MD5 value.
After the target transmission data is encrypted, a frequency mapping table can be generated based on the encrypted transmission data. The frequency mapping table can be understood as associating the data block identifier of each data block with a specific frequency, so that inputting a specific data block maps to its corresponding frequency. The frequency mapping table can be generated based on the encrypted transmission data in various ways; for example, the encrypted transmission data may be framed, each frame of data decomposed into a plurality of blocks of metadata, and a mapping table between the metadata and frequencies generated according to the value corresponding to the significant bits of the metadata; this mapping table is used as the frequency mapping table.
The encrypted transmission data is framed; each frame may include a preset number of bytes, and the frame length can be set according to the practical application, for example 14 bytes or another number of bytes. Each frame of data can be decomposed into a plurality of pieces of metadata in various ways. For example, taking a 14-byte frame, 5 bits of data are taken in sequence as one piece of metadata; when fewer than 5 bits remain in a byte, the remaining bits are taken from the next adjacent byte, and any shortfall at the end is padded with 0. In effect, 8-bit bytes are re-cut into 5-bit metadata values, whose maximum value is 2^5 − 1 and minimum value is 0; the number of metadata pieces required for a frame is ⌊(N×8+4)/5⌋, where N is the number of bytes in the frame. Metadata can be understood here as a data block.
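The decomposition above can be sketched as a bit-stream re-cut (an illustration of the described scheme, not the original implementation):

```python
def bytes_to_metadata(frame):
    """Read the frame's bytes as a bit stream and cut it into 5-bit metadata
    values (0..31), zero-padding the tail to a multiple of 5 bits."""
    bits = "".join(f"{b:08b}" for b in frame)
    bits += "0" * ((-len(bits)) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]
```

For a 14-byte frame this yields ⌊(14×8+4)/5⌋ = 23 metadata values, matching the formula in the text.
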
After the metadata are decomposed, the mapping table between metadata and frequency can be generated according to the value corresponding to the significant bits of the metadata, in various ways; for example, an arithmetic sequence can be used to generate the mapping table between metadata and frequency, as shown in formula (1):

Freq_metadata = Freq_Base + metadata × θ_f    (1)

wherein Freq_metadata is the frequency corresponding to the metadata value, Freq_Base is the base frequency, metadata is the metadata value, and θ_f is the frequency difference between two adjacent metadata values. The generated frequency mapping table may be as shown in table 1:
TABLE 1: frequency mapping between metadata values and frequencies (the table content appears as images in the original publication)
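Formula (1) amounts to an arithmetic sequence over the 32 possible 5-bit metadata values. A hedged sketch (the base frequency of 1000 Hz and step of 50 Hz are illustrative assumptions; the patent does not give concrete values):

```python
def build_frequency_table(freq_base=1000.0, theta_f=50.0, num_values=32):
    """Formula (1): Freq_metadata = Freq_Base + metadata * theta_f,
    for each 5-bit metadata value 0..num_values-1."""
    return {m: freq_base + m * theta_f for m in range(num_values)}
```
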
After the frequency mapping table is generated, audio encoding can be performed on the target transmission data, so as to obtain an audio frequency point corresponding to the target transmission data, various manners of audio encoding can be performed on the target transmission data, for example, the target transmission data is segmented into a plurality of data blocks, a first data block is obtained, a data identification code of the target transmission data generated based on the current time is segmented into a plurality of data blocks, a second data block is obtained, at least one message header is screened out from a preset message header set, the message header is used as a third data block, error correction encoding is performed on the first data block, the second data block and the third data block, error correction codes are segmented into a plurality of data blocks, so as to obtain a fourth data block, and the audio frequency point corresponding to the target transmission data is extracted from the first data block, the second data block, the third data block and the fourth data block.
The data blocks may be continuous or discontinuous blocks (chunk), and the header may be a header structure added to the target transmission data when the target transmission data is transmitted as a data packet, and the format of the header may be as shown in fig. 9.
The data identifier of the target transmission data may be a universally unique identifier (Universally Unique Identifier, uuid) of the target transmission data, and may be generated in various ways; for example, the current time may be obtained, the attribute information of the target transmission data fused with the current time, the uuid generated from the fused data by the uuid algorithm, and the uuid then split into a plurality of chunks.
After the first data block, the second data block and the third data block are segmented, error correction coding may be performed on them in various ways; for example, Reed-Solomon (an error correction coding algorithm) coding may be adopted to obtain an error correction (RS) code, and the RS code is segmented into a plurality of chunks, so as to obtain the fourth data block.
After the first data block, the second data block, the third data block and the fourth data block are obtained, the audio frequency points can be extracted from them in various ways; for example, the data block information of the first, second, third and fourth data blocks can be obtained, the data block identifier identified in the data block information, and the data block identifier used as the audio frequency point of the target transmission data.
After the audio encoding of the target transmission data, the audio waveform corresponding to the target transmission data may be generated; for example, the frequency value of each audio frequency point may be mapped in the frequency mapping table, a single-frequency-point audio waveform generated for each audio frequency point based on its frequency value, and the single-frequency-point audio waveforms fused to obtain the audio waveform corresponding to the target transmission data, which is used as the audio data of the target transmission data.
The method for generating the single-frequency-point audio waveform of each audio frequency point based on the frequency value can be various, for example, a sin function can be adopted to process the frequency value, so as to obtain the single-frequency-point audio waveform of the audio frequency point.
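The sin-based waveform generation can be sketched as follows (the 16 kHz sample rate, 0.1 s tone duration and 0.8 amplitude are illustrative assumptions, not from the text):

```python
import numpy as np

def tone(freq, duration=0.1, sample_rate=16000, amplitude=0.8):
    """Single-frequency-point waveform generated with a sin function."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return amplitude * np.sin(2 * np.pi * freq * t)

def encode_points(freq_points, **kwargs):
    """Fuse (concatenate) the single-frequency waveforms of all audio
    frequency points into one audio waveform."""
    return np.concatenate([tone(f, **kwargs) for f in freq_points])
```
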
After the target transmission data is converted into the audio data, the audio data may be played in various manners, for example, a corresponding audio signal may be generated based on an audio waveform of the target transmission data, and then the audio signal may be played, thereby playing the audio data. When the receiving terminal detects the audio signal, the audio data can be acquired, then, target transmission data is extracted from the audio data, the type of environmental noise is identified in the target transmission data, and data to be transmitted is identified in the target transmission data based on the type of environmental noise.
Optionally, when it is detected that the audio data is being played, the transmission data may be acquired from the audio data in various ways; for example, the played target audio data may be recorded to obtain recording data, the recording data parsed to obtain the current environmental noise type and the initial audio data, and the transmission data identified from the initial audio data according to the current environmental noise type.
Various manners may be used to parse the recording data, for example, filtering may be performed on the recording data (PCM), unnecessary signals may be filtered, filtered audio data may be obtained, and the current environmental noise type and the initial audio data may be identified from the filtered audio data.
After the environmental noise type and the initial audio data are analyzed, the transmission data can be identified in the initial audio data, for example, the network transmission quality of the target audio data can be evaluated according to the current environmental noise type to obtain evaluation information, the processing mode of the target audio data is determined based on the evaluation information, and when the processing mode is to receive the target audio data, the transmission data can be identified in the initial audio data.
Network transmission quality (Qos) refers to the network's ability to provide better service for specified network communication using various basic technologies, and is a security mechanism of the network. The evaluation of the network transmission quality is mainly based on the current environmental noise type, from which a Qos value is determined. The evaluation can be performed in various ways; for example, the Qos value corresponding to the current environmental noise type can be screened out from a preset Qos value set and used as the evaluation information. To explain why the current environmental noise type is used to evaluate the network transmission quality: the current environmental noise type may be, for example, subway noise or indoor noise. With subway noise, recording the played audio data picks up more environmental noise than with ordinary urban noise, so the recorded audio data is less accurate; when the audio data is inaccurate, the accuracy of the transmission data parsed from it also decreases, and the data transmission may fail. Therefore, by acquiring the environmental noise type of the current transmission environment, it can be determined whether the received audio data is accurate, and the accuracy of data transmission can be improved.
After the network transmission quality of the target audio data has been evaluated, the processing mode for the target audio data can be determined based on the evaluation information. For example, the Qos value can be compared with a preset evaluation threshold: when the Qos value exceeds the preset evaluation threshold, the processing mode is determined to be receiving the target audio data, and when the Qos value does not exceed the preset evaluation threshold, the processing mode is determined to be not receiving the target audio data. Alternatively, when the Qos value is a preset Qos value, the processing mode is determined to be receiving the target audio data, and when the Qos value is not a preset Qos value, the processing mode is determined to be not receiving the target audio data.
When the processing mode is to receive the target audio data, there may be various modes for identifying the transmission data in the initial audio data, for example, extracting the base frequency data from the initial audio data, and performing header detection on the base frequency data according to the data block identifier in the base frequency data, when the header exists in the base frequency data, extracting the acoustic base frequency data from the initial audio data, extracting at least one data block identifier from the acoustic base frequency data, and performing error correction on the extracted data block identifier to obtain a data block identifier set, obtaining a target data block corresponding to each data block identifier in the data block identifier set, and fusing the target data blocks to obtain the transmission data.
The header detection may be performed on the baseband data according to the data block identifiers in the baseband data in various ways. For example, the data block identifier corresponding to each frequency in the baseband data is mapped in the frequency mapping table and matched against the preset data block identifier corresponding to the message header; when the matching succeeds, it can be determined that a message header exists in the baseband data, and when the matching fails, it can be determined that no message header exists. When no message header exists in the baseband data, the initial audio data does not contain the data to be transmitted, or the transmitted data is incomplete, and the identification of the transmission data is stopped. When a message header exists in the baseband data, the initial audio data contains data to be transmitted; at this time, the acoustic baseband data may be extracted from the initial audio data, at least one data block identifier extracted from the acoustic baseband data, and Reed-Solomon error correction applied to the extracted data block identifiers, so as to obtain a data block identifier set. The manner of extracting the data block identifiers is the same as that of extracting the data block identifiers from the baseband data, and is not described in detail here.
After the data block identifier set is obtained, the target data block corresponding to each data block identifier in the data block identifier set can be obtained, the target data blocks can be fused, and various fusion modes can be adopted, for example, the target data blocks can be decrypted, the decrypted data blocks are spliced or fused to obtain fused data, and the message header is deleted from the fused data, so that the transmission data is obtained.
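The receive-side inverse of the frequency mapping can be sketched as a nearest-value lookup followed by reassembly of the 5-bit values into bytes (a hedged illustration; the 1000 Hz base frequency and 50 Hz step are the same kind of illustrative assumptions as in the encoding sketch, and real decoding would also involve the header check and Reed-Solomon correction described in the text):

```python
def decode_frequencies(freqs, freq_base=1000.0, theta_f=50.0):
    """Map each detected dominant frequency back to its 5-bit metadata value
    by rounding (tolerating small frequency errors), then rebuild bytes."""
    values = [round((f - freq_base) / theta_f) for f in freqs]
    bits = "".join(f"{v:05b}" for v in values)
    usable = len(bits) - len(bits) % 8   # drop the zero-padded tail
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```
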
Optionally, when the processing mode is not to receive the target audio data, the initial audio data is discarded, prompt information is generated, and the prompt information is sent to the sending terminal, so that the sending terminal replays the target audio data based on the prompt information. When it is determined that the target audio data needs to be replayed, the current transmission network quality is poor, and the sending terminal needs to be prompted to play the target audio data again so as to resend the transmission data; in this way, the success rate of data transmission can be improved.
It should be noted that the overall data transmission process may be as shown in fig. 10. After the data to be transmitted and the current environmental noise are obtained, the Qos module may determine the environmental noise type of the current environmental noise and add it to the data to be transmitted, so as to obtain the target transmission data. The target transmission data is sound-wave encoded by the sound wave encoding module; a message header is added by the data header generating module; a data identification code (UUID) of the target transmission data is generated based on the current time; error correction encoding is performed by the error correction encoding module (Reed-Solomon); and finally audio data (PCM) are generated and played. When the receiving terminal detects that the audio data is being played, it records the audio data and performs PCM parsing on the recorded data; the Qos module acquires the current environmental noise type and evaluates the network transmission quality; when the network transmission quality meets the requirement, data header detection is performed; when a message header exists in the recorded data, the sound wave fundamental frequency data is extracted from the recorded data, the data block identifiers in the sound wave fundamental frequency data are acquired and error-corrected by Reed-Solomon to obtain a data block identifier set; the target data block corresponding to each data block identifier in the set is acquired, and the data are assembled to obtain the data to be transmitted.
As can be seen from the foregoing, after obtaining the data to be transmitted and the current environmental noise, the embodiment of the present application performs feature extraction on the current environmental noise to obtain an audio feature of the current environmental noise, then determines an environmental noise type of the current environmental noise based on the audio feature, adds the environmental noise type to the data to be transmitted to obtain target transmission data, converts the target transmission data into audio data, and plays the audio data, so that a receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type; according to the scheme, the current environmental noise can be acquired, the environmental noise type of the current environmental noise is determined, the environmental noise type is added to the data to be transmitted, so that the receiving terminal can extract the environmental noise type from the received audio data, the current network transmission quality (Qos) of the data to be transmitted is determined according to the environmental noise type, the data to be transmitted is acquired based on the current network transmission quality, and therefore the transmission success rate of the data transmission can be improved.
According to the method described in the above embodiments, examples are described in further detail below.
In this embodiment, the data transmission device is specifically integrated in an electronic device, where the electronic device is a terminal. For ease of distinction, the terminal may be divided into a sending terminal and a receiving terminal, where the sending terminal is the terminal that transmits the data to be transmitted and the receiving terminal is the terminal that receives the data to be transmitted.
(I) Training the noise type recognition model
(1) The transmitting terminal acquires an audio data sample.
For example, the transmitting terminal determines the audio waveform of the environmental noise in each scene, then collects audio data over different time periods in a target scene and marks it with the environmental noise type of that scene to obtain positive-sample audio data. It then determines the negative-sample audio types, collects the same number of negative-sample audio data as positive samples, and marks them with the environmental noise types of their scenes to obtain negative-sample audio data. Data cleaning is performed on the positive and negative audio data samples, thereby obtaining the audio data samples.
(2) And the sending terminal predicts the environmental noise type of the audio data sample by adopting a preset noise type recognition model to obtain the predicted environmental noise type.
For example, the transmitting terminal performs feature extraction on the audio data sample by using a preset noise type identification model to obtain sample audio features of the audio data sample, identifies sample environmental noise types of data blocks of the audio data sample in the sample audio features, counts the number of sample types corresponding to each environmental noise type in the sample environmental noise types, and screens out the environmental noise types of the audio data sample from the sample environmental noise types based on the number of sample types to obtain a predicted environmental noise type.
(3) And the sending terminal converges the preset noise type recognition model according to the marked environment noise type and the predicted environment noise type to obtain a trained noise type recognition model.
For example, the transmitting terminal compares the marked environmental noise type with the predicted environmental noise type, so as to obtain loss information of the audio data sample, and based on the loss information, adopts a gradient descent algorithm to update network parameters of a preset noise type recognition model so as to converge the preset noise type recognition model, so that a trained noise type recognition model is obtained.
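The convergence step above can be sketched in Python. This is a minimal, hypothetical illustration using a linear softmax classifier trained by gradient descent — the embodiment does not fix the model structure or the loss at this granularity, so the cross-entropy loss and the toy features below are assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_step(W, X, y_onehot, lr=0.1):
    # Forward pass: predict the environmental noise type from audio features.
    p = softmax(X @ W)
    # Cross-entropy between the marked and the predicted environmental noise type.
    loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
    # Gradient-descent update of the network parameters.
    grad = X.T @ (p - y_onehot) / len(X)
    return W - lr * grad, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))          # toy sample audio features
labels = np.argmax(X[:, :3], axis=1)  # toy marked noise types (3 classes)
Y = np.eye(3)[labels]
W = np.zeros((8, 3))
W, first_loss = train_step(W, X, Y)
for _ in range(200):
    W, last_loss = train_step(W, X, Y)
```

Each update moves the parameters against the loss gradient; the model is taken as converged once the loss stops improving.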
And (II) the terminal adopts the trained noise type recognition model to determine the environment noise type of the current environment noise.
The trained noise recognition model comprises a noise feature extraction network and a noise type feature conversion network.
As shown in fig. 11, a data transmission method specifically includes the following steps:
201. the sending terminal obtains data to be transmitted and current environmental noise.
For example, for the data to be transmitted, the sending terminal may directly acquire the data input by the user as the data to be transmitted, or may also receive the service data returned by the service server or other terminals, and use the service data as the data to be transmitted. For the current environmental noise, the transmitting terminal can acquire the sound in the current environment of the preset time through the sound acquisition equipment of the data transmission device after acquiring the data to be transmitted to obtain the current environmental noise, or can acquire the environmental sound in the current environment in real time, and when the data to be transmitted is acquired, the acquired environmental sound of the preset time at the current moment is extracted from the acquired environmental sound to obtain the current environmental noise.
202. And the sending terminal performs feature extraction on the current environmental noise to obtain the audio features of the current environmental noise.
For example, the transmitting terminal detects the channel of the current environmental noise and, when the current environmental noise is dual-channel (stereo), converts it into mono current environmental noise, which serves as the current environmental noise of the target channel. The current environmental noise of the target channel is then segmented according to a preset time window, so as to obtain a plurality of audio data blocks. The preset time window may be any duration, for example 30s, and can be adjusted according to the practical application.
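A minimal sketch of this step in Python — the averaging down-mix and the 30s default window are assumptions consistent with the example values above:

```python
import numpy as np

def to_mono(samples: np.ndarray) -> np.ndarray:
    # Dual-channel (stereo) noise is down-mixed to the target mono channel.
    return samples.mean(axis=1) if samples.ndim == 2 else samples

def split_blocks(samples: np.ndarray, sample_rate: int, window_s: float = 30.0):
    # Segment the target-channel noise by the preset time window.
    n = int(sample_rate * window_s)
    return [samples[i:i + n] for i in range(0, len(samples), n)]
```

For a 90-second stereo recording at 16 kHz, this yields three 30-second audio data blocks.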
The sending terminal extracts spectrum image information from each audio data block and identifies the spectrum image value corresponding to the audio data block in the spectrum image information. It then acquires the data sampling amount (fft size) corresponding to each audio feature dimension: a 128-dimensional Mel spectrum feature (mel 128) is extracted with an fft size of 4096 and standardized to obtain a first standardized audio feature; a Mel cepstrum feature (mfcc) is extracted with an fft size of 4096 and standardized to obtain a second standardized audio feature; a zero-crossing-rate feature is extracted with an fft size of 1024 and binary encoded to obtain an encoded audio feature; and a spectrum flatness feature (flatness) and a spectrum centroid feature are extracted and standardized to obtain a third and a fourth standardized audio feature. The first, second, third, and fourth standardized audio features, together with the encoded audio feature, are taken as the initial audio feature. The initial audio feature and the spectrum image information are spliced to obtain the audio feature corresponding to each audio data block.
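Some of these features can be computed with plain NumPy; the sketch below covers the zero-crossing rate, spectral flatness, and spectral centroid per frame (the Mel spectrum and MFCC extraction would normally use a dedicated audio library and are omitted here). The frame length 1024 matches the fft size used for the zero-crossing rate above; the rest of the numerics are illustrative:

```python
import numpy as np

def frame_features(block: np.ndarray, fft_size: int = 1024) -> np.ndarray:
    """Per-frame (zero-crossing rate, spectral flatness, spectral centroid)."""
    feats = []
    for i in range(0, len(block) - fft_size + 1, fft_size):
        frame = block[i:i + fft_size]
        # Zero-crossing rate: fraction of adjacent samples with a sign change.
        zcr = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
        mag = np.abs(np.fft.rfft(frame)) + 1e-12
        # Flatness: geometric mean / arithmetic mean of the magnitude spectrum.
        flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)
        # Centroid: magnitude-weighted mean frequency (normalized units).
        freqs = np.fft.rfftfreq(fft_size)
        centroid = np.sum(freqs * mag) / np.sum(mag)
        feats.append((zcr, flatness, centroid))
    return np.array(feats)
```

As a sanity check, white noise is spectrally much flatter than a pure tone, so its flatness value is larger.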
203. The transmitting terminal determines an ambient noise type of the current ambient noise based on the audio characteristics.
For example, the sending terminal fuses the audio features, converts the fused audio features into a Spectrogram (spectrum), and adopts a noise feature extraction network of a trained noise type recognition model to perform feature extraction on the Spectrogram, so as to obtain a noise audio feature set of the current environmental noise, where the noise audio feature set may include noise audio features corresponding to each audio data block.
The network structure of the noise feature extraction network of the trained noise type recognition model may be composed of multiple network layers. For example, the second layer (the first after the input layer) may use 64 convolution kernels of size 3×3 with stride 1 and a padding pixel value of 1; the third layer is a max-pooling layer with a 2×2 window and stride 2; the fourth layer uses 128 convolution kernels of size 3×3 with stride 1; the fifth layer is a max-pooling layer with a 2×2 window and stride 2; the sixth layer uses 256 convolution kernels of size 3×3 with stride 1; and the seventh layer is a max-pooling layer with a 2×2 window and stride 2.
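The effect of this stack on feature-map shape can be traced with a small helper — a sketch assuming the standard convolution/pooling shape arithmetic (a 3×3 convolution with stride 1 and padding 1 preserves height and width; a 2×2 max-pooling with stride 2 halves them):

```python
def trace_shapes(h: int, w: int, in_ch: int):
    """Return the (H, W, C) feature-map shape after each described layer."""
    plan = [("conv", 64), ("pool", None), ("conv", 128),
            ("pool", None), ("conv", 256), ("pool", None)]
    shapes, ch = [], in_ch
    for kind, n_kernels in plan:
        if kind == "conv":
            # (H + 2*padding - kernel) // stride + 1 = (H + 2 - 3) // 1 + 1 = H
            ch = n_kernels
        else:
            # 2x2 window, stride 2: spatial size halves, channels unchanged.
            h, w = h // 2, w // 2
        shapes.append((h, w, ch))
    return shapes
```

For a hypothetical 128×128 single-channel spectrogram input, the final feature map is 16×16 with 256 channels.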
The sending terminal adopts a BN layer to perform batch normalization on the noise audio features in the noise audio feature set, and adopts a MaxPooling layer to pool the batch-normalized noise audio features, so as to obtain the basic noise audio feature corresponding to each audio data block; the basic noise audio features are combined to obtain the basic noise audio feature set.
The method comprises the steps that a sending terminal obtains time information of an audio data block in current environmental noise, based on the time information, the audio data block in the current environmental noise is sequenced from front to back according to time sequence to obtain sequencing information, and then, according to the sequencing information, the basic noise audio characteristics corresponding to the audio data block are sequenced to obtain sequencing information of the basic noise audio characteristics.
The sending terminal determines the basic noise audio feature that currently needs to be converted in the basic noise audio feature set, obtaining the current basic noise audio feature, and queries, based on the sorting information, the target basic noise audio feature arranged in front of the current basic noise audio feature in the set. When the target basic noise audio feature exists, the current basic noise audio feature is converted into a noise type feature according to the target basic noise audio feature: a bidirectional BGRU is adopted to acquire the hidden state feature of the target basic noise audio feature, and the noise type feature conversion network calculates the feature ratio of the current basic noise audio feature to the hidden state feature, determines the target hidden state feature corresponding to the current basic noise audio feature based on the feature ratio, and performs dimension conversion on the target hidden state feature to obtain the noise type feature corresponding to that basic noise audio feature. When the target basic noise audio feature does not exist, the current basic noise audio feature is taken as the noise type feature. The procedure then returns to the step of determining the basic noise audio feature that needs to be converted, until all basic noise audio features in the basic noise audio feature set have been converted into noise type features, thereby obtaining the noise type feature corresponding to each basic noise audio feature.
The sending terminal maps each noise type feature to a classification probability for each environmental noise type through a softmax network and, based on the classification probabilities, screens out the environmental noise type corresponding to the noise type feature, thereby obtaining the candidate environmental noise types of the audio data blocks. The most frequent environmental noise type among the candidates is used as the environmental noise type of the current environmental noise; alternatively, a weighting parameter for each environmental noise type may be obtained, the type count of each environmental noise type weighted by that parameter, and the environmental noise type with the largest weighted count screened out from the candidates, thereby obtaining the environmental noise type of the current environmental noise.
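The plain and weighted votes over the per-block candidate types can be sketched as follows; the type names and weighting parameters are hypothetical:

```python
from collections import Counter

def vote_noise_type(block_types, weights=None):
    """Screen out the environmental noise type with the largest (optionally
    weighted) type count among the per-block candidate types."""
    weights = weights or {}
    score = Counter()
    for t in block_types:
        score[t] += weights.get(t, 1.0)
    return max(score, key=score.get)
```

With candidates ["office", "office", "street"], the unweighted vote picks "office"; a weight of 3.0 on "street" flips the result.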
204. And the sending terminal adds the environmental noise type to the data to be transmitted to obtain target transmission data.
For example, the transmitting terminal directly adds the environmental noise type to the data to be transmitted, so as to obtain target transmission data, or may screen out a data tag corresponding to the environmental noise type from a preset data tag set, and add the data tag to the data to be transmitted, so as to obtain target transmission data, or may also convert the environmental noise type into environmental noise type data, and add the environmental noise type data to the data to be transmitted, so as to obtain target transmission data.
205. The sending terminal converts the target transmission data into audio data and plays the audio data.
For example, the sending terminal may encrypt the target transmission data by using an SE security chip, or may encrypt it by using a hash algorithm or an MD5 value, to obtain encrypted transmission data. Taking 14 bytes per frame of data as an example, 5 bits are taken in sequence from each of the 14 bytes as metadata; when fewer than 5 bits remain, bits are supplemented from the following adjacent byte or padded with 0. What is done here is to decompose 8-bit bytes into 5-bit metadata units, with a maximum value of 2^5−1 and a minimum value of 0, so the number of metadata units required by a frame is (N×8+4)/5 rounded down to an integer, where N is the number of bytes in the frame. Metadata is understood here as a data block. The mapping table between the metadata and the frequencies is generated by using an arithmetic or geometric sequence; specifically, it can be used as a frequency mapping table as shown in formula (1).
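The byte-to-metadata decomposition and an arithmetic-sequence frequency mapping table can be sketched as below; the base frequency and step are illustrative assumptions (the actual table is given by formula (1)):

```python
def bytes_to_metadata(frame: bytes):
    """Decompose an 8-bit byte stream into 5-bit metadata units (0..31).
    Trailing bits are zero-padded, giving (N*8 + 4) // 5 units per frame."""
    bits = "".join(f"{b:08b}" for b in frame)
    bits += "0" * ((-len(bits)) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]

def freq_table(base: float = 1000.0, step: float = 50.0, n: int = 32):
    """One tone per possible 5-bit metadata value, as an arithmetic sequence."""
    return [base + step * k for k in range(n)]
```

A 14-byte frame yields (14×8+4)//5 = 23 metadata units, each indexing one of the 32 frequencies.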
The sending terminal divides the target transmission data into a plurality of data blocks to obtain the first data blocks, acquires the current time, fuses the attribute information of the current time with the target transmission data, generates a uuid for the fused data by adopting a uuid algorithm, and divides the uuid into a plurality of chunks to obtain the second data blocks. The first data blocks, the second data blocks and the third data blocks may be error-correction coded by reed-solomon to obtain an RS code, and the RS code is split into multiple blocks to obtain the fourth data blocks. The data block information of the first, second, third and fourth data blocks is acquired, the data block identifier in the data block information is identified, and the data block identifiers are taken as the audio frequency points of the target transmission data.
The sending terminal maps each audio frequency point to its frequency value in the frequency mapping table and processes the frequency value with a sine function, thereby obtaining the single-frequency-point audio waveform of that audio frequency point. The single-frequency-point audio waveforms are fused to obtain the audio waveform corresponding to the target transmission data, and this audio waveform is taken as the audio data of the target transmission data. A corresponding audio signal is generated based on the audio waveform of the target transmission data and then played, thereby realizing the playing of the audio data.
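A sketch of the single-frequency-point synthesis with NumPy. Concatenating one short tone per frequency point is an assumption — the embodiment only states that the single-frequency-point waveforms are fused; the sample rate and tone duration below are illustrative:

```python
import numpy as np

def tone(freq: float, sr: int = 48000, dur: float = 0.05) -> np.ndarray:
    # Single-frequency-point audio waveform: sin(2*pi*f*t).
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * freq * t)

def encode_points(points, table, sr: int = 48000, dur: float = 0.05) -> np.ndarray:
    # Fuse (here: concatenate) one tone per audio frequency point.
    return np.concatenate([tone(table[p], sr, dur) for p in points])
```

The resulting waveform can be written to a PCM buffer and played back as the audio data.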
206. When the receiving terminal detects that the audio data is played, the receiving terminal records the played audio data to obtain recording data.
For example, when the receiving terminal detects that the audio data is played, the receiving terminal starts the recording device to record the played audio data, so that the recording data is obtained.
207. And the receiving terminal analyzes the recording data to obtain the current environmental noise type and the initial audio data.
For example, the receiving terminal may filter recording data (PCM), filter out unnecessary signals, obtain filtered audio data, and identify a current ambient noise type and initial audio data from the filtered audio data.
208. The receiving terminal identifies the transmission data from the initial audio data according to the current ambient noise type.
For example, the receiving terminal screens out the Qos value corresponding to the current environmental noise type from a preset Qos value set and uses the Qos value as the evaluation information. The Qos value is compared with a preset evaluation threshold: when the Qos value exceeds the preset evaluation threshold, the processing mode of the target audio data is determined to be receiving the target audio data; when it does not, the processing mode is determined to be not receiving the target audio data. Alternatively, when the Qos value is a preset Qos value, the processing mode is determined to be receiving the target audio data, and when it is not, the processing mode is determined to be not receiving the target audio data.
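The threshold comparison above amounts to a small lookup-and-compare; the noise-type-to-Qos mapping and the threshold value below are hypothetical:

```python
QOS_BY_NOISE_TYPE = {"quiet_office": 0.9, "street": 0.4}  # hypothetical mapping

def processing_mode(noise_type: str, threshold: float = 0.6) -> str:
    """Receive the target audio data only when the Qos value screened out for
    the current environmental noise type exceeds the evaluation threshold."""
    qos = QOS_BY_NOISE_TYPE.get(noise_type, 0.0)
    return "receive" if qos > threshold else "discard_and_prompt_replay"
```

An unknown noise type defaults to the lowest Qos here, i.e. the conservative "discard and prompt replay" path.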
When the processing mode is to receive the audio data, the receiving terminal extracts the fundamental frequency data from the initial audio data, maps each frequency in the fundamental frequency data to its data block identifier in the frequency mapping table, and matches the data block identifiers against the preset data block identifier corresponding to the message header. When the matching succeeds, it can be determined that the fundamental frequency data contains a message header; when it fails, it can be determined that no message header exists in the fundamental frequency data. When the fundamental frequency data has no message header, the initial audio data does not contain the data to be transmitted, or the transmitted data is incomplete, and the identification of the transmission data is stopped. When the message header exists in the fundamental frequency data, the sound wave fundamental frequency data can be extracted from the initial audio data, at least one data block identifier extracted from it, and reed-solomon error correction applied to the extracted data block identifiers, so as to obtain the data block identifier set. The target data block corresponding to each data block identifier in the set is then acquired, the target data blocks are decrypted, the decrypted data blocks are spliced or fused to obtain the fused data, and the message header is deleted from the fused data to obtain the transmission data.
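The frequency-to-identifier recovery and header check can be sketched as follows. Matching the dominant FFT bin of each segment against the frequency mapping table is an assumption, the reed-solomon correction of the recovered identifiers is omitted, and the header identifiers are hypothetical:

```python
import numpy as np

def decode_points(waveform, table, sr: int = 48000, dur: float = 0.05):
    """Map each tone segment's dominant frequency back to a data block identifier."""
    n = int(sr * dur)
    ids = []
    for i in range(0, len(waveform) - n + 1, n):
        spec = np.abs(np.fft.rfft(waveform[i:i + n]))
        f = np.fft.rfftfreq(n, 1.0 / sr)[np.argmax(spec)]
        # Nearest entry in the frequency mapping table wins.
        ids.append(min(range(len(table)), key=lambda k: abs(table[k] - f)))
    return ids

HEADER_IDS = [31, 0, 31]  # hypothetical preset message-header identifiers

def has_header(ids):
    return ids[:len(HEADER_IDS)] == HEADER_IDS
```

When `has_header` fails, the recording carries no transmission data (or an incomplete one), matching the stop condition above.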
When the processing mode is not to receive the target audio data, the receiving terminal discards the initial audio data, generates prompt information and sends the prompt information to the sending terminal so that the sending terminal replays the target audio data based on the prompt information. In this case it can be determined that, although the target audio data has been played, the current transmission network quality is poor, so the sending terminal needs to be prompted to play the target audio data again and thereby resend the transmission data, which improves the success rate of data transmission.
As can be seen from the above, after obtaining the data to be transmitted and the current environmental noise, the terminal in this embodiment performs feature extraction on the current environmental noise to obtain an audio feature of the current environmental noise, then determines an environmental noise type of the current environmental noise based on the audio feature, adds the environmental noise type to the data to be transmitted to obtain target transmission data, converts the target transmission data into audio data, and plays the audio data, so that the receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type; according to the scheme, the current environmental noise can be acquired, the environmental noise type of the current environmental noise is determined, the environmental noise type is added to the data to be transmitted, so that the receiving terminal can extract the environmental noise type from the received audio data, the current network transmission quality (Qos) of the data to be transmitted is determined according to the environmental noise type, the data to be transmitted is acquired based on the current network transmission quality, and therefore the transmission success rate of the data transmission can be improved.
In order to better implement the above method, the embodiment of the present invention further provides a data transmission device, where the data transmission device may be integrated in an electronic device, such as a server or a terminal, where the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 12, the data transmission apparatus may include an acquisition unit 301, an extraction unit 302, a determination unit 303, an addition unit 304, and a playback unit 305, as follows:
(1) An acquisition unit 301;
the acquiring unit 301 is configured to acquire data to be transmitted and current environmental noise, where the current environmental noise is a sound in a current transmission environment acquired in real time.
For example, the acquiring unit 301 may specifically be configured to directly acquire data input by a user as data to be transmitted, or may also receive service data returned by a service server or other terminals, and use the service data as data to be transmitted. After the data to be transmitted is acquired, acquiring the sound under the current environment of the preset time through the sound acquisition equipment of the data transmission device to obtain the current environmental noise, or acquiring the environmental sound under the current environment in real time, and extracting the acquired environmental sound of the preset time at the current moment from the acquired environmental sound when the data to be transmitted is acquired to obtain the current environmental noise.
(2) An extraction unit 302;
the extracting unit 302 is configured to perform feature extraction on the current environmental noise, so as to obtain an audio feature of the current environmental noise.
For example, the extracting unit 302 may be specifically configured to convert a channel of the current environmental noise to obtain the current environmental noise of the target channel, block the current environmental noise of the target channel to obtain a plurality of audio data blocks, and perform feature extraction on the audio data blocks to obtain audio features corresponding to the audio data blocks.
(3) A determination unit 303;
a determining unit 303 for determining an ambient noise type of the current ambient noise based on the audio characteristics.
For example, the determining unit 303 may specifically be configured to identify candidate environmental noise types of the corresponding audio data block in the audio feature by using a trained noise identification model, count the number of types corresponding to each environmental noise type in the candidate environmental noise types, and screen the environmental noise types of the current environmental noise from the candidate environmental noise types based on the number of types.
(4) An adding unit 304;
an adding unit 304, configured to add the environmental noise type to the data to be transmitted, so as to obtain target transmission data.
For example, the adding unit 304 may be specifically configured to directly add the environmental noise type to the data to be transmitted, thereby obtaining target transmission data, or may screen a data tag corresponding to the environmental noise type from a preset data tag set, and add the data tag to the data to be transmitted, thereby obtaining target transmission data, or may also convert the environmental noise type into environmental noise type data, and add the environmental noise type data to the data to be transmitted, thereby obtaining target transmission data.
(5) A playback unit 305;
and a playing unit 305, configured to convert the target transmission data into audio data, and play the audio data, so that the receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type.
For example, the playing unit 305 may specifically be configured to encrypt the target transmission data, generate a frequency mapping table based on the encrypted transmission data, perform audio encoding on the target transmission data based on the frequency mapping table to obtain audio frequency points corresponding to the target transmission data, map a frequency value of each audio frequency point in the frequency mapping table, and generate an audio waveform based on the frequency value, and use the audio waveform as the audio data. And playing the audio data so that the receiving terminal can acquire the data to be transmitted from the audio data according to the type of the environmental noise.
Optionally, the data transmission device may further include an identification unit 306, as shown in fig. 13, specifically may be as follows:
and the identification unit 306 is configured to record the played target audio data to obtain the transmission data when the audio data play is detected.
For example, the identifying unit 306 may be specifically configured to record, when audio data playing is detected, the played target audio data to obtain recording data, parse the recording data to obtain a current environmental noise type and initial audio data, and identify transmission data in the initial audio data according to the current environmental noise type.
Optionally, the data transmission device may further include a training unit 307, as shown in fig. 14, and specifically may be as follows:
the training unit 307 is configured to train the preset noise type recognition model to obtain a trained noise type recognition model.
For example, the training unit 307 may be specifically configured to obtain an audio data sample, where the audio data sample includes audio data labeled with an environmental noise type, predict the environmental noise type of the audio data sample by using a preset noise type recognition model to obtain a predicted environmental noise type, and converge the preset noise type recognition model according to the labeled environmental noise type and the predicted environmental noise type to obtain a trained noise type recognition model.
In the implementation, each unit may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit may be referred to the foregoing method embodiment, which is not described herein again.
As can be seen from the foregoing, after obtaining the data to be transmitted and the current environmental noise, the embodiment of the present application performs feature extraction on the current environmental noise to obtain an audio feature of the current environmental noise, then determines an environmental noise type of the current environmental noise based on the audio feature, adds the environmental noise type to the data to be transmitted to obtain target transmission data, converts the target transmission data into audio data, and plays the audio data, so that a receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type; according to the scheme, the current environmental noise can be acquired, the environmental noise type of the current environmental noise is determined, the environmental noise type is added to the data to be transmitted, so that the receiving terminal can extract the environmental noise type from the received audio data, the current network transmission quality (Qos) of the data to be transmitted is determined according to the environmental noise type, the data to be transmitted is acquired based on the current network transmission quality, and therefore the transmission success rate of the data transmission can be improved.
The embodiment of the invention also provides an electronic device, as shown in fig. 15, which shows a schematic structural diagram of the electronic device according to the embodiment of the invention, specifically:
the electronic device may include one or more processing cores 'processors 401, one or more computer-readable storage media's memory 402, power supply 403, and input unit 404, among other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 15 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402, and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by executing the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, preferably the power supply 403 may be logically connected to the processor 401 by a power management system, so that functions of managing charging, discharging, and power consumption are performed by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 404, which input unit 404 may be used for receiving input digital or character information and generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
obtaining data to be transmitted and the current environmental noise, where the current environmental noise is sound collected in real time in the current transmission environment; performing feature extraction on the current environmental noise to obtain its audio features; determining the environmental noise type of the current environmental noise based on those audio features; adding the environmental noise type to the data to be transmitted to obtain target transmission data; and converting the target transmission data into audio data and playing the audio data, so that a receiving terminal can obtain the data to be transmitted from the audio data according to the environmental noise type.
For example, the electronic device may directly take data input by the user as the data to be transmitted, or it may receive service data returned by a service server or another terminal and use that service data as the data to be transmitted. After the data to be transmitted is acquired, the sound acquisition equipment of the data transmission device collects sound in the current environment for a preset duration to obtain the current environmental noise; alternatively, environmental sound is collected continuously in real time, and when the data to be transmitted arrives, the most recent preset duration of collected sound is extracted as the current environmental noise. The current environmental noise is then converted to a target sound channel, the converted noise is split into a plurality of audio data blocks, and feature extraction is performed on each block to obtain its corresponding audio features. A trained noise recognition model identifies a candidate environmental noise type for each audio data block from its audio features; the number of blocks assigned to each candidate type is counted, and the environmental noise type of the current environmental noise is selected from the candidates based on those counts.
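As an illustrative sketch of the block-wise classification and majority vote described above, the following Python fragment splits the collected noise into blocks, asks a noise recognition model for a candidate type per block, and keeps the most frequent candidate. The block size, the energy-threshold stand-in model, and all names here are hypothetical and are not taken from the patent.

```python
import numpy as np

def block_audio(samples, block_size):
    """Split mono audio into fixed-size blocks, dropping any trailing remainder."""
    n_blocks = len(samples) // block_size
    return [samples[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

def classify_noise(blocks, model):
    """Predict a candidate noise type per block, then keep the most frequent one."""
    candidates = [model(b) for b in blocks]
    types, counts = np.unique(candidates, return_counts=True)
    return types[np.argmax(counts)]

# Toy stand-in for the trained noise recognition model: label blocks by mean energy.
toy_model = lambda b: "quiet" if np.mean(np.abs(b)) < 0.1 else "noisy"

# 400 samples: three quiet blocks outvote one noisy block.
noise = np.concatenate([np.full(300, 0.01), np.full(100, 0.5)])
print(classify_noise(block_audio(noise, 100), toy_model))  # -> quiet
```

A real implementation would replace `toy_model` with the trained model of the embodiments and feed it the per-block audio features rather than raw samples; the voting step, however, is the same.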
The environmental noise type may be added to the data to be transmitted directly to obtain the target transmission data; alternatively, a data tag corresponding to the environmental noise type may be selected from a preset data tag set and added to the data to be transmitted, or the environmental noise type may first be converted into environmental noise type data that is then added to the data to be transmitted. The target transmission data is encrypted and a frequency mapping table is generated based on the encrypted data; the target transmission data is audio-encoded using the frequency mapping table to obtain the audio frequency points corresponding to the target transmission data, each audio frequency point is mapped to a frequency value through the table, an audio waveform is generated from those frequency values, and the waveform is taken as the audio data. The audio data is then played so that the receiving terminal can obtain the data to be transmitted from it according to the environmental noise type. When playback of the audio data is detected, the receiving terminal records the played target audio data to obtain recording data, analyzes the recording to obtain the current environmental noise type and the initial audio data, and identifies the transmission data in the initial audio data according to that noise type.
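The frequency-mapping encoding described above can be sketched as follows: each 4-bit symbol of the target transmission data is looked up in a frequency table to obtain an audio frequency point, and one sine tone per symbol is concatenated into the audio waveform. The nibble-per-tone scheme, base frequency, frequency step, symbol duration, and sample rate are all illustrative assumptions, and the encryption and message-header framing steps of the embodiments are omitted here.

```python
import numpy as np

SAMPLE_RATE = 16000       # assumed sample rate (Hz)
SYMBOL_SECONDS = 0.05     # assumed duration of one tone
BASE_HZ, STEP_HZ = 1000.0, 50.0  # hypothetical frequency mapping parameters

def build_frequency_table():
    """Map each 4-bit symbol value (0..15) to one audio frequency point."""
    return {sym: BASE_HZ + STEP_HZ * sym for sym in range(16)}

def encode_to_waveform(payload: bytes) -> np.ndarray:
    """Split payload into nibbles, look up each one's frequency, emit sine tones."""
    table = build_frequency_table()
    t = np.arange(int(SAMPLE_RATE * SYMBOL_SECONDS)) / SAMPLE_RATE
    tones = []
    for byte in payload:
        for nibble in (byte >> 4, byte & 0x0F):
            tones.append(np.sin(2 * np.pi * table[nibble] * t))
    return np.concatenate(tones)

wave = encode_to_waveform(b"OK")
print(wave.shape)  # 2 bytes -> 4 nibble symbols of 800 samples each
```

The receiving terminal would run the inverse mapping: estimate the dominant frequency of each tone (e.g. via an FFT peak) and look it up in the same table to recover the symbols.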
For the specific implementation of each operation above, refer to the previous embodiments; details are not repeated here.
As can be seen from the above, after obtaining the data to be transmitted and the current environmental noise, this embodiment of the invention performs feature extraction on the current environmental noise to obtain its audio features, determines the environmental noise type based on those features, adds the environmental noise type to the data to be transmitted to obtain target transmission data, converts the target transmission data into audio data, and plays the audio data, so that the receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type. Because the scheme collects the current environmental noise, determines its type, and embeds that type in the data to be transmitted, the receiving terminal can extract the environmental noise type from the received audio data, determine the current network transmission quality of service (QoS) from it, and acquire the data to be transmitted based on that quality, thereby improving the success rate of data transmission.
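On the receiving side, the noise-type-to-quality decision summarized above might look like the following sketch, where the quality table, the acceptance threshold, and the noise-type names are all hypothetical placeholders rather than values from the embodiments:

```python
# Hypothetical mapping from detected noise type to expected transmission quality.
NOISE_QUALITY = {"quiet": 0.95, "office": 0.8, "street": 0.5, "machinery": 0.2}
ACCEPT_THRESHOLD = 0.6  # assumed cutoff below which a replay is requested

def handle_recording(noise_type: str) -> str:
    """Decide whether to decode the recorded audio or request a replay."""
    quality = NOISE_QUALITY.get(noise_type, 0.0)  # unknown types treated as worst case
    return "decode" if quality >= ACCEPT_THRESHOLD else "request_replay"

print(handle_recording("office"))  # -> decode
print(handle_recording("street"))  # -> request_replay
```

This mirrors the evaluation-and-discard logic of claims 3 and 5: when the evaluated quality is acceptable the transmission data is identified in the initial audio data, otherwise the data is discarded and the sending terminal is prompted to replay.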
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods in the above embodiments may be completed by instructions, or by instructions controlling associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the data transmission methods provided by the embodiment of the present invention. For example, the instructions may perform the steps of:
obtaining data to be transmitted and the current environmental noise, where the current environmental noise is sound collected in real time in the current transmission environment; performing feature extraction on the current environmental noise to obtain its audio features; determining the environmental noise type of the current environmental noise based on those audio features; adding the environmental noise type to the data to be transmitted to obtain target transmission data; and converting the target transmission data into audio data and playing the audio data, so that a receiving terminal can obtain the data to be transmitted from the audio data according to the environmental noise type.
For example, data input by the user may be obtained as the data to be transmitted, or service data returned by a service server or another terminal may be received and used as the data to be transmitted. After the data to be transmitted is acquired, the sound acquisition equipment of the data transmission device collects sound in the current environment for a preset duration to obtain the current environmental noise; alternatively, environmental sound is collected continuously in real time, and when the data to be transmitted arrives, the most recent preset duration of collected sound is extracted as the current environmental noise. The current environmental noise is then converted to a target sound channel, the converted noise is split into a plurality of audio data blocks, and feature extraction is performed on each block to obtain its corresponding audio features. A trained noise recognition model identifies a candidate environmental noise type for each audio data block from its audio features; the number of blocks assigned to each candidate type is counted, and the environmental noise type of the current environmental noise is selected from the candidates based on those counts.
The environmental noise type may be added to the data to be transmitted directly to obtain the target transmission data; alternatively, a data tag corresponding to the environmental noise type may be selected from a preset data tag set and added to the data to be transmitted, or the environmental noise type may first be converted into environmental noise type data that is then added to the data to be transmitted. The target transmission data is encrypted and a frequency mapping table is generated based on the encrypted data; the target transmission data is audio-encoded using the frequency mapping table to obtain the audio frequency points corresponding to the target transmission data, each audio frequency point is mapped to a frequency value through the table, an audio waveform is generated from those frequency values, and the waveform is taken as the audio data. The audio data is then played so that the receiving terminal can obtain the data to be transmitted from it according to the environmental noise type. When playback of the audio data is detected, the receiving terminal records the played target audio data to obtain recording data, analyzes the recording to obtain the current environmental noise type and the initial audio data, and identifies the transmission data in the initial audio data according to that noise type.
For the specific implementation of each operation above, refer to the previous embodiments; details are not repeated here.
The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and the like.
Because the instructions stored in the computer-readable storage medium can execute the steps of any data transmission method provided by the embodiments of the present invention, they can achieve the beneficial effects of any such method; these are detailed in the previous embodiments and are not repeated here.
According to one aspect of the present application, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various alternative implementations of the data transmission aspects or transaction data transmission aspects described above.
The foregoing has described in detail a data transmission method, apparatus, electronic device, and computer-readable storage medium according to embodiments of the present invention. Specific examples have been used herein to illustrate the principles and implementations of the invention, and the descriptions above are intended only to aid understanding of the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope in light of the ideas of the present invention; accordingly, the contents of this specification should not be construed as limiting the present invention.

Claims (20)

1. A data transmission method, comprising:
acquiring data to be transmitted and current environmental noise, wherein the current environmental noise is a sound in a current transmission environment acquired in real time;
extracting the characteristics of the current environmental noise to obtain the audio characteristics of the current environmental noise;
determining an ambient noise type of the current ambient noise based on the audio characteristics;
adding the environmental noise type to the data to be transmitted to obtain target transmission data;
and converting the target transmission data into audio data and playing the audio data so that a receiving terminal obtains the data to be transmitted from the audio data according to the environmental noise type.
2. The data transmission method according to claim 1, further comprising:
when playback of the audio data is detected, recording the played target audio data to obtain recording data;
analyzing the recording data to obtain the current environmental noise type and initial audio data;
and identifying transmission data in the initial audio data according to the current environmental noise type.
3. The data transmission method according to claim 2, wherein the identifying transmission data in the initial audio data according to the current environmental noise type includes:
according to the current environmental noise type, evaluating the network transmission quality of the target audio data to obtain evaluation information;
determining a processing mode for the target audio data based on the evaluation information;
and identifying transmission data in the initial audio data when the processing mode is to receive the target audio data.
4. A data transmission method according to claim 3, wherein said identifying transmission data in said initial audio data comprises:
extracting fundamental frequency data from the initial audio data, and detecting a message header of the fundamental frequency data according to a data block identifier in the fundamental frequency data;
when the message header exists in the fundamental frequency data, extracting sound wave fundamental frequency data from the initial audio data;
extracting at least one data block identifier from the sound wave fundamental frequency data, and correcting the extracted data block identifier to obtain a data block identifier set;
and acquiring a target data block corresponding to each data block identifier in the data block identifier set, and fusing the target data blocks to obtain transmission data.
5. The data transmission method according to claim 3, wherein after determining the processing mode for the target audio data based on the evaluation information, further comprising:
when the processing mode is that the target audio data is not received, discarding the initial audio data and generating prompt information;
and sending the prompt information to a sending terminal so that the sending terminal replays the target audio data based on the prompt information.
6. The data transmission method according to any one of claims 1 to 5, wherein the feature extraction of the current environmental noise to obtain the audio feature of the current environmental noise includes:
converting the sound channel of the current environmental noise to obtain the current environmental noise of the target sound channel;
blocking the current environmental noise of the target sound channel to obtain a plurality of audio data blocks;
and extracting the characteristics of the audio data block to obtain the audio characteristics corresponding to the audio data block.
7. The method for data transmission according to claim 6, wherein the performing feature extraction on the audio data blocks to obtain audio features corresponding to each audio data block includes:
extracting spectrum image information from the audio data block, and identifying a spectrum image value corresponding to the audio data block in the spectrum image information;
acquiring data sampling quantity corresponding to each audio feature dimension, and extracting initial audio features corresponding to each audio feature dimension from the spectrum image information according to the data sampling quantity;
and splicing the initial audio features with the frequency spectrum image values to obtain the audio features corresponding to each audio data block.
8. The method of data transmission according to claim 6, wherein said determining an ambient noise type of the current ambient noise based on the audio characteristics comprises:
identifying candidate environmental noise types of the corresponding audio data blocks in the audio features by adopting a trained noise identification model;
counting the number of types corresponding to each environmental noise type in the candidate environmental noise types;
and screening the environmental noise type of the current environmental noise from the candidate environmental noise types based on the type number.
9. The data transmission method of claim 8, wherein the employing a trained noise recognition model to identify candidate environmental noise types for corresponding blocks of audio data in the audio features comprises:
extracting noise audio features from the audio features of each audio data block by adopting a trained noise type recognition model to obtain a noise audio feature set of the current environmental noise;
carrying out batch normalization processing on noise audio features in the noise audio feature set to obtain a basic noise audio feature set;
and identifying the environment noise type corresponding to each basic noise audio feature in the basic noise audio feature set, and obtaining the candidate environment noise type of the audio data block.
10. The method of claim 9, wherein the identifying the environmental noise type corresponding to each basic noise audio feature in the basic noise audio feature set, to obtain the candidate environmental noise type of the audio data block, comprises:
acquiring time information of the audio data block in the current environmental noise, and sequencing basic noise audio features in the basic noise audio feature set based on the time information;
based on the ordering information, respectively converting the basic noise audio features in the basic noise audio feature set into noise type features;
and screening out the environmental noise type corresponding to the noise type characteristic from a preset environmental noise type set to obtain the candidate environmental noise type of the audio data block.
11. The data transmission method according to claim 10, wherein the converting the base noise audio features in the base noise audio feature set into noise type features based on the ranking information, respectively, comprises:
determining the basic noise audio characteristics to be converted currently in the basic noise audio characteristic set to obtain the current basic noise audio characteristics;
querying a target basic noise audio feature ranked in front of the current basic noise audio feature in the basic noise audio feature set based on ranking information;
and respectively converting the basic noise audio features in the basic noise audio feature set into noise type features according to the query result of the target basic noise audio features.
12. The data transmission method according to claim 11, wherein the converting the basic noise audio features in the basic noise audio feature set into noise type features according to the query result of the target basic noise audio features, respectively, includes:
when the target basic noise audio feature exists, converting the current basic noise audio feature into a noise type feature according to the target basic noise audio feature;
when the target basic noise audio feature does not exist, taking the current basic noise audio feature as a noise type feature;
and returning to the step of determining the basic noise audio features which need to be converted currently in the basic noise audio feature set until the basic noise audio features in the basic noise audio feature set are all converted into noise type features, and obtaining the noise type features corresponding to each basic noise audio feature.
13. The method of claim 12, wherein said converting the current base noise audio characteristic to a noise type characteristic based on the target base noise audio characteristic comprises:
acquiring hidden state features corresponding to the target basic noise audio features, wherein the hidden state features are used for indicating hidden states transmitted in the process of converting the target basic noise audio features into noise type features;
calculating the feature ratio of the current basic noise audio feature and the hidden state feature, and determining the target hidden state feature corresponding to the current basic noise audio feature based on the feature ratio;
and carrying out dimension transformation on the target hidden state feature to obtain a noise type feature corresponding to the current basic noise audio feature.
14. The data transmission method of claim 8, wherein, before the extracting noise audio features from the audio features of each audio data block by adopting the trained noise type recognition model, the method further comprises:
acquiring an audio data sample, wherein the audio data sample comprises audio data marked with an environmental noise type;
predicting the environmental noise type of the audio data sample by adopting a preset noise type recognition model to obtain a predicted environmental noise type;
and converging the preset noise type recognition model according to the labeling environment noise type and the prediction environment noise type to obtain a trained noise type recognition model.
15. The data transmission method according to any one of claims 1 to 5, characterized in that the converting the target transmission data into audio data includes:
encrypting the target transmission data, and generating a frequency mapping table based on the encrypted transmission data;
performing audio coding on the target transmission data based on the frequency mapping table to obtain an audio frequency point corresponding to the target transmission data;
and mapping a frequency value of each audio frequency point in the frequency mapping table, generating an audio waveform based on the frequency value, and taking the audio waveform as audio data.
16. The data transmission method according to claim 15, wherein the audio encoding the target transmission data based on the frequency mapping table to obtain audio frequency points corresponding to the target transmission data includes:
dividing the target transmission data into a plurality of data blocks to obtain a first data block, and dividing a data identification code of the target transmission data, generated based on the current time, into a plurality of data blocks to obtain a second data block;
screening at least one message header from a preset message header set, and taking the message header as a third data block;
performing error correction coding on the first data block, the second data block and the third data block, and dividing the error correction code into a plurality of data blocks to obtain a fourth data block;
and extracting audio frequency points from the first data block, the second data block, the third data block and the fourth data block to obtain the audio frequency points corresponding to the target transmission data.
17. A data transmission apparatus, comprising:
the acquisition unit is used for acquiring data to be transmitted and current environmental noise, wherein the current environmental noise is a sound in a current transmission environment acquired in real time;
the extraction unit is used for extracting the characteristics of the current environmental noise to obtain the audio characteristics of the current environmental noise;
a determining unit configured to determine an environmental noise type of the current environmental noise based on the audio feature;
an adding unit, configured to add the environmental noise type to the data to be transmitted, to obtain target transmission data;
and the playing unit is used for converting the target transmission data into audio data and playing the audio data so that a receiving terminal can acquire the data to be transmitted from the audio data according to the environmental noise type.
18. An electronic device comprising a processor and a memory, the memory storing an application, the processor being configured to run the application in the memory to perform the steps in the data transmission method of any one of claims 1 to 16.
19. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data transmission method of any one of claims 1 to 16.
20. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the data transmission method of any one of claims 1 to 16.
CN202111435155.4A 2021-11-29 2021-11-29 Data transmission method, device, electronic equipment and computer readable storage medium Pending CN116189706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111435155.4A CN116189706A (en) 2021-11-29 2021-11-29 Data transmission method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111435155.4A CN116189706A (en) 2021-11-29 2021-11-29 Data transmission method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116189706A true CN116189706A (en) 2023-05-30

Family

ID=86438880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111435155.4A Pending CN116189706A (en) 2021-11-29 2021-11-29 Data transmission method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116189706A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117116302A (en) * 2023-10-24 2023-11-24 深圳市齐奥通信技术有限公司 Audio data analysis method, system and storage medium under complex scene
CN117116302B (en) * 2023-10-24 2023-12-22 深圳市齐奥通信技术有限公司 Audio data analysis method, system and storage medium under complex scene

Similar Documents

Publication Publication Date Title
CN107392121B (en) Self-adaptive equipment identification method and system based on fingerprint identification
EP2657884B1 (en) Identifying multimedia objects based on multimedia fingerprint
CN111652280B (en) Behavior-based target object data analysis method, device and storage medium
CN108768695B (en) KQI problem positioning method and device
CN110046297B (en) Operation and maintenance violation identification method and device and storage medium
CN113762377B (en) Network traffic identification method, device, equipment and storage medium
CN113707173B (en) Voice separation method, device, equipment and storage medium based on audio segmentation
CN113270197A (en) Health prediction method, system and storage medium based on artificial intelligence
CN110995273B (en) Data compression method, device, equipment and medium for power database
CN114297448B (en) License applying method, system and medium based on intelligent epidemic prevention big data identification
CN111581258B (en) Security data analysis method, device, system, equipment and storage medium
EP3890312B1 (en) Distributed image analysis method and system, and storage medium
CN116189706A (en) Data transmission method, device, electronic equipment and computer readable storage medium
CN113676498B (en) Prediction machine management system for accessing third-party information based on distributed network technology
CN112820404A (en) Information processing method applied to big data intelligent medical treatment and intelligent medical treatment server
CN114172856B (en) Message automatic replying method, device, equipment and storage medium
CN114422515B (en) Edge computing architecture design method and system suitable for power industry
CN116386086A (en) Personnel positioning method and device, electronic equipment and storage medium
CN113572792B (en) Engineering measurement intelligent management platform based on Internet of things
CN115567283A (en) Identity authentication method, device, electronic equipment, system and storage medium
CN113115107B (en) Handheld video acquisition terminal system based on 5G network
CN104637496A (en) Computer system and audio comparison method
CN110517671B (en) Audio information evaluation method and device and storage medium
CN109598488B (en) Group red packet abnormal behavior identification method and device, medium and electronic equipment
CN113099283A (en) Method for synchronizing monitoring picture and sound and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination