CN115273914A - Data self-adaptive down-sampling method, device, equipment and medium - Google Patents

Data self-adaptive down-sampling method, device, equipment and medium

Info

Publication number
CN115273914A
CN115273914A
Authority
CN
China
Prior art keywords
endpoint detection
endpoint
audio data
time
short
Prior art date
Legal status
Granted
Application number
CN202210900383.2A
Other languages
Chinese (zh)
Other versions
CN115273914B (en)
Inventor
陈为 (Chen Wei)
祝震杰 (Zhu Zhenjie)
薛攀 (Xue Pan)
Current Assignee
Hangzhou Jingdao Technology Co ltd
Original Assignee
Hangzhou Jingdao Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Jingdao Technology Co ltd
Priority to CN202210900383.2A
Publication of CN115273914A
Application granted
Publication of CN115273914B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/09: Speech or voice analysis techniques in which the extracted parameters are zero crossing rates
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L25/87: Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data self-adaptive down-sampling method and system, and relates to the technical field of audio data compression. The method comprises the following steps: dividing the acquired audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; and, when the reliability is greater than a preset reliability, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced, accurate endpoint detection is obtained, and the continuity of the audio data is improved.

Description

Data self-adaptive down-sampling method, device, equipment and medium
Technical Field
The present application relates to the field of audio data compression technologies, and in particular, to a data adaptive downsampling method, apparatus, device, and medium.
Background
For voice data communication, only about 40% of the time carries the useful, voice-dominated signal, while the speech gaps that make up roughly 60% of the time carry unwanted background noise. If the background noise in the voice gaps is transmitted at a code rate as high as that of the voice signal, network bandwidth is wasted enormously; if the background noise is not transmitted at all, the receiving end hears discontinuities that feel uncomfortable, an effect that is especially noticeable when the background noise is strong and can even interfere with the normal understanding of the voice information. Moreover, if the pauses and endpoints are compressed in the same lossless manner as the rest of the signal, the efficiency of compressed transmission of the whole voice information suffers.
Disclosure of Invention
The aim of the invention is to provide a data adaptive down-sampling method, apparatus, device and medium in which the endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio.
In order to solve the above technical problem, the present application provides a data adaptive downsampling method, including:
acquiring audio data;
dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
acquiring short-time energy, short-time zero-crossing rate and short-time information entropy of each endpoint detection frame in the endpoint detection frame sequence;
determining the reliability of an endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data is subjected to endpoint detection or not;
judging whether the reliability is greater than a preset reliability;
and if so, outputting the endpoint detection time node and the endpoint detection times.
And carrying out lossy audio compression at a certain bandwidth according to the detection time node and the endpoint detection times, iterating until the distortion is greater than a certain value, recording the bandwidth at that moment, and outputting the final compressed audio data.
Preferably, the dividing the audio data according to the length of the endpoint detection time window to obtain the sequence of endpoint detection frames includes:
acquiring the length of audio data;
dividing the audio data length by the end point detection time window length to obtain a division value;
and rounding the division value, and dividing the audio data according to the rounded division value to obtain an endpoint detection frame sequence.
Preferably, determining the trustworthiness of the endpoint-detection time window from the sequence of endpoint-detection frames comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy according to the initialized endpoint detection columns and updating each endpoint detection column;
and determining the reliability according to the updated endpoint detection columns.
Preferably, when the reliability is greater than the preset reliability, before outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering a step of outputting an endpoint detection time node and an endpoint detection frequency;
and if not, fusing endpoint detection time nodes corresponding to the endpoint detection variables.
Preferably, after outputting the endpoint detection time node and the number of times of endpoint detection, the method further includes:
judging whether all the divided endpoint detection time windows output endpoint detection time nodes and endpoint detection times;
if yes, ending;
if not, returning to the step of obtaining the audio data.
Preferably, the endpoint detection time windows are multiple and do not overlap with each other.
Preferably, the carrying out of the lossy audio compression at a certain bandwidth according to the detection time node and the endpoint detection times, iterating until the distortion is greater than a certain value, and recording the bandwidth at that moment comprises: performing lossy audio compression forward and backward from the time node over a preset number of frames; if the distortion is less than the certain value, increasing the number of frames and performing the forward and backward lossy audio compression again; iterating until the distortion is greater than the certain value and recording the bandwidth at that moment; repeating this compression for all endpoints until the compression is completed; and finally outputting the compressed audio data.
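For orientation, the iteration just described can be pictured roughly as in the sketch below. This is only a sketch: the patent fixes neither the codec, nor the distortion measure, nor the frame length, nor the threshold value, so all of those appear as user-supplied placeholders (compress, distortion, frame_len, distortion_limit).

```python
def adaptive_compress_around_endpoint(audio, node, frame_len, max_frames,
                                      distortion_limit, compress, distortion):
    """Sketch of the iterative compression step described above.

    audio            : sequence of audio samples
    node             : endpoint detection time node (sample index)
    frame_len        : samples per frame (assumed parameter)
    max_frames       : safeguard on how far the span may grow (assumed)
    distortion_limit : the "certain value" the distortion must not exceed
    compress, distortion : user-supplied codec and distortion measure;
                           neither is specified in the patent.
    Returns the widest span around the node whose compression stays
    below the distortion limit, together with its compressed form.
    """
    n_frames = 1
    best = None
    while n_frames <= max_frames:
        lo = max(0, node - n_frames * frame_len)           # extend backward
        hi = min(len(audio), node + n_frames * frame_len)  # extend forward
        segment = audio[lo:hi]
        coded = compress(segment)                          # lossy compression at some bandwidth
        if distortion(segment, coded) > distortion_limit:
            break                                          # distortion too high: keep the previous result
        best = (lo, hi, coded)                             # record this span / bandwidth
        n_frames += 1                                      # widen the span and iterate
    return best
```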
In order to solve the above technical problem, the present application further provides a data adaptive down-sampling device, including:
the first acquisition module is used for acquiring audio data;
a dividing module, used for dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection;
the second acquisition module is used for acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module is used for determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability of representing whether the audio data has the endpoint;
the judging module is used for judging whether the reliability is greater than the preset reliability;
and if so, entering an output module for outputting the endpoint detection time node and the endpoint detection times.
And the compression module is used for performing lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
In order to solve the above technical problem, the present application further provides a data adaptive down-sampling device, including:
a memory for storing a computer program;
and a processor, used for executing the computer program to implement the steps of the data self-adaptive down-sampling method.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above data adaptive downsampling method are implemented.
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; acquiring the short-time energy, short-time zero-crossing rate and short-time information entropy of each endpoint detection column in the endpoint detection frame sequence; determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
The application also provides a data self-adaptive down-sampling device, which achieves the same effect.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a data adaptive down-sampling method according to an embodiment of the present application;
fig. 2 is a structural diagram of a data adaptive down-sampling apparatus according to an embodiment of the present application;
fig. 3 is a block diagram of a data adaptive down-sampling device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
The core of the application is to provide a data self-adaptive down-sampling method and system in which the endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.
Fig. 1 is a flowchart of a data adaptive down-sampling method according to an embodiment of the present disclosure. As shown in fig. 1, the data adaptive down-sampling method includes:
s10: audio data is acquired.
Audio data is acquired and the length of the audio data is derived. Denote the sampling frequency by f, in hertz (Hz), and the sampling time of the acquired audio data by S, in seconds (s); the length of the audio data can then be calculated by the following formula:
L=f·S
wherein L denotes the length of the audio data in samples. A further formula (given only as an image in the original) involves the short-time energy, the short-time zero-crossing rate and the short-time information entropy of the k-th frame, together with the short-time energy, the short-time zero-crossing rate and the short-time information entropy of the k-th frame background noise. The audio data is data acquired in time series; that is, it is time-series data that changes over time as new samples are acquired.
S11: and dividing the audio data according to the length of the endpoint detection time window to obtain an endpoint detection frame sequence.
The endpoint detection time window is used for judging whether the audio data is subjected to endpoint detection or not and counting the number of times of endpoint detection.
The dividing of the audio data according to the length of the endpoint detection time window to obtain the sequence of the endpoint detection frames includes:
acquiring the length of audio data;
dividing the audio data length by the endpoint detection time window length to obtain a division value;
and rounding the division value, and dividing the audio data according to the rounded division value to obtain an endpoint detection frame sequence.
Set the length of the endpoint detection time window to d; the number of endpoint detection time windows into which the audio data is divided according to this window length can then be calculated by the following formula:
M=ROUNDUP(L/d)
wherein ROUNDUP(x) denotes the round-up (ceiling) function, used to obtain an integer calculation result, that is, the smallest integer not less than x; M is the number of endpoint detection time windows and is also the division value. At this time, the n-th endpoint detection time window contains audio data of length d, expressed as C_d (the explicit expression is given only as an image in the original), wherein C_d ∈ C_L.
It should be noted that, in this embodiment, there are multiple endpoint detection time windows and they do not overlap (they are mutually exclusive). The endpoint detection time window frames the time series in units of a specified length so that the data within each frame can be sampled and computed: like a slider of fixed length moving along the time axis, each time the slider advances by one unit, the data inside it is read out. The purpose of the time window is to segment the time-series data with windows of a set length and judge them in turn; rounding up means that the leftover part of the time series is also treated as one endpoint detection window. The multiple frame sequences are then constructed as a random finite set.
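To make the division step concrete, a minimal sketch follows. The sampling rate, the window length d and the zero-padding of the last (leftover) window are illustrative assumptions, not values taken from the patent.

```python
import math
import numpy as np

def split_into_windows(audio: np.ndarray, d: int) -> np.ndarray:
    """Split audio samples of length L into M = ROUNDUP(L / d) non-overlapping
    endpoint detection time windows of d samples each (tail zero-padded)."""
    L = len(audio)                             # L = f * S samples
    M = math.ceil(L / d)                       # M = ROUNDUP(L / d)
    padded = np.pad(audio, (0, M * d - L))     # leftover samples still form one window
    return padded.reshape(M, d)                # shape (M, d): one row per window

# Example: 3 s of audio at 16 kHz, 400-sample (25 ms) windows
audio = np.random.randn(16000 * 3)
windows = split_into_windows(audio, d=400)
print(windows.shape)                           # (120, 400)
```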
S12: and acquiring short-time energy, short-time zero-crossing rate and short-time information entropy of each end point detection column in the end point detection frame sequence.
S13: and determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy.
The reliability characterizes whether the audio data contains an endpoint. The audio data in each sampling sliding window is in one of two states, endpoint detected or no endpoint detected, so the audio endpoint detection variable can be modelled as a random finite set, and one time window can be regarded as one frame sequence; that is, the length of one endpoint detection time window equals the number of frames in the frame sequence. For the audio data in the n-th endpoint detection time window, the audio endpoint detection variable of the sampling point at time k can be expressed as a discrete finite-set variable {φ, 1}_k, where φ denotes the empty set, i.e. no endpoint detected, and 1 denotes an endpoint detection. The audio data in the n-th endpoint detection time window is then modelled as:
G_n = {{φ,1}_1, {φ,1}_2, {φ,1}_3, …, {φ,1}_k}
The data described above is thus modelled as a discrete finite set.
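One way to hold this discrete-finite-set model in code is sketched below; the concrete representation (frozen sets built from per-sample detection flags) is an illustrative choice, not something prescribed by the patent.

```python
from typing import FrozenSet, List

EMPTY: FrozenSet[int] = frozenset()        # phi: no endpoint at this sampling point
DETECTED: FrozenSet[int] = frozenset({1})  # 1: endpoint detected at this sampling point

def model_window(flags: List[bool]) -> List[FrozenSet[int]]:
    """Build G_n = {{phi,1}_1, ..., {phi,1}_k} from per-sample detection flags."""
    return [DETECTED if f else EMPTY for f in flags]

G_n = model_window([False, False, True, False])
print(sum(1 for s in G_n if s))            # number of endpoint detections in the window
```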
S14: and judging whether the reliability is greater than a preset reliability.
If yes, the process proceeds to step S15: and outputting the endpoint detection time node and the endpoint detection times.
S16: and performing down-sampling according to the endpoint detection time node.
The number of times of endpoint detection obtained above can thus be counted and collected, the endpoint detection time node is determined within the corresponding judgment time window, and both are finally output. The output form may be, for example, "the endpoint detection time node is 14".
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced and accurate endpoint detection is obtained; then, guided by the endpoints of the voice data, fewer observed values are set at the endpoints and compressed at a larger compression ratio, while more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
Based on the foregoing embodiment, as a more preferred embodiment, determining the confidence level of the endpoint detection time window according to the sequence of endpoint detection frames comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring short-time energy, a short-time zero-crossing rate and a short-time information entropy according to the initialized endpoint detection columns and updating each endpoint detection column;
and determining the reliability according to the updated endpoint detection columns.
From the above description, the endpoint detection frame sequence corresponding to the n-th endpoint detection time window can be denoted as G_n = {par_n(1), par_n(2), par_n(3), …, par_n(d)}; that is, one frame in the endpoint detection frame sequence can be denoted as par_n(i), where i ∈ (1, d), and is calculated according to the following formula:
par_n(i) = (w, x_t, h(x_t))
wherein w is the short-time energy, x_t is the global position corresponding to the t-th short-time zero-crossing rate within the endpoint detection time window, and h(x_t) is the signal amplitude of the audio data at x_t. For the n-th endpoint detection time window, x_t can be calculated according to the following formula:
x_t = (n-1)·d + 1 + t
It should be noted that the short-time energy of each frame is the average share of the short-time energy of its endpoint detection time window. For example, if the short-time energy of the endpoint detection time window is 1 and the window contains Q frames, the short-time energy of each frame is 1/Q.
The endpoint detection frame sequence in the n-th endpoint detection time window is initialized according to the following formulas (the expression for w_n(i) is given only as an image in the original):
m_n(i) = x_t
h_n(i) = h(x_t)
wherein w_n(i) is the short-time energy of the i-th frame, m_n(i) is the short-time zero-crossing rate of the i-th frame, and h_n(i) is the short-time information entropy of the i-th frame.
It should be noted that, in this embodiment, the detection probability P_d of endpoint detection is calculated from a sigmoid function of the signal amplitude h(x_t) at x_t and the preset confidence level H; the exact expression is given only as an image in the original.
It should also be noted that the short-time information entropy v_n is calculated by a formula that is likewise given only as an image in the original. The filtered endpoint detection confidence of the updated endpoint detection column and the filtered endpoint detection time are then calculated by further formulas (also given only as images), wherein k is the coefficient associated with h(x_t) falling below the preset confidence H. Finally, the updated reliability is obtained from these quantities by one more formula, again given only as an image in the original.
On the basis of the foregoing embodiment, as a more preferred embodiment, when the reliability is greater than the preset reliability, before outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering a step of outputting an endpoint detection time node and an endpoint detection frequency;
and if not, fusing endpoint detection time nodes corresponding to the endpoint detection variables.
As described in the above embodiments, if there are multiple endpoint detection variables in the n-th endpoint detection time window, the endpoint detection time nodes corresponding to the endpoint detection variables are fused according to a formula that is given only as an image in the original, wherein s is the number of endpoint detection variables and the remaining symbol denotes an endpoint detection time node.
On the basis of the above embodiment, as a more preferred embodiment, after outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether all the divided endpoint detection time windows output endpoint detection time nodes and endpoint detection times;
if yes, ending;
if not, returning to the step of obtaining the audio data.
All endpoint detection time windows need to be traversed once, which makes the obtained data more accurate and improves the experience of using the audio data.
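Tying the traversal together, the driver loop below shows one way to visit every endpoint detection time window in turn; the reliability computation is left as a user-supplied placeholder because the patent's own formulas are given only as images.

```python
from typing import Callable, List, Tuple
import numpy as np

def detect_endpoints(windows: np.ndarray,
                     reliability: Callable[[np.ndarray], Tuple[float, float, int]],
                     preset_reliability: float) -> List[Tuple[float, int]]:
    """Traverse all endpoint detection time windows (shape (M, d)) once.

    reliability(window) is assumed to return (confidence, time_node, n_detections);
    how the confidence is computed is not reproduced here.
    Windows whose confidence exceeds the preset reliability contribute their
    endpoint detection time node and detection count to the output.
    """
    results = []
    for window in windows:                                # visit every window exactly once
        conf, time_node, n_detections = reliability(window)
        if conf > preset_reliability:                     # S14: compare with the preset reliability
            results.append((time_node, n_detections))     # S15: output node and count
    return results
```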
On the basis of the foregoing embodiment, as a more preferred embodiment, after the audio data is acquired, before the audio data is divided according to the length of the endpoint detection time window to obtain the sequence of endpoint detection frames, the method further includes:
and performing Kalman filtering processing on the audio data. In order to remove clutter. In addition, it should be noted that the interference of the clutter can also be avoided by using the column filtering method.
In the foregoing embodiments, the data adaptive down-sampling method is described in detail, and the present application also provides embodiments corresponding to the data adaptive down-sampling apparatus. It should be noted that the present application describes the embodiments of the apparatus portion from two perspectives, one from the perspective of the function module and the other from the perspective of the hardware.
Fig. 2 is a structural diagram of a data adaptive down-sampling apparatus according to an embodiment of the present application. As shown in fig. 2, the present application further provides a data adaptive down-sampling apparatus, including:
a first obtaining module 20, configured to obtain audio data;
a dividing module 21, configured to divide the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, where the endpoint detection time window is used to determine whether there is an endpoint in the audio data, and count the number of times of endpoint detection, where the endpoint includes an audio start point and an endpoint
A second obtaining module 22, configured to obtain short-term energy, a short-term zero-crossing rate, and a short-term information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module 23 is configured to determine a reliability of an endpoint detection time window according to the short-term energy, the short-term zero-crossing rate, and the short-term information entropy, where the reliability is a reliability representing whether the audio data has an endpoint;
the judging module 24 is used for judging whether the reliability is greater than the preset reliability;
if yes, the method enters an output module 25 for outputting the endpoint detection time node and the endpoint detection times.
The compression module 26 is configured to perform down-sampling according to the endpoint detection time node.
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced and accurate endpoint detection is obtained; then, guided by the endpoints of the voice data, fewer observed values are set at the endpoints and compressed at a larger compression ratio, while more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Fig. 3 is a structural diagram of a data adaptive down-sampling device according to an embodiment of the present application, and as shown in fig. 3, a data adaptive down-sampling device includes:
a memory 30 for storing a computer program;
a processor 31 for implementing the steps of the data adaptive down-sampling method as mentioned in the above embodiments when executing the computer program.
The data adaptive down-sampling device provided by the embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
The processor 31 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 31 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 31 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state and is also called a Central Processing Unit (CPU); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 31 may be integrated with a Graphics Processing Unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 31 may further include an Artificial Intelligence (AI) processor for handling computing operations related to machine learning.
Memory 30 may include one or more computer-readable storage media, which may be non-transitory. Memory 30 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 30 is at least used for storing a computer program, wherein the computer program can realize the relevant steps of the data adaptive down-sampling method disclosed in any one of the foregoing embodiments after being loaded and executed by the processor 31. In addition, the resources stored in the memory 30 may also include an operating system, data, and the like, and the storage manner may be a transient storage or a permanent storage. The operating system may include Windows, unix, linux, etc. The data may include, but is not limited to, data adaptive downsampling methods, and the like.
In some embodiments, the data adaptive down-sampling device may further include a display screen, an input-output interface, a communication interface, a power source, and a communication bus.
Those skilled in the art will appreciate that the architecture shown in fig. 3 does not constitute a limitation of a data adaptive down-sampling device and may include more or fewer components than those shown.
The data adaptive down-sampling device provided by the embodiment of the application comprises a memory 30 and a processor 31, and when the processor 31 executes a program stored in the memory 30, the data adaptive down-sampling method can be realized.
Finally, the application also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps as set forth in the above-mentioned method embodiments.
It is to be understood that, if the method in the above embodiments is implemented in the form of software functional units and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product which is stored in a storage medium and executes all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The data adaptive downsampling method, apparatus, device and medium provided by the present application are described in detail above. The embodiments are described in a progressive mode in the specification, the emphasis of each embodiment is on the difference from the other embodiments, and the same and similar parts among the embodiments can be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for adaptive data downsampling, comprising:
acquiring audio data;
dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint or not and counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
acquiring short-time energy, a short-time zero-crossing rate and a short-time information entropy of each endpoint detection frame in the endpoint detection frame sequence;
determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data has the endpoint;
judging whether the reliability is greater than a preset reliability;
if so, outputting the endpoint detection time node and the endpoint detection times;
and carrying out lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
2. The method of claim 1, wherein the dividing the audio data according to the endpoint detection time window length to obtain the sequence of endpoint detection frames comprises:
acquiring the audio data length;
dividing the audio data length by the endpoint detection time window length to obtain a division value;
and rounding the division value upwards, and dividing the audio data according to the rounded division value to obtain the endpoint detection frame sequence.
3. The method of claim 1, wherein the determining the confidence level of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate, and the short-time entropy comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy according to the initialized endpoint detection column and updating each endpoint detection frame;
and determining the reliability according to each updated endpoint detection frame.
4. The data adaptive down-sampling method according to claim 3, wherein when the reliability is greater than the preset reliability, before the outputting the endpoint detection time node and the endpoint detection times, further comprising:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering the step of outputting the endpoint detection time node and the endpoint detection times;
and if not, fusing the endpoint detection time nodes corresponding to the endpoint detection variables.
5. The data adaptive down-sampling method according to claim 2, further comprising, after said outputting the endpoint detection time node and the number of endpoint detections:
judging whether all the endpoint detection time windows of the division values output the endpoint detection time nodes and the endpoint detection times;
if yes, ending;
if not, returning to the step of acquiring the audio data.
6. The data adaptive down-sampling method of claim 1, wherein the endpoint detection time windows are multiple and do not overlap with each other.
7. The data adaptive down-sampling method according to claim 1, wherein the lossy audio compression is performed according to a certain bandwidth according to the detection time node and the end point detection times, and the iteration is performed until the distortion is greater than a certain value, and the recording of the bandwidth at this time includes performing the lossy audio compression forward and backward according to a preset number of frames by the time node, if the distortion is less than a certain value, increasing the number of frames and performing the lossy audio compression forward and backward, and the iteration is performed until the distortion is greater than a certain value, recording the bandwidth at this time, repeating the compression of all end points until the compression is completed, and outputting the final compressed audio data.
8. A data adaptive down-sampling apparatus, comprising:
the first acquisition module is used for acquiring audio data;
the dividing module is used for dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint or not and counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
a second obtaining module, configured to obtain short-time energy, a short-time zero-crossing rate, and a short-time information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module is used for determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data has endpoints;
the judging module is used for judging whether the reliability is greater than the preset reliability;
if yes, entering an output module for outputting the endpoint detection time node and the endpoint detection times;
and the compression module is used for performing lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
9. A data adaptive down-sampling device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data adaptive downsampling method of any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data adaptive downsampling method according to any one of claims 1 to 7.
CN202210900383.2A 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium Active CN115273914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210900383.2A CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210900383.2A CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115273914A true CN115273914A (en) 2022-11-01
CN115273914B CN115273914B (en) 2024-07-16

Family

ID=83772148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210900383.2A Active CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115273914B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625858A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Method for extracting short-time energy frequency value in voice endpoint detection
CN101625857A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Self-adaptive voice endpoint detection method
US20160358598A1 (en) * 2015-06-07 2016-12-08 Apple Inc. Context-based endpoint detection
KR20180021531A (en) * 2016-08-22 2018-03-05 에스케이텔레콤 주식회사 Endpoint detection method of speech using deep neural network and apparatus thereof
CN111354378A (en) * 2020-02-12 2020-06-30 北京声智科技有限公司 Voice endpoint detection method, device, equipment and computer storage medium
CN114495907A (en) * 2022-01-27 2022-05-13 多益网络有限公司 Adaptive voice activity detection method, device, equipment and storage medium
WO2022134833A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Speech signal processing method, apparatus and device, and storage medium

Also Published As

Publication number Publication date
CN115273914B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
EP3926623B1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
WO2019096149A1 (en) Auditory selection method and device based on memory and attention model
CN107622773B (en) Audio feature extraction method and device and electronic equipment
US11133022B2 (en) Method and device for audio recognition using sample audio and a voting matrix
CN112420079B (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN112967738A (en) Human voice detection method and device, electronic equipment and computer readable storage medium
CN113257238A (en) Training method of pre-training model, coding feature acquisition method and related device
CN113823323A (en) Audio processing method and device based on convolutional neural network and related equipment
CN114333912B (en) Voice activation detection method, device, electronic equipment and storage medium
CN115273823B (en) Data processing method, device, equipment and medium based on Gaussian mixture probability density
CN111477248B (en) Audio noise detection method and device
CN111862967B (en) Voice recognition method and device, electronic equipment and storage medium
WO2018154372A1 (en) Sound identification utilizing periodic indications
CN117056728A (en) Time sequence generation method, device, equipment and storage medium
CN115273914A (en) Data self-adaptive down-sampling method, device, equipment and medium
CN110570877A (en) Sign language video generation method, electronic device and computer readable storage medium
CN113838450B (en) Audio synthesis and corresponding model training method, device, equipment and storage medium
CN113409792B (en) Voice recognition method and related equipment thereof
CN112735392B (en) Voice processing method, device, equipment and storage medium
CN114999440A (en) Avatar generation method, apparatus, device, storage medium, and program product
CN111862931B (en) Voice generation method and device
CN113539300A (en) Voice detection method and device based on noise suppression, storage medium and terminal
CN112750469A (en) Method for detecting music in voice, voice communication optimization method and corresponding device
CN114582367B (en) Music reverberation intensity estimation method and device and electronic equipment
CN117334198B (en) Speech signal processing method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant