CN115273914A - Data self-adaptive down-sampling method, device, equipment and medium - Google Patents

Data self-adaptive down-sampling method, device, equipment and medium

Info

Publication number
CN115273914A
CN115273914A
Authority
CN
China
Prior art keywords
endpoint detection
endpoint
audio data
time
short
Prior art date
Legal status
Granted
Application number
CN202210900383.2A
Other languages
Chinese (zh)
Other versions
CN115273914B (en)
Inventor
陈为 (Chen Wei)
祝震杰 (Zhu Zhenjie)
薛攀 (Xue Pan)
Current Assignee
Hangzhou Jingdao Technology Co ltd
Original Assignee
Hangzhou Jingdao Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Jingdao Technology Co ltd
Priority to CN202210900383.2A
Publication of CN115273914A
Application granted
Publication of CN115273914B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/09: Speech or voice analysis techniques in which the extracted parameters are zero crossing rates
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/84: Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L25/87: Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data self-adaptive down-sampling method and system, and relates to the technical field of audio data compression. The method comprises the following steps: dividing the acquired audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; and, when the reliability is greater than a preset reliability, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced, accurate endpoint detection is obtained, and the continuity of the audio data is improved.

Description

Data self-adaptive down-sampling method, device, equipment and medium
Technical Field
The present application relates to the field of audio data compression technologies, and in particular, to a data adaptive downsampling method, apparatus, device, and medium.
Background
For voice data communication, only about 40% of the time carries the useful, voice-dominated signal, while the speech gaps that make up roughly 60% of the time carry unwanted background noise. If the background noise in the voice gaps is transmitted at a code rate as high as that of the voice signal, network bandwidth is wasted enormously; if the background noise is not transmitted at all, the receiving end hears discontinuities that feel uncomfortable, an effect that is especially noticeable when the background noise is strong and can even interfere with the normal understanding of the voice information. Moreover, if the pauses and endpoints are compressed in the same lossless manner as the rest of the signal, the efficiency of compressed transmission of the whole voice information suffers.
Disclosure of Invention
The aim of the invention is to provide a data adaptive down-sampling method, apparatus, device and medium in which the endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio.
In order to solve the above technical problem, the present application provides a data adaptive downsampling method, including:
acquiring audio data;
dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
acquiring short-time energy, short-time zero-crossing rate and short-time information entropy of each endpoint detection frame in the endpoint detection frame sequence;
determining the reliability of an endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data is subjected to endpoint detection or not;
judging whether the reliability is greater than a preset reliability;
and if so, outputting the endpoint detection time node and the endpoint detection times.
And carrying out lossy audio compression at a certain bandwidth according to the detection time node and the endpoint detection times, iterating until the distortion is greater than a certain value, recording the bandwidth at that moment, and outputting the final compressed audio data.
Preferably, the dividing the audio data according to the length of the endpoint detection time window to obtain the sequence of endpoint detection frames includes:
acquiring the length of audio data;
dividing the audio data length by the end point detection time window length to obtain a division value;
and rounding the division value, and dividing the audio data according to the rounded division value to obtain an endpoint detection frame sequence.
Preferably, determining the trustworthiness of the endpoint-detection time window from the sequence of endpoint-detection frames comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy according to the initialized endpoint detection columns and updating each endpoint detection column;
and determining the reliability according to the updated endpoint detection columns.
Preferably, when the reliability is greater than the preset reliability, before outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering a step of outputting an endpoint detection time node and an endpoint detection frequency;
and if not, fusing endpoint detection time nodes corresponding to the endpoint detection variables.
Preferably, after outputting the endpoint detection time node and the number of times of endpoint detection, the method further includes:
judging whether all the divided endpoint detection time windows output endpoint detection time nodes and endpoint detection times;
if yes, ending;
if not, returning to the step of obtaining the audio data.
Preferably, the endpoint detection time windows are multiple and do not overlap with each other.
Preferably, the carrying out of the lossy audio compression at a certain bandwidth according to the detection time node and the endpoint detection times, iterating until the distortion is greater than a certain value, and recording the bandwidth at that moment comprises: performing lossy audio compression forward and backward from the time node over a preset number of frames; if the distortion is less than the certain value, increasing the number of frames and performing the forward and backward lossy audio compression again; iterating until the distortion is greater than the certain value and recording the bandwidth at that moment; repeating this compression for all endpoints until the compression is completed; and finally outputting the compressed audio data.
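For orientation, the iteration just described can be pictured roughly as in the sketch below. This is only a sketch: the patent fixes neither the codec, nor the distortion measure, nor the frame length, nor the threshold value, so all of those appear as user-supplied placeholders (compress, distortion, frame_len, distortion_limit).

```python
def adaptive_compress_around_endpoint(audio, node, frame_len, max_frames,
                                      distortion_limit, compress, distortion):
    """Sketch of the iterative compression step described above.

    audio            : sequence of audio samples
    node             : endpoint detection time node (sample index)
    frame_len        : samples per frame (assumed parameter)
    max_frames       : safeguard on how far the span may grow (assumed)
    distortion_limit : the "certain value" the distortion must not exceed
    compress, distortion : user-supplied codec and distortion measure;
                           neither is specified in the patent.
    Returns the widest span around the node whose compression stays
    below the distortion limit, together with its compressed form.
    """
    n_frames = 1
    best = None
    while n_frames <= max_frames:
        lo = max(0, node - n_frames * frame_len)           # extend backward
        hi = min(len(audio), node + n_frames * frame_len)  # extend forward
        segment = audio[lo:hi]
        coded = compress(segment)                          # lossy compression at some bandwidth
        if distortion(segment, coded) > distortion_limit:
            break                                          # distortion too high: keep the previous result
        best = (lo, hi, coded)                             # record this span / bandwidth
        n_frames += 1                                      # widen the span and iterate
    return best
```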
In order to solve the above technical problem, the present application further provides a data adaptive down-sampling device, including:
the first acquisition module is used for acquiring audio data;
a dividing module, used for dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection;
the second acquisition module is used for acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module is used for determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability of representing whether the audio data has the endpoint;
the judging module is used for judging whether the reliability is greater than the preset reliability;
and if so, entering an output module for outputting the endpoint detection time node and the endpoint detection times.
And the compression module is used for performing lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
In order to solve the above technical problem, the present application further provides a data adaptive down-sampling device, including:
a memory for storing a computer program;
and a processor, used for executing the computer program to implement the steps of the data self-adaptive down-sampling method.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above data adaptive downsampling method are implemented.
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; acquiring the short-time energy, short-time zero-crossing rate and short-time information entropy of each endpoint detection column in the endpoint detection frame sequence; determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
The application also provides a data self-adaptive down-sampling device, which achieves the same effect.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a data adaptive down-sampling method according to an embodiment of the present application;
fig. 2 is a structural diagram of a data adaptive down-sampling apparatus according to an embodiment of the present application;
fig. 3 is a block diagram of a data adaptive down-sampling device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
The core of the application is to provide a data self-adaptive down-sampling method and system in which the endpoints of the voice data are detected first, fewer observed values are set at the endpoints and compressed at a larger compression ratio, and more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.
Fig. 1 is a flowchart of a data adaptive down-sampling method according to an embodiment of the present disclosure. As shown in fig. 1, the data adaptive down-sampling method includes:
s10: audio data is acquired.
Audio data is acquired and the length of the audio data is derived. Denote the sampling frequency by f, in hertz (Hz), and the sampling time of the acquired audio data by S, in seconds (s); the length of the audio data can then be calculated by the following formula:
L=f·S
wherein L denotes the length of the audio data in samples. A further formula (given only as an image in the original) involves the short-time energy, the short-time zero-crossing rate and the short-time information entropy of the k-th frame, together with the short-time energy, the short-time zero-crossing rate and the short-time information entropy of the k-th frame background noise. The audio data is data acquired in time series; that is, it is time-series data that changes over time as new samples are acquired.
S11: and dividing the audio data according to the length of the endpoint detection time window to obtain an endpoint detection frame sequence.
The endpoint detection time window is used for judging whether the audio data is subjected to endpoint detection or not and counting the number of times of endpoint detection.
The dividing of the audio data according to the length of the endpoint detection time window to obtain the sequence of the endpoint detection frames includes:
acquiring the length of audio data;
dividing the audio data length by the endpoint detection time window length to obtain a division value;
and rounding the division value, and dividing the audio data according to the rounded division value to obtain an endpoint detection frame sequence.
Set the length of the endpoint detection time window to d; the number of endpoint detection time windows into which the audio data is divided according to this window length can then be calculated by the following formula:
M=ROUNDUP(L/d)
wherein ROUNDUP(x) denotes the round-up (ceiling) function, used to obtain an integer calculation result, that is, the smallest integer not less than x; M is the number of endpoint detection time windows and is also the division value. At this time, the n-th endpoint detection time window contains audio data of length d, expressed as C_d (the explicit expression is given only as an image in the original), wherein C_d ∈ C_L.
It should be noted that, in this embodiment, there are multiple endpoint detection time windows and they do not overlap (they are mutually exclusive). The endpoint detection time window frames the time series in units of a specified length so that the data within each frame can be sampled and computed: like a slider of fixed length moving along the time axis, each time the slider advances by one unit, the data inside it is read out. The purpose of the time window is to segment the time-series data with windows of a set length and judge them in turn; rounding up means that the leftover part of the time series is also treated as one endpoint detection window. The multiple frame sequences are then constructed as a random finite set.
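To make the division step concrete, a minimal sketch follows. The sampling rate, the window length d and the zero-padding of the last (leftover) window are illustrative assumptions, not values taken from the patent.

```python
import math
import numpy as np

def split_into_windows(audio: np.ndarray, d: int) -> np.ndarray:
    """Split audio samples of length L into M = ROUNDUP(L / d) non-overlapping
    endpoint detection time windows of d samples each (tail zero-padded)."""
    L = len(audio)                             # L = f * S samples
    M = math.ceil(L / d)                       # M = ROUNDUP(L / d)
    padded = np.pad(audio, (0, M * d - L))     # leftover samples still form one window
    return padded.reshape(M, d)                # shape (M, d): one row per window

# Example: 3 s of audio at 16 kHz, 400-sample (25 ms) windows
audio = np.random.randn(16000 * 3)
windows = split_into_windows(audio, d=400)
print(windows.shape)                           # (120, 400)
```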
S12: and acquiring short-time energy, short-time zero-crossing rate and short-time information entropy of each end point detection column in the end point detection frame sequence.
S13: and determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy.
The reliability characterizes whether the audio data contains an endpoint. The audio data in each sampling sliding window is in one of two states, endpoint detected or no endpoint detected, so the audio endpoint detection variable can be modelled as a random finite set, and one time window can be regarded as one frame sequence; that is, the length of one endpoint detection time window equals the number of frames in the frame sequence. For the audio data in the n-th endpoint detection time window, the audio endpoint detection variable of the sampling point at time k can be expressed as a discrete finite-set variable {φ, 1}_k, where φ denotes the empty set, i.e. no endpoint detected, and 1 denotes an endpoint detection. The audio data in the n-th endpoint detection time window is then modelled as:
G_n = {{φ,1}_1, {φ,1}_2, {φ,1}_3, …, {φ,1}_k}
The data described above is thus modelled as a discrete finite set.
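One way to hold this discrete-finite-set model in code is sketched below; the concrete representation (frozen sets built from per-sample detection flags) is an illustrative choice, not something prescribed by the patent.

```python
from typing import FrozenSet, List

EMPTY: FrozenSet[int] = frozenset()        # phi: no endpoint at this sampling point
DETECTED: FrozenSet[int] = frozenset({1})  # 1: endpoint detected at this sampling point

def model_window(flags: List[bool]) -> List[FrozenSet[int]]:
    """Build G_n = {{phi,1}_1, ..., {phi,1}_k} from per-sample detection flags."""
    return [DETECTED if f else EMPTY for f in flags]

G_n = model_window([False, False, True, False])
print(sum(1 for s in G_n if s))            # number of endpoint detections in the window
```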
S14: and judging whether the reliability is greater than a preset reliability.
If yes, the process proceeds to step S15: and outputting the endpoint detection time node and the endpoint detection times.
S16: and performing down-sampling according to the endpoint detection time node.
The number of times of endpoint detection obtained above can thus be counted and collected, the endpoint detection time node is determined within the corresponding judgment time window, and both are finally output. The output form may be, for example, "the endpoint detection time node is 14".
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced and accurate endpoint detection is obtained; then, guided by the endpoints of the voice data, fewer observed values are set at the endpoints and compressed at a larger compression ratio, while more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
Based on the foregoing embodiment, as a more preferred embodiment, determining the confidence level of the endpoint detection time window according to the sequence of endpoint detection frames comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring short-time energy, a short-time zero-crossing rate and a short-time information entropy according to the initialized endpoint detection columns and updating each endpoint detection column;
and determining the reliability according to the updated endpoint detection columns.
From the above description, the endpoint detection frame sequence corresponding to the n-th endpoint detection time window can be denoted as G_n = {par_n(1), par_n(2), par_n(3), …, par_n(d)}; that is, one frame in the endpoint detection frame sequence can be denoted as par_n(i), where i ∈ (1, d), and is calculated according to the following formula:
par_n(i) = (w, x_t, h(x_t))
wherein w is the short-time energy, x_t is the global position corresponding to the t-th short-time zero-crossing rate within the endpoint detection time window, and h(x_t) is the signal amplitude of the audio data at x_t. For the n-th endpoint detection time window, x_t can be calculated according to the following formula:
x_t = (n-1)·d + 1 + t
It should be noted that the short-time energy of each frame is the average share of the short-time energy of its endpoint detection time window. For example, if the short-time energy of the endpoint detection time window is 1 and the window contains Q frames, the short-time energy of each frame is 1/Q.
The endpoint detection frame sequence in the n-th endpoint detection time window is initialized according to the following formulas (the expression for w_n(i) is given only as an image in the original):
m_n(i) = x_t
h_n(i) = h(x_t)
wherein w_n(i) is the short-time energy of the i-th frame, m_n(i) is the short-time zero-crossing rate of the i-th frame, and h_n(i) is the short-time information entropy of the i-th frame.
It should be noted that, in this embodiment, the detection probability P_d of endpoint detection is calculated from a sigmoid function of the signal amplitude h(x_t) at x_t and the preset confidence level H; the exact expression is given only as an image in the original.
It should also be noted that the short-time information entropy v_n is calculated by a formula that is likewise given only as an image in the original. The filtered endpoint detection confidence of the updated endpoint detection column and the filtered endpoint detection time are then calculated by further formulas (also given only as images), wherein k is the coefficient associated with h(x_t) falling below the preset confidence H. Finally, the updated reliability is obtained from these quantities by one more formula, again given only as an image in the original.
On the basis of the foregoing embodiment, as a more preferred embodiment, when the reliability is greater than the preset reliability, before outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering a step of outputting an endpoint detection time node and an endpoint detection frequency;
and if not, fusing endpoint detection time nodes corresponding to the endpoint detection variables.
As described in the above embodiments, if there are multiple endpoint detection variables in the n-th endpoint detection time window, the endpoint detection time nodes corresponding to the endpoint detection variables are fused according to a formula that is given only as an image in the original, wherein s is the number of endpoint detection variables and the remaining symbol denotes an endpoint detection time node.
On the basis of the above embodiment, as a more preferred embodiment, after outputting the endpoint detection time node and the endpoint detection times, the method further includes:
judging whether all the divided endpoint detection time windows output endpoint detection time nodes and endpoint detection times;
if yes, ending;
if not, returning to the step of obtaining the audio data.
All endpoint detection time windows need to be traversed once, which makes the obtained data more accurate and improves the experience of using the audio data.
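Tying the traversal together, the driver loop below shows one way to visit every endpoint detection time window in turn; the reliability computation is left as a user-supplied placeholder because the patent's own formulas are given only as images.

```python
from typing import Callable, List, Tuple
import numpy as np

def detect_endpoints(windows: np.ndarray,
                     reliability: Callable[[np.ndarray], Tuple[float, float, int]],
                     preset_reliability: float) -> List[Tuple[float, int]]:
    """Traverse all endpoint detection time windows (shape (M, d)) once.

    reliability(window) is assumed to return (confidence, time_node, n_detections);
    how the confidence is computed is not reproduced here.
    Windows whose confidence exceeds the preset reliability contribute their
    endpoint detection time node and detection count to the output.
    """
    results = []
    for window in windows:                                # visit every window exactly once
        conf, time_node, n_detections = reliability(window)
        if conf > preset_reliability:                     # S14: compare with the preset reliability
            results.append((time_node, n_detections))     # S15: output node and count
    return results
```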
On the basis of the foregoing embodiment, as a more preferred embodiment, after the audio data is acquired, before the audio data is divided according to the length of the endpoint detection time window to obtain the sequence of endpoint detection frames, the method further includes:
and performing Kalman filtering processing on the audio data. In order to remove clutter. In addition, it should be noted that the interference of the clutter can also be avoided by using the column filtering method.
In the foregoing embodiments, the data adaptive down-sampling method is described in detail, and the present application also provides embodiments corresponding to the data adaptive down-sampling apparatus. It should be noted that the present application describes the embodiments of the apparatus portion from two perspectives, one from the perspective of the function module and the other from the perspective of the hardware.
Fig. 2 is a structural diagram of a data adaptive down-sampling apparatus according to an embodiment of the present application. As shown in fig. 2, the present application further provides a data adaptive down-sampling apparatus, including:
a first obtaining module 20, configured to obtain audio data;
a dividing module 21, configured to divide the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, where the endpoint detection time window is used to determine whether there is an endpoint in the audio data, and count the number of times of endpoint detection, where the endpoint includes an audio start point and an endpoint
A second obtaining module 22, configured to obtain short-term energy, a short-term zero-crossing rate, and a short-term information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module 23 is configured to determine a reliability of an endpoint detection time window according to the short-term energy, the short-term zero-crossing rate, and the short-term information entropy, where the reliability is a reliability representing whether the audio data has an endpoint;
the judging module 24 is used for judging whether the reliability is greater than the preset reliability;
if yes, the method enters an output module 25 for outputting the endpoint detection time node and the endpoint detection times.
The compression module 26 is configured to perform down-sampling according to the endpoint detection time node.
The application provides a data self-adaptive down-sampling method, which comprises the following steps: acquiring audio data; dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint and for counting the number of times of endpoint detection; determining the reliability of the endpoint detection time window according to the endpoint detection frame sequence, wherein the reliability characterizes whether the audio data has an endpoint; judging whether the reliability is greater than a preset reliability; and, if so, outputting the endpoint detection time node and the endpoint detection times. Because the audio data is divided only once, all of the audio data is used to obtain the reliability that characterizes the audio data over the whole audio stream. The deviation of the output endpoint detection times and endpoint detection time nodes is thereby reduced and accurate endpoint detection is obtained; then, guided by the endpoints of the voice data, fewer observed values are set at the endpoints and compressed at a larger compression ratio, while more observed values are set elsewhere and compressed at a smaller compression ratio. Heavier, lossy compression is thus applied near the endpoints while compression elsewhere stays close to lossless, so the integrity of the original audio data is better preserved and the efficiency of compressed transmission is improved.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Fig. 3 is a structural diagram of a data adaptive down-sampling device according to an embodiment of the present application, and as shown in fig. 3, a data adaptive down-sampling device includes:
a memory 30 for storing a computer program;
a processor 31 for implementing the steps of the data adaptive down-sampling method as mentioned in the above embodiments when executing the computer program.
The data adaptive down-sampling device provided by the embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
The processor 31 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 31 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA). The processor 31 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state and is also called a Central Processing Unit (CPU); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 31 may be integrated with a Graphics Processing Unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 31 may further include an Artificial Intelligence (AI) processor for handling computing operations related to machine learning.
Memory 30 may include one or more computer-readable storage media, which may be non-transitory. Memory 30 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 30 is at least used for storing a computer program, wherein the computer program can realize the relevant steps of the data adaptive down-sampling method disclosed in any one of the foregoing embodiments after being loaded and executed by the processor 31. In addition, the resources stored in the memory 30 may also include an operating system, data, and the like, and the storage manner may be a transient storage or a permanent storage. The operating system may include Windows, unix, linux, etc. The data may include, but is not limited to, data adaptive downsampling methods, and the like.
In some embodiments, the data adaptive down-sampling device may further include a display screen, an input-output interface, a communication interface, a power source, and a communication bus.
Those skilled in the art will appreciate that the architecture shown in fig. 3 does not constitute a limitation of a data adaptive down-sampling device and may include more or fewer components than those shown.
The data adaptive down-sampling device provided by the embodiment of the application comprises a memory 30 and a processor 31, and when the processor 31 executes a program stored in the memory 30, the data adaptive down-sampling method can be realized.
Finally, the application also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps as set forth in the above-mentioned method embodiments.
It is to be understood that, if the method in the above embodiments is implemented in the form of software functional units and sold or used as a stand-alone product, it can be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product which is stored in a storage medium and executes all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The data adaptive downsampling method, apparatus, device and medium provided by the present application are described in detail above. The embodiments are described in a progressive mode in the specification, the emphasis of each embodiment is on the difference from the other embodiments, and the same and similar parts among the embodiments can be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for adaptive data downsampling, comprising:
acquiring audio data;
dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint or not and counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
acquiring short-time energy, a short-time zero-crossing rate and a short-time information entropy of each endpoint detection frame in the endpoint detection frame sequence;
determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data has the endpoint;
judging whether the reliability is greater than a preset reliability;
if so, outputting the endpoint detection time node and the endpoint detection times;
and carrying out lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
2. The method of claim 1, wherein the dividing the audio data according to the endpoint detection time window length to obtain the sequence of endpoint detection frames comprises:
acquiring the audio data length;
dividing the audio data length by the endpoint detection time window length to obtain a division value;
and rounding the division value upwards, and dividing the audio data according to the rounded division value to obtain the endpoint detection frame sequence.
3. The method of claim 1, wherein the determining the confidence level of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate, and the short-time entropy comprises:
initializing each endpoint detection column in the endpoint detection frame sequence;
acquiring the short-time energy, the short-time zero-crossing rate and the short-time information entropy according to the initialized endpoint detection column and updating each endpoint detection frame;
and determining the reliability according to each updated endpoint detection frame.
4. The data adaptive down-sampling method according to claim 3, wherein when the reliability is greater than the preset reliability, before the outputting the endpoint detection time node and the endpoint detection times, further comprising:
judging whether the number of endpoint detection variables in the endpoint detection frame sequence is 1 or not;
if yes, entering the step of outputting the endpoint detection time node and the endpoint detection times;
and if not, fusing the endpoint detection time nodes corresponding to the endpoint detection variables.
5. The data adaptive down-sampling method according to claim 2, further comprising, after said outputting the endpoint detection time node and the number of endpoint detections:
judging whether all the endpoint detection time windows of the division values output the endpoint detection time nodes and the endpoint detection times;
if yes, ending;
if not, returning to the step of acquiring the audio data.
6. The data adaptive down-sampling method of claim 1, wherein the endpoint detection time windows are multiple and do not overlap with each other.
7. The data adaptive down-sampling method according to claim 1, wherein the lossy audio compression is performed according to a certain bandwidth according to the detection time node and the end point detection times, and the iteration is performed until the distortion is greater than a certain value, and the recording of the bandwidth at this time includes performing the lossy audio compression forward and backward according to a preset number of frames by the time node, if the distortion is less than a certain value, increasing the number of frames and performing the lossy audio compression forward and backward, and the iteration is performed until the distortion is greater than a certain value, recording the bandwidth at this time, repeating the compression of all end points until the compression is completed, and outputting the final compressed audio data.
8. A data adaptive down-sampling apparatus, comprising:
the first acquisition module is used for acquiring audio data;
the dividing module is used for dividing the audio data according to the length of an endpoint detection time window to obtain an endpoint detection frame sequence, wherein the endpoint detection time window is used for judging whether the audio data has an endpoint or not and counting the number of times of endpoint detection, and the endpoint comprises a voice start point and a voice end point;
a second obtaining module, configured to obtain short-time energy, a short-time zero-crossing rate, and a short-time information entropy of each endpoint detection column in the endpoint detection frame sequence;
the determining module is used for determining the reliability of the endpoint detection time window according to the short-time energy, the short-time zero-crossing rate and the short-time information entropy, wherein the reliability is the reliability representing whether the audio data has endpoints;
the judging module is used for judging whether the reliability is greater than the preset reliability;
if yes, entering an output module for outputting the endpoint detection time node and the endpoint detection times;
and the compression module is used for performing lossy audio compression according to the detection time node and the endpoint detection times and a certain bandwidth, iterating until the distortion is greater than a certain value, recording the bandwidth at the moment, and outputting final compressed audio data.
9. A data adaptive down-sampling device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data adaptive downsampling method of any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the data adaptive downsampling method according to any one of claims 1 to 7.
CN202210900383.2A 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium Active CN115273914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210900383.2A CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210900383.2A CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN115273914A true CN115273914A (en) 2022-11-01
CN115273914B CN115273914B (en) 2024-07-16

Family

ID=83772148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210900383.2A Active CN115273914B (en) 2022-07-28 2022-07-28 Data self-adaptive downsampling method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115273914B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101625858A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Method for extracting short-time energy frequency value in voice endpoint detection
CN101625857A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Self-adaptive voice endpoint detection method
US20160358598A1 (en) * 2015-06-07 2016-12-08 Apple Inc. Context-based endpoint detection
KR20180021531A (en) * 2016-08-22 2018-03-05 에스케이텔레콤 주식회사 Endpoint detection method of speech using deep neural network and apparatus thereof
CN111354378A (en) * 2020-02-12 2020-06-30 北京声智科技有限公司 Voice endpoint detection method, device, equipment and computer storage medium
CN114495907A (en) * 2022-01-27 2022-05-13 多益网络有限公司 Adaptive voice activity detection method, device, equipment and storage medium
WO2022134833A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Speech signal processing method, apparatus and device, and storage medium

Also Published As

Publication number Publication date
CN115273914B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
EP3926623B1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
WO2019096149A1 (en) Auditory selection method and device based on memory and attention model
CN107622773B (en) Audio feature extraction method and device and electronic equipment
US11133022B2 (en) Method and device for audio recognition using sample audio and a voting matrix
CN112420079B (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN112967738A (en) Human voice detection method and device, electronic equipment and computer readable storage medium
CN113257238A (en) Training method of pre-training model, coding feature acquisition method and related device
CN113823323A (en) Audio processing method and device based on convolutional neural network and related equipment
CN114333912B (en) Voice activation detection method, device, electronic equipment and storage medium
CN115273823B (en) Data processing method, device, equipment and medium based on Gaussian mixture probability density
CN111477248B (en) Audio noise detection method and device
CN111862967B (en) Voice recognition method and device, electronic equipment and storage medium
WO2018154372A1 (en) Sound identification utilizing periodic indications
CN117056728A (en) Time sequence generation method, device, equipment and storage medium
CN115273914A (en) Data self-adaptive down-sampling method, device, equipment and medium
CN110570877A (en) Sign language video generation method, electronic device and computer readable storage medium
CN113838450B (en) Audio synthesis and corresponding model training method, device, equipment and storage medium
CN113409792B (en) Voice recognition method and related equipment thereof
CN112735392B (en) Voice processing method, device, equipment and storage medium
CN114999440A (en) Avatar generation method, apparatus, device, storage medium, and program product
CN111862931B (en) Voice generation method and device
CN113539300A (en) Voice detection method and device based on noise suppression, storage medium and terminal
CN112750469A (en) Method for detecting music in voice, voice communication optimization method and corresponding device
CN114582367B (en) Music reverberation intensity estimation method and device and electronic equipment
CN117334198B (en) Speech signal processing method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant