CN111354365B - Pure voice data sampling rate identification method, device and system - Google Patents

Pure voice data sampling rate identification method, device and system

Info

Publication number
CN111354365B
CN111354365B CN202010160577.4A CN202010160577A CN111354365B CN 111354365 B CN111354365 B CN 111354365B CN 202010160577 A CN202010160577 A CN 202010160577A CN 111354365 B CN111354365 B CN 111354365B
Authority
CN
China
Prior art keywords
frequency
sampling rate
voice data
data
pure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010160577.4A
Other languages
Chinese (zh)
Other versions
CN111354365A (en)
Inventor
刘兵兵
包飞
吴科苇
刘如意
车洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd
Priority to CN202010160577.4A (CN111354365B)
Priority to PCT/CN2020/097008 (WO2021179470A1)
Priority to CA3175103A (CA3175103A1)
Publication of CN111354365A
Application granted
Publication of CN111354365B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the application disclose a pure voice data sampling rate identification method, device and system. The method comprises: performing a Fourier transform on the pure voice data to obtain frequency domain data; processing the frequency domain data according to received prior threshold data to obtain frequency band information; acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating the assumed frequency corresponding to that point under each of several different preset sampling rates; and comparing the assumed frequencies with the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency. The method relies on the prior characteristic that human speech occupies a bandwidth of 200 Hz to 4000 Hz in the frequency domain: this prior is compared with the assumed frequencies of the pure voice data, and the actual sampling rate is determined from the similarity of the comparison. The sampling rate of pure voice data can thus be estimated automatically in advance, which prevents the degradation of voice processing that occurs when the sampling rate is unknown.

Description

Pure voice data sampling rate identification method, device and system
Technical Field
The application belongs to the technical field of voice processing, and particularly relates to a pure voice data sampling rate identification method, device and system.
Background
The sampling rate is the number of samples per second taken from a continuous signal to form a discrete signal. It is used to describe the sound quality and tone of a sound file, and is a quality standard for evaluating sound cards and sound files.
During voice processing, pure voice data is sometimes encountered, that is, voice data that carries no sampling rate information. When the sampling rate is unknown and the voice data is processed at a wrong sampling rate, the processing result often deviates considerably, so the output voice is poor and the user experience of the voice product suffers.
Disclosure of Invention
To solve the problems in the prior art, the application provides a pure voice data sampling rate identification method, device and system.
The specific technical solutions provided by the embodiments of the application are as follows:
In a first aspect, the application provides a method for identifying the sampling rate of pure voice data, the method comprising:
performing a Fourier transform on the pure voice data to obtain frequency domain data;
processing the frequency domain data according to received prior threshold data to obtain frequency band information;
acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates; and
comparing the assumed frequencies with the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
Preferably, the prior frequency ranges from 200 Hz to 4000 Hz.
Preferably, comparing the assumed frequencies with the prior frequency and determining the actual sampling rate specifically includes:
calculating the Euclidean distance between each assumed frequency and the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency has the minimum Euclidean distance.
Preferably, after the Fourier transform is performed on the pure voice data to obtain the frequency domain data, the method further comprises:
normalizing the frequency domain data.
Preferably, the pure voice data is pure voice data containing voice segments;
the pure voice data containing the voice segments is acquired as follows:
receiving voice data and analyzing the voice data;
when the voice data does not include sampling rate information, performing a Fourier transform on the voice data to obtain the energy of the voice data; and
obtaining, according to a preset energy threshold, the voice data whose energy is greater than the energy threshold, so as to obtain the pure voice data containing the voice segments.
Preferably, the method further comprises:
decoding the pure voice data according to the actual sampling rate.
In a second aspect, the application provides a pure voice data sampling rate identification device, comprising:
a conversion module, configured to perform a Fourier transform on the pure voice data to obtain frequency domain data;
an acquisition module, configured to process the frequency domain data according to received prior threshold data to obtain frequency band information, and to acquire a high-frequency cut-off frequency point of the frequency band information;
a calculation module, configured to calculate the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates; and
a processing module, configured to compare the assumed frequencies with the prior frequency and to determine as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
Preferably, the prior frequency ranges from 200 Hz to 4000 Hz.
Preferably, the processing module is specifically configured to:
calculate the Euclidean distance between each assumed frequency and the prior frequency, and determine as the actual sampling rate the preset sampling rate whose assumed frequency has the minimum Euclidean distance.
Preferably, the conversion module is further configured to normalize the frequency domain data after performing the Fourier transform on the pure voice data.
Preferably, the pure voice data is pure voice data containing voice segments;
the device further comprises:
a receiving module, configured to receive voice data;
an analysis module, configured to analyze the voice data;
the conversion module is further configured to perform a Fourier transform on the voice data to obtain the energy of the voice data when the voice data does not include sampling rate information;
the processing module is further configured to obtain, according to a preset energy threshold, the voice data whose energy is greater than the energy threshold, so as to obtain the pure voice data containing the voice segments.
Preferably, the device further comprises:
a decoding module, configured to decode the pure voice data according to the actual sampling rate.
In a third aspect, the application provides a computer system comprising:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
performing a Fourier transform on the pure voice data to obtain frequency domain data;
processing the frequency domain data according to received prior threshold data to obtain frequency band information;
acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates; and
comparing the assumed frequencies with the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
The embodiments of the application have the following beneficial effects:
Based on the prior characteristic that human speech occupies a bandwidth of 200 Hz to 4000 Hz in the frequency domain, this prior is compared with the assumed frequencies of the pure voice data, and the actual sampling rate is determined from the similarity of the comparison. The sampling rate of pure voice data can thus be estimated automatically in advance, which prevents the degradation of voice processing that occurs when the sampling rate is unknown, mitigates the bad-packet phenomenon in voice data packets transmitted over a network, adds sampling rate checking and reminder functions to voice communication engines, common audio processing software and the like, and improves the robustness of voice processing equipment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a pure voice data sampling rate identification method according to a first embodiment of the present application;
Fig. 2 is a schematic structural diagram of a pure voice data sampling rate identification device according to a second embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computer system according to a third embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Embodiment 1
When two fixed-line telephones exchange data over a network and a PCM voice packet received by one party has lost its sampling rate information, the voice cannot be processed accurately, which degrades playback and may make the played voice abnormal.
In this case, if the sampling rate of the voice can be identified first, the problem can be avoided, which mitigates the bad-packet phenomenon in voice data packets transmitted over a network.
Based on this, as shown in Fig. 1, the application provides a pure voice data sampling rate identification method that can be applied to an audio device, the audio device performing the following steps:
S11, receiving voice data and preprocessing it to obtain pure voice data.
In this embodiment, the pure voice data is pure voice data containing voice segments, and the step specifically includes:
1. analyzing the voice data;
specifically, relevant information of the voice data, such as the data packet and the sampling rate, is obtained and analyzed.
2. when the voice data does not include sampling rate information, performing a Fourier transform on the voice data to obtain the energy of the voice data;
3. obtaining, according to a preset energy threshold, the voice data whose energy is greater than the energy threshold, so as to obtain the pure voice data containing the voice segments.
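A minimal sketch of steps 2 and 3, assuming the samples have already been unpacked into a list of numbers. Because the sampling rate is still unknown at this stage, the frame length is given in samples rather than milliseconds. The function names, the frame length and the threshold value are illustrative assumptions and do not come from the application; a plain DFT from the standard library stands in for whatever FFT routine an implementation would use.

```python
import cmath
import math


def frame_energy(frame):
    """Energy of one frame computed from its DFT (equal to the time-domain energy by Parseval)."""
    n = len(frame)
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]
    return sum(abs(c) ** 2 for c in spectrum) / n


def keep_voiced_frames(samples, frame_len, energy_threshold):
    """Concatenate the frames whose energy exceeds the threshold (hypothetical helper)."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) > energy_threshold:
            kept.extend(frame)
    return kept


# One silent frame followed by one frame that holds a tone.
silence = [0.0] * 32
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
voiced = keep_voiced_frames(silence + tone, frame_len=32, energy_threshold=1.0)
```

Only the tone frame survives here; the silent frame has zero energy and is dropped.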
Alternatively, receiving the voice data and preprocessing it to obtain the pure voice data can be achieved as follows:
1. analyzing the voice data;
2. when the voice data does not include sampling rate information, filtering the voice data to obtain pure voice data containing voice segments.
The purpose of the filtering is to remove noise, silence and other non-speech content so that only the valid voice data remains, thereby enhancing the voice data.
S12, performing a Fourier transform on the pure voice data to obtain frequency domain data.
The Fourier transform converts pure voice data in the time domain into voice data in the frequency domain.
S13, normalizing the frequency domain data.
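Steps S12 and S13 can be sketched as follows. The application does not say which normalization is used, so dividing by the peak magnitude is an assumption made here for illustration, as are the function names. The direct DFT keeps the example dependency-free and is only practical for short inputs.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2, the non-redundant half for real-valued input."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def normalize(spectrum):
    """Scale the spectrum so its largest magnitude is 1 (assumed normalization)."""
    peak = max(spectrum)
    return [m / peak for m in spectrum] if peak > 0 else list(spectrum)


# A tone that completes exactly 5 cycles in 64 samples lands in bin 5.
samples = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spec = normalize(magnitude_spectrum(samples))
peak_bin = spec.index(max(spec))
```

Note that the result is indexed by bin number, not by hertz: without a sampling rate, a bin has no absolute frequency yet.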
S14, processing the frequency domain data according to the received prior threshold data to obtain frequency band information.
The prior threshold data is obtained by processing prior information.
Because human speech occupies a bandwidth of 200 Hz to 4000 Hz in the frequency domain, this range is introduced as the prior information.
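The application does not specify how the prior threshold data is derived or applied. One plausible reading, sketched below under that assumption, is that the bins of the normalized spectrum whose magnitude exceeds a threshold are treated as the occupied speech band. The threshold value 0.1 and the name `band_bins` are hypothetical.

```python
def band_bins(normalized_spectrum, threshold):
    """Indices of the bins whose normalized magnitude exceeds the threshold."""
    return [k for k, m in enumerate(normalized_spectrum) if m > threshold]


# Toy normalized spectrum: energy is concentrated in bins 2 to 5.
spectrum = [0.01, 0.02, 0.60, 1.00, 0.45, 0.30, 0.03, 0.01]
bins = band_bins(spectrum, threshold=0.1)
low_bin, high_bin = bins[0], bins[-1]
```

Here `low_bin` and `high_bin` describe the band edges in bin units; `high_bin` plays the role of the high-frequency cut-off point used in the next step.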
S15, acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates.
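For an N-point transform, bin k corresponds to the frequency k × fs / N when the sampling rate is fs, so the same cut-off bin maps to a different assumed frequency under each candidate rate. The sketch below illustrates this; the candidate rates are the ones tested in Table 1, while the function name, bin index and transform length are illustrative assumptions.

```python
def assumed_frequencies(cutoff_bin, n_fft, candidate_rates):
    """Frequency in Hz that the cut-off bin would represent under each candidate rate."""
    return {fs: cutoff_bin * fs / n_fft for fs in candidate_rates}


freqs = assumed_frequencies(cutoff_bin=128, n_fft=512, candidate_rates=(8000, 16000, 32000))
# Bin 128 of 512 is one quarter of the sampling rate:
# 2000.0 Hz at 8 kHz, 4000.0 Hz at 16 kHz, 8000.0 Hz at 32 kHz.
```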
S16, comparing the assumed frequencies with the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
Specifically, the step may include:
calculating the Euclidean distance between each assumed frequency and the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency has the minimum Euclidean distance.
The minimum Euclidean distance indicates the highest similarity between the assumed frequency and the prior frequency, so the corresponding preset sampling rate is the one closest to the true value and is taken as the actual sampling rate of the pure voice data.
When calculating the Euclidean distance between the assumed frequencies and the prior frequency, several frequency points can be selected from the prior frequency range for the calculation, for example 200 Hz and 4000 Hz.
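The text leaves open how several prior frequency points are paired with the assumed frequencies. The sketch below takes one possible reading as an assumption: the low and high edges of the detected band are converted to hertz under each candidate rate and compared with the prior points (200 Hz, 4000 Hz) as a two-dimensional vector. The function name and the example bin indices are hypothetical.

```python
import math

PRIOR_BAND_HZ = (200.0, 4000.0)  # prior frequency points named in the text


def pick_sampling_rate(low_bin, high_bin, n_fft, candidate_rates):
    """Return the candidate rate with the smallest Euclidean distance, and that distance."""
    best_rate, best_dist = None, math.inf
    for fs in candidate_rates:
        assumed = (low_bin * fs / n_fft, high_bin * fs / n_fft)
        dist = math.dist(assumed, PRIOR_BAND_HZ)
        if dist < best_dist:
            best_rate, best_dist = fs, dist
    return best_rate, best_dist


# Band edges at bins 13 and 256 of a 1024-point transform.
rate, dist = pick_sampling_rate(13, 256, 1024, (8000, 16000, 32000))
```

With these inputs the 16 kHz candidate maps the band to (203.125 Hz, 4000 Hz), far closer to the prior than the 8 kHz or 32 kHz candidates. If only the high-frequency cut-off point is used, the distance reduces to the absolute difference from 4000 Hz.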
S17, decoding and playing the pure voice data according to the actual sampling rate.
Therefore, bad packets that have lost their sampling rate information in a telephone switching network can be decoded and played through the normal flow at the identified sampling rate, which improves the quality and experience of the voice call.
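Once the rate is identified, one simple way to make headerless PCM playable is to wrap it in a WAV container that records the rate. The sketch below uses Python's standard `wave` module and assumes 16-bit mono little-endian PCM; the sample format, channel count and function name are assumptions, since the application does not state them.

```python
import io
import struct
import wave


def wrap_pcm_as_wav(pcm_bytes, sample_rate, channels=1, sample_width=2):
    """Return WAV file bytes for raw PCM, stamped with the identified sampling rate."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


# Four 16-bit samples, tagged with an identified rate of 16 kHz.
pcm = struct.pack("<4h", 0, 1000, -1000, 0)
wav_bytes = wrap_pcm_as_wav(pcm, sample_rate=16000)
```

Any player that reads WAV headers can then play the data at the identified rate.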
To demonstrate the effect of the method, the application tested audio files at several sampling rates. The recognition results are as follows:
Table 1. Test accuracy for audio files at different sampling rates
Test condition               Recognition accuracy    Number of files
PCM file, 8 kHz sampling     85%                     100
PCM file, 16 kHz sampling    81%                     100
PCM file, 32 kHz sampling    88%                     100
The experimental results show that the recognition accuracy of the scheme exceeds 80% at each tested sampling rate.
In addition, for communication engines such as IP telephony services, if a wrong sampling rate is set through operator error, the method can estimate the sampling rate in advance and remind the operator in time, reducing unnecessary risk and loss. For common audio processing software, the user can likewise be reminded promptly of a wrong sampling rate setting, which saves wasted time and redundant operations.
Embodiment 2
As shown in Fig. 2, the application further provides a pure voice data sampling rate identification device, including:
a conversion module 21, configured to perform a Fourier transform on the pure voice data to obtain frequency domain data;
an acquisition module 22, configured to process the frequency domain data according to the received prior threshold data to obtain frequency band information, and to acquire a high-frequency cut-off frequency point of the frequency band information;
a calculation module 23, configured to calculate the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates;
a processing module 24, configured to compare the assumed frequencies with the prior frequency and to determine as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
Preferably, the prior frequency ranges from 200 Hz to 4000 Hz.
Preferably, the processing module 24 is specifically configured to calculate the Euclidean distance between each assumed frequency and the prior frequency, and to determine as the actual sampling rate the preset sampling rate whose assumed frequency has the minimum Euclidean distance.
Preferably, the conversion module 21 is further configured to normalize the frequency domain data after performing the Fourier transform on the pure voice data.
Preferably, the pure voice data is pure voice data containing voice segments;
the device further comprises:
a receiving module 25, configured to receive voice data;
an analysis module 26, configured to analyze the voice data;
the conversion module 21 is further configured to perform a Fourier transform on the voice data to obtain the energy of the voice data when the voice data does not include sampling rate information;
the processing module 24 is further configured to obtain, according to a preset energy threshold, the voice data whose energy is greater than the energy threshold, so as to obtain the pure voice data containing the voice segments.
Preferably, the device further comprises:
a decoding module 27, configured to decode the pure voice data according to the actual sampling rate.
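As a rough illustration of how modules 21 to 24 could fit together in software, the sketch below arranges them as methods of one class and checks the result on a synthetic signal. The class name, method names, threshold value and peak normalization are all assumptions made for the example, and only the high-frequency cut-off point is compared with the 4000 Hz upper edge of the prior band.

```python
import cmath
import math


class SamplingRateRecognizer:
    """Hypothetical arrangement of modules 21 to 24; all names are illustrative."""

    def __init__(self, candidate_rates=(8000, 16000, 32000),
                 prior_high_hz=4000.0, threshold=0.1):
        self.candidate_rates = candidate_rates
        self.prior_high_hz = prior_high_hz
        self.threshold = threshold

    def convert(self, samples):
        """Conversion module: DFT magnitudes for bins 0..N/2, scaled to a peak of 1."""
        n = len(samples)
        mags = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)
        ]
        peak = max(mags)
        return [m / peak for m in mags] if peak > 0 else mags

    def cutoff_bin(self, spectrum):
        """Acquisition module: highest bin whose magnitude exceeds the threshold."""
        return max(k for k, m in enumerate(spectrum) if m > self.threshold)

    def assumed(self, cutoff, n):
        """Calculation module: frequency of the cut-off bin under each candidate rate."""
        return {fs: cutoff * fs / n for fs in self.candidate_rates}

    def recognize(self, samples):
        """Processing module: candidate whose assumed frequency is nearest the prior."""
        spectrum = self.convert(samples)
        freqs = self.assumed(self.cutoff_bin(spectrum), len(samples))
        return min(freqs, key=lambda fs: abs(freqs[fs] - self.prior_high_hz))


# Synthetic check: tones at 250 Hz, 1 kHz and 4 kHz sampled at 16 kHz.
TRUE_RATE = 16000
signal = [
    sum(math.sin(2 * math.pi * f * t / TRUE_RATE) for f in (250, 1000, 4000))
    for t in range(256)
]
recognized = SamplingRateRecognizer().recognize(signal)
```

The synthetic signal is chosen so that every tone falls exactly on a bin; real speech would need windowing and averaging over many frames, and an all-silent input would make `cutoff_bin` raise `ValueError` because no bin exceeds the threshold.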
Embodiment 3
The application also provides a computer system comprising:
one or more processors; and
a memory associated with the one or more processors, the memory storing program instructions that, when read and executed by the one or more processors, perform the following operations:
performing a Fourier transform on the pure voice data to obtain frequency domain data;
processing the frequency domain data according to the received prior threshold data to obtain frequency band information;
acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating the assumed frequency corresponding to the high-frequency cut-off frequency point under each of several different preset sampling rates; and
comparing the assumed frequencies with the prior frequency, and determining as the actual sampling rate the preset sampling rate whose assumed frequency is most similar to the prior frequency.
Fig. 3 illustrates an architecture of a computer system, which may include, among other things, a processor 32, a video display adapter 34, a disk drive 36, an input/output interface 38, a network interface 310, and a memory 312. The processor 32, video display adapter 34, disk drive 36, input/output interface 38, network interface 310, and memory 312 may be communicatively coupled via a communication bus 314.
The processor 32 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solution provided by the present application.
The memory 312 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 312 may store an operating system 316 for controlling the operation of computer system 30, and a Basic Input Output System (BIOS) 318 for controlling the low-level operation of the computer system. In addition, a web browser 320, a data storage management system 322, and the like may also be stored. In general, when the technical solution provided by the present application is implemented by software or firmware, the relevant program code is stored in the memory 312 and invoked by the processor 32 for execution.
The input/output interface 38 is used to connect with an input/output module to enable information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The network interface 310 is used to connect communication modules (not shown) to enable communication between the device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi or Bluetooth).
Communication bus 314 includes a path that transfers information between the components of the device (e.g., processor 32, video display adapter 34, disk drive 36, input/output interface 38, network interface 310, and memory 312).
In addition, the computer system may also obtain information of specific acquisition conditions from the virtual resource object acquisition condition information database, for performing condition judgment, and the like.
It should be noted that although the above-described devices illustrate only the processor 32, video display adapter 34, disk drive 36, input/output interface 38, network interface 310, memory 312, communication bus 314, etc., the device may include other components necessary to achieve proper operation in an implementation.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a cloud server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application. In addition, the computer system, the pure voice data sampling rate recognition device and the pure voice data sampling rate recognition method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in method embodiments and are not described herein again.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for identifying a sample rate of pure speech data, the method comprising:
performing Fourier transform on the pure voice data to obtain frequency domain data;
processing the frequency domain data according to the received prior threshold data to obtain frequency band information;
acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating a corresponding assumed frequency at the high-frequency cut-off frequency point according to different preset sampling rates;
and comparing different hypothesis frequencies with the prior frequency, and determining the sampling rate corresponding to the hypothesis frequency when the comparison result is most similar as the actual sampling rate.
2. The method of claim 1, wherein the prior frequency ranges from 200Hz to 4000Hz.
3. The method of claim 1, wherein comparing the different hypothesized frequencies with the prior frequency, and wherein determining the sampling rate corresponding to the hypothesized frequency when the comparison is most similar as the actual sampling rate comprises:
and calculating Euclidean distances between different assumed frequencies and the prior frequency, and determining the sampling rate corresponding to the assumed frequency when the Euclidean distance is minimum as the actual sampling rate.
4. A method according to any one of claims 1-3, characterized in that after fourier transforming the pure speech data to obtain frequency domain data, the method further comprises:
and carrying out normalization processing on the frequency domain data.
5. A method according to any one of claims 1 to 3, wherein the pure speech data is pure speech data comprising speech segments;
the method for acquiring the pure voice data containing the voice fragments comprises the following steps:
receiving voice data and analyzing the voice data;
when the voice data does not comprise the sampling rate information, carrying out Fourier transform on the voice data to obtain the energy of the voice data;
and acquiring voice data corresponding to energy larger than the energy threshold according to a preset energy threshold to obtain the pure voice data containing the voice fragments.
6. A method according to any one of claims 1 to 3, further comprising:
and decoding the pure voice data according to the actual sampling rate.
7. A pure speech data sample rate recognition device, comprising:
the conversion module is used for carrying out Fourier transform on the pure voice data to obtain frequency domain data;
the acquisition module is used for processing the frequency domain data according to the received priori threshold data to obtain frequency band information; and a high-frequency cut-off frequency point for acquiring the frequency band information;
the calculation module is used for calculating the corresponding hypothesis frequency at the high-frequency cut-off frequency point according to different preset sampling rates;
and the processing module is used for comparing different assumed frequencies with the prior frequency and determining the sampling rate corresponding to the assumed frequency when the comparison result is the most similar as the actual sampling rate.
8. The apparatus of claim 7, wherein the prior frequency ranges from 200Hz to 4000Hz.
9. The apparatus of claim 7, wherein the processing module is specifically configured to:
and calculating Euclidean distances between different assumed frequencies and the prior frequency, and determining the sampling rate corresponding to the assumed frequency when the Euclidean distance is minimum as the actual sampling rate.
10. A computer system, comprising:
one or more processors; and
a memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the operations of:
performing Fourier transform on the pure voice data to obtain frequency domain data;
processing the frequency domain data according to the received prior threshold data to obtain frequency band information;
acquiring a high-frequency cut-off frequency point of the frequency band information, and calculating a corresponding assumed frequency at the high-frequency cut-off frequency point according to different preset sampling rates;
and comparing the different assumed frequencies with the prior frequency, and determining, as the actual sampling rate, the sampling rate corresponding to the assumed frequency that is most similar to the prior frequency.
CN202010160577.4A 2020-03-10 2020-03-10 Pure voice data sampling rate identification method, device and system Active CN111354365B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010160577.4A CN111354365B (en) 2020-03-10 2020-03-10 Pure voice data sampling rate identification method, device and system
PCT/CN2020/097008 WO2021179470A1 (en) 2020-03-10 2020-06-19 Method, device and system for recognizing sampling rate of pure voice data
CA3175103A CA3175103A1 (en) 2020-03-10 2020-06-19 Method for sampling-rate recognition of pure voice data, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010160577.4A CN111354365B (en) 2020-03-10 2020-03-10 Pure voice data sampling rate identification method, device and system

Publications (2)

Publication Number Publication Date
CN111354365A (en) 2020-06-30
CN111354365B (en) 2023-10-31

Family

ID=71196071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010160577.4A Active CN111354365B (en) 2020-03-10 2020-03-10 Pure voice data sampling rate identification method, device and system

Country Status (3)

Country Link
CN (1) CN111354365B (en)
CA (1) CA3175103A1 (en)
WO (1) WO2021179470A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113447713B (en) * 2021-06-25 2023-03-07 南京丰道电力科技有限公司 Fourier-based fast high-precision power system frequency measurement method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320560A (en) * 2008-07-01 2008-12-10 上海大学 Method for speech recognition system improving discrimination by using sampling velocity conversion
CN101582264A (en) * 2009-06-12 2009-11-18 瑞声声学科技(深圳)有限公司 Method and voice collecting system for speech enhancement
JP2012002858A (en) * 2010-06-14 2012-01-05 Pioneer Electronic Corp Time scaling method, pitch shift method, audio data processing apparatus and program
CN103745726A (en) * 2013-11-07 2014-04-23 中国电子科技集团公司第四十一研究所 Self-adaptive variable-sampling rate audio frequency sampling method
CN105513590A (en) * 2015-11-23 2016-04-20 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN107833581A (en) * 2017-10-20 2018-03-23 广州酷狗计算机科技有限公司 Method, apparatus and readable storage medium for extracting the fundamental frequency of sound
CN109328383A (en) * 2016-06-27 2019-02-12 高通股份有限公司 Audio decoding using an intermediate sampling rate
CN109509478A (en) * 2013-04-05 2019-03-22 杜比国际公司 Apparatus for processing audio

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7046857B2 (en) * 1997-07-31 2006-05-16 The Regents Of The University Of California Apparatus and methods for image and signal processing
CN102332266B (en) * 2010-07-13 2013-04-24 炬力集成电路设计有限公司 Audio data encoding method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
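Yang Jiajun. No-reference objective assessment of network audio quality. China Masters' Theses Full-text Database, Information Science and Technology. 2017, (No. 03), pp. 47-52. *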

Also Published As

Publication number Publication date
WO2021179470A1 (en) 2021-09-16
CN111354365A (en) 2020-06-30
CA3175103A1 (en) 2021-09-16

Similar Documents

Publication Publication Date Title
CN109410986B (en) Emotion recognition method and device and storage medium
CN111739542A (en) Method, device and equipment for detecting characteristic sound
CN110875059A (en) Method and device for judging reception end and storage device
CN111916109A (en) Feature-based audio classification method and device and computing equipment
CN111354365B (en) Pure voice data sampling rate identification method, device and system
CN115394318A (en) Audio detection method and device
CN110689885A (en) Machine-synthesized speech recognition method, device, storage medium and electronic equipment
CN112992190B (en) Audio signal processing method and device, electronic equipment and storage medium
CN113782036A (en) Audio quality evaluation method and device, electronic equipment and storage medium
CN111326159B (en) Voice recognition method, device and system
WO2020186695A1 (en) Voice information batch processing method and apparatus, computer device, and storage medium
CN111696529A (en) Audio processing method, audio processing device and readable storage medium
CN116886225A (en) Method, device, equipment and medium for judging working state of emergency broadcast terminal
CN105791602B (en) Sound quality testing method and system
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
CN113851114B (en) Method and device for determining fundamental frequency of voice signal
CN114420165A (en) Audio circuit testing method, device, equipment and storage medium
CN114155845A (en) Service determination method and device, electronic equipment and storage medium
CN111028860B (en) Audio data processing method and device, computer equipment and storage medium
CN109671437B (en) Audio processing method, audio processing device and terminal equipment
WO2021051533A1 (en) Address information-based blacklist identification method, apparatus, device, and storage medium
WO2021143095A1 (en) Dialing test method and apparatus, and computer device and storage medium
CN112863548A (en) Method for training audio detection model, audio detection method and device thereof
CN111782860A (en) Audio detection method and device and storage medium
CN114678040B (en) Voice consistency detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant