WO2021179470A1 - Method, device and system for recognizing the sampling rate of pure voice data - Google Patents
- Publication number
- WO2021179470A1 (application PCT/CN2020/097008)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- frequency
- sampling rate
- data
- voice data
- hypothetical
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Definitions
- the invention belongs to the technical field of speech processing, and in particular relates to a method, device and system for recognizing the sampling rate of pure speech data.
- the sampling rate defines the number of samples extracted from a continuous signal per second to form a discrete signal. It is used to describe the sound quality and tone of a sound file, and is a measure of the quality of sound cards and sound files.
- Pure voice data refers to voice data that carries no sampling rate information. If the sampling rate is unknown and the data is processed at the wrong sampling rate, the result of the voice processing often deviates substantially, which degrades the output voice quality and in turn the user experience of voice products.
- the present invention proposes a method, device, and system for recognizing the sampling rate of pure voice data.
- the method can reduce the occurrence of bad packets in voice data packets transmitted over the network, and can add a sampling rate check and reminder function to voice communication engines, common audio processing software, and the like, improving the robustness of voice processing equipment.
- the present invention provides a method for identifying the sampling rate of pure voice data, the method including:
- the different hypothetical frequencies are compared with the prior frequencies, and the sampling rate corresponding to the hypothetical frequency when the comparison results are most similar is determined as the actual sampling rate.
- the a priori frequency ranges from 200 Hz to 4000 Hz.
- comparing different hypothetical frequencies with a priori frequencies, and determining the sampling rate corresponding to the hypothetical frequency when the comparison result is the most similar as the actual sampling rate specifically includes:
- the Euclidean distance between different hypothetical frequencies and the prior frequency is calculated, and the sampling rate corresponding to the hypothetical frequency when the Euclidean distance is the smallest is determined as the actual sampling rate.
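As a minimal sketch of this selection step: assuming each candidate sampling rate has already been mapped to a hypothetical frequency, and assuming the a priori frequency is taken as the 4000 Hz upper edge of the stated range (the source gives only the range), the candidate with the smallest Euclidean distance can be picked as follows. The name `pick_sampling_rate` and the example values are hypothetical.

```python
import math

# Assumption: the a priori cutoff is the upper edge of the 200-4000 Hz range.
PRIOR_FREQUENCY_HZ = 4000.0


def pick_sampling_rate(hypothetical_hz, prior_hz=PRIOR_FREQUENCY_HZ):
    """Return the candidate rate whose hypothetical frequency is closest to the prior.

    hypothetical_hz maps each candidate sampling rate (Hz) to the frequency (Hz)
    that the observed high-frequency cutoff point would have at that rate.
    """
    if not hypothetical_hz:
        raise ValueError("at least one candidate sampling rate is required")
    # For scalars the Euclidean distance reduces to the absolute difference.
    return min(
        hypothetical_hz,
        key=lambda rate: math.sqrt((hypothetical_hz[rate] - prior_hz) ** 2),
    )


# Hypothetical values: the same cutoff point seen under three candidate rates.
candidates = {8000: 2000.0, 16000: 4000.0, 44100: 11025.0}
best = pick_sampling_rate(candidates)
```

Here the 16000 Hz candidate is selected because its hypothetical frequency coincides with the assumed prior.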
- the method further includes:
- the pure speech data is pure speech data including speech fragments
- the method for obtaining pure voice data including voice fragments includes:
- voice data whose energy is greater than the energy threshold is retained, yielding the pure voice data including the voice segments.
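The energy-threshold selection can be sketched roughly as below. This is an illustration under assumptions: the signal is cut into non-overlapping frames, and frame energy is computed in the time domain as a sum of squares (by Parseval's theorem this is proportional to the spectral energy). The function names, frame length and threshold are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Sum of squared amplitudes for each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def keep_voiced_frames(samples, frame_len, energy_threshold):
    """Concatenate the frames whose energy exceeds the threshold."""
    kept = []
    for index, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > energy_threshold:
            start = index * frame_len
            kept.extend(samples[start:start + frame_len])
    return kept


# Hypothetical input: one silent frame followed by one loud frame.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5]
voiced = keep_voiced_frames(signal, frame_len=4, energy_threshold=0.1)
```

Only the second frame survives, since the silent frame has zero energy.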
- the method further includes:
- the pure voice data is decoded according to the actual sampling rate.
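One way to make the identified rate usable by downstream decoders and players is to wrap the headerless samples in a container that records it. The sketch below uses Python's standard `wave` module and assumes the pure voice data is 16-bit mono PCM, which the source does not state. `wrap_pcm_as_wav` is a hypothetical helper name.

```python
import io
import wave


def wrap_pcm_as_wav(pcm_bytes, sampling_rate, sample_width=2, channels=1):
    """Return WAV bytes that label raw PCM with the identified sampling rate."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# Hypothetical input: 100 silent 16-bit samples, identified as 16 kHz.
wav_bytes = wrap_pcm_as_wav(b"\x00\x00" * 100, sampling_rate=16000)
```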
- the present invention provides a pure voice data sampling rate recognition device, including:
- a conversion module, configured to perform a Fourier transform on pure voice data to obtain frequency domain data;
- an obtaining module, configured to process the frequency domain data according to the received a priori threshold data to obtain frequency band information, and to obtain the high-frequency cutoff frequency point of the frequency band information;
- a calculation module, configured to calculate, for each of the different preset sampling rates, the hypothetical frequency corresponding to the high-frequency cutoff frequency point;
- a processing module, configured to compare the different hypothetical frequencies with the a priori frequency, and to determine as the actual sampling rate the sampling rate corresponding to the hypothetical frequency whose comparison result is the most similar.
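The conversion, obtaining and calculation modules can be illustrated with the following sketch. It rests on assumptions the source does not spell out: the cutoff point is taken as the highest DFT bin whose magnitude exceeds a threshold, and a bin index k is mapped to a frequency with the standard DFT relation k * fs / N. All function names and the test frame are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform, fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def cutoff_bin(magnitudes, threshold):
    """Index of the highest bin whose magnitude exceeds the threshold."""
    for k in range(len(magnitudes) - 1, -1, -1):
        if magnitudes[k] > threshold:
            return k
    raise ValueError("no bin exceeds the threshold")


def hypothetical_frequencies(bin_index, fft_size, candidate_rates):
    """Frequency in Hz that bin_index would represent at each candidate rate."""
    return {rate: bin_index * rate / fft_size for rate in candidate_rates}


# Hypothetical frame: 64 samples holding sinusoids at bins 2 and 16.
N = 64
frame = [
    math.sin(2 * math.pi * 2 * t / N) + math.sin(2 * math.pi * 16 * t / N)
    for t in range(N)
]
k_cut = cutoff_bin(magnitude_spectrum(frame), threshold=1.0)
hypotheses = hypothetical_frequencies(k_cut, N, [8000, 16000, 32000])
```

The same cutoff bin yields a different hypothetical frequency for each candidate rate, which is what makes the later comparison against the a priori frequency informative.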
- the a priori frequency ranges from 200 Hz to 4000 Hz.
- the processing module is specifically used for:
- the Euclidean distance between different hypothetical frequencies and the prior frequency is calculated, and the sampling rate corresponding to the hypothetical frequency when the Euclidean distance is the smallest is determined as the actual sampling rate.
- the conversion module is further configured to perform a normalization process on the frequency domain data after performing Fourier transform on the pure speech data to obtain frequency domain data.
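The source does not say which normalization is applied. As one plausible reading, the sketch below scales the magnitude spectrum by its peak so that a single threshold behaves the same for quiet and loud recordings. `normalize_spectrum` is a hypothetical name.

```python
def normalize_spectrum(magnitudes):
    """Scale magnitudes so the largest bin equals 1.0 (peak normalization)."""
    peak = max(magnitudes, default=0.0)
    if peak == 0.0:
        return [0.0 for _ in magnitudes]  # all-zero (silent) input stays zero
    return [m / peak for m in magnitudes]


# Hypothetical magnitudes of the same frame recorded at two loudness levels.
quiet = normalize_spectrum([0.5, 2.0, 1.0])
loud = normalize_spectrum([5.0, 20.0, 10.0])
```

Both inputs map to the same normalized shape, so the thresholding that follows does not depend on recording level.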
- the pure speech data is pure speech data including speech fragments
- the device also includes:
- the receiving module is used to receive voice data
- the conversion module is further configured to perform Fourier transform on the voice data to obtain the energy of the voice data when the voice data does not include sampling rate information;
- the processing module is further configured to obtain voice data corresponding to energy greater than the energy threshold according to a preset energy threshold to obtain the pure voice data including the voice segment.
- the device further includes:
- the decoding module is configured to decode the pure voice data according to the actual sampling rate.
- the present invention provides a computer system, including:
- one or more processors; and
- a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
- the different hypothetical frequencies are compared with the prior frequencies, and the sampling rate corresponding to the hypothetical frequency when the comparison results are most similar is determined as the actual sampling rate.
- the present invention compares the different hypothetical frequencies of pure voice data and determines the actual sampling rate according to the similarity value of the comparison result. It thereby automatically predicts the sampling rate of pure voice data and prevents problems such as severely degraded voice processing when the sampling rate is unknown. This in turn reduces bad packets in the network transmission of voice data packets, adds sampling rate checking and reminder functions to voice communication engines, common audio processing software, and the like, and improves the robustness of voice processing equipment.
- FIG. 1 is a flowchart of a method for recognizing the sampling rate of pure voice data according to Embodiment 1 of the present application;
- FIG. 2 is a schematic structural diagram of a pure voice data sampling rate recognition device according to Embodiment 2 of the present application;
- FIG. 3 is a schematic structural diagram of a computer system according to Embodiment 3 of the present application.
- the present application provides a method for identifying the sampling rate of pure voice data, which can be applied to an audio device, and the audio device performs the following processing:
- the pure voice data is pure voice data including voice fragments, and the above steps specifically include:
- the relevant information of the voice data is obtained and analyzed, and the relevant information includes data packets, sampling rate, and so on.
- the voice data corresponding to the energy greater than the energy threshold is obtained to obtain pure voice data including the voice segment.
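The check for sampling rate information in the received voice data might look like the sketch below. It assumes, purely for illustration, that such information would arrive as a WAV header, and treats anything Python's standard `wave` module cannot parse as pure voice data. `read_sampling_rate` is a hypothetical helper.

```python
import io
import wave


def read_sampling_rate(blob):
    """Return the sampling rate stored in a WAV header, or None if absent."""
    try:
        with wave.open(io.BytesIO(blob), "rb") as wav_file:
            return wav_file.getframerate()
    except (wave.Error, EOFError):
        # Not a parseable WAV container: treat as pure voice data.
        return None


# Hypothetical input: headerless 16-bit PCM carries no sampling rate.
raw_pcm = b"\x01\x00" * 64
rate = read_sampling_rate(raw_pcm)
```

A return value of `None` is the case in which the recognition method described here would be applied.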
- receiving the voice data and preprocessing it to obtain pure voice data can also be implemented through the following steps:
- when sampling rate information is not included in the voice data, the voice data is filtered to obtain pure voice data including voice fragments.
- the purpose of the filtering process is to remove information such as noise and silence in the effective voice data, thereby enhancing the voice data.
- the a priori threshold data is obtained by processing the prior information.
- this step may include:
- when the Euclidean distance is the smallest, the hypothetical frequency has the highest similarity to the a priori frequency, and the sampling rate corresponding to that hypothetical frequency is therefore the closest to the actual value. In this way, the actual sampling rate of pure voice data can be obtained.
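Written out under the assumption that the hypothetical frequency follows the standard DFT bin-to-frequency relation (the source does not give the formula), with $k$ the cutoff bin index, $N$ the transform length, $f_{s,i}$ the $i$-th preset sampling rate and $f_{\text{prior}}$ the a priori frequency (all symbols chosen here for illustration):

```latex
\hat{f}_i = \frac{k \, f_{s,i}}{N}, \qquad
d_i = \sqrt{\left(\hat{f}_i - f_{\text{prior}}\right)^2}, \qquad
f_s^{*} = f_{s,\,i^{*}}, \quad i^{*} = \arg\min_i d_i
```

For example, with $N = 512$, $k = 128$ and $f_{\text{prior}} = 4000$ Hz, candidate rates of 8000 Hz and 16000 Hz give $\hat{f} = 2000$ Hz and $4000$ Hz, so $d = 2000$ and $0$, and 16000 Hz is selected.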
- the normal process of decoding and playing can be performed according to the identified sampling rate, thereby improving the quality and experience of voice calls.
- the above method can also be used to predict the sampling rate and remind staff in time, reducing unnecessary risks and losses; for common audio processing software, it can promptly remind the user that the sampling rate is set incorrectly, reducing wasted time and redundant operations in the user's life and work.
- the present application also provides a pure voice data sampling rate recognition device, including:
- the conversion module 21 is configured to perform Fourier transform on pure voice data to obtain frequency domain data
- the obtaining module 22 is configured to process the frequency domain data according to the received a priori threshold data to obtain frequency band information; and to obtain the high-frequency cutoff frequency point of the frequency band information;
- the calculation module 23 is configured to calculate the corresponding hypothetical frequency at the high-frequency cut-off frequency point according to different preset sampling rates
- the processing module 24 is configured to compare different hypothetical frequencies with a priori frequencies, and determine the sampling rate corresponding to the hypothetical frequency when the comparison result is the most similar as the actual sampling rate.
- the a priori frequency ranges from 200 Hz to 4000 Hz.
- the above-mentioned processing module 24 is specifically configured to calculate the Euclidean distance between different hypothetical frequencies and the prior frequency, and determine the sampling rate corresponding to the hypothetical frequency when the Euclidean distance is the smallest as the actual sampling rate.
- the above-mentioned conversion module 21 is further configured to perform a normalization process on the frequency domain data after performing Fourier transform on the pure speech data to obtain frequency domain data.
- the above-mentioned pure speech data is pure speech data including speech fragments
- the above device also includes:
- the receiving module 25 is used to receive voice data
- the analysis module 26 is used to analyze voice data
- the conversion module 21 is also used to perform Fourier transform on the voice data to obtain the energy of the voice data when the voice data does not include sampling rate information;
- the above-mentioned processing module 24 is further configured to obtain speech data corresponding to energy greater than the energy threshold according to a preset energy threshold to obtain pure speech data including speech fragments.
- the above-mentioned device further includes:
- the decoding module 27 is used to decode pure voice data according to the actual sampling rate.
- This application also provides a computer system, including:
- one or more processors; and
- a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
- the different hypothetical frequencies are compared with the prior frequencies, and the sampling rate corresponding to the hypothetical frequency when the comparison results are most similar is determined as the actual sampling rate.
- FIG. 3 exemplarily shows the architecture of the computer system, which may specifically include a processor 32, a video display adapter 34, a disk drive 36, an input/output interface 38, a network interface 310, and a memory 312.
- the processor 32, the video display adapter 34, the disk drive 36, the input/output interface 38, the network interface 310, and the memory 312 may be communicatively connected through the communication bus 314.
- the processor 32 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to realize the technical solutions provided in this application.
- the memory 312 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
- the memory 312 may store an operating system 316 for controlling the operation of the computer system 30, and a basic input output system (BIOS) 318 for controlling low-level operations of the computer system.
- a web browser 320, a data storage management system 322, and the like can also be stored in the memory 312.
- when the technical solution provided by the present application is implemented through software or firmware, the related program code is stored in the memory 312 and called and executed by the processor 32.
- the input/output interface 38 is used to connect the input/output module to realize information input and output.
- the input/output module can be configured in the device as a component (not shown in the figure), or can be connected to the device externally to provide the corresponding functions.
- the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
- the network interface 310 is used to connect a communication module (not shown in the figure) to implement communication interaction between the device and other devices.
- the communication module can realize communication through wired means (such as USB, network cable, etc.), or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
- the communication bus 314 includes a path to transmit information between various components of the device (for example, the processor 32, the video display adapter 34, the disk drive 36, the input/output interface 38, the network interface 310, and the memory 312).
- the computer system can also obtain information about specific receiving conditions from the virtual resource object receiving condition information database for condition judgment, and so on.
- the above device only shows the processor 32, the video display adapter 34, the disk drive 36, the input/output interface 38, the network interface 310, the memory 312, the communication bus 314, etc.; in a specific implementation, however, the device may also include other components necessary for normal operation.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Telephonic Communication Services (AREA)
Abstract
Embodiments of the present invention relate to a method, device and system for recognizing the sampling rate of pure voice data. The method comprises: performing a Fourier transform on pure voice data to obtain frequency domain data; processing the frequency domain data according to received a priori threshold data to obtain frequency band information; acquiring a high-frequency cutoff frequency point of the frequency band information, and calculating, according to different preset sampling rates, hypothetical frequencies corresponding to the high-frequency cutoff frequency point; and comparing the different hypothetical frequencies with an a priori frequency, and determining as the actual sampling rate the sampling rate corresponding to the hypothetical frequency for which the comparison result is the most similar. In the present invention, based on the a priori characteristic that the frequency-domain bandwidth of a human voice ranges from 200 Hz to 4000 Hz, different hypothetical frequencies of pure voice data are compared, and the actual sampling rate can be determined according to the similarity value of the comparison result. The sampling rate of pure voice data can thus be predicted automatically, which avoids problems such as the considerable effects on voice processing when the sampling rate is unknown.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3175103A CA3175103A1 (fr) | 2020-03-10 | 2020-06-19 | Method, device and system for recognizing the sampling rate of pure voice data |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010160577.4A CN111354365B (zh) | 2020-03-10 | 2020-03-10 | 一种纯语音数据采样率识别方法、装置、系统 |
CN202010160577.4 | 2020-03-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021179470A1 (fr) | 2021-09-16 |
Family
ID=71196071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/097008 WO2021179470A1 (fr) | Method, device and system for recognizing the sampling rate of pure voice data | 2020-03-10 | 2020-06-19 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN111354365B (fr) |
CA (1) | CA3175103A1 (fr) |
WO (1) | WO2021179470A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113447713B (zh) * | 2021-06-25 | 2023-03-07 | 南京丰道电力科技有限公司 | 一种基于傅式快速高精度的电力系统频率测量方法及装置 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101320560A (zh) * | 2008-07-01 | 2008-12-10 | 上海大学 | 语音识别系统应用采样速率转化提高识别率的方法 |
CN103745726A (zh) * | 2013-11-07 | 2014-04-23 | 中国电子科技集团公司第四十一研究所 | 一种自适应的变采样率音频采样方法 |
CN105513590A (zh) * | 2015-11-23 | 2016-04-20 | 百度在线网络技术(北京)有限公司 | 语音识别的方法和装置 |
CN109328383A (zh) * | 2016-06-27 | 2019-02-12 | 高通股份有限公司 | 使用中间采样率的音频解码 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7046857B2 (en) * | 1997-07-31 | 2006-05-16 | The Regents Of The University Of California | Apparatus and methods for image and signal processing |
CN101582264A (zh) * | 2009-06-12 | 2009-11-18 | 瑞声声学科技(深圳)有限公司 | 语音增强的方法及语音增加的声音采集系统 |
JP2012002858A (ja) * | 2010-06-14 | 2012-01-05 | Pioneer Electronic Corp | タイムスケーリング方法、ピッチシフト方法、オーディオデータ処理装置およびプログラム |
CN102332266B (zh) * | 2010-07-13 | 2013-04-24 | 炬力集成电路设计有限公司 | 一种音频数据的编码方法及装置 |
CN105247613B (zh) * | 2013-04-05 | 2019-01-18 | 杜比国际公司 | 音频处理系统 |
CN107833581B (zh) * | 2017-10-20 | 2021-04-13 | 广州酷狗计算机科技有限公司 | 一种提取声音的基音频率的方法、装置及可读存储介质 |
2020
- 2020-03-10: CN application CN202010160577.4A, publication CN111354365B (zh), status: Active
- 2020-06-19: CA application CA3175103A, publication CA3175103A1 (fr), status: Pending
- 2020-06-19: WO application PCT/CN2020/097008, publication WO2021179470A1 (fr), status: Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN111354365B (zh) | 2023-10-31 |
CA3175103A1 (fr) | 2021-09-16 |
CN111354365A (zh) | 2020-06-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20924735; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 3175103; Country of ref document: CA |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20924735; Country of ref document: EP; Kind code of ref document: A1 |