WO2022227365A1 - Saturation diving helium speech interpretation method based on lexicon learning - Google Patents
Saturation diving helium speech interpretation method based on lexicon learning
- Publication number
- WO2022227365A1 (PCT/CN2021/116054)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- helium
- speech
- network
- learning
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- The invention relates to helium speech interpretation technology, and more particularly to a saturation diving helium speech interpretation method based on lexicon learning.
- The 21st century is the era of the ocean economy: more than 50% of the clean energy and production materials needed by human beings will be obtained from the ocean. Saturation diving has important application value in navigation, marine development, naval operations, and marine rescue, and is an indispensable part of marine economic development.
- Existing saturation diving helium speech interpreters, both in China and abroad, rely on an operator in the diving chamber manually adjusting the interpreter's frequency-domain or time-delay characteristics, and cannot adapt to the depth of the saturation diving operation, so the interpretation effect is not ideal.
- The interpretation quality of such interpreters degrades rapidly; in particular, they cannot effectively interpret the diver's helium speech while the diving depth is changing.
- Saturation diving helium speech interpretation technology based on artificial intelligence is in its infancy both in China and abroad, and few practical technical solutions exist. How to make full use of machine learning capabilities, combined with the particularities of divers and of the working language in saturation diving scenarios, to interpret helium speech effectively remains an unsolved technical problem.
- The purpose of the present invention is to overcome the above-mentioned defects of the prior art and provide a saturation diving helium speech interpretation method based on lexicon learning.
- Making full use of the characteristics of divers' individual speech signals and the vocabulary information of the working-language lexicon, the present invention proposes a lexicon-learning-based helium speech interpretation method.
- the correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths to obtain the correction network parameter set;
- During diving operations, the diver's helium speech signal is fitted against the correction network's vector signals; the network parameters corresponding to the vector signal with the highest degree of fit are selected as the parameters of the correction network, and the diver's helium speech is corrected to obtain the corrected speech signal;
- Matched speech pairs are screened by degree of fit to generate the supervision signals and vector signals for interpretation network machine learning, and the interpretation network uses a supervised learning algorithm to further learn the corrected speech signal; finally, the interpretation network interprets the corrected speech signal, completing the interpretation of the helium speech.
- The method makes full use of the characteristics of divers' individual speech signals in different environments and of the vocabulary information of the working-language lexicon.
- The saturation diving helium speech interpretation method based on lexicon learning of the present invention involves at least one diver, one helium speech correction network, and one helium speech interpretation network. The diver's helium speech signal is S.
- The helium speech interpretation method includes the following steps:
- The first stage - correction network learning
- Step 1 Lexicon signal construction - according to the requirements of the saturation diving operation specification, construct the lexicon K of the working language commonly used in divers' saturation diving operations;
- Step 2 Supervision signal generation - at normal atmospheric pressure, diver i reads the words in lexicon K aloud to obtain the supervision signal X_i, generating the correction network machine learning supervision signal set X = {X_i}, i = 1, 2, …, I, where I is the number of divers;
- Step 3 Vector signal generation - diver i reads the words in lexicon K aloud in the environments corresponding to the saturation diving depths h_1, h_2, h_3, …, h_L to obtain the vector signals Y_i,l, l = 1, 2, …, L, generating the correction network machine learning vector signal set Y = {Y_i,l};
- Step 4 Correction network learning - with the vector signal Y_i,l as the input signal and the supervision signal X_i as the desired output signal, the correction network performs supervised learning, forming the correction network parameter set C = {C_i,l} corresponding to the vector signals Y_i,l;
- The second stage - helium speech interpretation
- Step 5 Correction network parameter selection - fit the diver's working speech S (helium speech) during saturation diving operations against all vector signals Y_i,l in the vector signal set Y, and select the parameters C_n,l corresponding to the vector signal Y_n,l with the highest degree of fit as the network parameters of the correction network;
- Step 6 Helium speech correction - use the helium speech signal S as the input signal of the correction network (the network parameters of the correction network now being C_n,l), correct the helium speech signal S, and generate the corrected speech signal T;
- Step 7 Interpretation network learning - compare the speech in the corrected speech signal T, text by text, with the supervision signals in the correction network machine learning supervision signal set X and compute the degree of fit between them; match the speech in X corresponding to the text with the highest degree of fit with the corresponding speech of the corrected speech signal T to form pairs; sort these matched pairs in order of degree of fit and select the combinations in the top p%; the speech of the corrected speech signal T in these combinations serves as the vector signal U for interpretation network machine learning, the speech corresponding to the text of the supervision signal set X in these combinations serves as the supervision signal V, and the interpretation network performs supervised learning;
- Step 8 Helium speech interpretation - use the corrected speech signal T as the input signal of the interpretation network to complete the interpretation of the helium speech S.
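The supervised learner shared by both networks can be illustrated with a minimal nearest-neighbour sketch (hedged: the description names the K-nearest neighbor algorithm as one admissible choice; the `NearestNeighbourNet` class name and the fixed-length feature vectors standing in for speech signals are our assumptions, not part of the patent):

```python
import numpy as np

class NearestNeighbourNet:
    """Minimal 1-nearest-neighbour stand-in for the correction and
    interpretation networks: fit() is the supervised learning step
    (vector signals as inputs, supervision signals as desired outputs),
    and predict() maps a new input signal to the learned output."""

    def fit(self, vector_signals, supervision_signals):
        self.vectors = np.asarray(vector_signals, dtype=float)
        self.outputs = list(supervision_signals)
        return self

    def predict(self, signal):
        # Highest degree of fit = smallest Euclidean distance.
        distances = np.linalg.norm(
            self.vectors - np.asarray(signal, dtype=float), axis=1)
        return self.outputs[int(np.argmin(distances))]
```

In this sketch, stage 1 would train one such model per diver and depth from the pairs (Y_i,l, X_i), and a second model trained on (U, V) would perform the final interpretation of the corrected speech signal T.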
- the present invention also has the following features:
- The lexicon K of divers' common working language is set according to the saturation diving operation specifications of the organization using the helium speech interpreter; different organizations will therefore have different lexicons K.
- In step 2, each diver has a supervision signal; different divers have different supervision signals because their pronunciations differ.
- The depths of the helium speech test points h_1, h_2, h_3, …, h_L are required to cover the preset depths of the salvage diving operation uniformly, but may also cover them non-uniformly.
- In step 3, the number of test points is determined by the preset salvage diving operation depth and the interval between test points; the more test points, the higher the interpretation complexity.
- each diver has a corresponding vector signal at each test point (different diving depth).
- the adopted learning algorithm may be any form of supervised learning algorithm, and may also be any form of semi-supervised learning algorithm.
- the correction network structure corresponds to the learning algorithm selected in the step 4.
- In step 5, the fit evaluation index used is the Euclidean distance between the helium speech S and the vector signals Y_i,l, but other evaluation indices, such as the mean and variance, may also be used.
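Under this index, the step-5 selection rule (highest degree of fit = smallest Euclidean distance) can be sketched as below; representing the vector-signal set Y and parameter set C as dictionaries keyed by (diver, depth) is our assumption for illustration:

```python
import numpy as np

def select_correction_params(s, vector_signals, params):
    """Pick the correction network parameters C[n, l] whose vector signal
    Y[n, l] best fits the helium speech signal S.

    s              -- 1-D feature vector standing in for the signal S
    vector_signals -- dict mapping (i, l) to the feature vector of Y[i, l]
    params         -- dict mapping (i, l) to the parameter set C[i, l]
    """
    best_key = min(vector_signals,
                   key=lambda k: np.linalg.norm(s - vector_signals[k]))
    return best_key, params[best_key]
```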
- In step 7, the fit evaluation index used is the Euclidean distance between the corrected speech signal T and the vocabulary in the lexicon K, but other evaluation indices, such as the mean and variance, may also be used.
- The screening ratio p is related to the size of the lexicon K: the larger the lexicon K, the greater the probability that the divers' working vocabulary falls within it, and the more complete the speech interpretation. Typically the lexicon K contains between 100 and 300 words, and p is chosen between 85 and 98.
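The step-7 screening by degree of fit, with p in this range, might be sketched as follows (hedged: the text matching is assumed already done; `fits` holds one Euclidean distance per matched pair, smaller meaning a higher degree of fit):

```python
def screen_top_p(matched_pairs, fits, p):
    """Sort matched (corrected speech, supervision speech) pairs by degree
    of fit and keep the top p percent as training data for the
    interpretation network.

    matched_pairs -- list of (T-side speech, X-side speech) pairs
    fits          -- one distance per pair; smaller = higher fit degree
    p             -- screening ratio in percent, e.g. 85 to 98
    """
    order = sorted(range(len(matched_pairs)), key=lambda i: fits[i])
    keep = max(1, round(len(matched_pairs) * p / 100))
    kept = [matched_pairs[i] for i in order[:keep]]
    U = [t for t, _ in kept]  # vector signals for the interpretation network
    V = [x for _, x in kept]  # supervision signals for the interpretation network
    return U, V
```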
- The learning algorithm adopted can be any form of supervised learning algorithm, such as the K-nearest neighbor algorithm or decision trees, or any form of semi-supervised learning algorithm, such as self-training or semi-supervised support vector machines.
- The interpretation network structure corresponds to the learning algorithm selected in step 7.
- When distortion recognition performed on the diver's speech shows low distortion, the corrected speech signal T can be output directly as the helium speech interpretation signal.
- Steps 1-4 are completed by the diver in the diving chamber (preparation for the diving operation), and steps 5-8 are completed by the diver during the deep-sea diving operation.
- When the supervision signal of the correction network adopts text labels, the diver does not need to read the words in lexicon K aloud; the lexicon K itself is used directly as the supervision signal X. Correspondingly, the corrected speech signal T generated in step 6 is text, and the helium speech interpretation signal generated in step 8 is also text.
- In interpreting helium speech, the method of the present invention exploits the characteristics of divers' individual speech signals in different environments, the vocabulary information of the working-language lexicon, and the machine learning capability of artificial intelligence networks, thereby producing the following beneficial effects:
- Figure 1 is a flow chart of helium speech interpretation.
- In the system comprising divers, a helium speech correction network, and a helium speech interpretation network, a lexicon of the divers' common working language is first established according to the requirements of the saturation diving operation specifications; the divers read the working language aloud both in the normal atmospheric pressure environment and in the environments corresponding to the saturation diving operation depths, which generates the supervision signals and vector signals for correction network machine learning.
- The correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths to obtain the correction network parameter set; secondly, during diving operations the diver's helium speech signal is fitted against the correction network's vector signals, the network parameters corresponding to the vector signal with the highest degree of fit are selected as the parameters of the correction network, and the diver's helium speech is corrected to obtain the corrected speech signal.
- Step 1 Lexicon signal construction - according to the requirements of the saturation diving operation specification, construct the lexicon K of the working language commonly used in divers' saturation diving operations.
- A common working-language lexicon K consisting of 150 words such as "diving, splint, temperature, pressure" is constructed.
- The words in the lexicon K are read aloud by the 2 divers respectively, generating the correction network machine learning supervision signal sets X_1 and X_2 (speech signals).
- the saturation diving operation depth range is 200m ⁇ 250m
- the test point interval is 10m
- the saturation diving depths of 2 divers in the diving cabin are 200m, 210m, 220m, 230m, 240m and 250m.
- The correction network machine learning vector signals (speech signals) Y_1,1, Y_1,2, Y_1,3, Y_1,4, Y_1,5, Y_1,6, Y_2,1, Y_2,2, Y_2,3, Y_2,4, Y_2,5 and Y_2,6 are generated.
- the correction network uses the K-Nearest Neighbors algorithm for supervised learning.
- The correction network learns from the vector signals Y_1,1, Y_1,2, Y_1,3, Y_1,4, Y_1,5, Y_1,6, Y_2,1, Y_2,2, Y_2,3, Y_2,4, Y_2,5 and Y_2,6 together with the supervision signals X_1 and X_2, generating the corresponding correction network parameters C_1,1, C_1,2, C_1,3, C_1,4, C_1,5, C_1,6, C_2,1, C_2,2, C_2,3, C_2,4, C_2,5 and C_2,6. When the input vector signal of the correction network is one of Y_1,1 through Y_1,6, its supervision signal is X_1; when the input vector signal is one of Y_2,1 through Y_2,6, its supervision signal is X_2.
- Step 5 Correction network parameter selection - fit the diver's working speech S (helium speech) during normal saturation diving operations against all vector signals Y_i,l in the vector signal set Y, and select the network parameters C_n,l corresponding to the vector signal Y_n,l with the highest degree of fit as the network parameters of the correction network.
- Diver 1 is working, so diver 1's working speech signal (helium speech S) is fitted against all vector signals Y_1,1, Y_1,2, Y_1,3, Y_1,4, Y_1,5, Y_1,6, Y_2,1, Y_2,2, Y_2,3, Y_2,4, Y_2,5 and Y_2,6 respectively; the vector signal Y_1,3 has the highest degree of fit, so its corresponding network parameters C_1,3 are used as the network parameters of the correction network. The Euclidean distance serves as the evaluation index during fitting.
- Step 6 Helium speech correction - use the helium speech signal S as the input signal of the correction network (the network parameters of the correction network now being C_n,l), correct the helium speech signal S, and generate the corrected speech signal T.
- the correction network parameter used by the correction network to correct the helium speech signal S is C 1,3
- the generated corrected speech signal is T .
- Step 7 Interpretation network learning - compare the speech in the corrected speech signal T, text by text, with the supervision signals in the correction network machine learning supervision signal set X and compute the degree of fit between them; match the speech in X corresponding to the text with the highest degree of fit with the corresponding speech of the corrected speech signal T to form pairs; sort these matched pairs in order of degree of fit and select the combinations in the top p%; the speech of the corrected speech signal T in these combinations serves as the vector signal U for interpretation network machine learning, the speech corresponding to the text of the supervision signal set X in these combinations serves as the supervision signal V, and the interpretation network performs supervised learning.
- The Euclidean distance is used to compare the corrected speech signal T, text by text, with the supervision signals in the supervision signal set X; the speech in X corresponding to the text with the highest degree of fit is matched with the corresponding speech in T to form pairs. The matched pairs are sorted by degree of fit, and the speech signals in T corresponding to the top 90% of matching pairs are selected as the vector signal U for interpretation network machine learning, with the corresponding speech signals in X as the supervision signal V. The interpretation network performs supervised learning using the K-nearest neighbor algorithm.
- Step 8 Helium speech interpretation - use the corrected speech signal T as the input signal of the interpretation network to complete the interpretation of the helium speech S.
Claims (10)
- 1. A saturation diving helium speech interpretation method based on lexicon learning, involving at least one diver, one helium speech correction network, and one helium speech interpretation network, the diver's helium speech signal being S, the helium speech interpretation method comprising the following steps: the first stage - correction network learning: Step 1, lexicon signal construction - according to the requirements of the saturation diving operation specification, construct the lexicon K of the working language commonly used in divers' saturation diving operations; Step 2, supervision signal generation - in the normal atmospheric pressure environment, diver i reads the text in lexicon K aloud to obtain the supervision signal X_i, thereby generating the correction network machine learning supervision signal set X = {X_i}, i = 1, 2, …, I, where I is the number of divers; Step 3, vector signal generation - diver i reads the text in lexicon K aloud in the environments corresponding to the saturation diving depths h_1, h_2, h_3, …, h_L to obtain the vector signals Y_i,l, l = 1, 2, …, L, thereby generating the correction network machine learning vector signal set Y = {Y_i,l}; Step 4, correction network learning - with the vector signal Y_i,l as the input signal and the supervision signal X_i as the desired output signal, the correction network performs supervised learning, forming the correction network parameter set C = {C_i,l} corresponding to the vector signals Y_i,l; the second stage - helium speech interpretation: Step 5, correction network parameter selection - fit the diver's helium speech signal S during saturation diving operations against all vector signals Y_i,l in the vector signal set Y, and select the parameters C_n,l corresponding to the vector signal Y_n,l with the highest degree of fit as the network parameters of the correction network; Step 6, helium speech correction - with the helium speech signal S as the input signal of the correction network, correct the helium speech signal S and generate the corrected speech signal T; Step 7, interpretation network learning - compare the speech in the corrected speech signal T, text by text, with the supervision signals in the correction network machine learning supervision signal set X, compute the degree of fit between them, match the speech corresponding to the text with the highest degree of fit in the supervision signal set X with the corresponding speech of the corrected speech signal T to form pairs, sort these matched pairs in order of degree of fit, and select the combinations in the top p% by degree of fit; the speech of the corrected speech signal T in these combinations serves as the vector signal U for interpretation network machine learning, the speech corresponding to the text of the supervision signal set X in these combinations serves as the supervision signal V for interpretation network machine learning, and the interpretation network performs supervised learning; Step 8, helium speech interpretation - with the corrected speech signal T as the input signal of the interpretation network, complete the interpretation of the helium speech S.
- 2. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: in steps 5 and 7, the evaluation index of the degree of fit is the Euclidean distance or the variance; the smaller the Euclidean distance, the higher the degree of fit, and the smaller the variance, the higher the degree of fit.
- 3. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: the lexicon K of the working language commonly used in divers' saturation diving operations is set according to the saturation diving operation specifications of the organization using the helium speech interpreter.
- 4. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: the helium speech test point depths h_1, h_2, h_3, …, h_L uniformly cover the preset depths of the salvage diving operation.
- 5. The saturation diving helium speech interpretation method based on lexicon learning according to claim 4, characterized in that: the number of test points is determined according to the preset salvage diving operation depth and the interval between test points.
- 6. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: in step 2, when the supervision signal of the correction network adopts text labels, the lexicon K is used directly as the supervision signal X; correspondingly, the corrected speech signal T generated in step 6 is also text, and the helium speech interpretation signal generated in step 8 is text.
- 7. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: when the lexicon K contains between 100 and 300 words, p is selected between 85 and 98.
- 8. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: the learning methods adopted in steps 4 and 7 are the K-nearest neighbor algorithm or the decision tree algorithm, or the self-training algorithm or the semi-supervised support vector machine algorithm.
- 9. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: distortion recognition is performed on the diver's speech; if the distortion is low, the corrected speech signal T can be output directly as the helium speech interpretation signal.
- 10. The saturation diving helium speech interpretation method based on lexicon learning according to claim 1, characterized in that: steps 1 to 4 are completed by the diver in the diving chamber, and steps 5 to 8 are completed by the diver during deep-sea diving operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021232744A AU2021232744B2 (en) | 2021-04-26 | 2021-09-01 | Lexicon learning-based heliumspeech unscrambling method in saturation diving |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110450616.9A CN113178207B (zh) | 2021-04-26 | 2021-04-26 | 基于词库学习的饱和潜水氦语音解读方法 |
CN202110450616.9 | 2021-04-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022227365A1 true WO2022227365A1 (zh) | 2022-11-03 |
Family
ID=76926012
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/116054 WO2022227365A1 (zh) | 2021-04-26 | 2021-09-01 | 基于词库学习的饱和潜水氦语音解读方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113178207B (zh) |
WO (1) | WO2022227365A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113178207B (zh) * | 2021-04-26 | 2021-10-08 | 南通大学 | 基于词库学习的饱和潜水氦语音解读方法 |
CN114120976A (zh) * | 2021-11-16 | 2022-03-01 | 南通大学 | 基于多目标优化的饱和潜水氦语音解读方法及系统 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111976924A (zh) * | 2020-08-12 | 2020-11-24 | 厦门大学 | 一种用于潜水全面罩的实时信息交流装置 |
CN113178207A (zh) * | 2021-04-26 | 2021-07-27 | 南通大学 | 基于词库学习的饱和潜水氦语音解读方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB1187536A (en) * | 1968-08-28 | 1970-04-08 | Standard Telephones Cables Ltd | Processor for Helium Speech |
US3813687A (en) * | 1972-11-29 | 1974-05-28 | Us Navy | Instant replay helium speech unscrambler using slowed tape for correction |
US3863026A (en) * | 1973-08-15 | 1975-01-28 | Us Navy | Helium speech decoder |
US3965298A (en) * | 1975-05-05 | 1976-06-22 | Long Enterprises | Deep sea diving speech converter |
FR2332670A1 (fr) * | 1975-11-19 | 1977-06-17 | Zurcher Jean | Transcodeur de voix en atmosphere d'helium |
JPH036964A (ja) * | 1989-06-02 | 1991-01-14 | Fuosutekusu Kk | 水中通話装置 |
JPH11327598A (ja) * | 1998-05-20 | 1999-11-26 | Oki Electric Ind Co Ltd | ヘリウム音声修復装置 |
JP2010134260A (ja) * | 2008-12-05 | 2010-06-17 | Sanyo Electric Co Ltd | 電子機器及び音声処理方法 |
JP5597575B2 (ja) * | 2011-02-23 | 2014-10-01 | 国立大学法人 琉球大学 | 通信装置 |
NO333567B1 (no) * | 2011-05-16 | 2013-07-08 | Kongsberg Seatex As | Fremgangsmate og system for maritim, hoyhastighets bredbandskommunikasjonsnettverk-oppbygging |
US9564146B2 (en) * | 2014-08-01 | 2017-02-07 | Bongiovi Acoustics Llc | System and method for digital signal processing in deep diving environment |
- 2021:
- 2021-04-26 CN CN202110450616.9A patent/CN113178207B/zh active Active
- 2021-09-01 WO PCT/CN2021/116054 patent/WO2022227365A1/zh active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111976924A (zh) * | 2020-08-12 | 2020-11-24 | 厦门大学 | 一种用于潜水全面罩的实时信息交流装置 |
CN113178207A (zh) * | 2021-04-26 | 2021-07-27 | 南通大学 | 基于词库学习的饱和潜水氦语音解读方法 |
Non-Patent Citations (3)
Title |
---|
LI DONGMEI, ZHANG SHIBING, GUO LILI, CHEN YONGHONG: "Helium Speech Correction Algorithm Based on Deep Neural Networks", 2020 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), IEEE, 21 October 2020 (2020-10-21) - 23 October 2020 (2020-10-23), pages 99 - 103, XP055982422, ISBN: 978-1-7281-7236-1, DOI: 10.1109/WCSP49889.2020.9299782 * |
XIA WANG, DU GUI-MING, GUANG-YAN WANG, YAN ZHANG: "Mask speech recognition based on convolutional neural network", TRANSDUCER AND MICROSYSTEM TECHNOLOGIES, ZHONGGUO DIANZHI KEJI JITUAN GONGSI DI-49 YANJIUSUO, CN, vol. 36, no. 10, 31 October 2017 (2017-10-31), CN , pages 34 - 37, XP055982419, ISSN: 2096-2436 * |
ZHANG SHIBING; GUO LILI; LI HONGJUN; BAO ZHIHUA; ZHANG XIAOGE; CHEN YONGHONG: "A survey on heliumspeech communications in saturation diving", CHINA COMMUNICATIONS, CHINA INSTITUTE OF COMMUNICATIONS, PISCATAWAY, NJ, USA, vol. 17, no. 6, 1 June 2020 (2020-06-01), Piscataway, NJ, USA , pages 68 - 79, XP011795154, ISSN: 1673-5447, DOI: 10.23919/JCC.2020.06.006 * |
Also Published As
Publication number | Publication date |
---|---|
CN113178207A (zh) | 2021-07-27 |
CN113178207B (zh) | 2021-10-08 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| ENP | Entry into the national phase | Ref document number: 2021232744; Country of ref document: AU; Date of ref document: 20210901; Kind code of ref document: A
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21938829; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21938829; Country of ref document: EP; Kind code of ref document: A1