WO2022227365A1 - Saturation diving helium speech interpretation method based on lexicon learning - Google Patents

Saturation diving helium speech interpretation method based on lexicon learning

Info

Publication number
WO2022227365A1
WO2022227365A1 · PCT/CN2021/116054 · CN2021116054W
Authority
WO
WIPO (PCT)
Prior art keywords
signal
helium
speech
network
learning
Prior art date
Application number
PCT/CN2021/116054
Other languages
English (en)
French (fr)
Inventor
张士兵 (Zhang Shibing)
吴建绒 (Wu Jianrong)
郭莉莉 (Guo Lili)
李明 (Li Ming)
包志华 (Bao Zhihua)
Original Assignee
南通大学 (Nantong University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南通大学 (Nantong University)
Priority to AU2021232744A priority Critical patent/AU2021232744B2/en
Publication of WO2022227365A1 publication Critical patent/WO2022227365A1/zh

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/10 - Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The invention relates to helium speech interpretation technology, and more particularly to a saturation diving helium speech interpretation method based on lexicon learning.
  • The 21st century is the era of the ocean economy: more than 50% of the clean energy and production materials needed by mankind will be obtained from the ocean. Saturation diving has important application value in navigation operations, marine development, naval applications and maritime rescue, and is an indispensable part of marine economic development.
  • Existing saturation diving helium speech interpreters at home and abroad all rely on an interpreter in the diving chamber whose frequency-domain or delay characteristics are adjusted manually; they cannot adapt to the operating depth, and the interpretation quality is unsatisfactory.
  • Beyond a certain depth the interpretation quality of the helium speech interpreter drops rapidly, and in particular, when the diver's diving depth is changing, the interpreter cannot interpret the diver's helium speech effectively.
  • Saturation diving helium speech interpretation based on artificial intelligence is in its infancy at home and abroad, and practical technical solutions are rare. How to make full use of machine learning, combined with the particular divers and working language of saturation diving scenarios, to interpret helium speech effectively is an unsolved technical problem.
  • The purpose of the present invention is to overcome the above defects of the prior art and provide a saturation diving helium speech interpretation method based on lexicon learning.
  • Considering that the population of divers is fixed and their working vocabulary during saturation diving operations is limited, the present invention makes full use of divers' individual speech-signal characteristics and the vocabulary of the working-language lexicon, and proposes a helium speech interpretation method based on lexicon learning.
  • First, the correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths, obtaining the correction-network parameter set.
  • Secondly, during diving operations the diver's helium speech signal is fitted against the correction network's vector signals; the network parameters corresponding to the best-fitting vector signal are selected as the correction network's parameters, and the diver's helium speech is corrected to obtain the corrected speech signal.
  • Then the corrected speech signal is screened by fit to generate the supervision signals and vector signals for interpretation-network machine learning; the interpretation network uses a supervised learning algorithm to learn the corrected speech signal further, and finally interprets it, completing the full interpretation of the helium speech.
  • The method makes full use of divers' individual speech-signal characteristics in different environments and the vocabulary of the working-language lexicon, and uses machine-learning algorithms to correct and interpret helium speech, greatly improving interpretation accuracy.
  • The saturation diving helium speech interpretation method based on lexicon learning of the present invention involves at least one diver, one helium speech correction network and one helium speech interpretation network; the diver's helium speech signal is S.
  • The helium speech interpretation technique comprises the following steps:
  • Stage 1 - correction network learning:
  • Step 1 (lexicon signal construction): according to the saturation diving operation specifications, build the lexicon K of the working language commonly used by divers in saturation diving operations;
  • Step 2 (supervision signal generation): under normal atmospheric pressure, diver i reads the words in the lexicon K aloud to obtain the supervision signal X_i, generating the correction-network machine-learning supervision signal set X = {X_i}, i = 1, 2, ..., I, where I is the number of divers;
  • Step 3 (vector signal generation): diver i reads the words in the lexicon K aloud in the environments corresponding to saturation diving depths h_1, h_2, h_3, ..., h_L to obtain the vector signals Y_{i,l}, l = 1, 2, ..., L, generating the correction-network machine-learning vector signal set Y = {Y_{i,l}};
  • Step 4 (correction network learning): with the vector signals Y_{i,l} as inputs and the supervision signals X_i as desired outputs, the correction network performs supervised learning, forming the correction-network parameter set C = {C_{i,l}} corresponding to the vector signals Y_{i,l};
  • Step 5 (correction-network parameter selection): fit the diver's working speech S (helium speech) during saturation diving operations against all vector signals Y_{i,l} in the vector signal set Y, and select the parameters C_{n,l} corresponding to the best-fitting vector signal Y_{n,l} as the correction network's parameters;
  • Step 6 (helium speech correction): use the helium speech signal S as the input of the correction network (whose parameters are now C_{n,l}), correct S, and generate the corrected speech signal T;
  • Step 7 (interpretation network learning): compare the speech in the corrected speech signal T, word by word, with the supervision signals in the correction-network supervision signal set X and compute the fit between them; match the speech of the best-fitting word in X with the corresponding speech in T to form pairs; sort the matched pairs by fit and keep the top p%; the T-side speech of the kept pairs serves as the vector signal U for interpretation-network machine learning, and the X-side speech as the supervision signal V; the interpretation network performs supervised learning;
  • Step 8 (helium speech interpretation): use the corrected speech signal T as the input of the interpretation network to complete the interpretation of the helium speech S.
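Step 5 can be sketched in a few lines. The following is a minimal illustration, assuming each speech signal is reduced to a fixed-length feature vector and that fit is measured by Euclidean distance (smaller distance means better fit), as the patent suggests; the function names and toy data are illustrative, not part of the patent.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_correction_params(s, vector_set, param_set):
    """Step 5: pick the parameters C[n,l] whose vector signal Y[n,l]
    best fits the working helium speech S (smallest distance)."""
    best_key = min(vector_set, key=lambda k: euclidean(s, vector_set[k]))
    return best_key, param_set[best_key]

# Toy data: 2 divers x 2 depths, 3-dimensional feature vectors.
Y = {(1, 1): [0.0, 0.1, 0.2], (1, 2): [1.0, 1.1, 1.2],
     (2, 1): [5.0, 5.1, 5.2], (2, 2): [9.0, 9.1, 9.2]}
C = {k: f"C{k}" for k in Y}

S = [1.05, 1.0, 1.3]              # helium-speech features of the working diver
key, params = select_correction_params(S, Y, C)
print(key, params)                # the closest vector signal is Y[(1, 2)]
```

Diver and depth index the vector signals exactly as the (i, l) subscripts do in the patent's notation.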
  • The present invention also has the following features:
  • In step 1, the constructed lexicon K of divers' common working language is set according to the saturation diving operation specifications of the organization using the helium speech interpreter; different organizations will have different lexicons K.
  • In step 2, each diver has his own supervision signal; because different divers pronounce words differently, their supervision signals differ.
  • In step 3, the helium speech test-point depths h_1, h_2, h_3, ..., h_L should cover the preset depths of the salvage diving operation uniformly, but may also cover them non-uniformly.
  • In step 3, the number of test points is determined by the preset salvage diving operation depth and the test-point interval: the finer the interval, the more complete the helium speech interpretation, but the longer it takes to generate the vector signals and the higher the interpretation complexity.
  • In step 3, each diver has a corresponding vector signal at each test point (each diving depth).
  • In step 4, the learning algorithm may be any supervised learning algorithm, or any semi-supervised learning algorithm.
  • In step 4, the correction network structure corresponds to the learning algorithm selected in step 4.
  • In step 5, the fit evaluation index is the Euclidean distance between the helium speech S and the vector signals Y_{i,l}, but other indices such as mean or variance may also be used.
  • In step 7, the fit evaluation index is the Euclidean distance between the corrected speech signal T and the words in the lexicon K, but other indices such as mean or variance may also be used.
  • In step 7, the screening ratio p is related to the size of the lexicon K: the larger K, the higher the probability that the divers' working vocabulary falls within K, the larger p can be, and the more complete the helium speech interpretation; typically, with 100 to 300 words in K, p is chosen between 85 and 98.
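The top-p% screening of step 7 can be illustrated as follows. The fit scores and word labels below are hypothetical (the patent does not fix a concrete signal representation); the point is only the ranking-and-truncation rule:

```python
def screen_top_p(matched_pairs, p):
    """Keep the top p% of (corrected_speech, supervision_speech, fit)
    triples, ranked by fit (higher fit = better match). Returns the
    vector signals U and supervision signals V for the interpretation
    network's supervised learning."""
    ranked = sorted(matched_pairs, key=lambda t: t[2], reverse=True)
    keep = max(1, round(len(ranked) * p / 100))
    kept = ranked[:keep]
    U = [t for t, _, _ in kept]   # corrected-speech side
    V = [x for _, x, _ in kept]   # supervision-signal side
    return U, V

# Toy example: 10 matched word pairs with fit scores; p = 90 keeps 9.
pairs = [(f"t{i}", f"x{i}", fit) for i, fit in enumerate(
    [0.99, 0.95, 0.93, 0.90, 0.88, 0.85, 0.80, 0.75, 0.60, 0.10])]
U, V = screen_top_p(pairs, 90)
print(len(U))  # 9: the worst-fitting pair is dropped
```

Dropping the lowest-fit pairs keeps poorly matched (likely out-of-lexicon or badly corrected) words out of the interpretation network's training data.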
  • In step 7, the learning algorithm may be any supervised learning algorithm, such as the K-nearest-neighbour algorithm or decision trees, or any semi-supervised learning algorithm, such as self-training or semi-supervised support vector machines.
  • In step 7, the interpretation network structure corresponds to the learning algorithm selected in step 7.
  • In step 6, if the diver's speech distortion is not severe, the corrected speech signal T can be output directly as the helium speech interpretation signal.
  • Of steps 1-8, steps 1-4 are completed by the diver in the diving chamber (as preparation for the diving operation), and steps 5-8 during the deep-sea diving operation.
  • In step 2, the supervision signals of the correction network can use text labels; in that case the diver does not need to read the words in the lexicon K aloud, and the lexicon K itself is used directly as the supervision signal X.
  • Correspondingly, the corrected speech signal T produced in step 6 is then also text,
  • and the helium speech interpretation signal produced in step 8 is also text.
  • In interpreting helium speech, the method of the present invention exploits divers' individual speech-signal characteristics in different environments, the vocabulary of the working-language lexicon, and the machine-learning capability of artificial-intelligence networks, producing the following beneficial effects: (1) learning the vocabulary of the working-language lexicon reduces the number of machine-learning samples the network requires, so the correction network can be trained during the preparation phase in the diving chamber; (2) learning divers' individual speech-signal characteristics in different environments improves learning efficiency and removes the influence of environmental noise, making the interpreter adaptive across diving depths; (3) combining the correction network and the interpretation network improves interpretation accuracy.
  • Figure 1 is a flow chart of helium speech interpretation.
  • In a system comprising divers, a helium speech correction network and a helium speech interpretation network, first, according to the saturation diving operation specifications, a lexicon of the divers' common working language is built, and the divers read it aloud under normal atmospheric pressure and under the environments corresponding to saturation diving operations,
  • generating from the lexicon the supervision signals and vector signals for correction-network machine learning.
  • The correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths, obtaining the correction-network parameter set.
  • Secondly, during diving operations the diver's helium speech signal is fitted against the correction network's vector signals; the network parameters corresponding to the best-fitting vector signal are selected as the correction network's parameters, and the diver's helium speech is corrected to obtain the corrected speech signal.
  • Step 1 (lexicon signal construction): according to the saturation diving operation specifications, build the lexicon K of the working language commonly used by divers in saturation diving operations.
  • In this example, a common working-language lexicon K of 150 words such as "diving, splint, temperature, pressure" was constructed.
  • The words in the lexicon K were read aloud by 2 divers, generating the correction-network machine-learning supervision signal (speech signal) sets X_1 and X_2.
  • In this example, the saturation diving operation depth range is 200 m to 250 m,
  • the test-point interval is 10 m,
  • and in the diving chamber the 2 divers read the words in the lexicon K aloud at environments corresponding to saturation diving depths of 200 m, 210 m, 220 m, 230 m, 240 m and 250 m,
  • generating the correction-network machine-learning vector signals (speech signals) Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6}.
  • In this example, the correction network uses the K-nearest-neighbour algorithm for supervised learning.
  • After supervised learning, the correction network generates, for the vector signals Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6} and the supervision signals X_1 and X_2, the corresponding correction-network parameters C_{1,1}, C_{1,2}, C_{1,3}, C_{1,4}, C_{1,5}, C_{1,6}, C_{2,1}, C_{2,2}, C_{2,3}, C_{2,4}, C_{2,5} and C_{2,6}; when the input vector signal of the correction network is one of Y_{1,1} to Y_{1,6}, its supervision signal is X_1; when the input vector signal is one of Y_{2,1} to Y_{2,6}, its supervision signal is X_2.
  • Step 5 (correction-network parameter selection): fit the working speech S (helium speech) of the diver's normal saturation diving operation against all vector signals Y_{i,l} in the vector signal set Y, and select the network parameters C_{n,l} corresponding to the best-fitting vector signal Y_{n,l} as the correction network's parameters.
  • In this example, diver 1 is working, so diver 1's working speech signal, the helium speech S, was fitted against all vector signals Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6};
  • the network parameters C_{1,3} corresponding to the best-fitting vector signal Y_{1,3} were selected as the correction network's parameters, with the Euclidean distance as the evaluation index during fitting.
  • Step 6 (helium speech correction): use the helium speech signal S as the input of the correction network (whose parameters are now C_{n,l}), correct S, and generate the corrected speech signal T.
  • In this example, the correction-network parameters used to correct the helium speech signal S are C_{1,3},
  • and the corrected speech signal produced is T.
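The patent specifies K-nearest-neighbour supervised learning for the correction network but no concrete architecture. A minimal 1-nearest-neighbour sketch of the correction idea, with speech frames as short feature vectors, might look like the following; all data, names and the frame representation are illustrative assumptions:

```python
import math

def nn_correct(helium_frames, training_pairs):
    """A 1-nearest-neighbour 'correction network' sketch: each helium-speech
    frame is replaced by the normal-pressure frame paired with its nearest
    helium-speech training frame. training_pairs is a list of
    (helium_frame, normal_frame) tuples learned in the diving chamber."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    corrected = []
    for frame in helium_frames:
        _, normal = min(training_pairs, key=lambda p: dist(frame, p[0]))
        corrected.append(normal)
    return corrected

# Toy 2-D frames: noisy helium frames map back to their chamber counterparts.
train = [([4.0, 4.0], [1.0, 1.0]), ([8.0, 8.0], [2.0, 2.0])]
T = nn_correct([[3.9, 4.2], [7.7, 8.1]], train)
print(T)  # [[1.0, 1.0], [2.0, 2.0]]
```

A real correction network would operate on spectral features rather than raw pairs, but the parameter set C_{n,l} plays exactly the role of `training_pairs` here: one learned mapping per diver and depth.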
  • Step 7 (interpretation network learning): compare the speech in the corrected speech signal T, word by word, with the supervision signals in the correction-network supervision signal set X and compute the fit between them; match the speech of the best-fitting word in X with the corresponding speech in T to form pairs; sort the matched pairs by fit and keep the top p%; the T-side speech of the kept pairs serves as the vector signal U for interpretation-network machine learning, and the X-side speech as the supervision signal V; the interpretation network performs supervised learning.
  • In this example, the Euclidean distance is used to compare the corrected speech signal T, word by word, with the supervision signals in the supervision signal set X; the speech of the best-fitting word in X is matched with the corresponding speech in T to form pairs, and the matched pairs are sorted by fit; the T-side speech of the top 90% of matched pairs serves as the vector
  • signal U for interpretation-network machine learning, with the corresponding speech in X as the supervision signal V; the interpretation network performs supervised learning using the K-nearest-neighbour algorithm.
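The interpretation network of this example can likewise be sketched as a 1-nearest-neighbour classifier over the lexicon's supervision signals. The word features below are toy values, not real speech data, and the word-to-vector mapping is an assumed representation:

```python
import math

def interpret(corrected_words, supervision):
    """Interpretation-network sketch: label each corrected speech vector
    with the lexicon word whose supervision vector is nearest (1-NN,
    Euclidean). supervision maps word -> feature vector recorded at 1 atm."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    out = []
    for vec in corrected_words:
        word = min(supervision, key=lambda w: dist(vec, supervision[w]))
        out.append(word)
    return out

# Toy supervision set X for three lexicon words.
X = {"diving": [0.0, 0.0], "pressure": [1.0, 0.0], "temperature": [0.0, 1.0]}
words = interpret([[0.1, -0.1], [0.9, 0.2]], X)
print(words)  # ['diving', 'pressure']
```

Because the output is always a lexicon word, the interpretation is "complete" in the patent's sense only insofar as the divers' working vocabulary actually falls within K, which is why the lexicon size and the screening ratio p matter.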
  • Step 8 (helium speech interpretation): use the corrected speech signal T as the input of the interpretation network to complete the interpretation of the helium speech S.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

A saturation diving helium speech interpretation method based on lexicon learning, applied in a system comprising a diver, a correction network and an interpretation network. The method comprises: building a lexicon of the working language commonly used in saturation diving operations, which the diver reads aloud in different environments to generate the supervision signals and vector signals for the correction network; the correction network learns the helium speech of different divers at different diving depths to obtain network correction parameters, and corrects the diver's helium speech to obtain corrected speech; the interpretation network learns the corrected speech and completes the interpretation of the helium speech. By exploiting the characteristics of divers' individual speech signals in different environments and the characteristics of the working-language lexicon, the method introduces machine learning into helium speech interpretation, solves the problem of interpreting saturation diving helium speech, reduces the number of learning samples the network requires, and improves the adaptability of the helium speech interpreter at different diving depths; it is especially suitable for the complete interpretation of helium speech at different diving depths.

Description

Saturation diving helium speech interpretation method based on lexicon learning. Technical field
The present invention relates to helium speech interpretation technology, and more specifically to a saturation diving helium speech interpretation method based on lexicon learning.
Background art
The 21st century is the era of the ocean economy: more than 50% of the clean energy and production materials needed by mankind will be obtained from the ocean. Saturation diving has extremely important application value in navigation operations, marine development, naval applications, maritime rescue and other fields, and is an indispensable part of marine economic development.
Because of the special nature of deep-sea working environments and tasks, much work in the ocean cannot yet be done by manned deep-sea submersibles or underwater robots; divers must enter the water directly and work in the deep-sea high-pressure environment using saturation diving techniques. To meet divers' physiological needs under deep-sea high pressure, divers currently breathe a helium-oxygen gas mixture during saturation diving operations. When the operating depth exceeds 50 metres, the diver's speech becomes noticeably distorted; beyond 100 metres it becomes severely distorted, and normal speech turns into a strange "duck-like" sound, helium speech, which is very hard to understand. This makes communication between divers, and between the inside and outside of the diving chamber, difficult; it directly affects deep-sea operations and can even threaten divers' lives. Solving the speech-communication problem of deep-sea saturation divers, the helium speech interpretation problem, is therefore urgent.
At present, existing saturation diving helium speech interpreters at home and abroad all rely on an interpreter in the diving chamber whose frequency-domain or delay characteristics are adjusted manually; they cannot adapt to the saturation diving operating depth, and the interpretation quality is unsatisfactory. Once the diving depth exceeds 200 metres, interpretation quality drops rapidly, and in particular, when the diver's depth is changing, the interpreter cannot interpret the diver's helium speech effectively. Saturation diving helium speech interpretation based on artificial intelligence is still in its infancy at home and abroad, and practical technical solutions are rare. How to make full use of machine learning, combined with the particular divers and working language of saturation diving scenarios, to interpret helium speech effectively remains an unsolved technical problem.
Technical problem
The purpose of the present invention is to overcome the above defects of the prior art and provide a saturation diving helium speech interpretation method based on lexicon learning.
Considering that in saturation diving scenarios the population of divers is fixed and the divers' working vocabulary during operations is limited, the present invention makes full use of divers' individual speech-signal characteristics and the vocabulary of the working-language lexicon, and proposes a helium speech interpretation method based on lexicon learning. In this method, a lexicon of the divers' common working language is first built according to the saturation diving operation specifications; the divers read the lexicon aloud under normal atmospheric pressure and under the environments corresponding to saturation diving operations, generating the supervision signals and vector signals for correction-network machine learning, and the correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths, obtaining the correction-network parameter set. Secondly, during diving operations the diver's helium speech signal is fitted against the correction network's vector signals; the network parameters corresponding to the best-fitting vector signal are selected as the correction network's parameters, and the diver's helium speech is corrected to obtain the corrected speech signal. Then the corrected speech signal is fitted against the common working-language lexicon and screened by fit, generating the supervision signals and vector signals for interpretation-network machine learning, and the interpretation network uses a supervised learning algorithm to learn the corrected speech signal further. Finally, the interpretation network interprets the corrected speech signal, completing the full interpretation of the helium speech. The method makes full use of divers' individual speech-signal characteristics in different environments and the vocabulary of the working-language lexicon, and uses machine-learning algorithms to correct and interpret helium speech, greatly improving interpretation accuracy.
Technical solution
The above purpose is achieved by the following technical solution. The saturation diving helium speech interpretation method based on lexicon learning of the present invention involves at least one diver, one helium speech correction network and one helium speech interpretation network; the diver's helium speech signal is S. The helium speech interpretation technique comprises the following steps:
Stage 1 - correction network learning:
Step 1 (lexicon signal construction): according to the saturation diving operation specifications, build the lexicon K of the working language commonly used by divers in saturation diving operations;
Step 2 (supervision signal generation): under normal atmospheric pressure, diver i reads the words in the lexicon K aloud to obtain the supervision signal X_i, generating the correction-network machine-learning supervision signal set X = {X_i}, i = 1, 2, ..., I, where I is the number of divers;
Step 3 (vector signal generation): diver i reads the words in the lexicon K aloud in the environments corresponding to saturation diving depths h_1, h_2, h_3, ..., h_L to obtain the vector signals Y_{i,l}, l = 1, 2, ..., L, generating the correction-network machine-learning vector signal set Y = {Y_{i,l}};
Step 4 (correction network learning): with the vector signals Y_{i,l} as inputs and the supervision signals X_i as desired outputs, the correction network performs supervised learning, forming the correction-network parameter set C = {C_{i,l}} corresponding to the vector signals Y_{i,l};
Stage 2 - helium speech interpretation:
Step 5 (correction-network parameter selection): fit the diver's working speech S (helium speech) during saturation diving operations against all vector signals Y_{i,l} in the vector signal set Y, and select the parameters C_{n,l} corresponding to the best-fitting vector signal Y_{n,l} as the correction network's parameters;
Step 6 (helium speech correction): use the helium speech signal S as the input of the correction network (whose parameters are now C_{n,l}), correct S, and generate the corrected speech signal T;
Step 7 (interpretation network learning): compare the speech in the corrected speech signal T, word by word, with the supervision signals in the correction-network supervision signal set X and compute the fit between them; match the speech of the best-fitting word in X with the corresponding speech in T to form pairs; sort the matched pairs by fit and keep the top p%; the T-side speech of the kept pairs serves as the vector signal U for interpretation-network machine learning, and the X-side speech as the supervision signal V; the interpretation network performs supervised learning;
Step 8 (helium speech interpretation): use the corrected speech signal T as the input of the interpretation network to complete the interpretation of the helium speech S.
The present invention also has the following features:
1. In step 1, the constructed lexicon K of divers' common working language is set according to the saturation diving operation specifications of the organization using the helium speech interpreter; different organizations will have different lexicons K.
2. In step 2, each diver has his own supervision signal; because different divers pronounce words differently, their supervision signals differ.
3. In step 3, the helium speech test-point depths h_1, h_2, h_3, ..., h_L should cover the preset depths of the salvage diving operation uniformly, but may also cover them non-uniformly.
4. In step 3, the number of test points is determined by the preset salvage diving operation depth and the test-point interval: the finer the interval, the more complete the helium speech interpretation, but the longer it takes to generate the vector signals and the higher the interpretation complexity.
5. In step 3, each diver has a corresponding vector signal at each test point (each diving depth).
6. In step 4, the learning algorithm may be any supervised learning algorithm, or any semi-supervised learning algorithm.
7. In step 4, the correction network structure corresponds to the learning algorithm selected in step 4.
8. In step 5, the fit evaluation index is the Euclidean distance between the helium speech S and the vector signals Y_{i,l}, but other indices such as mean or variance may also be used.
9. In step 7, the fit evaluation index is the Euclidean distance between the corrected speech signal T and the words in the lexicon K, but other indices such as mean or variance may also be used.
10. In step 7, the screening ratio p is related to the size of the lexicon K: the larger K, the higher the probability that the divers' working vocabulary falls within K, the larger p can be, and the more complete the helium speech interpretation; typically, with 100 to 300 words in K, p is chosen between 85 and 98.
11. In step 7, the learning algorithm may be any supervised learning algorithm, such as the K-nearest-neighbour algorithm or decision trees, or any semi-supervised learning algorithm, such as self-training or semi-supervised support vector machines.
12. In step 7, the interpretation network structure corresponds to the learning algorithm selected in step 7.
13. In step 6, if the diver's speech distortion is not severe, the corrected speech signal T can be output directly as the helium speech interpretation signal.
14. Of steps 1-8, steps 1-4 are completed by the diver in the diving chamber (as preparation for the diving operation), and steps 5-8 during the deep-sea diving operation.
15. In step 2, the supervision signals of the correction network can use text labels; in that case the diver does not need to read the words in the lexicon K aloud, and the lexicon K itself is used directly as the supervision signal X; correspondingly, the corrected speech signal T produced in step 6 and the helium speech interpretation signal produced in step 8 are then also text.
Beneficial effects
In interpreting helium speech, the method of the present invention exploits divers' individual speech-signal characteristics in different environments, the vocabulary of the working-language lexicon, and the machine-learning capability of artificial-intelligence networks, producing the following beneficial effects:
(1) By learning the vocabulary of the divers' working-language lexicon, the number of machine-learning samples the network requires is reduced, so divers can complete the correction network's learning during the preparation phase in the diving chamber.
(2) By learning divers' individual speech-signal characteristics in different environments, the learning efficiency of the machine-learning network is improved and the influence of environmental noise on helium speech interpretation is eliminated, making the helium speech interpreter adaptive when interpreting helium speech at different diving depths.
(3) Combining the correction network and the interpretation network improves the accuracy of helium speech interpretation.
Brief description of the drawings
Figure 1 is a flow chart of helium speech interpretation.
Embodiments of the invention
The present invention is further described below with reference to the drawings and a specific embodiment.
In a system comprising divers, a helium speech correction network and a helium speech interpretation network, first, according to the saturation diving operation specifications, a lexicon of the divers' common working language is built, and the divers read it aloud under normal atmospheric pressure and under the environments corresponding to saturation diving operations, generating the supervision signals and vector signals for correction-network machine learning; the correction network uses a supervised learning algorithm to learn the helium speech of different divers at different diving depths, obtaining the correction-network parameter set. Secondly, during diving operations the diver's helium speech signal is fitted against the correction network's vector signals; the network parameters corresponding to the best-fitting vector signal are selected as the correction network's parameters, and the diver's helium speech is corrected to obtain the corrected speech signal. Then the corrected speech signal is fitted against the common working-language lexicon and screened by fit, generating the supervision signals and vector signals for interpretation-network machine learning; the interpretation network uses a supervised learning algorithm to learn the corrected speech signal further. Finally, the interpretation network interprets the corrected speech signal, completing the full interpretation of the helium speech.
Stage 1 - correction network learning.
Step 1 (lexicon signal construction): according to the saturation diving operation specifications, build the lexicon K of the working language commonly used by divers in saturation diving operations.
In this example, according to the saturation diving operation specifications of the XX salvage bureau, a common working-language lexicon K of 150 words such as "diving, splint, temperature, pressure" was constructed.
Step 2 (supervision signal generation): under normal atmospheric pressure, diver i reads the words in the lexicon K aloud to obtain the supervision signal X_i, generating the correction-network machine-learning supervision signal set X = {X_i}, i = 1, 2, ..., I, where I is the number of divers.
In this example, 2 divers each read the words in the lexicon K aloud, generating the correction-network machine-learning supervision signal (speech signal) sets X_1 and X_2.
Step 3 (vector signal generation): diver i reads the words in the lexicon K aloud in the environments corresponding to saturation diving depths h_1, h_2, h_3, ..., h_L to obtain the vector signals Y_{i,l}, l = 1, 2, ..., L, generating the correction-network machine-learning vector signal set Y = {Y_{i,l}}.
In this example, the saturation diving operation depth range is 200 m to 250 m and the test-point interval is 10 m. In the diving chamber, at the environments corresponding to saturation diving depths of 200 m, 210 m, 220 m, 230 m, 240 m and 250 m, the 2 divers each read the words in the lexicon K aloud, generating the correction-network machine-learning vector signals (speech signals) Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6}.
Step 4 (correction network learning): with the vector signals Y_{i,l} as inputs and the supervision signals X_i as desired outputs, the correction network performs supervised learning, forming the correction-network parameter set C = {C_{i,l}} corresponding to the vector signals Y_{i,l}.
In this example, the correction network uses the K-nearest-neighbour algorithm for supervised learning. After supervised learning, the correction network generates, for the vector signals Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6} and the supervision signals X_1 and X_2, the corresponding correction-network parameters C_{1,1}, C_{1,2}, C_{1,3}, C_{1,4}, C_{1,5}, C_{1,6}, C_{2,1}, C_{2,2}, C_{2,3}, C_{2,4}, C_{2,5} and C_{2,6}; when the input vector signal of the correction network is one of Y_{1,1} to Y_{1,6}, its supervision signal is X_1; when the input vector signal is one of Y_{2,1} to Y_{2,6}, its supervision signal is X_2.
Stage 2 - helium speech interpretation.
Step 5 (correction-network parameter selection): fit the working speech S (helium speech) of the diver's normal saturation diving operation against all vector signals Y_{i,l} in the vector signal set Y, and select the network parameters C_{n,l} corresponding to the best-fitting vector signal Y_{n,l} as the correction network's parameters.
In this example, diver 1 is working, so diver 1's working speech signal, the helium speech S, was fitted against all vector signals Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{1,4}, Y_{1,5}, Y_{1,6}, Y_{2,1}, Y_{2,2}, Y_{2,3}, Y_{2,4}, Y_{2,5} and Y_{2,6}; the network parameters C_{1,3} corresponding to the best-fitting vector signal Y_{1,3} were selected as the correction network's parameters, with the Euclidean distance as the evaluation index during fitting.
Step 6 (helium speech correction): use the helium speech signal S as the input of the correction network (whose parameters are now C_{n,l}), correct S, and generate the corrected speech signal T.
In this example, the correction-network parameters used to correct the helium speech signal S are C_{1,3}, and the corrected speech signal produced is T.
Step 7 (interpretation network learning): compare the speech in the corrected speech signal T, word by word, with the supervision signals in the correction-network supervision signal set X and compute the fit between them; match the speech of the best-fitting word in X with the corresponding speech in T to form pairs; sort the matched pairs by fit and keep the top p%; the T-side speech of the kept pairs serves as the vector signal U for interpretation-network machine learning, and the X-side speech as the supervision signal V; the interpretation network performs supervised learning.
In this example, the Euclidean distance is used to compare the corrected speech signal T, word by word, with the supervision signals in the supervision signal set X; the speech of the best-fitting word in X is matched with the corresponding speech in T to form pairs, and the matched pairs are sorted by fit; the T-side speech of the top 90% of matched pairs serves as the vector signal U for interpretation-network machine learning, with the corresponding speech in X as the supervision signal V; the interpretation network performs supervised learning using the K-nearest-neighbour algorithm.
Step 8 (helium speech interpretation): use the corrected speech signal T as the input of the interpretation network to complete the interpretation of the helium speech S.
Besides the above embodiment, the present invention may have other implementations. All technical solutions formed by equivalent substitution or equivalent transformation fall within the scope of protection claimed by the present invention.

Claims (10)

  1. A saturation diving helium speech interpretation method based on lexicon learning, involving at least one diver, one helium speech correction network and one helium speech interpretation network, the diver's helium speech signal being S, the helium speech interpretation method comprising the following steps:
    Stage 1 - correction network learning
    Step 1 (lexicon signal construction): according to the saturation diving operation specifications, build the lexicon K of the working language commonly used by divers in saturation diving operations;
    Step 2 (supervision signal generation): under normal atmospheric pressure, diver i reads the words in the lexicon K aloud to obtain the supervision signal X_i, generating the correction-network machine-learning supervision signal set X = {X_i}, i = 1, 2, ..., I, where I is the number of divers;
    Step 3 (vector signal generation): diver i reads the words in the lexicon K aloud in the environments corresponding to saturation diving depths h_1, h_2, h_3, ..., h_L to obtain the vector signals Y_{i,l}, l = 1, 2, ..., L, generating the correction-network machine-learning vector signal set Y = {Y_{i,l}};
    Step 4 (correction network learning): with the vector signals Y_{i,l} as inputs and the supervision signals X_i as desired outputs, the correction network performs supervised learning, forming the correction-network parameter set C = {C_{i,l}} corresponding to the vector signals Y_{i,l};
    Stage 2 - helium speech interpretation
    Step 5 (correction-network parameter selection): fit the diver's helium speech signal S during saturation diving operations against all vector signals Y_{i,l} in the vector signal set Y, and select the parameters C_{n,l} corresponding to the best-fitting vector signal Y_{n,l} as the correction network's parameters;
    Step 6 (helium speech correction): use the helium speech signal S as the input of the correction network, correct S, and generate the corrected speech signal T;
    Step 7 (interpretation network learning): compare the speech in the corrected speech signal T, word by word, with the supervision signals in the correction-network supervision signal set X and compute the fit between them; match the speech of the best-fitting word in X with the corresponding speech in T to form pairs; sort the matched pairs by fit and keep the top p%; the T-side speech of the kept pairs serves as the vector signal U for interpretation-network machine learning, and the X-side speech as the supervision signal V; the interpretation network performs supervised learning;
    Step 8 (helium speech interpretation): use the corrected speech signal T as the input of the interpretation network to complete the interpretation of the helium speech S.
  2. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein in steps 5 and 7 the fit evaluation index is the Euclidean distance or the variance; the smaller the Euclidean distance, the higher the fit, and the smaller the variance, the higher the fit.
  3. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein the lexicon K of divers' common working language is set according to the saturation diving operation specifications of the organization using the helium speech interpreter.
  4. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein the helium speech test-point depths h_1, h_2, h_3, ..., h_L uniformly cover the preset depths of the salvage diving operation.
  5. The saturation diving helium speech interpretation method based on lexicon learning of claim 4, wherein the number of test points is determined by the preset salvage diving operation depth and the test-point interval.
  6. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein in step 2, when the correction network's supervision signals use text labels, the lexicon K is used directly as the supervision signal X; correspondingly, the corrected speech signal T produced in step 6 is also text, and the helium speech interpretation signal produced in step 8 is text.
  7. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein the lexicon K contains between 100 and 300 words and p is chosen between 85 and 98.
  8. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein the learning method used in steps 4 and 7 is the K-nearest-neighbour algorithm or a decision-tree algorithm; or a self-training algorithm or a semi-supervised support vector machine algorithm.
  9. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein distortion recognition is performed on the diver's speech; if the distortion is low, the corrected speech signal T can be output directly as the helium speech interpretation signal.
  10. The saturation diving helium speech interpretation method based on lexicon learning of claim 1, wherein steps 1-4 are completed by the diver in the diving chamber, and steps 5-8 by the diver during the deep-sea diving operation.
PCT/CN2021/116054 2021-04-26 2021-09-01 Saturation diving helium speech interpretation method based on lexicon learning WO2022227365A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021232744A AU2021232744B2 (en) 2021-04-26 2021-09-01 Lexicon learning-based heliumspeech unscrambling method in saturation diving

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110450616.9A CN113178207B (zh) 2021-04-26 2021-04-26 Saturation diving helium speech interpretation method based on lexicon learning
CN202110450616.9 2021-04-26

Publications (1)

Publication Number Publication Date
WO2022227365A1 true WO2022227365A1 (zh) 2022-11-03

Family

ID=76926012

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116054 WO2022227365A1 (zh) 2021-04-26 2021-09-01 Saturation diving helium speech interpretation method based on lexicon learning

Country Status (2)

Country Link
CN (1) CN113178207B (zh)
WO (1) WO2022227365A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113178207B (zh) * 2021-04-26 2021-10-08 南通大学 Saturation diving helium speech interpretation method based on lexicon learning
CN114120976A (zh) * 2021-11-16 2022-03-01 Saturation diving helium speech interpretation method and system based on multi-objective optimization

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111976924A (zh) * 2020-08-12 2020-11-24 厦门大学 Real-time information exchange device for a diving full-face mask
CN113178207A (zh) * 2021-04-26 2021-07-27 南通大学 Saturation diving helium speech interpretation method based on lexicon learning

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
GB1187536A (en) * 1968-08-28 1970-04-08 Standard Telephones Cables Ltd Processor for Helium Speech
US3813687A (en) * 1972-11-29 1974-05-28 Us Navy Instant replay helium speech unscrambler using slowed tape for correction
US3863026A (en) * 1973-08-15 1975-01-28 Us Navy Helium speech decoder
US3965298A (en) * 1975-05-05 1976-06-22 Long Enterprises Deep sea diving speech converter
FR2332670A1 (fr) * 1975-11-19 1977-06-17 Zurcher Jean Voice transcoder for helium atmospheres
JPH036964A (ja) * 1989-06-02 1991-01-14 Fuosutekusu Kk Underwater communication device
JPH11327598A (ja) * 1998-05-20 1999-11-26 Oki Electric Ind Co Ltd Helium speech restoration device
JP2010134260A (ja) * 2008-12-05 2010-06-17 Sanyo Electric Co Ltd Electronic device and audio processing method
JP5597575B2 (ja) * 2011-02-23 2014-10-01 国立大学法人 琉球大学 Communication device
NO333567B1 (no) * 2011-05-16 2013-07-08 Kongsberg Seatex As Method and system for building a maritime high-speed broadband communication network
US9564146B2 (en) * 2014-08-01 2017-02-07 Bongiovi Acoustics Llc System and method for digital signal processing in deep diving environment

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN111976924A (zh) * 2020-08-12 2020-11-24 厦门大学 Real-time information exchange device for a diving full-face mask
CN113178207A (zh) * 2021-04-26 2021-07-27 南通大学 Saturation diving helium speech interpretation method based on lexicon learning

Non-Patent Citations (3)

Title
LI DONGMEI, ZHANG SHIBING, GUO LILI, CHEN YONGHONG: "Helium Speech Correction Algorithm Based on Deep Neural Networks", 2020 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), IEEE, 21 October 2020 (2020-10-21) - 23 October 2020 (2020-10-23), pages 99 - 103, XP055982422, ISBN: 978-1-7281-7236-1, DOI: 10.1109/WCSP49889.2020.9299782 *
XIA WANG, DU GUI-MING, GUANG-YAN WANG, YAN ZHANG: "Mask speech recognition based on convolutional neural network", TRANSDUCER AND MICROSYSTEM TECHNOLOGIES, ZHONGGUO DIANZHI KEJI JITUAN GONGSI DI-49 YANJIUSUO, CN, vol. 36, no. 10, 31 October 2017 (2017-10-31), CN , pages 34 - 37, XP055982419, ISSN: 2096-2436 *
ZHANG SHIBING; GUO LILI; LI HONGJUN; BAO ZHIHUA; ZHANG XIAOGE; CHEN YONGHONG: "A survey on heliumspeech communications in saturation diving", CHINA COMMUNICATIONS, CHINA INSTITUTE OF COMMUNICATIONS, PISCATAWAY, NJ, USA, vol. 17, no. 6, 1 June 2020 (2020-06-01), Piscataway, NJ, USA , pages 68 - 79, XP011795154, ISSN: 1673-5447, DOI: 10.23919/JCC.2020.06.006 *

Also Published As

Publication number Publication date
CN113178207A (zh) 2021-07-27
CN113178207B (zh) 2021-10-08

Similar Documents

Publication Publication Date Title
Jain et al. Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning.
WO2022227365A1 (zh) Saturation diving helium speech interpretation method based on lexicon learning
EP0342630B1 (en) Speech recognition with speaker adaptation by learning
Xie et al. Sequence error (SE) minimization training of neural network for voice conversion.
CN110875035A (zh) Novel multi-task joint speech recognition training architecture and method
CN110459208A (zh) Sequence-to-sequence speech recognition model training method based on knowledge transfer
KR102152902B1 (ko) Method for training a speech recognition model and speech recognition apparatus trained using the method
CN111243591B (zh) Air traffic control speech recognition method with external-data correction
CN112349288A (zh) Chinese speech recognition method based on pinyin-constrained joint learning
Wang et al. Speech augmentation using wavenet in speech recognition
CN113327595A (zh) Pronunciation error detection method, device and storage medium
WO2023087779A1 (zh) Saturation diving helium speech interpretation method and system based on multi-objective optimization
CN114944150A (zh) Dual-task-based Conformer acoustic model construction method for air-ground communication
Liang et al. Transformer-based end-to-end speech recognition with residual gaussian-based self-attention
Han et al. DiaCorrect: Error correction back-end for speaker diarization
Ashihara et al. SpeechGLUE: How well can self-supervised speech models capture linguistic knowledge?
Koriyama et al. Prosody generation using frame-based Gaussian process regression and classification for statistical parametric speech synthesis
JPH09179581A (ja) Speech recognition system
Du et al. Spectrum and prosody conversion for cross-lingual voice conversion with cyclegan
Saraclar et al. Pronunciation ambiguity vs. pronunciation variability in speech recognition
AU2021232744B2 (en) Lexicon learning-based heliumspeech unscrambling method in saturation diving
CN112242134A (zh) Speech synthesis method and device
Savchenko Phonetic encoding method in the isolated words recognition problem
CN116092471A (zh) Multi-style personalized Tibetan speech synthesis model for low-resource conditions
CN111063335B (zh) End-to-end tone recognition method based on neural networks

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021232744

Country of ref document: AU

Date of ref document: 20210901

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21938829

Country of ref document: EP

Kind code of ref document: A1