CN114387976B - Underwater sound voice digital communication method based on voiceprint features and semantic compression - Google Patents

Underwater sound voice digital communication method based on voiceprint features and semantic compression

Info

Publication number
CN114387976B
CN114387976B (application CN202111598552.3A)
Authority
CN
China
Prior art keywords
voice
voiceprint
semantic
compression
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111598552.3A
Other languages
Chinese (zh)
Other versions
CN114387976A (en)
Inventor
申晓红
王超
赵瑞琴
陈帆
解伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202111598552.3A priority Critical patent/CN114387976B/en
Publication of CN114387976A publication Critical patent/CN114387976A/en
Application granted granted Critical
Publication of CN114387976B publication Critical patent/CN114387976B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1815 - Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 - Codebooks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides an underwater acoustic voice digital communication method based on voiceprint features and semantic compression. First, a semantic-voiceprint library is established and pattern-fitted with speech-rate features, realizing multi-dimensional feature extraction of the input voice so that it can be better captured and restored. Second, compressing the input voice reduces the volume of data to be transmitted, which effectively lowers transmission energy consumption and transmission time. Finally, matching the user identity against the compressed code at the receiving end effectively improves the security of the received voice. The method thus reduces the amount of transmitted data while ensuring effective transmission of the transmitting end's voice characteristics, realizing efficient underwater acoustic voice communication.

Description

Underwater sound voice digital communication method based on voiceprint features and semantic compression
Technical Field
The invention relates to the technical field of underwater acoustic voice communication, and in particular to an efficient underwater acoustic voice communication method built on semantic recognition, voiceprint feature recognition, data compression, and related techniques.
Background
With the ever-increasing development and utilization of the ocean, research on underwater wireless communication has attracted growing attention. Because acoustic waves propagate underwater far better than other information carriers such as electromagnetic waves, underwater acoustic communication remains the most effective means of transmitting information underwater. Within it, underwater acoustic voice communication has become a research hotspot because it plays a vital role in fields such as frogman operations, underwater work, and marine scientific investigation.
Underwater acoustic communication systems can be divided into analog and digital systems according to whether they transmit analog or digital signals. In the early development of underwater acoustic communication technology, analog schemes were widely adopted because the technology was simpler. In recent years, with the rapid development of digital communication technology, digital schemes have become mainstream in contemporary underwater acoustic voice communication owing to their strong anti-interference capability, straightforward signal error detection and correction, and ease of building comprehensive communication networks and integrated equipment.
However, research on underwater acoustic voice digital communication has focused mainly on using modulation schemes to overcome the bandwidth limitation, complex ocean ambient noise, and multipath effects of the underwater channel, leaving voice communication burdened with large transmission data volumes and long propagation delays. To address this, methods have been proposed that compress voice semantically: a mapping between voice and semantic codes is established, and only the semantic codes are transmitted, reducing the amount of data to send. The voice reproduced by such methods, however, contains only semantic information and disregards the characteristics of the transmitting speaker's voice, which leads to inaccurate judgment of the broadcast voice information at the receiving end.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides an underwater acoustic voice digital communication method based on voiceprint features and semantic compression, which reduces the amount of transmitted data while ensuring effective transmission of the transmitting end's voice characteristics.
The technical solution adopted by the invention to solve this technical problem comprises the following steps:
Step 1: the device learns and models each user's voiceprint and, according to the voiceprint, assigns a distinct voiceprint identity ID_v ∈ {1, …, I} to each user, yielding voiceprint feature models in one-to-one correspondence with the users, so that the device can identify which known user a given piece of speech comes from, or that it belongs to none of them;
Step 2: each user inputs, as required, the voices k ∈ {1, …, K} of a predefined voice content library whose capacity is K; the device extracts the semantic features m_ki and voiceprint features v_ki and matches them, so that each input voice is associated with its user, and the association is recorded in a semantic-voiceprint library L;
Step 3: while the matching relation between semantic and voiceprint features is being established for the input voice, the speech-rate features s_j, j ∈ {1, …, J}, are extracted and a speech-rate model is built; the semantic-voiceprint library is then pattern-fitted with the speech-rate features to obtain, for each voice, the fitting feature y = f(m_ki, v_ki, s_j), i ∈ {1, …, I}, k ∈ {1, …, K}, j ∈ {1, …, J};
Step 4: after the pattern fitting is completed, the compression mapping of the voice is established;
When a voice is input, if it belongs to the semantic-voiceprint library, its speech-rate features are extracted and, combining the semantics and voiceprint, a unique compressed code N_y is allocated to the fitting feature y; the voice and its compressed code are recorded in the compressed-code library, establishing a complete compression mapping among each user, each voice content, and each speech rate; otherwise, the input voice is discarded and a new input is awaited;
Step 5: once the compression mapping is established, whenever the input end receives a voice, it first judges whether the voice belongs to the semantic-voiceprint library L: if so, the semantics, voiceprint, and speech rate of the voice are extracted to obtain the user identity ID_v, and the voice is compressed to obtain its compressed code; otherwise, the input voice is discarded and a new input is awaited;
Step 6: after the voice is compressed, the compressed code is packed into a data packet p consisting of the transmitting-end identity ID_t ∈ {1, …, I}, the receiving-end identity ID_r ∈ {1, …, I}, the user identity ID_v, and the compressed code of the voice, and p is transmitted to the receiving end; the transmitting-end identity ID_t differs in meaning from the user identity ID_v: ID_t is the user's ID number in the communication network, while ID_v is the user's voiceprint ID number;
Step 7: upon receiving the data packet, the receiving end first determines whether the user identity ID_v in the packet header matches the voiceprint information corresponding to N_y in the compressed-code library: if they match, the received data are decompressed to recover the semantics, voiceprint, and speech-rate information of the voice corresponding to the compressed code, and the voice is broadcast; otherwise, the packet is regarded as a voice-feature mismatch and discarded.
The underwater acoustic voice digital communication method based on voiceprint features and semantic compression has the advantage that, by combining semantic feature extraction, voiceprint feature extraction, speech-rate feature extraction, and data compression, it effectively solves the problem of existing methods in which the broadcast voice contains only semantic information and cannot reproduce voice characteristics. First, a semantic-voiceprint library is established and pattern-fitted with the speech-rate features, realizing multi-dimensional feature extraction of the input voice so that it can be better captured and restored. Second, compressing the input voice reduces the volume of data to be transmitted, which effectively lowers transmission energy consumption and transmission time. Finally, matching the user identity against the compressed code at the receiving end effectively improves the security of the received voice. The invention thus ensures that efficient underwater acoustic communication is realized.
Drawings
Fig. 1 is a general flow chart of the voice broadcast of the present invention.
Fig. 2 is a flow chart of the semantic-voiceprint library creation of the present invention.
Fig. 3 is a flow chart of the compression process of the present invention.
Fig. 4 is a flow chart of the sender-side speech compression of the present invention.
Fig. 5 is a data packet format of the present invention.
Fig. 6 is a flow chart of the voice broadcast at the receiving end of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
Aiming at the problem that the broadcast content in existing underwater acoustic semantically-compressed voice digital communication contains only semantic information and cannot reproduce voice characteristics, the invention combines semantic feature extraction, voiceprint feature extraction, speech-rate feature extraction, and data compression to guarantee the transmission of voice characteristics while reducing the amount of transmitted data, thereby realizing efficient underwater acoustic voice communication.
Considering that underwater digital communication, compared with analog communication, offers strong anti-interference capability, straightforward signal error detection and correction, and ease of building comprehensive communication networks and integrated equipment, the invention adopts digital voice communication technology. To solve the problem that the broadcast content of existing underwater acoustic semantically-compressed voice digital communication contains only semantic information and cannot reproduce voice characteristics, an underwater acoustic voice digital communication method based on voiceprint features and semantic compression is provided. Using semantic feature extraction, voiceprint feature extraction, speech-rate feature extraction, and data compression, the invention ensures that efficient underwater acoustic communication is realized.
The invention is further described below using the example of two users, user 1 and user 2, carrying out underwater voice communication; the corresponding communication flow is shown in Fig. 1.
The technical solution mainly comprises three parts: establishing the semantic-voiceprint library, establishing the compression mapping, and transmitting the voice data:
Step 1: the device learns and models voiceprints of 2 users through an offline learning method, and then respectively distributes different voiceprint identities ID v =1 and ID v =2 for the 2 users according to different voiceprints. Thereby enabling the device to identify from which known user a piece of speech came or did not belong to.
Step 2: user 1 and user 2 are caused to input voices K, K e 1, K, here, the voice content library capacity is set to k=50, and all voices in the voice content library are input by each user for the equipment to learn. The device extracts the semantic features m k1、mk2 and voiceprint features v k1、vk2, k epsilon { 1..once, 50} of the user A, B for the voice, and completes feature matching of the semantic features and the voiceprint features, so that the input voice and each user establish a matching relationship, and the matching relationship is recorded into a semantic-voiceprint library L. Semantic-voiceprint library creation a flow chart is shown in figure 2,
Step 3: and when the matching relation between the semantic features and the voiceprint features is established for the input voice, extracting the speech speed features s j, J epsilon { 1.,. The first place, J }, and establishing a speech speed model. Here, the highest level of the divided speech rate is j=10. Then carrying out mode fitting on the semantic-voiceprint library and the speech speed characteristics, wherein each user obtains fitting characteristics corresponding to the speech, namely y=f (m ki,vki,sj), i e {1,2}, k e { 1..50 }, j e { 1..10 }.
Step 4: after the pattern fitting is completed, a compression mapping relation of the voice is required to be established. When a voice is input, if the voice belongs to a semantic-voiceprint library, extracting the speech speed characteristics of the voice, distributing a compression code N y for the voice by combining the semantic-voiceprint, and recording the compression code N y into the compression code library; otherwise, discarding the input voice and waiting for new voice input. Thus, the complete compression mapping relation of each user, each voice content and each voice speed is established. The compression process flow diagram is shown in fig. 3. Step 5: after the compression mapping relation is established, when the input end inputs voice, firstly judging whether the voice belongs to a semantic-voiceprint library L: if the user belongs to the code, extracting the semantics, voiceprint and speech speed of the code to obtain a user identity ID v, and compressing the voice to obtain a compressed code; otherwise, discarding the input voice and waiting for new voice input. The sender speech compression flow chart is shown in fig. 4. If the 1 st piece of content of the voice content library is input by the user 1 and the speech speed is 3, the compressed code N y,y=f(m1A,v1A,s3 corresponding to the voice can be obtained by the steps 1-4.
Step 6: after the voice compression is completed for the user 1, the data is packed into a data packet p, and the data packet header is provided with a transmitting end ID t =1, a receiving end ID r =2, a user identity IDv =1 and a compression code N y,y=f(m1A,v1A,s3 corresponding to the voice, and the data is sent to the user 2. The packet format is shown in fig. 5.
Step 7: when the user 2 receives the data packet, it is first determined whether the user ID v =1 in the packet header matches with the voiceprint information corresponding to N y in the compressed code library: when the data packet is actually sent from the user 1, the data packet is regarded as information matching, the user 2 decompresses the received data to obtain the semantic, voiceprint and speech speed information of the voice corresponding to the compressed code, and further the voice broadcasting is carried out; otherwise, when the data packet is forged by other users, the other users do not know the compressed codes of different users in the compressed code library, so that the user identity is difficult to be completely matched with the voiceprint information, and the device discards the data packet. This improves the security of voice communications. The receiving end voice broadcasting flow chart is shown in fig. 6.
By establishing the semantic-voiceprint library and pattern-fitting it with the speech-rate features, the invention realizes multi-dimensional feature extraction of the input voice, so that it can be better captured and restored. Compressing the input voice reduces the volume of data to be transmitted, which effectively lowers transmission energy consumption and transmission time. In addition, matching the user identity against the compressed code at the receiving end effectively improves the security of the received voice. The invention thus ensures that efficient underwater acoustic communication is realized.
The above example only illustrates a preferred embodiment of the invention and is not intended to limit its scope; various modifications and improvements made to the technical solution of the invention by those skilled in the art, without departing from its spirit, shall fall within the scope of protection defined by the claims.

Claims (1)

1. An underwater acoustic voice digital communication method based on voiceprint features and semantic compression, characterized by comprising the following steps:
Step 1: the device learns and models each user's voiceprint and, according to the voiceprint, assigns a distinct voiceprint identity ID_v ∈ {1, …, I} to each user, yielding voiceprint feature models in one-to-one correspondence with the users, so that the device can identify which known user a given piece of speech comes from, or that it belongs to none of them;
Step 2: each user inputs, as required, the voices k ∈ {1, …, K} of a predefined voice content library whose capacity is K; the device extracts the semantic features m_ki and voiceprint features v_ki and matches them, so that each input voice is associated with its user, and the association is recorded in a semantic-voiceprint library L;
Step 3: while the matching relation between semantic and voiceprint features is being established for the input voice, the speech-rate features s_j, j ∈ {1, …, J}, are extracted and a speech-rate model is built; the semantic-voiceprint library is then pattern-fitted with the speech-rate features to obtain, for each voice, the fitting feature y = f(m_ki, v_ki, s_j), i ∈ {1, …, I}, k ∈ {1, …, K}, j ∈ {1, …, J};
Step 4: after the pattern fitting is completed, the compression mapping of the voice is established;
When a voice is input, if it belongs to the semantic-voiceprint library, its speech-rate features are extracted and, combining the semantics and voiceprint, a unique compressed code N_y is allocated to the fitting feature y; the voice and its compressed code are recorded in the compressed-code library, establishing a complete compression mapping among each user, each voice content, and each speech rate; otherwise, the input voice is discarded and a new input is awaited;
Step 5: once the compression mapping is established, whenever the input end receives a voice, it first judges whether the voice belongs to the semantic-voiceprint library L: if so, the semantics, voiceprint, and speech rate of the voice are extracted to obtain the user identity ID_v, and the voice is compressed to obtain its compressed code; otherwise, the input voice is discarded and a new input is awaited;
Step 6: after the voice is compressed, the compressed code is packed into a data packet p consisting of the transmitting-end identity ID_t ∈ {1, …, I}, the receiving-end identity ID_r ∈ {1, …, I}, the user identity ID_v, and the compressed code of the voice, and p is transmitted to the receiving end; the transmitting-end identity ID_t differs in meaning from the user identity ID_v: ID_t is the user's ID number in the communication network, while ID_v is the user's voiceprint ID number;
Step 7: upon receiving the data packet, the receiving end first determines whether the user identity ID_v in the packet header matches the voiceprint information corresponding to N_y in the compressed-code library: if they match, the received data are decompressed to recover the semantics, voiceprint, and speech-rate information of the voice corresponding to the compressed code, and the voice is broadcast; otherwise, the packet is regarded as a voice-feature mismatch and discarded.
CN202111598552.3A 2021-12-24 2021-12-24 Underwater sound voice digital communication method based on voiceprint features and semantic compression Active CN114387976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111598552.3A 2021-12-24 2021-12-24 Underwater sound voice digital communication method based on voiceprint features and semantic compression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111598552.3A 2021-12-24 2021-12-24 Underwater sound voice digital communication method based on voiceprint features and semantic compression

Publications (2)

Publication Number Publication Date
CN114387976A CN114387976A (en) 2022-04-22
CN114387976B (en) 2024-05-14

Family

ID=81198523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111598552.3A Active CN114387976B (en) 2021-12-24 2021-12-24 Underwater sound voice digital communication method based on voiceprint features and semantic compression

Country Status (1)

Country Link
CN (1) CN114387976B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825857A (en) * 2016-03-11 2016-08-03 无锡吾芯互联科技有限公司 Voiceprint-recognition-based method for assisting deaf patient in determining sound type

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782564B (en) * 2016-11-18 2018-09-11 百度在线网络技术(北京)有限公司 Method and apparatus for handling voice data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825857A (en) * 2016-03-11 2016-08-03 无锡吾芯互联科技有限公司 Voiceprint-recognition-based method for assisting deaf patient in determining sound type

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and implementation of a high-speed adaptive underwater acoustic voice system; Dang Hua; Zhong Shun'an; Chen Yueyang; Transactions of Beijing Institute of Technology; 2009-04-15 (04); full text *

Also Published As

Publication number Publication date
CN114387976A (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN102111314B (en) Smart home voice control system and method based on Bluetooth transmission
CN101304391A (en) Voice call method and system based on instant communication system
CN103402171B (en) Method and the terminal of background music is shared in call
CN105846909B (en) Acoustic signals sending and receiving methods, device and its system
CN104917671A (en) Mobile terminal based audio processing method and device
CN103456305A (en) Terminal and speech processing method based on multiple sound collecting units
CN107343113A (en) Audio communication method and device
CN102592591A (en) Dual-band speech encoding
CN104766608A (en) Voice control method and voice control device
CN103714823A (en) Integrated speech coding-based adaptive underwater communication method
CN104410973A (en) Recognition method and system for tape played phone fraud
CN113612808B (en) Audio processing method, related device, storage medium, and program product
CN105790854A (en) Short distance data transmission method and device based on sound waves
WO2020237886A1 (en) Voice and text conversion transmission method and system, and computer device and storage medium
CN109451329A (en) Mixed audio processing method and device
CN111107284B (en) Real-time generation system and generation method for video subtitles
CN113395116A (en) Underwater sound voice digital transmission method based on semantic compression
CN114387976B (en) Underwater sound voice digital communication method based on voiceprint features and semantic compression
CN103474075B (en) Voice signal sending method and system, method of reseptance and system
CN101753657A (en) Method and device for reducing call noise
CN103474067B (en) speech signal transmission method and system
CN106683682A (en) Method for improving speech transmission efficiency
CN101742006B (en) Embedded Linux-based voice chat client and implementation method thereof
CN104202321A (en) Method and device for voice recording
CN209930503U (en) Disconnect-type intelligence audio amplifier system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant