CN112885345A - Special garment voice interaction system and method - Google Patents


Info

Publication number
CN112885345A
Authority
CN
China
Prior art keywords
voice
sound source
external sound
signal
source signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110040219.4A
Other languages
Chinese (zh)
Inventor
马翼平
王阳
于泽
王诗怡
许召辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avic East China Photoelectric Shanghai Co ltd
Original Assignee
Avic East China Photoelectric Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avic East China Photoelectric Shanghai Co ltd
Priority to CN202110040219.4A
Publication of CN112885345A
Legal status: Pending

Classifications

    • G10L 15/22 — Speech recognition: procedures used during a speech recognition process, e.g. man-machine dialogue
    • B64G 6/00 — Cosmonautics: space suits
    • G10L 13/08 — Speech synthesis: text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation, or stress or intonation determination
    • G10L 15/063 — Speech recognition: training; creation of reference templates, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/144 — Speech recognition: training of HMMs (speech classification or search using statistical models, e.g. hidden Markov models)
    • G10L 15/16 — Speech recognition: speech classification or search using artificial neural networks
    • H04R 1/323 — Transducers: arrangements for obtaining a desired directional characteristic, for loudspeakers
    • H04R 1/326 — Transducers: arrangements for obtaining a desired directional characteristic, for microphones
    • H04R 5/02 — Stereophonic arrangements: spatial or constructional arrangements of loudspeakers
    • G10L 2015/0633 — Creating reference templates; clustering using lexical or orthographic knowledge sources
    • G10L 2015/223 — Execution procedure of a spoken command

Abstract

The invention discloses a voice interaction system for special garments, comprising: a microphone array arranged inside the special garment and used to acquire voice signals; a voice recognition module used to recognize the voice signal and output a recognition result; a voice alarm module used to acquire an external sound source signal and positioning information for that signal; a control module used to output a control signal according to the recognition result so as to control equipment operation, and to output a first alarm voice according to the external sound source signal, or a second alarm voice according to the external sound source signal and its positioning information; a voice synthesis module used to receive the control signal and output synthesized voice; and a stereo broadcasting device arranged inside the special garment and used to receive and play the first alarm voice, the second alarm voice or the synthesized voice. The system can effectively suppress the noise generated in the special environment of an extravehicular space suit and accurately recognize voice commands issued by the operator.

Description

Special garment voice interaction system and method
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice interaction system and method for special clothes.
Background
Special garments are protective suits designed for special operations. To achieve the protective purpose, an operator wearing such a garment cannot flexibly bend the joints to operate equipment, has a narrowed field of view, and cannot efficiently survey surrounding objects; in some special environments the garment is fully sealed, which increases these limitations further. Taking the extravehicular space suit (extravehicular suit for short) as an example, its main function is to support the completion of a wide range of extravehicular activities, from simple to complex, such as extravehicular scientific experiments, payload maintenance, and space station assembly and repair.
Combining intelligent voice interaction technology with the extravehicular suit system improves the human-machine effectiveness of the suit and is the current development trend. However, the human-machine interaction scenario of an extravehicular suit differs greatly from traditional scenarios, mainly in the following respects:
1. The sound transmission environment is special: the extravehicular suit is a closed cavity formed of flexible materials and aluminum; the air pressure and humidity inside the cavity differ markedly from the surface environment, and sound propagating in the cavity is reflected and absorbed to varying degrees.
2. Noise is heavy: the noise produced by equipment carried by the suit, such as fans and pumps, severely degrades the quality of communication voice.
3. Sound source localization is needed: during extravehicular work, personnel need to localize sound sources to improve work efficiency.
In view of the above problems, it is therefore necessary to propose a further solution addressing at least one of them.
Disclosure of Invention
The invention aims to provide a voice interaction system and method for special garments that overcome the defects of the prior art.
To solve the above technical problems, the technical scheme of the invention is as follows:
a special garment voice interaction system is characterized by comprising:
the microphone array is arranged in the special garment and used for acquiring voice signals;
the voice recognition module is used for recognizing the voice signal and outputting a recognition result;
the voice alarm module is used for acquiring an external sound source signal and positioning information of the external sound source signal;
the control module is used for outputting a control signal according to the recognition result so as to control equipment operation, and outputting a first alarm voice according to the external sound source signal or outputting a second alarm voice according to the external sound source signal and the positioning information of the external sound source signal;
the voice synthesis module is used for receiving the control signal and outputting synthesized voice;
and the stereo broadcasting device is arranged in the special clothing and is used for receiving and playing the first warning voice, the second warning voice or the synthesized voice.
In a preferred embodiment of the present invention, the speech recognition module comprises:
the acoustic model training unit is used for establishing a speech model from a corpus via joint modeling based on a deep neural network and an HMM (hidden Markov model), combined with discriminative training;
the signal processing unit is used for extracting voice characteristic parameters from the received voice signals;
and the voice recognition unit is used for matching the voice characteristic parameters of the voice signals with the voice model, the language model and the dictionary and outputting a recognition result.
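The patent gives no implementation details for the DNN/HMM joint modeling; as a minimal illustration of how such a hybrid recognizer scores frames, the sketch below (all dimensions, priors and transition values hypothetical) converts DNN state posteriors into scaled log-likelihoods and runs a standard Viterbi decode over them:

```python
import numpy as np

def posteriors_to_loglikelihoods(posteriors, state_priors, eps=1e-10):
    """Convert DNN frame posteriors p(state | frame) into scaled
    log-likelihoods: log p(frame | state) ∝ log p(state | frame) - log p(state)."""
    return np.log(posteriors + eps) - np.log(state_priors + eps)

def viterbi(log_lik, log_trans, log_init):
    """Standard Viterbi decode. log_lik: (T, S) emission scores;
    log_trans: (S, S) transition log-probs; log_init: (S,) initial log-probs."""
    T, S = log_lik.shape
    delta = log_init + log_lik[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[i, j]: from state i to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]                            # most likely state sequence
```

A real system would train the DNN discriminatively (e.g. sequence-level criteria, matching the patent's "discriminative training") and decode over a graph composed with the language model and dictionary rather than a bare HMM.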
In a preferred embodiment of the present invention, the speech synthesis module comprises:
the modeling unit is used for establishing a speech synthesis model according to the sound library and based on HMM training;
the text analysis unit is used for extracting context-dependent HMM sequence decision information according to the recognition result and the speech synthesis model and generating prosodic parameters;
and the voice synthesis unit is used for generating synthesized voice with the HTS + STRAIGHT algorithm and the prosodic parameters.
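The HTS + STRAIGHT pipeline itself is a substantial system; purely as a toy illustration of the idea of generating a smooth acoustic-parameter trajectory from per-state HMM means and durations (a crude stand-in for maximum-likelihood parameter generation with dynamic features, not the actual HTS algorithm), one might sketch:

```python
import numpy as np

def generate_trajectory(state_means, state_durs, smooth=3):
    """Toy stand-in for HMM parameter generation: repeat each state's mean
    spectral parameter for its predicted duration, then moving-average
    smooth each dimension (a crude substitute for MLPG)."""
    frames = np.repeat(state_means, state_durs, axis=0)   # (total_frames, dim)
    kernel = np.ones(smooth) / smooth
    # smooth each parameter dimension independently
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, frames)
```

In an actual HTS system the generated spectral and excitation trajectories would then drive the STRAIGHT vocoder to produce the waveform.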
In a preferred embodiment of the present invention, the voice alarm module includes:
a sound source acquisition unit for acquiring an external sound source signal;
an azimuth selecting unit for acquiring azimuth information of the external sound source according to the external sound source signal;
a distance detection unit for acquiring distance information of the external sound source according to the external sound source signal;
the convolution calculation unit is used for processing the azimuth information and the distance information in a segmented mode and generating a positioning signal;
and the voice alarm generating unit is used for generating a first alarm voice according to the external sound source signal and generating a second alarm voice according to the external sound source signal and the positioning signal.
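The "segmented processing" in the convolution stage is commonly realized as block-wise (overlap-add) FFT convolution of the alarm audio with a head-related impulse response, keeping each FFT small and the latency bounded. The sketch below shows that technique for one ear's HRIR (the filter shown is a hypothetical placeholder; binaural output would run it once per ear):

```python
import numpy as np

def segmented_convolve(signal, hrir, block=256):
    """Overlap-add convolution: process the signal in fixed-size blocks so
    each FFT stays small (the 'segmented processing' of the convolution
    calculation unit). hrir is one ear's head-related impulse response."""
    n = block + len(hrir) - 1
    nfft = 1 << (n - 1).bit_length()          # next power of two, no wrap-around
    H = np.fft.rfft(hrir, nfft)
    out = np.zeros(len(signal) + len(hrir) - 1)
    for start in range(0, len(signal), block):
        seg = signal[start:start + block]
        y = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        out[start:start + len(seg) + len(hrir) - 1] += y[:len(seg) + len(hrir) - 1]
    return out
```

The result is identical to direct convolution, but the per-block cost is fixed, which is why segmentation reduces the computation needed for real-time 3D alarm rendering.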
The invention also provides a special garment voice interaction method, which comprises the following steps:
S1, acquiring a voice signal;
S2, recognizing the voice signal and outputting a recognition result, the recognition result including at least text information;
S3, acquiring an external sound source signal and positioning information of the external sound source signal;
S4, outputting a control signal according to the recognition result to control equipment operation, and outputting a first alarm voice according to the external sound source signal or a second alarm voice according to the external sound source signal and its positioning information;
S5, receiving the control signal and outputting synthesized voice;
S6, playing the first alarm voice, the second alarm voice or the synthesized voice.
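The steps S1–S6 above can be sketched as one pass of a control loop; every module interface below is a hypothetical stand-in, since the patent only names the modules without specifying their APIs:

```python
def interaction_step(mic, recognizer, alarm, control, synth, speaker):
    """One pass through steps S1-S6; each argument is a callable standing
    in for the corresponding module of the patent."""
    voice = mic()                                         # S1: acquire voice signal
    result = recognizer(voice)                            # S2: recognize, get text
    source, location = alarm()                            # S3: external source + position
    command, warning = control(result, source, location)  # S4: decide outputs
    if warning is not None:
        speaker(warning)                                  # S6: play first/second alarm
    if command is not None:
        speaker(synth(command))                           # S5 + S6: synthesize and play
    return result
```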
In a preferred embodiment of the present invention, the recognizing the voice signal and outputting the recognition result in step S2 includes:
A speech model is established from a corpus via joint modeling based on a deep neural network and an HMM (hidden Markov model), combined with discriminative training, and the voice signal is matched against the speech model to output a recognition result.
In a preferred embodiment of the present invention, the step S3 of acquiring the external sound source signal and the positioning information of the external sound source signal includes:
HRTF technology is adopted: based on the spatial localization capability of the auditory system, the sound source is localized in both direction and distance, and segmented processing is used in the convolution calculation stage.
In a preferred embodiment of the present invention, the step S5 of receiving the control signal and outputting a synthesized voice includes:
The synthesized voice is generated with the HTS + STRAIGHT algorithm and the prosodic parameters of the voice signal.
In a preferred embodiment of the present invention, the step S1 of acquiring the voice signal includes:
and acquiring voice signals of personnel in the special clothes by adopting a microphone array.
In a preferred embodiment of the present invention, the step S6 playing the first warning voice, the second warning voice or the synthesized voice includes:
and playing the first warning voice, the second warning voice or the synthesized voice by adopting a stereo broadcasting device arranged in the special clothes.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention is suitable for voice recognition in special environments: by using the microphone array to effectively suppress environmental noise, the recognition rate reaches more than 98% whether or not the fan or air-conditioning valve inside the special garment is running.
(2) The invention can convey azimuth through voice and is particularly suitable for the space environment: under complex motion conditions in which multiple spacecraft revolve and rotate simultaneously, or when astronauts perform extravehicular operations, the azimuth of the target sound source is reproduced so that the person inside the suit can subjectively perceive the relative position of the target or of fellow operators, improving work efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a speech recognition module according to the present invention;
FIG. 3 is a schematic diagram of a speech synthesis module according to the present invention;
FIG. 4 is a diagram of a voice alarm module according to the present invention.
Reference numerals: 100, microphone array; 200, voice recognition module; 300, voice alarm module; 400, control module; 500, speech synthesis module; 600, stereo broadcasting device.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, should not be taken as limiting the scope of the present invention. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, the meaning of "a plurality" is two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
As shown in fig. 1, a special garment voice interaction system includes a microphone array 100, a voice recognition module 200, a voice alarm module 300, a control module 400, a voice synthesis module 500, and a stereo broadcasting device 600.
Specifically, the microphone array 100 is disposed in a special garment and is used to acquire a voice signal. The speech recognition module 200 is used for recognizing the speech signal and outputting a recognition result. The voice warning module 300 is used for acquiring an external sound source signal and positioning information of the external sound source signal. The control module 400 is configured to output a control signal according to the recognition result to control the operation of the device, and output a first warning voice according to the external sound source signal, or output a second warning voice according to the external sound source signal and the positioning information of the external sound source signal. The speech synthesis module 500 is configured to receive the control signal and output a synthesized speech. The stereo broadcasting device 600 is disposed in the special garment and is configured to receive and play the first warning voice, the second warning voice or the synthesized voice.
As shown in fig. 1, the communication voice a2 of far-end voice communication is sent by the control module 400 to the speech synthesis module 500 for communication voice broadcast a3. The microphone array 100 picks up the voice b1; the voice b1 is denoised and recognized by the voice recognition module 200, the denoised voice b21 is transmitted to the control module 400 for sending out, and the recognition result b22 is reported to the control module 400. Recognition can be activated either by voice wake-up or by an issued command b01. The recognition result may or may not be sent to the speech synthesis module 500 for broadcasting. The control module 400 issues a voice synthesis instruction c2 to the speech synthesis module 500, which performs synthetic voice broadcast c3 after synthesis and feeds back to the upper-layer application whether synthesis was normal. The control module 400 issues an alarm instruction d21, or sends audio d22 containing azimuth information, to the voice alarm module 300: the sound source acquisition unit in the voice alarm module 300 first sends the acquired external sound source signal to the control module 400; the control module 400 decides whether to output the first or the second alarm voice and sends the decision together with the external sound source signal back to the voice alarm module 300; the voice alarm module 300 then either plays fixed-azimuth audio (the first alarm voice) according to the instruction, or analyzes the azimuth data and audio and synthesizes and broadcasts audio with a sense of direction (the second alarm voice) in real time.
By adopting the microphone array 100 and the in-garment stereo broadcasting device 600, the system keeps the pickup and playback devices out of contact with the wearer, avoiding the discomfort caused by collision, friction, stuffiness and the like.
The system combines microphone array 100 technology, voice recognition, voice synthesis and virtual 3D sound alarm technology. It can effectively suppress the noise generated in the special environment of the extravehicular suit, accurately recognize voice instructions issued by the operator, and give the synthesized voice directional information, which spares the operator frequent joint movements, reduces workload and greatly improves human-machine cooperation efficiency. Specifically, the system takes the microphone array 100 as input, converts the input into an instruction through the voice recognition module 200, and reports it to the control module 400. On receiving the recognized control instruction, the control module 400 can operate equipment inside the extravehicular suit, e.g. adjusting the display, making queries and the like; meanwhile the control module 400 issues alarm and synthesis instructions, and the three modes of voice synthesis, ordinary alarm and 3D voice alarm can be broadcast in different forms through the stereo broadcasting device 600.
As shown in fig. 2, the speech recognition module 200 includes an acoustic model training unit, a signal processing unit and a speech recognition unit. Specifically, the acoustic model training unit establishes a speech model from a corpus via joint modeling based on a deep neural network and an HMM, combined with discriminative training. The signal processing unit extracts speech feature parameters from the received speech signal. The speech recognition unit matches the speech feature parameters of the speech signal against the speech model, the language model and the dictionary, and outputs a recognition result.
As shown in fig. 3, the speech synthesis module 500 includes a modeling unit, a text analysis unit and a speech synthesis unit. Specifically, the modeling unit establishes a speech synthesis model from the sound library via HMM training. The text analysis unit extracts context-dependent HMM sequence decision information from the recognition result and the speech synthesis model and generates prosodic parameters. The speech synthesis unit generates synthesized speech with the HTS + STRAIGHT algorithm and the prosodic parameters.
As shown in fig. 4, the voice alarm module 300 includes a sound source acquiring unit, an azimuth selecting unit, a distance detecting unit, a convolution calculating unit, and a voice alarm generating unit. Specifically, the sound source acquisition unit is used to acquire an external sound source signal, i.e., a target. The azimuth selection unit is used for acquiring azimuth information of the external sound source according to the external sound source signal. The distance detection unit is used for acquiring distance information of the external sound source according to the external sound source signal. The convolution calculating unit is used for processing the azimuth information and the distance information in a segmented mode and generating a positioning signal. The voice alarm generating unit is used for generating a first alarm voice according to the external sound source signal and generating a second alarm voice according to the external sound source signal and the positioning signal.
A special garment voice interaction method comprises the following steps:
s1 acquires a voice signal.
Preferably, a microphone array 100 is used to acquire the speech signal of the person inside the special garment.
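The patent does not state which array algorithm is used; a minimal delay-and-sum beamformer is one common way a microphone array suppresses diffuse fan and pump noise, sketched here with integer sample delays and a hypothetical geometry:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay toward the
    talker, then average: coherent speech adds up across microphones while
    diffuse noise (fans, pumps) partially cancels."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = [ch[d:d + length] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

A production system would estimate the steering delays from the array geometry and talker direction, and likely use an adaptive beamformer; this sketch only shows the underlying principle.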
S2 recognizes the speech signal and outputs a recognition result, the recognition result including at least text information.
Preferably, a speech model is established from the corpus via joint modeling based on a deep neural network and an HMM, combined with discriminative training, and the speech signal is matched against the speech model to output a recognition result. The corpus consists of a large amount of general material plus some simulated-scenario material collected for the usage environment. On the basis of covering linguistic and paralinguistic phenomena in the broad-spectrum sense, the speech recognition module 200 thereby adapts to the specific acoustic environment.
S3 acquires the external sound source signal and the localization information of the external sound source signal.
Preferably, HRTF technology is adopted: based on the spatial localization capability of the auditory system, the sound source is localized in both direction and distance, and localization is possible for a single sound source as well as for multiple sound sources. In the convolution calculation stage, segmented processing is adopted to reduce the amount of computation.
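Direction estimation from a pair of microphones (or the two ears of an HRTF model) usually starts from the inter-channel time delay; the patent does not name an estimator, but GCC-PHAT is a common choice and can be sketched as:

```python
import numpy as np

def gcc_phat_delay(a, b, max_lag):
    """Estimate the delay (in samples) of channel b relative to channel a
    with GCC-PHAT: whiten the cross-spectrum so only phase (timing)
    information remains. A positive result means b lags a."""
    nfft = 1 << (len(a) + len(b) - 1).bit_length()
    X = np.conj(np.fft.rfft(a, nfft)) * np.fft.rfft(b, nfft)
    X /= np.abs(X) + 1e-12                     # PHAT weighting
    cc = np.fft.irfft(X, nfft)
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])  # lags -max..+max
    return int(np.argmax(cc)) - max_lag
```

Given the delay and the microphone spacing, the azimuth follows from simple geometry; distance estimation, as the patent notes, requires additional cues such as level and spectral changes.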
S4 outputs a control signal for controlling the operation of the device according to the recognition result, and outputs a first warning voice according to the external sound source signal or outputs a second warning voice according to the external sound source signal and the location information of the external sound source signal.
S5 receives the control signal and outputs a synthesized speech.
Preferably, the synthesized speech is generated from the HTS + STRAIGHT algorithm and the prosodic parameters of the speech signal. The HTS + STRAIGHT algorithm is suitable for efficient parametric synthesis in low-power scenarios.
S6 plays the first warning voice, the second warning voice or the synthesized voice.
Preferably, the first warning voice, the second warning voice or the synthesized voice is played by using the stereo broadcasting device 600 arranged in the special clothes.
To sum up, the system combines microphone array technology, voice recognition, voice synthesis and virtual 3D sound alarm technology; it can effectively suppress noise generated in the special environment of the extravehicular suit, accurately recognize voice instructions issued by the operator, give the synthesized voice directional information, spare the operator frequent joint movements, reduce workload and greatly improve human-machine cooperation efficiency.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (10)

1. A special garment voice interaction system is characterized by comprising:
the microphone array is arranged in the special garment and used for acquiring voice signals;
the voice recognition module is used for recognizing the voice signal and outputting a recognition result;
the voice alarm module is used for acquiring an external sound source signal and positioning information of the external sound source signal;
the control module is used for outputting a control signal according to the recognition result so as to control equipment operation, and outputting a first alarm voice according to the external sound source signal or outputting a second alarm voice according to the external sound source signal and the positioning information of the external sound source signal;
the voice synthesis module is used for receiving the control signal and outputting synthesized voice;
and the stereo broadcasting device is arranged in the special clothing and is used for receiving and playing the first warning voice, the second warning voice or the synthesized voice.
2. The special garment voice interaction system of claim 1, wherein the voice recognition module comprises:
the acoustic model training unit is used for establishing a voice model from a corpus by joint modeling based on a deep neural network and an HMM (hidden Markov model), combined with discriminative training;
the signal processing unit is used for extracting voice characteristic parameters from the received voice signals;
and the voice recognition unit is used for matching the voice characteristic parameters of the voice signals with the voice model, the language model and the dictionary and outputting a recognition result.
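The claim does not disclose how the feature parameters are matched against the voice model. Purely as an illustration of the HMM matching step named above, the following is a minimal sketch of Viterbi decoding, the standard way a feature sequence is aligned against an HMM; the two-state model, its probabilities, and the "low"/"high" observation symbols are all invented for this example and are not part of the claim.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state path for an observation sequence
    under a discrete HMM (log-domain to avoid underflow)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # best predecessor state for s at time t
            best_prev, best_lp = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            V[t][s] = best_lp + math.log(emit_p[s][obs[t]])
            new_path[s] = path[best_prev] + [s]
        path = new_path
    best_final = max(states, key=lambda s: V[-1][s])
    return path[best_final]

# Toy 2-state model (numbers invented for illustration): silence vs. speech
states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}
best_path = viterbi(["low", "low", "high", "high"],
                    states, start_p, trans_p, emit_p)
# → ['sil', 'sil', 'speech', 'speech']
```

In a real recognizer the observations would be acoustic feature vectors scored by the deep neural network rather than discrete symbols, but the dynamic-programming search is the same.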
3. The special garment voice interaction system of claim 1, wherein the voice synthesis module comprises:
the modeling unit is used for establishing a speech synthesis model according to the sound library and based on HMM training;
the text analysis unit is used for extracting context-dependent HMM sequence decision information according to the recognition result and the speech synthesis model and generating prosodic parameters;
and the voice synthesis unit is used for generating synthesized voice according to the HTS + STRAIGHT algorithm and the prosodic parameters.
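HTS and the STRAIGHT vocoder are far beyond a short sketch; purely to illustrate how prosodic parameters drive waveform generation, here is a toy excitation generator in which a per-frame F0 contour controls a sinusoidal excitation. The frame length, sample rate, and sine excitation are illustrative choices, not details disclosed in the claim.

```python
import math

def synthesize(f0_contour, frame_len=80, sample_rate=8000):
    """Generate a crude excitation waveform from a per-frame F0 contour
    in Hz; an F0 of 0 marks an unvoiced frame, rendered here as silence."""
    samples = []
    phase = 0.0
    for f0 in f0_contour:
        for _ in range(frame_len):
            if f0 > 0:
                # advance the phase by one sample period of the target pitch
                phase += 2 * math.pi * f0 / sample_rate
                samples.append(math.sin(phase))
            else:
                samples.append(0.0)
    return samples

# one voiced frame at 100 Hz followed by one unvoiced frame
wav = synthesize([100, 0], frame_len=4)
```

A real HTS back end would additionally generate spectral and aperiodicity parameters and pass them, with this excitation, through the STRAIGHT vocoder filter.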
4. The special garment voice interaction system as claimed in claim 1, wherein the voice alarm module comprises:
a sound source acquisition unit for acquiring an external sound source signal;
an azimuth selecting unit for acquiring azimuth information of the external sound source according to the external sound source signal;
a distance detection unit for acquiring distance information of the external sound source according to the external sound source signal;
the convolution calculation unit is used for processing the azimuth information and the distance information in a segmented mode and generating a positioning signal;
and the voice alarm generating unit is used for generating a first alarm voice according to the external sound source signal and generating a second alarm voice according to the external sound source signal and the positioning signal.
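The "segmented" processing in the convolution calculation unit is presumably an overlap-add style scheme, which bounds per-block latency while producing the same result as one long convolution. A minimal sketch (the block size, signals, and the two-tap filter standing in for an HRTF impulse response are all illustrative):

```python
def convolve(x, h):
    """Direct linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def overlap_add(x, h, block=64):
    """Segmented convolution: convolve x block by block and overlap-add
    the partial results -- identical output to convolve(x, h)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = convolve(x[start:start + block], h)
        for k, v in enumerate(seg):
            y[start + k] += v
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [1.0, 1.0]                 # stand-in for an HRTF impulse response
full = convolve(x, h)
seg = overlap_add(x, h, block=2)
# both equal [1.0, 3.0, 5.0, 7.0, 9.0, 5.0]
```

In practice each block would be convolved via FFT for speed, and the left- and right-ear HRTFs selected from the azimuth and distance information would be applied per block.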
5. A special garment voice interaction method is characterized by comprising the following steps:
S1, acquiring a voice signal;
S2, recognizing the voice signal and outputting a recognition result, the recognition result comprising at least text information;
S3, acquiring an external sound source signal and positioning information of the external sound source signal;
S4, outputting a control signal according to the recognition result to control operation of the equipment, and outputting a first alarm voice according to the external sound source signal or outputting a second alarm voice according to the external sound source signal and the positioning information of the external sound source signal;
S5, receiving the control signal and outputting a synthesized voice;
S6, playing the first alarm voice, the second alarm voice or the synthesized voice.
6. The special garment voice interaction method according to claim 5, wherein the step S2 of recognizing the voice signal and outputting the recognition result comprises:
establishing a voice model from a corpus by joint modeling based on a deep neural network and an HMM (hidden Markov model), combined with discriminative training, and matching the voice signal against the voice model to output a recognition result.
7. The special garment voice interaction method according to claim 5, wherein the step S3 of acquiring the external sound source signal and the positioning information of the external sound source signal comprises:
adopting HRTF (head-related transfer function) technology to locate the sound source in both direction and distance, based on the spatial localization capability of the human auditory system, with segmented processing in the convolution calculation stage.
8. The special garment voice interaction method according to claim 5, wherein the step S5 is receiving the control signal and outputting synthesized voice, and comprises:
generating the synthesized voice according to the HTS + STRAIGHT algorithm and the prosodic parameters of the voice signal.
9. The special garment voice interaction method according to claim 5, wherein the step S1 of acquiring a voice signal comprises:
acquiring a voice signal of the person inside the special garment by using a microphone array.
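The claim leaves the array processing open. One common technique for a wearable microphone array (an assumption here, not disclosed in the claim) is delay-and-sum beamforming: each channel is delayed so the wanted source aligns across microphones, then the channels are averaged, reinforcing the source while partially cancelling diffuse noise. A sketch with integer sample delays:

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer sample delay and
    average -- a coherent source adds up, uncorrelated noise averages down."""
    # usable length after the largest effective delay is applied
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]

source = [0.0, 1.0, 0.0, -1.0]
ch1 = source              # reference microphone
ch2 = [0.0] + source      # same wave arriving one sample later
out = delay_and_sum([ch1, ch2], delays=[0, 1])
# → [0.0, 1.0, 0.0, -1.0]
```

Real systems estimate the delays from the steering direction and use fractional-delay filters; the two toy channels above are perfectly coherent, so the output exactly reconstructs the source.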
10. The special garment voice interaction method according to claim 5, wherein the step S6 playing the first warning voice, the second warning voice or the synthesized voice comprises:
playing the first alarm voice, the second alarm voice or the synthesized voice through a stereo broadcasting device arranged in the special garment.
CN202110040219.4A 2021-01-13 2021-01-13 Special garment voice interaction system and method Pending CN112885345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110040219.4A CN112885345A (en) 2021-01-13 2021-01-13 Special garment voice interaction system and method

Publications (1)

Publication Number Publication Date
CN112885345A true CN112885345A (en) 2021-06-01

Family

ID=76045158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110040219.4A Pending CN112885345A (en) 2021-01-13 2021-01-13 Special garment voice interaction system and method

Country Status (1)

Country Link
CN (1) CN112885345A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015076797A (en) * 2013-10-10 2015-04-20 富士通株式会社 Spatial information presentation device, spatial information presentation method, and spatial information presentation computer
US20160001193A1 (en) * 2014-07-01 2016-01-07 Disney Enterprises, Inc. Full-duplex, wireless control system for interactive costumed characters
WO2016033269A1 (en) * 2014-08-28 2016-03-03 Analog Devices, Inc. Audio processing using an intelligent microphone
CN106128478A (en) * 2016-06-28 2016-11-16 北京小米移动软件有限公司 Voice broadcast method and device
CN206079071U (en) * 2016-10-17 2017-04-12 福州领头虎软件有限公司 Intelligent clothing
US20170303052A1 (en) * 2016-04-18 2017-10-19 Olive Devices LLC Wearable auditory feedback device
CN207054840U (en) * 2017-07-06 2018-03-02 劲霸男装(上海)有限公司 Intelligent clothing
CN107925816A (en) * 2015-10-30 2018-04-17 谷歌有限责任公司 Method and apparatus for re-creating direction prompting in the audio of beam forming
CN111176607A (en) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 Voice interaction system and method based on power business

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Annual Table of Contents of Volume 32, 2011", Chinese Journal of Scientific Instrument (《仪器仪表学报》) *

Similar Documents

Publication Publication Date Title
CN107464564B (en) Voice interaction method, device and equipment
CN108447479B (en) Robot voice control system in noisy working condition environment
CN106440192B (en) A kind of household electric appliance control method, device, system and intelligent air condition
CN110517705B (en) Binaural sound source positioning method and system based on deep neural network and convolutional neural network
CN109286875A (en) For orienting method, apparatus, electronic equipment and the storage medium of pickup
US11854566B2 (en) Wearable system speech processing
CN102298443A (en) Smart home voice control system combined with video channel and control method thereof
CN109767769A (en) A kind of audio recognition method, device, storage medium and air-conditioning
Nakadai et al. Development of microphone-array-embedded UAV for search and rescue task
JP3627058B2 (en) Robot audio-visual system
CN108297108B (en) Spherical following robot and following control method thereof
CN107526437A (en) A kind of gesture identification method based on Audio Doppler characteristic quantification
US20230386461A1 (en) Voice user interface using non-linguistic input
Nakadai et al. Real-time speaker localization and speech separation by audio-visual integration
TWI222622B (en) Robotic vision-audition system
CN107390175A (en) A kind of auditory localization guider with the artificial carrier of machine
CN113053368A (en) Speech enhancement method, electronic device, and storage medium
CN109583598A (en) A kind of power equipment automatic tour inspection system
CN110517702A (en) The method of signal generation, audio recognition method and device based on artificial intelligence
CN112885345A (en) Special garment voice interaction system and method
CN110517677A (en) Speech processing system, method, equipment, speech recognition system and storage medium
CN110164443A (en) Method of speech processing, device and electronic equipment for electronic equipment
CN111932619A (en) Microphone tracking system and method combining image recognition and voice positioning
CN111412587A (en) Voice processing method and device of air conditioner, air conditioner and storage medium
Zhao et al. A robust real-time sound source localization system for olivia robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601