CN112885345A - Special garment voice interaction system and method - Google Patents
- Publication number
- CN112885345A (application number CN202110040219.4A)
- Authority
- CN
- China
- Prior art keywords
- voice
- sound source
- external sound
- signal
- source signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64G—COSMONAUTICS; VEHICLES OR EQUIPMENT THEREFOR
- B64G6/00—Space suits
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/323—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for loudspeakers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/02—Spatial or constructional arrangements of loudspeakers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/0633—Creating reference templates; Clustering using lexical or orthographic knowledge sources
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention discloses a voice interaction system for a special garment, comprising: a microphone array, arranged inside the special garment, for acquiring voice signals; a voice recognition module for recognizing the voice signal and outputting a recognition result; a voice alarm module for acquiring an external sound source signal and positioning information for that signal; a control module for outputting a control signal according to the recognition result so as to control equipment operation, and for outputting a first warning voice according to the external sound source signal, or a second warning voice according to the external sound source signal together with its positioning information; a voice synthesis module for receiving the control signal and outputting synthesized voice; and a stereo broadcasting device, arranged inside the special garment, for receiving and playing the first warning voice, the second warning voice, or the synthesized voice. The system effectively suppresses the noise generated in the special environment of an extravehicular suit and accurately recognizes voice commands issued by the operator.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a voice interaction system and method for special garments.
Background
Special garments are protective clothing designed for special operations. To achieve protection, an operator wearing such a garment cannot flexibly move the joints to operate equipment, has a narrowed field of view, and cannot efficiently survey surrounding objects; in some special environments the garment is fully sealed, which increases these restrictions further. Taking the extravehicular space suit (hereinafter the "extravehicular suit") as an example, its main function is to support a wide range of extravehicular activity tasks, from simple to complex, such as extravehicular scientific experiments, payload maintenance, and space station assembly and maintenance.
Combining intelligent voice interaction technology with the extravehicular suit system improves the suit's human-machine efficiency and is the current development trend. However, the human-machine interaction scenario of the extravehicular suit differs greatly from the conventional one, mainly in the following respects:
1. The sound-transmission environment is special: the extravehicular suit is a closed cavity formed of flexible materials and aluminium, the air pressure and humidity inside the cavity differ markedly from the ground environment, and sound is reflected and absorbed to varying degrees as it propagates in the cavity.
2. Noise is abundant: the noise produced by the fan, pump, and other equipment carried by the extravehicular suit severely degrades the quality of communication speech.
3. Sound source localization is needed: during extravehicular work, the wearer must localize sound sources to improve work efficiency.
In view of the above problems, it is therefore necessary to propose a solution that addresses at least one of them.
Disclosure of Invention
The object of the invention is to provide a voice interaction system and method for special garments that overcome the defects of the prior art.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a special garment voice interaction system is characterized by comprising:
the microphone array is arranged in the special garment and used for acquiring voice signals;
the voice recognition module is used for recognizing the voice signal and outputting a recognition result;
the voice alarm module is used for acquiring an external sound source signal and positioning information of the external sound source signal;
the control module is used for outputting a control signal according to the recognition result so as to control equipment operation, and outputting a first alarm voice according to the external sound source signal or outputting a second alarm voice according to the external sound source signal and the positioning information of the external sound source signal;
the voice synthesis module is used for receiving the control signal and outputting synthesized voice;
and the stereo broadcasting device is arranged in the special garment and is used for receiving and playing the first warning voice, the second warning voice, or the synthesized voice.
In a preferred embodiment of the present invention, the speech recognition module comprises:
the acoustic model training unit is used for building a speech model from a corpus using joint deep neural network and HMM (hidden Markov model) modeling combined with discriminative training;
the signal processing unit is used for extracting voice characteristic parameters from the received voice signals;
and the voice recognition unit is used for matching the voice characteristic parameters of the voice signals with the voice model, the language model and the dictionary and outputting a recognition result.
In a preferred embodiment of the present invention, the speech synthesis module comprises:
the modeling unit is used for establishing a speech synthesis model according to the sound library and based on HMM training;
the text analysis unit is used for extracting context-dependent HMM sequence decision information according to the recognition result and the speech synthesis model and generating prosodic parameters;
and the voice synthesis unit is used for generating synthesized speech from the HTS + STRAIGHT algorithm and the prosodic parameters.
In a preferred embodiment of the present invention, the voice alarm module includes:
a sound source acquisition unit for acquiring an external sound source signal;
an azimuth selecting unit for acquiring azimuth information of the external sound source according to the external sound source signal;
a distance detection unit for acquiring distance information of the external sound source according to the external sound source signal;
the convolution calculation unit is used for processing the azimuth information and the distance information in a segmented mode and generating a positioning signal;
and the voice alarm generating unit is used for generating a first alarm voice according to the external sound source signal and generating a second alarm voice according to the external sound source signal and the positioning signal.
The invention also provides a special garment voice interaction method, which comprises the following steps:
S1, acquiring a voice signal;
s2 recognizing the voice signal and outputting a recognition result, the recognition result including at least text information;
s3, acquiring an external sound source signal and positioning information of the external sound source signal;
s4 outputting a control signal according to the recognition result to control the operation of the device, and outputting a first warning voice according to the external sound source signal or outputting a second warning voice according to the external sound source signal and the location information of the external sound source signal;
s5 receiving the control signal and outputting a synthesized voice;
s6 plays the first warning voice, the second warning voice or the synthesized voice.
In a preferred embodiment of the present invention, the recognizing the voice signal and outputting the recognition result in step S2 includes:
A speech model is built from a corpus using joint deep neural network and HMM (hidden Markov model) modeling combined with discriminative training, and the voice signal is matched against the speech model to output a recognition result.
In a preferred embodiment of the present invention, the step S3 of acquiring the external sound source signal and the positioning information of the external sound source signal includes:
HRTF (head-related transfer function) technology is adopted: based on the spatial localization capability of the human auditory system, the sound source is localized in both direction and distance, and segmented processing is used in the convolution calculation stage.
In a preferred embodiment of the present invention, the step S5 of receiving the control signal and outputting a synthesized voice includes:
The synthesized voice is generated from the HTS + STRAIGHT algorithm and the prosodic parameters of the voice signal.
In a preferred embodiment of the present invention, the step S1 of acquiring the voice signal includes:
A microphone array is used to acquire the voice signals of the person inside the special garment.
In a preferred embodiment of the present invention, the step S6 playing the first warning voice, the second warning voice or the synthesized voice includes:
The first warning voice, the second warning voice, or the synthesized voice is played by a stereo broadcasting device arranged in the special garment.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention is suitable for speech recognition in special environments: by using a microphone array to effectively suppress environmental noise, the recognition rate reaches over 98% whether or not the fan or air-conditioning valve in the special garment is running.
(2) The invention conveys azimuth through sound and is particularly suited to the space environment: under complex motion conditions in which several spacecraft revolve and rotate simultaneously, or when astronauts perform extravehicular operations, the azimuth of the target sound source is rendered so that the person inside the suit can subjectively perceive the relative position of the target or of fellow operators, improving work efficiency.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a speech recognition module according to the present invention;
FIG. 3 is a schematic diagram of a speech synthesis module according to the present invention;
FIG. 4 is a diagram of a voice alarm module according to the present invention.
Reference numerals: 100, microphone array; 200, voice recognition module; 300, voice alarm module; 400, control module; 500, speech synthesis module; 600, stereo broadcasting device.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, should not be taken as limiting the scope of the present invention. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the invention, the meaning of "a plurality" is two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
As shown in fig. 1, a special garment voice interaction system includes a microphone array 100, a voice recognition module 200, a voice alarm module 300, a control module 400, a voice synthesis module 500, and a stereo broadcasting device 600.
Specifically, the microphone array 100 is disposed in a special garment and is used to acquire a voice signal. The speech recognition module 200 is used for recognizing the speech signal and outputting a recognition result. The voice warning module 300 is used for acquiring an external sound source signal and positioning information of the external sound source signal. The control module 400 is configured to output a control signal according to the recognition result to control the operation of the device, and output a first warning voice according to the external sound source signal, or output a second warning voice according to the external sound source signal and the positioning information of the external sound source signal. The speech synthesis module 500 is configured to receive the control signal and output a synthesized speech. The stereo broadcasting device 600 is disposed in the special garment and is configured to receive and play the first warning voice, the second warning voice or the synthesized voice.
As shown in fig. 1, communication voice a2 from far-end voice communication is sent by the control module 400 to the speech synthesis module 500 for communication voice broadcast a3. The microphone array 100 picks up voice b1, which the voice recognition module 200 denoises and recognizes; the denoised voice b21 is passed to the control module 400 for transmission, and the recognition result b22 is reported to the control module 400. Recognition can be enabled either by voice wake-up or by an issued command b01. The recognition result may or may not be sent to the speech synthesis module 500 for broadcast. The control module 400 issues a speech synthesis instruction c2 to the speech synthesis module 500, which broadcasts the synthesized voice c3 and reports back to the upper-layer application whether synthesis succeeded. The control module 400 issues a warning instruction d21, or sends audio d22 containing azimuth information, to the voice alarm module 300: the sound source acquisition unit in the voice alarm module 300 first sends the acquired external sound source signal to the control module 400; the control module 400 decides whether to output the first or the second warning voice and returns the decision together with the external sound source signal to the voice alarm module 300; the voice alarm module 300 then either plays fixed-azimuth audio (the first warning voice) as instructed, or analyzes the azimuth data and audio and synthesizes and broadcasts audio with a sense of direction (the second warning voice) in real time.
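The first-warning/second-warning decision attributed to the control module 400 above can be sketched as follows. The class and method names are hypothetical illustrations; the patent does not disclose the control module at code level, only the routing rule that a source signal alone yields the first warning voice while a source signal plus localization yields the second.

```python
class ControlModule:
    """Sketch of the routing rules described for the control module:
    a recognition result yields a device-control signal; an external
    sound source alone yields the first warning voice; a source plus
    localization information yields the second (spatialized) warning."""

    def route_alarm(self, source_signal, localization=None):
        # first warning: source signal only; second: signal + localization
        if localization is None:
            return ("first_warning", source_signal)
        return ("second_warning", source_signal, localization)

    def route_recognition(self, recognition_result):
        # map recognized text to a device-control signal (the concrete
        # mapping is application-specific and not disclosed in the patent)
        return {"command": recognition_result}
```

The point of the sketch is only the branch on whether localization data accompanies the source signal, matching the two warning-voice paths in the flow description.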
By adopting the microphone array 100 and the in-garment stereo broadcasting device 600, the system keeps the pickup and playback devices out of contact with the wearer, avoiding the discomfort of collision, friction, and stuffiness.
The system combines microphone array 100 technology, voice recognition, voice synthesis, and virtual 3D sound warning technology. It effectively suppresses the noise generated in the special environment of the extravehicular suit, accurately recognizes voice instructions issued by the operator, gives the synthesized voice directional information, spares the operator frequent joint movement, reduces workload, and greatly improves human-machine cooperation efficiency. Specifically, the system takes the microphone array 100 as input, converts the input into an instruction through the voice recognition module 200, and reports the instruction to the control module 400. On receiving the recognized control instruction, the control module 400 can operate equipment inside the extravehicular suit (for example, display adjustment and queries); it also issues warning and synthesis instructions, and the three output modes (speech synthesis, ordinary warning, 3D voice warning) can be selected and broadcast in different forms through the stereo broadcasting device 600.
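The patent does not disclose which array-processing algorithm the microphone array 100 uses for noise suppression; delay-and-sum beamforming is a common baseline for this task, and the minimal numpy sketch below (all signals synthetic, delays assumed known) illustrates why summing time-aligned channels attenuates uncorrelated noise while preserving the coherent speech:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each microphone channel by its (known) integer sample delay
    and average: coherent speech adds in phase, while independent noise
    averages down by roughly sqrt(number of microphones)."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    return np.mean([c[d:d + n] for c, d in zip(channels, delays)], axis=0)

# Toy demo: one tone reaches 4 mics with different delays, each channel
# corrupted by independent noise.
rng = np.random.default_rng(0)
t = np.arange(0, 0.1, 1 / 16000)
clean = np.sin(2 * np.pi * 440 * t)
delays = [0, 3, 5, 8]
mics = [np.concatenate([np.zeros(d), clean])
        + 0.3 * rng.standard_normal(len(clean) + d) for d in delays]
out = delay_and_sum(mics, delays)
err_beam = np.std(out - clean)        # residual noise after beamforming
err_single = np.std(mics[0] - clean)  # residual noise of one raw channel
```

In practice the per-channel delays would themselves be estimated (for example by cross-correlation), and production systems typically use adaptive beamformers rather than this fixed version.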
As shown in fig. 2, the speech recognition module 200 includes an acoustic model training unit, a signal processing unit, and a speech recognition unit. Specifically, the acoustic model training unit builds a speech model from a corpus using joint deep neural network and HMM modeling combined with discriminative training. The signal processing unit extracts voice characteristic parameters from the received voice signal. The speech recognition unit matches the voice characteristic parameters of the voice signal against the speech model, the language model, and the dictionary, and outputs a recognition result.
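The patent does not specify which "voice characteristic parameters" the signal processing unit extracts; MFCCs are the conventional choice for HMM-based recognizers, and a self-contained numpy sketch of the standard pipeline (framing, Hamming window, power spectrum, mel filterbank, log, DCT) might look as follows. Frame sizes and filter counts are common defaults, not values from the patent:

```python
import numpy as np

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extractor; returns an (n_frames, n_ceps) array."""
    # slice the signal into overlapping windowed frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # power spectrum of each frame
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # triangular mel filterbank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(fs / 2), n_mels + 2)
    bins = np.floor((nfft + 1) * mel2hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, nfft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    feat = np.log(power @ fbank.T + 1e-10)   # log mel energies
    # DCT-II decorrelates the log energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return feat @ dct.T

sig = np.sin(2 * np.pi * 300 * np.arange(16000) / 16000)
feats = mfcc(sig)
```

A deployed front end would add delta features and cepstral mean normalization; the sketch shows only the core transform.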
As shown in fig. 3, the speech synthesis module 500 includes a modeling unit, a text analysis unit, and a speech synthesis unit. Specifically, the modeling unit builds a speech synthesis model from a sound library based on HMM training. The text analysis unit extracts context-dependent HMM sequence decision information from the recognition result and the speech synthesis model and generates prosodic parameters. The speech synthesis unit generates synthesized speech from the HTS + STRAIGHT algorithm and the prosodic parameters.
As shown in fig. 4, the voice alarm module 300 includes a sound source acquisition unit, an azimuth selection unit, a distance detection unit, a convolution calculation unit, and a voice alarm generating unit. Specifically, the sound source acquisition unit acquires an external sound source signal, i.e. the target. The azimuth selection unit obtains azimuth information of the external sound source from the external sound source signal. The distance detection unit obtains distance information of the external sound source from the external sound source signal. The convolution calculation unit processes the azimuth and distance information in segments and generates a positioning signal. The voice alarm generating unit generates a first warning voice from the external sound source signal, and a second warning voice from the external sound source signal and the positioning signal.
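As a rough illustration of how azimuth information can be turned into audio with a sense of direction, the sketch below renders a mono warning to stereo from azimuth alone, using an interaural time difference (Woodworth approximation) plus a simple level difference. A real implementation would convolve with measured HRTFs, as the convolution calculation unit suggests; the head radius and the -6 dB level-difference cap here are assumptions, not values from the patent:

```python
import numpy as np

def spatialize(mono, azimuth_deg, fs=16000, head_radius=0.0875, c=343.0):
    """Very rough binaural rendering from azimuth alone: delay and
    attenuate the far-ear channel. 0 deg = front, +90 deg = right."""
    az = np.radians(azimuth_deg)
    itd = head_radius / c * (az + np.sin(az))   # Woodworth ITD, seconds
    lag = int(round(abs(itd) * fs))
    # crude interaural level difference: up to -6 dB at 90 degrees
    gain_far = 10 ** (-abs(azimuth_deg) / 90 * 6 / 20)
    near = mono
    far = np.concatenate([np.zeros(lag), mono])[:len(mono)] * gain_far
    if azimuth_deg >= 0:           # source on the right: right ear is near
        left, right = far, near
    else:
        left, right = near, far
    return np.stack([left, right], axis=1)

imp = np.zeros(256)
imp[0] = 1.0
st = spatialize(imp, 90.0)   # impulse arriving from the right
```

For a source at +90 degrees the right channel leads and the left channel arrives about 10 samples later and quieter, which is the cue pattern the second warning voice relies on.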
A special garment voice interaction method comprises the following steps:
s1 acquires a voice signal.
Preferably, a microphone array 100 is used to acquire the speech signal of the person inside the specialty garment.
S2 recognizes the speech signal and outputs a recognition result, the recognition result including at least text information.
Preferably, a speech model is built from the corpus using joint deep neural network and HMM modeling combined with discriminative training, and the voice signal is matched against the speech model to output a recognition result. The corpus consists of a large amount of general corpora plus scenario-simulation corpora collected for the usage environment. The speech recognition module 200 thus adapts to the specific acoustic environment while still covering linguistic phenomena in the broad-spectrum sense.
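The HMM half of a joint DNN-HMM model is typically decoded with the Viterbi algorithm; the minimal numpy sketch below recovers the most likely state sequence from frame-level state log-likelihoods (which, in a hybrid system, would come from the neural network). The 3-state toy model and its scores are purely illustrative:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    """Most likely HMM state sequence given per-frame state log-likelihoods.
    log_emit: (T, S); log_trans: (S, S); log_init: (S,)."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]          # best score ending in each state
    back = np.zeros((T, S), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # (prev, next)
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):           # trace backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# 3-state left-to-right HMM; emissions strongly favor 0,0,1,1,2,2.
log_trans = np.log(np.array([[0.5, 0.5, 0.0],
                             [0.0, 0.5, 0.5],
                             [0.0, 0.0, 1.0]]) + 1e-300)
log_init = np.log(np.array([1.0, 1e-300, 1e-300]))
true_states = [0, 0, 1, 1, 2, 2]
log_emit = np.full((6, 3), -10.0)
log_emit[np.arange(6), true_states] = 0.0
decoded = viterbi(log_emit, log_trans, log_init)
```

A full recognizer decodes over a composed graph of phone HMMs, lexicon, and language model rather than a single small HMM, but the dynamic-programming recursion is the same.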
S3 acquires the external sound source signal and the localization information of the external sound source signal.
Preferably, HRTF technology is adopted: based on the spatial localization capability of the human auditory system, the sound source is localized in both direction and distance, and the auditory rendering supports localization for a single sound source as well as for multiple sources. Segmented processing is used in the convolution calculation stage to reduce the computational load.
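The segmented convolution mentioned here can be illustrated with overlap-add block convolution: the long input is filtered block by block against the impulse response, producing exactly the result of one full linear convolution while bounding the per-block FFT size (and hence per-update cost and latency). The block size and the random stand-in for a measured HRIR are arbitrary:

```python
import numpy as np

def overlap_add_conv(x, h, block=256):
    """Segmented (overlap-add) convolution of input x with filter h.
    Each block needs only an FFT of about block+len(h) points instead of
    one transform over the whole signal."""
    n = block + len(h) - 1
    nfft = 1 << (n - 1).bit_length()        # next power of two >= n
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        yseg = yseg[:len(seg) + len(h) - 1]
        y[start:start + len(yseg)] += yseg  # overlap-add the block tails
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
h = rng.standard_normal(128)               # stand-in for a measured HRIR
assert np.allclose(overlap_add_conv(x, h), np.convolve(x, h))
```

Real-time renderers often go one step further (partitioned convolution with a uniformly partitioned filter) so that even very long HRIR/reverb filters meet the latency budget.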
S4 outputs a control signal for controlling the operation of the device according to the recognition result, and outputs a first warning voice according to the external sound source signal or outputs a second warning voice according to the external sound source signal and the location information of the external sound source signal.
S5 receives the control signal and outputs a synthesized speech.
Preferably, the synthesized speech is generated from the HTS + STRAIGHT algorithm and the prosodic parameters of the voice signal. The HTS + STRAIGHT algorithm is suitable for efficient parametric synthesis in low-power scenarios.

S6 plays the first warning voice, the second warning voice, or the synthesized voice.
Preferably, the first warning voice, the second warning voice, or the synthesized voice is played by the stereo broadcasting device 600 arranged in the special garment.
In summary, the system combines microphone array technology, voice recognition, voice synthesis, and virtual 3D sound warning technology; it effectively suppresses noise generated in the special environment of the extravehicular suit, accurately recognizes voice instructions issued by operators, gives the synthesized voice directional information, spares operators frequent joint movement, reduces workload, and greatly improves human-machine cooperation efficiency.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description refers to embodiments, not every embodiment contains only a single technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.
Claims (10)
1. A special garment voice interaction system is characterized by comprising:
the microphone array is arranged in the special garment and used for acquiring voice signals;
the voice recognition module is used for recognizing the voice signal and outputting a recognition result;
the voice alarm module is used for acquiring an external sound source signal and positioning information of the external sound source signal;
the control module is used for outputting a control signal according to the recognition result so as to control equipment operation, and outputting a first warning voice according to the external sound source signal or outputting a second warning voice according to the external sound source signal and the positioning information of the external sound source signal;
the voice synthesis module is used for receiving the control signal and outputting synthesized voice;
and the stereo broadcasting device is arranged in the special clothing and is used for receiving and playing the first warning voice, the second warning voice or the synthesized voice.
2. The special garment voice interaction system of claim 1, wherein the voice recognition module comprises:
the acoustic model training unit is used for establishing a voice model according to a corpus, based on joint modeling with a deep neural network and a hidden Markov model (HMM), combined with discriminative training technology;
the signal processing unit is used for extracting voice characteristic parameters from the received voice signals;
and the voice recognition unit is used for matching the voice characteristic parameters of the voice signals with the voice model, the language model and the dictionary and outputting a recognition result.
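As a hedged illustration of the signal processing unit's feature-extraction step, the sketch below computes MFCC vectors — a common choice of "voice characteristic parameters" — using only NumPy. The frame length, hop, filterbank size and coefficient count are illustrative assumptions, not values disclosed in the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) matrix of MFCC feature vectors."""
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_fft)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log mel energies, then a DCT-II to decorrelate into cepstra.
    feats = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return feats @ dct.T
```

The resulting frame-level vectors are what the voice recognition unit would match against the acoustic model.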
3. The special garment voice interaction system of claim 1, wherein the voice synthesis module comprises:
the modeling unit is used for establishing a speech synthesis model according to the sound library and based on HMM training;
the text analysis unit is used for extracting context-dependent HMM sequence decision information according to the recognition result and the speech synthesis model and generating prosodic parameters;
and the voice synthesis unit is used for generating synthesized voice according to the HTS + STRAIGHT algorithm and the prosodic parameters.
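The patent does not disclose implementation details of the HTS + STRAIGHT pipeline. As a toy sketch only, the function below turns a frame-level F0 (prosody) contour into a voiced excitation signal by integrating frequency into phase — one small piece of what a STRAIGHT-style parametric vocoder does before spectral-envelope filtering. The function name, sample rate and hop size are illustrative assumptions.

```python
import numpy as np

def excitation(f0_contour, sr=16000, hop=80):
    """Voiced excitation from a frame-level F0 contour (Hz per frame).

    Each frame's F0 is held constant for `hop` samples; integrating the
    per-sample frequency gives the instantaneous phase of a sinusoidal
    excitation (a stand-in for a pulse train in a real vocoder).
    """
    f0 = np.repeat(np.asarray(f0_contour, dtype=float), hop)  # per-sample F0
    phase = 2 * np.pi * np.cumsum(f0 / sr)
    return np.sin(phase)
```

A real HTS back end would additionally generate spectral and aperiodicity parameters from the context-dependent HMMs and filter this excitation accordingly.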
4. The special garment voice interaction system as claimed in claim 1, wherein the voice alarm module comprises:
a sound source acquisition unit for acquiring an external sound source signal;
an azimuth selecting unit for acquiring azimuth information of the external sound source according to the external sound source signal;
a distance detection unit for acquiring distance information of the external sound source according to the external sound source signal;
the convolution calculation unit is used for processing the azimuth information and the distance information in a segmented mode and generating a positioning signal;
and the voice alarm generating unit is used for generating a first warning voice according to the external sound source signal and generating a second warning voice according to the external sound source signal and the positioning signal.
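One conventional way the azimuth selecting unit could derive direction from an external sound source signal is the time difference of arrival between two microphones. The sketch below uses GCC-PHAT, a standard technique, and is an assumption rather than the patent's disclosed method; the microphone spacing `d` and sample rate are illustrative values.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau):
    """Delay of `sig` relative to `ref`, in seconds, via GCC-PHAT."""
    n = len(sig) + len(ref)
    # Phase transform: keep only the phase of the cross-spectrum,
    # which sharpens the correlation peak in reverberant conditions.
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(S / (np.abs(S) + 1e-12), n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

def azimuth_deg(sig_l, sig_r, fs=16000, d=0.2, c=343.0):
    """Far-field source azimuth in degrees for two mics `d` metres apart."""
    tau = gcc_phat(sig_l, sig_r, fs, d / c)
    return np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
```

Distance estimation (the distance detection unit) would need additional cues such as level differences or direct-to-reverberant ratio, which the sketch does not attempt.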
5. A special garment voice interaction method is characterized by comprising the following steps:
S1, acquiring a voice signal;
S2, recognizing the voice signal and outputting a recognition result, the recognition result including at least text information;
S3, acquiring an external sound source signal and positioning information of the external sound source signal;
S4, outputting a control signal according to the recognition result to control device operation, and outputting a first warning voice according to the external sound source signal or outputting a second warning voice according to the external sound source signal and the positioning information of the external sound source signal;
S5, receiving the control signal and outputting synthesized voice;
S6, playing the first warning voice, the second warning voice or the synthesized voice.
6. The special garment voice interaction method according to claim 5, wherein the step S2 of recognizing the voice signal and outputting the recognition result comprises:
and establishing a voice model according to a corpus, based on joint modeling with a deep neural network and a hidden Markov model (HMM) combined with discriminative training technology, and matching the voice signal with the voice model to output a recognition result.
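In a hybrid DNN-HMM recognizer of the kind this claim describes, the neural network maps each acoustic frame to a posterior over HMM states, and dividing by the state priors yields scaled likelihoods for Viterbi decoding. The sketch below shows only that scoring step; the layer sizes are arbitrary and the weights are random placeholders standing in for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy network: 13-dim MFCC frame -> hidden layer -> posterior over HMM states.
n_feat, n_hidden, n_states = 13, 64, 120
W1 = rng.standard_normal((n_feat, n_hidden)) * 0.1   # placeholder weights
W2 = rng.standard_normal((n_hidden, n_states)) * 0.1

priors = np.full(n_states, 1.0 / n_states)  # uniform prior for illustration

def state_loglikes(frames):
    """Scaled log-likelihoods log p(x|s) ∝ log(p(s|x) / p(s)) per frame."""
    h = np.tanh(frames @ W1)
    post = softmax(h @ W2)
    return np.log(post / priors + 1e-12)
```

A trained system would learn `W1`/`W2` discriminatively (as the claim's "discriminative training technology" suggests) and estimate `priors` from state alignments.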
7. The special garment voice interaction method according to claim 5, wherein the step S3 of acquiring the external sound source signal and the positioning information of the external sound source signal comprises:
adopting HRTF (head-related transfer function) technology: based on the spatial localization capability of the human auditory system, the sound source is localized in both azimuth and distance, and segmented processing is adopted in the convolution calculation stage.
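"Segmented processing in the convolution calculation stage" is commonly realized as overlap-add FFT convolution, which lets a warning sound be streamed through an HRIR filter block by block instead of convolving the whole signal at once. The sketch below is one standard way to do this (the block size and the filter itself are placeholders, not values from the patent); its output matches direct convolution.

```python
import numpy as np

def overlap_add_convolve(x, h, block=1024):
    """Segmented (overlap-add) FIR convolution of signal `x` with filter `h`."""
    # FFT size: next power of two covering one block's linear convolution.
    n_fft = 1 << int(np.ceil(np.log2(block + len(h) - 1)))
    H = np.fft.rfft(h, n_fft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        # Linear convolution of this block via zero-padded FFT multiply.
        yseg = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
        # Overlap-add the block's tail into the output buffer.
        y[start:start + len(seg) + len(h) - 1] += yseg[:len(seg) + len(h) - 1]
    return y
```

For virtual 3D warning, the same routine would be run twice per source, once with the left-ear and once with the right-ear HRIR for the estimated direction.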
8. The special garment voice interaction method according to claim 5, wherein the step S5 of receiving the control signal and outputting synthesized voice comprises:
generating the synthesized voice according to the HTS + STRAIGHT algorithm and the prosodic parameters of the voice signal.
9. The special garment voice interaction method according to claim 5, wherein the step S1 of acquiring a voice signal comprises:
and acquiring voice signals of personnel in the special clothes by adopting a microphone array.
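The patent does not specify how the microphone array's channels are combined. A minimal sketch, assuming a simple delay-and-sum beamformer with known integer sample delays per microphone: aligning the channels toward the talker and averaging reinforces the speech while averaging down uncorrelated suit noise.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with integer sample delays.

    `channels` is a list of 1-D arrays (one per microphone) and `delays`
    the per-channel steering delay in samples; each channel is shifted
    into alignment and the channels are averaged.
    """
    n = min(len(c) - d for c, d in zip(channels, delays))
    return sum(c[d:d + n] for c, d in zip(channels, delays)) / len(channels)
```

A practical in-suit system would estimate fractional delays from the array geometry and talker position rather than assume integers, but the summation structure is the same.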
10. The special garment voice interaction method according to claim 5, wherein the step S6 playing the first warning voice, the second warning voice or the synthesized voice comprises:
and playing the first warning voice, the second warning voice or the synthesized voice by adopting a stereo broadcasting device arranged in the special clothes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110040219.4A CN112885345A (en) | 2021-01-13 | 2021-01-13 | Special garment voice interaction system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112885345A true CN112885345A (en) | 2021-06-01 |
Family
ID=76045158
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110040219.4A Pending CN112885345A (en) | 2021-01-13 | 2021-01-13 | Special garment voice interaction system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885345A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015076797A (en) * | 2013-10-10 | 2015-04-20 | 富士通株式会社 | Spatial information presentation device, spatial information presentation method, and spatial information presentation computer |
US20160001193A1 (en) * | 2014-07-01 | 2016-01-07 | Disney Enterprises, Inc. | Full-duplex, wireless control system for interactive costumed characters |
WO2016033269A1 (en) * | 2014-08-28 | 2016-03-03 | Analog Devices, Inc. | Audio processing using an intelligent microphone |
CN106128478A (en) * | 2016-06-28 | 2016-11-16 | 北京小米移动软件有限公司 | Voice broadcast method and device |
CN206079071U (en) * | 2016-10-17 | 2017-04-12 | 福州领头虎软件有限公司 | Intelligent clothing |
US20170303052A1 (en) * | 2016-04-18 | 2017-10-19 | Olive Devices LLC | Wearable auditory feedback device |
CN207054840U (en) * | 2017-07-06 | 2018-03-02 | 劲霸男装(上海)有限公司 | Intelligent clothing |
CN107925816A (en) * | 2015-10-30 | 2018-04-17 | 谷歌有限责任公司 | Method and apparatus for re-creating direction prompting in the audio of beam forming |
CN111176607A (en) * | 2019-12-27 | 2020-05-19 | 国网山东省电力公司临沂供电公司 | Voice interaction system and method based on power business |
Non-Patent Citations (1)
Title |
---|
"Annual table of contents of Volume 32 (2011)", Chinese Journal of Scientific Instrument (《仪器仪表学报》) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107464564B (en) | Voice interaction method, device and equipment | |
CN108447479B (en) | Robot voice control system in noisy working condition environment | |
Grondin et al. | The ManyEars open framework: Microphone array open software and open hardware system for robotic applications | |
CN110517705B (en) | Binaural sound source positioning method and system based on deep neural network and convolutional neural network | |
WO2020168727A1 (en) | Voice recognition method and device, storage medium, and air conditioner | |
CN109286875A (en) | For orienting method, apparatus, electronic equipment and the storage medium of pickup | |
US11854566B2 (en) | Wearable system speech processing | |
CN102298443A (en) | Smart home voice control system combined with video channel and control method thereof | |
JP3627058B2 (en) | Robot audio-visual system | |
CN108297108B (en) | Spherical following robot and following control method thereof | |
CN107526437A (en) | A kind of gesture identification method based on Audio Doppler characteristic quantification | |
CN109147787A (en) | A kind of smart television acoustic control identifying system and its recognition methods | |
TWI222622B (en) | Robotic vision-audition system | |
CN107390175A (en) | A kind of auditory localization guider with the artificial carrier of machine | |
CN113053368A (en) | Speech enhancement method, electronic device, and storage medium | |
CN110444189A (en) | One kind is kept silent communication means, system and storage medium | |
CN110517702A (en) | The method of signal generation, audio recognition method and device based on artificial intelligence | |
CN112885345A (en) | Special garment voice interaction system and method | |
CN110164443A (en) | Method of speech processing, device and electronic equipment for electronic equipment | |
CN108680902A (en) | A kind of sonic location system based on multi-microphone array | |
CN111412587B (en) | Voice processing method and device of air conditioner, air conditioner and storage medium | |
Zhao et al. | A robust real-time sound source localization system for olivia robot | |
Nakadai et al. | Auditory fovea based speech separation and its application to dialog system | |
CN109188559A (en) | Safety inspection method, device, equipment and storage medium | |
JP3843741B2 (en) | Robot audio-visual system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210601 |