CN109822587B - Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals - Google Patents
- Publication number: CN109822587B (application CN201910163672.7A)
- Authority
- CN
- China
- Prior art keywords: voice, neck, module, main control, mouth
- Prior art date
- Legal status
- Expired - Fee Related
Abstract
The invention relates to a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method thereof, addressing the low triage efficiency caused by the lack of professional diagnosis guide personnel in such hospitals. The head and neck device comprises a head device, a neck device and a control system, the head device being arranged above the neck device; the head device comprises a head structure, a mouth action mechanism and a voice module; the neck device comprises a neck support and a neck action mechanism. The control system controls the voice module: because hospital background noise is unstable and fluctuates strongly in intensity, low-frequency energy is adopted instead of the traditional short-time energy as the feature quantity, improving the accuracy of speech recognition in a complex noise environment. The main voice guidance to hospital departments and routes is then completed according to the recognized content, and while conversing the robot performs anthropomorphic mouth actions and neck pitching and turning actions, improving the interaction capability of the guide robot.
Description
Technical Field
The invention belongs to the field of intelligent service robots, and particularly relates to a head and neck device of a voice diagnosis guide robot for an industrial and mining hospital and a control method.
Background
With the development of China's intelligent manufacturing industry and the continuous improvement of its technological level, intelligent service robots are applied ever more widely in daily life. They are already used in fields such as guided tours, medical treatment and reception, and in the future they may become the most capable assistants and closest companions of human beings.
The invention is applied to the field of voice diagnosis guiding services in factory and mine hospitals. Such hospitals are run by enterprises and are smaller than third-level grade-A hospitals; the flow of people seeking treatment is large while professional diagnosis guide staff are lacking, which leads to low triage efficiency.
Disclosure of Invention
The invention solves the above problems by providing a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method. The device can share the diagnosis guidance workload during busy periods in such hospitals and improve their triage efficiency; it identifies the patient's voice information with high accuracy in the complex noise environment of a hospital and quickly makes a voice response according to the recognized information, specifically covering route guidance, department inquiry and expert introduction.
In order to solve the above problems, a first object of the present invention is to provide a head and neck apparatus of a voice diagnosis guide robot for an industrial and mining hospital, and a second object of the present invention is to provide a control method of a head and neck apparatus of a voice diagnosis guide robot for an industrial and mining hospital.
The first technical scheme adopted by the invention is as follows:
a head and neck device of a voice diagnosis guide robot for an industrial and mining hospital comprises a head device, a neck device and a control system, wherein the head device is arranged above the neck device;
the head device comprises a head structure, a mouth action mechanism and a voice module;
the neck device comprises a neck support and a neck action mechanism;
the control system takes a main control chip as a main part and is connected with the voice module and used for recognizing the voice information of the patient and responding the inquiry information of the patient, and the main control chip is in control connection with the mouth action mechanism and the neck action mechanism to complete the actions of mouth action, neck pitching and turning of the anthropomorphic user.
The head structure comprises a face support plate, a mouth first support plate and a mouth second support plate; the face support plate is arranged perpendicular to the mouth first support plate and the mouth second support plate and is fixedly connected with both.
The mouth action mechanism comprises a mouth action control module, a stepping motor, a motor fixing support, a metal coupling, a mouth transmission support and a chin component. The stepping motor is fixedly connected with the mouth second support plate through the motor fixing support, the output shaft of the stepping motor is fixedly connected with the input end of the metal coupling, the other end of the metal coupling is connected with the input end of the mouth transmission support, and the tail end of the mouth transmission support is fixedly connected with the chin component.
Pins PB0, PB1, PB2 and PB3 of the main control chip of the control system are connected with pins IN1, IN2, IN3 and IN4 of the mouth action control module respectively; the positive and negative supply terminals of the mouth action control module are connected with a 5V power supply, and its outputs OUT1 and OUT2 are connected with the positive and negative inputs of the stepping motor respectively, so as to control the movement of the guide robot's mouth.
The voice module comprises a main control board, a microphone, a voice recognition module, a voice sounding module, a loudspeaker and a loudspeaker support. The microphone is fixed on the face support plate and connected with the mono channel input of the voice recognition module; the loudspeaker is arranged vertically and fixedly connected with the mouth first support plate and the mouth second support plate through the loudspeaker support, and its positive and negative poles are connected with the positive and negative poles of the output end of the voice sounding module; the main control board is connected in control with the voice recognition module, the voice sounding module, the mouth action control module and the neck action mechanism respectively.
The main control chip of the control system is arranged on the main control board. Pins PA4, PA5, PA6 and PA7 of the main control chip are connected with pins MISO, MOSI, SCK and NSS of the voice recognition module respectively to communicate according to the SPI protocol and transmit voice recognition information, and pins RST, WR and IRQ of the voice recognition module are connected with pins PB12, PB13 and PB14 of the main control chip respectively. The MICP and MICN pins of the voice recognition module serve as the positive and negative input ends of the microphone, MICP being the positive input and MICN the negative input. SPOP and SPON of the voice recognition module are connected with IN+ and IN- of the voice sounding module respectively, and OUT+ and OUT- of the voice sounding module are connected with the positive and negative poles of the loudspeaker respectively, for outputting the robot's response voice.
The neck support comprises a neck metal support, a neck metal base and a head and neck connecting piece.
The neck action mechanism comprises a single-shaft metal steering engine, a first metal steering wheel, a double-shaft metal steering engine and a second metal steering wheel. The output shaft of the single-shaft metal steering engine is connected with the first metal steering wheel, the output end of the first metal steering wheel is fixed in the clamping groove of the neck metal support, the lower end of the neck metal support is fixedly connected with the double-shaft metal steering engine, and the output shaft of the double-shaft metal steering engine points vertically downward and is connected with the second metal steering wheel, which is fixed in the clamping groove of the neck metal base.
Pin PB7 of the main control chip is connected with the OUT pin of the single-shaft metal steering engine, and pin PB8 of the main control chip is connected with the OUT pin of the double-shaft metal steering engine.
The second technical scheme adopted by the invention is as follows:
based on the control method realized by the head and neck device of the voice diagnosis guide robot for the factory and mine hospitals, the method comprises the following steps:
step S1, the microphone collects the voice information of the patient, converts the sound wave into a digital voice signal and transmits the digital voice signal into the voice recognition module;
step S2, the voice recognition module preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module extracts acoustic features of the preprocessed voice signals;
step S4, the acoustic features are passed through a recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module;
and step S6, the main control chip completes corresponding response voice and matches the mouth action and neck action of the voice according to the instruction corresponding to the text information recognized by the voice recognition module.
The improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - az^(-1), which emphasizes the high-frequency part, removes the influence of lip radiation and increases the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing is applied to the voice signal, emphasizing the waveform near sample n and weakening the rest; the window length is 25ms, the window shift is 10ms, and each frame contains 410 sampling points. Low-frequency energy is adopted instead of the general short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = sum_{k=0..l} h_k * x(i-k), where x(i) is the voice signal under detection, h_k are the FIR low-pass filter coefficients, l is the order of the filter and x_h(i) is the filtered voice signal; the low-frequency energy of x_h(i) is then judged against a threshold preset through training. Because the noise environment in a hospital scene is unstable and contains sudden cusp noise, median blur processing is adopted after windowing to filter that noise out: in the one-dimensional sequence f_1, f_2, ..., f_n, with window size m, the output of the blur processing for f_i is y_i, the value of the median after the window is sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2; the voice signal with the sharp points removed is then output.
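A minimal sketch of steps S201 and S203 follows; the pre-emphasis coefficient a = 0.97 and the 3-point median window are illustrative values, not values given in the patent, and the edge-clamping behaviour of the median is an implementation assumption.

```c
#include <stdlib.h>

/* S201: pre-emphasis, H(z) = 1 - a*z^(-1), i.e. y[i] = x[i] - a*x[i-1]. */
static void pre_emphasize(const double *x, double *y, int n, double a)
{
    y[0] = x[0];
    for (int i = 1; i < n; i++)
        y[i] = x[i] - a * x[i - 1];
}

static int cmp_double(const void *p, const void *q)
{
    double d = *(const double *)p - *(const double *)q;
    return (d > 0) - (d < 0);
}

/* S203: median blur with window m = 2v + 1; the output for sample i is
   the median of {f[i-v], ..., f[i+v]}, which removes sudden cusp
   (impulse) noise while leaving flat regions untouched. */
static double median_at(const double *f, int n, int i, int v)
{
    double win[64];                         /* supports windows up to m = 63 */
    int k = 0;
    for (int j = i - v; j <= i + v; j++)    /* clamp indices at the edges */
        win[k++] = f[j < 0 ? 0 : (j >= n ? n - 1 : j)];
    qsort(win, k, sizeof win[0], cmp_double);
    return win[k / 2];
}
```

A single outlier sample surrounded by flat signal is replaced by the surrounding value, which is exactly the cusp-removal effect the patent describes.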
In step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, a Fast Fourier Transform (FFT) is performed on the windowed signal, X(k) = sum_{n=0..N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N, to obtain the spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
step S302, the actual frequency scale f is converted into the Mel frequency scale with the formula Mel(f) = 2595·lg(1 + f/700);
step S303, the spectrum coefficients obtained by the conversion are filtered with a series of triangular filters, which smooths the spectrum, eliminates the effect of harmonics, highlights the formants of the original voice and reduces the amount of calculation.
Step S305, a Discrete Cosine Transform (DCT) is performed on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients C(l) = sum_{m=1..M} S(m)·cos(πl(m-0.5)/M), l = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters.
In step S4 the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, produces a small model and is easy to port; the MFCC feature parameters extracted in step S3 are input into the network, which outputs the probability of the phoneme or syllable corresponding to each voice frame.
In step S6, the response voice is completed by the main control chip controlling the voice sounding module, as follows:
step S61A, in step S5 the voice recognition module converts the recognized voice information into text information and sends it to the main control chip through the SPI communication protocol; if it matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or "Caochenan doctor profile", the main control chip transmits the corresponding ASCII codes to the voice sounding module, waiting for the event that the transmit buffer is empty: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET); and then writing the data to be sent into the transmit buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after the voice module receives the corresponding instruction from the main control chip, it initializes the register set in the designated order, clears the play start position (nMp3Pos = 0), writes the MP3 data from the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and opens the interrupt permission (EX0 = 1).
The method for completing the mouth movement and the neck movement in step S6 in cooperation with the speech is as follows:
step S62A, the main control chip transmits the corresponding ASCII code to the voice production module and controls the mouth action mechanism and the neck action mechanism to act at the same time;
step S62B, initialization: the peripheral clock is started, the initialization structure is configured and the structure initialization function is called;
step S62C, the pulse width is configured through the duty cycle, and the desired rotation angle of the motor is configured through the TIM_SetCompare1() function;
and step S62D, a number of angle values are set according to the natural rules of human mouth and neck movement, so that the movements of the diagnosis guide robot are natural and highly anthropomorphic.
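Steps S62B-S62D can be illustrated with the angle-to-pulse calculation alone. The 0.5-2.5 ms pulse range over a 20 ms period and the 1 MHz timer tick are common servo conventions assumed here, not values from the patent, and angle_to_pulse_us() is a hypothetical helper whose result would be handed to TIM_SetCompare1().

```c
/* Maps a desired angle in degrees (0-180) to a pulse width in
   microseconds, assuming the common 0.5 ms - 2.5 ms servo range.
   With a timer ticking at 1 MHz this value is what one would pass
   to TIM_SetCompare1() to set the duty cycle. */
static unsigned angle_to_pulse_us(unsigned angle_deg)
{
    if (angle_deg > 180U)
        angle_deg = 180U;                    /* clamp to mechanical range */
    return 500U + (angle_deg * 2000U) / 180U;
}
```

Stepping through a short table of such angle values at a natural cadence is one way to realize the "plurality of angle values" the step describes.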
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method adopting an improved voice recognition method. For the complex and changeable hospital sound environment, and unlike the general VAD algorithm based on short-time energy, an improved voice activity detection algorithm based on low-frequency energy is provided, which markedly reduces the interference of high-frequency energy and can still accurately detect the patient's voice information at a low signal-to-noise ratio;
2. The head and neck device of the diagnosis guide robot has a simple and elegant mechanical structure, occupies little space, and is powerful and practical. Applied to the field of hospital diagnosis guidance, it alleviates to a certain extent the low triage efficiency caused by the lack of professional diagnosis guide personnel in factory and mine hospitals, improves the overall image of the hospital, and gives visitors a view of one of the first groups of robots entering daily life;
3. The diagnosis guide robot can answer basic guidance questions by voice, including the examination process, the payment process, department locations and the admission process, and can perform anthropomorphic natural mouth and neck actions while answering, so that the patient is put at ease emotionally, the treatment process is made more convenient, and recovery from disease is aided.
Drawings
FIG. 1 is a front view of the apparatus of the present invention;
FIG. 2 is a rear view of the apparatus of the present invention;
FIG. 3 is a partial view of the mouth action mechanism of the present invention;
FIG. 4 is a view of the neck assembly of the present invention;
FIG. 5 is a circuit diagram of a main control chip of the control system of the present invention;
FIG. 6 is a circuit diagram of a speech module of the present invention;
FIG. 7 is a circuit diagram of the mouth motion control module, the stepping motor, the double-shaft metal steering engine and the single-shaft metal steering engine according to the present invention;
FIG. 8 is an overall workflow diagram of the present invention;
FIG. 9 is a flow chart of speech recognition of the present invention;
FIG. 10 is a schematic diagram of the improved preprocessing algorithm of the present invention.
In the figures: head device 1, neck device 2, control system 3, head structure 1-1, mouth action mechanism 1-2, voice module 1-3, face support plate 1-1A, mouth first support plate 1-1B, mouth second support plate 1-1C, mouth action control module 1-2A, stepping motor 1-2B, motor fixing support 1-2C, metal coupling 1-2D, mouth transmission support 1-2E, chin component 1-2F, main control board 1-3A, microphone 1-3B, voice recognition module 1-3C, voice sounding module 1-3D, loudspeaker 1-3E, loudspeaker support 1-3F, neck support 2-1, neck action mechanism 2-2, neck metal support 2-1A, neck metal base 2-1B, head and neck connecting piece 2-1C, single-shaft metal steering engine 2-2A, first metal steering wheel 2-2B, double-shaft metal steering engine 2-2C, second metal steering wheel 2-2D.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
First embodiment
A head and neck device of a voice diagnosis guide robot for an industrial and mining hospital, as shown in fig. 1 and fig. 2, comprises a head device 1, a neck device 2 and a control system 3, wherein the head device 1 is arranged above the neck device 2;
the head device 1 comprises a head structure 1-1, a mouth action mechanism 1-2 and a voice module 1-3;
the neck device 2 comprises a neck support 2-1 and a neck action mechanism 2-2;
the control system 3 takes a main control chip as a main part and is connected with the voice module 1-3 for recognizing the voice information of the patient and responding the inquiry information of the patient, and the main control chip is in control connection with the mouth action mechanism 1-2 and the neck action mechanism 2-2 to complete the mouth action and the neck pitching and turning actions of the anthropomorphic person.
Second embodiment
As shown in fig. 1,2 and 3, in the first embodiment, the head structure 1-1 includes a face support plate 1-1A, a mouth first support plate 1-1B and a mouth second support plate 1-1C; the face support plate 1-1A is vertically arranged with the mouth first support plate 1-1B and the mouth second support plate 1-1C and is fixedly connected with the mouth first support plate 1-1B.
The mouth action mechanism 1-2 comprises a mouth action control module 1-2A, a stepping motor 1-2B, a motor fixing support 1-2C, a metal coupler 1-2D, a mouth transmission support 1-2E and a chin component 1-2F, the stepping motor 1-2B is fixedly connected with a second mouth support plate 1-1C through the motor fixing support 1-2C, an output shaft of the stepping motor 1-2B is fixedly connected with an input end of the metal coupler 1-2D, the other end of the metal coupler 1-2D is connected with an input end of the mouth transmission support 1-2E, and the tail end of the mouth transmission support 1-2E is fixedly connected with the chin component 1-2F.
The voice module 1-3 comprises a main control board 1-3A, a microphone 1-3B, a voice recognition module 1-3C, a voice sounding module 1-3D, a loudspeaker 1-3E and a loudspeaker support 1-3F, the microphone 1-3B is fixed on the face support board 1-1A and is connected with a single sound channel inlet of the voice recognition module 1-3C, the loudspeaker 1-3E is vertically arranged and fixedly connected with the first mouth support board 1-1B and the second mouth support board 1-1C through the loudspeaker support 1-3F, the positive and negative poles of the loudspeaker 1-3E are connected with the positive and negative poles of the output end of the voice sounding module 1-3D, and the main control board 1-3A is respectively connected with the voice recognition module 1-3C and the voice sounding module 1-3D, The mouth action control module 1-2A and the neck action mechanism 2-2 are connected in a control way.
In this embodiment, the microphone 1-3B is prior art; the manufacturer is the Guangdong Jiaxin Microelectronics store and the model is 52DB.
In this embodiment, the voice recognition module 1-3C is prior art; the manufacturer is the Guangdong Jiaxin Microelectronics store and the model is LD3320.
In this embodiment, the voice sounding module 1-3D is prior art; the manufacturer is Shenzhen Dimension Core Technology Co., Ltd. and the model is PAM8406.
In this embodiment, the loudspeaker 1-3E is prior art; the manufacturer is Shenzhen Wei Core Science and Technology Co., Ltd. and the model is Speaker 5W 4Ω.
As shown in fig. 2 and 4, the neck support 2-1 comprises a neck metal support 2-1A, a neck metal base 2-1B and a head and neck connecting piece 2-1C; the neck action mechanism 2-2 comprises a single-shaft metal steering engine 2-2A, a first metal steering wheel 2-2B, a double-shaft metal steering engine 2-2C and a second metal steering wheel 2-2D. The output shaft of the single-shaft metal steering engine 2-2A is connected with the first metal steering wheel 2-2B, the output end of the first metal steering wheel 2-2B is fixed in the clamping groove of the neck metal support 2-1A, the lower end of the neck metal support 2-1A is fixedly connected with the double-shaft metal steering engine 2-2C, and the output shaft of the double-shaft metal steering engine 2-2C points vertically downward and is connected with the second metal steering wheel 2-2D, which is fixed in the clamping groove of the neck metal base 2-1B.
As shown in fig. 5, 6 and 7, pins PB0, PB1, PB2 and PB3 of the main control chip of the control system 3 are connected with pins IN1, IN2, IN3 and IN4 of the mouth action control module 1-2A respectively; the positive and negative supply terminals of the mouth action control module 1-2A are connected with a 5V power supply, and its outputs OUT1 and OUT2 are connected with the positive and negative inputs of the stepping motor 1-2B respectively, so as to control the mouth movement of the guide robot. The main control chip of the control system 3 is arranged on the main control board 1-3A; pins PA4, PA5, PA6 and PA7 of the main control chip are connected with pins MISO, MOSI, SCK and NSS of the voice recognition module 1-3C respectively to communicate according to the SPI protocol and transmit voice recognition information, and pins RST, WR and IRQ of the voice recognition module 1-3C are connected with pins PB12, PB13 and PB14 of the main control chip respectively. The MICP and MICN pins of the voice recognition module 1-3C serve as the positive and negative input ends of the microphone 1-3B, MICP being the positive input and MICN the negative input. SPOP and SPON of the voice recognition module 1-3C are connected with IN+ and IN- of the voice sounding module 1-3D respectively, and OUT+ and OUT- of the voice sounding module 1-3D are connected with the positive and negative poles of the loudspeaker 1-3E respectively, for outputting the robot's response voice. Pin PB7 of the main control chip is connected with the OUT pin of the single-shaft metal steering engine 2-2A, and pin PB8 is connected with the OUT pin of the double-shaft metal steering engine 2-2C.
The model of the main control chip is STM32F407.
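As a hedged illustration of how PB0-PB3 might drive IN1-IN4 of the mouth action control module, one plausible single-phase (wave-drive) step table is sketched below; the actual phase order depends on the motor wiring, which the patent does not specify.

```c
/* Wave-drive step table: exactly one of IN1-IN4 (bit0-bit3) is high
   per step. On the target, these four bits would be written to the
   PB0-PB3 pins of the main control chip. */
static const unsigned char STEP_TABLE[4] = { 0x1, 0x2, 0x4, 0x8 };

/* Bit pattern for step number n; incrementing n rotates the motor one
   step forward, decrementing it reverses the direction. */
static unsigned char step_pattern(unsigned n)
{
    return STEP_TABLE[n % 4U];
}
```

Opening and closing the chin then reduces to stepping n up by a fixed count and back down again between each phrase of the response voice.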
The working process is as follows:
the microphone 1-3B collects the voice information of the patient, the microphone 1-3B converts the sound wave into a digital voice signal and transmits the digital voice signal into a voice recognition module 1-3C, the digital voice signal is preprocessed, wherein the preprocessing comprises the steps of pre-emphasizing to increase the high-frequency resolution of the voice, windowing and framing the digital voice signal, adopting an improved preprocessing algorithm, specifically adopting low-frequency energy to replace the traditional short-time energy as characteristic quantity, carrying out end point detection on the digital voice signal, then adopting an MFCC voice characteristic extraction technology and a GMM-HMM model training recognition network method to match an acoustic model, converting the digital voice signal into segmented text information through a language model and a pronunciation dictionary to realize the voice recognition function, and transmitting the text information to a main control chip by the voice recognition module 1-3C, the main control chip determines whether to continue execution according to a preset recognition list, if a character command in the recognition list is matched, the main control chip controls the command corresponding to the voice to control the voice sounding module 1-3D to output response voice and output voice through the loudspeakers 1-3E, and meanwhile, the mouth action control module and the neck action mechanism complete mouth opening and closing actions and neck rotation pitching actions to complete a complete man-machine conversation process (voice interaction and head and neck action interaction).
Detailed description of the invention
As shown in fig. 8, 9 and 10, a control method implemented by the head and neck device of a voice diagnosis guide robot for factory and mine hospitals includes the following steps:
step S1, the microphone 1-3B collects the voice information of the patient, converts the sound wave into a digital voice signal and transmits the digital voice signal into the voice recognition module 1-3C;
step S2, the voice recognition module 1-3C preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module 1-3C extracts acoustic features of the preprocessed voice signals;
step S4, inputting the acoustic features into the recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module 1-3C;
and step S6, the main control chip completes corresponding response voice and mouth action and neck action matched with the voice according to the instruction corresponding to the text information recognized by the voice recognition modules 1-3C.
The improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - a·z^(-1) to emphasize the high-frequency part, remove the influence of lip radiation and increase the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing the voice signal, emphasizing the voice waveform near sample n and attenuating the rest of the waveform; the window length is 25 ms, the window shift is 10 ms, and each frame has 410 sampling points. Low-frequency energy is adopted in place of the usual short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = Σ_{k=0}^{l} h_k·x(i-k), and the low-frequency energy E = Σ_i x_h(i)^2 is used for the judgement, where x(i) is the voice signal to be detected, h_k are the FIR low-pass filter coefficients, l is the order of the filter, and x_h(i) is the filtered voice signal; the low-frequency energy of the background noise is estimated through training to preset the low-frequency energy threshold. Because the noise environment in a hospital scene is non-stationary and contains sudden spike noise, median blurring is applied after windowing to filter out the spikes: in a one-dimensional sequence f_1, f_2, ..., f_n with window size m, the output of the median blurring at the i-th value f_i is y_i, the median of the values in the window sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2. The voice signal with the spikes removed is then output.
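The median blurring of step S203 can be sketched in plain C. An odd window size m (so v = (m-1)/2) and clamping at the sequence boundaries are assumptions here; the patent does not specify how edges are handled:

```c
#include <stdlib.h>

/* Descending comparator, matching "arranged from large to small". */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Median blurring of a one-dimensional sequence f[0..n-1] with odd window
 * size m: y[i] is the median of the m values centred on f[i].  Indices
 * outside the sequence are clamped to the edges (an assumption). */
void median_blur(const double *f, double *y, int n, int m)
{
    int v = (m - 1) / 2;
    double *win = malloc((size_t)m * sizeof *win);
    for (int i = 0; i < n; i++) {
        for (int k = -v; k <= v; k++) {
            int j = i + k;
            if (j < 0)  j = 0;
            if (j >= n) j = n - 1;
            win[k + v] = f[j];
        }
        qsort(win, (size_t)m, sizeof *win, cmp_desc);
        y[i] = win[v];   /* middle element of the sorted window = median */
    }
    free(win);
}
```

On a sequence such as {1, 1, 9, 1, 1} with m = 3, the isolated spike 9 is replaced by 1, which is exactly the sudden-spike suppression the step describes.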
In step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, performing a fast Fourier transform (FFT) on the windowed signal, X(k) = Σ_{n=0}^{N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1, to obtain the frequency spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
And step S303, filtering the transformed spectral coefficients with a bank of triangular filters, smoothing the spectrum, eliminating the effect of harmonics, highlighting the formants of the original voice and reducing the amount of computation.
Step S305, performing a discrete cosine transform (DCT) on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients: C(n) = Σ_{m=0}^{M-1} S(m)·cos(πn(m+0.5)/M), n = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters.
In step S4, the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, yields a small model and is easy to port. The MFCC feature parameters obtained in step S3 are input into the recognition network to obtain the probability of the phoneme or syllable corresponding to each voice frame.
In step S6, the response voice is produced by the main control chip controlling the voice production module 1-3D, with the following specific steps:
step S61A, the voice recognition module 1-3C converts the recognized voice information into text information and sends it to the main control chip over the SPI communication protocol as in step S5. If the recognized voice matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or a doctor-profile entry such as "Profile of Doctor Cao", the main control chip transmits the corresponding ASCII codes to the voice production module 1-3D, waiting for the transmit-buffer-empty event: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);, then writing the data to be sent into the transmission buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after receiving the corresponding instruction from the main control chip, the voice production module 1-3D initializes its register set in the designated order, clears the start play position (nMP3Pos = 0), writes the MP3 data in the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and enables the interrupt (EX0 = 1).
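The blocking transmit pattern of step S61A can be sketched against a mocked peripheral so it runs on a host. The register struct and flag bit below are stand-ins for the STM32 SPI peripheral, not the real register map:

```c
#include <stdint.h>

#define SPI_FLAG_TXE (1u << 1)   /* transmit-buffer-empty bit (mock value) */

/* Minimal stand-in for the SPI peripheral's status and data registers. */
typedef struct {
    volatile uint32_t SR;        /* status register */
    volatile uint32_t DR;        /* data register   */
} MockSpi;

/* Busy-wait until the transmit buffer is empty, then write one byte,
 * mirroring the pattern in step S61A:
 *   while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);
 *   SPI_I2S_SendData(FLASH_SPIx, byte);                                  */
void spi_send_byte(MockSpi *spi, uint8_t byte)
{
    while ((spi->SR & SPI_FLAG_TXE) == 0)
        ;                        /* spin until TXE is set by hardware */
    spi->DR = byte;
}
```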
The mouth action and neck action matched to the voice in step S6 are completed as follows:
step S62A, the main control chip transmits the corresponding ASCII codes to the voice production module 1-3D and simultaneously controls the mouth action mechanism and the neck action mechanism;
step S62B, initialization: starting the peripheral clock, filling in the initialization structure and calling the structure initialization function;
step S62C, configuring the pulse width through the duty ratio, and setting the desired rotation angle of the motor with the TIM_SetCompare1() function;
and step S62D, setting a plurality of angle values according to the natural rules of human mouth movement and neck movement, so that the diagnosis guide robot moves with a high degree of natural anthropomorphism.
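The duty-ratio-to-angle mapping of step S62C can be sketched as a pure function. The 20 ms (50 Hz) period, the 0.5 to 2.5 ms pulse range and the 1 MHz timer tick are typical hobby-servo assumptions, not values stated in the patent; on the chip, the returned count would be passed to TIM_SetCompare1():

```c
#include <stdint.h>

/* Map a servo angle in [0, 180] degrees to a timer compare value.
 * Assumes a 50 Hz PWM period of 20 ms represented by 20000 timer counts
 * (1 count = 1 us), with 0.5 ms pulse = 0 deg and 2.5 ms pulse = 180 deg. */
uint32_t servo_angle_to_compare(double angle_deg)
{
    if (angle_deg < 0.0)   angle_deg = 0.0;     /* clamp to servo range */
    if (angle_deg > 180.0) angle_deg = 180.0;
    double pulse_us = 500.0 + (angle_deg / 180.0) * 2000.0;
    return (uint32_t)(pulse_us + 0.5);          /* rounded microseconds */
}
```

Step S62D's preset mouth-opening and neck-turning angles would then each become one such compare value, written in sequence as the response voice plays.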
Claims (1)
1. A control method of a head and neck device of a voice diagnosis guiding robot for factory and mine hospitals, wherein the head and neck device comprises a head device (1), a neck device (2) and a control system (3), and the head device (1) is arranged above the neck device (2);
the head device (1) comprises a head structure (1-1), a mouth action mechanism (1-2) and a voice module (1-3);
the neck device (2) comprises a neck support (2-1) and a neck action mechanism (2-2);
the control system (3) is centered on a main control chip and is connected with the voice module (1-3) for recognizing the patient's voice information and responding to the patient's inquiry, and the main control chip is in control connection with the mouth action mechanism (1-2) and the neck action mechanism (2-2) to complete anthropomorphic mouth actions and neck pitching and turning actions;
the voice module (1-3) comprises a main control board (1-3A), a microphone (1-3B), a voice recognition module (1-3C), a voice production module (1-3D), a loudspeaker (1-3E) and a loudspeaker support (1-3F), the microphone (1-3B) is fixed on the face support board (1-1A) and is connected with a single sound channel inlet of the voice recognition module (1-3C), the loudspeaker (1-3E) is vertically arranged and fixedly connected with the first mouth support board (1-1B) and the second mouth support board (1-1C) through the loudspeaker support (1-3F), the positive and negative poles of the loudspeaker (1-3E) are connected with the positive and negative poles of the output end of the voice production module (1-3D), the main control board (1-3A) is respectively connected with the voice recognition module (1-3C), the voice sounding module (1-3D), the mouth action control module (1-2A) and the neck action mechanism (2-2) in a control way;
the method is characterized by comprising the following steps:
step S1, the microphone (1-3B) collects the voice information of the patient, converts the sound waves into digital voice signals and transmits the digital voice signals into the voice recognition module (1-3C);
step S2, the voice recognition module (1-3C) preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module (1-3C) extracts acoustic features of the preprocessed voice signals;
step S4, inputting the acoustic features into the recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module (1-3C);
step S6, the main control chip completes the corresponding response voice and the mouth action and neck action matched to the voice according to the instruction corresponding to the text information recognized by the voice recognition module (1-3C);
the improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - a·z^(-1) to emphasize the high-frequency part, remove the influence of lip radiation and increase the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing the voice signal, emphasizing the voice waveform near sample n and attenuating the rest of the waveform, wherein the window length is 25 ms, the window shift is 10 ms, and each frame has 410 sampling points; low-frequency energy is adopted in place of the usual short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = Σ_{k=0}^{l} h_k·x(i-k), and the low-frequency energy E = Σ_i x_h(i)^2 is used for the judgement, where x(i) is the voice signal to be detected, h_k are the FIR low-pass filter coefficients, l is the order of the filter, and x_h(i) is the filtered voice signal; the low-frequency energy of the background noise is estimated through training to preset the low-frequency energy threshold; because the noise environment in a hospital scene is non-stationary and contains sudden spike noise, median blurring is applied after windowing to filter out the spikes: in a one-dimensional sequence f_1, f_2, ..., f_n with window size m, the output of the median blurring at the i-th value f_i is y_i, the median of the values in the window sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2; outputting the voice signal with the spikes removed;
in step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, performing a fast Fourier transform (FFT) on the windowed signal, X(k) = Σ_{n=0}^{N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1, to obtain the frequency spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
Step S303, filtering the transformed spectral coefficients with a bank of triangular filters, smoothing the spectrum, eliminating the effect of harmonics, highlighting the formants of the original voice and reducing the amount of computation;
Step S305, performing a discrete cosine transform (DCT) on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients: C(n) = Σ_{m=0}^{M-1} S(m)·cos(πn(m+0.5)/M), n = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters;
in step S4, the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, yields a small model and is easy to port; the MFCC feature parameters obtained in step S3 are input into the recognition network to obtain the probability of the phoneme or syllable corresponding to each voice frame;
in step S6, the response voice is produced by the main control chip controlling the voice production module (1-3D), with the following specific steps:
step S61A, the voice recognition module (1-3C) converts the recognized voice information into text information and sends it to the main control chip over the SPI communication protocol as in step S5; if the recognized voice matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or a doctor-profile entry such as "Profile of Doctor Cao", the main control chip transmits the corresponding ASCII codes to the voice production module (1-3D), waiting for the transmit-buffer-empty event: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);, then writing the data to be sent into the transmission buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after receiving the corresponding instruction from the main control chip, the voice production module (1-3D) initializes its register set in the designated order, clears the start play position (nMP3Pos = 0), writes the MP3 data in the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and enables the interrupt (EX0 = 1);
the mouth action and neck action matched to the voice in step S6 are completed as follows:
step S62A, the main control chip transmits the corresponding ASCII codes to the voice production module (1-3D) and simultaneously controls the mouth action mechanism and the neck action mechanism;
step S62B, initialization: starting the peripheral clock, filling in the initialization structure and calling the structure initialization function;
step S62C, configuring the pulse width through the duty ratio, and setting the desired rotation angle of the motor with the TIM_SetCompare1() function;
and step S62D, setting a plurality of angle values according to the natural rules of human mouth movement and neck movement, so that the diagnosis guide robot moves with a high degree of natural anthropomorphism.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910163672.7A CN109822587B (en) | 2019-03-05 | 2019-03-05 | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109822587A CN109822587A (en) | 2019-05-31 |
CN109822587B true CN109822587B (en) | 2022-05-31 |
Family
ID=66865298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910163672.7A Expired - Fee Related CN109822587B (en) | 2019-03-05 | 2019-03-05 | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109822587B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110931034B (en) * | 2019-11-27 | 2022-05-24 | 深圳市悦尔声学有限公司 | Pickup noise reduction method for built-in earphone of microphone |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5837765A (en) * | 1981-08-28 | 1983-03-05 | Toshiba Corp | Diagnostic management system |
JP2008216735A (en) * | 2007-03-06 | 2008-09-18 | Fujitsu Ltd | Reception robot and method of adapting to conversation for reception robot |
CN106965193A (en) * | 2017-03-31 | 2017-07-21 | 旗瀚科技有限公司 | A kind of intelligent robot diagnosis guiding system |
JP2018001403A (en) * | 2016-07-07 | 2018-01-11 | 深▲せん▼狗尾草智能科技有限公司Shenzhen Gowild Robotics Co.,Ltd. | Method, system and robot body for synchronizing voice and virtual operation |
CN107901046A (en) * | 2017-11-03 | 2018-04-13 | 深圳市易特科信息技术有限公司 | A guide and examine auxiliary robot for hospital |
CN108942973A (en) * | 2018-09-29 | 2018-12-07 | 哈尔滨理工大学 | Science and technology center's guest-greeting machine department of human head and neck device with temperature and humidity casting function |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6665305B2 (en) | Method, apparatus and storage medium for building a speech decoding network in digit speech recognition | |
Hofe et al. | Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
Wand et al. | Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition. | |
US11810233B2 (en) | End-to-end virtual object animation generation method and apparatus, storage medium, and terminal | |
Wand et al. | Domain-Adversarial Training for Session Independent EMG-based Speech Recognition. | |
CN109822587B (en) | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals | |
CN112232127A (en) | Intelligent speech training system and method | |
JP2001166789A (en) | Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end | |
CN110349565B (en) | Auxiliary pronunciation learning method and system for hearing-impaired people | |
CN110444189A (en) | One kind is kept silent communication means, system and storage medium | |
US20230298616A1 (en) | System and Method For Identifying Sentiment (Emotions) In A Speech Audio Input with Haptic Output | |
Kuamr et al. | Implementation and performance evaluation of continuous Hindi speech recognition | |
Schultz | ICCHP keynote: Recognizing silent and weak speech based on electromyography | |
CN111667834A (en) | Hearing-aid device and hearing-aid method | |
CN1009320B (en) | Speech recognition | |
CN111312251A (en) | Remote mechanical arm control method based on voice recognition | |
Arpitha et al. | Diagnosis of disordered speech using automatic speech recognition | |
Stone | A silent-speech interface using electro-optical stomatography | |
Tao et al. | Design of elevator auxiliary control system based on speech recognition | |
US7353172B2 (en) | System and method for cantonese speech recognition using an optimized phone set | |
Jeyalakshmi et al. | Transcribing deaf and hard of hearing speech using Hidden markov model | |
US20230139394A1 (en) | Eeg based speech prosthetic for stroke survivors | |
Krishna et al. | Continuous Speech Recognition using EEG and Video | |
Naziraliev et al. | ANALYSIS OF SPEECH SIGNALS FOR AUTOMATIC RECOGNITION |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220531 |