CN109822587B - Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals - Google Patents
- Publication number: CN109822587B (application CN201910163672.7A)
- Authority
- CN
- China
- Prior art keywords: voice, neck, module, main control, mouth
- Prior art date
- Legal status
- Expired - Fee Related
Abstract
The invention relates to a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method thereof, addressing the low triage efficiency caused by the lack of professional diagnosis guide personnel in such hospitals. The head and neck device comprises a head device, a neck device and a control system, the head device being arranged above the neck device; the head device comprises a head structure, a mouth action mechanism and a voice module; the neck device comprises a neck support and a neck action mechanism. The control system controls the voice module: because hospital background noise is unstable and fluctuates strongly in intensity, low-frequency energy is adopted instead of the traditional short-time energy as the feature quantity, improving the accuracy of speech recognition in a complex noise environment. The main voice guidance to hospital departments and routes is then completed according to the recognized content, and while conversing the robot performs anthropomorphic mouth actions and neck pitching and turning actions, improving the interaction capability of the guide robot.
Description
Technical Field
The invention belongs to the field of intelligent service robots, and particularly relates to a head and neck device of a voice diagnosis guide robot for an industrial and mining hospital and a control method.
Background
With the development of China's intelligent manufacturing industry and the continuous improvement of its technological level, intelligent service robots are applied ever more widely in daily life. They are already used in fields such as guided tours, medical treatment and reception, and in the future they may become the most capable assistants and closest companions of human beings.
The invention is applied to the field of voice diagnosis guiding services in factory and mine hospitals. Such hospitals are run by enterprises and are smaller than third-level grade-A hospitals; the flow of people seeking treatment is large while professional diagnosis guide staff are lacking, which leads to low triage efficiency.
Disclosure of Invention
The invention solves the above problems by providing a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method. The device can share the diagnosis guidance workload during busy periods in such hospitals and improve their triage efficiency; it identifies the patient's voice information with high accuracy in the complex noise environment of a hospital and quickly makes a voice response according to the recognized information, specifically covering route guidance, department inquiry and expert introduction.
In order to solve the above problems, a first object of the present invention is to provide a head and neck apparatus of a voice diagnosis guide robot for an industrial and mining hospital, and a second object of the present invention is to provide a control method of a head and neck apparatus of a voice diagnosis guide robot for an industrial and mining hospital.
The first technical scheme adopted by the invention is as follows:
a head and neck device of a voice diagnosis guide robot for an industrial and mining hospital comprises a head device, a neck device and a control system, wherein the head device is arranged above the neck device;
the head device comprises a head structure, a mouth action mechanism and a voice module;
the neck device comprises a neck support and a neck action mechanism;
the control system takes a main control chip as a main part and is connected with the voice module and used for recognizing the voice information of the patient and responding the inquiry information of the patient, and the main control chip is in control connection with the mouth action mechanism and the neck action mechanism to complete the actions of mouth action, neck pitching and turning of the anthropomorphic user.
The head structure comprises a face support plate, a mouth first support plate and a mouth second support plate; the face support plate is arranged perpendicular to the mouth first support plate and the mouth second support plate and is fixedly connected with both.
The mouth action mechanism comprises a mouth action control module, a stepping motor, a motor fixing support, a metal coupling, a mouth transmission support and a chin component. The stepping motor is fixedly connected with the mouth second support plate through the motor fixing support, the output shaft of the stepping motor is fixedly connected with the input end of the metal coupling, the other end of the metal coupling is connected with the input end of the mouth transmission support, and the tail end of the mouth transmission support is fixedly connected with the chin component.
Pins PB0, PB1, PB2 and PB3 of the main control chip of the control system are connected with pins IN1, IN2, IN3 and IN4 of the mouth action control module respectively; the positive and negative supply terminals of the mouth action control module are connected with a 5V power supply, and its outputs OUT1 and OUT2 are connected with the positive and negative inputs of the stepping motor respectively, so as to control the movement of the guide robot's mouth.
The voice module comprises a main control board, a microphone, a voice recognition module, a voice sounding module, a loudspeaker and a loudspeaker support. The microphone is fixed on the face support plate and connected with the mono channel input of the voice recognition module; the loudspeaker is arranged vertically and fixedly connected with the mouth first support plate and the mouth second support plate through the loudspeaker support, and its positive and negative poles are connected with the positive and negative poles of the output end of the voice sounding module; the main control board is connected in control with the voice recognition module, the voice sounding module, the mouth action control module and the neck action mechanism respectively.
The main control chip of the control system is arranged on the main control board. Pins PA4, PA5, PA6 and PA7 of the main control chip are connected with pins MISO, MOSI, SCK and NSS of the voice recognition module respectively to communicate according to the SPI protocol and transmit voice recognition information, and pins RST, WR and IRQ of the voice recognition module are connected with pins PB12, PB13 and PB14 of the main control chip respectively. The MICP and MICN pins of the voice recognition module serve as the positive and negative input ends of the microphone, MICP being the positive input and MICN the negative input. SPOP and SPON of the voice recognition module are connected with IN+ and IN- of the voice sounding module respectively, and OUT+ and OUT- of the voice sounding module are connected with the positive and negative poles of the loudspeaker respectively, for outputting the robot's response voice.
The neck support comprises a neck metal support, a neck metal base and a head and neck connecting piece.
The neck action mechanism comprises a single-shaft metal steering engine, a first metal steering wheel, a double-shaft metal steering engine and a second metal steering wheel. The output shaft of the single-shaft metal steering engine is connected with the first metal steering wheel, the output end of the first metal steering wheel is fixed in the clamping groove of the neck metal support, the lower end of the neck metal support is fixedly connected with the double-shaft metal steering engine, and the output shaft of the double-shaft metal steering engine points vertically downward and is connected with the second metal steering wheel, which is fixed in the clamping groove of the neck metal base.
Pin PB7 of the main control chip is connected with the OUT pin of the single-shaft metal steering engine, and pin PB8 of the main control chip is connected with the OUT pin of the double-shaft metal steering engine.
The second technical scheme adopted by the invention is as follows:
based on the control method realized by the head and neck device of the voice diagnosis guide robot for the factory and mine hospitals, the method comprises the following steps:
step S1, the microphone collects the voice information of the patient, converts the sound wave into a digital voice signal and transmits the digital voice signal into the voice recognition module;
step S2, the voice recognition module preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module extracts acoustic features of the preprocessed voice signals;
step S4, the acoustic features are passed through a recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module;
and step S6, the main control chip completes corresponding response voice and matches the mouth action and neck action of the voice according to the instruction corresponding to the text information recognized by the voice recognition module.
The improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - az^(-1), which emphasizes the high-frequency part, removes the influence of lip radiation and increases the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing is applied to the voice signal, emphasizing the waveform near sample n and weakening the rest; the window length is 25ms, the window shift is 10ms, and each frame contains 410 sampling points. Low-frequency energy is adopted instead of the general short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = sum_{k=0..l} h_k * x(i-k), where x(i) is the voice signal under detection, h_k are the FIR low-pass filter coefficients, l is the order of the filter and x_h(i) is the filtered voice signal; the low-frequency energy of x_h(i) is then judged against a threshold preset through training. Because the noise environment in a hospital scene is unstable and contains sudden cusp noise, median blur processing is adopted after windowing to filter that noise out: in the one-dimensional sequence f_1, f_2, ..., f_n, with window size m, the output of the blur processing for f_i is y_i, the value of the median after the window is sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2; the voice signal with the sharp points removed is then output.
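A minimal sketch of steps S201 and S203 follows; the pre-emphasis coefficient a = 0.97 and the 3-point median window are illustrative values, not values given in the patent, and the edge-clamping behaviour of the median is an implementation assumption.

```c
#include <stdlib.h>

/* S201: pre-emphasis, H(z) = 1 - a*z^(-1), i.e. y[i] = x[i] - a*x[i-1]. */
static void pre_emphasize(const double *x, double *y, int n, double a)
{
    y[0] = x[0];
    for (int i = 1; i < n; i++)
        y[i] = x[i] - a * x[i - 1];
}

static int cmp_double(const void *p, const void *q)
{
    double d = *(const double *)p - *(const double *)q;
    return (d > 0) - (d < 0);
}

/* S203: median blur with window m = 2v + 1; the output for sample i is
   the median of {f[i-v], ..., f[i+v]}, which removes sudden cusp
   (impulse) noise while leaving flat regions untouched. */
static double median_at(const double *f, int n, int i, int v)
{
    double win[64];                         /* supports windows up to m = 63 */
    int k = 0;
    for (int j = i - v; j <= i + v; j++)    /* clamp indices at the edges */
        win[k++] = f[j < 0 ? 0 : (j >= n ? n - 1 : j)];
    qsort(win, k, sizeof win[0], cmp_double);
    return win[k / 2];
}
```

A single outlier sample surrounded by flat signal is replaced by the surrounding value, which is exactly the cusp-removal effect the patent describes.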
In step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, a Fast Fourier Transform (FFT) is performed on the windowed signal, X(k) = sum_{n=0..N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N, to obtain the spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
step S302, the actual frequency scale f is converted into the Mel frequency scale with the formula Mel(f) = 2595·lg(1 + f/700);
step S303, the spectrum coefficients obtained by the conversion are filtered with a series of triangular filters, which smooths the spectrum, eliminates the effect of harmonics, highlights the formants of the original voice and reduces the amount of calculation.
Step S305, a Discrete Cosine Transform (DCT) is performed on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients C(l) = sum_{m=1..M} S(m)·cos(πl(m-0.5)/M), l = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters.
In step S4 the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, produces a small model and is easy to port; the MFCC feature parameters extracted in step S3 are input into the network, which outputs the probability of the phoneme or syllable corresponding to each voice frame.
In step S6, the response voice is completed by the main control chip controlling the voice sounding module, as follows:
step S61A, in step S5 the voice recognition module converts the recognized voice information into text information and sends it to the main control chip through the SPI communication protocol; if it matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or "Caochenan doctor profile", the main control chip transmits the corresponding ASCII codes to the voice sounding module, waiting for the event that the transmit buffer is empty: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET); and then writing the data to be sent into the transmit buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after the voice module receives the corresponding instruction from the main control chip, it initializes the register set in the designated order, clears the play start position (nMp3Pos = 0), writes the MP3 data from the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and opens the interrupt permission (EX0 = 1).
The method for completing the mouth movement and the neck movement in step S6 in cooperation with the speech is as follows:
step S62A, the main control chip transmits the corresponding ASCII code to the voice production module and controls the mouth action mechanism and the neck action mechanism to act at the same time;
step S62B, initialization: the peripheral clock is started, the initialization structure is configured and the structure initialization function is called;
step S62C, the pulse width is configured through the duty cycle, and the desired rotation angle of the motor is configured through the TIM_SetCompare1() function;
and step S62D, a number of angle values are set according to the natural rules of human mouth and neck movement, so that the movements of the diagnosis guide robot are natural and highly anthropomorphic.
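Steps S62B-S62D can be illustrated with the angle-to-pulse calculation alone. The 0.5-2.5 ms pulse range over a 20 ms period and the 1 MHz timer tick are common servo conventions assumed here, not values from the patent, and angle_to_pulse_us() is a hypothetical helper whose result would be handed to TIM_SetCompare1().

```c
/* Maps a desired angle in degrees (0-180) to a pulse width in
   microseconds, assuming the common 0.5 ms - 2.5 ms servo range.
   With a timer ticking at 1 MHz this value is what one would pass
   to TIM_SetCompare1() to set the duty cycle. */
static unsigned angle_to_pulse_us(unsigned angle_deg)
{
    if (angle_deg > 180U)
        angle_deg = 180U;                    /* clamp to mechanical range */
    return 500U + (angle_deg * 2000U) / 180U;
}
```

Stepping through a short table of such angle values at a natural cadence is one way to realize the "plurality of angle values" the step describes.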
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a head and neck device of a voice diagnosis guide robot for factory and mine hospitals and a control method adopting an improved voice recognition method. For the complex and changeable hospital sound environment, and unlike the general VAD algorithm based on short-time energy, an improved voice activity detection algorithm based on low-frequency energy is provided, which markedly reduces the interference of high-frequency energy and can still accurately detect the patient's voice information at a low signal-to-noise ratio;
2. The head and neck device of the diagnosis guide robot has a simple and elegant mechanical structure, occupies little space, and is powerful and practical. Applied to the field of hospital diagnosis guidance, it alleviates to a certain extent the low triage efficiency caused by the lack of professional diagnosis guide personnel in factory and mine hospitals, improves the overall image of the hospital, and gives visitors a view of one of the first groups of robots entering daily life;
3. The diagnosis guide robot can answer basic guidance questions by voice, including the examination process, the payment process, department locations and the admission process, and can perform anthropomorphic natural mouth and neck actions while answering, so that the patient is put at ease emotionally, the treatment process is made more convenient, and recovery from disease is aided.
Drawings
FIG. 1 is a front view of the apparatus of the present invention;
FIG. 2 is a rear view of the apparatus of the present invention;
FIG. 3 is a partial view of the mouth action mechanism of the present invention;
FIG. 4 is a view of the neck assembly of the present invention;
FIG. 5 is a circuit diagram of a main control chip of the control system of the present invention;
FIG. 6 is a circuit diagram of a speech module of the present invention;
FIG. 7 is a circuit diagram of the mouth motion control module, the stepping motor, the double-shaft metal steering engine and the single-shaft metal steering engine according to the present invention;
FIG. 8 is an overall workflow diagram of the present invention;
FIG. 9 is a flow chart of speech recognition of the present invention;
FIG. 10 is a schematic diagram of the improved preprocessing algorithm of the present invention.
In the figures: head device 1, neck device 2, control system 3, head structure 1-1, mouth action mechanism 1-2, voice module 1-3, face support plate 1-1A, mouth first support plate 1-1B, mouth second support plate 1-1C, mouth action control module 1-2A, stepping motor 1-2B, motor fixing support 1-2C, metal coupling 1-2D, mouth transmission support 1-2E, chin component 1-2F, main control board 1-3A, microphone 1-3B, voice recognition module 1-3C, voice sounding module 1-3D, loudspeaker 1-3E, loudspeaker support 1-3F, neck support 2-1, neck action mechanism 2-2, neck metal support 2-1A, neck metal base 2-1B, head and neck connecting piece 2-1C, single-shaft metal steering engine 2-2A, first metal steering wheel 2-2B, double-shaft metal steering engine 2-2C, second metal steering wheel 2-2D.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
First embodiment
A head and neck device of a voice diagnosis guide robot for an industrial and mining hospital, as shown in fig. 1 and fig. 2, comprises a head device 1, a neck device 2 and a control system 3, wherein the head device 1 is arranged above the neck device 2;
the head device 1 comprises a head structure 1-1, a mouth action mechanism 1-2 and a voice module 1-3;
the neck device 2 comprises a neck support 2-1 and a neck action mechanism 2-2;
the control system 3 takes a main control chip as a main part and is connected with the voice module 1-3 for recognizing the voice information of the patient and responding the inquiry information of the patient, and the main control chip is in control connection with the mouth action mechanism 1-2 and the neck action mechanism 2-2 to complete the mouth action and the neck pitching and turning actions of the anthropomorphic person.
Second embodiment
As shown in fig. 1,2 and 3, in the first embodiment, the head structure 1-1 includes a face support plate 1-1A, a mouth first support plate 1-1B and a mouth second support plate 1-1C; the face support plate 1-1A is vertically arranged with the mouth first support plate 1-1B and the mouth second support plate 1-1C and is fixedly connected with the mouth first support plate 1-1B.
The mouth action mechanism 1-2 comprises a mouth action control module 1-2A, a stepping motor 1-2B, a motor fixing support 1-2C, a metal coupler 1-2D, a mouth transmission support 1-2E and a chin component 1-2F, the stepping motor 1-2B is fixedly connected with a second mouth support plate 1-1C through the motor fixing support 1-2C, an output shaft of the stepping motor 1-2B is fixedly connected with an input end of the metal coupler 1-2D, the other end of the metal coupler 1-2D is connected with an input end of the mouth transmission support 1-2E, and the tail end of the mouth transmission support 1-2E is fixedly connected with the chin component 1-2F.
The voice module 1-3 comprises a main control board 1-3A, a microphone 1-3B, a voice recognition module 1-3C, a voice sounding module 1-3D, a loudspeaker 1-3E and a loudspeaker support 1-3F, the microphone 1-3B is fixed on the face support board 1-1A and is connected with a single sound channel inlet of the voice recognition module 1-3C, the loudspeaker 1-3E is vertically arranged and fixedly connected with the first mouth support board 1-1B and the second mouth support board 1-1C through the loudspeaker support 1-3F, the positive and negative poles of the loudspeaker 1-3E are connected with the positive and negative poles of the output end of the voice sounding module 1-3D, and the main control board 1-3A is respectively connected with the voice recognition module 1-3C and the voice sounding module 1-3D, The mouth action control module 1-2A and the neck action mechanism 2-2 are connected in a control way.
In this embodiment, the microphone 1-3B is prior art; the manufacturer is the Guangdong Jiaxin Microelectronics store and the model is 52DB.
In this embodiment, the voice recognition module 1-3C is prior art; the manufacturer is the Guangdong Jiaxin Microelectronics store and the model is LD3320.
In this embodiment, the voice sounding module 1-3D is prior art; the manufacturer is Shenzhen Dimension Core Technology Co., Ltd. and the model is PAM8406.
In this embodiment, the loudspeaker 1-3E is prior art; the manufacturer is Shenzhen Wei Core Science and Technology Co., Ltd. and the model is Speaker 5W 4Ω.
As shown in fig. 2 and 4, the neck support 2-1 comprises a neck metal support 2-1A, a neck metal base 2-1B and a head and neck connecting piece 2-1C; the neck action mechanism 2-2 comprises a single-shaft metal steering engine 2-2A, a first metal steering wheel 2-2B, a double-shaft metal steering engine 2-2C and a second metal steering wheel 2-2D. The output shaft of the single-shaft metal steering engine 2-2A is connected with the first metal steering wheel 2-2B, the output end of the first metal steering wheel 2-2B is fixed in the clamping groove of the neck metal support 2-1A, the lower end of the neck metal support 2-1A is fixedly connected with the double-shaft metal steering engine 2-2C, and the output shaft of the double-shaft metal steering engine 2-2C points vertically downward and is connected with the second metal steering wheel 2-2D, which is fixed in the clamping groove of the neck metal base 2-1B.
As shown in fig. 5, 6 and 7, pins PB0, PB1, PB2 and PB3 of the main control chip of the control system 3 are connected with pins IN1, IN2, IN3 and IN4 of the mouth action control module 1-2A respectively; the positive and negative supply terminals of the mouth action control module 1-2A are connected with a 5V power supply, and its outputs OUT1 and OUT2 are connected with the positive and negative inputs of the stepping motor 1-2B respectively, so as to control the mouth movement of the guide robot. The main control chip of the control system 3 is arranged on the main control board 1-3A; pins PA4, PA5, PA6 and PA7 of the main control chip are connected with pins MISO, MOSI, SCK and NSS of the voice recognition module 1-3C respectively to communicate according to the SPI protocol and transmit voice recognition information, and pins RST, WR and IRQ of the voice recognition module 1-3C are connected with pins PB12, PB13 and PB14 of the main control chip respectively. The MICP and MICN pins of the voice recognition module 1-3C serve as the positive and negative input ends of the microphone 1-3B, MICP being the positive input and MICN the negative input. SPOP and SPON of the voice recognition module 1-3C are connected with IN+ and IN- of the voice sounding module 1-3D respectively, and OUT+ and OUT- of the voice sounding module 1-3D are connected with the positive and negative poles of the loudspeaker 1-3E respectively, for outputting the robot's response voice. Pin PB7 of the main control chip is connected with the OUT pin of the single-shaft metal steering engine 2-2A, and pin PB8 is connected with the OUT pin of the double-shaft metal steering engine 2-2C.
The model of the main control chip is STM32F407.
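As a hedged illustration of how PB0-PB3 might drive IN1-IN4 of the mouth action control module, one plausible single-phase (wave-drive) step table is sketched below; the actual phase order depends on the motor wiring, which the patent does not specify.

```c
/* Wave-drive step table: exactly one of IN1-IN4 (bit0-bit3) is high
   per step. On the target, these four bits would be written to the
   PB0-PB3 pins of the main control chip. */
static const unsigned char STEP_TABLE[4] = { 0x1, 0x2, 0x4, 0x8 };

/* Bit pattern for step number n; incrementing n rotates the motor one
   step forward, decrementing it reverses the direction. */
static unsigned char step_pattern(unsigned n)
{
    return STEP_TABLE[n % 4U];
}
```

Opening and closing the chin then reduces to stepping n up by a fixed count and back down again between each phrase of the response voice.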
The working process is as follows:
the microphone 1-3B collects the voice information of the patient, the microphone 1-3B converts the sound wave into a digital voice signal and transmits the digital voice signal into a voice recognition module 1-3C, the digital voice signal is preprocessed, wherein the preprocessing comprises the steps of pre-emphasizing to increase the high-frequency resolution of the voice, windowing and framing the digital voice signal, adopting an improved preprocessing algorithm, specifically adopting low-frequency energy to replace the traditional short-time energy as characteristic quantity, carrying out end point detection on the digital voice signal, then adopting an MFCC voice characteristic extraction technology and a GMM-HMM model training recognition network method to match an acoustic model, converting the digital voice signal into segmented text information through a language model and a pronunciation dictionary to realize the voice recognition function, and transmitting the text information to a main control chip by the voice recognition module 1-3C, the main control chip determines whether to continue execution according to a preset recognition list, if a character command in the recognition list is matched, the main control chip controls the command corresponding to the voice to control the voice sounding module 1-3D to output response voice and output voice through the loudspeakers 1-3E, and meanwhile, the mouth action control module and the neck action mechanism complete mouth opening and closing actions and neck rotation pitching actions to complete a complete man-machine conversation process (voice interaction and head and neck action interaction).
Detailed description of the invention
As shown in fig. 8, 9 and 10, a control method implemented by the head and neck device of a voice diagnosis guide robot for factory and mine hospitals includes the following steps:
step S1, the microphone 1-3B collects the voice information of the patient, converts the sound wave into a digital voice signal and transmits the digital voice signal into the voice recognition module 1-3C;
step S2, the voice recognition module 1-3C preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module 1-3C extracts acoustic features of the preprocessed voice signals;
step S4, inputting the acoustic features into the recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module 1-3C;
and step S6, the main control chip completes corresponding response voice and mouth action and neck action matched with the voice according to the instruction corresponding to the text information recognized by the voice recognition modules 1-3C.
The improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - a·z^(-1) to emphasize the high-frequency part, remove the influence of lip radiation and increase the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing the voice signal, emphasizing the voice waveform near sample n and attenuating the rest of the waveform; the window length is 25 ms, the window shift is 10 ms, and each frame has 410 sampling points. Low-frequency energy is adopted in place of the usual short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = Σ_{k=0}^{l} h_k·x(i-k), and the low-frequency energy E = Σ_i x_h(i)^2 is used for the judgement, where x(i) is the voice signal to be detected, h_k are the FIR low-pass filter coefficients, l is the order of the filter, and x_h(i) is the filtered voice signal; the low-frequency energy of the background noise is estimated through training to preset the low-frequency energy threshold. Because the noise environment in a hospital scene is non-stationary and contains sudden spike noise, median blurring is applied after windowing to filter out the spikes: in a one-dimensional sequence f_1, f_2, ..., f_n with window size m, the output of the median blurring at the i-th value f_i is y_i, the median of the values in the window sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2. The voice signal with the spikes removed is then output.
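The median blurring of step S203 can be sketched in plain C. An odd window size m (so v = (m-1)/2) and clamping at the sequence boundaries are assumptions here; the patent does not specify how edges are handled:

```c
#include <stdlib.h>

/* Descending comparator, matching "arranged from large to small". */
static int cmp_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Median blurring of a one-dimensional sequence f[0..n-1] with odd window
 * size m: y[i] is the median of the m values centred on f[i].  Indices
 * outside the sequence are clamped to the edges (an assumption). */
void median_blur(const double *f, double *y, int n, int m)
{
    int v = (m - 1) / 2;
    double *win = malloc((size_t)m * sizeof *win);
    for (int i = 0; i < n; i++) {
        for (int k = -v; k <= v; k++) {
            int j = i + k;
            if (j < 0)  j = 0;
            if (j >= n) j = n - 1;
            win[k + v] = f[j];
        }
        qsort(win, (size_t)m, sizeof *win, cmp_desc);
        y[i] = win[v];   /* middle element of the sorted window = median */
    }
    free(win);
}
```

On a sequence such as {1, 1, 9, 1, 1} with m = 3, the isolated spike 9 is replaced by 1, which is exactly the sudden-spike suppression the step describes.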
In step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, performing a fast Fourier transform (FFT) on the windowed signal, X(k) = Σ_{n=0}^{N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1, to obtain the frequency spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
And step S303, filtering the transformed spectral coefficients with a bank of triangular filters, smoothing the spectrum, eliminating the effect of harmonics, highlighting the formants of the original voice and reducing the amount of computation.
Step S305, performing a discrete cosine transform (DCT) on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients: C(n) = Σ_{m=0}^{M-1} S(m)·cos(πn(m+0.5)/M), n = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters.
In step S4, the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, yields a small model and is easy to port. The MFCC feature parameters obtained in step S3 are input into the recognition network to obtain the probability of the phoneme or syllable corresponding to each voice frame.
In step S6, the response voice is produced by the main control chip controlling the voice production module 1-3D, with the following specific steps:
step S61A, the voice recognition module 1-3C converts the recognized voice information into text information and sends it to the main control chip over the SPI communication protocol as in step S5. If the recognized voice matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or a doctor-profile entry such as "Profile of Doctor Cao", the main control chip transmits the corresponding ASCII codes to the voice production module 1-3D, waiting for the transmit-buffer-empty event: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);, then writing the data to be sent into the transmission buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after receiving the corresponding instruction from the main control chip, the voice production module 1-3D initializes its register set in the designated order, clears the start play position (nMP3Pos = 0), writes the MP3 data in the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and enables the interrupt (EX0 = 1).
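The blocking transmit pattern of step S61A can be sketched against a mocked peripheral so it runs on a host. The register struct and flag bit below are stand-ins for the STM32 SPI peripheral, not the real register map:

```c
#include <stdint.h>

#define SPI_FLAG_TXE (1u << 1)   /* transmit-buffer-empty bit (mock value) */

/* Minimal stand-in for the SPI peripheral's status and data registers. */
typedef struct {
    volatile uint32_t SR;        /* status register */
    volatile uint32_t DR;        /* data register   */
} MockSpi;

/* Busy-wait until the transmit buffer is empty, then write one byte,
 * mirroring the pattern in step S61A:
 *   while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);
 *   SPI_I2S_SendData(FLASH_SPIx, byte);                                  */
void spi_send_byte(MockSpi *spi, uint8_t byte)
{
    while ((spi->SR & SPI_FLAG_TXE) == 0)
        ;                        /* spin until TXE is set by hardware */
    spi->DR = byte;
}
```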
The mouth action and neck action matched to the voice in step S6 are completed as follows:
step S62A, the main control chip transmits the corresponding ASCII codes to the voice production module 1-3D and simultaneously controls the mouth action mechanism and the neck action mechanism;
step S62B, initialization: starting the peripheral clock, filling in the initialization structure and calling the structure initialization function;
step S62C, configuring the pulse width through the duty ratio, and setting the desired rotation angle of the motor with the TIM_SetCompare1() function;
and step S62D, setting a plurality of angle values according to the natural rules of human mouth movement and neck movement, so that the diagnosis guide robot moves with a high degree of natural anthropomorphism.
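The duty-ratio-to-angle mapping of step S62C can be sketched as a pure function. The 20 ms (50 Hz) period, the 0.5 to 2.5 ms pulse range and the 1 MHz timer tick are typical hobby-servo assumptions, not values stated in the patent; on the chip, the returned count would be passed to TIM_SetCompare1():

```c
#include <stdint.h>

/* Map a servo angle in [0, 180] degrees to a timer compare value.
 * Assumes a 50 Hz PWM period of 20 ms represented by 20000 timer counts
 * (1 count = 1 us), with 0.5 ms pulse = 0 deg and 2.5 ms pulse = 180 deg. */
uint32_t servo_angle_to_compare(double angle_deg)
{
    if (angle_deg < 0.0)   angle_deg = 0.0;     /* clamp to servo range */
    if (angle_deg > 180.0) angle_deg = 180.0;
    double pulse_us = 500.0 + (angle_deg / 180.0) * 2000.0;
    return (uint32_t)(pulse_us + 0.5);          /* rounded microseconds */
}
```

Step S62D's preset mouth-opening and neck-turning angles would then each become one such compare value, written in sequence as the response voice plays.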
Claims (1)
1. A control method of a head and neck device of a voice diagnosis guiding robot for factory and mine hospitals, wherein the head and neck device comprises a head device (1), a neck device (2) and a control system (3), and the head device (1) is arranged above the neck device (2);
the head device (1) comprises a head structure (1-1), a mouth action mechanism (1-2) and a voice module (1-3);
the neck device (2) comprises a neck support (2-1) and a neck action mechanism (2-2);
the control system (3) is centered on a main control chip and is connected with the voice module (1-3) for recognizing the patient's voice information and responding to the patient's inquiry, and the main control chip is in control connection with the mouth action mechanism (1-2) and the neck action mechanism (2-2) to complete anthropomorphic mouth actions and neck pitching and turning actions;
the voice module (1-3) comprises a main control board (1-3A), a microphone (1-3B), a voice recognition module (1-3C), a voice production module (1-3D), a loudspeaker (1-3E) and a loudspeaker support (1-3F), the microphone (1-3B) is fixed on the face support board (1-1A) and is connected with a single sound channel inlet of the voice recognition module (1-3C), the loudspeaker (1-3E) is vertically arranged and fixedly connected with the first mouth support board (1-1B) and the second mouth support board (1-1C) through the loudspeaker support (1-3F), the positive and negative poles of the loudspeaker (1-3E) are connected with the positive and negative poles of the output end of the voice production module (1-3D), the main control board (1-3A) is respectively connected with the voice recognition module (1-3C), the voice sounding module (1-3D), the mouth action control module (1-2A) and the neck action mechanism (2-2) in a control way;
the method is characterized by comprising the following steps:
step S1, the microphone (1-3B) collects the voice information of the patient, converts the sound waves into digital voice signals and transmits the digital voice signals into the voice recognition module (1-3C);
step S2, the voice recognition module (1-3C) preprocesses the digital voice signal, and the accuracy of voice recognition in a complex noise environment is improved by adopting an improved preprocessing algorithm;
step S3, the voice recognition module (1-3C) extracts acoustic features of the preprocessed voice signals;
step S4, inputting the acoustic features into the recognition network to obtain the probability that a given segment of voice information belongs to a given acoustic symbol;
step S5, decoding the voice information passing through the acoustic model through a language model and a pronunciation dictionary, finding out a character string sequence with the maximum probability from the candidate character sequences, and finally transmitting the text result of voice recognition to a main control chip by the voice recognition module (1-3C);
step S6, the main control chip completes the corresponding response voice and the mouth action and neck action matched to the voice according to the instruction corresponding to the text information recognized by the voice recognition module (1-3C);
the improved preprocessing algorithm in step S2 specifically includes:
step S201, the digital voice signal is passed through a high-pass digital filter with transfer function H(z) = 1 - a·z^(-1) to emphasize the high-frequency part, remove the influence of lip radiation and increase the high-frequency resolution of the voice;
step S202, framing the digital voice signal according to the short-time stationarity of the voice signal;
step S203, windowing the voice signal, emphasizing the voice waveform near sample n and attenuating the rest of the waveform, wherein the window length is 25 ms, the window shift is 10 ms, and each frame has 410 sampling points; low-frequency energy is adopted in place of the usual short-time energy as the feature quantity: the voice signal to be detected is passed through an FIR low-pass filter to obtain x_h(i) = Σ_{k=0}^{l} h_k·x(i-k), and the low-frequency energy E = Σ_i x_h(i)^2 is used for the judgement, where x(i) is the voice signal to be detected, h_k are the FIR low-pass filter coefficients, l is the order of the filter, and x_h(i) is the filtered voice signal; the low-frequency energy of the background noise is estimated through training to preset the low-frequency energy threshold; because the noise environment in a hospital scene is non-stationary and contains sudden spike noise, median blurring is applied after windowing to filter out the spikes: in a one-dimensional sequence f_1, f_2, ..., f_n with window size m, the output of the median blurring at the i-th value f_i is y_i, the median of the values in the window sorted from large to small: y_i = Med{f_{i-v}, ..., f_i, ..., f_{i+v}}, i ∈ N, v = (m-1)/2; outputting the voice signal with the spikes removed;
in step S3, the acoustic feature extraction adopts an MFCC speech feature extraction technique, which specifically includes:
step S301, performing a fast Fourier transform (FFT) on the windowed signal, X(k) = Σ_{n=0}^{N-1} x(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1, to obtain the frequency spectrum, where x(n) is the input voice signal and N is the number of points of the Fourier transform;
Step S303, filtering the transformed spectral coefficients with a bank of triangular filters, smoothing the spectrum, eliminating the effect of harmonics, highlighting the formants of the original voice and reducing the amount of computation;
Step S305, performing a discrete cosine transform (DCT) on the energy values S(m) obtained in the previous step to obtain the MFCC coefficients: C(n) = Σ_{m=0}^{M-1} S(m)·cos(πn(m+0.5)/M), n = 1, 2, ..., L, where L is the MFCC coefficient order and M is the number of triangular filters;
in step S4, the recognition network is built with a Gaussian mixture model-hidden Markov model (GMM-HMM), which trains quickly, yields a small model and is easy to port; the MFCC feature parameters obtained in step S3 are input into the recognition network to obtain the probability of the phoneme or syllable corresponding to each voice frame;
in step S6, the response voice is produced by the main control chip controlling the voice production module (1-3D), with the following specific steps:
step S61A, the voice recognition module (1-3C) converts the recognized voice information into text information and sends it to the main control chip over the SPI communication protocol as in step S5; if the recognized voice matches preset text information such as "How do I get to the emergency department?", "How do I get to respiratory medicine?" or a doctor-profile entry such as "Profile of Doctor Cao", the main control chip transmits the corresponding ASCII codes to the voice production module (1-3D), waiting for the transmit-buffer-empty event: while (SPI_I2S_GetFlagStatus(FLASH_SPIx, SPI_I2S_FLAG_TXE) == RESET);, then writing the data to be sent into the transmission buffer through the data register: SPI_I2S_SendData(FLASH_SPIx, byte);
step S61B, after receiving the corresponding instruction from the main control chip, the voice production module (1-3D) initializes its register set in the designated order, clears the start play position (nMP3Pos = 0), writes the MP3 data in the serial Flash into the FIFO register one byte at a time (nMp3Pos++), modifies the BA and 17 registers, and enables the interrupt (EX0 = 1);
the mouth action and neck action matched to the voice in step S6 are completed as follows:
step S62A, the main control chip transmits the corresponding ASCII codes to the voice production module (1-3D) and simultaneously controls the mouth action mechanism and the neck action mechanism;
step S62B, initialization: starting the peripheral clock, filling in the initialization structure and calling the structure initialization function;
step S62C, configuring the pulse width through the duty ratio, and setting the desired rotation angle of the motor with the TIM_SetCompare1() function;
and step S62D, setting a plurality of angle values according to the natural rules of human mouth movement and neck movement, so that the diagnosis guide robot moves with a high degree of natural anthropomorphism.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910163672.7A CN109822587B (en) | 2019-03-05 | 2019-03-05 | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109822587A CN109822587A (en) | 2019-05-31 |
CN109822587B true CN109822587B (en) | 2022-05-31 |
Family
ID=66865298
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910163672.7A Expired - Fee Related CN109822587B (en) | 2019-03-05 | 2019-03-05 | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109822587B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110931034B (en) * | 2019-11-27 | 2022-05-24 | 深圳市悦尔声学有限公司 | Pickup noise reduction method for built-in earphone of microphone |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5837765A (en) * | 1981-08-28 | 1983-03-05 | Toshiba Corp | Diagnostic management system |
JP2008216735A (en) * | 2007-03-06 | 2008-09-18 | Fujitsu Ltd | Reception robot and method of adapting to conversation for reception robot |
CN106965193A (en) * | 2017-03-31 | 2017-07-21 | 旗瀚科技有限公司 | A kind of intelligent robot diagnosis guiding system |
JP2018001403A (en) * | 2016-07-07 | 2018-01-11 | 深▲せん▼狗尾草智能科技有限公司Shenzhen Gowild Robotics Co.,Ltd. | Method, system and robot body for synchronizing voice and virtual operation |
CN107901046A (en) * | 2017-11-03 | 2018-04-13 | 深圳市易特科信息技术有限公司 | A guide and examine auxiliary robot for hospital |
CN108942973A (en) * | 2018-09-29 | 2018-12-07 | 哈尔滨理工大学 | Science and technology center's guest-greeting machine department of human head and neck device with temperature and humidity casting function |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6665305B2 (en) | Method, apparatus and storage medium for building a speech decoding network in digit speech recognition | |
Hofe et al. | Small-vocabulary speech recognition using a silent speech interface based on magnetic sensing | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
Wand et al. | Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition. | |
US11810233B2 (en) | End-to-end virtual object animation generation method and apparatus, storage medium, and terminal | |
Wand et al. | Domain-Adversarial Training for Session Independent EMG-based Speech Recognition. | |
CN109822587B (en) | Control method for head and neck device of voice diagnosis guide robot for factory and mine hospitals | |
CN112232127A (en) | Intelligent speech training system and method | |
JP2001166789A (en) | Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end | |
CN110349565B (en) | Auxiliary pronunciation learning method and system for hearing-impaired people | |
CN110444189A (en) | One kind is kept silent communication means, system and storage medium | |
US20230298616A1 (en) | System and Method For Identifying Sentiment (Emotions) In A Speech Audio Input with Haptic Output | |
Kuamr et al. | Implementation and performance evaluation of continuous Hindi speech recognition | |
Schultz | ICCHP keynote: Recognizing silent and weak speech based on electromyography | |
CN111667834A (en) | Hearing-aid device and hearing-aid method | |
CN1009320B (en) | Speech recognition | |
CN111312251A (en) | Remote mechanical arm control method based on voice recognition | |
Arpitha et al. | Diagnosis of disordered speech using automatic speech recognition | |
Stone | A silent-speech interface using electro-optical stomatography | |
Tao et al. | Design of elevator auxiliary control system based on speech recognition | |
US7353172B2 (en) | System and method for cantonese speech recognition using an optimized phone set | |
Jeyalakshmi et al. | Transcribing deaf and hard of hearing speech using Hidden markov model | |
US20230139394A1 (en) | Eeg based speech prosthetic for stroke survivors | |
Krishna et al. | Continuous Speech Recognition using EEG and Video | |
Naziraliev et al. | ANALYSIS OF SPEECH SIGNALS FOR AUTOMATIC RECOGNITION |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220531 |