CN105185381A - Intelligent robot-based voice identification system

Info

Publication number: CN105185381A
Application number: CN201510528605.2A
Authority: CN
Inventors: 王慧; 张义民; 吕皖丽; 王丙祥; 仝猛
Original assignee: JIANGSU JIUXIANG AUTOMOBILE APPLIANCE GROUP CO Ltd
Current assignee: JIANGSU JIUXIANG AUTOMOBILE APPLIANCE GROUP CO Ltd
Priority date: 2015-08-26
Filing date: 2015-08-26
Publication date: 2015-12-23

Abstract

The invention relates to an intelligent robot-based voice identification system comprising a remote database. The remote database consists of a first processor, a storage device, a high-resolution recording system and a communication system. The first processor analyses the collected voice information; an analog-to-digital conversion circuit converts the obtained analog audio information into segmented digital audio information, and the first processor marks the segments in sequence. A robot analysis unit includes a comparison recorder, a second processor and a display device; the comparison recorder obtains and records the audio information to be identified, the second processor and the display device are in signal connection with the comparison recorder, and an analog-to-digital conversion circuit is built into the comparison recorder. Based on this hardware system, a voice feature vector is calculated from the audio signal, with filtering applied during the calculation, so that the influence of external noise on voice identification is substantially reduced and identification accuracy during operation is improved.

Description

Intelligent robot sound recognition system

Technical field

The application relates to the technical field of biometric intelligent recognition, and in particular to an intelligent robot sound recognition system.

Background technology

With the rapid development of information technology, and of the Internet in particular, data and information are becoming ever more pervasive. More and more tasks can be handled by intelligent robots. In the field of public safety, for example, intelligent robots are used for intelligent access control, intelligent video surveillance, police deployment and control, customs identity verification and driving licence checks; in the civil and economic fields, intelligent robots verify the identity of holders of bank cards, financial cards, credit cards and savings cards. For reasons of information security, an intelligent robot usually handles a requested transaction only after the identity of the requester has been verified.

Traditional identity verification distinguishes users by a preset password or a specific identity token, such as a certificate. This approach has obvious shortcomings: a personal identity token is easily lost or forged, and a password is easily forgotten or cracked. More seriously, such systems cannot distinguish the real owner from an impostor who has obtained the token. To overcome these defects, identity can instead be verified from physiological and behavioural characteristics of the person, such as voice, face and fingerprint; of these, a fingerprint can also be stolen by taking a mould.

When identity is verified by voice, however, ambient noise picked up with the voice input reduces the accuracy of the verification.

Summary of the invention

(1) summary of the invention

An intelligent robot sound recognition system comprises a remote database. The remote database includes a first processor, a storage device, a high-resolution recording system and a communication system. The communication system is in signal connection with the first processor and is used for sending and receiving digital information; the storage device stores sound information; and the first processor analyses and processes the collected sound information. The audio information of a target sound is obtained by the high-resolution recording system, an analog-to-digital conversion circuit converts the obtained analog audio information into segmented digital audio information, and the first processor marks the segments in sequence;

The first processor processes the marked digital audio information as follows: it first applies a Fourier transform to each segment of digital audio information, then applies a non-linear power-function operation to the result, and finally applies a discrete cosine transform to obtain the standard sound feature parameters;

The first processor is in signal connection with the storage device, and the standard sound feature parameters obtained are stored in the storage device;

The system further comprises a robot analysis unit. The robot analysis unit includes a comparison recorder for obtaining and recording the audio information to be identified, and a second processor and a display, each in signal connection with the comparison recorder. The comparison recorder has a built-in analog-to-digital conversion circuit; after the comparison recorder obtains the audio information, this circuit converts the analog audio signal into equal-width segmented digital signals. The second processor processes the equal-width segmented digital signals as follows: it applies a Fourier transform to them, then applies a non-linear power-function operation, and finally applies a discrete cosine transform to obtain the comparison sound feature parameters. The second processor uploads the comparison sound feature parameters to the remote database through the communication system, where the first processor performs the comparison. The comparison method is as follows:

When the audio signal to be compared is segmented, each segment is marked by the starting position of its digital signal. Following the marker sequence, the first group of comparison sound feature parameters is then used as the initial search term to search the remote database. If the search returns more than one result, the second group of comparison sound feature parameters is used as a secondary search term within the initial results, and so on. When the search returns exactly one result, the comparison succeeds; when it returns zero results, no matching record exists; and if the result count is still greater than one at the end, an error alarm is raised.
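The successive narrowing described in the comparison method can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Feature` representation, the tolerance-based `matches` rule, the status strings and the dictionary layout of the database are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Feature = Tuple[float, ...]  # one group of feature parameters (assumed representation)


def matches(a: Feature, b: Feature, tol: float = 1e-6) -> bool:
    """Assumed match rule: element-wise equality within a tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def retrieve(
    query: Sequence[Feature],
    database: Dict[str, Sequence[Feature]],
) -> Tuple[str, Optional[str]]:
    """Narrow the candidate set one feature group at a time, in marker order."""
    candidates: List[str] = list(database)
    for k, group in enumerate(query):
        # Search only within the results of the previous round.
        candidates = [
            name for name in candidates
            if k < len(database[name]) and matches(database[name][k], group)
        ]
        if len(candidates) == 1:
            return ("match", candidates[0])   # exactly one result: comparison succeeds
        if not candidates:
            return ("not_found", None)        # zero results: no matching record
    return ("error", None)                    # still more than one result: raise an alarm
```

Because each round searches only within the previous results, the number of records compared shrinks as more feature groups are consumed.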

Preferably, a filter is also provided in the remote database; the filter is arranged between the first processor and the high-resolution recording system and filters the audio signal;

The filter has a built-in filter function, given by:

filter(t) = B^{n} t^{n-1} e^{-2\pi B t} \cos(2\pi f_{0} t + \theta) u(t);

where:

θ is the initial phase of the filter and n is the order of the filter.

Preferably, the communication system is an Internet connection implemented through a router;

or,

the communication system is a mobile data network connection implemented through a SIM card reader.

(2) beneficial effect

Based on the hardware architecture described above, the intelligent robot sound recognition system provided by the invention calculates a sound feature vector from the collected audio information and applies filtering during the calculation. This greatly reduces the influence of external noise on sound recognition and improves recognition accuracy during operation. The invention also has a simple structure and is easy to install and maintain.

Description of the drawings

To explain the technical solutions of the embodiments of the application or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are only some embodiments of the application; those of ordinary skill in the art can derive other drawings from them.

Fig. 1 is a flowchart of the sound recognition method used in the intelligent robot sound recognition system according to an embodiment of the invention;

Fig. 2 is a flowchart of establishing the sound recognition database according to an embodiment of the invention.

Embodiment

To solve the prior-art problem that environmental noise lowers the sound recognition accuracy of intelligent robots, the application discloses a sound recognition method applied in an intelligent robot system.

Refer to Fig. 1 and Fig. 2. Fig. 1 is a flowchart of the sound recognition method used in the intelligent robot sound recognition system according to an embodiment of the invention; Fig. 2 is a flowchart of establishing the sound recognition database according to an embodiment of the invention.

The invention provides an intelligent robot sound recognition system comprising a remote database. The remote database includes a first processor, a storage device, a high-resolution recording system and a communication system. The communication system is in signal connection with the first processor and is used for sending and receiving digital information; the storage device stores sound information; and the first processor analyses and processes the collected sound information. The audio information of a target sound is obtained by the high-resolution recording system, an analog-to-digital conversion circuit converts the obtained analog audio information into segmented digital audio information, and the first processor marks the segments in sequence.

The first processor processes the marked digital audio information as follows: it first applies a Fourier transform to each segment of digital audio information, then applies a non-linear power-function operation to the result, and finally applies a discrete cosine transform to obtain the standard sound feature parameters. The first processor is in signal connection with the storage device, and the standard sound feature parameters obtained are stored in the storage device.

The system further comprises a robot analysis unit. The robot analysis unit includes a comparison recorder for obtaining and recording the audio information to be identified, and a second processor and a display, each in signal connection with the comparison recorder. The comparison recorder has a built-in analog-to-digital conversion circuit; after the comparison recorder obtains the audio information, this circuit converts the analog audio signal into equal-width segmented digital signals. The second processor processes the equal-width segmented digital signals as follows: it applies a Fourier transform to them, then applies a non-linear power-function operation, and finally applies a discrete cosine transform to obtain the comparison sound feature parameters. The second processor uploads the comparison sound feature parameters to the remote database through the communication system, where the first processor performs the comparison.

The comparison method is as follows. When the audio signal to be compared is segmented, each segment is marked by the starting position of its digital signal. Following the marker sequence, the first group of comparison sound feature parameters is then used as the initial search term to search the remote database. If the search returns more than one result, the second group of comparison sound feature parameters is used as a secondary search term within the initial results, and so on. When the search returns exactly one result, the comparison succeeds; when it returns zero results, no matching record exists; and if the result count is still greater than one at the end, an error alarm is raised.

In the above design, the innovation of the invention lies in dividing the collected audio signal into equal-width digital signals, marking the segments, and then computing feature parameters for them segment by segment. The robot unit then sends the computed feature parameters to the remote database, where the first processor searches on them. The purpose of segmenting the audio signal is that, during data retrieval, the search can be carried out with the feature parameters of individual segments, which reduces the complexity of retrieval and increases its speed.

The sound recognition method of the intelligent robot sound recognition system provided by the invention comprises the following steps:

S11: Establish the sound recognition database.

The sound recognition database may be located inside the intelligent robot, or outside it, in which case the intelligent robot accesses the stored sound recognition data over a network. Data may be transmitted between the intelligent robot and the sound recognition database over a wired or wireless network.

To avoid a loss of accuracy caused by changes in sound characteristics due to environmental factors, establishing the sound recognition database comprises the following steps:

Collect reliable sound samples from all personnel;

As required, collect reliable sound information from the n persons to be identified, where n is a positive integer. For example, to verify the identity of the staff of a company, the sound information of all staff of that company must be collected.

To maintain recognition accuracy, sound information from all personnel under different conditions can be added continually to improve the recognition rate, for example by collecting new sound information once a year.

The audio signal x(i) corresponding to the sound information collected from person i is stored in the original speech storage area of the sound recognition database as standard sound feature parameters, where i = 1, ..., n (i is a positive integer).

S12: Extract the sound features.

The following processing is applied to each audio signal x(i) stored in the original speech storage area:

(1) The audio signal x(i) is divided into a series of consecutive frames, and a Fourier transform is applied to each frame.

(2) A filter is applied to the audio signal to reduce the leakage of spectral energy between adjacent frequency bands. The filter function used is:

filter(t) = B^{n} t^{n-1} e^{-2\pi B t} \cos(2\pi f_{0} t + \theta) u(t)    Formula (1)

where:

θ is the initial phase of the filter and n is the order of the filter;

u(t) = 0 for t < 0, and u(t) = 1 for t > 0;

B = 1.019 \cdot ERB(f_{0}), where ERB(f_{0}) is the equivalent rectangular bandwidth of the filter, related to the filter centre frequency f_{0} by:

ERB(f_{0}) = 24.7 + 0.108 f_{0}    Formula (2).
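Formulas (1) and (2) can be evaluated directly, as in the minimal sketch below. The default order `n = 4`, the zero initial phase and the 16 kHz sampling rate used in the usage line are assumptions for illustration; the source does not specify them.

```python
import math


def erb(f0: float) -> float:
    """Equivalent rectangular bandwidth in Hz, Formula (2)."""
    return 24.7 + 0.108 * f0


def gammatone(t: float, f0: float, n: int = 4, theta: float = 0.0) -> float:
    """Value of the filter function of Formula (1) at time t (seconds)."""
    if t <= 0.0:
        # u(t) = 0 for t < 0; t = 0 is also mapped to 0 here (assumption),
        # which agrees with t ** (n - 1) = 0 whenever n > 1.
        return 0.0
    b = 1.019 * erb(f0)
    envelope = b ** n * t ** (n - 1) * math.exp(-2.0 * math.pi * b * t)
    return envelope * math.cos(2.0 * math.pi * f0 * t + theta)


# Usage: 32 ms of the response of a 1 kHz channel sampled at 16 kHz.
response = [gammatone(k / 16000.0, 1000.0) for k in range(512)]
```

For a centre frequency of 1000 Hz, Formula (2) gives ERB = 24.7 + 108 = 132.7 Hz, so B = 1.019 × 132.7 ≈ 135.2 Hz.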

(3) The intermediate bias of the audio signal is removed.

After the audio signal is divided into frames, a number of frames are grouped into a segment. In the invention 7 frames preferably form a segment; this can be set according to the processing capability of the system.

Most sound recognition systems use a frame length of 20 ms to 30 ms. The invention preferably uses a 26.5 ms Hamming window with a frame overlap of 10 ms. The intermediate quantity Q(i, j) of each frame is obtained as the mean of the frame energies P(i, j) within the segment:

Q(i, j) = \frac{1}{2M + 1} \sum_{j' = j - M}^{j + M} P(i, j')

Formula (3)

In Formula (3), since the invention preferably forms a segment from 7 frames, M = 3. i is the channel index, j is the index of the frame being computed, and j' is the index of each frame within its segment.
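The averaging of Formula (3) can be sketched as a sliding window over the frame energies of each channel. The nested-list layout `p[i][j]` and the treatment of frames beyond the signal edges as zero energy are assumptions; the source does not say how edges are handled.

```python
from typing import List


def medium_time_power(p: List[List[float]], m: int = 3) -> List[List[float]]:
    """Formula (3): Q(i, j) is the mean of P(i, j') for j' in [j - M, j + M].

    p[i][j] is the frame energy of channel i in frame j. Frames outside the
    signal contribute zero energy (assumption), so the divisor stays 2M + 1.
    """
    q = []
    for row in p:
        n_frames = len(row)
        q_row = []
        for j in range(n_frames):
            lo, hi = max(j - m, 0), min(j + m, n_frames - 1)
            q_row.append(sum(row[lo:hi + 1]) / (2 * m + 1))
        q.append(q_row)
    return q
```

With M = 3 each value of Q averages 7 frames, matching the preferred segment length.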

During noise-energy removal, the ratio of the arithmetic mean to the geometric mean (AM/GM) indicates the degree to which the speech signal is corrupted. Taking the logarithm of this ratio gives:

G(i) = \log \left[ \sum_{j = 0}^{J - 1} \max(Q(i, j), z) \right] - \frac{1}{J} \sum_{j = 0}^{J - 1} \max(Q(i, j), z)

Formula (4)

In Formula (4), z is a floor coefficient that prevents the logarithm from evaluating to negative infinity, so that the deviation of the result stays within the permitted range; J is the total number of frames.
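The text describes G(i) as the logarithm of the AM/GM ratio. A minimal sketch of that quantity for one channel follows; it computes log(AM) minus the mean of the logarithms, which is the logarithm of AM/GM by definition. Treating this as the intended reading of Formula (4), and the default floor value `z = 1e-10`, are assumptions.

```python
import math
from typing import List


def log_am_gm(q_row: List[float], z: float = 1e-10) -> float:
    """Logarithm of the AM/GM ratio of one channel's floored values Q(i, j)."""
    floored = [max(v, z) for v in q_row]          # the floor z keeps log() finite
    arithmetic_mean = sum(floored) / len(floored)
    mean_log = sum(math.log(v) for v in floored) / len(floored)  # log of the geometric mean
    return math.log(arithmetic_mean) - mean_log
```

The result is zero for a constant sequence and grows as the values spread out, since the arithmetic mean is never smaller than the geometric mean.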

Suppose B(i) is the bias caused by background noise, where i is the channel index. Conditioned on B(i), the intermediate quantity after bias removal, Q'(i, j | B(i)), is:

Q'(i, j | B(i)) = \max(Q(i, j) - B(i), 10^{-3} Q(i, j))    Formula (5)

This gives:

G'(i | B(i)) = \log \left[ \sum_{j = 0}^{J - 1} \max(Q'(i, j | B(i)), z) \right] - \frac{1}{J} \sum_{j = 0}^{J - 1} \max(Q(i, j), z)

Formula (6)

From Formula (6), the estimate of B(i) is the value for which the AM/GM ratio under noise is closest to the AM/GM ratio of the clean sound signal:

B'(i) = \min \{ B(i) \mid G'(i | B(i)) \ge G_{c}(i) \}    Formula (7)

where G_{c}(i) denotes the value of G(i) for the clean sound signal. After Formula (7) has been evaluated for each channel, the noise-removal ratio for each time-frequency bin (i, j) is:

w(i, j) = \frac{Q'(i, j | B'(i))}{Q(i, j)}

Formula (8)

To smooth the result, the noise-removal ratios of channels i - N to i + N are averaged; the final adjusted function is:

P'(i, j) = \left( \frac{1}{2N + 1} \sum_{i' = \max(i - N, 1)}^{\min(i + N, j)} w(i', j) \right) P(i, j)

Formula (10)

Formula (10) is applied to all audio signals in the filter; the result after removal of the intermediate bias is the filter output.
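Formulas (5), (8) and (10) can be sketched as three small functions. The sketch assumes the bias estimates B'(i) are already available, uses zero-based channel indices, and clips the smoothing window at the last channel; the upper summation limit printed in Formula (10) is not reproduced, so that clipping is an assumption, as is setting w = 1 for bins with zero Q.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [channel i][frame j]


def remove_bias(q: Matrix, b: List[float]) -> Matrix:
    """Formula (5): Q'(i, j | B(i)) = max(Q(i, j) - B(i), 1e-3 * Q(i, j))."""
    return [[max(v - b_i, 1e-3 * v) for v in row] for row, b_i in zip(q, b)]


def removal_ratio(q: Matrix, q_prime: Matrix) -> Matrix:
    """Formula (8): w(i, j) = Q'(i, j) / Q(i, j); bins with Q = 0 get w = 1 (assumption)."""
    return [
        [qp / v if v > 0.0 else 1.0 for v, qp in zip(row, row_p)]
        for row, row_p in zip(q, q_prime)
    ]


def smooth_and_apply(p: Matrix, w: Matrix, n: int = 1) -> Matrix:
    """Formula (10): average w over channels i - N .. i + N, then scale P(i, j)."""
    channels = len(p)
    out = []
    for i in range(channels):
        lo, hi = max(i - n, 0), min(i + n, channels - 1)  # clip at the edge channels
        out.append([
            sum(w[k][j] for k in range(lo, hi + 1)) / (2 * n + 1) * p[i][j]
            for j in range(len(p[i]))
        ])
    return out
```

The 10^{-3} floor in `remove_bias` keeps Q' positive even when the bias estimate exceeds Q, so the ratio w never becomes negative.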

(4) A non-linear power-function operation is applied to the audio data output by all filters; the power function used is:

Y = X^{0.1}    Formula (11).

(5) A discrete cosine transform is then applied to the output of step (4) to obtain the sound feature parameters.

Because the discrete cosine transform (DCT) is a well-known technique in speech processing, it is not described further here.
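Steps (4) and (5) can be sketched as follows. The exponent 0.1 is read from Formula (11); the choice of an unnormalised DCT-II is an assumption, since the source does not state which DCT variant or scaling is used.

```python
import math
from typing import List


def power_nonlinearity(x: List[float], exponent: float = 0.1) -> List[float]:
    """Formula (11): Y = X ** exponent, element-wise on non-negative values."""
    return [v ** exponent for v in x]


def dct2(y: List[float]) -> List[float]:
    """Unnormalised DCT-II: c[k] = sum over n of y[n] * cos(pi * (n + 0.5) * k / N)."""
    n_total = len(y)
    return [
        sum(v * math.cos(math.pi * (n + 0.5) * k / n_total) for n, v in enumerate(y))
        for k in range(n_total)
    ]


def features(channel_powers: List[float]) -> List[float]:
    """Feature parameters of one frame from its per-channel filter outputs."""
    return dct2(power_nonlinearity(channel_powers))
```

The power function compresses the dynamic range of the filter outputs before the DCT decorrelates them across channels.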

S13: Store the obtained sound feature parameters in the sound recognition database in association with the identity information of the person.

In addition to the collected sound feature information, other information about each person can be associated with that person's sound feature information to help confirm identity further. Such information may include, but is not limited to, fingerprints and iris information.

S2: Input the sound to be identified.

The person's voice is collected by a microphone on the intelligent robot. Other sound-capture devices may also be used.

The feature vector of the input sound to be identified is calculated as follows:

The following processing is applied to each audio signal x(i) of the input sound:

filter(t) = B^{n} t^{n-1} e^{-2\pi B t} \cos(2\pi f_{0} t + \theta) u(t)    Formula (1)

where:

u(t) = 0 for t < 0, and u(t) = 1 for t > 0;

ERB(f_{0}) = 24.7 + 0.108 f_{0}    Formula (2).

(3) The intermediate bias of the audio signal is removed.

Q(i, j) = \frac{1}{2M + 1} \sum_{j' = j - M}^{j + M} P(i, j')

Formula (3)

G(i) = \log \left[ \sum_{j = 0}^{J - 1} \max(Q(i, j), z) \right] - \frac{1}{J} \sum_{j = 0}^{J - 1} \max(Q(i, j), z)

Formula (4)

Q'(i, j | B(i)) = \max(Q(i, j) - B(i), 10^{-3} Q(i, j))    Formula (5)

This gives:

G'(i | B(i)) = \log \left[ \sum_{j = 0}^{J - 1} \max(Q'(i, j | B(i)), z) \right] - \frac{1}{J} \sum_{j = 0}^{J - 1} \max(Q(i, j), z)

Formula (6)

B'(i) = \min \{ B(i) \mid G'(i | B(i)) \ge G_{c}(i) \}    Formula (7)

w(i, j) = \frac{Q'(i, j | B'(i))}{Q(i, j)}

Formula (8)

P'(i, j) = \left( \frac{1}{2N + 1} \sum_{i' = \max(i - N, 1)}^{\min(i + N, j)} w(i', j) \right) P(i, j)

Formula (10)

(4) A non-linear power-function operation is applied to the data output by all filters; the power function used is:

Y = X^{0.1}    Formula (11).

S3: Verify the identity.

The sound features to be verified are submitted to the sound recognition database, which is searched for matching sound feature information. If matching sound feature information is found, the identity information corresponding to the sound features is returned; otherwise, a match-failure message is returned. The identity verification result is then output.

The identity verification result can be output to the central processing unit (CPU) of the intelligent robot, which acts on it accordingly: if the identity information of the person is found, the intelligent robot continues to process the transaction requested by that person; if it is not found, a message reporting that identity verification failed is output.
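The lookup in step S3 can be sketched as a nearest-neighbour search with a rejection threshold. The source only says that the database is searched for matching features; the Euclidean distance, the `max_distance` threshold and the dictionary layout are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Optional


def identify(
    query: List[float],
    database: Dict[str, List[float]],
    max_distance: float = 1.0,
) -> Optional[str]:
    """Return the identity whose stored features are closest to the query,
    or None when no stored entry is within max_distance (match failure)."""
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, reference in database.items():
        dist = math.dist(query, reference)  # Euclidean distance, Python 3.8+
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None
```

A caller would continue the requested transaction when a name is returned and report an identity verification failure when the result is `None`.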

In the above design, the processor processes the sound information collected by the recorder as follows. The sound information (an analog quantity) collected by the recorder is converted into a digital signal by the analog-to-digital conversion circuit and sent to the processor. The processor then processes the collected sound information using the audio processing method described above, which is not repeated here. At the same time, the processor retrieves all the standard vectors in the database. Finally, the processor performs the comparison and shows the result on the display.

In the invention, the processor, the recorder and the display are integrated on a single circuit board by circuit-board soldering, which reduces the physical size of the device. The database may be of two kinds: a local database or a remote database. In the local-database design, the database and the processor are connected by a data cable; in the remote-database design, devices such as a Wi-Fi module or a SIM card reader module can be provided so that the processor and the database communicate over a wireless network.

For ease of operation, in this embodiment the display is a touch LCD screen, so that the whole device is controlled by touch, which makes operation simple.

Of course, a technical solution implementing the application need not achieve all of the above advantages at the same time.

Those skilled in the art will understand that embodiments of the application may be provided as a method, an apparatus (device) or a computer program product. The application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. The application may also take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.

The application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device) and computer program product according to its embodiments. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims

1. An intelligent robot sound recognition system, characterized in that

it comprises a remote database, the remote database including a first processor, a storage device, a high-resolution recording system and a communication system, wherein the communication system is in signal connection with the first processor and is used for sending and receiving digital information, the storage device stores sound information, the first processor analyses and processes the collected sound information, the audio information of a target sound is obtained by the high-resolution recording system, an analog-to-digital conversion circuit converts the obtained analog audio information into segmented digital audio information, and the first processor marks the segments in sequence;

2. The intelligent robot sound recognition system according to claim 1, characterized in that

a filter is also provided in the remote database, the filter being arranged between the first processor and the high-resolution recording system and filtering the audio signal;

filter(t) = B^{n} t^{n-1} e^{-2\pi B t} \cos(2\pi f_{0} t + \theta) u(t);

Wherein:

3. The intelligent robot sound recognition system according to claim 1, characterized in that

the communication system is an Internet connection implemented through a router;

Or,