Embodiment
To provide a clearer understanding of the technical features, objects, and effects of the present invention, specific embodiments of the invention are now described in detail with reference to the accompanying drawings.
In the description of the present invention, it should be understood that orientation or position terms such as "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the invention. In addition, the terms "first" and "second" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance, or as implicitly indicating the number of technical features referred to. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, unless otherwise stated, "multiple" and "several" mean two or more.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "coupled", and "connected" should be interpreted broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, or an internal communication between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
Fig. 1 shows the audio- and video-based identity authentication method for nuclear power stations according to Embodiment 1 of the present invention. The method authenticates the identity of the instruction issuer at the instruction-issuing end and, according to the issuer's corresponding authority, selectively sends the issued instruction to the instruction-receiving end, so that the field service personnel at the instruction-receiving end carry out the corresponding inspection operations according to the received instruction.
The audio- and video-based identity authentication method for nuclear power stations in Embodiment 1 comprises steps S1, S2, S3, and S4.
Step S1: the instruction-issuing end receives an instruction.
In step S1, many people are present at the instruction-issuing end, and not all of them are authorized operators with operating authority. The person issuing an instruction may therefore be an authorized operator with operating authority, or may be someone without authority to issue instructions. Furthermore, even if the issuer is an authorized operator with operating authority, that operator's authority to issue instructions is limited to a certain scope.
Step S2: video information is acquired at the instruction-issuing end, identity video recognition is performed on the video information, and video identification data containing an identity ID is obtained.
Step S2 comprises sub-steps S2-1 and S2-2.
S2-1: video capture is performed at the instruction-issuing end to obtain video information.
Optionally, the video capture performed at the instruction-issuing end in step S2-1 may take many forms. Preferably, the video information obtained by video capture at the instruction-issuing end may be a face image; it may also be a pupil image; and it may also be an iris image. Optionally, the video information obtained may also be an image of another type, provided that it is captured by video.
Optionally, in step S2-1 the video information obtained by video capture at the instruction-issuing end is a face image. That is, a face image is captured first, and video information containing the face image is then obtained. Correspondingly, the video information in the video information database includes the face images of all authorized operators with operating authority. It will be understood that, in this embodiment, face recognition technology is used when the face image is captured.
As one of the most important visual patterns in images and video, the human face carries a great deal of information. Through face recognition, basic information about a person, such as sex, expression, age, and identity, can be obtained quickly. Face recognition is used in practice in public security, finance, network security, property management, attendance tracking, and similar fields, so it has considerable scientific research value and commercial value.
As a very important branch of biometric recognition, face recognition has become a highly active research field in computer vision and pattern recognition. Video-based face detection in particular is a difficult problem in academia. Digital video is the extension of digital images along the time axis; each video frame can be treated as a static image, so the video can be processed first and face detection then performed on the resulting images. In the late 1990s, face recognition technology came to prominence with the rapid increase in computer processing speed and revolutionary improvements in pattern recognition algorithms. Face recognition has attracted wide attention for its distinctive convenience, economy, and accuracy.
Face recognition technology covers face detection, face tracking, and face comparison. Face detection means determining whether a face is present in a dynamic scene with a complex background, and separating the face from that background. Face tracking means dynamically tracking a detected face. Face comparison means confirming the identity of a detected face or searching for a target in a face database.
Specifically, face detection methods include reference templates, face rules, sample learning, skin-color models, and eigenfaces. The reference template method first designs one or several standard face templates, then computes the degree of match between a test sample and the standard templates, and decides by comparison whether a face is present. Because faces have certain structural distribution characteristics, the face rule method extracts these characteristics and generates corresponding rules to judge whether a test sample contains a face. Sample learning uses the artificial neural network approach from pattern recognition, producing a classifier by learning from a set of face image samples and a set of non-face image samples. The skin-color model performs detection based on the fact that facial skin color is relatively concentrated in color space. The eigenface method treats the set of all face images as a face subspace and judges whether a face is present from the distance between a test sample and its projection onto that subspace.
The methods above may also be used in combination in practical systems. Face tracking generally uses model-based methods or methods that combine motion with a model; in addition, skin-color model tracking is a simple and effective approach.
Face comparison is essentially the successive comparison of a sampled face image against stored face images to find the best match. The way a face image is described therefore determines the specific method and performance of face recognition. At present there are two main description methods: feature vectors and faceprint templates.
The feature vector method first determines attributes such as the size, position, distance, and angle of facial features such as the iris, the wings of the nose, and the corners of the mouth, then computes their geometric feature quantities; these quantities form a feature vector describing the face image.
The faceprint template method stores a number of standard face templates or facial organ templates in a database; during comparison, all pixels of the sampled face image are matched against all templates in the database using a normalized correlation measure. In addition, there are methods that use pattern-recognition autocorrelation networks or that combine features with templates.
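The normalized correlation measure used for template matching can be illustrated with a minimal sketch. It assumes that face images are flattened into equal-length lists of gray values and that the zero-mean form of the correlation is used; the function names and the dictionary layout of the template store are hypothetical and are not taken from the described system.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length pixel lists.

    Returns a value in [-1, 1]; 1 means the two images differ only by
    brightness offset and contrast scale.
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat image has no structure to correlate with; treat it as no match.
    return numerator / denominator if denominator else 0.0


def best_template(sample, templates):
    """Compare the sample with every stored template; return (name, score)."""
    scored = ((name, normalized_correlation(sample, pixels))
              for name, pixels in templates.items())
    return max(scored, key=lambda item: item[1])
```

Because the measure subtracts the mean and divides by the energy of each image, a sample that is uniformly brighter or higher in contrast than its template still scores close to 1.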
The face recognition system uses the Local Feature Analysis (LFA) algorithm. This algorithm is fast, has a low false-recognition rate, and requires no learning; it forms identification parameters from data such as the orientation, proportion, and geometric relationships of the facial organs and characteristic regions, and compares them against all stored parameters in the database to judge and confirm identity. The essential element of any face recognition system is how it encodes a face image. Face recognition technology uses LFA to describe the face image, based on a local-statistics principle similar to assembling building blocks. LFA is a computational method based on the fact that all face images, including their various complex patterns, can be synthesized from a subset of many structural units that cannot be simplified further. These units are formed using sophisticated statistical techniques and together represent the whole face image. They usually span multiple pixels within a local region and represent general facial shapes, but they are not facial features in the ordinary sense. In fact, there are far more facial structural units than there are parts of the face image. However, synthesizing a realistic, accurate face image requires only a small subset of the available units (12 to 40 feature units). Identity is determined not only by the characteristic units but also by their geometry, such as their relative positions. In this way, LFA converts an individual's characteristics into a complex numerical representation that can be compared and recognized.
For example, the workflow of the Tickets system is as follows. First, it automatically searches the video data stream for face images. When a user's head appears, several types of matching algorithms are automatically used to judge whether a face is really present at that position. These algorithms can accurately detect multiple faces that appear simultaneously and determine their exact positions. Once a face is detected, its image is separated from the background and then normalized for size, lighting, expression, and pose through a series of special processing steps. The system then internally converts the face image into a faceprint, which contains the information unique to that face. The faceprint obtained in real time is then compared with the faceprints already in the database to confirm the identity of the face. Faceprint encoding works from the essential features and shape of the face; it is resistant to changes in lighting, skin tone, facial hair, hairstyle, glasses, expression, and pose, and is highly reliable, allowing it to accurately pick out one person from among a million. The whole process is completed automatically, continuously, and in real time, and the system requires only ordinary processing equipment.
A distinguishing characteristic of face recognition is that other biometric methods all require some cooperative action from the subject, whereas face recognition requires no such cooperation and can be used automatically in covert settings, such as public security surveillance. When the biometric record of a person attempting to log in is kept, a face image is also the most intuitive record and makes it easier to verify that person's identity.
Face recognition technology is used in the following steps:
1) Create a face image archive: capture face image files with a camera or obtain photo files, and generate faceprint codes and feature vectors;
2) Obtain the current face image: capture it with a camera or take it from an input photo, and generate its faceprint;
3) Search and compare the faceprint code of the current face image against the faceprint codes in the archive;
4) Confirm the face identity or propose candidate identities. The whole process is completed automatically, continuously, and in real time, and the system requires only ordinary processing equipment.
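The four steps above can be sketched as a small archive that supports enrollment and lookup. This is an illustration only: it assumes that a faceprint is already available as a fixed-length list of numbers, uses Euclidean distance as the comparison, and uses an arbitrary acceptance threshold. The class name `FaceArchive`, its methods, and the threshold value are hypothetical.

```python
import math


class FaceArchive:
    """Hypothetical archive of faceprints keyed by identity ID."""

    def __init__(self, threshold=0.5):
        # Assumed acceptance distance; a real system would tune this value.
        self.threshold = threshold
        self._faceprints = {}

    def enroll(self, identity_id, faceprint):
        """Step 1: store the faceprint generated for a known person."""
        self._faceprints[identity_id] = list(faceprint)

    def identify(self, faceprint):
        """Steps 2 to 4: compare the current faceprint with every archived
        one and return the closest identity ID, or None if nothing is
        close enough to confirm an identity."""
        best_id, best_distance = None, float("inf")
        for identity_id, stored in self._faceprints.items():
            distance = math.dist(stored, faceprint)
            if distance < best_distance:
                best_id, best_distance = identity_id, distance
        return best_id if best_distance <= self.threshold else None
```

Returning `None` when the closest entry is still too far away keeps an unknown face from being assigned to whichever enrolled person happens to be nearest.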
As the above explanation shows, face recognition technology has many advantages, including simplicity, accuracy, economy, and good scalability, and can be widely applied to security verification, monitoring, control, and similar uses.
Optionally, in step S2-1 the video information obtained by video capture at the instruction-issuing end is a pupil image. That is, a pupil image is captured first, and video information containing the pupil image is then obtained. Correspondingly, the video information in the video information database includes the pupil images of all authorized operators with operating authority. It will be understood that, in this embodiment, pupil recognition technology is used when the pupil image is captured.
The pupil is an important part of the human eyeball, and pupil recognition currently has very broad prospects. By detecting the pupil, the object a person is looking at can be detected, enabling human-computer interaction. Detecting the pupil of the human eye can help assess a driver's mental state, and in medical instruments pupil detection can serve as an auxiliary test that assists doctors in their clinical work.
Pupil recognition is an important topic in computer vision and involves multiple disciplines, including physiology, artificial intelligence, pattern recognition, computer vision, and computer image processing.
Pupil recognition initially developed alongside face analysis technologies such as face detection and expression analysis. As early as the 1960s and 1970s, face recognition research required faces to be detected, and pupil recognition was proposed along with it.
Pupil recognition involves multiple disciplines, including psychology, physiology, artificial intelligence, pattern recognition, computer vision, and computer image processing, and is in particular a comprehensive problem combining pattern recognition, image processing, and computer vision. In-depth study and a thorough solution of this problem would help in analyzing and solving other automatic object recognition problems.
Another important reason pupil recognition receives attention is its great potential value in health care, traffic safety, public security, military affairs, criminal investigation, and other fields. Several practical examples follow:
1) In medicine, detecting and tracking a person's pupils makes it possible to monitor their condition and assess the person's mental state, for example whether it is normal;
2) In traffic safety, the driver's pupils can be detected and tracked, and their activity used to judge whether the driver is fatigued; this can not only prevent traffic accidents but also save unnecessary labor and expense, indirectly improving economic efficiency;
3) During an interrogation, the pupils of the person being questioned can be monitored and tracked to judge psychological activity and keep the initiative in the interrogation; the truthfulness of a statement can also be judged from pupil activity;
4) In next-generation human-computer interfaces, input by blinking may replace today's mouse clicks.
The main process of pupil recognition can be divided into three parts: face detection, pupil detection, and pupil tracking. A camera captures face images in real time as the input to the whole pupil detection and tracking system. An image capture card acquires the video and yields an image sequence. Skin-color detection is first performed on each captured frame to determine whether a face is present and to estimate the approximate face region; the pupil position is then detected within the face region and tracked. During the pupil tracking stage, every frame is checked to confirm that tracking is correct; if the tracking error exceeds a specified threshold, the face must be detected again and the pupil position determined again before tracking resumes.
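The detect, track, and re-detect control flow described above can be sketched as a loop. The detector and tracker are passed in as callables because the text does not specify them; the function name `track_pupil` and the callable signatures are assumptions made for illustration.

```python
def track_pupil(frames, detect, track, max_error):
    """Return one pupil position (or None) per frame.

    detect(frame) -> position or None
        Hypothetical full pass: skin-color face detection followed by
        pupil detection inside the face region.
    track(frame, previous) -> (position, error)
        Hypothetical tracker that updates the previous position and
        reports how far off it believes it is.
    """
    positions = []
    previous = None
    for frame in frames:
        if previous is None:
            # Nothing to track yet: run the full detection pass.
            position = detect(frame)
        else:
            position, error = track(frame, previous)
            if error > max_error:
                # Tracking error exceeds the threshold: detect again.
                position = detect(frame)
        positions.append(position)
        previous = position
    return positions
```

Running full detection only at start-up or after tracking is lost keeps the per-frame cost low while still recovering from tracking failures.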
Optionally, in step S2-1 the video information obtained by video capture at the instruction-issuing end is an iris image. That is, an iris image is captured first, and video information containing the iris image is then obtained. Correspondingly, the video information in the video information database includes the iris images of all authorized operators with operating authority. It will be understood that, in this embodiment, iris recognition technology is used when the iris image is captured.
The visible part of the eye consists of three parts: the sclera, the iris, and the pupil. The sclera, the white part at the periphery of the eyeball, accounts for 30% of the total area; the pupil at the center of the eye accounts for 5%; the iris, located between the sclera and the pupil, contains the richest texture information and accounts for 65%. In appearance, the iris is made up of many crypts, folds, pigment spots, and the like, and is one of the unique structures of the human body. The formation of the iris is determined by genes, whose expression determines its form, physiology, color, and overall appearance. By about the eighth month of development, the iris has essentially reached its full size and enters a relatively stable period. Except in rare abnormal conditions, or where major physical or mental trauma changes its appearance, the iris pattern can remain almost unchanged for decades. On the other hand, the iris is externally visible yet is also internal tissue, located behind the cornea. Altering the appearance of the iris would require very delicate surgery and carry a risk of damaging vision. The high uniqueness, stability, and unalterability of the iris are the physical basis for its use in identity verification.
The iris has many innate advantages as an identity marker:
1) Uniqueness. The iris image contains many randomly distributed fine features, which make the iris pattern unique. The iris phase features proposed by Professor John Daugman of the University of Cambridge, UK, show that an iris image has 244 independent degrees of freedom, that is, an average information density of 3.2 bits per square millimeter. In practice, extracting image features by pattern recognition is a lossy compression process, so the information capacity of iris texture can be expected to be much larger than this. Moreover, fine iris features are determined mainly by random factors in the embryonic developmental environment, so significant differences exist even between the iris images of clones, of twins, and of the left and right eyes of the same person.
The uniqueness of the iris lays the foundation for high-accuracy identification. Test results from the UK National Physical Laboratory show that iris recognition has the lowest error rate of the various biometric recognition methods.
2) Stability. The iris begins to form in the third month of fetal development, and its main texture structure is formed by the eighth month. Unless the person undergoes surgery that endangers the eye, it remains almost unchanged for life thereafter. Because of the protection provided by the cornea, the fully developed iris is not easily injured from outside.
3) Non-contact acquisition. The iris is an externally visible internal organ, and a qualified iris image can be obtained without pressing against the capture device. Compared with biometrics that require contact sensing, such as fingerprints and hand geometry, this is cleaner and more hygienic, and it does not soil the imaging device and thereby affect the identification of other people.
4) Ease of signal processing. The regions adjacent to the iris in an eye image are the pupil and the sclera; there is a distinct step in gray level between them and the iris region, and the region boundaries are all close to circular, so the iris region is easy to fit, segment, and normalize. The structure of the iris favors a pattern representation that is invariant to translation, scaling, and rotation.
5) Good resistance to forgery. The iris has a small radius, and the irises of Chinese people appear dark brown under visible light with no visible texture; obtaining an image with clear iris texture requires a dedicated iris image acquisition device and the user's cooperation, so in general it is difficult to steal another person's iris image. In addition, the eye has many optical and physiological characteristics that can be used for liveness detection of the iris.
Among all biometric technologies, including fingerprints, iris recognition is currently one of the most convenient and accurate to apply. Iris recognition is widely regarded as the most promising biometric technology of the 21st century, and future applications in security, national defense, e-commerce, and other fields will inevitably focus on it. This trend has gradually begun to appear in applications around the world, and the market prospects are very broad.
Iris recognition determines a person's identity from the similarity between iris image features. Its core is to describe and match the iris features of the eye using methods such as pattern recognition and image processing, thereby achieving automatic personal identification.
In general, the iris recognition process is divided into four steps: iris image acquisition, image preprocessing, feature extraction, and feature matching.
1) Iris image acquisition
Iris image acquisition means photographing the whole eye with dedicated digital camera equipment and transferring the captured image through an image capture card to a computer for storage. Acquiring the iris image is the first step in iris recognition and also a relatively difficult one, requiring the combined application of optical, mechanical, and electronic technology. Because the human eye is small, meeting the image resolution required by the recognition algorithm means increasing the magnification of the optical system, which reduces the depth of field of iris imaging; existing iris recognition systems therefore require the user to stop at a suitable position and gaze at the lens ("Stop and Stare"). In addition, the irises of Asian people are darker in color, so a recognizable iris image cannot be captured with an ordinary camera. Unlike the acquisition of biometric images such as faces or gait, iris image acquisition requires a properly designed optical system with the necessary light source and electronic control unit. Because the technical threshold for independently developing iris image acquisition devices is high, domestic iris recognition research has been limited. The Institute of Automation, Chinese Academy of Sciences, developed the first iris image acquisition system in China with independent intellectual property rights in 1999; it is compact, flexible, and low in cost, and produces clear images. After continuous updates, the Institute's recently developed iris imager can capture qualified iris images at a distance of 20 to 30 cm using techniques such as voice prompts and active visual feedback.
2) Image preprocessing
Image preprocessing is needed because the captured eye image contains much superfluous information and may not meet requirements such as sharpness; it includes operations such as image smoothing, edge detection, and image separation. Iris preprocessing generally comprises three parts: iris localization, iris image normalization, and image enhancement.
A. Iris localization
It is generally accepted that the inner and outer boundaries of the iris can each be approximated by a circle. The inner circle is the boundary between the iris and the pupil, and the outer circle is the boundary between the iris and the sclera, but the two circles are not concentric. The parts of the iris near the upper and lower eyelids are usually occluded by the eyelids, so the boundaries between the iris and the upper and lower eyelids must also be detected in order to determine the valid iris region accurately. These eyelid boundaries can be represented by quadratic curves. The purpose of iris localization is to determine the positions of these circles and quadratic curves in the image. Common localization methods fall roughly into two classes: methods that combine edge detection with the Hough transform, and methods based on edge search. The shared drawback of both is long computation time; improved methods based on these two strategies have appeared, but none has improved speed by an order of magnitude. Localization remains one of the most time-consuming steps in iris recognition.
B. Iris image normalization
The purpose of iris image normalization is to adjust the iris to a fixed size. To date, no accurate mathematical model of how iris texture varies with illumination has been obtained. Researchers in iris recognition therefore mainly normalize the iris image by mapping. If an accurate mathematical model could be built for the way iris texture changes with illumination intensity, or if that process could be approximately simulated, it would greatly help improve the performance of iris recognition systems.
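One common way to normalize by mapping is to unwrap the ring between the pupil boundary and the iris boundary into a rectangle of fixed size. The text does not say which mapping is used, so the sketch below is an assumption: it interpolates linearly between the two boundary circles (which need not be concentric) and samples the nearest pixel. The function name `unwrap_iris` and the grid dimensions are hypothetical.

```python
import math


def unwrap_iris(image, pupil, iris, radial=8, angular=32):
    """Map the iris ring to a fixed radial x angular grid.

    image: list of rows of gray values.
    pupil, iris: (center_x, center_y, radius) of the inner and outer
    boundary circles, assumed to come from a prior localization step.
    """
    if radial < 2 or angular < 1:
        raise ValueError("grid must have at least 2 radial and 1 angular samples")
    height, width = len(image), len(image[0])
    grid = []
    for i in range(radial):
        r = i / (radial - 1)  # 0 on the pupil boundary, 1 on the iris boundary
        row = []
        for j in range(angular):
            theta = 2 * math.pi * j / angular
            # Points on the inner and outer boundaries at this angle.
            x_in = pupil[0] + pupil[2] * math.cos(theta)
            y_in = pupil[1] + pupil[2] * math.sin(theta)
            x_out = iris[0] + iris[2] * math.cos(theta)
            y_out = iris[1] + iris[2] * math.sin(theta)
            # Linear interpolation between the two boundary points.
            x = (1 - r) * x_in + r * x_out
            y = (1 - r) * y_in + r * y_out
            # Nearest-pixel sampling, clamped to the image bounds.
            col = min(max(round(x), 0), width - 1)
            line = min(max(round(y), 0), height - 1)
            row.append(image[line][col])
        grid.append(row)
    return grid
```

Whatever the size of the iris in the captured image, the output grid always has the same dimensions, which is what allows feature codes from different captures to be compared position by position.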
C. Image enhancement
The purpose of image enhancement is to address the low contrast of the normalized image caused by uneven illumination of the eye image. To improve the recognition rate, the normalized image needs to be enhanced.
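The text does not name a specific enhancement technique. As one illustration of raising contrast, the sketch below applies global histogram equalization to a flat list of integer gray levels; the choice of technique and the function name `equalize` are assumptions, not the method of the described system.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    if not pixels:
        return []
    # Count how often each gray level occurs.
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    # Cumulative distribution: number of pixels at or below each level.
    cumulative, running = [], 0
    for count in histogram:
        running += count
        cumulative.append(running)
    lowest = next(c for c in cumulative if c > 0)
    total = len(pixels)
    if total == lowest:
        # Only one gray level is present, so there is nothing to stretch.
        return list(pixels)
    scale = (levels - 1) / (total - lowest)
    return [round((cumulative[value] - lowest) * scale) for value in pixels]
```

A narrow band of gray levels is spread across the full range, so texture that was barely distinguishable after normalization becomes easier to separate during feature extraction.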
3) Feature extraction
Feature extraction means extracting distinctive feature points from the separated iris image with a suitable algorithm and encoding them. Mainstream iris feature extraction and recognition methods can be divided into eight broad classes:
A. based on the method for image
Iris image is regarded as the Quantity Field of two dimension, grey scale pixel value just forms joint distribution, and the correlativity between image array has just measured similarity.
B. based on the method for phase place
This method is thought and as the positional information of " events " such as point, line, edges, is mostly included in the material particular in image in phase place, so give up the amplitude information of reflection intensity of illumination and contrast when feature extraction.
C. based on the method for singular point
Singular point in iris image divides two kinds:
Zero crossing
Extreme point
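Both kinds of singular point can be located on a one-dimensional signal, such as a filtered row of a normalized iris image, with simple neighbor comparisons. The sketch below is illustrative only; the function names are hypothetical, and how the signal is produced from the image is left open because the text does not specify it.

```python
def zero_crossings(signal):
    """Indices i where the signal changes sign between sample i and i + 1."""
    return [i for i in range(len(signal) - 1)
            if signal[i] * signal[i + 1] < 0]


def local_extrema(signal):
    """Indices of strict local maxima or minima (interior samples only)."""
    return [i for i in range(1, len(signal) - 1)
            if (signal[i] - signal[i - 1]) * (signal[i + 1] - signal[i]) < 0]
```

For example, in the signal `[1, 3, -2, -4, 1, 2]` the sign changes after indices 1 and 3, and the samples at indices 1 and 3 are a local maximum and a local minimum respectively.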
D. based on the method for hyperchannel texture filtering statistical nature
Iris image can regard 2 d texture as, and different scale in a frequency domain and direction having the strong statistical nature of distinction can for identifying, this is also method conventional in texture analysis.
E. based on the method for frequency domain decomposition coefficient
Image can be regarded as and be made up of the base in a lot of different frequency and direction, can go deep into having regular information in cognitive map picture by analysis chart picture in the size distribution of each base projection value.
F. based on the method for iris signal shape feature
Iris signal shape feature comprises the information of two aspects:
One is the ups and downs two-dimensional shape information of iris curved surface,
Two is the one dimension shape informations along iris circumference.
G. Direction-based methods
Direction (or orientation) is a relative value; it is robust to changes in illumination and contrast and can describe local gray-level features, making it a form well suited to representing iris image features.
H. Subspace-based methods
Subspace methods find a set of optimal basis vectors on a large training data set according to a defined optimality criterion, and then use the projection coefficients of the original image onto that basis as reduced-dimension image features.
4) Feature matching
Feature matching means comparing and verifying the feature code extracted from the currently captured iris image against the iris feature codes stored in advance in the database, thereby achieving identification.
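A minimal sketch of comparing feature codes follows. It assumes that each iris feature code is a fixed-length string of bits and that two codes are compared by the fraction of differing bits (a normalized Hamming distance); the text does not specify the code format or the distance measure, and the function names and the threshold of 0.32 are illustrative assumptions.

```python
def hamming_fraction(code_a, code_b):
    """Fraction of positions at which two equal-length bit strings differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and the same length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def match_iris(probe, database, max_fraction=0.32):
    """Return the identity ID whose stored code is closest to the probe,
    or None if even the closest code differs in too many bits.

    max_fraction is an assumed decision threshold, not a value from the text.
    """
    best_id, best_fraction = None, 1.0
    for identity_id, stored in database.items():
        fraction = hamming_fraction(probe, stored)
        if fraction < best_fraction:
            best_id, best_fraction = identity_id, fraction
    return best_id if best_fraction <= max_fraction else None
```

Two codes produced from unrelated irises are expected to differ in roughly half of their bits, so a threshold well below 0.5 separates genuine matches from chance agreement.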
Preferably, biometric identification technology is used during capture. Biometric identification technology uses a computer to identify a person by the physiological or behavioral characteristics inherent to the human body.
S2-2: the video information obtained in step S2-1 is compared one by one with the video information in the video information database to perform identity video recognition. This yields video identification data that indicates whether the database contains video information matching that of the instruction-issuing end and, if it does, the identity ID associated with the matching video information in the database.
The video information database stores the video information of all authorized operators with operating authority. It will be understood that an authorized operator with operating authority is an expert with operating authority, so the database stores the video information of all such experts. When the video information obtained by video capture at the instruction-issuing end is a face image, the video information in the database includes the face images of all authorized operators with operating authority. When it is a pupil image, the database includes the pupil images of all authorized operators with operating authority. When it is an iris image, the database includes the iris images of all authorized operators with operating authority.
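The one-by-one comparison in step S2-2 and the shape of the resulting identification data can be sketched as follows. The record type `IdentificationData`, the function name, and the use of a caller-supplied `is_match` predicate are assumptions made for illustration; the actual matching rule depends on whether face, pupil, or iris images are used.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class IdentificationData:
    """Hypothetical record: whether a match was found, and for whom."""
    matched: bool
    identity_id: Optional[str]


def identify_one_by_one(captured, database: Mapping[str, object],
                        is_match: Callable[[object, object], bool]) -> IdentificationData:
    """Compare the captured information with each database entry in turn."""
    for identity_id, stored in database.items():
        if is_match(captured, stored):
            return IdentificationData(True, identity_id)
    # No authorized operator's stored information matches the capture.
    return IdentificationData(False, None)
```

Recording the no-match outcome explicitly, rather than raising an error, lets a later step treat an unrecognized issuer as one more comparison result to act on.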
Step S3: audio information is acquired at the instruction issuing end, identity audio identification is performed on the basis of the audio information, and audio identification data comprising an identity ID are obtained.
Preferably, in the present embodiment, step S3 of the audio- and video-based identity authentication method for nuclear power stations comprises step S3-1 and step S3-2.
S3-1: audio capture is performed at the instruction issuing end to obtain audio information;
S3-2: the audio information obtained in step S3-1 is compared one by one with the audio information in the audio information database to perform identity audio identification, and audio identification data are obtained. The audio identification data indicate whether the audio information database contains audio information matching that of the instruction issuing end and, where a match exists, include the identity ID of the matching audio information in the audio information database.
Optionally, in step S3-1 above, the instruction issuing end performs audio acquisition to obtain the audio information. That is, the audio information is captured first, identity audio identification is then performed on the basis of the audio information, and audio identification data comprising an identity ID are obtained. It will be understood that audio recognition technology is used when the audio information is obtained.
Likewise, the audio information of all authorized operators with operating authority is stored in the audio information database. It will be understood that the authorized operators with operating authority are experts with operating authority, so the audio information database stores the audio information of all experts.
Step S4: the identity ID in the video identification data is compared with the identity ID in the audio identification data to obtain a comparison result, an instruction authority is determined from the comparison result, and the instruction is selectively sent to the command reception end according to the instruction authority.
Preferably, in the present embodiment of the audio- and video-based identity authentication method for nuclear power stations, step S4 further comprises step S4-1, step S4-2 and step S4-3.
S4-1: the identity ID in the video identification data is compared with the identity ID in the audio identification data to obtain a comparison result indicating whether the identity IDs confirmed by the video identification data and the audio identification data match. If they match, step S4-2 is performed; if they do not match, the instruction authority is determined to be no authority and the instruction is not sent to the command reception end.
Specifically: first, after the identity ID in the video identification data and the identity ID in the audio identification data are compared, a match means the following. The instruction issuer identified by video identification is one of the authorized operators with operating authority stored in the video information database, that is, an expert. At the same time, the instruction issuer identified by audio identification is one of the authorized operators with operating authority stored in the audio information database, that is, also an expert. Moreover, the instruction issuer identified by video identification and the instruction issuer identified by audio identification are the same person. The next step, analyzing the authority of the identified instruction issuer, can therefore proceed. Second, if the two identity IDs do not match after comparison, the instruction issuer identified by video identification and the instruction issuer identified by audio identification are not the same person, and one of the following four situations may apply:
1) the instruction issuer identified by video identification is one of the authorized operators with operating authority stored in the video information database, that is, an expert, whereas the instruction issuer identified by audio identification is not one of the authorized operators with operating authority stored in the audio information database, that is, not an expert;
2) the instruction issuer identified by audio identification is one of the authorized operators with operating authority stored in the audio information database, that is, an expert, whereas the instruction issuer identified by video identification is not one of the authorized operators with operating authority stored in the video information database, that is, not an expert;
3) the instruction issuer identified by video identification is not one of the authorized operators with operating authority stored in the video information database, that is, not an expert, and the instruction issuer identified by audio identification is likewise not one of the authorized operators with operating authority stored in the audio information database, that is, not an expert either;
4) the instruction issuer identified by video identification is one of the authorized operators with operating authority stored in the video information database, that is, an expert, but the instruction issuer identified by audio identification is a different authorized operator with operating authority stored in the audio information database, that is, another expert.
In all four of the above situations, the instruction authority of the instruction issuer is determined to be no authority, and the instruction of the instruction issuer is not sent to the command reception end. Optionally, in this case the instruction of the instruction issuer may also simply be discarded.
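The comparison of step S4-1 and the four non-matching situations can be sketched as below. The sketch assumes that an identification that found no authorized operator is represented by `None`; the function name `compare_identities` and the reason strings are hypothetical and serve only to label the cases.

```python
from typing import Optional, Tuple


def compare_identities(video_id: Optional[str],
                       audio_id: Optional[str]) -> Tuple[bool, Optional[str], str]:
    """Compare the identity IDs from video and audio identification.

    Returns (matched, confirmed identity ID, reason). None stands for
    "no authorized operator was identified" (an assumption of this sketch).
    """
    if video_id is not None and video_id == audio_id:
        return True, video_id, "same expert identified by video and audio"
    if video_id is not None and audio_id is None:
        reason = "situation 1: expert by video, not an expert by audio"
    elif video_id is None and audio_id is not None:
        reason = "situation 2: expert by audio, not an expert by video"
    elif video_id is None and audio_id is None:
        reason = "situation 3: not an expert by video or by audio"
    else:
        reason = "situation 4: video and audio identify different experts"
    # In every non-matching situation the instruction authority is "no authority".
    return False, None, reason
```

Only the matching case yields an identity ID that can be passed on to authority verification; all other cases end with the instruction being withheld or discarded.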
S4-2: the identity ID in the comparison result is compared one by one with the authority information in the authority information database to perform authentication, and an instruction authority indicating whether the instruction issuing end has operating authority is obtained. If the instruction authority is authorized, the instruction is sent to the command reception end; if the instruction authority is no authority, the instruction is not sent to the command reception end.
The authority information database stores the authority information of all authorized operators with operating authority. It will be understood that the authorized operators with operating authority are experts with operating authority, so the authority information database stores the authority information of all experts.
Optionally, the instruction may be transmitted to the command reception end as speech, or may be converted into text before being transmitted to the command reception end, so that it can be understood in a noisy on-site environment.
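Step S4-2 can be sketched as a lookup followed by a conditional send. The structure of the authority information is not specified in the description; the sketch assumes a mapping from identity ID to a flag indicating operating authority. The names `determine_authority` and `dispatch`, the constants, and the `send` callback standing in for transmission to the command reception end are all hypothetical.

```python
from typing import Callable, Dict, Optional

AUTHORIZED = "authorized"
NO_AUTHORITY = "no authority"


def determine_authority(identity_id: Optional[str],
                        authority_db: Dict[str, bool]) -> str:
    """Look up the confirmed identity ID in the authority information database."""
    if identity_id is not None and authority_db.get(identity_id, False):
        return AUTHORIZED
    return NO_AUTHORITY


def dispatch(instruction: str,
             identity_id: Optional[str],
             authority_db: Dict[str, bool],
             send: Callable[[str], None]) -> str:
    """Send the instruction only when the issuer has operating authority."""
    authority = determine_authority(identity_id, authority_db)
    if authority == AUTHORIZED:
        send(instruction)  # speech or text form is decided by the caller
    return authority
```

Keeping the transmission behind a callback leaves open whether the instruction is forwarded as speech or as text, both of which the description allows.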
S4-3: the authentication process of steps S1 to S4-2 and the instruction authority obtained by the authentication process are stored.
Optionally, step S4 may comprise steps S4-1, S4-2 and S4-3, or may omit step S4-3 and comprise only steps S4-1 and S4-2. When step S4-3 is omitted, the authentication process of steps S1 to S4-2 and the instruction authority obtained by the authentication process are not stored.
In step S4-3, storing the authentication process of steps S1 to S4-2 means storing the instruction issuing process in step S1, the video identification process in step S2, the audio identification process in step S3, the identity information comparison process in step S4-1 and the authentication process in step S4-2. It will be understood that saving the video identification process in step S2 and the audio identification process in step S3 provides a backup record that can be consulted later when needed. Likewise, saving the instruction authority obtained by the authentication process preserves each authentication result obtained through steps S1 to S4-2 as a backup record that can be consulted when needed.
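A minimal sketch of the record keeping in step S4-3 follows. The description does not specify a storage format; the sketch assumes one JSON line per authentication, appended to a list that stands in for persistent storage. The function name `store_authentication_record` and all field names are hypothetical.

```python
import json
import time
from typing import List, Optional


def store_authentication_record(log: List[str], *,
                                instruction: str,
                                video_id: Optional[str],
                                audio_id: Optional[str],
                                matched: bool,
                                authority: str,
                                timestamp: Optional[float] = None) -> dict:
    """Append one authentication record to the log and return it.

    `log` stands in for persistent storage (an assumption of this sketch).
    """
    record = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "instruction": instruction,   # step S1: instruction issuing
        "video_id": video_id,         # step S2: video identification
        "audio_id": audio_id,         # step S3: audio identification
        "matched": matched,           # step S4-1: identity comparison
        "authority": authority,       # step S4-2: instruction authority
    }
    log.append(json.dumps(record, ensure_ascii=False))
    return record
```

Because each record is self-contained, the log can be consulted afterwards without replaying the authentication itself.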
Fig. 2 shows an audio- and video-based identity authentication system for nuclear power stations according to embodiment two of the present invention. The system is used to authenticate the identity of the instruction issuing end in a nuclear power plant environment and to selectively send the instruction to the command reception end according to the corresponding authority, so that field service personnel at the command reception end perform the corresponding inspection operation according to the received instruction. In the present embodiment, the audio- and video-based identity authentication system for nuclear power stations comprises a video identification device 10, a speech recognition device 30 and an authentication device 50.
The video identification device 10 is used to acquire video information at the instruction issuing end, perform identity video identification on the basis of the video information, and obtain video identification data comprising an identity ID. The video identification device 10 comprises a video capture module 11, a video information storage module 13 and a video identification module 15.
The video capture module 11 is used to perform video capture at the instruction issuing end to obtain video information.
Optionally, the video capture module 11 may perform video capture at the instruction issuing end in a number of ways. Preferably, the video information captured by the video capture module 11 at the instruction issuing end may be a face image; the video capture module 11 may be a webcam, a video camera, a still camera, an infrared camera or the like, provided that it can capture a face image. Preferably, the video information captured by the video capture module 11 at the instruction issuing end may also be a pupil image; the video capture module 11 may be a webcam, a video camera, a still camera, an infrared camera or the like, provided that it can capture a pupil image. Preferably, the video information captured by the video capture module 11 at the instruction issuing end may also be an iris image; the video capture module 11 may be a webcam, a video camera, a still camera, an infrared camera or the like, provided that it can capture an iris image. Optionally, the video information captured by the video capture module 11 at the instruction issuing end may also be an image of another type, provided that it is captured by video.
Preferably, the video capture module 11 is used to capture a face image and obtain video information comprising the face image; the video information in the video information database comprises the face images of all authorized operators with operating authority. It will be understood that, in the present embodiment, face recognition technology is used when the video capture module 11 captures the face image.
As one of the most important visual patterns in images and video, the human face carries a great deal of information. Through face recognition, basic information about a person, such as sex, expression, age and identity, can be obtained quickly. Face recognition technology is already used in real life in public security, finance, network security, property management, attendance recording and other fields, and therefore has very high scientific research value and commercial value.
Face recognition technology, as a very important branch of biometric recognition, has become a very active research field in computer vision and pattern recognition. Video-based face detection in particular is a difficult problem in academia. Digital video is the extension of digital images along the time axis; each frame of a video can be regarded as a still image, so the video can be processed first and face detection then performed on the images. In the late 1990s, face recognition technology came to prominence thanks to the rapid increase in computer processing speed and revolutionary improvements in pattern recognition algorithms. Face recognition technology has attracted wide attention for its unique convenience, economy and accuracy.
Face recognition technology comprises face detection, face tracking and face comparison. Face detection means judging whether a face image is present in a dynamic scene with a complex background and separating out the face image. Face tracking means dynamically tracking the detected face image as a target. Face comparison means confirming the identity of the detected face image or searching for it as a target in a face image library.
Specifically, face detection methods include the reference template method, the face rule method, the sample learning method, the skin color model method and the eigenface method. The reference template method first designs one or several standard face templates, then computes the degree of matching between a test sample and the standard templates, and decides by computer comparison whether a face is present. Because faces have certain structural distribution features, the face rule method extracts these features and generates corresponding rules to judge whether a test sample contains a face. The sample learning method uses the artificial neural network approach from pattern recognition, producing a classifier by learning from a set of face samples and a set of non-face samples. The skin color model method detects faces on the basis that facial skin color is relatively concentrated in color space. The eigenface method treats the set of all face images as a face subspace and judges whether a face is present from the distance between a test sample and its projection onto that subspace.
The above methods may also be combined in a practical system. Face tracking generally uses a model-based method or a method combining motion with a model; skin color model tracking is also a simple and effective approach.
Face comparison is in essence the successive comparison of a sampled face image with the stored face images to find the best match. The way a face image is described therefore determines the specific method and performance of face recognition. At present there are two main description methods: feature vectors and faceprint templates.
The feature vector method first determines attributes such as the size, position, distance and angle of facial features such as the irises, the wings of the nose and the corners of the mouth, and then computes their geometric feature quantities; these feature quantities form a feature vector describing the face image.
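As an illustration of a geometric feature vector, the following sketch computes all pairwise distances between a few facial landmarks and divides them by the distance between the eyes so that the result does not depend on image scale. The landmark names, the choice of pairwise distances as features and the function name `geometric_feature_vector` are assumptions made for this example; the description does not prescribe them.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def geometric_feature_vector(landmarks: Dict[str, Point]) -> List[float]:
    """Pairwise landmark distances, normalized by the inter-eye distance.

    Requires the (assumed) keys "left_eye" and "right_eye".
    """
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    if scale == 0:
        raise ValueError("eye landmarks must not coincide")
    names = sorted(landmarks)  # fixed order so vectors are comparable
    return [
        math.dist(landmarks[a], landmarks[b]) / scale
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
```

Two vectors produced this way can then be compared with any distance measure; angles or size ratios could be appended in the same manner.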
The faceprint template method stores a number of standard face image templates or facial organ templates in a library; during comparison, all pixels of the sampled face image are matched against all templates in the library using a normalized correlation measure. In addition, there are methods that use autocorrelation networks from pattern recognition or that combine features with templates.
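Matching with a normalized correlation measure can be sketched as follows, assuming that the sampled face image and every template have already been flattened into equal-length lists of pixel intensities. The function names `normalized_correlation` and `best_template` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized correlation of two equal-length pixel lists.

    Returns a value in [-1, 1]; 0.0 is returned if either input is constant.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_template(sample: Sequence[float],
                  templates: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Compare the sample with every template and return the best (name, score)."""
    scored = ((name, normalized_correlation(sample, t)) for name, t in templates.items())
    return max(scored, key=lambda pair: pair[1])
```

Because the mean is removed and the result is divided by the magnitudes, a uniform change in brightness or contrast of the sampled image does not change the score.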
The face recognition system uses the Local Feature Analysis (LFA) algorithm. This algorithm is fast, has a low false recognition rate and requires no training; it uses data such as the orientation, proportions and corresponding geometric relationships of the facial organs and characteristic parts to form identification parameters, which are compared with all the original parameters in the database to judge and confirm identity. The essential element of any face recognition system is how it encodes a face image. Face recognition technology uses LFA to describe the face image, on a principle of local statistics similar to assembling building blocks. LFA is a computational method based on the fact that all face images, including the most complex variations, can be synthesized from a subset of structural units that cannot be simplified further. These units are formed using sophisticated statistical techniques and represent the whole face image; they usually span multiple pixels within a local region and represent general facial shapes, but they are not facial features in the ordinary sense. In fact there are many more facial structural units than there are parts of the face. However, synthesizing a realistic, accurate face image requires only a small subset of the available units (12 to 40 feature units). Identity is determined not only by the characteristic units but also by their geometry (for example their relative positions). In this way, LFA maps an individual's characteristics to a complex numerical expression that can be compared and identified.
For example, the workflow of a ticketing system is as follows. First, face images are automatically searched for in the video data stream; when a user's head appears, several types of matching algorithm are automatically used to judge whether a face is really present at that position. These algorithms can accurately detect several faces appearing at the same time and determine their exact positions. Once a face is detected, its image is separated from the background and then undergoes a series of special processing steps to normalize its size, lighting, expression and pose. The system then converts this face image internally into a faceprint containing the information peculiar to that face, and compares the faceprint obtained in real time with the faceprints already in the database, thereby confirming the identity of the face. The faceprint encoding works from the essential features and shape of the face; it can withstand changes in lighting, skin tone, facial hair, hairstyle, glasses, expression and pose, and is highly reliable, so that it can accurately pick out one person from among a million. The whole process is completed automatically, continuously and in real time, and the system requires only ordinary processing equipment.
A distinguishing characteristic of face recognition technology is that every other biometric identification method requires some cooperative action from the person being identified, whereas face recognition does not, and can be used automatically in covert settings such as surveillance operations by public security departments. When biometric records are kept of people attempting to log in, only a face image allows the person's identity to be verified intuitively and conveniently.
The steps for using face recognition technology are:
1) create face image records: face image files may be captured from a camera or taken from photo files, and faceprint (Faceprint) codes and feature vectors are generated;
2) obtain the current face image: the face image may be captured from a camera or input from a photo, and its faceprint is generated;
3) compare the faceprint code of the current face image with the faceprint codes in the records by retrieval;
4) confirm the identity of the face or propose candidate identities. The whole process is completed automatically, continuously and in real time, and the system requires only ordinary processing equipment.
As the above explanation shows, face recognition technology has many advantages, such as being simple, accurate, economical and readily extensible, and can be widely used in safety verification, monitoring, access control and other areas.
Preferably, the video capture module 11 is used to capture a pupil image and obtain video information comprising the pupil image; the video information in the video information database comprises the pupil images of all authorized operators with operating authority. It will be understood that, in the present embodiment, pupil recognition technology is used when the video capture module 11 captures the pupil image.
The pupil is an important part of the human eyeball, and pupil recognition currently has very broad prospects. By detecting the pupil, the object a person is looking at can be detected, enabling human-computer interaction. Detection of a driver's pupils can assist in judging the driver's mental state. In the field of medical instruments, pupil detection can serve as an auxiliary detection aid that facilitates doctors' treatment work.
Pupil recognition technology is an important problem in the field of computer vision and involves several disciplines, including physiology, artificial intelligence, pattern recognition, computer vision and computer image processing.
Pupil recognition technology originally developed alongside face analysis technologies such as face detection and expression analysis. As early as the 1960s and 1970s, when face recognition (face recognition) was first proposed and required faces to be detected, pupil recognition technology was proposed along with it.
Pupil recognition technology involves several disciplines, including psychology, physiology, artificial intelligence, pattern recognition, computer vision and computer image processing, and is in particular a comprehensive problem combining pattern recognition, image processing and computer vision. In-depth study and thorough solution of this problem will help in analyzing and solving other automatic object recognition problems.
Another important reason why pupil recognition technology is valued is its huge potential application value in fields such as health care, traffic safety, public security, military affairs and criminal investigation. Several examples of practical applications are briefly given here:
1) in medical care, the condition of a person's pupils can be monitored by detection and tracking in order to judge the person's mental state, for example whether it is normal;
2) in traffic safety, the pupils of a driver can be detected and tracked, and pupil activity used to judge whether the driver is fatigued; this can not only prevent traffic accidents but also save unnecessary manpower and financial resources, indirectly improving economic efficiency;
3) when a suspect is interrogated, the suspect's pupils can be monitored and tracked to judge his or her psychological state and keep the initiative in the interrogation, and the truthfulness of the statement can also be judged from pupil activity;
4) in next-generation human-machine interface technology, input by blinking may replace the current mouse click.
The main process of pupil recognition technology can be divided into three parts: face detection, pupil detection and pupil tracking. A camera captures face images in real time as the input to the whole pupil detection and tracking system. Video images are acquired with an image capture card to obtain an image sequence. Skin color detection is first performed on each captured frame to determine whether a face is present and to estimate the approximate face region; the pupil position is then detected within the face region and tracked. In the pupil tracking stage, each frame is checked to confirm that tracking is correct; if the tracking error exceeds a specified threshold, the face must be detected again and the pupil position determined before tracking resumes.
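The control flow of detection, tracking and re-detection can be sketched as below. The sketch only shows the loop; the actual face detector, pupil detector and tracker are passed in as functions, and the names `track_pupils`, `detect_face`, `detect_pupil`, `track_pupil` and `max_error` are hypothetical placeholders rather than parts of the described system.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Position = Tuple[float, float]


def track_pupils(frames: Iterable[Any],
                 detect_face: Callable[[Any], Optional[Any]],
                 detect_pupil: Callable[[Any, Any], Optional[Position]],
                 track_pupil: Callable[[Any, Position], Tuple[Position, float]],
                 max_error: float) -> List[Optional[Position]]:
    """Return one pupil position (or None) per frame.

    Tracking continues from the previous position while the reported
    tracking error stays within max_error; otherwise the face and the
    pupil are detected again from scratch.
    """
    positions: List[Optional[Position]] = []
    previous: Optional[Position] = None
    for frame in frames:
        current: Optional[Position] = None
        if previous is not None:
            candidate, error = track_pupil(frame, previous)
            if error <= max_error:
                current = candidate
        if current is None:  # first frame, lost track, or error too large
            region = detect_face(frame)
            if region is not None:
                current = detect_pupil(frame, region)
        positions.append(current)
        previous = current
    return positions
```

Falling back to full detection only when tracking fails keeps the per-frame cost low while still recovering from tracking errors.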
Preferably, the video capture module 11 is used to capture an iris image and obtain video information comprising the iris image; the video information in the video information database comprises the iris images of all authorized operators with operating authority. It will be understood that, in the present embodiment, iris recognition technology is used when the video capture module 11 captures the iris image.
The external appearance of the eye consists of three parts: the sclera, the iris and the pupil. The sclera is the white part around the periphery of the eyeball and accounts for 30% of the total area; the pupil, at the center of the eye, accounts for 5%; the iris lies between the sclera and the pupil, contains the richest texture information and accounts for 65%. In appearance, the iris is made up of many crypts, folds, pigment spots and the like, and is one of the most distinctive structures in the human body. The formation of the iris is determined by genes; human gene expression determines the form, physiology, color and overall appearance of the iris. By about the eighth month of development, the iris has essentially grown to its full size and enters a relatively stable period. Except in rare abnormal conditions, or where major physical or mental trauma changes the appearance of the iris, the iris pattern can remain almost unchanged for decades. On the other hand, the iris is externally visible yet at the same time belongs to the internal tissue, lying behind the cornea. Changing the appearance of the iris would require very delicate surgery and carry the risk of visual impairment. The high uniqueness, stability and unalterable nature of the iris form the material basis on which the iris can be used for identity verification.
The iris has many inherent advantages as an identity marker:
1) uniqueness: the iris image contains many randomly distributed detail features, which give the iris pattern its uniqueness. The iris phase features proposed by Professor John Daugman of the University of Cambridge, UK, confirm that the iris image has 244 independent degrees of freedom, that is, an average information content of 3.2 bits per square millimeter. Since extracting image features by pattern recognition is in fact a lossy compression process, the information capacity of the iris texture can be expected to be much larger than this. Moreover, the detail features of the iris are determined mainly by random factors in the environment of embryonic development, so that significant differences exist even between the iris images of clones, of twins, and of the left and right eyes of the same person.
The uniqueness of the iris lays the foundation for high-precision identification. Test results from the United Kingdom National Physical Laboratory show that iris recognition has the lowest error rate of the various biometric recognition methods.
2) stability: the iris begins to form in the third month of the embryonic stage, and its main texture structure has taken shape by the eighth month. Unless the eye undergoes surgery that endangers it, the iris remains almost unchanged for life thereafter. Thanks to the protection of the cornea, the fully developed iris is not easily injured by external factors.
3) non-contact: the iris is an externally visible internal organ, and a qualified iris image can be obtained without close contact with the acquisition device. This is cleaner and more hygienic than biometric features such as fingerprints and hand shape, which require contact sensing, and does not soil the imaging device or affect the identification of other people.
4) ease of signal processing: the regions adjacent to the iris in an eye image are the pupil and the sclera, which differ from the iris region by distinct gray-level steps, and the region boundaries are all close to circular, so the iris region is easy to fit, segment and normalize. The structure of the iris lends itself to a pattern representation that is invariant to translation, scaling and rotation.
5) good resistance to forgery: the radius of the iris is small, and the irises of Chinese people appear dark brown under visible light with no visible texture, so acquiring an image with clear iris texture requires a special iris image acquisition device and the cooperation of the user; it is therefore difficult in ordinary circumstances to steal another person's iris image. In addition, the eye has many optical and physiological properties that can be used for liveness detection of the iris.
Among all biometric identification technologies, including fingerprint recognition, iris recognition is currently the most convenient and accurate to apply. Iris recognition technology is widely regarded as the most promising biometric technology of the 21st century, and future applications in fields such as security, national defense and e-commerce will inevitably center on iris recognition technology. This trend has begun to show in various applications around the world, and the market prospects are very broad.
Iris recognition determines a person's identity from the similarity between iris image features. Its core is to describe and match the iris features of the eye using methods such as pattern recognition and image processing, thereby achieving automatic personal identification.
In general, the process of iris recognition technology is divided into four steps: iris image acquisition, image preprocessing, feature extraction and feature matching.
1) Iris image acquisition
Iris image acquisition means photographing the person's whole eye with specific digital imaging equipment and transferring the captured image through an image capture card to a computer for storage. Acquiring the iris image is the first step in iris recognition and also a relatively difficult one, requiring the combined application of optical, mechanical and electronic technologies. Because the human eye is small in area, the magnification of the optical system must be raised to satisfy the image resolution required by the recognition algorithm, which makes the depth of field of iris imaging shallow; existing iris recognition systems therefore require the user to stop at a suitable position while gazing at the lens (Stop and Stare). In addition, the irises of Asian people are dark in color, and recognizable iris images cannot be captured with an ordinary camera. Unlike image acquisition for biometric features such as the face and gait, iris image acquisition requires a properly designed optical system with the necessary light sources and electronic control unit. Because the technical threshold for independently developing iris image acquisition devices is high, domestic research on iris recognition has been constrained. The Institute of Automation of the Chinese Academy of Sciences developed the first domestic iris image acquisition system with independent intellectual property rights in 1999; it is compact, flexible and low in cost and produces clear images. After continuous updating, the iris imager recently developed by the Institute of Automation can capture qualified iris images at a distance of 20 to 30 cm using technologies such as voice prompts and active vision feedback.
2) Image preprocessing
Image preprocessing is needed because the captured eye image contains much redundant information and may not meet requirements such as sharpness; preprocessing operations including image smoothing, edge detection and image separation are therefore applied to it. Iris preprocessing generally comprises three parts: iris localization, iris image normalization and image enhancement.
A. Iris localization
It is generally accepted that the inner and outer boundaries of the iris can each be approximated by a circle. The inner circle represents the boundary between the iris and the pupil, and the outer circle represents the boundary between the iris and the sclera; the two circles are not concentric. The parts of the iris near the upper and lower eyelids are usually occluded by the eyelids, so the boundaries between the iris and the upper and lower eyelids must also be detected in order to determine the effective iris region accurately. These eyelid boundaries can be represented by quadratic curves. The purpose of iris localization is to determine the positions of these circles and quadratic curves in the image. Common localization methods fall roughly into two classes: methods that combine edge detection with the Hough transform, and methods based on edge search. The shared drawback of both is long computation time. Improved methods based on these two strategies have appeared, but none has raised the speed by an order of magnitude, and localization remains one of the most time-consuming steps in iris recognition.
B. Iris image normalization
The purpose of iris image normalization is to adjust the iris to a fixed size. To date, no accurate mathematical model has been obtained for how the iris texture changes with illumination. Researchers in iris recognition therefore mainly use mapping methods to normalize iris images. If an accurate mathematical model of how the iris texture changes with illumination intensity could be established, or the process approximately simulated, it would greatly help improve the performance of iris recognition systems.
C. Image enhancement
The purpose of image enhancement is to correct the low contrast of the normalized image caused by uneven illumination of the eye image. To improve the recognition rate, the normalized image needs to be enhanced.
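The mapping-based normalization described under B above can be illustrated with a short sketch. It assumes the pupil and iris boundaries have already been located as two circles (not necessarily concentric) and linearly interpolates sampling points between them, which yields a fixed-size grid whatever the iris size. The function name `rubber_sheet_coords` and the grid dimensions are illustrative assumptions; the source does not specify which mapping is used.

```python
import math


def rubber_sheet_coords(pupil, iris, n_radial=8, n_angular=16):
    """Return an n_radial x n_angular grid of (x, y) sampling points.

    pupil and iris are (cx, cy, radius) circles and need not be concentric.
    Row 0 lies on the pupil boundary, the last row on the iris boundary.
    This is an illustrative mapping, not the method prescribed by the source.
    """
    if n_radial < 2 or n_angular < 1:
        raise ValueError("need at least 2 radial and 1 angular samples")
    px, py, pr = pupil
    ix, iy, ir = iris
    grid = []
    for i in range(n_radial):
        r = i / (n_radial - 1)  # normalized radius in [0, 1]
        row = []
        for j in range(n_angular):
            theta = 2.0 * math.pi * j / n_angular
            # Points on the inner and outer boundaries at this angle.
            x_in = px + pr * math.cos(theta)
            y_in = py + pr * math.sin(theta)
            x_out = ix + ir * math.cos(theta)
            y_out = iy + ir * math.sin(theta)
            # Linear interpolation between the two boundaries.
            row.append(((1.0 - r) * x_in + r * x_out,
                        (1.0 - r) * y_in + r * y_out))
        grid.append(row)
    return grid
```

Sampling the eye image at these coordinates (for example with nearest-neighbor lookup) would give a rectangular image of constant size, which is the goal stated for normalization.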
3) Feature extraction
Feature extraction means using some algorithm to extract distinctive feature points from the segmented iris image and encoding them. Mainstream iris feature extraction and recognition methods fall into eight broad classes:
A. Image-based methods
The iris image is treated as a two-dimensional scalar field: the pixel gray values form a joint distribution, and the correlation between image matrices measures their similarity.
B. Phase-based methods
This approach holds that the important details in an image, that is, the position information of "events" such as points, lines, and edges, are mostly contained in the phase. The amplitude information, which reflects illumination intensity and contrast, is therefore discarded during feature extraction.
C. Singular-point-based methods
There are two kinds of singular points in an iris image:
Zero-crossing points
Extreme points
D. Methods based on statistical features of multichannel texture filtering
An iris image can be regarded as a two-dimensional texture. Statistical features at different scales and orientations in the frequency domain are highly discriminative and can be used for recognition; this is also a common approach in texture analysis.
E. Methods based on frequency-domain decomposition coefficients
An image can be regarded as composed of many basis functions of different frequencies and orientations. Analyzing the distribution of the image's projection values on each basis reveals the regular information in the image.
F. Methods based on iris signal shape features
Iris signal shape features comprise two kinds of information:
First, the two-dimensional shape information of the undulating iris surface;
Second, the one-dimensional shape information along the iris circumference.
G. Direction-feature-based methods
Direction (or orientation) is a relative value that is fairly robust to changes in illumination and contrast and can describe local gray-level features, making it a form well suited to representing iris image features.
H. Subspace-based methods
Subspace methods find a set of optimal bases on a fairly large training data set according to a defined optimality criterion, then use the projection coefficients of the original image on these bases as dimensionality-reduced image features.
4) Feature matching
Feature matching means comparing and verifying the feature code extracted from the currently acquired iris image against the iris feature codes stored in advance in the database, thereby achieving identification.
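As a hedged illustration of feature matching, the sketch below compares binary feature codes by their normalized Hamming distance and accepts the closest enrolled identity only when the distance falls under a threshold. The binary code representation, the names `hamming_distance` and `match_iris`, and the threshold of 0.32 are assumptions chosen for illustration; the source does not state which code format or distance measure is used.

```python
def hamming_distance(code_a, code_b):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and of equal length")
    return sum(a != b for a, b in zip(code_a, code_b)) / len(code_a)


def match_iris(probe, database, threshold=0.32):
    """Compare the probe code with every stored code, one by one.

    database maps an identity ID to its enrolled code (hypothetical layout).
    Returns (identity_id, distance) for the best match, or (None, distance)
    when even the best distance exceeds the illustrative threshold.
    """
    best_id, best_dist = None, float("inf")
    for identity_id, stored in database.items():
        dist = hamming_distance(probe, stored)
        if dist < best_dist:
            best_id, best_dist = identity_id, dist
    if best_dist <= threshold:
        return best_id, best_dist
    return None, best_dist
```

A real system would tune the threshold on enrollment data and would typically also mask bits occluded by eyelids, which this sketch omits.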
Video information memory module 13 stores the video information of all authorized operators with operating rights; together this video information forms a video information database.
The video information of all authorized operators with operating rights is stored in the video information database. Understandably, an authorized operator with operating rights is an expert with operating rights, so the video information database stores the video information of all experts. When the video information captured by Video Capture module 11 at the instruction issuing end is a face image, the video information in the video information database comprises the face images of all authorized operators with operating rights. When the captured video information is a pupil image, the video information in the video information database comprises the pupil images of all authorized operators with operating rights. When the captured video information is an iris image, the video information in the video information database comprises the iris images of all authorized operators with operating rights.
Video identification module 15 compares the video information obtained by Video Capture module 11 one by one with the video information in the video information database to perform video-based identity recognition, and obtains video identification data. The video identification data indicate whether the video information database contains video information matching that from the instruction issuing end and, where a match exists, include the identity ID associated with the matching video information.
Speech recognizing device 30 acquires audio information at the instruction issuing end, performs identity audio identification according to the audio information, and obtains audio identification data comprising the identity ID. Speech recognizing device 30 comprises audio capture module 31, audio information memory module 33, and audio identification module 35.
Audio capture module 31 captures audio at the instruction issuing end to obtain the audio information.
That is, audio capture module 31 first captures the audio information, after which identity audio identification is performed according to that audio information to obtain audio identification data comprising the identity ID. Understandably, audio recognition technology is used when the audio information is acquired.
A voiceprint is the sound-wave spectrum pattern drawn from acoustic characteristics by dedicated electro-acoustic conversion equipment (sound spectrograph, sonagraph, etc.); it is a collection of various acoustic feature patterns. A dictionary definition of "voiceprint" is: a record, made with an instrument, of the sound pattern that differs from person to person. A voiceprint is an "identity card" of the human body and a characteristic signal that is stable over the long term.
Voiceprint recognition is the process of plotting the speech material (sample) of an unknown person and that of a known person as voiceprint patterns using electro-acoustic conversion equipment, then comparing and comprehensively analyzing the acoustic features of the speech on the patterns to determine whether the two come from the same person. Voiceprint recognition has broad application prospects and is being widely used worldwide in finance, securities, social security, public security, the military, and other civil security authentication. The Chinese market is currently still at the start-up stage and has even more room to grow.
In a broad sense, voiceprint recognition is divided into speech recognition and speaker recognition. Speech recognition identifies the sounds, syllables, words, or simple sentences spoken on the basis of the speaker's pronunciation; this requires eliminating the individual characteristics of different speakers and finding the common features that represent each speech unit. Speaker recognition identifies the speaker from the voice without regard to the content or meaning of what is said; this requires isolating the characteristics of each individual.
At present, the term voiceprint recognition generally refers to speaker recognition. Speaker recognition comprises speaker identification and speaker verification. Speaker identification is a one-to-many analysis that determines which of several people spoke a given segment of speech; it is mainly used in criminal investigation, tracking of criminals, national defense monitoring, personalized applications, and so on. Speaker verification is a one-to-one decision that confirms whether a given segment of speech belongs to a specified person; it is mainly used in securities trading, bank transactions, voice-controlled locks for personal computers and automobiles, identity cards, credit cards, and so on. The core of recognition is to enroll voice samples in advance, extract the unique features of each sample, and build a feature database; in use, the voice to be checked is matched against the features in the database, and speaker recognition is achieved through analysis and calculation.
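The difference between one-to-one verification and one-to-many identification can be sketched as follows. The sketch assumes each enrolled speaker is represented by a fixed-length feature vector and uses cosine similarity with an illustrative threshold of 0.9; the vector representation, the function names, and the threshold are assumptions and do not come from the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def verify_speaker(sample, claimed_id, enrolled, threshold=0.9):
    """One-to-one: does the sample belong to the claimed identity?"""
    if claimed_id not in enrolled:
        return False
    return cosine_similarity(sample, enrolled[claimed_id]) >= threshold


def identify_speaker(sample, enrolled, threshold=0.9):
    """One-to-many: which enrolled speaker, if any, produced the sample?"""
    best_id, best_score = None, threshold
    for speaker_id, features in enrolled.items():
        score = cosine_similarity(sample, features)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id
```

Verification touches a single database entry, whereas identification scans all of them, which is why its cost grows with the number of enrolled speakers.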
Principle of voiceprint recognition
1. Feature extraction
Feature extraction extracts the essential features of a voice that reflect individual information. These features must distinguish different speakers accurately and efficiently, and for the same speaker they should be stable.
Current voiceprint recognition systems rely mainly on low-level acoustic features. These acoustic features mainly include the following:
1) spectral envelope feature parameters, obtained by passing the speech through a filter bank and sampling the output at a suitable rate;
2) feature parameters extracted from the physiological structure of the vocal organs, such as the glottis, vocal tract, and nasal cavity, for example pitch contour, formant frequencies and bandwidths, and their trajectories;
3) feature parameters derived from linear prediction, such as linear prediction coefficients, autocorrelation coefficients, and reflection coefficients;
4) auditory feature parameters that simulate how the human ear perceives sound frequency, such as Mel cepstral coefficients and perceptual linear prediction.
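To make item 4) more concrete, the following sketch converts between frequency in hertz and the mel scale and places filter center frequencies evenly on that scale, which is the first step in computing Mel cepstral coefficients. It uses one widely used form of the mel formula, m = 2595 log10(1 + f/700); the source gives no formula, so this choice and the function names are assumptions.

```python
import math


def hz_to_mel(freq_hz):
    """Convert hertz to mels using a common form of the mel formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz, high_hz, n_filters):
    """Center frequencies (Hz) of n_filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (k + 1)) for k in range(n_filters)]
```

Because the spacing is uniform in mels, the filters are dense at low frequencies and sparse at high frequencies, mirroring the ear's finer resolution at low frequencies.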
As the range of applications of voiceprint recognition expands and the accuracy demanded of systems rises, low-level acoustic features alone can no longer meet requirements, and high-level feature information must also be considered, such as speaking rate, grammar, prosody, language, dialect, characteristic pronunciations, characteristic words, and channel (the channel through which the sound is acquired). For such high-level information, the most critical problem is selection, which must be decided case by case. Take the channel feature: in criminal investigation it is preferably not used, that is, the channel should not affect recognition, so that sound obtained by indirect means such as recording can serve as evidence that helps solve a case. In bank transactions it is preferably used, that is, the channel should affect recognition, so that malicious acts such as playing back a recording can be rejected. In voiceprint recognition, therefore, the combination of feature parameters must be chosen according to the actual situation to improve the performance of the real system; better recognition results are obtained when the correlation between the combined parameters is small.
2. Pattern matching
The key to voiceprint recognition technology lies in processing the various acoustic feature parameters and determining the pattern matching method. The main pattern matching methods are:
(1) Probabilistic-statistical method: acoustic information is fairly stationary over short periods. Through statistical analysis of steady-state features such as pitch, glottal gain, and low-order reflection coefficients, statistics such as the mean, variance, and probability density function can be used for classification decisions. This method does not require time alignment of the feature parameters in the time domain and is suited to text-independent voiceprint recognition.
(2) Dynamic time warping method: speaker information contains both stable factors (the structure of the vocal organs, speaking habits, etc.) and time-varying factors (speaking rate, intonation, stress, rhythm, etc.). The recognition template is time-aligned with the reference template, and the similarity between the two templates is obtained according to some distance measure.
(3) Vector quantization method: each person's specific text is encoded into a codebook; during recognition the test text is encoded with this codebook, and the distortion produced by quantization serves as the decision criterion. The method features high recognition accuracy and fast decisions.
(4) Hidden Markov model method: a hidden Markov model is a probabilistic model based on transition probabilities and emission probabilities. It treats speech as a stochastic process composed of a sequence of observable symbols, where the symbol sequence is the output of the state sequence of the speech production system. A speech production model is built for each speaker, and the state transition probability matrix and symbol output probability matrix are obtained by training. During recognition, the maximum probability of the unknown speech over the state transitions is computed, and the decision is made according to the model that yields the maximum probability. This method requires no time alignment, saves computation time and storage at decision time, and is widely used at present; its drawback is the heavy computation required for training.
(5) Artificial neural network method: an artificial neural network simulates the perceptual characteristics of living organisms to some extent. It is a network model with a distributed parallel processing structure that has self-organizing and self-learning capabilities, a strong ability to separate complex class boundaries, and robustness to incomplete information; its performance approximates that of an ideal classifier. Its drawbacks are long training time and weak dynamic time warping ability, and as the number of speakers grows the network may become too large to train.
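Of the methods listed above, dynamic time warping in item (2) is compact enough to sketch directly. The version below aligns two one-dimensional feature sequences and uses the absolute difference as the local distance; real systems would align sequences of feature vectors, so the scalar features and the name `dtw_distance` are simplifying assumptions.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the minimum accumulated distance for aligning the
    first i elements of seq_a with the first j elements of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # step in seq_a only
                                     cost[i][j - 1],      # step in seq_b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

A template spoken at half speed, such as `[1, 1, 2, 2, 3, 3]` against `[1, 2, 3]`, aligns at zero cost, which shows how the method absorbs variations in speaking rate.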
The organs used in speaking (tongue, teeth, larynx, lungs, and nasal cavity) differ greatly from person to person in size and form, which gives each person's voice characteristics that are distinct from others' and stable over a certain period. Although an imitated voice may sound extremely similar to the ordinary human ear, voiceprint recognition technology reveals large differences; however skillful and similar the imitation, voiceprint recognition technology can tell it apart.
Voiceprint recognition technology was proposed in the United States in the mid-20th century. The earliest research on it was carried out by Lawrence Kersta of Bell Laboratories in the United States, who analyzed and identified tens of thousands of voiceprint patterns from more than 100 healthy people with an accuracy of 99.65%. Research on voiceprint recognition in China started late, with formal research beginning only in the 1990s; institutions currently conducting related research include Peking University, Tsinghua University, the Institute of Acoustics of the Chinese Academy of Sciences, and some political and legal departments.
Today, voiceprint recognition technology is applied worldwide in many fields such as finance, securities, criminal investigation, and other civil security authentication systems. In security authentication in particular, voice does not involve privacy concerns and the equipment is inexpensive, so identification by voiceprint is both natural and economical, and user acceptance is relatively high. For example, passwords in banking and securities systems can be replaced by voice: voiceprint technology turns the voice into a key, so that people need neither remember complicated passwords nor carry keys, smart cards, and the like. In cases such as telephone extortion, the evidence most easily obtained is the telephone recording; with voiceprint recognition technology, clues can be obtained from the recording and the time to solve the case shortened. Although most countries do not yet admit voice as valid evidence in court judgments, voice samples are receiving growing attention in criminal investigation and the administration of justice.
Audio information memory module 33 stores the audio information of all authorized operators with operating rights; together this audio information forms an audio information database. Understandably, an authorized operator with operating rights is an expert with operating rights, so the audio information database stores the audio information of all experts.
Audio identification module 35 compares the audio information obtained by audio capture module 31 one by one with the audio information in the audio information database to perform audio-based identity recognition, and obtains audio identification data. The audio identification data indicate whether the audio information database contains audio information matching that from the instruction issuing end and, where a match exists, include the identity ID associated with the matching audio information.
The recognition method of audio identification module 35 is as follows: the identity ID of the video identification data is compared with the identity ID of the audio identification data to obtain a comparison result indicating whether the identity IDs confirmed by the two match. If they do not match, the instruction authority is determined to be no authority and the instruction is not sent to the command reception end. The two outcomes are explained below.
One. If the identity ID of the video identification data and the identity ID of the audio identification data match, this shows that the instruction issuer identified by video identification is one of the authorized operators with operating rights stored in the video information database, that is, an expert; that the instruction issuer identified by audio identification is one of the authorized operators with operating rights stored in the audio information database, that is, also an expert; and that the instruction issuer identified by video identification and the instruction issuer identified by audio identification are the same person. The next step can then proceed, namely analyzing the authority of the identified instruction issuer.
Two. If the instruction issuer identity ID of the video identification data and the instruction issuer identity ID of the audio identification data do not match, this shows that the instruction issuer identified by video identification and the instruction issuer identified by audio identification are not the same person, which may be any one of the following four situations:
1. The instruction issuer identified by video identification is one of the authorized operators with operating rights stored in the video information database, that is, an expert, whereas the instruction issuer identified by audio identification is not one of the authorized operators with operating rights stored in the audio information database, that is, not an expert;
2. The instruction issuer identified by audio identification is one of the authorized operators with operating rights stored in the audio information database, that is, an expert, whereas the instruction issuer identified by video identification is not one of the authorized operators with operating rights stored in the video information database, that is, not an expert;
3. The instruction issuer identified by video identification is not one of the authorized operators with operating rights stored in the video information database, that is, not an expert, and the instruction issuer identified by audio identification is likewise not one of the authorized operators with operating rights stored in the audio information database, that is, also not an expert;
4. The instruction issuer identified by video identification is one of the authorized operators with operating rights stored in the video information database, that is, an expert, but the instruction issuer identified by audio identification is one of the authorized operators with operating rights stored in the audio information database other than the expert identified by video identification, that is, a different expert.
In all four situations above, the instruction authority of the instruction issuer is determined to be no authority, and the issuer's instruction is not sent to the command reception end. Alternatively, the instruction of the instruction issuer may simply be discarded in this case.
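The match case and the four mismatch situations above can be summarized in a small decision function. The sketch assumes that each recognizer reports either an identity ID string or `None` when no database entry matched; the `None` convention, the label strings, and the function names are illustrative assumptions.

```python
from typing import Optional


def classify_comparison(video_id: Optional[str], audio_id: Optional[str]) -> str:
    """Classify the comparison of the two recognized identity IDs.

    None means the corresponding recognizer found no matching database entry.
    """
    if video_id is not None and audio_id is not None:
        # Both recognizers found an expert: same person or two different ones.
        return "match" if video_id == audio_id else "different_experts"
    if video_id is not None:
        return "video_only"   # situation 1: audio did not find an expert
    if audio_id is not None:
        return "audio_only"   # situation 2: video did not find an expert
    return "neither"          # situation 3: no expert found by either


def may_proceed_to_authority_check(video_id: Optional[str],
                                   audio_id: Optional[str]) -> bool:
    """Only a full match proceeds; all four mismatch situations are rejected."""
    return classify_comparison(video_id, audio_id) == "match"
```

Keeping the four mismatch situations distinguishable, even though all of them lead to rejection, can be useful if the outcome is later written to a history record.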
Authentication device 50 compares the identity ID of the video identification data with the identity ID of the audio identification data to obtain a comparison result, determines the instruction authority according to the comparison result, and selectively sends the issued instruction to the command reception end according to the instruction authority. Authentication device 50 comprises contrast module 51, authority information memory module 53, authentication module 55, delivery module 57, and historical record memory module 59.
Contrast module 51 compares the identity ID of the video identification data with the identity ID of the audio identification data to obtain a comparison result indicating whether the identity IDs confirmed by the two match.
Authority information memory module 53 stores the authority information of all authorized operators with operating rights; together this authority information forms an authority information database. Understandably, an authorized operator with operating rights is an expert with operating rights, so the authority information database contains the authority information of all experts.
When the comparison result indicates that the identity IDs confirmed by the video identification data and the audio identification data match, authentication module 55 compares that result one by one with the authority information in the authority information database to perform authentication, and obtains the instruction authority, which indicates whether the instruction issuing end has operating rights.
Delivery module 57 selectively sends the instruction to the command reception end according to the instruction authority.
Historical record memory module 59 stores the verification processes of video identification device 10, speech recognizing device 30, contrast module 51, authority information memory module 53, authentication module 55, and delivery module 57, together with the instruction authority obtained from those processes. Historical record memory module 59 is optional. When it is not provided, the verification processes of video identification device 10, speech recognizing device 30, contrast module 51, authority information memory module 53, authentication module 55, and delivery module 57 and the resulting instruction authority are not stored.
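The cooperation of the modules of authentication device 50 can be sketched as a single class. The sketch assumes the authority information database maps an identity ID to the set of instruction names that expert may issue, and that delivery simply appends to a list; this data layout, the class name `AuthenticationDevice`, and the instruction names are hypothetical and are not specified by the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AuthenticationDevice:
    # Hypothetical authority database: identity ID -> permitted instructions.
    authority_db: Dict[str, Set[str]]
    # Stand-in for the command reception end: delivered (identity, instruction).
    delivered: List[Tuple[str, str]] = field(default_factory=list)
    # Optional history of every decision, in the spirit of module 59.
    history: List[dict] = field(default_factory=list)

    def handle(self, video_id: Optional[str], audio_id: Optional[str],
               instruction: str) -> bool:
        """Compare IDs, check authority, and deliver only when authorized."""
        # Contrast step: both recognizers must agree on one identity.
        confirmed = video_id if (video_id is not None
                                 and video_id == audio_id) else None
        # Authentication step: look up the confirmed identity's authority.
        authorized = (confirmed is not None and
                      instruction in self.authority_db.get(confirmed, set()))
        # Delivery step: forward the instruction only when authorized.
        if confirmed is not None and authorized:
            self.delivered.append((confirmed, instruction))
        # History step: record the outcome whether or not it was delivered.
        self.history.append({"video_id": video_id, "audio_id": audio_id,
                             "instruction": instruction,
                             "authorized": authorized})
        return authorized
```

Dropping the `history` list would correspond to the variant in which historical record memory module 59 is not provided.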
The above is only the preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions falling within the concept of the present invention belong to its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the protection scope of the present invention.