CN107767137A - A kind of information processing method, device and terminal - Google Patents

A kind of information processing method, device and terminal Download PDF

Info

Publication number
CN107767137A
CN107767137A (application CN201610712317.7A)
Authority
CN
China
Prior art keywords
information
mouth
acoustic
facial image
positional information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610712317.7A
Other languages
Chinese (zh)
Inventor
饶凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201610712317.7A priority Critical patent/CN107767137A/en
Publication of CN107767137A publication Critical patent/CN107767137A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4014Identity check for transactions
    • G06Q20/40145Biometric identity checks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/30Payment architectures, schemes or protocols characterised by the use of specific devices or networks
    • G06Q20/32Payment architectures, schemes or protocols characterised by the use of specific devices or networks using wireless devices
    • G06Q20/322Aspects of commerce using mobile devices [M-devices]
    • G06Q20/3227Aspects of commerce using mobile devices [M-devices] using secure elements embedded in M-devices

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Image Analysis (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention provide an information processing method, device and terminal. The method includes: during secure payment authentication, acquiring a face image and sound information of the payment operator; obtaining mouth spatial position information from the face image, and sound-source spatial position information from the sound information; and, when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information, passing the authentication of the payment operator. The solution of the present invention improves payment security and reduces the possibility of a phone being deceived by a replayed video.

Description

A kind of information processing method, device and terminal
Technical field
The present invention relates to the field of payment processing, and in particular to an information processing method, device and terminal.
Background technology
At present the mobile payment industry is developing rapidly, but losses occur through lost phones, information leakage and telecom fraud. When a phone performs a sensitive transaction such as a payment, it is therefore particularly important to confirm that the person performing the transaction is the owner of the phone or of the phone number.
In the prior art, various vulnerabilities exist during mobile payment, so payment security risks remain.
The content of the invention
The invention provides an information processing method, device and terminal that use image localization and sound localization to verify that the current payment operator is an operator authorized by the user, thereby improving payment security.
In order to solve the above technical problems, embodiments of the invention provide following scheme:
An information processing method, including:
during secure payment authentication, acquiring a face image and sound information of the payment operator;
obtaining mouth spatial position information from the face image, and obtaining sound-source spatial position information from the sound information;
when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information, passing the authentication of the payment operator.
Wherein the step of acquiring the face image of the payment operator during secure payment authentication includes:
during secure payment authentication, acquiring the face image of the payment operator collected by a camera device.
Wherein the step of acquiring the sound information of the payment operator during secure payment authentication includes:
during secure payment authentication, acquiring the sound information of the payment operator collected by a sound recording device.
Wherein the step of obtaining mouth spatial position information from the face image includes:
identifying the mouth from the grayscale information of the face image and obtaining the mouth spatial position from the depth information of the face image; or identifying the mouth in several face images and taking the intersection of the line segments through the mouth-position pixels as the mouth spatial position information.
Wherein the step of obtaining sound-source spatial position information from the sound information includes:
obtaining the differences between the arrival times of the sound at multiple microphones;
calculating, from the speed of sound and the time differences, the distance between the sound source and any microphone and the angle to the line joining any two microphones;
obtaining the sound-source spatial position information from the distance and the angle.
Wherein the process of determining that the mouth in the face image is moving includes:
obtaining consecutive frames of the face image;
if the mouth region changes across these frames, judging that the mouth in the face image is moving.
Wherein the process of determining that the mouth spatial position information is consistent with the sound-source spatial position information includes:
given a first sequence of mouth spatial positions at M time points and a second sequence of sound-source spatial positions at the same M time points, determining the two to be consistent when, at N of the time points, the mouth position and the sound-source position coincide or differ by less than a predetermined threshold;
wherein the ratio N/M exceeds a preset value, M is greater than N, and M and N are positive integers.
Embodiments of the invention also provide an information processing device, including:
a first acquisition module, configured to acquire a face image and sound information of the payment operator during secure payment authentication;
a second acquisition module, configured to obtain mouth spatial position information from the face image and sound-source spatial position information from the sound information;
a processing module, configured to pass the authentication of the payment operator when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information.
Wherein the first acquisition module includes:
a first acquisition unit, configured to acquire, during secure payment authentication, the face image of the payment operator collected by a camera device;
a second acquisition unit, configured to acquire, during secure payment authentication, the sound information of the payment operator collected by a sound recording device.
Wherein the second acquisition module includes:
a third acquisition unit, configured to identify the mouth from the grayscale information of the face image and obtain the mouth spatial position from the depth information of the face image, or to identify the mouth in several face images and take the intersection of the line segments through the mouth-position pixels as the mouth spatial position information;
a fourth acquisition unit, configured to obtain the differences between the arrival times of the sound at multiple microphones; to calculate, from the speed of sound and the time differences, the distance between the sound source and any microphone and the angle to the line joining any two microphones; and to obtain the sound-source spatial position information from the distance and the angle.
Wherein the processing module includes:
a first processing unit, configured to obtain consecutive frames of the face image and, if the mouth region changes across these frames, to judge that the mouth in the face image is moving;
a second processing unit, configured to determine that the mouth spatial position information is consistent with the sound-source spatial position information when, in a first sequence of mouth spatial positions at M time points and a second sequence of sound-source spatial positions at the same M time points, the positions at N time points coincide or differ by less than a predetermined threshold; wherein the ratio N/M exceeds a preset value, M is greater than N, and M and N are positive integers.
Embodiments of the invention also provide a terminal including the information processing device described above.
The above scheme of the present invention has at least the following beneficial effects:
by acquiring a face image and sound information of the payment operator during secure payment authentication; obtaining mouth spatial position information from the face image and sound-source spatial position information from the sound information; and passing the authentication only when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information, the scheme ensures that the voice is emitted from the mouth of the current operator and that the face image is that of the current operator, reducing the possibility of the phone being deceived by a replayed video.
Brief description of the drawings
Fig. 1 is a flow chart of the information processing method of the present invention;
Fig. 2 is a block diagram of the information processing device of the present invention;
Fig. 3 is a block diagram of the terminal of the present invention.
Embodiment
Exemplary embodiments of the disclosure are described more fully below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be more thoroughly understood and its scope fully conveyed to those skilled in the art.
Embodiments of the invention authenticate the face image and sound information of the current operator and determine that the mouth in the face image is moving; when the mouth spatial position information is consistent with the sound-source spatial position information, the current payment operator is determined to be a legitimate operator, reducing the possibility of the phone being deceived by a replayed video.
As shown in figure 1, the first embodiment of the present invention provides a kind of information processing method, including:
Step 11: during secure payment authentication, acquire a face image and sound information of the payment operator;
Step 12: obtain mouth spatial position information from the face image, and sound-source spatial position information from the sound information;
Step 13: when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information, pass the authentication of the payment operator.
In this embodiment, the terminal stores legitimate face information and sound information entered by the user in advance. The face information may be, but is not limited to, face images of the user shot from several angles with the front camera, together with face features extracted from those images. The sound information may be, but is not limited to, recordings of the user reading the digits 1 to 9, or voice features such as a voiceprint extracted from the user's sound.
The pre-stored face information and sound information may be kept in a security chip added to the phone, with the user entering the information when buying the phone; or in an application on the phone's SIM card, with the user entering the information when buying the card.
Matching the face image against the pre-stored face information may use, but is not limited to, mature algorithms such as Meanshift or neural networks. Matching the sound information against the pre-stored sound information may, for example, compare voiceprints; if the comparison succeeds, the sound information is considered to match the pre-stored sound information.
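The patent does not fix a voiceprint-comparison algorithm, so as a minimal illustrative sketch (the feature vectors, the `threshold` value and all function names below are assumptions, not part of the patent), the match decision can be modelled as a cosine-similarity test between the enrolled feature vector and a freshly extracted one:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def voiceprint_matches(enrolled, captured, threshold=0.9):
    """Declare a match when similarity exceeds a preset threshold (assumed value)."""
    return cosine_similarity(enrolled, captured) >= threshold

# Toy feature vectors: one close to the enrolled print, one far from it.
enrolled = [0.2, 0.8, 0.1, 0.5]
same_speaker = [0.21, 0.79, 0.12, 0.48]
impostor = [0.9, 0.1, 0.7, 0.02]
```

Any real system would extract the vectors with a dedicated speaker-verification model; only the threshold-comparison structure is the point here.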
In the above embodiment, in step 11, acquiring the face image of the payment operator during secure payment authentication includes: acquiring the face image of the payment operator collected by a camera device, which may be the terminal's front camera or an independent camera, and may in particular be a depth camera.
A face image collected by a depth camera carries both depth information and grayscale information. The depth information may be attribute information representing the spatial and temporal context of the image, such as the time at which the face image was shot and spatial coordinate information; the grayscale information may be, for example, the color and brightness of the image.
Further, in step 11, acquiring the sound information of the payment operator during secure payment authentication includes: acquiring the sound information of the payment operator collected by a sound recording device, which includes but is not limited to sound collection equipment such as a microphone or a microphone array.
In an embodiment of the invention, in step 12, obtaining the mouth spatial position information from the face image includes: identifying the mouth from the grayscale information of the face image and obtaining the mouth spatial position from the depth information of the face image.
Specifically, the mouth of the user may be identified from the grayscale information of the face image (i.e. the color information of an ordinary image), using, but not limited to, mature algorithms such as geometric feature matching or color histogram matching, and the pixel at the mouth center is marked. If the face image is a depth image shot by a depth camera, the depth image contains the three-dimensional coordinates of every observed point, so the spatial position of the mouth-center pixel can be stored directly, yielding the mouth spatial position information.
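The step above reads a 3D position out of a depth image at a marked pixel. As a hedged sketch of how such a back-projection typically works with a standard pinhole camera model (the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the pixel/depth values below are made-up assumptions, not values from the patent):

```python
def depth_pixel_to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project an image pixel (u, v) with measured depth (meters)
    into camera-frame 3D coordinates using the pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a front depth camera and a mouth-center pixel.
fx = fy = 500.0
cx, cy = 320.0, 240.0
mouth_xyz = depth_pixel_to_3d(340, 300, 0.5, fx, fy, cx, cy)
```

The returned triple is the mouth spatial position the comparison in step 13 would consume.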
In another embodiment of the invention, in step 12, obtaining the mouth spatial position information from the face image may also include: identifying the mouth in several face images, and taking the intersection of the line segments through the mouth-position pixels as the mouth spatial position information.
Specifically, three-dimensional coordinates can be reconstructed with two cameras of known relative position, in the same way as human binocular vision. For a point M in space (for example the lips), the first camera produces a first image in which M appears as pixel M1, and a line through M1 can be found in that image; similarly, the second camera produces an image in which M appears as pixel M2, and a line through M2 can be found. A grayscale matching algorithm finds the pixels of the same spatial location in the two images, i.e. the two pixels M1 and M2 of M. The two lines, one per image, contain the image coordinates of M1 and M2 respectively; mapped back to real space, their intersection is exactly the spatial position of M, and this spatial position is unique.
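The two-view reconstruction above can be sketched as ray intersection. Since noisy pixels rarely make two back-projected rays meet exactly, a common approach (an assumption here; the patent does not prescribe one) is to take the midpoint of the closest points of the two rays:

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate(p1, d1, p2, d2):
    """Midpoint of the closest points of rays p1 + t*d1 and p2 + s*d2,
    i.e. their 'intersection' when noise keeps them from meeting exactly."""
    w0 = tuple(x - y for x, y in zip(p1, p2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b            # zero only for parallel rays
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = tuple(p + t * x for p, x in zip(p1, d1))
    q2 = tuple(p + s * x for p, x in zip(p2, d2))
    return tuple((x + y) / 2 for x, y in zip(q1, q2))

# Two cameras 20 cm apart, both seeing a mouth point 50 cm in front.
mouth_point = triangulate((-0.1, 0.0, 0.0), (0.1, 0.0, 0.5),
                          (0.1, 0.0, 0.0), (-0.1, 0.0, 0.5))
```

The camera baseline and ray directions are illustrative; in practice the rays come from calibrated camera matrices.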
In an embodiment of the invention, in step 12, obtaining the sound-source spatial position information from the sound information includes: obtaining the differences between the arrival times of the sound at multiple microphones; calculating, from the speed of sound and the time differences, the distance between the sound source and any microphone and the angle to the line joining any two microphones; and obtaining the sound-source spatial position information from the distance and the angle.
Specifically, with sound information obtained from a microphone array of known geometry, the spatial position of the speaking sound source can be computed using, but not limited to, a triangulation algorithm. The principle of triangulation is as follows: the microphones capture the voice signal and its features are extracted; knowing the speed of sound, the differences between the arrival times at the microphones yield the distance between the sound source and any microphone and the angle to the line joining any two microphones, from which the spatial position of the sound source is calculated.
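The time-difference-of-arrival (TDOA) idea above can be sketched as a brute-force search: pick the candidate position whose predicted inter-microphone delays best match the measured ones. The microphone layout, grid and step size below are made-up assumptions for a 2D toy case, not values from the patent:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def locate_source(mics, arrival_times, x_range, y_range, step):
    """Brute-force TDOA localization in 2D: return the grid point whose
    predicted time differences (relative to mic 0) best match the measured ones."""
    ref = arrival_times[0]
    measured = [t - ref for t in arrival_times]
    best, best_err = None, float("inf")
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    for i in range(nx):
        for j in range(ny):
            cand = (x_range[0] + i * step, y_range[0] + j * step)
            t0 = _dist(cand, mics[0]) / SPEED_OF_SOUND
            err = 0.0
            for m, td in zip(mics, measured):
                pred = _dist(cand, m) / SPEED_OF_SOUND - t0
                err += (pred - td) ** 2
            if err < best_err:
                best, best_err = cand, err
    return best

# Simulated example: three microphones near a phone edge, mouth 30 cm in front.
mics = [(0.0, 0.0), (0.12, 0.0), (0.06, 0.02)]
true_source = (0.06, 0.3)
times = [_dist(true_source, m) / SPEED_OF_SOUND for m in mics]
estimate = locate_source(mics, times, (-0.2, 0.3), (0.0, 0.5), 0.01)
```

Real implementations estimate the delays by cross-correlating the microphone signals and solve the hyperbolic equations in closed form or by least squares; the grid search only makes the geometry visible.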
In an embodiment of the invention, the process of determining that the mouth in the face image is moving includes:
obtaining consecutive frames of the face image; if the mouth region changes across these frames, judging that the mouth in the face image is moving.
Specifically, to detect whether the payment operator is actually speaking during the secure payment authentication period, i.e. whether the mouth is really moving, the frame difference method may be used, but the detection is not limited to it.
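A minimal sketch of the frame difference method on grayscale mouth-region crops (the thresholds below are assumptions; the patent names the method but no parameters):

```python
def mouth_is_moving(frames, pixel_threshold=10, changed_ratio=0.05):
    """Frame-difference check: the mouth is judged to be moving if, in any
    pair of consecutive frames, enough pixels change by more than a threshold.
    Each frame is a grayscale crop as a list of rows of intensity values."""
    for prev, cur in zip(frames, frames[1:]):
        changed = sum(
            1
            for row_p, row_c in zip(prev, cur)
            for p, c in zip(row_p, row_c)
            if abs(p - c) > pixel_threshold
        )
        total = sum(len(row) for row in prev)
        if changed / total >= changed_ratio:
            return True
    return False

# Toy 4x4 crops: a static mouth vs. one whose lower half darkens (opening).
still = [[100] * 4 for _ in range(4)]
open_mouth = [[100] * 4 for _ in range(2)] + [[30] * 4 for _ in range(2)]
```

A replayed still photo fails this check, which is exactly the liveness signal step 13 relies on.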
In an embodiment of the invention, in step 12, the process of determining that the mouth spatial position information is consistent with the sound-source spatial position information includes:
given a first sequence of mouth spatial positions at M time points and a second sequence of sound-source spatial positions at the same M time points, determining the two to be consistent when, at N of the time points, the mouth position and the sound-source position coincide or differ by less than a predetermined threshold;
wherein the ratio N/M exceeds a preset value, M is greater than N, and M and N are positive integers.
Specifically, two sequences of spatial positions are computed from the face image and the sound information; each sequence may consist of, but is not limited to, the positions at 5 time points within the verification period. Whether the voice was emitted from the mouth captured in the current image is then judged: in the first sequence of mouth positions at M time points and the second sequence of sound-source positions at the same M time points, if the positions at N time points coincide or differ by less than a predetermined threshold, the mouth spatial position information and the sound-source spatial position information are determined to be consistent. If the position information at all the time points is credible, the voice is considered to come from the operator in the current image; otherwise it is not.
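The N/M consistency rule above can be sketched directly; the distance threshold and ratio threshold are assumed values, and the two sample sequences are fabricated for illustration:

```python
import math

def positions_consistent(mouth_seq, sound_seq,
                         dist_threshold=0.05, ratio_threshold=0.7):
    """Compare mouth positions with sound-source positions sampled at the
    same M time points; consistent when the fraction N/M of time points
    whose positions lie within dist_threshold exceeds ratio_threshold."""
    assert len(mouth_seq) == len(sound_seq)
    m = len(mouth_seq)
    n = sum(
        1
        for a, b in zip(mouth_seq, sound_seq)
        if math.dist(a, b) <= dist_threshold
    )
    return n / m > ratio_threshold

# 5 time points: a live speaker (one noisy sample) vs. a replayed video
# whose sound source sits off to the side of the imaged mouth.
mouth = [(0.0, 0.0, 0.5)] * 5
live = [(0.0, 0.0, 0.5)] * 4 + [(0.2, 0.0, 0.5)]
replay = [(0.3, 0.1, 0.9)] * 5
```

Tolerating a minority of outlier time points (N < M) is what makes the check robust to localization noise while still rejecting a loudspeaker placed away from the displayed face.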
In a concrete application embodiment of the invention, the security level set by the user can be stored, and the result of assessing the current transaction returned. Different secure payment levels apply different acceptance standards. A possible, but not exclusive, classification into four levels is: A, the lowest security level, where the user needs to pass only one of voice recognition and image recognition, to cope with harsh lighting or an overly noisy environment; B, the intermediate security level, where both voice recognition and image recognition must pass; C, a higher security level, where in addition to matching the voice and face features, the articulation position located from the sound must be consistent with the mouth position located from the image, and the frame difference result must show that the user is really speaking; D, the highest security level, where random verification information is regenerated and the operator is required to change position relative to the phone for a second confirmation. The higher the security level, the lower the possibility that the person operating the phone transaction is not the owner of the phone or of the phone number.
The solution of the present invention may, but is not limited to, adjust the security level automatically according to the payment amount, with small payments mapped to lower security levels and large amounts to high levels; or it may default to the highest security level and allow the user, after verification under good lighting and noise conditions, to set the level freely.
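The amount-based level selection can be sketched as a simple mapping; the amount bands and the override behaviour below are illustrative assumptions (the patent specifies neither thresholds nor currency):

```python
def security_level(amount, user_override=None):
    """Map a payment amount to security level 'A'..'D' (strictest for large
    amounts); an explicit user-set level, if allowed, wins over the mapping.
    The amount bands are made-up values, not from the patent."""
    if user_override is not None:
        return user_override
    if amount < 100:
        return "A"
    if amount < 1000:
        return "B"
    if amount < 5000:
        return "C"
    return "D"
```

Defaulting to "D" for anything not matched mirrors the fail-strict choice the embodiment describes.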
The steps of a specific application example are as follows:
Step 1: the user enrolls face information and voice information when first using the phone or the SIM card.
The enrollment method is as follows: the data entry and storage module holds a one-time initial password; when enabled, the user inputs the initial password to enroll, and after a successful enrollment the password becomes invalid. If the phone or SIM card changes hands, the new user passes the highest-level verification and then enrolls the new user information.
Step 2: the user enters the payment flow.
Step 3: the payment application calls the secure payment interface to query whether the current operator is the user.
Step 4: the phone pops up a prompt asking the user to face the front camera and say the prompted verification sentence, which may be, but is not limited to, a verification code of 6 Arabic digits. During the speech verification the operator may be required to move within the image captured by the front camera.
Step 5: the user reads the verification code facing the camera as required.
Step 6: the face recognition, image localization, sound localization and voice recognition modules obtain the information collected by the front depth camera and the microphone array.
Step 7: face recognition and voice recognition compare the collected information with the stored information and confirm that it is correct; if not, the transaction is terminated.
Step 8: it is verified from the image and the voice frame difference information that the current operator is speaking during the payment period.
Step 9: from the spatial position information given by image localization and sound localization, it is confirmed that the voice is emitted by the person captured in the current image.
Step 10: the verification sentence is changed and the operator is required to change position in the currently captured image; steps 5 to 9 are re-executed.
Step 11: a risk assessment result is given according to the security level.
Step 12: the phone shows the query result; if it shows a high risk that the user is not the owner of the phone or of the phone number, the transaction is terminated; otherwise it continues.
The scheme of embodiments of the invention uses image recognition and speech processing to cope with different environments and requirements. On this basis, it also adds a depth camera and a microphone array to the system and uses image localization and sound localization to ensure that the voice features come from the sound emitted by the current operator's mouth. Combined with methods such as secondary confirmation, this reduces the possibility of the phone being deceived by a replayed video, greatly increases the cost of deception, and lowers the possibility that the current phone user is not the verified owner, making phone payment transactions safer.
As shown in Fig. 2, the second embodiment of the present invention also provides an information processing device 20, including:
a first acquisition module 21, configured to acquire a face image and sound information of the payment operator during secure payment authentication;
a second acquisition module 22, configured to obtain mouth spatial position information from the face image and sound-source spatial position information from the sound information;
a processing module 23, configured to pass the authentication of the payment operator when the face image matches pre-stored face information and/or the sound information matches pre-stored sound information, the mouth in the face image is moving, and the mouth spatial position information is consistent with the sound-source spatial position information.
Wherein the first acquisition module includes:
a first acquisition unit, configured to acquire, during secure payment authentication, the face image of the payment operator collected by a camera device;
a second acquisition unit, configured to acquire, during secure payment authentication, the sound information of the payment operator collected by a sound recording device.
Wherein the second acquisition module includes:
a third acquisition unit, configured to identify the mouth from the grayscale information of the face image and obtain the mouth spatial position from the depth information of the face image, or to identify the mouth in several face images and take the intersection of the line segments through the mouth-position pixels as the mouth spatial position information;
a fourth acquisition unit, configured to obtain the differences between the arrival times of the sound at multiple microphones; to calculate, from the speed of sound and the time differences, the distance between the sound source and any microphone and the angle to the line joining any two microphones; and to obtain the sound-source spatial position information from the distance and the angle.
Wherein the processing module includes:
a first processing unit, configured to obtain consecutive frames of the face image and, if the mouth region changes across these frames, to judge that the mouth in the face image is moving;
a second processing unit, configured to determine that the mouth spatial position information is consistent with the sound-source spatial position information when, in a first sequence of mouth spatial positions at M time points and a second sequence of sound-source spatial positions at the same M time points, the positions at N time points coincide or differ by less than a predetermined threshold; wherein the ratio N/M exceeds a preset value, M is greater than N, and M and N are positive integers.
Embodiments of the invention also provide a terminal including the information processing device described above.
As shown in Fig. 3, which gives the implementation block diagram of the terminal of the present invention, terminal 30 includes: a secure payment interface module, a face recognition module, an image localization module, a sound localization module, a voice recognition module, a comprehensive assessment module, and a data entry and storage module.
A dual-microphone array is added to the phone (a multi-microphone array is also possible; arrays of three or more microphones are more accurate for sound localization), and the front camera may be, but is not limited to, a depth camera.
1) The data entry and storage module stores the face information and voice information entered by the user. The face information may be, but is not limited to, features extracted from face pictures taken by the user at several angles with the front camera. The voice information may be, but is not limited to, features extracted from recordings of the user reading the digits 1-9 aloud.
This module may be a security chip newly added to the phone, with the user performing data entry when purchasing the phone; in this case the scheme reduces the risk that the person operating the phone is not its owner. The module may also reside in the user's SIM card application, with the information entered when the user purchases the SIM card; the scheme described in the embodiments of the present invention then reduces the risk that the person operating the phone is not the owner of the phone number. Either arrangement reduces the transaction risk of mobile payment.
2) The face recognition module first detects whether the facial image currently captured by the front camera matches the face information stored in the storage module; mature algorithms such as, but not limited to, Meanshift or neural networks may be used for face recognition.
Next, it identifies the mouth in the image currently captured by the front camera; mature algorithms such as, but not limited to, geometric feature matching or color histogram matching may be used.
Finally, it detects whether the user is actually speaking during the secure payment authentication period, i.e. whether the mouth is indeed moving; this may be, but is not limited to, detected with the frame difference method.
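The frame difference check can be sketched as follows; the thresholds and the assumption that cropped grayscale mouth-region frames are already available are illustrative, not taken from the patent:

```python
import numpy as np

def mouth_is_moving(frames, threshold=12.0, min_changed_ratio=0.02):
    """Frame-difference sketch: 'frames' is a list of grayscale crops of the
    mouth region (2-D uint8 arrays of equal shape). The mouth is judged to be
    moving if, between any two consecutive frames, enough pixels change by
    more than 'threshold' gray levels. Both threshold values are illustrative."""
    for prev, cur in zip(frames, frames[1:]):
        # Widen to int16 so the subtraction cannot wrap around at 0/255.
        diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
        changed = np.count_nonzero(diff > threshold) / diff.size
        if changed >= min_changed_ratio:
            return True
    return False
```

In a real pipeline the crops would come from the mouth identification step above, so lighting changes outside the mouth region do not trigger false positives.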
3) The video localization module (i.e. the third acquiring unit of the second acquisition module described above) works from the depth image information of the front depth camera: it calls the mouth identification method of the face recognition module, identifies the user's mouth from the grayscale information of the depth image (i.e. the color information of an ordinary image), and marks the pixel at the center of the mouth. Because the depth image contains the three-dimensional coordinates of every observed point, the spatial position of the mouth-center pixel can be stored directly.
If the front camera is an ordinary camera, two cameras with known relative positions are needed to reconstruct three-dimensional coordinates. The reconstruction method is the same as that of the human visual system. For a point M in the observed space: if information is obtained with the first camera, M appears as pixel M1 in its image, and a line segment containing M1 can be found in that image; similarly, if information is obtained with the other camera, M appears as pixel M2 in its image, and a line segment containing M2 can be found there. With a grayscale matching algorithm, pixels of the same spatial location in the two images, such as the two pixels M1 and M2 of M, can be matched. Mapping the two line segments, given by the image coordinates of M1 and M2 respectively, back into real space, their intersection is exactly the spatial position of M, and this spatial position is unique. The same method can compute the three-dimensional coordinates of other spatial points; in the scheme of the embodiments of the present invention it suffices, once the mouth has been identified, to compute and store the spatial position of the mouth-center pixel.
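The two-camera reconstruction described above amounts to intersecting two back-projected rays. A minimal sketch, assuming calibrated cameras so that each mouth-center pixel has already been converted into a ray (camera center plus unit direction); with measurement noise the rays may not meet exactly, so the midpoint of their closest approach is returned:

```python
import numpy as np

def triangulate(c1, d1, c2, d2):
    """Midpoint-of-closest-approach between two rays: camera centers c1, c2
    and unit direction vectors d1, d2 (length-3 NumPy arrays). With noise-free
    data the rays intersect at the 3-D point M and the midpoint equals M."""
    # Solve for scalars s, t minimizing |(c1 + s*d1) - (c2 + t*d2)|.
    w = c1 - c2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w, d2 @ w
    denom = a * c - b * b            # zero iff the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = c1 + s * d1, c2 + t * d2
    return (p1 + p2) / 2.0
```

Converting pixel coordinates M1 and M2 into these rays requires the camera intrinsics and relative pose, which the patent assumes are known.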
4) The voice recognition module detects whether the voice features currently collected by the microphones match the voice information stored in the storage module; speech recognition may be, but is not limited to, performed with a neural network.
5) The voice localization module (i.e. the fourth acquiring unit of the second acquisition module described above) uses the voice signals acquired by a microphone array with known positions, and may apply, but is not limited to, a triangulation algorithm to calculate the spatial position of the voice source. The basic principle of triangulation is as follows: the microphones acquire the voice signal and its features are extracted; given the speed of sound and the time differences with which the voice reaches each microphone, the distance between the sound source and any microphone and the angle between the sound source and the line connecting any two microphones are calculated, from which the spatial position of the source is obtained.
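A minimal sketch of the time-difference principle for a single microphone pair: in the far field, the arrival-time difference fixes the bearing angle via cos θ = c·Δt/d, and combining the angles from two or more non-collinear pairs, as the multi-microphone array above does, pins down the source position. The speed-of-sound constant is an assumed value:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)

def doa_angle(delta_t, mic_distance):
    """Angle (degrees) between the source direction and the line connecting
    one microphone pair, from the measured arrival-time difference 'delta_t'
    (seconds, positive when the sound reaches the reference mic first) and
    the pair separation 'mic_distance' (meters)."""
    cos_theta = SPEED_OF_SOUND * delta_t / mic_distance
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp measurement noise
    return math.degrees(math.acos(cos_theta))
```

A sound arriving broadside (equal delay) yields 90°, while a sound arriving along the microphone baseline yields 0° or 180°.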
6) The secure payment interface module is called by the payment application and returns the result of the current transaction risk assessment.
7) The comprehensive assessment module integrates the calculation results of the face recognition, video localization, voice localization, and voice recognition modules and evaluates the overall risk of the transaction.
Within the verification time window, the comprehensive assessment module is responsible for time-series synchronization across the image and speech processing modules.
Based on the two spatial position sequences computed by video localization and voice localization (the sequences may be, but are not limited to, the position information at 5 time points within the verification period), the comprehensive assessment module judges whether the voice comes from the mouth in the currently captured image: if, at a given time point, the positions returned by the two modules coincide or their distance is within a credible threshold, the position information at that time point is considered credible; if the position information at all time points is credible, the voice is taken to come from the operator in the current image; otherwise it is not.
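The position-sequence comparison can be sketched as below, using the N/M criterion from the claims; the 5 cm distance threshold and the 0.8 ratio are illustrative values, not from the patent:

```python
import math

def positions_consistent(mouth_seq, sound_seq,
                         dist_threshold=0.05, ratio_threshold=0.8):
    """mouth_seq and sound_seq hold the (x, y, z) positions at the same M
    time points from video localization and voice localization. A time point
    is credible if the two positions are within dist_threshold (meters); the
    voice is attributed to the on-screen operator when the credible fraction
    N/M exceeds ratio_threshold."""
    m = len(mouth_seq)
    n = sum(
        1 for p, q in zip(mouth_seq, sound_seq)
        if math.dist(p, q) <= dist_threshold
    )
    return n / m > ratio_threshold
```

Setting ratio_threshold to 1.0 recovers the stricter "all time points credible" rule used in the module description.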
The comprehensive assessment module also stores the payment security level set by the user and returns the assessment result for the current transaction.
The face recognition module, voice recognition module, comprehensive assessment module, etc. in this embodiment may be implemented in the processing module described above.
In payment scenarios, payments may be, but are not limited to, classified into the following four classes:
A. Lowest security level: the user only needs to pass one of speech recognition or image recognition; this copes with harsh lighting or an excessively noisy environment;
B. Intermediate security level: both speech recognition and image recognition must pass;
C. Higher security level: besides matching the voice and face features, the mouth position given by voice localization must be consistent with the mouth position given by video localization, and the result of the frame difference method must show that the user is really speaking;
D. Highest security level: random verification information is regenerated and the operator is required to change position relative to the phone for a secondary confirmation.
At a higher security level, the possibility that the person operating the phone transaction is not the owner of the phone or of the phone number is lower.
The scheme described in the embodiments of the present invention may, but is not limited to, adjust the security level automatically according to the payment amount: small payments correspond to lower security levels and large amounts to high security levels. Alternatively, the highest security level is used by default, and after passing verification under good lighting and low background noise the user may freely set the security level.
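The amount-based adjustment above could be expressed as a simple mapping; the amount boundaries below are illustrative assumptions, and a real deployment would take them from operator or user policy:

```python
def security_level(amount, user_default="D"):
    """Map a payment amount to the security classes A-D described above.
    The boundaries are assumed, not specified by the patent; the largest
    amounts fall back to the user's configured default level."""
    if amount < 10:
        return "A"       # minimal: voice OR image recognition
    if amount < 100:
        return "B"       # voice AND image recognition
    if amount < 1000:
        return "C"       # plus lip/voice position consistency
    return user_default  # highest: secondary confirmation
```

A "highest by default" policy is simply security_level with every boundary set to zero, so the two configurations in the text differ only in parameters.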
The present invention proposes a device and system for secure mobile payment using the phone's camera and dual microphones, comprising a front depth camera, a dual (or multi-) microphone array, a face recognition module, a video localization module, a voice recognition module, a voice localization module, a secure payment interface module, a comprehensive assessment module, and a data entry and storage module. By comparing the position sequences from video localization and voice localization, it judges whether the voice information originates from the operator in the current image, and can further be combined with methods such as secondary confirmation. This reduces the possibility of the phone being deceived by a video, greatly increases the cost of deception, and reduces the possibility that the current phone user is not the person being verified, making mobile payment transactions safer.
The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (12)

  1. An information processing method, characterized by comprising:
    during secure payment authentication, obtaining a facial image and acoustic information of a payment operator;
    obtaining mouth spatial position information according to the facial image, and obtaining acoustic spatial position information according to the acoustic information;
    when the facial image is successfully matched with pre-stored face information and/or the acoustic information is successfully matched with pre-stored acoustic information, the mouth in the facial image is moving, and the mouth spatial position information is consistent with the acoustic spatial position information in the acoustic information, passing the authentication process for the payment operator.
  2. The information processing method according to claim 1, characterized in that the step of obtaining the facial image of the payment operator during secure payment authentication comprises:
    during secure payment authentication, obtaining the facial image of the payment operator collected by an imaging device.
  3. The information processing method according to claim 1, characterized in that the step of obtaining the acoustic information of the payment operator during secure payment authentication comprises:
    during secure payment authentication, obtaining the acoustic information of the payment operator collected by a sound recording device.
  4. The information processing method according to claim 1, characterized in that the step of obtaining mouth spatial position information according to the facial image comprises:
    identifying the mouth from grayscale information of the facial image, and obtaining the mouth spatial position information from depth information of the facial image; or
    identifying the mouth in several facial images, obtaining the intersection point of line segments passing through pixels at the mouth position, and using the intersection point as the mouth spatial position information.
  5. The information processing method according to claim 1, characterized in that the step of obtaining acoustic spatial position information according to the acoustic information comprises:
    obtaining the time differences with which the acoustic information arrives at multiple microphones;
    calculating, according to the speed of sound and the time differences, the distance between the sound source and any microphone and the angle between the sound source and the line connecting any two microphones;
    obtaining the acoustic spatial position information according to the distance and the angle.
  6. The information processing method according to claim 1, characterized in that the process of judging that the mouth in the facial image is moving comprises:
    obtaining consecutive frames of the facial image;
    if the image of the mouth region changes across the frames, judging that the mouth in the facial image is moving.
  7. The information processing method according to claim 1, characterized in that the process of determining that the mouth spatial position information is consistent with the acoustic spatial position information in the acoustic information comprises:
    for a first sequence comprising the mouth spatial position information at M time points and a second sequence comprising the acoustic spatial position information at the M time points, determining that the mouth spatial position information is consistent with the acoustic spatial position information when, at N time points, the mouth spatial position coincides with the acoustic spatial position or differs from it by no more than a predetermined threshold;
    wherein the ratio N/M is greater than a preset value, M is greater than N, and M and N are positive integers.
  8. An information processing apparatus, characterized by comprising:
    a first acquisition module, configured to obtain a facial image and acoustic information of a payment operator during secure payment authentication;
    a second acquisition module, configured to obtain mouth spatial position information according to the facial image and obtain acoustic spatial position information according to the acoustic information;
    a processing module, configured to pass the authentication process for the payment operator when the facial image is successfully matched with pre-stored face information and/or the acoustic information is successfully matched with pre-stored acoustic information, the mouth in the facial image is moving, and the mouth spatial position information is consistent with the acoustic spatial position information in the acoustic information.
  9. The information processing apparatus according to claim 8, characterized in that the first acquisition module comprises:
    a first acquisition unit, configured to obtain, during secure payment authentication, the facial image of the payment operator collected by an imaging device;
    a second acquisition unit, configured to obtain, during secure payment authentication, the acoustic information of the payment operator collected by a sound recording device.
  10. The information processing apparatus according to claim 8, characterized in that the second acquisition module comprises:
    a third acquiring unit, configured to identify the mouth from grayscale information of the facial image and obtain the mouth spatial position information from depth information of the facial image; or to identify the mouth in several facial images, obtain the intersection point of line segments passing through pixels at the mouth position, and use the intersection point as the mouth spatial position information;
    a fourth acquiring unit, configured to obtain the time differences with which the acoustic information arrives at multiple microphones; calculate, according to the speed of sound and the time differences, the distance between the sound source and any microphone and the angle between the sound source and the line connecting any two microphones; and obtain the acoustic spatial position information according to the distance and the angle.
  11. The information processing apparatus according to claim 8, characterized in that the processing module comprises:
    a first processing unit, configured to obtain consecutive frames of the facial image and, if the image of the mouth region changes across the frames, judge that the mouth in the facial image is moving;
    a second processing unit, configured to determine, for a first sequence comprising the mouth spatial position information at M time points and a second sequence comprising the acoustic spatial position information at the M time points, that the mouth spatial position information is consistent with the acoustic spatial position information when, at N time points, the mouth spatial position coincides with the acoustic spatial position or differs from it by no more than a predetermined threshold; wherein the ratio N/M is greater than a preset value, M is greater than N, and M and N are positive integers.
  12. A terminal, characterized by comprising the information processing apparatus according to any one of claims 8-11.
CN201610712317.7A 2016-08-23 2016-08-23 A kind of information processing method, device and terminal Pending CN107767137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712317.7A CN107767137A (en) 2016-08-23 2016-08-23 A kind of information processing method, device and terminal


Publications (1)

Publication Number Publication Date
CN107767137A true CN107767137A (en) 2018-03-06

Family

ID=61264388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712317.7A Pending CN107767137A (en) 2016-08-23 2016-08-23 A kind of information processing method, device and terminal

Country Status (1)

Country Link
CN (1) CN107767137A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033223A (en) * 2010-12-29 2011-04-27 北京信息科技大学 Method for positioning sound source by using microphone array
CN102378097A (en) * 2010-08-25 2012-03-14 鸿富锦精密工业(深圳)有限公司 Microphone control system and method
CN103310339A (en) * 2012-03-15 2013-09-18 凹凸电子(武汉)有限公司 Identity recognition device and method as well as payment system and method
CN103902963A (en) * 2012-12-28 2014-07-02 联想(北京)有限公司 Method and electronic equipment for recognizing orientation and identification
US20150235227A1 (en) * 2014-02-14 2015-08-20 Compal Electronics, Inc. Payment method based on identity recognition and wrist-worn apparatus


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
M. MOHAMED SYED IBRAHIM: "Biometric recognition for safe transaction using vein authentication system", IET CHENNAI 3RD INTERNATIONAL CONFERENCE ON SUSTAINABLE ENERGY AND INTELLIGENT SYSTEMS *
WANG Haiyang: "Development and prospects of face recognition technology", CHINA SECURITY & PROTECTION *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11979511B2 (en) 2018-09-07 2024-05-07 Sony Group Corporation Terminal device, terminal device control method, and memory medium
CN112639719A (en) * 2018-09-07 2021-04-09 索尼公司 Terminal device, control method of terminal device, and storage medium
CN109147813A (en) * 2018-09-21 2019-01-04 神思电子技术股份有限公司 A kind of service robot noise-reduction method based on audio-visual location technology
WO2020119032A1 (en) * 2018-12-10 2020-06-18 平安科技(深圳)有限公司 Biometric feature-based sound source tracking method, apparatus, device, and storage medium
CN110047184A (en) * 2019-04-24 2019-07-23 厦门快商通信息咨询有限公司 A kind of auth method and access control system of anti-recording attack
CN110210196A (en) * 2019-05-08 2019-09-06 北京地平线机器人技术研发有限公司 Identity identifying method and device
CN110210196B (en) * 2019-05-08 2023-01-06 北京地平线机器人技术研发有限公司 Identity authentication method and device
CN110210334B (en) * 2019-05-15 2022-01-04 广州影子科技有限公司 Pig spot check method and device, pig spot check system and computer storage medium
CN110210334A (en) * 2019-05-15 2019-09-06 广州影子科技有限公司 Pig inspects method and device, pig random check system and computer storage medium by random samples
CN110503045A (en) * 2019-08-26 2019-11-26 北京华捷艾米科技有限公司 A kind of Face detection method and device
CN110516427A (en) * 2019-08-29 2019-11-29 深圳市沃特沃德股份有限公司 Auth method, device, storage medium and the computer equipment of terminal user
CN110545396A (en) * 2019-08-30 2019-12-06 上海依图信息技术有限公司 Voice recognition method and device based on positioning and denoising
CN110910878A (en) * 2019-11-27 2020-03-24 珠海格力电器股份有限公司 Voice wake-up control method and device, storage medium and household appliance
CN110910878B (en) * 2019-11-27 2022-02-11 珠海格力电器股份有限公司 Voice wake-up control method and device, storage medium and household appliance
CN111787609A (en) * 2020-07-09 2020-10-16 北京中超伟业信息安全技术股份有限公司 Personnel positioning system and method based on human body voiceprint characteristics and microphone base station
CN111860335A (en) * 2020-07-22 2020-10-30 安徽兰臣信息科技有限公司 Intelligent wearable device based on face recognition
CN115171227A (en) * 2022-09-05 2022-10-11 深圳市北科瑞声科技股份有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
WO2024051380A1 (en) * 2022-09-05 2024-03-14 深圳市北科瑞声科技股份有限公司 Living body detection method and apparatus, electronic device, and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180306)