CN105590104A - Recognition method and apparatus, and electronic equipment - Google Patents


Info

Publication number
CN105590104A
CN105590104A
Authority
CN
China
Prior art keywords
recognized
preset
image
information
biological characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511032295.1A
Other languages
Chinese (zh)
Inventor
张守鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority application: CN201511032295.1A
Publication: CN105590104A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a recognition method, a recognition apparatus, and electronic equipment. The method includes: obtaining images of an object to be recognized that are captured while the object is producing speech; extracting, from the obtained images, biometric information of the object during speech; and matching the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics of an object during speech are less affected by external factors such as makeup, performing identity recognition on these characteristics improves its reliability.

Description

Identification method and device and electronic equipment
Technical Field
The present invention relates to the field of identification technologies, and in particular, to an identification method, an identification device, and an electronic device.
Background
Most existing identity-recognition methods are photograph-based: a captured face photograph of the object to be recognized is compared with a pre-stored face photograph to determine the identity of the face in the captured photograph. However, because of factors such as makeup and age, these conventional methods are error-prone and have low reliability.
How to improve the reliability of identity recognition has therefore become an urgent problem.
Disclosure of Invention
The invention aims to provide an identification method, an identification device, and electronic equipment that improve the reliability of identity recognition.
To this end, the invention provides the following technical solutions:
an identification method, comprising:
obtaining an image of an object to be recognized that is captured while the object to be recognized is producing speech;
extracting, from the obtained image, biometric information of the object to be recognized during speech;
matching the extracted biometric information with a plurality of pre-stored biometric templates, and determining a first biometric template that matches the extracted biometric information;
and determining the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
In the above method, preferably, obtaining the image of the object to be recognized that is captured while the object produces speech includes:
simultaneously collecting the voice of the object to be recognized and the image of the object to be recognized;
performing speech recognition on the collected voice;
and, when the speech recognition result is first preset information, obtaining the image of the object to be recognized that was captured at the moment the voice yielding that result was collected.
In the foregoing method, preferably, the matching the extracted biometric information with a plurality of pre-stored biometric templates includes:
determining a biological characteristic template corresponding to the first preset information in the pre-stored biological characteristic templates according to the corresponding relation between the preset information and the biological characteristic templates;
and matching the extracted biological characteristic information with the determined biological characteristic template.
In the above method, preferably, the biometric information of the object to be recognized during speech includes the motion trajectory of a predetermined part of the object while it is speaking; extracting the biometric information of the object to be recognized from the acquired images then includes:
performing first processing on the acquired images to obtain a first processing result, where the first processing result represents the position of the predetermined part in each frame;
and determining, based on the position of the predetermined part in each frame, the motion trajectory of the predetermined part while the object to be recognized is speaking.
In the above method, preferably, the biometric information of the object to be recognized during speech includes the form of a predetermined part of the object at a preset utterance moment; extracting the biometric information of the object to be recognized from the acquired images then includes:
acquiring an image captured at the preset utterance moment of the object to be recognized;
performing second processing on the acquired image to obtain a second processing result, where the second processing result represents the boundary features of the predetermined part;
and determining, based on those boundary features, the form of the predetermined part at the preset utterance moment.
An identification device comprising:
an obtaining module, configured to obtain an image of an object to be recognized that is captured while the object to be recognized is producing speech;
the extraction module is used for extracting the biological characteristic information of the object to be recognized during voice production from the acquired image;
the matching module is used for matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates and determining a first biological characteristic template matched with the extracted biological characteristic;
and the determining module is used for determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
Preferably, the above apparatus, the obtaining module includes:
the acquisition unit is used for simultaneously acquiring the voice of the object to be recognized and the image of the object to be recognized;
the voice recognition unit is used for carrying out voice recognition on the collected voice;
and the first obtaining unit is configured to, when the speech recognition result is first preset information, obtain the image of the object to be recognized that was captured at the moment the voice yielding that result was collected.
The above apparatus, preferably, the matching module includes:
a first determining unit, configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to the first preset information from among the plurality of pre-stored biometric templates;
and the matching unit is used for matching the extracted biological characteristic information with the determined biological characteristic template.
Preferably, in the apparatus, the biometric information of the object to be recognized during the speech utterance includes: the motion trail of the preset part of the object to be recognized is obtained when the object to be recognized vocalizes; the extraction module comprises:
the first processing unit is used for carrying out first processing on the acquired image to obtain a first processing result, and the first processing result represents the position of the predetermined part in each frame of image;
and the second determining unit is used for determining the motion track of the preset part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the preset part in each frame of image.
Preferably, in the apparatus, the biometric information of the object to be recognized during the speech utterance includes: the form of the preset part of the object to be recognized at the preset sounding moment; the extraction module comprises:
the second acquisition unit is used for acquiring an image acquired at the preset sounding moment of the object to be recognized;
the second processing unit is used for carrying out second processing on the acquired image to obtain a second processing result, and the second processing result represents the boundary characteristic of the preset part;
and the third determining unit is used for determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
An electronic device comprising the identification device described in any of the above.
In summary, the identification method, identification device, and electronic equipment provided by the application obtain an image of the object to be recognized captured while the object is producing speech; extract, from that image, biometric information of the object during speech; and match the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics of an object during speech are less affected by external factors such as makeup, performing identity recognition on these characteristics improves its reliability.
Drawings
To illustrate the embodiments of the present invention or prior-art solutions more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of an identification method according to an embodiment of the present invention;
fig. 2 is a flowchart of an implementation of acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs a speech utterance according to an embodiment of the present invention;
fig. 3 is a flowchart of an implementation of matching the extracted biometric information with a plurality of pre-stored biometric templates according to an embodiment of the present invention;
FIG. 4 is a flowchart of an implementation of extracting biometric information of a voice utterance of an object to be recognized from an acquired image according to an embodiment of the present invention;
fig. 5 is a flowchart of another implementation of extracting biometric information of a voice utterance of an object to be recognized from an acquired image according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an identification apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an obtaining module according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a matching module according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an extraction module according to an embodiment of the present invention;
fig. 10 is another schematic structural diagram of an extraction module according to an embodiment of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of an identification method according to an embodiment of the present invention, which may include:
step S11: acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs voice sounding;
in the embodiment of the invention, the image of the object to be recognized is collected in the speaking process of the object to be recognized. For example, an image of the object to be recognized may be acquired during a question asking process for the object to be recognized. The questioning process may be initiated by an electronic device or may be initiated by a human being.
Step S12: extracting biological characteristic information of the object to be recognized during voice production from the acquired image;
because the acquired image is acquired in the speaking process of the object to be recognized, the biological characteristic information of the object to be recognized extracted from the image represents the dynamic characteristic of the object to be recognized.
Step S13: matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates, and determining a first biological characteristic template matched with the extracted biological characteristic;
the pre-stored biometric template is also biometric information of the object to be recognized when the object to be recognized is in a dynamic state, and can be used for collecting an image of the object to be recognized in advance when the object to be recognized makes a voice sound, and performing biometric extraction on the collected image of the object to be recognized to establish the biometric template of the object to be recognized when the object to be recognized is in the dynamic state. And establishing the incidence relation between the identity information of the object to be identified and the biological characteristic template.
Optionally, the first biometric template may be chosen as the template whose matching degree with the extracted biometric features both exceeds a preset threshold and is the highest among all templates.
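As a non-authoritative sketch of this selection rule, the following Python snippet picks the stored template whose similarity to the extracted feature vector exceeds a preset threshold and is the highest overall. The cosine-similarity measure, the `THRESHOLD` value, and the dictionary layout are all illustrative assumptions, not details taken from the patent.

```python
import math

THRESHOLD = 0.8  # assumed preset matching threshold (illustrative)

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_first_template(extracted, templates):
    """templates: {identity: feature vector}. Return (identity, score) of the
    template above THRESHOLD with the highest score, or None if none qualifies."""
    best = None
    for identity, template in templates.items():
        score = cosine_similarity(extracted, template)
        if score > THRESHOLD and (best is None or score > best[1]):
            best = (identity, score)
    return best
```

Returning `None` when no template clears the threshold corresponds to the object not yet being enrolled in the database.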
Step S14: and determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
The recognition method provided by the embodiment of the invention obtains an image of the object to be recognized captured while the object is producing speech; extracts, from that image, biometric information of the object during speech; and matches the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics of an object during speech are less affected by external factors such as makeup, performing identity recognition on these characteristics improves its reliability.
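The four steps S11 to S14 can be strung together as a minimal pipeline sketch; every helper name below is an assumption introduced for illustration, not code from the patent.

```python
def identify(capture_images, extract_features, match_template, identities):
    """Sketch of steps S11-S14 with the concrete operations passed in as
    callables (all hypothetical stand-ins)."""
    images = capture_images()               # S11: images captured during speech
    features = extract_features(images)     # S12: dynamic biometric features
    template_id = match_template(features)  # S13: best-matching stored template
    return identities.get(template_id)      # S14: identity tied to that template
```

A caller would supply real capture, extraction, and matching routines; the sketch only fixes the order and data flow of the four steps.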
Optionally, an implementation flowchart for acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs a speech utterance, is shown in fig. 2, and may include:
step S21: simultaneously acquiring the voice of an object to be recognized and the image of the object to be recognized;
that is, while the voice of the object to be recognized is being captured, an image of the object to be recognized is captured.
Step S22: performing voice recognition on the collected voice;
step S23: and when the voice recognition result is first preset information, acquiring the image of the object to be recognized, which is acquired at the acquisition moment of the voice of which the voice recognition result is the first preset information.
In the embodiment of the present invention, a plurality of pieces of preset information may be preset, and the first preset information may be at least one of the plurality of pieces of preset information.
In the embodiment of the invention, not all captured images of the object to be recognized are used; only a subset is obtained. That subset is determined by the speech recognition result: if the result is the first preset information, the collection interval of the voice that produced that result is determined, and only the images of the object captured within that interval are obtained. That is, if the voice recognized as the first preset information was collected during the period t1 to t2, then the images of the object captured during t1 to t2 are obtained.
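A minimal sketch of this image-selection step, assuming timestamped frames and speech-recognition segments as inputs (both representations are illustrative choices, not taken from the patent):

```python
def select_images(speech_segments, frames, preset_phrase):
    """speech_segments: (t_start, t_end, recognized_text) triples.
    frames: (timestamp, image) pairs in capture order.
    Keep only images captured while the recognized text was the preset phrase."""
    selected = []
    for t1, t2, text in speech_segments:
        if text == preset_phrase:
            selected.extend(img for ts, img in frames if t1 <= ts <= t2)
    return selected
```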
Optionally, as shown in fig. 3, an implementation flowchart for matching the extracted biometric information with a plurality of pre-stored biometric templates according to the embodiment of the present invention may include:
step S31: determining a biological characteristic template corresponding to the first preset information in a plurality of pre-stored biological characteristic templates according to the corresponding relation between the preset information and the biological characteristic template;
for the same object to be identified, different preset information may have different biological characteristics, and therefore, in the embodiment of the present invention, the biological characteristic templates are respectively established for the different preset information, and the association relationship between the preset information and the biological characteristic templates is established.
For example, the preset information may be a specific word, such as "Wednesday"; or it may be phoneme information, such as a vowel utterance or a consonant utterance.
Step S32: and matching the extracted biological characteristic information with the determined biological characteristic template, and determining a first biological characteristic template matched with the extracted biological characteristic.
In the embodiment of the invention, the extracted biometric information is matched against only a subset of the biometric templates (namely, those corresponding to the first preset information), which reduces the computation required for matching and improves matching efficiency.
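The restriction of matching to the templates associated with the first preset information can be sketched as a simple lookup; the nested-dictionary storage scheme is an assumed layout for illustration.

```python
def templates_for_preset(template_store, preset_info):
    """template_store: {preset info: {identity: template}}, e.g. keyed by the
    word "wednesday". Return only the templates tied to the given preset info,
    so matching runs against this subset instead of every stored template."""
    return template_store.get(preset_info, {})
```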
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during the speech sound generation may include: the method comprises the following steps that when an object to be recognized pronounces voice, the motion trail of a preset part of the object to be recognized is obtained; accordingly, an implementation flowchart of extracting biometric information of a to-be-recognized object in a speech utterance from an acquired image according to an embodiment of the present invention is shown in fig. 4, and may include:
step S41: performing first processing on the acquired image to obtain a first processing result, wherein the first processing result represents the position of the predetermined part in each frame of image;
in the embodiment of the present invention, the specific portion may be an oral portion, an eye portion, or the like of the object to be recognized. Specifically, a plurality of specific points may be selected at specific positions, and the positions of the plurality of feature points in each frame of image may be determined. Because the acquisition time of each frame of image is different, the position of the predetermined part in each frame of image is the position of the predetermined part at different time. According to the position of the predetermined part in each frame image, the motion track of the predetermined part can be determined.
Step S42: and determining the motion trail of the predetermined part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the predetermined part in each frame image.
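A minimal sketch of deriving a motion trajectory from per-frame positions of the predetermined part. Landmark detection itself is out of scope here, and representing the trajectory as frame-to-frame displacement vectors is an illustrative choice, not the patent's first processing.

```python
def motion_trajectory(positions):
    """positions: per-frame (x, y) of the tracked part, in capture order.
    Return the frame-to-frame displacement vectors forming the trajectory."""
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(positions, positions[1:])]
```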
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during the speech sound generation may include: the form of a preset part of the object to be recognized at a preset sounding moment; accordingly, another implementation flowchart for extracting biometric information of a to-be-recognized object during a speech utterance from an acquired image according to an embodiment of the present invention is shown in fig. 5, and may include:
step S51: acquiring an image acquired at a preset sounding moment of the object to be recognized;
the predetermined utterance timing may be a predetermined utterance timing of a vowel, and/or a predetermined utterance timing of a consonant.
Step S52: performing second processing on the acquired image to obtain a second processing result, wherein the second processing result represents the boundary characteristic of the predetermined part;
in the embodiment of the present invention, the specific portion may be an oral portion, an eye portion, or the like of the object to be recognized.
Step S53: and determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
The form of the predetermined portion is the shape or posture of the predetermined portion.
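As an illustrative sketch, the form of the predetermined part at the preset utterance moment can be summarized from its boundary points; the bounding-box width/height descriptor used here is an assumed simplification, not the patent's second processing.

```python
def shape_descriptor(boundary_points):
    """boundary_points: (x, y) points on the part's contour.
    Return (width, height, aspect ratio) of the bounding box as a crude
    summary of the part's form at the preset utterance moment."""
    xs = [x for x, _ in boundary_points]
    ys = [y for _, y in boundary_points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return width, height, (width / height if height else float("inf"))
```

For a mouth, for instance, a wide-open vowel shape and a closed consonant shape would yield clearly different aspect ratios.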
Optionally, if no biometric template matching the extracted biometric information is found, which indicates that the identity information of the object is not yet stored in the database, the extracted biometric information may be saved and an association established between the object's identity information and that biometric information.
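The fallback described above can be sketched as follows; the Euclidean-distance matcher, the threshold value, and the way an identity is supplied for enrollment are all illustrative assumptions.

```python
def recognize_or_enroll(extracted, templates, identity_if_new, threshold=1.0):
    """templates: {identity: feature vector}. Match by minimum Euclidean
    distance under the (assumed) threshold; if nothing matches, enroll the
    extracted features as a new template for the given identity."""
    best_id, best_d = None, None
    for identity, tpl in templates.items():
        d = sum((a - b) ** 2 for a, b in zip(extracted, tpl)) ** 0.5
        if best_d is None or d < best_d:
            best_id, best_d = identity, d
    if best_d is not None and best_d <= threshold:
        return best_id, False               # recognized an existing identity
    templates[identity_if_new] = extracted  # enroll as a new template
    return identity_if_new, True
```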
Corresponding to the method embodiment, an embodiment of the present invention further provides an identification device, and a schematic structural diagram of the identification device provided in the embodiment of the present invention is shown in fig. 6, and may include:
an acquisition module 61, an extraction module 62, a matching module 63 and a determination module 64; wherein,
the acquisition module 61 is configured to acquire an image of an object to be recognized, which is acquired when the object to be recognized performs a voice utterance;
in the embodiment of the invention, the image of the object to be recognized is collected in the speaking process of the object to be recognized. For example, an image of the object to be recognized may be acquired during a question asking process for the object to be recognized. The questioning process may be initiated by an electronic device or may be initiated by a human being.
The extraction module 62 is configured to extract, from the acquired image, biometric information of the object to be recognized during voice utterance;
because the acquired image is acquired in the speaking process of the object to be recognized, the biological characteristics of the object to be recognized extracted from the image represent the dynamic characteristics of the object to be recognized.
The matching module 63 is configured to match the extracted biometric information with a plurality of pre-stored biometric templates, and determine a first biometric template matching the extracted biometric information;
the pre-stored biometric template is also biometric information of the object to be recognized when the object to be recognized is in a dynamic state, and can be used for collecting an image of the object to be recognized in advance when the object to be recognized makes a voice sound, and performing biometric extraction on the collected image of the object to be recognized to establish the biometric template of the object to be recognized when the object to be recognized is in the dynamic state. And establishing the incidence relation between the identity information of the object to be identified and the biological characteristic template.
Optionally, the first biometric template may be chosen as the template whose matching degree with the extracted biometric features both exceeds a preset threshold and is the highest among all templates.
The determining module 64 is configured to determine the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
The recognition device provided by the embodiment of the invention obtains an image of the object to be recognized captured while the object is producing speech; extracts, from that image, biometric information of the object during speech; and matches the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics of an object during speech are less affected by external factors such as makeup, performing identity recognition on these characteristics improves its reliability.
Optionally, a schematic structural diagram of the obtaining module 61 provided in the embodiment of the present invention is shown in fig. 7, and may include:
a collection unit 71, a voice recognition unit 72 and a first acquisition unit 73; wherein,
the acquisition unit 71 is configured to simultaneously acquire a voice of an object to be recognized and an image of the object to be recognized;
that is, the acquisition unit 71 acquires an image of the object to be recognized while acquiring the voice of the object to be recognized.
The voice recognition unit 72 is used for performing voice recognition on the collected voice;
the first obtaining unit 73 is configured to obtain, when the voice recognition result is the first preset information, an image of the object to be recognized that is collected at a time of collecting the voice of which the voice recognition result is the first preset information.
In the embodiment of the present invention, a plurality of pieces of preset information may be preset, and the first preset information may be at least one of the plurality of pieces of preset information.
In the embodiment of the invention, not all captured images of the object to be recognized are used; only a subset is obtained. That subset is determined by the speech recognition result: if the result is the first preset information, the collection interval of the voice that produced that result is determined, and only the images of the object captured within that interval are obtained. That is, if the voice recognized as the first preset information was collected during the period t1 to t2, then the images of the object captured during t1 to t2 are obtained.
Optionally, a schematic structural diagram of the matching module 63 provided in the embodiment of the present invention is shown in fig. 8, and may include:
a first determination unit 81 and a matching unit 82; wherein,
the first determining unit 81 is configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to first preset information from a plurality of pre-stored biometric templates;
for the same object to be identified, different preset information may have different biological characteristics, and therefore, in the embodiment of the present invention, the biological characteristic templates are respectively established for the different preset information, and the association relationship between the preset information and the biological characteristic templates is established.
For example, the preset information may be a specific word, such as "Wednesday"; or it may be phoneme information, such as a vowel utterance or a consonant utterance.
The matching unit 82 is configured to match the extracted biometric information with the determined biometric template, and determine a first biometric template matching the extracted biometric information.
In the embodiment of the invention, the extracted biometric information is matched against only a subset of the biometric templates (namely, the templates corresponding to the first preset information), which reduces the amount of computation in the matching process and improves matching efficiency.
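As an illustrative sketch (not part of the patent text), restricting matching to the templates tied to the recognized preset information amounts to a lookup followed by a nearest-template search. The store layout, feature vectors, identities, and threshold below are all hypothetical:

```python
import math

# Hypothetical store: preset information -> {identity: biometric template vector}.
TEMPLATES_BY_PRESET = {
    "wednesday": {"alice": [0.1, 0.9], "bob": [0.8, 0.2]},
}

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_against_preset(features, preset_info, threshold=0.5):
    """Match the extracted features only against templates tied to the
    recognized preset information; return the best identity, or None."""
    candidates = TEMPLATES_BY_PRESET.get(preset_info, {})
    best_id, best_dist = None, float("inf")
    for identity, template in candidates.items():
        d = euclidean(features, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    return best_id if best_dist <= threshold else None
```

Because only `TEMPLATES_BY_PRESET[preset_info]` is scanned, the cost scales with the templates for one piece of preset information rather than with the whole database.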
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include: the motion trajectory of a predetermined part of the object to be recognized while it speaks; accordingly, a schematic structural diagram of the extraction module 62 provided in the embodiment of the present invention is shown in fig. 8, and may include:
a first processing unit 91 and a second determining unit 92; wherein,
the first processing unit 91 is configured to perform first processing on the acquired image to obtain a first processing result, where the first processing result represents a position of the predetermined portion in each frame of image;
in the embodiment of the present invention, the predetermined part may be the mouth, the eyes, or the like of the object to be recognized. Specifically, several feature points may be selected on the predetermined part, and the position of each feature point in each frame of image may be determined. Because each frame of image is captured at a different time, the positions of the predetermined part in the successive frames are its positions at different times, and its motion trajectory can therefore be determined from its position in each frame.
The second determination unit 92 is configured to determine a motion trajectory of a predetermined portion of the object to be recognized based on a position of the predetermined portion in each frame image.
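As an illustrative sketch (not part of the patent text), the motion trajectory of a tracked feature point can be derived from its per-frame positions as the sequence of frame-to-frame displacements. The representation chosen here (a list of `(x, y)` tuples) is an assumption:

```python
def motion_trajectory(per_frame_positions):
    """per_frame_positions: list, in capture order, of the (x, y) position of
    a tracked feature point of the predetermined part in each frame.
    Returns the frame-to-frame displacement vectors, i.e. the motion track."""
    return [
        (x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(per_frame_positions, per_frame_positions[1:])
    ]
```

A point observed at (0, 0), (1, 2), (1, 3) across three frames yields the displacements (1, 2) and (0, 1); a single frame yields an empty trajectory.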
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include: the form of a predetermined part of the object to be recognized at a predetermined utterance moment; correspondingly, another schematic structural diagram of the extraction module 62 provided in the embodiment of the present invention is shown in fig. 10, and may include:
a second acquisition unit 101, a second processing unit 102, and a third determination unit 103; wherein,
the second acquiring unit 101 is used for acquiring an image acquired at a preset sounding moment of the object to be recognized;
the predetermined utterance moment may be the predetermined utterance moment of a vowel and/or of a consonant.
The second processing unit 102 is configured to perform second processing on the acquired image to obtain a second processing result, where the second processing result represents a boundary feature of the predetermined portion;
in the embodiment of the present invention, the predetermined part may be the mouth, the eyes, or the like of the object to be recognized.
The third determination unit 103 is configured to determine a morphology of the predetermined portion of the object to be recognized at the predetermined utterance time based on the boundary characteristics of the predetermined portion.
The form of the predetermined portion is the shape or posture of the predetermined portion.
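As an illustrative sketch (not part of the patent text), a simple shape descriptor can be computed from the boundary points of the predetermined part detected in the frame captured at the predetermined utterance moment. Taking the mouth as the example part, a width-to-height aspect ratio of the boundary is one hypothetical way to summarize its form:

```python
def mouth_shape_descriptor(boundary_points):
    """boundary_points: (x, y) points on the detected boundary of the mouth
    in the frame captured at the predetermined utterance moment.
    Returns a width/height aspect descriptor of the mouth opening."""
    xs = [x for x, _ in boundary_points]
    ys = [y for _, y in boundary_points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return width / height if height else float("inf")
```

A wide, nearly closed mouth (e.g. during a consonant) gives a large ratio; a rounded open mouth (e.g. during a vowel) gives a ratio near 1.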
Optionally, the identification apparatus provided in the embodiment of the present invention may further include:
and a storing module, configured to, if the matching module 63 does not find a biometric template matching the extracted biometric information (which indicates that the identity information of the object to be recognized is not stored in the database), store the biometric information of the object to be recognized and establish an association between the identity information of the object to be recognized and that biometric information.
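As an illustrative sketch (not part of the patent text), the storing module's fallback behaviour is an identify-or-enrol step: when no stored template matches, the new features are stored under the object's identity. Class and method names, the tolerance value, and the flat-list template format are all assumptions:

```python
class BiometricStore:
    """Minimal sketch of the storing-module behaviour described above."""

    def __init__(self):
        self.templates = {}  # identity -> feature vector

    def find_match(self, features, tolerance=0.1):
        """Return the identity whose template is element-wise within
        `tolerance` of the given features, or None."""
        for identity, template in self.templates.items():
            if all(abs(a - b) <= tolerance for a, b in zip(features, template)):
                return identity
        return None

    def identify_or_enrol(self, features, identity_if_new):
        """If a template matches, return its identity; otherwise store the
        features and associate them with the supplied identity."""
        matched = self.find_match(features)
        if matched is not None:
            return matched
        self.templates[identity_if_new] = features
        return identity_if_new
```

On the first call the features are enrolled; a later call with the same features returns the previously associated identity rather than re-enrolling.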
The embodiment of the invention also provides electronic equipment which is provided with the identification device disclosed by the embodiment shown in any one of figures 6-10.
Based on the recognition method, apparatus and electronic device provided by the embodiments of the invention, a random question can be asked of the object to be recognized while its voice and image are collected simultaneously. The first voice and the first image collected at the same time (the first image may be a single frame or multiple frames) are recognized to obtain a recognition result, and preset information is extracted from the recognition result. The biological features of the predetermined part of the object to be recognized are then extracted from the first image and matched against the biometric templates corresponding to the preset information; the identity information corresponding to the matching biometric template is the identity information of the object to be recognized.
In order to make the object to be recognized speak the preset information, it can be asked a targeted question. In order to further improve recognition accuracy, the question can be chosen at random.
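As an illustrative sketch (not part of the patent text), the end-to-end flow summarized above can be expressed as an orchestration function. All of the callables passed in are hypothetical placeholders for the capture, speech-recognition, extraction and matching modules described in the embodiments:

```python
def identify(capture, recognize_speech, extract_features, match_template, presets):
    """End-to-end flow:
      1. capture() -> (speech, frames) collected simultaneously
      2. recognize_speech(speech) -> recognized text
      3. if the text contains a piece of preset information, extract biometric
         features from the frames and match them to that preset's templates
    Returns the matched identity, or None if no preset information was spoken."""
    speech, frames = capture()
    text = recognize_speech(speech)
    for preset in presets:
        if preset in text:
            features = extract_features(frames)
            return match_template(features, preset)
    return None
```

When the recognized text contains no preset information, no image matching is attempted at all, which is what confines the computation to the relevant frames and templates.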
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems (if any), apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system (if present), apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An identification method, comprising:
acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs voice sounding;
extracting biological characteristic information of the object to be recognized during voice production from the acquired image;
matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates, and determining a first biological characteristic template matched with the extracted biological characteristic;
and determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
2. The method of claim 1, wherein the acquiring the image of the object to be recognized captured while the object to be recognized is making a speech utterance comprises:
simultaneously acquiring the voice of the object to be recognized and the image of the object to be recognized;
performing voice recognition on the collected voice;
and when the voice recognition result is first preset information, acquiring the image of the object to be recognized, which is acquired at the acquisition moment of the voice of which the voice recognition result is the first preset information.
3. The method of claim 2, wherein matching the extracted biometric information with a number of pre-stored biometric templates comprises:
determining a biological characteristic template corresponding to the first preset information in the pre-stored biological characteristic templates according to the corresponding relation between the preset information and the biological characteristic templates;
and matching the extracted biological characteristic information with the determined biological characteristic template.
4. The method according to claim 1, wherein the biometric information of the object to be recognized in the voice utterance comprises: the motion trail of the preset part of the object to be recognized is obtained when the object to be recognized vocalizes; the extracting of the biological feature information of the object to be recognized in the voice utterance from the acquired image comprises:
performing first processing on the acquired image to obtain a first processing result, wherein the first processing result represents the position of the predetermined part in each frame of image;
and determining the motion trail of the predetermined part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the predetermined part in each frame image.
5. The method according to claim 1, wherein the biometric information of the object to be recognized in the voice utterance comprises: the form of the preset part of the object to be recognized at the preset sounding moment; the extracting of the biological feature information of the object to be recognized in the voice utterance from the acquired image comprises:
acquiring an image acquired at a preset sounding moment of the object to be recognized;
performing second processing on the acquired image to obtain a second processing result, wherein the second processing result represents the boundary characteristic of the predetermined part;
and determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
6. An identification device, comprising:
an acquisition module, configured to acquire an image of an object to be recognized, which is acquired when the object to be recognized makes a voice utterance;
the extraction module is used for extracting the biological characteristic information of the object to be recognized during voice production from the acquired image;
the matching module is used for matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates and determining a first biological characteristic template matched with the extracted biological characteristic;
and the determining module is used for determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
7. The apparatus of claim 6, wherein the obtaining module comprises:
the acquisition unit is used for simultaneously acquiring the voice of the object to be recognized and the image of the object to be recognized;
the voice recognition unit is used for carrying out voice recognition on the collected voice;
a first obtaining unit, configured to obtain the image of the object to be recognized that is collected at the acquisition moment of the voice whose recognition result is the first preset information.
8. The apparatus of claim 7, wherein the matching module comprises:
a first determining unit, configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to the first preset information from among the plurality of pre-stored biometric templates;
and the matching unit is used for matching the extracted biological characteristic information with the determined biological characteristic template.
9. The apparatus according to claim 6, wherein the biometric information of the object to be recognized in the speech utterance comprises: the motion trail of the preset part of the object to be recognized is obtained when the object to be recognized vocalizes; the extraction module comprises:
the first processing unit is used for carrying out first processing on the acquired image to obtain a first processing result, and the first processing result represents the position of the predetermined part in each frame of image;
and the second determining unit is used for determining the motion track of the preset part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the preset part in each frame of image.
10. The apparatus according to claim 6, wherein the biometric information of the object to be recognized in the speech utterance comprises: the form of the preset part of the object to be recognized at the preset sounding moment; the extraction module comprises:
the second acquisition unit is used for acquiring an image acquired at the preset sounding moment of the object to be recognized;
the second processing unit is used for carrying out second processing on the acquired image to obtain a second processing result, and the second processing result represents the boundary characteristic of the preset part;
and the third determining unit is used for determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
11. An electronic device, characterized in that it comprises an identification device according to any one of claims 6-10.
CN201511032295.1A 2015-12-31 2015-12-31 Recognition method and apparatus, and electronic equipment Pending CN105590104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511032295.1A CN105590104A (en) 2015-12-31 2015-12-31 Recognition method and apparatus, and electronic equipment


Publications (1)

Publication Number Publication Date
CN105590104A true CN105590104A (en) 2016-05-18

Family

ID=55929674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511032295.1A Pending CN105590104A (en) 2015-12-31 2015-12-31 Recognition method and apparatus, and electronic equipment

Country Status (1)

Country Link
CN (1) CN105590104A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN104361276A (en) * 2014-11-18 2015-02-18 新开普电子股份有限公司 Multi-mode biometric authentication method and multi-mode biometric authentication system
CN104598796A (en) * 2015-01-30 2015-05-06 科大讯飞股份有限公司 Method and system for identifying identity



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160518)