CN105590104A - Recognition method and apparatus, and electronic equipment - Google Patents
- Publication number
- CN105590104A (application CN201511032295.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The embodiment of the invention discloses a recognition method, a recognition apparatus, and electronic equipment. The method includes: obtaining images of an object to be recognized that are captured while the object produces speech; extracting biometric information of the object during speech production from the obtained images; and matching the extracted biometric information against a plurality of pre-stored biometric templates so as to identify the object. Because the biometric characteristics an object exhibits while speaking are less affected by external factors such as makeup, performing identification on those characteristics improves the reliability of identity recognition.
Description
Technical Field
The present invention relates to the field of identification technologies, and in particular, to an identification method, an identification device, and an electronic device.
Background
Most existing identity recognition methods are photograph-based: for example, a captured face photograph of the object to be recognized is compared with a pre-stored face photograph to determine the identity of the face in the captured photograph. However, because of makeup, aging, and other factors, such methods are error-prone and have low reliability.
Therefore, how to improve the reliability of identity recognition has become an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide an identification method, an identification device, and electronic equipment that improve the reliability of identity recognition.
To this end, the invention provides the following technical solutions:
an identification method, comprising:
acquiring an image of an object to be recognized, the image being captured while the object to be recognized produces speech;
extracting, from the acquired image, biometric information of the object to be recognized during speech production;
matching the extracted biometric information against a plurality of pre-stored biometric templates, and determining a first biometric template that matches the extracted biometric information;
and determining the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
In the above method, preferably, acquiring the image of the object to be recognized that is captured while the object produces speech includes:
simultaneously collecting the voice of the object to be recognized and images of the object to be recognized;
performing speech recognition on the collected voice;
and, when the speech recognition result is first preset information, acquiring the images of the object to be recognized that were captured while the voice whose recognition result is the first preset information was being collected.
In the foregoing method, preferably, the matching the extracted biometric information with a plurality of pre-stored biometric templates includes:
determining, according to the correspondence between preset information and biometric templates, the biometric template corresponding to the first preset information from among the pre-stored biometric templates;
and matching the extracted biometric information against the determined biometric template.
In the above method, preferably, the biometric information of the object to be recognized during speech production includes: the motion trajectory of a predetermined part of the object while the object produces speech; and extracting the biometric information of the object to be recognized from the acquired image includes:
performing first processing on the acquired images to obtain a first processing result, the first processing result representing the position of the predetermined part in each frame;
and determining, based on the position of the predetermined part in each frame, the motion trajectory of the predetermined part while the object to be recognized produces speech.
In the above method, preferably, the biometric information of the object to be recognized during speech production includes: the form of a predetermined part of the object at a predetermined utterance moment; and extracting the biometric information of the object to be recognized from the acquired image includes:
acquiring an image captured at the predetermined utterance moment of the object to be recognized;
performing second processing on the acquired image to obtain a second processing result, the second processing result representing the boundary features of the predetermined part;
and determining, based on the boundary features of the predetermined part, the form of the predetermined part at the predetermined utterance moment.
An identification device comprising:
an obtaining module, configured to acquire an image of an object to be recognized, the image being captured while the object to be recognized produces speech;
an extraction module, configured to extract, from the acquired image, biometric information of the object to be recognized during speech production;
a matching module, configured to match the extracted biometric information against a plurality of pre-stored biometric templates and determine a first biometric template that matches the extracted biometric information;
and a determining module, configured to determine the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
In the above apparatus, preferably, the obtaining module includes:
a collection unit, configured to simultaneously collect the voice of the object to be recognized and images of the object to be recognized;
a speech recognition unit, configured to perform speech recognition on the collected voice;
and a first obtaining unit, configured to, when the speech recognition result is first preset information, obtain the images of the object to be recognized that were captured while the voice whose recognition result is the first preset information was being collected.
In the above apparatus, preferably, the matching module includes:
a first determining unit, configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to the first preset information from among the plurality of pre-stored biometric templates;
and the matching unit is used for matching the extracted biological characteristic information with the determined biological characteristic template.
Preferably, in the apparatus, the biometric information of the object to be recognized during speech production includes: the motion trajectory of a predetermined part of the object while the object produces speech; and the extraction module includes:
a first processing unit, configured to perform first processing on the acquired images to obtain a first processing result, the first processing result representing the position of the predetermined part in each frame;
and a second determining unit, configured to determine, based on the position of the predetermined part in each frame, the motion trajectory of the predetermined part while the object to be recognized produces speech.
Preferably, in the apparatus, the biometric information of the object to be recognized during speech production includes: the form of a predetermined part of the object at a predetermined utterance moment; and the extraction module includes:
a second obtaining unit, configured to acquire an image captured at the predetermined utterance moment of the object to be recognized;
a second processing unit, configured to perform second processing on the acquired image to obtain a second processing result, the second processing result representing the boundary features of the predetermined part;
and a third determining unit, configured to determine, based on the boundary features of the predetermined part, the form of the predetermined part at the predetermined utterance moment.
An electronic device comprising an identification apparatus as claimed in any preceding claim.
As can be seen from the above solutions, the identification method, identification device, and electronic equipment provided by the application acquire an image of an object to be recognized that is captured while the object produces speech, extract from that image the biometric information of the object during speech production, and match the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics an object exhibits while speaking are less affected by external factors such as makeup, identifying the object by these characteristics improves the reliability of identity recognition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of an identification method according to an embodiment of the present invention;
fig. 2 is a flowchart of an implementation of acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs a speech utterance according to an embodiment of the present invention;
fig. 3 is a flowchart of an implementation of matching the extracted biometric information with a plurality of pre-stored biometric templates according to an embodiment of the present invention;
FIG. 4 is a flowchart of an implementation of extracting biometric information of a voice utterance of an object to be recognized from an acquired image according to an embodiment of the present invention;
fig. 5 is a flowchart of another implementation of extracting biometric information of a voice utterance of an object to be recognized from an acquired image according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an identification apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an obtaining module according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a matching module according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an extraction module according to an embodiment of the present invention;
fig. 10 is another schematic structural diagram of an extraction module according to an embodiment of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without inventive effort on the basis of these embodiments fall within the scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of an identification method according to an embodiment of the present invention, which may include:
step S11: acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs voice sounding;
in the embodiment of the invention, the image of the object to be recognized is collected in the speaking process of the object to be recognized. For example, an image of the object to be recognized may be acquired during a question asking process for the object to be recognized. The questioning process may be initiated by an electronic device or may be initiated by a human being.
Step S12: extracting, from the acquired image, biometric information of the object to be recognized during speech production;
Because the images are captured while the object to be recognized is speaking, the biometric information extracted from them characterizes the object in a dynamic state.
Step S13: matching the extracted biometric information against a plurality of pre-stored biometric templates, and determining a first biometric template that matches the extracted biometric information;
A pre-stored biometric template likewise describes the object to be recognized in a dynamic state. It may be established in advance by capturing images of the object while the object produces speech and extracting biometric features from those images; an association is then also established between the template and the identity information of the object.
Optionally, the first biometric template may be the template whose matching degree with the extracted biometric information exceeds a preset threshold and is the highest among all templates.
Step S14: determining the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
The recognition method provided by the embodiment of the invention acquires an image of an object to be recognized that is captured while the object produces speech, extracts from that image the biometric information of the object during speech production, and matches the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics an object exhibits while speaking are less affected by external factors such as makeup, identifying the object by these characteristics improves the reliability of identity recognition.
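Steps S11 through S14 above can be sketched in a few lines of Python. This is a minimal illustration only: the template store, the feature vectors, and the `similarity` scoring function are all hypothetical stand-ins, since the patent does not fix a concrete feature representation or matching metric.

```python
from dataclasses import dataclass, field

@dataclass
class TemplateStore:
    """Hypothetical in-memory store: identity -> biometric template.

    A template here is simply a feature vector; the patent leaves the
    concrete representation open (motion trajectory, part shape, ...).
    """
    templates: dict = field(default_factory=dict)

def similarity(a, b):
    """Toy matching score: inverse of mean absolute difference."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + diff)

def identify(features, store, threshold=0.5):
    """Steps S13-S14: score the extracted features against every stored
    template; return the identity of the best match whose score exceeds
    the threshold, or None when no template qualifies."""
    best_id, best_score = None, threshold
    for identity, template in store.templates.items():
        score = similarity(features, template)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id
```

A template whose score never clears the threshold yields `None`, matching the fallback case discussed later in the description.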
Optionally, an implementation flowchart for acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs a speech utterance, is shown in fig. 2, and may include:
step S21: simultaneously acquiring the voice of an object to be recognized and the image of the object to be recognized;
that is, while the voice of the object to be recognized is being captured, an image of the object to be recognized is captured.
Step S22: performing voice recognition on the collected voice;
step S23: and when the voice recognition result is first preset information, acquiring the image of the object to be recognized, which is acquired at the acquisition moment of the voice of which the voice recognition result is the first preset information.
In the embodiment of the present invention, a plurality of pieces of preset information may be preset, and the first preset information may be at least one of the plurality of pieces of preset information.
In the embodiment of the invention, not all captured images of the object to be recognized are acquired; only a subset is. That subset is determined from the speech recognition result: if the recognition result is the first preset information, the collection time of the corresponding voice is determined, and only the images of the object captured within that time are acquired. That is, if the voice whose recognition result is the first preset information was collected during the period t1 to t2, then the images of the object captured during t1 to t2 are acquired.
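The speech-gated frame selection just described — keep only the frames captured while the voice matching the first preset information was being collected — can be sketched as follows. The segment and frame formats are assumptions for illustration; a real speech recognizer would supply the timestamped segments.

```python
def frames_during_phrase(voice_segments, frames, target_phrase):
    """Step S23 sketch.

    voice_segments: list of (start_time, end_time, recognized_text)
        tuples produced by a speech recognizer.
    frames: list of (timestamp, image) pairs captured in parallel.
    Only frames whose timestamp falls inside a segment whose
    recognition result equals the first preset information
    (target_phrase) are kept.
    """
    windows = [(t1, t2) for t1, t2, text in voice_segments
               if text == target_phrase]
    return [img for ts, img in frames
            if any(t1 <= ts <= t2 for t1, t2 in windows)]
```

Frames outside every matching window are discarded, which is what limits the later feature extraction to the t1-t2 interval described above.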
Optionally, as shown in fig. 3, an implementation flowchart for matching the extracted biometric information with a plurality of pre-stored biometric templates according to the embodiment of the present invention may include:
step S31: determining a biological characteristic template corresponding to the first preset information in a plurality of pre-stored biological characteristic templates according to the corresponding relation between the preset information and the biological characteristic template;
for the same object to be identified, different preset information may have different biological characteristics, and therefore, in the embodiment of the present invention, the biological characteristic templates are respectively established for the different preset information, and the association relationship between the preset information and the biological characteristic templates is established.
For example, the preset information may be a specific word, such as wednesday, etc.; or phoneme information such as vowel vocalization, consonant vocalization, etc.
Step S32: and matching the extracted biological characteristic information with the determined biological characteristic template, and determining a first biological characteristic template matched with the extracted biological characteristic.
In the embodiment of the invention, only the extracted biological characteristic information is matched with part of biological characteristic templates (namely, the biological characteristic templates corresponding to the first preset information), so that the calculation amount in the matching process is reduced, and the matching efficiency is improved.
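Restricting the match to the templates enrolled under the first preset information can be sketched as below. The nested-dict layout and the scoring function are illustrative assumptions, not the patent's prescribed implementation.

```python
def match_by_preset(all_templates, preset_info, features, threshold=0.5):
    """Steps S31-S32 sketch.

    all_templates is a nested dict: preset information -> {identity:
    template}. Only templates enrolled under the recognized preset
    information are scored, shrinking the search space compared with
    matching against every stored template.
    """
    def score(a, b):
        # Placeholder similarity: inverse of total absolute difference.
        return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

    candidates = all_templates.get(preset_info, {})  # S31: select subset
    identity, template = max(candidates.items(),
                             key=lambda kv: score(features, kv[1]),
                             default=(None, None))
    # S32: accept the best candidate only if it clears the threshold.
    if template is not None and score(features, template) > threshold:
        return identity
    return None
```

With N identities and K pieces of preset information, scoring touches roughly N templates instead of N*K, which is the efficiency gain the paragraph above points at.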
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include the motion trajectory of a predetermined part of the object while the object produces speech. Accordingly, an implementation flowchart of extracting this biometric information from the acquired images is shown in fig. 4 and may include:
step S41: performing first processing on the acquired image to obtain a first processing result, wherein the first processing result represents the position of the predetermined part in each frame of image;
in the embodiment of the present invention, the specific portion may be an oral portion, an eye portion, or the like of the object to be recognized. Specifically, a plurality of specific points may be selected at specific positions, and the positions of the plurality of feature points in each frame of image may be determined. Because the acquisition time of each frame of image is different, the position of the predetermined part in each frame of image is the position of the predetermined part at different time. According to the position of the predetermined part in each frame image, the motion track of the predetermined part can be determined.
Step S42: and determining the motion trail of the predetermined part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the predetermined part in each frame image.
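A minimal sketch of steps S41 and S42, assuming an upstream face-landmark detector has already produced per-frame (x, y) feature-point positions for the predetermined part (the "first processing" itself is not shown):

```python
def mouth_trajectory(frame_landmarks):
    """frame_landmarks: one entry per frame, each a list of (x, y)
    feature-point positions on the predetermined part (e.g. the mouth).

    The trajectory is represented here as the frame-to-frame
    displacement of the landmark centroid — one simple way to capture
    how the part moves while the phrase is spoken; the patent does not
    mandate this particular representation.
    """
    def centroid(points):
        xs, ys = zip(*points)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    centers = [centroid(pts) for pts in frame_landmarks]
    # Displacement between consecutive frames = motion trajectory.
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(centers, centers[1:])]
```

The resulting displacement sequence can then serve as the feature vector that is matched against the stored trajectory templates.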
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include the form of a predetermined part of the object at a predetermined utterance moment. Accordingly, another implementation flowchart of extracting this biometric information from the acquired images is shown in fig. 5 and may include:
step S51: acquiring an image acquired at a preset sounding moment of the object to be recognized;
the predetermined utterance timing may be a predetermined utterance timing of a vowel, and/or a predetermined utterance timing of a consonant.
Step S52: performing second processing on the acquired image to obtain a second processing result, wherein the second processing result represents the boundary characteristic of the predetermined part;
in the embodiment of the present invention, the specific portion may be an oral portion, an eye portion, or the like of the object to be recognized.
Step S53: and determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
The form of the predetermined portion is the shape or posture of the predetermined portion.
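A toy sketch of deriving a "form" descriptor from extracted boundary points, assuming the contour of the predetermined part (e.g. the lips) has already been produced by the "second processing". The bounding-box aspect ratio used here is an illustrative choice of shape descriptor, not one prescribed by the patent.

```python
def part_form(boundary_points):
    """boundary_points: (x, y) pairs on the boundary of the
    predetermined part at the predetermined utterance moment.

    Returns the width/height aspect ratio of the contour's bounding
    box as a crude shape descriptor: a rounded mouth (as for an "o"
    vowel) tends toward a lower ratio than a widely stretched one.
    """
    xs = [x for x, _ in boundary_points]
    ys = [y for _, y in boundary_points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return width / height if height else float("inf")
```

In practice richer descriptors (Fourier descriptors of the contour, landmark distances, etc.) would likely be used; the point is only that the boundary features determine the form.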
Optionally, if no biometric template matching the extracted biometric information is found, indicating that the identity information of the object to be recognized is not stored in the database, the extracted biometric information may be saved and an association established between it and the identity information of the object.
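The enrollment fallback described above can be sketched as follows; the plain-dict store and the scoring function are illustrative assumptions.

```python
def identify_or_enroll(features, store, new_identity, threshold=0.5):
    """Try to match `features` against `store` (identity -> template);
    when no stored template clears the threshold, save the features as
    a new template associated with `new_identity`.

    Returns (identity, enrolled) where `enrolled` is True when a new
    template was saved.
    """
    def score(a, b):
        # Placeholder similarity: inverse of total absolute difference.
        return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

    for identity, template in store.items():
        if score(features, template) > threshold:
            return identity, False  # matched an existing identity
    # No match: save the features and link them to the identity.
    store[new_identity] = list(features)
    return new_identity, True
```

This keeps the database growing as previously unseen objects are encountered, so a later attempt by the same object can match its enrolled template.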
Corresponding to the method embodiment, an embodiment of the present invention further provides an identification device, and a schematic structural diagram of the identification device provided in the embodiment of the present invention is shown in fig. 6, and may include:
an acquisition module 61, an extraction module 62, a matching module 63 and a determination module 64; wherein,
the acquisition module 61 is configured to acquire an image of an object to be recognized, which is acquired when the object to be recognized performs a voice utterance;
in the embodiment of the invention, the image of the object to be recognized is collected in the speaking process of the object to be recognized. For example, an image of the object to be recognized may be acquired during a question asking process for the object to be recognized. The questioning process may be initiated by an electronic device or may be initiated by a human being.
The extraction module 62 is configured to extract, from the acquired image, biometric information of the object to be recognized during voice utterance;
Because the acquired images are captured while the object to be recognized is speaking, the biometric features extracted from them characterize the object in a dynamic state.
The matching module 63 is configured to match the extracted biometric information with a plurality of pre-stored biometric templates, and determine a first biometric template matching the extracted biometric information;
A pre-stored biometric template likewise describes the object to be recognized in a dynamic state. It may be established in advance by capturing images of the object while the object produces speech and extracting biometric features from those images; an association is then also established between the template and the identity information of the object.
Optionally, the first biometric template may be the template whose matching degree with the extracted biometric information exceeds a preset threshold and is the highest among all templates.
The determining module 64 is configured to determine the identity information corresponding to the first biometric template as the identity information of the object to be recognized.
The recognition device provided by the embodiment of the invention acquires an image of an object to be recognized that is captured while the object produces speech, extracts from that image the biometric information of the object during speech production, and matches the extracted biometric information against a plurality of pre-stored biometric templates to identify the object. Because the biometric characteristics an object exhibits while speaking are less affected by external factors such as makeup, identifying the object by these characteristics improves the reliability of identity recognition.
Optionally, a schematic structural diagram of the obtaining module 61 provided in the embodiment of the present invention is shown in fig. 7, and may include:
a collection unit 71, a voice recognition unit 72 and a first acquisition unit 73; wherein,
the acquisition unit 71 is configured to simultaneously acquire a voice of an object to be recognized and an image of the object to be recognized;
that is, the acquisition unit 71 acquires an image of the object to be recognized while acquiring the voice of the object to be recognized.
The voice recognition unit 72 is used for performing voice recognition on the collected voice;
the first obtaining unit 73 is configured to obtain, when the voice recognition result is the first preset information, an image of the object to be recognized that is collected at a time of collecting the voice of which the voice recognition result is the first preset information.
In the embodiment of the present invention, a plurality of pieces of preset information may be preset, and the first preset information may be at least one of the plurality of pieces of preset information.
In the embodiment of the invention, not all captured images of the object to be recognized are acquired; only a subset is. That subset is determined from the speech recognition result: if the recognition result is the first preset information, the collection time of the corresponding voice is determined, and only the images of the object captured within that time are acquired. That is, if the voice whose recognition result is the first preset information was collected during the period t1 to t2, then the images of the object captured during t1 to t2 are acquired.
Optionally, a schematic structural diagram of the matching module 63 provided in the embodiment of the present invention is shown in fig. 8, and may include:
a first determination unit 81 and a matching unit 82; wherein,
the first determining unit 81 is configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to first preset information from a plurality of pre-stored biometric templates;
For the same object to be recognized, different preset information may correspond to different biometric characteristics. Therefore, in the embodiment of the present invention, biometric templates are established separately for each piece of preset information, together with an association between the preset information and its templates.
For example, the preset information may be a specific word, such as "Wednesday", or phoneme information, such as a vowel utterance or a consonant utterance.
The matching unit 82 is configured to match the extracted biometric information with the determined biometric template, and determine a first biometric template matching the extracted biometric information.
In the embodiment of the invention, the extracted biological characteristic information is matched against only a subset of the biological characteristic templates (namely, the templates corresponding to the first preset information), which reduces the amount of computation in the matching process and improves matching efficiency.
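The restricted matching can be sketched as a lookup followed by comparison. This is an illustrative sketch, not the patent's implementation; the similarity metric, threshold, and index structure are hypothetical placeholders.

```python
# Illustrative sketch (not from the patent): match only against the
# templates associated with the recognized preset information, instead
# of every stored template.

def match_biometric(extracted, preset_info, template_index, threshold=0.8):
    """Return the id of the first matching template, or None.

    `template_index` maps preset information (e.g. a keyword) to a list
    of (template_id, template_vector) pairs, so only that subset is
    compared.
    """
    def similarity(a, b):
        # Placeholder metric: fraction of element-wise matches.
        return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)

    for template_id, vector in template_index.get(preset_info, []):
        if similarity(extracted, vector) >= threshold:
            return template_id
    return None

index = {"wednesday": [("user_1", [1, 0, 1, 1]), ("user_2", [0, 0, 1, 0])]}
print(match_biometric([1, 0, 1, 1], "wednesday", index))  # user_1
```

If the preset information has no associated templates, the lookup returns an empty list and matching fails fast, which is the source of the efficiency gain the paragraph describes.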
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include: the motion trajectory of a predetermined portion of the object to be recognized while the object is speaking. Accordingly, a schematic structural diagram of the extraction module 62 provided in the embodiment of the present invention is shown in fig. 9, and may include:
a first processing unit 91 and a second determining unit 92; wherein,
the first processing unit 91 is configured to perform first processing on the acquired image to obtain a first processing result, where the first processing result represents a position of the predetermined portion in each frame of image;
in the embodiment of the present invention, the predetermined portion may be the mouth, the eyes, or the like of the object to be recognized. Specifically, a plurality of feature points may be selected on the predetermined portion, and the positions of those feature points in each frame of image may be determined. Because each frame of image is captured at a different time, the positions of the predetermined portion in successive frames are its positions at different times; from the position of the predetermined portion in each frame of image, its motion trajectory can therefore be determined.
The second determination unit 92 is configured to determine a motion trajectory of a predetermined portion of the object to be recognized based on a position of the predetermined portion in each frame image.
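Deriving a trajectory from per-frame positions can be sketched as below. This is an illustrative sketch, not the patent's method; the landmark detection that produces the positions is assumed to have already run, and the displacement-based representation is a hypothetical choice.

```python
# Illustrative sketch (not from the patent): build a motion trajectory
# for a predetermined portion (e.g. the mouth) from its detected
# position in each frame.

def motion_trajectory(frame_positions):
    """Pair each frame timestamp with the displacement since the
    previous frame.

    `frame_positions` is a list of (timestamp, (x, y)) pairs, one per
    frame, in capture order.
    """
    trajectory = []
    for (t0, (x0, y0)), (t1, (x1, y1)) in zip(frame_positions, frame_positions[1:]):
        trajectory.append((t1, (x1 - x0, y1 - y0)))
    return trajectory

positions = [(0.0, (10, 20)), (0.1, (12, 21)), (0.2, (15, 19))]
print(motion_trajectory(positions))  # [(0.1, (2, 1)), (0.2, (3, -2))]
```

The resulting sequence of displacements is one simple encoding of "how the mouth moved while speaking", which is the kind of signal the matching module would compare against stored templates.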
Optionally, in this embodiment of the present invention, the biometric information of the object to be recognized during speech production may include: the form of a predetermined portion of the object to be recognized at a predetermined utterance moment. Correspondingly, another schematic structural diagram of the extraction module 62 provided in the embodiment of the present invention is shown in fig. 10, and may include:
a second acquisition unit 101, a second processing unit 102, and a third determination unit 103; wherein,
the second acquiring unit 101 is used for acquiring an image acquired at a preset sounding moment of the object to be recognized;
the predetermined utterance timing may be a predetermined utterance timing of a vowel, and/or a predetermined utterance timing of a consonant.
The second processing unit 102 is configured to perform second processing on the acquired image to obtain a second processing result, where the second processing result represents a boundary feature of the predetermined portion;
in the embodiment of the present invention, the predetermined portion may be the mouth, the eyes, or the like of the object to be recognized.
The third determination unit 103 is configured to determine a morphology of the predetermined portion of the object to be recognized at the predetermined utterance time based on the boundary characteristics of the predetermined portion.
The form of the predetermined portion is the shape or posture of the predetermined portion.
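Determining a form from boundary features can be sketched as below. This is an illustrative sketch, not the patent's method; the contour points, the bounding-box height feature, and the threshold are all hypothetical placeholders.

```python
# Illustrative sketch (not from the patent): classify the form of the
# mouth at a predetermined utterance moment (e.g. a vowel) from simple
# boundary features of its contour.

def mouth_form(boundary_points):
    """Classify the mouth form as "open" or "closed".

    `boundary_points` is a list of (x, y) points on the detected mouth
    contour; the vertical extent of the contour serves as a crude
    openness measure.
    """
    ys = [y for _, y in boundary_points]
    height = max(ys) - min(ys)
    return "open" if height > 10 else "closed"

contour = [(0, 0), (5, 12), (10, 0), (5, -3)]
print(mouth_form(contour))  # open (vertical extent is 15)
```

A real system would use a richer shape descriptor, but the principle is the same: the boundary features extracted by the second processing unit determine the form at the sampled utterance moment.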
Optionally, the identification apparatus provided in the embodiment of the present invention may further include:
and a storing module, configured to, if the matching module 63 does not find a biometric template matching the extracted biometric information (which indicates that the identity information of the object to be recognized is not stored in the database), store the biometric information of the object to be recognized and establish an association relationship between the identity information of the object to be recognized and that biometric information.
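The store-on-miss behaviour can be sketched as below. This is an illustrative sketch, not the patent's implementation; the store structure, the exact-match rule, and the `identity_provider` callback (standing in for however identity information is obtained at enrollment) are hypothetical.

```python
# Illustrative sketch (not from the patent): when no stored template
# matches, store the new biometric information and associate it with
# the object's identity information.

def identify_or_enroll(extracted, store, identity_provider):
    """Return the matched identity, enrolling a new one on a miss.

    `store` maps identity -> template; `identity_provider` is called
    only when no template matches, to supply identity information for
    the new record.
    """
    for identity, template in store.items():
        if template == extracted:  # placeholder exact-match rule
            return identity
    identity = identity_provider()
    store[identity] = extracted  # associate identity with biometrics
    return identity

store = {"alice": [1, 0, 1]}
print(identify_or_enroll([0, 1, 1], store, lambda: "bob"))  # bob (enrolled)
print(identify_or_enroll([1, 0, 1], store, lambda: "n/a"))  # alice (matched)
```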
The embodiment of the invention also provides electronic equipment which is provided with the identification device disclosed by the embodiment shown in any one of figures 6-10.
Based on the recognition method, apparatus, and electronic device provided by the embodiments of the invention, a question can be asked of an object to be recognized at random while its voice and image are collected simultaneously. The first voice and first image collected at the same time (the first image may be a single frame or multiple frames) are recognized to obtain a recognition result; preset information is extracted from the recognition result; the biometric feature of the predetermined portion of the object to be recognized is extracted from the first image; the extracted biometric feature is matched against the biometric templates corresponding to the preset information to determine a matching template; and the identity information corresponding to that template is the identity information of the object to be recognized.
In order to make the object to be recognized speak the preset information, a targeted question can be asked of it. To further improve recognition accuracy, the question can be selected at random.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems (if any), apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system (if present), apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. An identification method, comprising:
acquiring an image of an object to be recognized, which is acquired when the object to be recognized performs voice sounding;
extracting biological characteristic information of the object to be recognized during voice production from the acquired image;
matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates, and determining a first biological characteristic template matching the extracted biological characteristic information;
and determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
2. The method of claim 1, wherein the acquiring the image of the object to be recognized captured while the object to be recognized is making a speech utterance comprises:
simultaneously acquiring the voice of the object to be recognized and the image of the object to be recognized;
performing voice recognition on the collected voice;
and when the voice recognition result is first preset information, acquiring the image of the object to be recognized, which is acquired at the acquisition moment of the voice of which the voice recognition result is the first preset information.
3. The method of claim 2, wherein matching the extracted biometric information with a number of pre-stored biometric templates comprises:
determining a biological characteristic template corresponding to the first preset information in the pre-stored biological characteristic templates according to the corresponding relation between the preset information and the biological characteristic templates;
and matching the extracted biological characteristic information with the determined biological characteristic template.
4. The method according to claim 1, wherein the biometric information of the object to be recognized in the voice utterance comprises: the motion trail of the preset part of the object to be recognized is obtained when the object to be recognized vocalizes; the extracting of the biological feature information of the object to be recognized in the voice utterance from the acquired image comprises:
performing first processing on the acquired image to obtain a first processing result, wherein the first processing result represents the position of the predetermined part in each frame of image;
and determining the motion trail of the predetermined part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the predetermined part in each frame image.
5. The method according to claim 1, wherein the biometric information of the object to be recognized in the voice utterance comprises: the form of the preset part of the object to be recognized at the preset sounding moment; the extracting of the biological feature information of the object to be recognized in the voice utterance from the acquired image comprises:
acquiring an image acquired at a preset sounding moment of the object to be recognized;
performing second processing on the acquired image to obtain a second processing result, wherein the second processing result represents the boundary characteristic of the predetermined part;
and determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
6. An identification device, comprising:
the acquisition module is used for acquiring an image of an object to be recognized, which is acquired when the object to be recognized makes a voice utterance;
the extraction module is used for extracting the biological characteristic information of the object to be recognized during voice production from the acquired image;
the matching module is used for matching the extracted biological characteristic information with a plurality of pre-stored biological characteristic templates and determining a first biological characteristic template matching the extracted biological characteristic information;
and the determining module is used for determining the identity information corresponding to the first biological characteristic template as the identity information of the object to be identified.
7. The apparatus of claim 6, wherein the obtaining module comprises:
the acquisition unit is used for simultaneously acquiring the voice of the object to be recognized and the image of the object to be recognized;
the voice recognition unit is used for carrying out voice recognition on the collected voice;
the first obtaining unit is used for obtaining, when the voice recognition result is first preset information, the image of the object to be recognized collected at the acquisition moment of the voice of which the voice recognition result is the first preset information.
8. The apparatus of claim 7, wherein the matching module comprises:
a first determining unit, configured to determine, according to a correspondence between preset information and a biometric template, a biometric template corresponding to the first preset information from among the plurality of pre-stored biometric templates;
and the matching unit is used for matching the extracted biological characteristic information with the determined biological characteristic template.
9. The apparatus according to claim 6, wherein the biometric information of the object to be recognized in the speech utterance comprises: the motion trail of the preset part of the object to be recognized is obtained when the object to be recognized vocalizes; the extraction module comprises:
the first processing unit is used for carrying out first processing on the acquired image to obtain a first processing result, and the first processing result represents the position of the predetermined part in each frame of image;
and the second determining unit is used for determining the motion track of the preset part of the object to be recognized when the voice of the object to be recognized is sounded based on the position of the preset part in each frame of image.
10. The apparatus according to claim 6, wherein the biometric information of the object to be recognized in the speech utterance comprises: the form of the preset part of the object to be recognized at the preset sounding moment; the extraction module comprises:
the second acquisition unit is used for acquiring an image acquired at the preset sounding moment of the object to be recognized;
the second processing unit is used for carrying out second processing on the acquired image to obtain a second processing result, and the second processing result represents the boundary characteristic of the preset part;
and the third determining unit is used for determining the form of the preset part of the object to be recognized at the preset sounding moment based on the boundary characteristics of the preset part.
11. An electronic device, characterized in that it comprises an identification device according to any one of claims 6-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511032295.1A CN105590104A (en) | 2015-12-31 | 2015-12-31 | Recognition method and apparatus, and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105590104A true CN105590104A (en) | 2016-05-18 |
Family
ID=55929674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511032295.1A Pending CN105590104A (en) | 2015-12-31 | 2015-12-31 | Recognition method and apparatus, and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105590104A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101101752A (en) * | 2007-07-19 | 2008-01-09 | 华中科技大学 | Monosyllabic language lip-reading recognition system based on vision character |
CN104361276A (en) * | 2014-11-18 | 2015-02-18 | 新开普电子股份有限公司 | Multi-mode biometric authentication method and multi-mode biometric authentication system |
CN104598796A (en) * | 2015-01-30 | 2015-05-06 | 科大讯飞股份有限公司 | Method and system for identifying identity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160518 |