CN108537207A - Lip reading recognition method, apparatus, storage medium and mobile terminal - Google Patents
- Publication number: CN108537207A
- Application number: CN201810372876.7A
- Authority: CN (China)
- Prior art keywords: lip reading, lip, information, lip reading recognition, image
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The embodiments of the present application disclose a lip reading recognition method, apparatus, storage medium and mobile terminal. The method includes: when it is detected that a lip reading recognition event is triggered, acquiring at least one frame of 3D lip image of the current user through a 3D camera; inputting the 3D lip image into a pre-trained lip reading recognition model; determining lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model; and providing the lip reading information to the current user. By adopting the above technical solution, the embodiments of the present application can perform simple and fast lip reading recognition on 3D lip images through a pre-built lip reading recognition model, further improve the accuracy of lip reading recognition, effectively enhance the user's human-computer interaction experience, and better meet user needs.
Description
Technical field
The embodiments of the present application relate to the technical field of information processing, and in particular to a lip reading recognition method, apparatus, storage medium and mobile terminal.
Background
With the rapid development of electronic technology and the continuous improvement of living standards, terminals such as smartphones and tablet computers have become an indispensable part of people's lives. At the same time, more humanized human-computer interaction (HCI) with terminal devices is becoming increasingly important.

However, at present most terminal devices perform human-computer interaction through operation modes such as keyboard input, handwriting input and voice input. These are not convenient enough in many situations and cannot effectively reduce interference from the external environment. For example, when voice input is used in a noisy environment, speech recognition accuracy is low, the effect is poor, and the user's privacy is easily leaked. In view of this, lip reading recognition technology has emerged, and accurate lip reading recognition has become critically important in the field of human-computer interaction.
Summary of the invention
The embodiments of the present application provide a lip reading recognition method, apparatus, storage medium and mobile terminal, which can improve the accuracy of lip reading recognition and meet user needs.
In a first aspect, an embodiment of the present application provides a lip reading recognition method, including:
when it is detected that a lip reading recognition event is triggered, acquiring at least one frame of 3D lip image of the current user through a 3D camera;
inputting the 3D lip image into a pre-trained lip reading recognition model;
determining lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model; and
providing the lip reading information to the current user.
In a second aspect, an embodiment of the present application provides a lip reading recognition apparatus, including:
a lip image acquisition module, configured to acquire at least one frame of 3D lip image of the current user through a 3D camera when it is detected that a lip reading recognition event is triggered;
a lip image input module, configured to input the 3D lip image into a pre-trained lip reading recognition model;
a lip reading information determination module, configured to determine lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model; and
a lip reading information providing module, configured to provide the lip reading information to the current user.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the lip reading recognition method described in the embodiments of the present application is implemented.
In a fourth aspect, an embodiment of the present application provides a mobile terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the lip reading recognition method described in the embodiments of the present application is implemented.
According to the lip reading recognition solution provided in the embodiments of the present application, when it is detected that a lip reading recognition event is triggered, at least one frame of 3D lip image of the current user is acquired through a 3D camera; the 3D lip image is input into a pre-trained lip reading recognition model; lip reading information corresponding to the 3D lip image is determined according to the output result of the model; and the lip reading information is then provided to the current user. By adopting the above technical solution, simple and fast lip reading recognition can be performed on 3D lip images through a pre-built lip reading recognition model, the accuracy of lip reading recognition is further improved, the user's human-computer interaction experience is effectively enhanced, and user needs are better met.
Description of the drawings
Fig. 1 is a schematic flowchart of a lip reading recognition method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of another lip reading recognition method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of yet another lip reading recognition method provided by an embodiment of the present application;
Fig. 4 is a structural block diagram of a lip reading recognition apparatus provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a mobile terminal provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of another mobile terminal provided by an embodiment of the present application.
Detailed description
The technical solutions of the present application are further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.

Before the exemplary embodiments are discussed in detail, it should be mentioned that some of them are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as a sequential process, many of the steps may be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawings. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and the like.
Fig. 1 is a schematic flowchart of a lip reading recognition method provided by an embodiment of the present application. This embodiment is applicable to lip reading recognition scenarios. The method may be executed by a lip reading recognition apparatus, which may be implemented by software and/or hardware and may generally be integrated in a mobile terminal. As shown in Fig. 1, the method includes:

Step 101: when it is detected that a lip reading recognition event is triggered, acquire at least one frame of 3D lip image of the current user through a 3D camera.
Exemplarily, the mobile terminal in the embodiments of the present application may include mobile devices such as mobile phones and tablet computers. When it is detected that a lip reading recognition event is triggered, at least one frame of 3D lip image of the current user is acquired through the 3D camera of the mobile terminal, so as to start the lip reading recognition process.
Exemplarily, in order to perform lip reading recognition at an appropriate time, trigger conditions for the lip reading recognition event may be preset. Optionally, to confirm that the user genuinely needs lip reading recognition, the lip reading recognition event may be triggered when it is detected that the current user has actively enabled the lip reading recognition permission. Optionally, in order to apply lip reading recognition in more valuable time windows and save the extra power consumption it causes, the time windows and application scenarios of lip reading recognition may be analyzed or surveyed to set reasonable preset scenes, and the lip reading recognition event is triggered when the mobile terminal is detected to be in a preset scene. It should be noted that the embodiments of the present application do not limit the specific form in which the lip reading recognition event is triggered.
In the embodiments of the present application, when it is detected that a lip reading recognition event is triggered, the 3D camera of the mobile terminal is turned on, that is, the 3D camera of the mobile terminal is controlled to be in a working state. A 3D camera may also be called a 3D sensor. The difference between a 3D camera and an ordinary camera is that a 3D camera can acquire not only a flat image but also the depth information of the photographed object, that is, three-dimensional position and size information. At least one frame of 3D lip image of the current user is acquired through the 3D camera.
Acquiring at least one frame of 3D lip image of the current user through the 3D camera may include: acquiring at least one frame of image of the current user's lip region through the 3D camera as the 3D lip image; or acquiring at least one frame of 3D face image of the current user through the 3D camera, extracting a lip region image from the 3D face image, and using the extracted lip region image as the 3D lip image. Optionally, the lip region image may be extracted from the 3D face image based on the height information of the lips and used as the 3D lip image. Because the lips are located below the nose, the height information (that is, the depth information) of the nose is greater than that of the lips, and the height information of the lips is distinct from that of other regions of the face; therefore, the lip region image can be extracted from the 3D face image based on the height-information features of the lips. Optionally, contour recognition technology, such as edge detection, may also be used to identify the specific position of the lips in the 3D face image, and the image corresponding to that position is extracted from the 3D face image as the lip region image.
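As a concrete illustration of the depth-based extraction just described, the sketch below crops a lip region out of a face depth map. The patent specifies no thresholds or coordinate conventions, so the vertical band and the protrusion margin (`lip_band`, `protrusion_mm`) are hypothetical values chosen for illustration only.

```python
import numpy as np

def extract_lip_region(depth_face: np.ndarray,
                       lip_band=(0.65, 0.95),
                       protrusion_mm: float = 2.0) -> np.ndarray:
    """Crop an approximate lip region from a depth-map face image.

    depth_face: HxW depth map (millimeters) of an already detected face.
    lip_band:   vertical fraction of the face where the mouth is expected,
                i.e. below the nose. Both parameters are illustrative
                assumptions, not values taken from the patent.
    """
    h, _ = depth_face.shape
    top, bottom = int(h * lip_band[0]), int(h * lip_band[1])
    band = depth_face[top:bottom, :]

    # Lips protrude slightly toward the camera relative to the chin/cheek
    # surface around them, so pixels noticeably closer than the band's
    # median depth are treated as lip candidates.
    valid = band > 0
    if not valid.any():
        return band
    median_depth = np.median(band[valid])
    mask = valid & (band < median_depth - protrusion_mm)

    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return band  # fall back to the whole lower-face band
    return band[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```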
Step 102: input the 3D lip image into a pre-trained lip reading recognition model.
The lip reading recognition model can be understood as a learning model that, after a 3D lip image is input, quickly determines the lip reading information corresponding to that image. The lip reading recognition model may be a learning model generated by training on collected 3D sample lip images and their corresponding lip reading content. Exemplarily, when a user says the words "ni hao" ("hello"), whether voiced or unvoiced, the corresponding lip features change as follows: first the lower lip moves downward and the corners of the mouth move upward (pronouncing "ni"); then the lips form an O shape (pronouncing "hao"); and throughout the whole process the depth information of the lips changes as the lips move. Learning is performed on at least one frame of 3D sample lip image corresponding to each different piece of lip reading content, that is, various feature information such as the depth information of the 3D sample lip images and the change information of the lips is learned, thereby generating the lip reading recognition model.
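One plausible shape for such a model, sketched here in PyTorch, is a per-frame convolutional encoder followed by a recurrent layer that summarizes the lip movement over time. The patent only requires some pre-trained learning model, so the architecture, layer sizes and fixed-phrase output vocabulary below are assumptions, not the patent's prescribed design.

```python
import torch
import torch.nn as nn

class LipReadingModel(nn.Module):
    """Sketch of a lip reading recognizer over 3D lip-image sequences.

    Input:  (batch, time, 1, H, W) depth frames of the lip region.
    Output: (batch, vocab_size) scores over a fixed phrase vocabulary.
    All sizes are illustrative; the patent does not fix an architecture.
    """

    def __init__(self, vocab_size: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(          # per-frame feature extractor
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),                      # -> 32 * 4 * 4 = 512
        )
        self.temporal = nn.LSTM(512, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, vocab_size)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.temporal(feats)     # last hidden state summarizes
        return self.classifier(h_n[-1])        # the whole lip movement
```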
Step 103: determine lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model.
In the embodiments of the present application, after the at least one frame of 3D lip image of the current user acquired in step 101 is input into the pre-trained lip reading recognition model, the model can analyze the feature information of the at least one frame of 3D lip image and determine, according to the analysis result, the lip reading information corresponding to the 3D lip image. Exemplarily, a 3D lip image sequence (multiple frames of 3D lip images) is input into the lip reading recognition model, which analyzes the feature information of the sequence; the feature information may include lip mouth-shape change information and depth-information change information. Suppose the analysis shows that the mouth-shape change of the sequence goes from "the lower lip parts slightly and the upper lip arcs upward" to "the lips pout outward with the tongue pressed against the lower teeth", and the corresponding lip depth information first increases and then decreases in a certain regular pattern; it can then be determined that the lip reading information corresponding to this 3D lip image sequence is "page turning".
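Continuing with the hypothetical LipReadingModel sketched after step 102, recognition of a captured sequence then reduces to a forward pass and a vocabulary lookup; the phrase list and tensor shapes here are illustrative stand-ins.

```python
import torch

# Hypothetical usage of the LipReadingModel sketched above.
vocab = ["page turning", "hello", "open WeChat"]   # illustrative phrase set
model = LipReadingModel(vocab_size=len(vocab))     # from the earlier sketch
model.eval()

frames = torch.randn(1, 24, 1, 64, 64)  # stand-in for 24 captured depth frames
with torch.no_grad():
    scores = model(frames)
print(vocab[scores.argmax(dim=1).item()])          # e.g. "page turning"
```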
Step 104: provide the lip reading information to the current user.
Exemplarily, the lip reading information is provided to the current user in the form of text or speech. For example, while the user is editing a short message, the lip reading information can be presented as text in the message editing box. For another example, while the user is chatting with a friend on WeChat, the lip reading information can be presented as text in the chat dialog box, or sent to the friend in the form of speech so as to carry out a voice chat. It should be noted that the embodiments of the present application do not limit the manner in which the lip reading information is provided to the current user.
According to the lip reading recognition method provided in this embodiment of the present application, when it is detected that a lip reading recognition event is triggered, at least one frame of 3D lip image of the current user is acquired through a 3D camera, the 3D lip image is input into a pre-trained lip reading recognition model, lip reading information corresponding to the 3D lip image is determined according to the output result of the model, and the lip reading information is then provided to the current user. By adopting the above technical solution, simple and fast lip reading recognition can be performed on 3D lip images through a pre-built lip reading recognition model, the accuracy of lip reading recognition is further improved, the user's human-computer interaction experience is effectively enhanced, and user needs are better met.
In some embodiments, before it is detected that a lip reading recognition event is triggered, the method further includes: collecting at least one frame of 3D sample lip image of each individual in a preset crowd, and acquiring lip reading content corresponding to the 3D sample lip images; labeling the 3D sample lip images according to the lip reading content to obtain a training sample set; and training a preset machine learning model with the training sample set to obtain the lip reading recognition model. Here, labeling the 3D sample lip images according to the lip reading content can be understood as recording the lip reading content as the sample label of the corresponding 3D sample lip image. The benefit of this arrangement is that using the 3D lip images of each individual in the preset crowd and their corresponding lip reading content as the sample source of the lip reading recognition model can greatly improve the accuracy of its training.
In the embodiments of the present application, at least one frame of 3D sample lip image of each individual in the preset crowd is collected through a 3D camera, and the lip reading content corresponding to the 3D sample lip images is acquired. For example, before the mobile terminal is manufactured, each individual in the preset crowd may be asked to read a large number of different pieces of lip reading content, voiced or unvoiced, and the 3D camera of the mobile terminal collects the corresponding 3D lip images of each individual while reading different lip reading content, as the 3D sample lip images. For instance, the 3D sample face images of 5,000 users aged between 18 and 30 reading different lip reading content can be acquired, and lip region images can be extracted from these 3D sample face images and used as 3D sample lip images, which serve as the training samples of the lip reading recognition model. The preset crowd may include any one or a combination of different groups such as children, teenagers, middle-aged people and the elderly; the embodiments of the present application do not specifically limit the coverage of the preset crowd. Exemplarily, the 3D lip video images and the speaking voice of each individual in the preset crowd can be collected synchronously by a 3D camera equipped with a microphone; when collecting the 3D lip video and the speaking voice, the synchronism and correspondence of the two should be ensured to avoid affecting the training precision of the lip reading recognition model. Of course, a large number of videos of individuals in the preset crowd speaking can also be collected from the Internet; the lip region image is extracted from each frame of the video as a 3D sample lip image, and the speech content corresponding to the video is used as the lip reading content. The lip reading content may include words, phrases, short sentences, long sentences or paragraphs in any language, such as Chinese, English or Japanese. The 3D sample lip images are labeled according to the lip reading content, that is, each 3D sample lip image is marked with its corresponding lip reading content, and the labeled 3D sample lip images are used as the training sample set of the lip reading recognition model. The preset machine learning model is trained with the training sample set to generate the lip reading recognition model. The preset machine learning model may include machine learning models such as a convolutional neural network model or a long short-term memory network model; the embodiments of the present application do not limit the preset machine learning model.
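A minimal training loop consistent with this description might look as follows, reusing the hypothetical LipReadingModel from the earlier sketch. The dataset shapes, batch size, optimizer and epoch count are illustrative assumptions, with random tensors standing in for labeled 3D sample lip sequences.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in training set: 3D sample lip sequences labeled with the index of
# their lip reading content in a fixed phrase vocabulary (assumed setup).
sequences = torch.randn(200, 24, 1, 64, 64)      # 200 labeled sample clips
labels = torch.randint(0, 3, (200,))             # 3 phrases in the vocabulary
loader = DataLoader(TensorDataset(sequences, labels),
                    batch_size=16, shuffle=True)

model = LipReadingModel(vocab_size=3)            # hypothetical model above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                           # epoch count is arbitrary
    for batch, target in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch), target)
        loss.backward()
        optimizer.step()
```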
The lip reading recognition model is obtained before it is detected that a lip reading recognition event is triggered. It should be noted that the mobile terminal itself may acquire the above training sample set and train the preset machine learning model with it to directly generate the lip reading recognition model. Alternatively, the mobile terminal may directly invoke a lip reading recognition model trained by another mobile terminal; for example, one mobile terminal acquires the training sample set and generates the lip reading recognition model before manufacture, and the model is then stored in other mobile terminals for their direct use. Alternatively, a server acquires a large number of 3D sample lip images, labels them according to the corresponding lip reading content to obtain a training sample set, trains a preset machine learning model with the training sample set, and obtains the lip reading recognition model. When the mobile terminal needs to perform lip reading recognition, that is, when it detects that a lip reading recognition event is triggered, it calls the trained lip reading recognition model from the server.
In some embodiments, before labeling the 3D sample lip images according to the lip reading content to obtain the training sample set, the method further includes: acquiring first facial expression information corresponding to the 3D sample lip images. Labeling the 3D sample lip images according to the lip reading content to obtain the training sample set then includes: labeling the 3D sample lip images according to the lip reading content and the first facial expression information to obtain the training sample set. Correspondingly, before the 3D lip image is input into the pre-trained lip reading recognition model, the method further includes: acquiring second facial expression information corresponding to the 3D lip image; and inputting the 3D lip image into the pre-trained lip reading recognition model includes: inputting the 3D lip image and the second facial expression information into the pre-trained lip reading recognition model. The benefit of this arrangement is that the lip reading recognition model can be trained on the user's expression information under different lip reading content together with the corresponding 3D sample lip images, which can further improve the accuracy with which the model determines lip reading information.
In the embodiments of the present application, when the content a user expresses differs, the corresponding facial expression may also differ. For example, when the content expressed is "I am homesick", the corresponding facial expression is relatively sad; when the content is "I found a job", the corresponding facial expression is relatively joyful, even overjoyed. Different expressed content means different facial expressions, and the feature information of the corresponding 3D lip images also differs. Therefore, the facial expression information corresponding to the 3D sample lip images, namely the first facial expression information, can be acquired; the 3D sample lip images can be labeled according to the lip reading content and the corresponding first facial expression information; and the 3D sample lip images labeled with both the lip reading content and the first facial expression information can be used as the training samples of the lip reading recognition model. The 3D face images corresponding to the 3D sample lip images may be input into an expression recognition model, which analyzes the 3D face images and determines their expression information, namely the first facial expression information corresponding to the 3D sample lip images. The first facial expression information may include expression information such as happiness, anger, sadness and joy. Correspondingly, before the 3D lip image is input into the pre-trained lip reading recognition model, second facial expression information corresponding to the 3D lip image is acquired, and the 3D lip image and the second facial expression information are input into the pre-trained lip reading recognition model together. It can be understood that the lip reading recognition model comprehensively analyzes the 3D lip image and the corresponding facial expression information, performs lip reading recognition on the 3D lip image under the facial expression indicated by that information, and determines the lip reading information of the 3D lip image.
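One way to feed both inputs to the model, purely as an illustration, is to embed a discrete expression label and concatenate it with the recurrent summary of the lip movement before classification; the patent does not prescribe how the lip images and the expression information are fused, so the layer sizes and expression categories below are assumptions.

```python
import torch
import torch.nn as nn

class ExpressionConditionedLipModel(nn.Module):
    """Illustrative variant that fuses an expression label with lip features."""

    def __init__(self, vocab_size: int, num_expressions: int = 4,
                 hidden: int = 128):
        super().__init__()
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # -> 16 * 4 * 4 = 256
        self.temporal = nn.LSTM(256, hidden, batch_first=True)
        self.expr_embed = nn.Embedding(num_expressions, 16)
        self.classifier = nn.Linear(hidden + 16, vocab_size)

    def forward(self, frames: torch.Tensor,
                expression: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]
        feats = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.temporal(feats)
        # Concatenate the lip-movement summary with the expression embedding.
        fused = torch.cat([h_n[-1], self.expr_embed(expression)], dim=1)
        return self.classifier(fused)
```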
In some embodiments, acquiring the second facial expression information corresponding to the 3D lip image includes: inputting the 3D face image corresponding to the 3D lip image into an expression recognition model to obtain the second facial expression information corresponding to the 3D lip image. The expression recognition model can be understood as a learning model that, after the 3D face image corresponding to the 3D lip image is input, quickly determines the expression information of that 3D face image, namely the expression information corresponding to the 3D lip image. The expression recognition model may be a learning model generated by training on collected 3D face images and their corresponding expressions. Exemplarily, the 3D sample face images of each individual in the preset crowd under different expressions can be photographed with the 3D camera of the mobile terminal, and the 3D face images are labeled with expressions, that is, the corresponding 3D sample face images are labeled according to the expression information. A preset machine learning model is trained on the labeled 3D sample face images to generate the expression recognition model. After the 3D face image corresponding to the 3D lip image is input into the expression recognition model, the model analyzes the 3D face image, for example the feature information of the facial features and the cheeks, so as to obtain the second facial expression information corresponding to the 3D lip image. With the technical solution provided by this embodiment, the second facial expression information corresponding to the 3D lip image can be determined quickly and accurately, which facilitates subsequent accurate lip reading recognition of the 3D lip image based on facial expression information.
In some embodiments, determining the lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model includes: obtaining lip reading recognition results according to the output result of the lip reading recognition model; inputting the lip reading recognition results into a pre-built semantic understanding model, where the semantic understanding model is configured to determine, when multiple lip reading recognition results are obtained, the lip reading information from the multiple results based on contextual relationships; and determining the output result of the semantic understanding model as the lip reading information corresponding to the 3D lip image. It can be understood that the lip reading recognition model may make errors when recognizing the 3D lip image, and multiple lip reading recognition results may exist; through semantic analysis, the lip reading information can be accurately determined from the multiple results. The benefit of this arrangement is that performing context-based semantic understanding on the multiple lip reading recognition results output by the lip reading recognition model makes it possible to determine quickly and accurately the lip reading information the user really wants to express.
Exemplarily, by analyzing the at least one frame of 3D lip image, the lip reading recognition model may include multiple lip reading recognition results in its output, that is, it determines that the 3D lip image corresponds to multiple candidate results. For example, the lip reading information actually corresponding to the current user's 3D lip images is "in high spirits", but the model's analysis of those images yields several near-homophonous candidates, respectively: "surnamed Gao Cailie", "in high spirits" and "apricot cake picked vigorously". In order to accurately determine the lip reading information that really corresponds to the acquired 3D lip images of the current user, the multiple lip reading recognition results can be input into the pre-built semantic understanding model, which analyzes each result separately; the lip reading information really corresponding to the 3D lip image is finally determined according to the output result of the semantic understanding model. The semantic understanding model can be understood as a learning model that, after multiple lip reading recognition results are input, quickly determines one of them as the lip reading information. For example, the three candidates "surnamed Gao Cailie", "in high spirits" and "apricot cake picked vigorously" are input into the semantic understanding model, which analyzes them one by one based on context and determines that "in high spirits" is the final lip reading information.
In the embodiments of the present application, when the multiple lip reading recognition results contain only preceding context and no following context, the preceding context of the multiple results can be analyzed by the semantic understanding model to determine the final lip reading information; when they contain only following context and no preceding context, the following context of the multiple results can be analyzed to determine the final lip reading information; and when they contain both preceding and following context, the semantic understanding model can analyze both to determine the final lip reading information. When the multiple lip reading recognition results contain neither preceding nor following context, the final lip reading information can be determined from the multiple results according to statistics on the words the current user has used most frequently within a preset time period. Exemplarily, a single Chinese character contains neither preceding nor following context; suppose the multiple lip reading recognition results are the near-homophones "I", "nest", "fertile" and "hold", and within one week the current user used "I" the most while hardly ever using "nest", "fertile" or "hold"; "I" can then be taken as the final lip reading information.
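A toy stand-in for this selection step is sketched below, assuming the surrounding context and the user's recent word-frequency statistics are available as plain Python structures; the overlap-counting heuristic is a deliberately naive placeholder for the learned semantic understanding model the patent describes.

```python
from collections import Counter
from typing import List, Optional

def pick_lip_reading(candidates: List[str],
                     context: Optional[str],
                     recent_words: Counter) -> str:
    """Choose one candidate lip reading result.

    context:      surrounding text (preceding and/or following), if any.
    recent_words: how often the current user used each word within the
                  preset time period described by the patent.
    """
    def context_score(phrase: str) -> int:
        # Naive placeholder: count word overlap with the context.
        return sum(w in context for w in phrase.split()) if context else 0

    if context:
        return max(candidates, key=context_score)
    # No context at all: fall back to the user's recent usage statistics.
    return max(candidates, key=lambda p: recent_words[p])

usage = Counter({"I": 42, "nest": 1})
print(pick_lip_reading(["I", "nest", "fertile", "hold"], None, usage))  # "I"
```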
In the embodiments of the present application, there are many specific ways to detect that a lip reading recognition event is triggered, and they are not limited; several are given below by way of example, with a combined sketch following the fourth mode.
1. Monitor whether a lip reading recognition instruction input by the current user is received; when the lip reading recognition instruction is received, determine that a lip reading recognition event is detected as triggered.
The benefit of this arrangement is that the lip reading recognition function is turned on only when a clear instruction to perform lip reading recognition is received from the user, meeting the user's real need for lip reading recognition. The lip reading recognition instruction may be a control instruction preset by the user for turning on the lip reading recognition function. The preset control instruction for turning on the lip reading recognition function includes, but is not limited to, a mechanical control instruction (for example, an instruction issued when a preset mechanical button is operated by the user), a preset voice instruction, a preset touch instruction, and/or a preset fingerprint information instruction. If a lip reading recognition instruction input by the current user is detected, it is determined that a lip reading recognition event is triggered.
2. Acquire the current time and/or current location of the mobile terminal; when the current time is within a preset time period and/or the current location is a preset location, determine that a lip reading recognition event is detected as triggered.
The benefit of this arrangement is that the trigger timing of the lip reading recognition event can be reasonably determined according to the current time and/or current location of the mobile terminal. Exemplarily, the current time of the mobile terminal is acquired, and it is judged whether the current time is within the preset time period. For example, the preset time period may include 9:00-12:00 and 14:00-18:00 on weekdays; if the current time is 11:00 on a Tuesday, the current time is within the preset time period, and it is determined that a lip reading recognition event is detected as triggered. It can be understood that the preset time period is a period during which it is inconvenient for the user to speak or to type manually, or during which manual input wastes more time than using the mobile terminal otherwise would; at such times, when the user needs to use the mobile terminal, for example to chat with a friend, it can be determined that a lip reading recognition event is triggered. As another example, the current location of the mobile terminal is acquired, and it is judged whether the current location is a preset location. For example, the preset location may include the user's company and other public places; if the current location is the user's company, that is, the user is currently at work and the surrounding colleagues are concentrating on their work, it is inconvenient to speak, so it can be determined that a lip reading recognition event is triggered, and the user can interact with the mobile terminal through lip reading. It can be understood that the preset location is a place where it is inconvenient for the user to speak or to perform laborious manual operations; at such locations, when the user needs to use the mobile terminal, for example to chat with a friend, it can be determined that a lip reading recognition event is triggered. In this way, other people are not disturbed, and the user's privacy is better protected. As a further example, both the current time and the current location of the mobile terminal are acquired, and it is judged whether the current time is within the preset time period and whether the current location is a preset location. When the current time is within the preset time period and the current location is a preset location, it can be determined that a lip reading recognition event is triggered; through this two-factor verification of time and location, the trigger timing of the lip reading recognition event can be reasonably controlled.
3. Acquire the environmental noise at the current location of the mobile terminal; when the environmental noise exceeds a preset noise threshold, determine that a lip reading recognition event is detected as triggered.
The benefit of this arrangement is that the trigger timing of the lip reading recognition event can be reasonably determined according to the level of ambient noise at the mobile terminal's location, effectively avoiding voice communication in a noisy environment and thereby avoiding large speech recognition errors and a poor user experience. Exemplarily, the preset noise threshold is 50 dB, and the measured environmental noise at the mobile terminal's current location is 60 dB, indicating that the user is currently in a relatively noisy environment; to avoid voice communication in such an environment, it can be determined that a lip reading recognition event is triggered.
4. Acquire a communication message sent by another mobile terminal; when the communication message contains a preset keyword, determine that a lip reading recognition event is detected as triggered.
The benefit of this arrangement is that the trigger timing of the lip reading recognition event can be reasonably determined according to the communication messages that the current mobile terminal receives from other mobile terminals. Exemplarily, while user A is having a WeChat voice chat with friend B, when user A's mobile terminal detects that a communication message sent by user B's mobile terminal contains a preset keyword, it can determine that a lip reading recognition event is triggered. The preset keywords may include content such as "too noisy", "can't hear you" or "speak up". For example, when the communication message sent by user B's mobile terminal is "why is it so noisy on your side", the message contains the preset keyword "noisy", and it is accordingly determined that a lip reading recognition event is detected as triggered.
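The four trigger modes can be combined into a single predicate, as in the sketch below; the 50 dB threshold, the working-hours window, the "company" location and the keyword list are taken from the examples above and are illustrative, not normative.

```python
from dataclasses import dataclass, field
from datetime import datetime, time
from typing import List, Optional

PRESET_KEYWORDS = ["too noisy", "can't hear", "speak up"]  # illustrative

@dataclass
class TerminalState:
    user_instruction: bool = False                 # mode 1: explicit command
    now: datetime = field(default_factory=datetime.now)
    location: Optional[str] = None                 # mode 2: e.g. "company"
    noise_db: float = 0.0                          # mode 3: ambient noise
    incoming_messages: List[str] = field(default_factory=list)  # mode 4

def lip_reading_event_triggered(state: TerminalState,
                                noise_threshold_db: float = 50.0) -> bool:
    # Two-factor variant of mode 2: time window AND preset location.
    in_preset_period = (time(9) <= state.now.time() <= time(12)
                        or time(14) <= state.now.time() <= time(18))
    return (state.user_instruction                                  # mode 1
            or (in_preset_period and state.location == "company")   # mode 2
            or state.noise_db > noise_threshold_db                  # mode 3
            or any(kw in msg.lower()
                   for msg in state.incoming_messages
                   for kw in PRESET_KEYWORDS))                      # mode 4
```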
In some embodiments, after the lip reading information is provided to the current user, the method further includes: receiving feedback information from the current user on whether the recognition result of the lip reading information is accurate; and sending the feedback information to the lip reading recognition model for training. The benefit of this arrangement is that the user's feedback on whether the recognition result of the lip reading information is accurate clarifies whether the lip reading recognition result is correct, and adjusting the network parameters of the lip reading recognition model at any time according to this feedback can further improve the precision of lip reading recognition.
The feedback information can be understood as correction information or judgment information from the user on the lip reading information determined by the lip reading recognition model. Exemplarily, a correction option or a judgment option for the lip reading information determined by the model can be set in the human-computer interaction interface of the terminal device. The correction option may include the two choices "yes" and "no": when the correction option is "yes", it indicates that the user approves the lip reading information determined by the model; when it is "no", it indicates that the user does not approve it, and the network parameters of the lip reading recognition model can then be corrected according to the lip reading information as corrected by the user. The judgment option may include the two choices "correct" and "incorrect": when the judgment option is "correct", that is, when a "correct" judgment instruction input by the user is received, it indicates that the user approves the lip reading information determined by the model, and human-computer interaction can proceed directly on the basis of that lip reading information; when the judgment option is "incorrect", that is, when an "incorrect" judgment instruction input by the user is received, it indicates that the user does not approve the lip reading information determined by the model. In that case, the correct lip reading information input by the user is received, human-computer interaction proceeds on the basis of the correct lip reading information, and the network parameters of the lip reading recognition model can be corrected according to the user's corrected lip reading information. The embodiments of the present application do not limit the specific form of the feedback information. The mobile terminal receives the user's feedback on whether the recognition result of the lip reading information determined by the model is accurate, and sends the feedback information to the lip reading recognition model so as to adaptively adjust its network parameters.
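As a rough illustration of feeding such feedback back into the model, a user-corrected sample can be treated as one extra supervised example and used for a single online gradient step; this reuses the hypothetical model and vocabulary from the earlier sketches and is not a procedure prescribed by the patent.

```python
import torch
import torch.nn as nn

def apply_user_feedback(model: nn.Module, frames: torch.Tensor,
                        corrected_label: int, lr: float = 1e-4) -> None:
    """One online fine-tuning step from a user-corrected lip reading result.

    frames:          the (1, T, 1, H, W) sequence the model got wrong.
    corrected_label: vocabulary index of the lip reading information the
                     user supplied as the correction.
    """
    model.train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(
        model(frames), torch.tensor([corrected_label]))
    loss.backward()
    optimizer.step()
```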
Fig. 2 is a schematic flowchart of another lip reading recognition method provided by an embodiment of the present application. As shown in Fig. 2, the method includes:
Step 201: collect at least one frame of 3D sample lip image of each individual in a preset crowd, and acquire the lip reading content corresponding to the 3D sample lip images.
Step 202: acquire first facial expression information corresponding to the 3D sample lip images.
Step 203: label the 3D sample lip images according to the lip reading content and the first facial expression information to obtain a training sample set.
Step 204: train a preset machine learning model with the training sample set to obtain a lip reading recognition model.
Step 205: when it is detected that a lip reading recognition event is triggered, acquire at least one frame of 3D lip image of the current user through a 3D camera.
Step 206: acquire second facial expression information corresponding to the 3D lip image.
Step 207: input the 3D lip image and the second facial expression information into the pre-trained lip reading recognition model.
Step 208: determine lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model.
Step 209: provide the lip reading information to the current user.
The lip reading information can be provided to the current user in the form of text or speech.
According to the lip reading recognition method provided in this embodiment of the present application, when it is detected that a lip reading recognition event is triggered, at least one frame of 3D lip image of the current user is acquired through a 3D camera, the 3D lip image and its corresponding second facial expression information are input into the pre-trained lip reading recognition model, lip reading information corresponding to the 3D lip image is determined according to the output result of the model, and the lip reading information is then provided to the current user, where the lip reading recognition model is generated by training on 3D sample lip images labeled with lip reading content and first facial expression information. By adopting the above technical solution, the lip reading recognition model can be trained on the user's expression information under different lip reading content together with the corresponding 3D sample lip images, which can further improve the accuracy with which the model determines lip reading information.
Fig. 3 is a schematic flowchart of yet another lip reading recognition method provided by an embodiment of the present application. As shown in Fig. 3, the method includes:
Step 301: collect at least one frame of 3D sample lip image of each individual in a preset crowd, and acquire the lip reading content corresponding to the 3D sample lip images.
Step 302: acquire first facial expression information corresponding to the 3D sample lip images.
Step 303: label the 3D sample lip images according to the lip reading content and the first facial expression information to obtain a training sample set.
Step 304: train a preset machine learning model with the training sample set to obtain a lip reading recognition model.
Step 305: acquire the environmental noise at the current location of the mobile terminal.
Step 306: judge whether the environmental noise exceeds a preset noise threshold; if so, perform step 307; otherwise, return to step 305.
Step 307: determine that a lip reading recognition event is detected as triggered.
Step 308: acquire at least one frame of 3D lip image of the current user through a 3D camera.
Step 309: acquire second facial expression information corresponding to the 3D lip image.
Step 310: input the 3D lip image and the second facial expression information into the pre-trained lip reading recognition model.
Step 311: obtain lip reading recognition results according to the output result of the lip reading recognition model.
Step 312: input the lip reading recognition results into a pre-built semantic understanding model.
The semantic understanding model is configured to determine, when multiple lip reading recognition results are obtained, the lip reading information from the multiple results based on contextual relationships.
Step 313: determine the output result of the semantic understanding model as the lip reading information corresponding to the 3D lip image.
Step 314: provide the lip reading information to the current user.
The lip reading information can be provided to the current user in the form of text or speech.
According to the lip reading recognition method provided in this embodiment of the present application, the environmental noise at the current location of the mobile terminal is acquired, and when the environmental noise exceeds a preset noise threshold, it is determined that a lip reading recognition event is detected as triggered. The trigger timing of the lip reading recognition event can thus be reasonably determined according to the level of ambient noise at the mobile terminal's location, effectively avoiding voice communication in a noisy environment and thereby avoiding large speech recognition errors and a poor user experience. Moreover, lip reading recognition results are obtained according to the output result of the lip reading recognition model, the results are input into a pre-built semantic understanding model, and the output result of the semantic understanding model is determined as the lip reading information corresponding to the 3D lip image; context-based semantic understanding can thus be performed on the multiple lip reading recognition results output by the lip reading recognition model, and the lip reading information the user really intends to express can be determined quickly and accurately.
Fig. 4 is a structural block diagram of a lip reading recognition apparatus provided by an embodiment of the present application. The apparatus may be implemented by software and/or hardware, is generally integrated in a mobile terminal, and can accurately obtain lip reading information by executing the lip reading recognition method. As shown in Fig. 4, the apparatus includes:
a lip image acquisition module 401, configured to acquire at least one frame of 3D lip image of the current user through a 3D camera when it is detected that a lip reading recognition event is triggered;
a lip image input module 402, configured to input the 3D lip image into a pre-trained lip reading recognition model;
a lip reading information determination module 403, configured to determine lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model; and
a lip reading information providing module 404, configured to provide the lip reading information to the current user.
According to the lip reading recognition apparatus provided in this embodiment of the present application, when it is detected that a lip reading recognition event is triggered, at least one frame of 3D lip image of the current user is acquired through a 3D camera, the 3D lip image is input into a pre-trained lip reading recognition model, lip reading information corresponding to the 3D lip image is determined according to the output result of the model, and the lip reading information is then provided to the current user. By adopting the above technical solution, simple and fast lip reading recognition can be performed on 3D lip images through a pre-built lip reading recognition model, the accuracy of lip reading recognition is further improved, the user's human-computer interaction experience is effectively enhanced, and user needs are better met.
Optionally, the apparatus further includes:
a sample lip image collection module, configured to collect at least one frame of 3D sample lip image of each individual in a preset crowd before it is detected that a lip reading recognition event is triggered, and to acquire the lip reading content corresponding to the 3D sample lip images;
a sample lip image labeling module, configured to label the 3D sample lip images according to the lip reading content to obtain a training sample set; and
a lip reading recognition model training module, configured to train a preset machine learning model with the training sample set to obtain the lip reading recognition model.
Optionally, the apparatus further includes:
a first expression information acquisition module, configured to acquire first facial expression information corresponding to the 3D sample lip images before the 3D sample lip images are labeled according to the lip reading content to obtain the training sample set.
The sample lip image labeling module is then configured to label the 3D sample lip images according to the lip reading content and the first facial expression information to obtain the training sample set.
Correspondingly, the apparatus further includes:
a second expression information acquisition module, configured to acquire second facial expression information corresponding to the 3D lip image before the 3D lip image is input into the pre-trained lip reading recognition model.
The lip image input module is then configured to input the 3D lip image and the second facial expression information into the pre-trained lip reading recognition model.
Optionally, the second expression information acquisition module is configured to input the 3D face image corresponding to the 3D lip image into an expression recognition model to obtain the second facial expression information corresponding to the 3D lip image.
Optionally, the lip reading information determination module is configured to: obtain lip reading recognition results according to the output result of the lip reading recognition model; input the lip reading recognition results into a pre-built semantic understanding model, where the semantic understanding model is configured to determine, when multiple lip reading recognition results are obtained, the lip reading information from the multiple results based on contextual relationships; and determine the output result of the semantic understanding model as the lip reading information corresponding to the 3D lip image.
Optionally, detecting that a lip reading recognition event is triggered includes:
monitoring whether a lip reading recognition instruction input by the current user is received, and determining that a lip reading recognition event is detected as triggered when the lip reading recognition instruction is received; or
acquiring the current time and/or current location of the mobile terminal, and determining that a lip reading recognition event is detected as triggered when the current time is within a preset time period and/or the current location is a preset location; or
acquiring the environmental noise at the current location of the mobile terminal, and determining that a lip reading recognition event is detected as triggered when the environmental noise exceeds a preset noise threshold; or
acquiring a communication message sent by another mobile terminal, and determining that a lip reading recognition event is detected as triggered when the communication message contains a preset keyword.
Optionally, the apparatus further includes:
a feedback information receiving module, configured to receive, after the lip reading information is provided to the current user, feedback information from the current user on whether the recognition result of the lip reading information is accurate; and
a feedback information sending module, configured to send the feedback information to the lip reading recognition model for training.
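The module decomposition of Fig. 4 maps naturally onto a thin orchestration class; the sketch below is structural only, with placeholder callables standing in for the 3D camera, the recognition model and the user-facing presentation that the patent describes.

```python
from typing import Callable, Sequence

class LipReadingApparatus:
    """Structural sketch of modules 401-404 from Fig. 4 (illustrative)."""

    def __init__(self,
                 acquire_lip_frames: Callable[[], Sequence],   # module 401
                 recognize: Callable[[Sequence], str],         # modules 402-403
                 present_to_user: Callable[[str], None]):      # module 404
        self.acquire_lip_frames = acquire_lip_frames
        self.recognize = recognize
        self.present_to_user = present_to_user

    def on_trigger(self) -> None:
        # Runs the Fig. 1 pipeline once a recognition event fires.
        frames = self.acquire_lip_frames()
        lip_reading_info = self.recognize(frames)
        self.present_to_user(lip_reading_info)
```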
An embodiment of the present application also provides a storage medium containing computer-executable instructions. When executed by a computer processor, the computer-executable instructions are used to perform a lip reading recognition method, the method including:
when it is detected that a lip reading recognition event is triggered, acquiring at least one frame of 3D lip image of the current user through a 3D camera;
inputting the 3D lip image into a pre-trained lip reading recognition model;
determining lip reading information corresponding to the 3D lip image according to the output result of the lip reading recognition model; and
providing the lip reading information to the current user.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media, such as CD-ROMs, floppy disks or tape devices; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory, such as flash memory or magnetic media (e.g., hard disks or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the first computer system in which the program is executed, or may be located in a different, second computer system connected to the first computer system through a network (such as the Internet). The second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems connected by a network. The storage medium may store program instructions (for example, implemented as a computer program) executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the lip reading recognition operations described above, and may also perform related operations in the lip reading recognition methods provided by any embodiment of the present application.
An embodiment of the present application provides a mobile terminal in which the lip reading recognition apparatus provided by the embodiments of the present application can be integrated. Fig. 5 is a schematic structural diagram of a mobile terminal provided by an embodiment of the present application. The mobile terminal 500 may include: a memory 501, a processor 502, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor 502 implements the lip reading recognition method described in the embodiments of the present application.
Mobile terminal provided by the embodiments of the present application, can be by the lip reading identification model that builds in advance to 3D lip images
Simple, quick lip reading identification is carried out, and further improves the accuracy of lip reading identification, effectively increases the man-machine friendship of user
Mutually experience, preferably meets user demand.
Fig. 6 is a structural schematic diagram of another mobile terminal provided by an embodiment of the present application. The mobile terminal may include: a housing (not shown), a memory 601, a central processing unit (CPU) 602 (also called a processor, hereinafter CPU), a circuit board (not shown), and a power supply circuit (not shown). The circuit board is arranged inside the space enclosed by the housing; the CPU 602 and the memory 601 are arranged on the circuit board; the power supply circuit supplies power to each circuit or device of the mobile terminal; the memory 601 stores executable program code; and the CPU 602 runs the computer program corresponding to the executable program code by reading the executable program code stored in the memory 601, so as to realize the following steps:
when a lip reading recognition event is detected, acquiring at least one frame of 3D lip image of the current user through a 3D camera;
inputting the 3D lip image into a pre-trained lip reading recognition model;
determining the lip reading information corresponding to the 3D lip image according to the output of the model; and
providing the lip reading information to the current user.
The mobile terminal further includes: a peripheral interface 603, an RF (radio frequency) circuit 605, an audio circuit 606, a loudspeaker 611, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, a touch screen 612, and an external port 604. These components communicate through one or more communication buses or signal lines 607.
It should be understood that the illustrated mobile terminal 600 is only one example of a mobile terminal; the mobile terminal 600 may have more or fewer components than shown in the drawings, may combine two or more components, or may have a different configuration of components. The various components shown in the drawings may be implemented in hardware, software, or a combination of hardware and software, including one or more signal-processing and/or application-specific integrated circuits.
The mobile terminal for lip reading recognition provided in this embodiment is described in detail below, taking a mobile phone as an example.
The memory 601 can be accessed by the CPU 602, the peripheral interface 603, and so on. The memory 601 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The peripheral interface 603 can connect the input and output peripherals of the device to the CPU 602 and the memory 601.
The I/O subsystem 609 can connect the input/output peripherals of the device, such as the touch screen 612 and the other input/control devices 610, to the peripheral interface 603. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 for controlling the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (press buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels. It is worth noting that an input controller 6092 may be connected to any of the following: a keyboard, an infrared port, a USB interface, or a pointing device such as a mouse.
The touch screen 612 is the input and output interface between the mobile terminal and the user; it displays visual output to the user, which may include graphics, text, icons, video, and so on.
The display controller 6091 in the I/O subsystem 609 receives electrical signals from, or sends electrical signals to, the touch screen 612. The touch screen 612 detects contact on its surface, and the display controller 6091 converts the detected contact into interaction with the user interface objects displayed on the touch screen 612, thereby realizing human-computer interaction. The user interface objects displayed on the touch screen 612 may be icons for running games, icons for connecting to corresponding networks, and the like. It is worth noting that the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.
The RF circuit 605 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side), realizing data reception and transmission between the mobile phone and the wireless network, for example sending and receiving short messages and e-mails. Specifically, the RF circuit 605 receives and sends RF signals, which are also called electromagnetic signals: it converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with mobile communication networks and other devices through the electromagnetic signals. The RF circuit 605 may include known circuits for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (coder-decoder) chipset, a subscriber identity module (SIM), and so on.
The audio circuit 606 is mainly used to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and send the electrical signal to the loudspeaker 611.
The loudspeaker 611 restores the voice signal received by the mobile phone from the wireless network through the RF circuit 605 to sound and plays the sound to the user.
The power management chip 608 supplies power to, and manages the power of, the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.
The lip reading recognition device, storage medium, and mobile terminal provided in the above embodiments can execute the lip reading recognition method provided by any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing this method. For technical details not described in detail in the above embodiments, refer to the lip reading recognition method provided by any embodiment of the present application.
Note that above are only preferred embodiment and the institute's application technology principle of the application.It will be appreciated by those skilled in the art that
The application is not limited to specific embodiment described here, can carry out for a person skilled in the art it is various it is apparent variation,
The protection domain readjusted and substituted without departing from the application.Therefore, although being carried out to the application by above example
It is described in further detail, but the application is not limited only to above example, in the case where not departing from the application design, also
May include other more equivalent embodiments, and scope of the present application is determined by scope of the appended claims.
Claims (10)
1. A lip reading recognition method, characterized by comprising:
when a lip reading recognition event is detected, acquiring at least one frame of 3D lip image of the current user through a 3D camera;
inputting the 3D lip image into a pre-trained lip reading recognition model;
determining lip reading information corresponding to the 3D lip image according to the output of the lip reading recognition model; and
providing the lip reading information to the current user.
2. The method according to claim 1, further comprising, before the lip reading recognition event is detected:
collecting at least one frame of 3D sample lip image for each individual in a preset population, and obtaining the lip reading content corresponding to each 3D sample lip image;
labeling the 3D sample lip images according to the lip reading content to obtain a training sample set; and
training a preset machine learning model with the training sample set to obtain the lip reading recognition model.
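By way of illustration (not part of the claims), the labeling and training steps of claim 2 might look like the following sketch; the flattened-image features and the scikit-learn logistic regression classifier are assumptions, since the claim does not fix a particular preset machine learning model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def build_training_set(sample_lip_images, lip_reading_contents):
    """Label each 3D sample lip image with its lip reading content."""
    X = np.stack([img.reshape(-1) for img in sample_lip_images])  # flattened features
    y = np.array(lip_reading_contents)
    return X, y


def train_lip_reading_model(X, y):
    """Train a preset machine learning model on the labeled training sample set."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 20 sample lip images from a preset population, two contents.
    images = [rng.random((64, 64)) for _ in range(20)]
    contents = ["yes", "no"] * 10
    X, y = build_training_set(images, contents)
    model = train_lip_reading_model(X, y)
    print(model.predict(X[:2]))  # predictions on the first two training samples
```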
3. The method according to claim 2, further comprising, before labeling the 3D sample lip images according to the lip reading content to obtain the training sample set:
obtaining first facial expression information corresponding to the 3D sample lip images;
wherein labeling the 3D sample lip images according to the lip reading content to obtain the training sample set comprises:
labeling the 3D sample lip images according to the lip reading content and the first facial expression information to obtain the training sample set;
and correspondingly, before inputting the 3D lip image into the pre-trained lip reading recognition model, the method further comprises:
obtaining second facial expression information corresponding to the 3D lip image;
wherein inputting the 3D lip image into the pre-trained lip reading recognition model comprises:
inputting the 3D lip image and the second facial expression information into the pre-trained lip reading recognition model.
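One possible reading of claim 3 (illustrative only) is to encode the facial expression information as an extra feature vector concatenated with the lip image before it is fed to the model; the expression label set and one-hot encoding below are assumptions, not part of the claim.

```python
import numpy as np

EXPRESSIONS = ["neutral", "happy", "angry", "surprised"]  # hypothetical label set


def expression_to_onehot(expression: str) -> np.ndarray:
    """Encode the facial expression information as a one-hot vector."""
    vec = np.zeros(len(EXPRESSIONS), dtype=np.float32)
    vec[EXPRESSIONS.index(expression)] = 1.0
    return vec


def build_model_input(lip_image: np.ndarray, expression: str) -> np.ndarray:
    """Combine the 3D lip image with the second facial expression information
    into a single feature vector for the lip reading recognition model."""
    return np.concatenate([lip_image.reshape(-1), expression_to_onehot(expression)])


if __name__ == "__main__":
    lip_image = np.zeros((64, 64), dtype=np.float32)
    features = build_model_input(lip_image, "happy")
    print(features.shape)  # (4100,): 4096 lip-image values + 4 expression values
```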
4. The method according to claim 3, wherein obtaining the second facial expression information corresponding to the 3D lip image comprises:
inputting the 3D face image corresponding to the 3D lip image into an expression recognition model to obtain the second facial expression information corresponding to the 3D lip image.
5. The method according to claim 1, wherein determining the lip reading information corresponding to the 3D lip image according to the output of the lip reading recognition model comprises:
obtaining a lip reading recognition result according to the output of the lip reading recognition model;
inputting the lip reading recognition result into a pre-built semantic understanding model, wherein the semantic understanding model is used to determine, when multiple lip reading recognition results are obtained, the lip reading information from the multiple results based on contextual relationships; and
determining the output of the semantic understanding model as the lip reading information corresponding to the 3D lip image.
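As an illustration of claim 5, the semantic understanding step can be sketched as scoring each candidate recognition result against the preceding context; the bigram co-occurrence table below is a deliberately simple stand-in for a real semantic understanding model.

```python
def choose_by_context(candidates, context, cooccurrence):
    """Pick the lip reading result most consistent with the preceding context.

    `cooccurrence[(a, b)]` is a hypothetical count of how often word `b`
    follows word `a`; a trained semantic understanding model would replace it.
    """
    def score(candidate):
        return sum(cooccurrence.get((ctx_word, candidate), 0) for ctx_word in context)

    return max(candidates, key=score)


if __name__ == "__main__":
    cooccurrence = {("please", "call"): 9, ("please", "tall"): 1}
    # "call" and "tall" look alike on the lips; the context disambiguates them.
    print(choose_by_context(["call", "tall"], ["please"], cooccurrence))  # call
```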
6. The method according to claim 1, wherein detecting the lip reading recognition event comprises one of the following:
monitoring whether a lip reading recognition instruction input by the current user is received, and, when the instruction is received, determining that a lip reading recognition event has been detected;
obtaining the current time and/or current location of the mobile terminal, and, when the current time is within a preset time period and/or the current location is a preset location, determining that a lip reading recognition event has been detected;
obtaining the ambient noise at the current location of the mobile terminal, and, when the ambient noise exceeds a preset noise threshold, determining that a lip reading recognition event has been detected; or
obtaining a communication message sent by another mobile terminal, and, when the message contains a preset keyword, determining that a lip reading recognition event has been detected.
7. The method according to any one of claims 1-6, further comprising, after providing the lip reading information to the current user:
receiving the current user's feedback on whether the recognition result of the lip reading information is accurate; and
sending the feedback to the lip reading recognition model for further training.
8. A lip reading recognition device, characterized by comprising:
a lip image acquisition module, configured to acquire at least one frame of 3D lip image of the current user through a 3D camera when a lip reading recognition event is detected;
a lip image input module, configured to input the 3D lip image into a pre-trained lip reading recognition model;
a lip reading information determination module, configured to determine lip reading information corresponding to the 3D lip image according to the output of the lip reading recognition model; and
a lip reading information providing module, configured to provide the lip reading information to the current user.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the lip reading recognition method according to any one of claims 1-7.
10. A mobile terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the lip reading recognition method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810372876.7A CN108537207B (en) | 2018-04-24 | 2018-04-24 | Lip language identification method, device, storage medium and mobile terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108537207A true CN108537207A (en) | 2018-09-14 |
CN108537207B CN108537207B (en) | 2021-01-22 |
Family
ID=63478460
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810372876.7A Expired - Fee Related CN108537207B (en) | 2018-04-24 | 2018-04-24 | Lip language identification method, device, storage medium and mobile terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108537207B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105022470A (en) * | 2014-04-17 | 2015-11-04 | 中兴通讯股份有限公司 | Method and device of terminal operation based on lip reading |
CN104484656A (en) * | 2014-12-26 | 2015-04-01 | 安徽寰智信息科技股份有限公司 | Deep learning-based lip language recognition lip shape model library construction method |
CN107799125A (en) * | 2017-11-09 | 2018-03-13 | 维沃移动通信有限公司 | A kind of audio recognition method, mobile terminal and computer-readable recording medium |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558788B (en) * | 2018-10-08 | 2023-10-27 | 清华大学 | Silence voice input identification method, computing device and computer readable medium |
CN109558788A (en) * | 2018-10-08 | 2019-04-02 | 清华大学 | Silent voice inputs discrimination method, computing device and computer-readable medium |
CN109637521A (en) * | 2018-10-29 | 2019-04-16 | 深圳壹账通智能科技有限公司 | A kind of lip reading recognition methods and device based on deep learning |
CN111611827A (en) * | 2019-02-25 | 2020-09-01 | 北京嘀嘀无限科技发展有限公司 | Image processing method and device |
CN110213431A (en) * | 2019-04-30 | 2019-09-06 | 维沃移动通信有限公司 | Message method and mobile terminal |
CN110276259A (en) * | 2019-05-21 | 2019-09-24 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, computer equipment and storage medium |
CN110276259B (en) * | 2019-05-21 | 2024-04-02 | 平安科技(深圳)有限公司 | Lip language identification method, device, computer equipment and storage medium |
CN111984818A (en) * | 2019-05-23 | 2020-11-24 | 北京地平线机器人技术研发有限公司 | Singing following recognition method and device, storage medium and electronic equipment |
WO2020252922A1 (en) * | 2019-06-21 | 2020-12-24 | 平安科技(深圳)有限公司 | Deep learning-based lip reading method and apparatus, electronic device, and medium |
CN110427809A (en) * | 2019-06-21 | 2019-11-08 | 平安科技(深圳)有限公司 | Lip reading recognition methods, device, electronic equipment and medium based on deep learning |
CN110427992A (en) * | 2019-07-23 | 2019-11-08 | 杭州城市大数据运营有限公司 | Data matching method, device, computer equipment and storage medium |
CN110865705A (en) * | 2019-10-24 | 2020-03-06 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-mode converged communication method and device, head-mounted equipment and storage medium |
CN110865705B (en) * | 2019-10-24 | 2023-09-19 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-mode fusion communication method and device, head-mounted equipment and storage medium |
CN111190484A (en) * | 2019-12-25 | 2020-05-22 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-mode interaction system and method |
CN112084927B (en) * | 2020-09-02 | 2022-12-20 | 中国人民解放军军事科学院国防科技创新研究院 | Lip language identification method fusing multiple visual information |
CN112084927A (en) * | 2020-09-02 | 2020-12-15 | 中国人民解放军军事科学院国防科技创新研究院 | Lip language identification method fusing multiple visual information |
CN112633211A (en) * | 2020-12-30 | 2021-04-09 | 海信视像科技股份有限公司 | Service equipment and man-machine interaction method |
CN112817575A (en) * | 2021-01-19 | 2021-05-18 | 中科方寸知微(南京)科技有限公司 | Lip language identification-based assembly language editor and identification method |
CN112817575B (en) * | 2021-01-19 | 2024-02-20 | 中科方寸知微(南京)科技有限公司 | Assembly language editor based on lip language identification and identification method |
WO2023006033A1 (en) * | 2021-07-29 | 2023-02-02 | 华为技术有限公司 | Speech interaction method, electronic device, and medium |
CN113742687B (en) * | 2021-08-31 | 2022-10-21 | 深圳时空数字科技有限公司 | Internet of things control method and system based on artificial intelligence |
CN113742687A (en) * | 2021-08-31 | 2021-12-03 | 深圳时空数字科技有限公司 | Internet of things control method and system based on artificial intelligence |
CN113762142A (en) * | 2021-09-03 | 2021-12-07 | 海信视像科技股份有限公司 | Lip language identification method and display device |
CN114842846A (en) * | 2022-04-21 | 2022-08-02 | 歌尔股份有限公司 | Method and device for controlling head-mounted equipment and computer readable storage medium |
CN116431005A (en) * | 2023-06-07 | 2023-07-14 | 安徽大学 | Unmanned aerial vehicle control method and system based on improved mobile terminal lip language recognition |
CN116431005B (en) * | 2023-06-07 | 2023-09-12 | 安徽大学 | Unmanned aerial vehicle control method and system based on improved mobile terminal lip language recognition |
Also Published As
Publication number | Publication date |
---|---|
CN108537207B (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108537207A (en) | Lip reading recognition methods, device, storage medium and mobile terminal | |
CN108363706B (en) | Method and device for man-machine dialogue interaction | |
CN109348135A (en) | Photographic method, device, storage medium and terminal device | |
CN107944447B (en) | Image classification method and device | |
CN108566516A (en) | Image processing method, device, storage medium and mobile terminal | |
WO2020253128A1 (en) | Voice recognition-based communication service method, apparatus, computer device, and storage medium | |
CN108345581A (en) | A kind of information identifying method, device and terminal device | |
CN111050023A (en) | Video detection method and device, terminal equipment and storage medium | |
CN111382748B (en) | Image translation method, device and storage medium | |
CN112154431A (en) | Man-machine interaction method and electronic equipment | |
CN107564526B (en) | Processing method, apparatus and machine-readable medium | |
CN106203235A (en) | Live body discrimination method and device | |
CN112017670B (en) | Target account audio identification method, device, equipment and medium | |
CN111414772B (en) | Machine translation method, device and medium | |
CN108021905A (en) | image processing method, device, terminal device and storage medium | |
CN110349577B (en) | Man-machine interaction method and device, storage medium and electronic equipment | |
CN110135349A (en) | Recognition methods, device, equipment and storage medium | |
CN108628819A (en) | Treating method and apparatus, the device for processing | |
CN114822543A (en) | Lip language identification method, sample labeling method, model training method, device, equipment and storage medium | |
CN112735396A (en) | Speech recognition error correction method, device and storage medium | |
CN110491384B (en) | Voice data processing method and device | |
CN112256827A (en) | Sign language translation method and device, computer equipment and storage medium | |
CN112036174B (en) | Punctuation marking method and device | |
CN107958273B (en) | Volume adjusting method and device and storage medium | |
CN110858291A (en) | Character segmentation method and device |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210122