WO2022062884A1 - Text input method, electronic device, and computer-readable storage medium

Info

Publication number: WO2022062884A1
Application number: PCT/CN2021/116515
Authority: WO
Inventors: 刘浩; 黄韬; 胡粤麟; 秦磊; 张乐乐
Original assignee: 华为技术有限公司
Priority date: 2020-09-27
Filing date: 2021-09-03
Publication date: 2022-03-31
Also published as: CN114356109A

Abstract

The present application relates to the field of artificial intelligence (AI), and provides a text input method, an electronic device, and a computer-readable storage medium. The text input method comprises: upon detecting a text input operation of a user, acquiring lip change information of the user and character information inputted by the user, the lip change information comprising a lip feature sequence produced when the user speaks the text to be inputted; and determining, according to the lip feature sequence and the character information, the text to be inputted by the user. Because text determined from the character information inputted by the user is relatively accurate, combining the lip feature sequence with the character information to determine the text to be inputted can improve the accuracy of text input when voice input is inconvenient for the user.

Description

Text input method, electronic device, and computer-readable storage medium

This application claims priority to Chinese patent application No. 202011036037.1, entitled "Text input method, electronic device and computer-readable storage medium" and filed with the State Intellectual Property Office on September 27, 2020, the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence (Artificial Intelligence, AI), and in particular, to a text input method, an electronic device, and a computer-readable storage medium.

Background

The continuous growth of the mobile Internet user base and market has laid the foundation for the rapid development of mobile phone input methods. To the original pinyin input, mobile phone input methods have added Wubi, handwriting, voice input, and other methods. At present, pinyin input is still the method most used by users, followed by handwriting input, and finally voice input. Because voice input is the fastest, more and more users are choosing it. However, in some scenarios, such as a noisy environment or a private setting where it is inconvenient for the user to speak loudly, the accuracy of speech recognition is low, which reduces the accuracy of text input using the voice input method.

Summary of the invention

The present application provides a text input method, an electronic device, and a computer-readable storage medium, which can improve the accuracy of text input when it is inconvenient for a user to perform voice input.

To achieve the above object, the application adopts the following technical solutions:

In a first aspect, a text input method is provided, comprising: when a text input operation by a user is detected, acquiring lip change information of the user and character information input by the user, wherein the lip change information includes the lip feature sequence produced when the user speaks the text to be input; and determining the text to be input by the user according to the lip feature sequence and the character information.

In the above-mentioned embodiment, when the user's text input operation is detected, the lip change information of the user and the character information input by the user are obtained, the lip change information including the lip feature sequence produced when the user speaks the text to be input. The text to be input by the user is then determined according to the lip feature sequence and the character information. Because text determined from the character information input by the user is highly accurate, combining the lip feature sequence with the character information to determine the text to be input can improve the accuracy of text input when voice input is inconvenient for the user.

In a possible implementation manner of the first aspect, the acquiring the lip change information of the user includes:

A camera is used to collect an image sequence including the lip region of the user, and lip features are extracted from each image of the image sequence to obtain a lip feature sequence. When the user enters text, the electronic device turns on the camera to capture images of the lip region, and image capture ends when the user completes the lip-language input. The lip-language input and the input of character information can be performed at the same time, so the efficiency of text input is not affected. The lip feature sequence obtained from the image sequence reflects the changes in the user's lip shape well and is highly accurate, so determining the text to be input by the user from this lip feature sequence improves the accuracy of text input.

Alternatively, a wireless signal is transmitted and a reflected signal sequence is obtained, where each reflected signal in the reflected signal sequence is the signal reflected back after the wireless signal encounters an obstacle. The obstacle is determined according to the reflected signal sequence, and if the obstacle is a lip, a lip feature is extracted from each reflected signal of the reflected signal sequence to obtain a lip feature sequence. Because a wireless signal places few requirements on the environment (for example, it is not affected by external light), using a wireless signal to obtain the lip feature sequence widens the range of conditions in which the electronic device can be used.

In a possible implementation manner of the first aspect, the character information includes the first letter of the text to be input by the user. Since the input speed is faster when only the first letter is input, determining the text to be input by the user according to the lip feature sequence and the first letter has higher input efficiency.

In a possible implementation manner of the first aspect, the determining the text to be input by the user according to the lip feature sequence and the character information includes:

determining the text sequence corresponding to the lip feature sequence; correcting the text sequence according to the first initial to obtain at least one corrected candidate text sequence; and determining the candidate text sequence with the highest probability from the candidate text sequences and using it as the text to be input by the user. Because multiple text sequences can be determined from the lip feature sequence, some of the determined text sequences will be wrong. Using the first initial to correct the text sequence therefore improves the accuracy of text input. It also reduces the number of candidate text sequences, which reduces the amount of calculation needed to subsequently determine the text to be input and increases the calculation speed.

In a possible implementation manner of the first aspect, performing correction processing on the character sequence according to the first initial to obtain at least one corrected candidate character sequence, including:

extracting the second initial of each character in the text sequence; matching the extracted second initial with the first initial; and if there is an unmatched second initial, replacing the unmatched second initial with the corresponding first initial to obtain at least one replaced text sequence, and using the replaced text sequence as the candidate text sequence. By replacing initials in this way, the text sequence determined according to the lip feature sequence can be corrected, and text input efficiency can be improved.

In a possible implementation manner of the first aspect, replacing, if there is an unmatched second initial, the unmatched second initial with the corresponding first initial to obtain at least one replaced text sequence includes:

if there is an unmatched second initial, and there is a letter associated with the corresponding first initial, replacing the unmatched second initial with the corresponding first initial, and also replacing the unmatched second initial with the associated letter, to obtain at least one replaced text sequence. Because the first initial input by the user may itself contain input errors, also determining replaced text sequences from the letters associated with the first initial prevents the loss of useful information and improves the accuracy of text input.

In a possible implementation manner of the first aspect, the determining the character sequence corresponding to the lip feature sequence includes:

inputting the lip feature sequence into a trained lip language recognition model to obtain a text sequence output by the lip language recognition model, where the lip language recognition model is used to recognize the text corresponding to lip features and is trained using lip features and the text corresponding to those lip features as training samples. Because the lip language recognition model is trained on training samples, it generalizes; using it to recognize the text sequence therefore improves the accuracy of the output text sequence.

In a second aspect, a text input device is provided, including:

an acquisition module, configured to acquire, when a text input operation of a user is detected, the lip change information of the user and the character information input by the user, the lip change information including the lip feature sequence produced when the user speaks the text to be input;

and a processing module, configured to determine the text to be input by the user according to the lip feature sequence and the character information.

In a possible implementation manner of the second aspect, the acquisition module is specifically configured to:

collect, by using a camera, an image sequence including the lip region of the user, and extract lip features from each image of the image sequence to obtain a lip feature sequence; or

transmit a wireless signal and acquire a reflected signal sequence, where each reflected signal in the reflected signal sequence is the signal reflected back after the wireless signal encounters an obstacle; and

determine the obstacle according to the reflected signal sequence, and if the obstacle is a lip, extract a lip feature from each reflected signal of the reflected signal sequence to obtain a lip feature sequence.

In a possible implementation manner of the second aspect, the character information includes the first letter of the text to be input by the user.

In a possible implementation manner of the second aspect, the processing module includes:

a determining unit for determining the character sequence corresponding to the lip feature sequence;

An error correction unit, configured to perform correction processing on the character sequence according to the first initial to obtain at least one corrected candidate character sequence;

An output unit, configured to determine a candidate character sequence with the highest probability from the candidate character sequence, and use the candidate character sequence with the highest probability as the character to be input by the user.

In a possible implementation manner of the second aspect, the error correction unit is specifically used for:

extracting the second initial of each character in the sequence of characters;

matching the extracted second initial with the first initial;

if there is an unmatched second initial, replacing the unmatched second initial with the corresponding first initial to obtain at least one replaced text sequence, and using the replaced text sequence as the candidate text sequence.

In a possible implementation manner of the second aspect, the error correction unit is further configured to:

if there is an unmatched second initial, and there is a letter associated with the corresponding first initial, replace the unmatched second initial with the corresponding first initial, and also replace the unmatched second initial with the associated letter, to obtain at least one replaced text sequence.

In a possible implementation manner of the second aspect, the determining unit is specifically configured to:

input the lip feature sequence into a trained lip language recognition model to obtain a text sequence output by the lip language recognition model, where the lip language recognition model is used to recognize the text corresponding to lip features and is trained using lip features and the text corresponding to those lip features as training samples.

In a third aspect, an electronic device is provided, including a processor for executing a computer program stored in a memory, so as to implement the text input method according to the above-mentioned first aspect.

In a fourth aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the text input method according to the first aspect.

A fifth aspect provides a computer program product that, when the computer program product runs on an electronic device, enables the electronic device to execute the text input method described in the first aspect.

It can be understood that, for the beneficial effects of the second aspect to the fifth aspect, reference may be made to the relevant description in the first aspect, which is not repeated here.

Description of drawings

FIG. 1 is a schematic flowchart of a text input method provided by an embodiment of the present application;

FIG. 2 is an application scenario diagram of the text input method provided by the embodiment of the present application;

FIG. 3 is a schematic diagram of a lip shape provided by an embodiment of the present application;

FIG. 4 is a specific flowchart of a text input method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a method for outputting a text sequence provided by an embodiment of the present application;

FIG. 6 is a specific flowchart of a text input method provided by another embodiment of the present application;

FIG. 7 is a schematic diagram of a radar wave signal provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of a range Doppler image provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It is to be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described feature, integer, step, operation, element and/or component, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or sets thereof.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in the specification of this application and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected", or "in response to detection of the [described condition or event]".

In addition, in the description of the present application, the terms "first", "second" and the like are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "comprising", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.

Existing input methods mainly include the voice input method, the pinyin input method, and the handwriting input method. The pinyin and handwriting input methods have higher input accuracy but lower input efficiency. The voice input method generally performs speech recognition first and then converts the recognized speech into the corresponding text, which gives it high input efficiency. However, in a noisy environment, or in a private setting where it is inconvenient for the user to speak loudly, the accuracy of speech recognition is low, and as a result the accuracy of speech-to-text conversion is also reduced.

To this end, an embodiment of the present application provides a text input method. When a text input operation by a user is detected, the lip change information of the user and the character information input by the user are obtained, where the lip change information includes the lip feature sequence produced when the user speaks the text to be input; the text to be input by the user is then determined according to the lip feature sequence and the character information. Because text determined from the character information input by the user is highly accurate, combining the lip feature sequence with the character information to determine the text to be input can improve the accuracy of text entered in the voice input mode.

The text input method provided by the present application will be exemplarily described below with reference to specific embodiments.

The text input method provided in the embodiments of the present application is applied to electronic devices, and the electronic devices may be mobile phones, tablet computers, handheld computers, personal digital assistants (PDAs), speakers with screens, wearable devices, and the like.

As shown in FIG. 1 , a text input method provided by an embodiment of the present application includes:

S101: When a text input operation of the user is detected, obtain lip change information of the user and character information input by the user, where the lip change information includes a lip feature sequence when the user speaks the text to be input.

The user can input character information by means of a soft keyboard or handwriting. The input character information can be text, the letters corresponding to the text, or the first letter of the text, and the text can be Chinese, English, or another language.

The user may or may not make a sound when speaking the text to be input. The lip feature sequence includes lip features at each moment in a continuous time. The lip features are used to represent the lip shape of the user when speaking, and different lip shapes correspond to different pronunciations.

In a possible implementation manner, the electronic device collects an image sequence of the user's lip region through a camera, where the image sequence consists of the images of the lip region at each moment over a continuous period of time, and then extracts lip features from each image in the image sequence to obtain a lip feature sequence.
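The per-image extraction step can be illustrated with a minimal sketch. The application does not specify which lip features are extracted, so the sketch assumes that each image has already been reduced to a set of lip landmark points by some detector, and it uses mouth width, height, and their ratio as a purely illustrative feature. The names `lip_feature` and `lip_feature_sequence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, float, float]


def lip_feature(landmarks: Sequence[Point]) -> Feature:
    """Return (width, height, openness) for the lip landmarks of one image.

    Assumption: the landmarks come from a lip detector that is outside
    the scope of this sketch.
    """
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    openness = height / width if width else 0.0
    return (width, height, openness)


def lip_feature_sequence(frames: Sequence[Sequence[Point]]) -> List[Feature]:
    """Extract one feature per image, preserving the time order of the images."""
    return [lip_feature(frame) for frame in frames]
```

The resulting list keeps one entry per moment, which is what allows a later recognition stage to follow how the lip shape changes over time.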

In another possible implementation manner, the electronic device transmits a wireless signal through a wireless sensor; the wireless signal is reflected back after encountering an obstacle, producing a reflected signal. The electronic device then receives the reflected signal through the wireless sensor, obtains the reflected signal sequence from the reflected signals at each moment over a continuous period of time, and determines according to the reflected signal sequence whether the obstacle is a lip. If the obstacle is a lip, a lip feature is extracted from each reflected signal in the reflected signal sequence to obtain a lip feature sequence. The wireless signal may be a radar wave signal, an infrared signal, an ultrasonic signal, or the like.

In a possible implementation manner, after the user selects a preset input mode (e.g., a multi-mode input mode), the electronic device acquires the user's lip change information once it detects that the user has started to input characters.

For example, in an application scenario, the user opens the text message editing page, and the electronic device opens the interface shown in FIG. 2 after detecting the instruction for the multi-mode input mode. On this interface the user inputs the first letters of the Chinese pinyin and performs lip-language input at the same time. When the user starts to input, the camera of the electronic device collects lip images of the user, and a lip feature is extracted from the lip image at each moment, which yields the lip feature sequence.

S102: Determine the text to be input by the user according to the lip feature sequence and the character information.

When the user speaks, a change in the lip features, that is, in the lip shape, is accompanied by a change in pronunciation. Therefore, the user's pronunciation can be determined from the lip shape when the user speaks, and the text spoken by the user can then be determined. However, different words may correspond to the same lip shape. For example, when the lips are in the shape shown in (A) in FIG. 3, the corresponding pronunciation may be the English letter "A", "E" or "I"; when the lips are in the shape shown in (B) in FIG. 3, the corresponding pronunciation may be the English letter "Q" or "W". Moreover, pronouncing a single character may require the lip shape to change; for example, the English letters "M" and "L" can only be pronounced with a changing lip shape. Therefore, the lip feature at a single moment may yield multiple characters or a wrong character, and the lip features at all moments, that is, the lip feature sequence, may yield multiple possible text sequences.
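The way ambiguous lip shapes multiply into several possible text sequences can be shown with a small sketch. The table below encodes only the FIG. 3 example from the text (shape (A) may be "A", "E" or "I"; shape (B) may be "Q" or "W"); the keys `shape_A` and `shape_B` and the function name are hypothetical labels, and a real recognizer would output probabilities rather than a fixed table.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical lookup built from the FIG. 3 example in the text.
SHAPE_TO_LETTERS: Dict[str, List[str]] = {
    "shape_A": ["A", "E", "I"],
    "shape_B": ["Q", "W"],
}


def possible_sequences(shapes: Sequence[str]) -> List[str]:
    """Enumerate every letter sequence consistent with the observed lip shapes."""
    options = [SHAPE_TO_LETTERS[shape] for shape in shapes]
    return ["".join(combo) for combo in product(*options)]
```

Two observed shapes already give six sequences, and the count grows multiplicatively with each further shape, which is why additional character information from the user is useful for pruning.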

Each piece of character information input by the user also corresponds to one or more characters. Combining the lip feature sequence with the character information makes it possible to correct the characters identified from the lip feature sequence, or to remove wrongly identified characters, so as to obtain an accurate text sequence, that is, the text to be input by the user. Text input can thus be completed without acquiring the user's voice, and the accuracy of text input is improved.

In a possible implementation manner, the electronic device may determine multiple candidate text sequences according to the lip feature sequence and the character information, then select the candidate text sequence with the highest probability according to the semantics of each candidate text sequence, and use it as the text to be input by the user.

In a possible implementation manner, the candidate text sequences may be input into a trained semantic recognition model, and the candidate text sequence with the highest probability output by the semantic recognition model is obtained. The semantic recognition model is trained using text sequences, together with the text sequence that has the highest probability among them, as training samples.
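Selecting the most probable candidate can be sketched as follows. The semantic recognition model itself is not described in detail, so the sketch takes any scoring function as a stand-in for it; `pick_most_probable` and the toy probabilities are hypothetical and only illustrate the selection step.

```python
from typing import Callable, Sequence


def pick_most_probable(candidates: Sequence[str],
                       score: Callable[[str], float]) -> str:
    """Return the candidate text sequence with the highest score.

    `score` stands in for the trained semantic recognition model.
    """
    if not candidates:
        raise ValueError("no candidate text sequences")
    return max(candidates, key=score)


# Toy probabilities, invented purely for illustration.
toy_probabilities = {"knowledge": 0.7, "instruction": 0.3}
best = pick_most_probable(list(toy_probabilities),
                          lambda text: toy_probabilities.get(text, 0.0))
```

Passing the model in as a function keeps the selection logic independent of how the probabilities are produced.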

In the above embodiment, because text determined from the character information input by the user is highly accurate, combining the lip feature sequence with the character information to determine the text to be input can improve the accuracy of text entered in the voice input mode.

In a possible implementation manner, the character information input by the user includes the first initial of the text to be input by the user, where "first" is used only to distinguish this initial from others in the description. An initial is the first letter of a unit of text: it can be the first letter of a Chinese pinyin syllable, such as "w" in "wen", or the first letter of an English word, such as "G" in "Good". The first initial includes the initial of each character or word to be input. After determining the text sequence according to the acquired lip feature sequence, the electronic device corrects the determined text sequence according to the first initial to obtain at least one corrected candidate text sequence.

If the text to be input by the user is Chinese, after determining the text sequence according to the acquired lip feature sequence, the electronic device corrects the pinyin of the determined text sequence according to the first initial to obtain corrected pinyin, and determines at least one corrected candidate text sequence according to the corrected pinyin.

For example, in an application scenario, the electronic device corrects the pinyin of the text sequence according to the first initial to obtain one corrected pinyin, and determines the corrected candidate text sequences according to that pinyin. If the text sequence determined according to the lip feature sequence is "Netherlands" (pinyin "helan") and the first initials are "hn", the pinyin "helan" is corrected to "henan". The candidate text sequences determined from the pronunciation "henan" include "Henan" and other words with the same pronunciation.

In another application scenario, the electronic device corrects the text sequence according to the first initial, obtains several corrected pinyin, and determines the corrected candidate text sequences according to each of them. For example, if the text sequence determined according to the lip feature sequence is "airplane" (pinyin "feiji") and the first initials are "hj", the pinyin "feiji" is corrected to "huijia", "huiji", etc. For the pinyin "huijia", the determined candidate text sequences are "home", "exchange price", etc. For the pinyin "huiji", the determined candidate text sequences are "collection", "benefit", etc.
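One simple way to picture how several corrected pinyin arise from a single pair of initials is to filter a lexicon by initials. The sketch below is only an illustration of that filtering step: the tiny `LEXICON`, with English glosses taken from the examples in the text, is a hypothetical stand-in for a real pinyin dictionary, and the sketch does not model how closely each pinyin matches the lip-derived pronunciation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: pinyin syllables -> English glosses from the text.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("fei", "ji"): ["airplane"],
    ("hui", "jia"): ["home", "exchange price"],
    ("hui", "ji"): ["collection", "benefit"],
    ("he", "lan"): ["Netherlands"],
    ("he", "nan"): ["Henan"],
}


def candidates_for_initials(initials: str) -> Dict[str, List[str]]:
    """Return every lexicon entry whose syllables start with the given initials."""
    result: Dict[str, List[str]] = {}
    for syllables, words in LEXICON.items():
        if len(syllables) != len(initials):
            continue
        if all(s.startswith(i) for s, i in zip(syllables, initials)):
            result["".join(syllables)] = words
    return result
```

With the initials "hj" this returns the entries for "huijia" and "huiji", mirroring the example, while "feiji" is excluded because its first syllable does not start with "h".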

After obtaining the corrected candidate text sequences, the electronic device determines the candidate text sequence with the highest probability among them and uses it as the text to be input by the user. For example, the candidate text sequences are input into the semantic recognition model, and the candidate text sequence with the highest probability output by the semantic recognition model is obtained. Inputting only initials is fast but, without historical association information, has a low accuracy rate. Combining the lip feature sequence with the initials improves the accuracy of text input, and because the user can enter the initials in a shorter time, input efficiency is also improved. For example, if the user wants to input the three characters for "Chinese" (pinyin "zhong guo ren"), the user only needs to input the three initials "z", "g", and "r" while silently mouthing the words, and the text input is complete.

In a possible implementation manner, the electronic device corrects the text sequence determined according to the lip feature sequence as follows. It first extracts the second initial of each character in the determined text sequence, where "second" is used only to distinguish these initials from the first initial in the description. For English, the second initial can be extracted directly from the text sequence. For Chinese, the text must first be converted into the corresponding pinyin, and the second initial is then extracted from each pinyin syllable.

After the second initials are extracted, they are matched against the first initials. If there is an unmatched second initial, it is replaced with the corresponding first initial to obtain at least one replaced text sequence, and the replaced text sequence is used as a candidate text sequence. The corresponding first initial is the first initial at the same position as the second initial. For example, if the second initial is the initial of the second character in the text sequence, the corresponding first initial is the second initial entered by the user.
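The extraction and position-by-position matching described above can be sketched as follows. The sketch assumes the recognized text has already been converted to a list of pinyin syllables (or English words); `second_initials` and `mismatched_positions` are hypothetical names.

```python
from typing import List, Sequence


def second_initials(units: Sequence[str]) -> str:
    """Take the first letter of each pinyin syllable or English word."""
    return "".join(unit[0].lower() for unit in units)


def mismatched_positions(units: Sequence[str], first_initials: str) -> List[int]:
    """Return the positions where the second initial differs from the
    first initial entered by the user at the same position."""
    extracted = second_initials(units)
    if len(extracted) != len(first_initials):
        raise ValueError("initial counts differ; positions cannot be aligned")
    return [
        i for i, (second, first) in enumerate(zip(extracted, first_initials.lower()))
        if second != first
    ]
```

For the recognized pinyin `["zhi", "chi"]` and user input "zs", only position 1 is reported, so only the second syllable needs correcting.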

The following takes Chinese as an example to introduce the process of obtaining the candidate character sequence.

In an application scenario, if there is an unmatched second initial, the unmatched second initial is directly replaced with the first initial to obtain the replaced pinyin, and at least one replaced candidate text sequence is then determined according to the replaced pinyin. For example, if the text sequence determined according to the lip feature sequence is "support" (pinyin "zhichi"), the extracted second initials are "zc". If the first initials are "zs", the second initial "c" is unmatched; replacing "c" with the first initial "s" gives the replaced pinyin "zhishi", and the candidate text sequences determined according to the replaced pinyin are "knowledge", "instruction", etc.
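Direct replacement amounts to swapping the first letter of each mismatched syllable, as in this minimal sketch. The function name `replace_initials` is hypothetical, and the sketch follows the example in the text, where only the leading letter is exchanged ("chi" becomes "shi").

```python
from typing import List, Sequence


def replace_initials(syllables: Sequence[str], first_initials: str) -> List[str]:
    """Replace the first letter of each syllable whose initial does not
    match the user's first initial at the same position."""
    if len(syllables) != len(first_initials):
        raise ValueError("initial counts differ; positions cannot be aligned")
    return [
        syllable if syllable[0] == initial else initial + syllable[1:]
        for syllable, initial in zip(syllables, first_initials)
    ]
```

Note that the result is not guaranteed to be a valid pinyin syllable: for `["fa", "hui"]` and "fw" the direct replacement is `["fa", "wui"]`, which needs the further correction discussed in the next scenario.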

In another application scenario, there is an unmatched second initial, but after it is directly replaced with the first initial, the replaced letters cannot form a word, or their pronunciation differs considerably from that of the text sequence. In that case the replaced letters are corrected, and at least one text sequence is obtained according to the correction result. For example, if the text sequence determined according to the lip feature sequence is "play", with the pinyin "fahui", the extracted second initials are "fh". If the first initials are "fw", the second initial "h" is unmatched. Directly replacing "h" with the first initial "w" gives "fawui"; because "wui" cannot form a word, direct replacement is not possible. Therefore, according to preset correction rules, "wui" is corrected to "wei", giving the corrected pinyin "fawei", and the candidate text sequences determined according to the corrected pinyin are "boring", "hair tail", etc.

For another example, the character sequence determined according to the lip feature sequence is "wow", and the corresponding pronunciation is "wa", then the extracted second initial letter is "w", if the first initial letter is "h", then There is an unmatched second letter, if you directly replace "w" with "h", the pinyin obtained according to the replaced letter is "ha", because the pronunciation of "ha" and "wa" is quite different, therefore, it cannot be Direct replacement. Therefore, according to the preset correction rules, the replaced letters are corrected, and "ha" is corrected to "hua", so that the pronunciation of the corrected pinyin is close to the pronunciation of the text sequence, and then according to the corrected pinyin The determined candidate character sequences are "flower", "hua" and so on.
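The two correction cases above can be sketched as a small rule table. The application does not specify the form of the preset correction rules, so the tables below (`VALID_SYLLABLES`, `INVALID_FIX`, `CLOSER_FIX`) and the function name are assumptions that contain only the entries needed for the two examples.

```python
# Hypothetical, deliberately tiny set of valid pinyin syllables.
VALID_SYLLABLES = {"fa", "hui", "wei", "wa", "ha", "hua", "zhi", "chi", "shi"}

# Assumed rule 1: a replaced syllable that is not valid maps to a valid one.
INVALID_FIX = {"wui": "wei"}

# Assumed rule 2: (original syllable, replaced syllable) pairs whose
# pronunciations differ too much map to a closer-sounding syllable.
CLOSER_FIX = {("wa", "ha"): "hua"}


def replace_and_correct(syllables, typed_initials):
    """Replace unmatched initials, then apply the correction rules."""
    result = []
    for original, typed in zip(syllables, typed_initials):
        if original[0] == typed:
            result.append(original)          # initial already matches
            continue
        replaced = typed + original[1:]      # direct replacement
        if replaced not in VALID_SYLLABLES:
            replaced = INVALID_FIX.get(replaced, replaced)
        elif (original, replaced) in CLOSER_FIX:
            replaced = CLOSER_FIX[(original, replaced)]
        result.append(replaced)
    return result


print(replace_and_correct(["fa", "hui"], "fw"))
print(replace_and_correct(["wa"], "h"))
```

A real rule set would need a full syllable inventory and some measure of pronunciation distance; the lookup tables here only stand in for that.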

Some letters have similar pronunciations. In pinyin, for example, "r" and "l", "n" and "l", "h" and "f", "zh" and "z", "ch" and "c", and "sh" and "s" are each pronounced similarly, so the first initials entered by the user may contain errors. To improve the accuracy of text input, in a possible implementation an association database is preset, which stores letters that have an association relationship, that is, letters that are pronounced similarly and are easily confused. After determining that there is an unmatched second initial, the electronic device determines, according to the associated letters stored in the association database, whether there is a letter associated with the corresponding first initial. If there is, the electronic device replaces the unmatched second initial with the corresponding first initial and also replaces it with the associated letter, obtaining the replaced character sequences. This expands the range of candidate character sequences and, when the text to be input by the user is determined from the candidate character sequences, improves the accuracy of text input.

For example, if the character sequence is "self" and the corresponding pronunciation is "ziji", the extracted second initials are "zj". If the first initials are "sj", the second initial "z" is unmatched, the corresponding first initial is "s", and the letter "sh" is associated with "s". Replacing "z" with "s" gives the pronunciation "siji", and replacing "z" with "sh" gives "shiji". The candidate character sequences obtained from the replaced pronunciations include "driver", "fourth level", "actual", "timing", and so on.

For another example, if the character sequence is "fall" and the corresponding pronunciation is "luo", the extracted second initial is "l". If the first initial is "r", the second initial is unmatched, and the letter "n" is pronounced similarly to the first initial "r". Replacing "l" with "r" gives the pronunciation "ruo", and replacing "l" with "n" gives "nuo". The candidate character sequences obtained from the replaced pronunciations are "if", "weak", "Nuo", "Nuo", and so on.
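The expansion through associated letters can be sketched as below. The association table holds only the two pairs used in the examples above, and the syllable-splitting helper and all names are assumptions made for illustration.

```python
import itertools

# Hypothetical association database: typed initial -> easily confused letters.
ASSOCIATED = {"s": ["sh"], "r": ["n"]}

# Assumption: pinyin initials written with two letters.
TWO_LETTER_INITIALS = ("zh", "ch", "sh")


def split_syllable(syllable):
    """Split a pinyin syllable into (initial, final)."""
    for initial in TWO_LETTER_INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return syllable[0], syllable[1:]


def expand_pronunciations(syllables, typed_initials):
    """Return every pronunciation obtained by replacing each unmatched
    initial with the typed initial and with its associated letters."""
    options = []
    for syllable, typed in zip(syllables, typed_initials):
        initial, final = split_syllable(syllable)
        if initial[0] == typed:
            options.append([syllable])       # matched: keep as is
        else:
            variants = [typed] + ASSOCIATED.get(typed, [])
            options.append([v + final for v in variants])
    return ["".join(combo) for combo in itertools.product(*options)]


print(expand_pronunciations(["zi", "ji"], "sj"))
print(expand_pronunciations(["luo"], "r"))
```

Each resulting pronunciation would then be looked up to obtain candidate character sequences, which is how the candidate range grows.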

In a possible implementation, the lip feature sequence is obtained from an image sequence of the user's lip region, and the lip language recognition model is trained using, as training samples, lip feature sequences extracted from lip-region image sequences and the text sequences corresponding to those lip feature sequences. Correspondingly, the specific flow of the text input method is shown in FIG. 4. When a text input operation of the user is detected, the first initials of the text to be input, as entered by the user, are obtained, the front camera of the electronic device collects face images, and each collected image is analyzed to identify the face in it. As shown in FIG. 5, after the face is recognized, the lip-region image is cropped from the face. The lip-region images at successive moments form an image sequence, and lip features are extracted from each image in the sequence to obtain the lip feature sequence. The lip feature sequence is then input into the lip language recognition model to obtain the text sequence output by the model. The lip language recognition model may be a spatiotemporal convolutional neural network (STCNN) model. After the text sequence is obtained, it is corrected using the first initials to obtain the corrected candidate character sequences. For example, if the input text is Chinese, the text sequence output by the lip language recognition model is converted into pinyin, the second initials are extracted from the pinyin, the unmatched second initials are replaced with the first initials to obtain the replaced pinyin, and the candidate character sequences are determined according to the replaced pinyin.
After the candidate character sequences are obtained, the semantics of each candidate character sequence is determined, and the candidate character sequence with the highest probability is determined according to those semantics. For example, each candidate character sequence can be compared with a pre-stored word database, the candidate with the highest degree of matching with the word database is taken as the candidate character sequence with the highest probability, and that candidate is then used as the text to be input by the user.
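The final selection step can be sketched as follows. The application does not define how the degree of matching with the word database is computed, so this sketch assumes, purely for illustration, that the database stores a usage count per word and that a higher count means a better match. The counts and names are hypothetical.

```python
# Hypothetical pre-stored word database: word -> usage count.
WORD_DATABASE = {"knowledge": 120, "instruction": 45, "driver": 80}


def pick_most_probable(candidates, database=WORD_DATABASE):
    """Return the candidate with the highest assumed matching score,
    or None when there are no candidates."""
    if not candidates:
        return None
    # Candidates missing from the database score 0.
    return max(candidates, key=lambda word: database.get(word, 0))


print(pick_most_probable(["knowledge", "instruction"]))
```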

In other possible implementations, the replaced pinyin may instead be input into a preset semantic recognition model to obtain the candidate character sequence with the highest probability output by the semantic recognition model. The semantic recognition model is trained using pronunciations and the corresponding highest-probability text sequences as training samples.

In another possible implementation, the lip feature sequence is obtained from a radar wave signal sequence reflected by an obstacle, and the lip language recognition model is trained using, as training samples, lip feature sequences extracted from radar wave signal sequences and the text corresponding to those lip feature sequences. Correspondingly, the specific flow of the text input method is shown in FIG. 6. When a text input operation of the user is detected, the first initials of the text to be input, as entered by the user, are obtained, and the front-facing radar of the electronic device transmits radar waves and receives the radar wave signal sequence reflected by the obstacle, that is, the reflected signal sequence, which includes the reflected signals at successive moments. The radar may be a 60 GHz millimeter-wave radar, and the radar antenna may operate in a single-transmit, multiple-receive mode or a multiple-transmit, multiple-receive mode. The delay of the reflected signal relative to the transmitted signal and the Doppler effect of the reflected signal reflect the characteristics of the obstacle, including its size, shape, distance, and speed. The reflected signal sequence produced by the user's lip movement while speaking can therefore be processed to obtain the lip feature sequence of the user when speaking. The lip feature sequence is then input into the lip language recognition model to obtain the text sequence output by the model.

In a possible implementation, the radar wave may be modulated using frequency modulated continuous wave (FMCW) modulation, in which the modulating function is a periodic sawtooth wave. The modulated radar wave is shown in FIG. 7: the reflected signal s2 is delayed relative to the transmitted signal s1, and there is a frequency difference between the reflected signal and the transmitted signal, which is caused by the Doppler effect of the moving obstacle. The reflected signal is multiplied by the transmitted signal, and the product is low-pass filtered as an analog signal to obtain the beat signal. A fast Fourier transform (FFT) is then performed on the beat signal, and background removal (such as filtering) is applied to remove the static background, yielding the range-Doppler map (RDM) shown in FIG. 8. Each cell of the RDM corresponds to an element of a matrix. In the RDM, the elements in each column represent the distance of the obstacle, and the elements in each row represent the speed of the obstacle, so the speed and distance of the obstacle at the current moment can be determined from the RDM. For example, the black area in FIG. 8 represents the speed and distance of the obstacle at the current moment. After the RDM is obtained, the angle of arrival of the obstacle can be obtained from the reflected signals received by the individual receiving antennas, the spatial position of the obstacle can be obtained from its angle of arrival and distance, and the obstacle can then be reconstructed in three dimensions from its spatial position information to obtain a 3D depth map of the obstacle.
Combining the 3D depth map with the velocity values in the RDM gives the 4D (3D space plus velocity) vector signal of the obstacle.
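The distance part of this processing can be illustrated with a short numerical sketch. It covers only the step from a beat signal to a range estimate for a single stationary reflector. The Doppler dimension, background removal, and angle-of-arrival estimation are omitted. The chirp bandwidth, sample rate, and sample count are hypothetical values chosen so that the example reflector falls exactly on an FFT bin, and a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math

C = 3.0e8                   # speed of light, m/s
BANDWIDTH = 3.0e9           # assumed sawtooth sweep bandwidth, Hz
N_SAMPLES = 64              # assumed samples per chirp
SAMPLE_RATE = 64_000.0      # assumed ADC sample rate, Hz
CHIRP_TIME = N_SAMPLES / SAMPLE_RATE
SLOPE = BANDWIDTH / CHIRP_TIME   # frequency sweep rate, Hz/s


def beat_signal(distance_m):
    """Simulate the low-pass-filtered product of the transmitted and
    reflected chirps for one stationary reflector: a single tone at
    f_beat = 2 * R * slope / c."""
    f_beat = 2.0 * distance_m * SLOPE / C
    return [math.cos(2.0 * math.pi * f_beat * n / SAMPLE_RATE)
            for n in range(N_SAMPLES)]


def dft_magnitudes(samples):
    """Magnitude spectrum for the non-negative frequency bins."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2)]


def estimate_range(samples):
    """Find the strongest non-DC bin and convert it to a distance."""
    mags = dft_magnitudes(samples)
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    f_beat = peak_bin * SAMPLE_RATE / len(samples)
    return f_beat * C / (2.0 * SLOPE)


print(round(estimate_range(beat_signal(0.3)), 3))
```

With these assumed parameters the range resolution is c / (2 × bandwidth) = 0.05 m, so a reflector at 0.3 m lands in bin 6. Repeating the transform across successive chirps would add the velocity axis of an RDM.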

Similarly, in the embodiments of the present application, after receiving the reflected signal, the electronic device performs the above processing on the reflected signal and the corresponding transmitted signal to obtain the four-dimensional vector signal of the lips. The four-dimensional vector signal of the lips is used as the lip feature, and the lip feature sequence is obtained from the four-dimensional vector signals of the lips at successive moments. The lip feature sequence is then input into the lip language recognition model to obtain the text sequence output by the model. After the text sequence is obtained, it is corrected using the first initials to obtain the corrected candidate character sequences. The semantics of each candidate character sequence is then determined, the candidate character sequence with the highest probability is determined according to those semantics, and that candidate is used as the text to be input by the user.

Because radar wave signals place few requirements on the environment (for example, they are not affected by external light), using radar wave signals to obtain the lip feature sequence broadens the range of applications of the electronic device.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

FIG. 9 shows a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.

The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.

It can be understood that the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than those shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.

The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.

The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 can be coupled to the touch sensor 180K, the charger, the flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface to implement the touch function of the electronic device 100.

The I2S interface can be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 . In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.

The PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.

The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160 . For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.

The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 . MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc. In some embodiments, the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 . The processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .

The GPIO interface can be configured by software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones. The interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationships between the modules illustrated in the embodiments of the present invention are only schematic and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.

The charging management module 140 is used to receive charging input from the charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from the wired charger through the USB interface 130 . In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .

The power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 . The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110 . In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.

The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, an antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide wireless communication solutions applied on the electronic device 100, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. Wherein, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 . In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.

In some embodiments, the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).

The electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and so on. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than 1.

The electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process the data fed back by the camera 193 . For example, when taking a photo, the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin tone. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be provided in the camera 193 .

The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and transmits the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1. In the embodiments of the present application, the camera 193 is used to capture face images of the user when a text input operation of the user is detected.

A digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.

Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.

The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process the input information, and can continuously learn by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.

The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos to the external memory card.

Internal memory 121 may be used to store computer executable program code, which includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like. The storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like. In addition, the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like. The processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, the application processor, and the like.

The audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .

Speaker 170A, also referred to as a "speaker", is used to convert audio electrical signals into sound signals. The electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.

The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 answers a call or plays a voice message, the voice can be heard by placing the receiver 170B close to the human ear.

The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C, so that the sound signal is input into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.

The earphone interface 170D is used to connect wired earphones. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals. In some embodiments, the pressure sensor 180A may be provided on the display screen 194 . There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. The capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A. In some embodiments, touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
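The threshold behaviour in the short message example can be illustrated with a minimal sketch. The function name, the instruction labels, and the numeric threshold are hypothetical; the description only refers to "the first pressure threshold" without giving a value.

```python
# Hypothetical value; the description does not specify the first pressure threshold.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity: float,
                         threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < threshold:
        # Intensity below the first pressure threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```

The same touch position thus yields different instructions depending only on the measured intensity.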

The gyro sensor 180B may be used to determine the motion attitude of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 180B detects the angle at which the electronic device 100 shakes, calculates the distance that the lens module needs to compensate for according to the angle, and allows the lens to counteract the shake of the electronic device 100 through reverse motion to achieve anti-shake. The gyro sensor 180B can also be used for navigation and motion-sensing game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
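The description does not state how altitude is derived from the air pressure value. As one possible illustration, the sketch below uses the widely used standard-atmosphere approximation (the international barometric formula); the function name and the default sea-level pressure are assumptions, not part of the application.

```python
# Assumed reference pressure at sea level in hPa (standard atmosphere).
SEA_LEVEL_PRESSURE_HPA = 1013.25


def altitude_from_pressure(pressure_hpa: float,
                           sea_level_hpa: float = SEA_LEVEL_PRESSURE_HPA) -> float:
    """Estimate altitude in metres from a measured air pressure value.

    Uses the standard-atmosphere approximation
    h = 44330 * (1 - (p / p0) ** (1 / 5.255)).
    """
    if pressure_hpa <= 0 or sea_level_hpa <= 0:
        raise ValueError("pressure values must be positive")
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

Lower measured pressure gives a higher estimated altitude, which could then be combined with other positioning sources.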

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can detect the opening and closing of a flip leather case using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D. Further, features such as automatic unlocking when the cover is flipped open are set according to the detected opening and closing state of the leather case or of the flip cover.

The acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally along three axes). The magnitude and direction of gravity can be detected when the electronic device 100 is stationary. The acceleration sensor 180E can also be used to identify the posture of the electronic device 100, and can be used in applications such as switching between landscape and portrait orientation and pedometers.

The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance through radar, infrared, or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing. In some embodiments, the electronic device 100 may also measure the distance and speed of obstacles using the distance sensor 180F.

The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light to the outside through the light emitting diode. The electronic device 100 uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used in leather case mode and pocket mode to automatically unlock and lock the screen.
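The proximity decision described above can be sketched as follows. The threshold value, its units, and the function names are hypothetical; the description only distinguishes "sufficient" from "insufficient" reflected light.

```python
# Hypothetical threshold in arbitrary sensor units.
REFLECTED_LIGHT_THRESHOLD = 100.0


def object_nearby(reflected_light: float,
                  threshold: float = REFLECTED_LIGHT_THRESHOLD) -> bool:
    """Return True when enough reflected infrared light is detected."""
    return reflected_light >= threshold


def screen_should_turn_off(in_call: bool, reflected_light: float,
                           threshold: float = REFLECTED_LIGHT_THRESHOLD) -> bool:
    """Turn the screen off only while talking with an object (the ear) nearby."""
    return in_call and object_nearby(reflected_light, threshold)
```

In this sketch the screen stays on outside of calls even if an object is detected, which matches the power-saving use case in the paragraph above.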

The ambient light sensor 180L is used to sense ambient light brightness. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touch.

The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.

The temperature sensor 180J is used to detect the temperature. In some embodiments, the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by the low temperature. In some other embodiments, when the temperature is lower than another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
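A temperature processing strategy of this kind might look like the sketch below. It folds the three embodiments into one function for illustration only; the action labels and all threshold values are assumptions, since the description names the thresholds without quantifying them.

```python
from typing import List


def temperature_actions(temp_c: float,
                        high_threshold: float = 45.0,
                        heat_threshold: float = 0.0,
                        boost_threshold: float = -10.0) -> List[str]:
    """Return the protective actions for a reported temperature in degrees Celsius.

    All threshold defaults are hypothetical placeholders.
    """
    actions: List[str] = []
    if temp_c > high_threshold:
        # Thermal protection: throttle the processor near the sensor.
        actions.append("reduce_processor_performance")
    if temp_c < heat_threshold:
        # Avoid abnormal shutdown at low temperature by heating the battery.
        actions.append("heat_battery")
    if temp_c < boost_threshold:
        # Alternatively or additionally, boost the battery output voltage.
        actions.append("boost_battery_output_voltage")
    return actions
```

For a temperature between the low and high thresholds the function returns an empty list, meaning no protective action is taken.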

The touch sensor 180K is also called a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-controlled screen". The touch sensor 180K is used to detect a touch operation on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to touch operations may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.

The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when a person speaks. The bone conduction sensor 180M can also be placed in contact with the human pulse to receive the blood pressure pulse signal. In some embodiments, the bone conduction sensor 180M can also be disposed in an earphone to form a bone conduction earphone. The audio module 170 can parse out the voice signal based on the vibration signal of the vocal vibration bone obtained by the bone conduction sensor 180M, so as to realize the voice function. The application processor can parse heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 180M, so as to realize the heart rate detection function.

The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for vibration alerts for incoming calls, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures or playing audio) can correspond to different vibration feedback effects. Touch operations on different areas of the display screen 194 can also cause the motor 191 to produce different vibration feedback effects. Different application scenarios (for example, time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 can be an indicator light, which can be used to indicate the charging state and changes in battery level, and can also be used to indicate a message, a missed call, a notification, and the like.

The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195. The electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support a Nano SIM card, a Micro SIM card, a SIM card, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and the types of the multiple cards may be the same or different. The SIM card interface 195 can also be compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned functions can be allocated to different functional units or modules as required, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit, and the above-mentioned integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

Finally, it should be noted that the above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto, and any changes or replacements within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims

1. A text input method, comprising:

when a text input operation by a user is detected, acquiring lip change information of the user and character information input by the user, wherein the lip change information includes a lip feature sequence of the user when speaking the text to be input; and

determining the text to be input by the user according to the lip feature sequence and the character information.

2. The method according to claim 1, wherein the acquiring the lip change information of the user comprises:

collecting, by a camera, an image sequence including the lip region of the user, and extracting lip features from each image of the image sequence to obtain the lip feature sequence.

3. The method according to claim 1, wherein the acquiring the lip change information of the user comprises:

transmitting a wireless signal, and acquiring a reflected signal sequence, wherein each reflected signal in the reflected signal sequence is a signal reflected back after the wireless signal encounters an obstacle; and

determining the obstacle according to the reflected signal sequence, and if the obstacle is a lip, extracting a lip feature from each reflected signal of the reflected signal sequence to obtain the lip feature sequence.

4. The method according to any one of claims 1 to 3, wherein the character information includes a first initial of the text to be input by the user.

5. The method according to claim 4, wherein the determining the text to be input by the user according to the lip feature sequence and the character information comprises:

determining a text sequence corresponding to the lip feature sequence;

performing correction processing on the text sequence according to the first initial to obtain at least one corrected candidate text sequence; and

determining the candidate text sequence with the highest probability from the at least one candidate text sequence, and using the candidate text sequence with the highest probability as the text to be input by the user.

6. The method according to claim 5, wherein the performing correction processing on the text sequence according to the first initial to obtain at least one corrected candidate text sequence comprises:

extracting a second initial of each character in the text sequence;

matching the extracted second initial with the first initial; and

if there is an unmatched second initial, replacing the unmatched second initial with the corresponding first initial to obtain at least one replaced text sequence, and using the replaced text sequence as the candidate text sequence.

7. The method according to claim 6, wherein the replacing, if there is an unmatched second initial, the unmatched second initial with the corresponding first initial to obtain at least one replaced text sequence comprises:

if there is an unmatched second initial, and there is a letter associated with the corresponding first initial, replacing the unmatched second initial with the corresponding first initial, and also replacing the unmatched second initial with the associated letter, to obtain at least one replaced text sequence.

8. The method according to any one of claims 5 to 7, wherein the determining the text sequence corresponding to the lip feature sequence comprises:

inputting the lip feature sequence into a trained lip language recognition model to obtain a text sequence output by the lip language recognition model, wherein the lip language recognition model is used to recognize the text corresponding to lip features, and the lip language recognition model is trained using lip features and the text corresponding to the lip features as training samples.

9. An electronic device, characterized in that it includes a processor, and the processor is configured to execute a computer program stored in a memory, so as to implement the method according to any one of claims 1 to 8.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 8 is implemented.
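
For illustration only, the correction processing recited in claims 5 and 6 can be sketched as below. The sketch assumes that the lip language recognition output is available as pinyin syllables and that a small lexicon with probabilities exists; the function names, the lexicon, and its probability values are all hypothetical and are not taken from the application.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: pinyin sequence -> candidate texts with probabilities.
LEXICON: Dict[Tuple[str, ...], List[Tuple[str, float]]] = {
    ("ni", "hao"): [("你好", 0.9), ("拟好", 0.1)],
}


def correct_by_initials(syllables: List[str], typed_initials: str) -> List[str]:
    """Replace each unmatched second initial with the typed first initial.

    Positions beyond the shorter of the two inputs are ignored by zip().
    """
    corrected = []
    for syllable, first_initial in zip(syllables, typed_initials):
        second_initial = syllable[0]  # initial of the recognized syllable
        if second_initial != first_initial:
            syllable = first_initial + syllable[1:]
        corrected.append(syllable)
    return corrected


def best_text(syllables: List[str], typed_initials: str,
              lexicon: Dict[Tuple[str, ...], List[Tuple[str, float]]] = LEXICON
              ) -> Optional[str]:
    """Pick the highest-probability candidate text after correction."""
    key = tuple(correct_by_initials(syllables, typed_initials))
    candidates = lexicon.get(key, [])
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]
```

In this sketch, a lip reading result of "li hao" combined with the typed initials "nh" is corrected to "ni hao" before the most probable candidate is selected. Claim 7's associated letters could be handled by generating one additional corrected sequence per associated letter.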