WO2022062884A1 - Text input method, electronic device and computer-readable storage medium - Google Patents

Text input method, electronic device and computer-readable storage medium

Info

Publication number
WO2022062884A1
Authority
WO
WIPO (PCT)
Prior art keywords: sequence, lip, text, user, initial
Prior art date
Application number
PCT/CN2021/116515
Other languages
English (en)
Chinese (zh)
Inventor
刘浩 (Liu Hao)
黄韬 (Huang Tao)
胡粤麟 (Hu Yuelin)
秦磊 (Qin Lei)
张乐乐 (Zhang Lele)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022062884A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F 3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • The present application relates to the field of artificial intelligence (AI), and in particular to a text input method, an electronic device, and a computer-readable storage medium.
  • Mobile phone input methods have added Wubi, handwriting, voice input, and other methods to the original pinyin input.
  • Pinyin input is still the most used input method, followed by handwriting input and finally voice input. Because voice input is the fastest, more and more users choose it. However, in some scenarios, such as a noisy environment or a private setting where it is inconvenient for the user to speak loudly, the accuracy of speech recognition is low, which reduces the accuracy of text input by voice.
  • The present application provides a text input method, an electronic device, and a computer-readable storage medium, which can improve the accuracy of text input when it is inconvenient for the user to use voice input.
  • In a first aspect, a text input method is provided, comprising: when a text input operation by a user is detected, acquiring lip change information of the user and character information input by the user, wherein the lip change information includes a lip feature sequence captured while the user speaks the text to be input; and determining the text to be input by the user according to the lip feature sequence and the character information.
  • When a text input operation by the user is detected, the user's lip change information and the character information input by the user are obtained, where the lip change information includes the lip feature sequence captured while the user speaks the text to be input.
  • The lip feature sequence and the character information together determine the text to be input by the user. Since the text determined from the character information input by the user has a high accuracy rate, combining the lip feature sequence with the character information to determine the text to be recognized can improve the accuracy of text input when it is inconvenient for the user to use voice input.
  • In some implementations, acquiring the lip change information of the user includes:
  • using a camera to collect an image sequence containing the user's lip region, and extracting lip features from each image of the image sequence to obtain the lip feature sequence.
  • During text input, the electronic device turns on the camera to capture images of the lip region until the image capture is completed, so lip language input and the input of character information can be performed synchronously without affecting the efficiency of text input.
  • The lip feature sequence obtained from the image sequence better reflects the changes in the user's lip shape and has high accuracy; determining the text to be input by the user from this accurate lip feature sequence improves the accuracy of text input.
  • In some implementations, acquiring the lip change information of the user includes:
  • transmitting a wireless signal and obtaining a reflected signal sequence, where each reflected signal in the sequence is the wireless signal reflected back after encountering an obstacle; determining the obstacle according to the reflected signal sequence; and, if the obstacle is a lip, extracting a lip feature from each reflected signal of the reflected signal sequence to obtain the lip feature sequence. Since wireless signals place few requirements on the environment (for example, they are not affected by ambient light), using them to obtain the lip feature sequence broadens the range of scenarios in which the electronic device can be used.
  • In some implementations, the character information includes the first letters of the text to be input by the user. Since typing only the first letters is faster, determining the text to be input according to the lip feature sequence and the first letters gives higher input efficiency.
  • In some implementations, determining the text to be input by the user according to the lip feature sequence and the character information includes:
  • determining the character sequence corresponding to the lip feature sequence; correcting the character sequence according to the first initials to obtain at least one corrected candidate character sequence; and determining the candidate character sequence with the highest probability from the candidate character sequences, the candidate sequence with the highest probability being used as the text to be input by the user. Since multiple character sequences can be determined from the lip feature sequence, some of the determined sequences will be wrong; correcting the character sequences with the first initials therefore improves the accuracy of character input, and it also reduces the number of candidate character sequences, which reduces the amount of computation needed to determine the text to be input and improves the computation speed.
  • In some implementations, correcting the character sequence according to the first initials to obtain at least one corrected candidate character sequence includes:
  • Determining the character sequence corresponding to the lip feature sequence includes:
  • inputting the lip feature sequence into a trained lip language recognition model to obtain the text sequence output by the model. The lip language recognition model is used to recognize the text corresponding to lip features and is trained with lip features and the corresponding text as training samples. Since the lip language recognition model is trained on such samples, it has universality; using it to recognize the text sequence therefore improves the accuracy of the output text sequence.
  • In a second aspect, a text input device is provided, including:
  • an acquisition module configured to acquire the user's lip change information and the character information input by the user when a text input operation by the user is detected, the lip change information including the lip feature sequence captured while the user speaks the text to be input;
  • a processing module configured to determine the text to be input by the user according to the lip feature sequence and the character information.
  • In some implementations, the acquisition module is specifically configured to:
  • a camera is used to collect an image sequence including the lip region of the user, and lip features are extracted from each image of the image sequence to obtain a lip feature sequence.
  • In some implementations, the acquisition module is specifically configured to:
  • transmit a wireless signal and obtain a reflected signal sequence; determine an obstacle according to the reflected signal sequence; and, if the obstacle is a lip, extract a lip feature from each reflected signal of the reflected signal sequence to obtain the lip feature sequence.
  • the character information includes the first letter of the text to be input by the user.
  • the processing module includes:
  • a determining unit for determining the character sequence corresponding to the lip feature sequence
  • An error correction unit configured to perform correction processing on the character sequence according to the first initial to obtain at least one corrected candidate character sequence
  • an output unit configured to determine the candidate character sequence with the highest probability from the candidate character sequences, and use it as the text to be input by the user.
  • the error correction unit is specifically used for:
  • the error correction unit is further configured to:
  • the determining unit is specifically configured to:
  • inputting the lip feature sequence into a trained lip language recognition model to obtain the text sequence output by the model, the lip language recognition model being used to recognize the text corresponding to lip features and being trained with lip features and the corresponding text as training samples.
  • In a third aspect, an electronic device is provided, including a processor configured to execute a computer program stored in a memory, so as to implement the text input method according to the first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the text input method according to the first aspect.
  • In a fifth aspect, a computer program product is provided that, when run on an electronic device, enables the electronic device to execute the text input method described in the first aspect.
  • FIG. 1 is a schematic flowchart of a text input method provided by an embodiment of the present application.
  • FIG. 2 is an application scenario diagram of the text input method provided by the embodiment of the present application.
  • FIG. 3 is a schematic diagram of a lip shape provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a method for outputting a text sequence provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a radar wave signal provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The term "if" may, depending on the context, be interpreted as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may, depending on the context, be interpreted to mean "once it is determined", "in response to determining", "once the [described condition or event] is detected", or "in response to detecting the [described condition or event]".
  • References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • Appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", etc. in various places in this specification do not necessarily all refer to the same embodiment; rather, they mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • The terms "including", "comprising", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • Existing input methods mainly include the voice input method, the pinyin input method, and the handwriting input method.
  • The pinyin and handwriting input methods have higher input accuracy but lower input efficiency.
  • The voice input method generally performs speech recognition first and converts the recognized speech into corresponding text, which gives high input efficiency. However, in a noisy environment, or in a private environment where it is inconvenient for the user to speak loudly, the accuracy of speech recognition drops, and the accuracy of speech-to-text conversion is reduced accordingly.
  • To this end, an embodiment of the present application provides a text input method.
  • When a text input operation by a user is detected, the user's lip change information and the character information input by the user are obtained, where the lip change information includes the lip feature sequence captured while the user speaks the text to be input.
  • The text to be input by the user is then determined according to the lip feature sequence and the character information. Since the accuracy of the text determined from the character information input by the user is high, combining the lip feature sequence with the character information to determine the text to be recognized improves the accuracy of text input when voice input is inconvenient.
  • The text input method provided in the embodiments of the present application is applied to electronic devices, which may be mobile phones, tablet computers, handheld computers, personal digital assistants (PDAs), speakers with screens, wearable devices, and the like.
  • A text input method provided by an embodiment of the present application includes the following steps.
  • S101: When a text input operation by the user is detected, acquire the user's lip change information and the character information input by the user. The user can input the character information by means of soft keyboard input or handwriting input.
  • The input character information can be the text itself, letters corresponding to the text, or the first letters of the text, and the text can be Chinese, English, or another language.
  • the user may or may not make a sound when speaking the text to be input.
  • the lip feature sequence includes lip features at each moment in a continuous time.
  • the lip features are used to represent the lip shape of the user when speaking, and different lip shapes correspond to different pronunciations.
  • In one implementation, the electronic device collects an image sequence of the user's lip region through a camera, the image sequence being composed of lip region images at successive moments, and then extracts lip features from each image in the sequence to obtain the lip feature sequence.
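  • As an illustration of this step, the Python sketch below builds a lip feature sequence from camera frames. It is a minimal sketch that assumes OpenCV's bundled Haar face detector and uses a resized crop of the lower third of the face as a stand-in for learned lip features; a real implementation would use a dedicated lip landmark detector.

      import cv2
      import numpy as np

      face_detector = cv2.CascadeClassifier(
          cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

      def lip_feature(frame):
          """Crop the lower third of the first detected face and return it as a
          fixed-size grayscale vector (a simple stand-in for a lip feature)."""
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
          faces = face_detector.detectMultiScale(gray, 1.3, 5)
          if len(faces) == 0:
              return None
          x, y, w, h = faces[0]
          lip_roi = gray[y + 2 * h // 3: y + h, x: x + w]  # lower third of the face
          return cv2.resize(lip_roi, (32, 16)).flatten() / 255.0

      def capture_lip_sequence(num_frames=30):
          """Collect lip features from successive camera frames."""
          cap = cv2.VideoCapture(0)  # front camera
          sequence = []
          try:
              for _ in range(num_frames):
                  ok, frame = cap.read()
                  if not ok:
                      break
                  feat = lip_feature(frame)
                  if feat is not None:
                      sequence.append(feat)
          finally:
              cap.release()
          return sequence  # the lip feature sequence for this capture window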
  • In another implementation, the electronic device transmits a wireless signal through a wireless sensor; the wireless signal is reflected back after it encounters an obstacle, producing a reflected signal.
  • The electronic device receives the reflected signals through the wireless sensor, obtains the reflected signal sequence from the reflected signals at successive moments, and determines from the reflected signal sequence whether the obstacle is a lip.
  • If it is, a lip feature is extracted from each reflected signal in the reflected signal sequence to obtain the lip feature sequence.
  • the wireless signal may be a radar wave signal, an infrared signal, or an ultrasonic signal, or the like.
  • the electronic device acquires the user's lip change information after detecting that the user starts to input characters.
  • The electronic device may provide a preset input mode, e.g. a multi-mode input mode, for this purpose.
  • For example, the user opens the text message editing page, and the electronic device opens the interface shown in Figure 2 after detecting the instruction for the multi-mode input mode.
  • On this interface, the user inputs the first letters of the Chinese pinyin and simultaneously performs lip language input.
  • The camera of the electronic device collects lip images of the user and extracts a lip feature from the lip image at each moment, thereby obtaining the lip feature sequence.
  • S102: Determine the text to be input by the user according to the lip feature sequence and the character information.
  • When the lip features change, that is, when the lip shape changes, the pronunciation also changes. Therefore, the user's pronunciation can be determined from the lip shape when the user speaks, and the text spoken by the user can be further determined.
  • However, different pronunciations may correspond to the same lip shape. For example, when the lips are in the shape shown in (A) in Figure 3, the corresponding pronunciation may be the English letter "A", "E", or "I"; when the lips are in the shape shown in (B) in Figure 3, the corresponding pronunciation may be the English letter "Q" or "W".
  • Conversely, the same character may require a changing lip shape before it can be pronounced.
  • For example, the English letters "M" and "L" require a changing lip shape to be pronounced. Therefore, from the user's lip features at a single moment, multiple characters may be determined, or wrong characters may be determined; from the lip features at each moment, that is, the lip feature sequence, a variety of possible text sequences can be determined.
  • Each piece of character information input by the user also corresponds to one or more characters.
  • Combining the lip feature sequence with the character information can correct the characters identified from the lip feature sequence, or remove wrongly identified characters, so as to obtain an accurate text sequence, that is, the text to be input by the user. Text input can thus be completed without acquiring the user's voice, and the accuracy of text input is improved.
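  • To make this combination concrete, the toy Python sketch below keeps only the lip-reading candidates that agree with the character the user typed. The shape classes and the mapping are invented for illustration; only the "A/E/I" and "Q/W" ambiguities come from the Figure 3 example above.

      # Toy mapping from visually similar lip shapes to candidate letters.
      LIP_SHAPE_CANDIDATES = {
          "open_wide": ["A", "E", "I"],  # shape (A) in Figure 3
          "rounded": ["Q", "W"],         # shape (B) in Figure 3
      }

      def resolve(shape, typed_letter):
          """Keep only the lip-reading candidates consistent with the typed character."""
          candidates = LIP_SHAPE_CANDIDATES.get(shape, [])
          agreed = [c for c in candidates if c.upper() == typed_letter.upper()]
          return agreed or candidates  # fall back to all candidates on no match

      print(resolve("open_wide", "e"))  # ['E']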
  • Specifically, the electronic device may determine multiple candidate text sequences according to the lip feature sequence and the character information, and after determining them, select the candidate text sequence with the highest probability according to the semantics of each sequence; the candidate text sequence with the highest probability is used as the text to be input by the user.
  • For example, the candidate text sequences may be input into a trained semantic recognition model, which outputs the candidate text sequence with the highest probability.
  • The semantic recognition model is trained with text sequences and the corresponding highest-probability text sequence as training samples.
  • Therefore, combining the lip feature sequence with the character information to determine the text to be recognized can improve the accuracy of the input text when voice input is inconvenient.
  • In some embodiments, the character information input by the user includes the first letters of the text to be input, where "first" is used to distinguish the description of the "initial letter". The first letter of a text can be the first letter of its Chinese pinyin, such as "w" in "wen", or the first letter of an English word, such as "G" in "Good".
  • the first letter includes the first letter of each of the words to be entered.
  • After determining the text sequence according to the acquired lip feature sequence, the electronic device corrects the pinyin of the determined text sequence according to the first letters, obtains the corrected pinyin, and determines at least one corrected candidate text sequence from the corrected pinyin.
  • In one case, the electronic device corrects the pinyin of the text sequence according to the first letters to obtain one corrected pinyin, and determines the corrected candidate text sequences from it. For example, if the text sequence determined according to the lip feature sequence is "Netherlands" (pinyin "helan") and the first letters are "hn", the pinyin "helan" is corrected to "henan".
  • The candidate text sequences determined from the pronunciation "henan" then include "Henan" and other words with that pronunciation.
  • In another case, the electronic device corrects the text sequence according to the first letters, obtains multiple corrected pinyin, and determines corrected candidate text sequences from each corrected pinyin. For example, if the text sequence determined according to the lip feature sequence is "airplane" (pinyin "feiji") and the first letters are "hj", the pinyin "feiji" is corrected to "huijia", "huiji", etc. For the pinyin "huijia", the determined candidate text sequences are "home", "exchange price", etc.; for the pinyin "huiji", they are "collection", "benefit", and so on.
  • After obtaining the corrected candidate text sequences, the electronic device determines the candidate with the highest probability and uses it as the text to be input by the user; for example, it inputs the candidate text sequences into the semantic recognition model and obtains the one with the highest probability. Since the initial-letter input method has a low accuracy rate without historical association information but a fast input speed, combining the lip feature sequence with the first letters improves the accuracy of text input, while the user can complete the input of the first letters in a shorter time, which improves input efficiency.
  • The method by which the electronic device corrects, according to the first initials, the text sequence determined from the lip feature sequence is specifically as follows: first, extract the second initial of each character in the determined text sequence, where "second" is used to distinguish the description of the "initials".
  • The second initials can be extracted directly from the text sequence.
  • Each second initial is then compared with the first initial at the corresponding position; if a second initial does not match, it is replaced, and the replaced character sequence is used as a candidate character sequence.
  • Here, the corresponding first initial is the first initial at the same position as the second initial; for example, if the second initial is the initial of the second character in the text sequence, the corresponding first initial is the second initial entered by the user.
  • In one case, the unmatched second initial is directly replaced with the corresponding first initial to obtain the replaced pinyin, and then at least one candidate text sequence is determined from the replaced pinyin.
  • For example, if the text sequence determined according to the lip feature sequence is "support" (pinyin "zhichi"), the extracted second initials are "zc"; if the first initials are "zs", there is an unmatched second initial "c". Replacing "c" with the first initial "s" gives the replaced pinyin "zhishi", and the candidate word sequences determined from the replaced pinyin are "knowledge", "instruction", etc.
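  • A minimal Python sketch of this replacement step is shown below. It assumes the recognized text is already available as a list of pinyin syllables and that each typed first initial aligns with one syllable; the dictionary lookup at the end is a toy stand-in for the real pinyin-to-word database.

      # Toy pinyin-to-words lookup (stand-in for the real database).
      TOY_PINYIN_DICT = {
          "zhishi": ["knowledge", "instruction"],
          "zhichi": ["support"],
      }

      def correct_with_initials(syllables, typed_initials):
          """Swap in the typed initial wherever a syllable's initial disagrees,
          then look up candidate word sequences for the corrected pinyin."""
          corrected = []
          for syllable, typed in zip(syllables, typed_initials):
              if syllable.startswith(typed):
                  corrected.append(syllable)              # initials already match
              else:
                  corrected.append(typed + syllable[1:])  # replace the initial
          return TOY_PINYIN_DICT.get("".join(corrected), [])

      # Recognized "zhi chi" (initials "zc") vs typed initials "zs" -> "zhi shi".
      print(correct_with_initials(["zhi", "chi"], "zs"))  # ['knowledge', 'instruction']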
  • In another case, the unmatched second initial is directly replaced with the first initial, but the replaced letters cannot form a valid text, or the pronunciation of the replaced pinyin differs too much from that of the text sequence.
  • In that case, the replaced letters are corrected, and at least one candidate character sequence is obtained according to the correction result.
  • For example, if the character sequence determined according to the lip feature sequence is "wow" (pronunciation "wa"), the extracted second initial is "w". If the first initial is "h", there is an unmatched second initial; directly replacing "w" with "h" yields the pinyin "ha", and since the pronunciations of "ha" and "wa" differ considerably, direct replacement is not acceptable. Therefore, according to preset correction rules, the replaced letters are corrected: "ha" is corrected to "hua", so that the pronunciation of the corrected pinyin is close to that of the text sequence.
  • The candidate character sequences determined from the corrected pinyin are then "flower", "hua", and so on.
  • In some embodiments, an association database is preset, which stores letters that have an association relationship, that is, letters whose pronunciations are close and easily confused.
  • After determining that there is an unmatched second initial, the electronic device checks, according to the associated letters stored in the database, whether there is a letter associated with the corresponding first initial. If there is, the unmatched second initial is replaced both with the corresponding first initial and with the associated letter, and the corresponding replaced pinyin are obtained. This expands the range of candidate character sequences and improves the accuracy of text input when the text to be input by the user is determined from the candidates.
  • For example, the candidate text sequences obtained from the replaced pronunciations include "driver" (siji), "fourth level" (siji), "actual" (shiji), "timing" (shiji), etc.
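  • The sketch below illustrates this expansion, assuming a toy association table of easily confused initials (the actual database contents are not given in the text).

      # Toy association table: initials with close, easily confused pronunciations.
      ASSOCIATED = {"s": ["sh"], "c": ["ch"], "z": ["zh"]}

      def expand_initial(typed):
          """Return the typed initial plus any associated (confusable) initials."""
          return [typed] + ASSOCIATED.get(typed, [])

      # For a mismatched position with typed initial "s", both "si.." and "shi.."
      # pinyin are tried, which is how "siji" (driver, fourth level) and "shiji"
      # (actual, timing) all end up in the candidate set above.
      print(expand_initial("s"))  # ['s', 'sh']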
  • In some embodiments, the lip feature sequence is obtained from the image sequence of the user's lip region, and the lip language recognition model is trained with lip feature sequences extracted from lip region image sequences and the corresponding text sequences as training samples.
  • The specific flow of the text input method in this case is shown in FIG. 4: when the user's text input operation is detected, the first letters of the text to be input are obtained, the front camera of the electronic device collects face images, and each collected image is analyzed to identify the face in it.
  • The image of the lip area is then cropped from the face; the lip area images at successive moments form an image sequence, and lip features are extracted from each image in the sequence to obtain the lip feature sequence.
  • The lip feature sequence is input into the trained lip language recognition model, which may be a spatiotemporal convolutional neural network (STCNN) model. After the text sequence output by the lip language recognition model is obtained, the first letters are used to correct it, and the corrected candidate text sequences are obtained.
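  • As a rough illustration of such a model, the PyTorch sketch below stacks 3D (spatiotemporal) convolutions over a clip of grayscale lip crops and emits per-frame character logits. The input shape, layer sizes, and vocabulary size are assumptions for illustration, not the architecture actually used in the patent.

      import torch
      import torch.nn as nn

      VOCAB = 30  # e.g. letters plus blank/padding symbols (assumed)

      class LipReader(nn.Module):
          def __init__(self):
              super().__init__()
              self.stcnn = nn.Sequential(  # spatiotemporal feature extractor
                  nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                  nn.ReLU(),
                  nn.MaxPool3d((1, 2, 2)),  # pool space, keep time resolution
                  nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
                  nn.ReLU(),
                  nn.MaxPool3d((1, 2, 2)),
              )
              self.gru = nn.GRU(64 * 16 * 16, 128, batch_first=True)
              self.head = nn.Linear(128, VOCAB)  # per-frame character logits

          def forward(self, clip):
              # clip: (batch, 1, frames, 64, 64) grayscale lip crops
              feats = self.stcnn(clip)  # (B, C, T, H, W)
              b, c, t, h, w = feats.shape
              feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
              out, _ = self.gru(feats)  # aggregate over time
              return self.head(out)     # (B, T, VOCAB)

      logits = LipReader()(torch.randn(2, 1, 30, 64, 64))
      print(logits.shape)  # torch.Size([2, 30, 30])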
  • the candidate character sequence is determined according to the replaced pinyin.
  • The semantics of each candidate text sequence are determined, and the candidate text sequence with the highest probability is determined accordingly.
  • Specifically, the determined candidate word sequences can be compared with a pre-stored word database, the candidate word sequence that best matches the database can be selected as the candidate with the highest probability, and that candidate can then be taken as the text to be input by the user.
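  • A very small sketch of this database matching (with "probability" approximated by a toy frequency score) could look as follows; the database contents are illustrative only.

      # Toy word database with frequency counts standing in for probabilities.
      WORD_DB = {"knowledge": 120, "instruction": 80, "support": 200}

      def best_candidate(candidates):
          """Return the candidate sequence that best matches the word database."""
          return max(candidates, key=lambda cand: WORD_DB.get(cand, 0))

      print(best_candidate(["knowledge", "instruction"]))  # knowledge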
  • the replaced pinyin may also be input into a preset semantic recognition model to obtain a candidate character sequence with the highest probability output by the semantic recognition model.
  • the semantic recognition model is trained based on pronunciation and the text sequence with the highest probability as a training sample.
  • In other embodiments, the lip feature sequence is obtained from the radar wave signal sequence reflected by the obstacle, and the lip language recognition model is trained with lip feature sequences extracted from radar wave signal sequences and the text corresponding to those lip feature sequences as training samples.
  • The specific flow of the text input method in this case is shown in FIG. 6: when the user's text input operation is detected, the first letters of the text to be input are obtained, and the radar on the front of the electronic device transmits radar waves and receives the radar wave signal sequence reflected by the obstacle, that is, the reflected signal sequence, which includes the reflected signals at successive moments.
  • The radar can be a 60 GHz millimeter-wave radar, and the radar antenna can work in a single-transmit multi-receive mode or a multi-transmit multi-receive mode. Since the delay of the reflected signal relative to the transmitted signal and the Doppler effect on the reflected signal reflect the characteristics of the obstacle, including its size, shape, distance, and speed, the lip movements of the user when speaking can be captured.
  • the reflected signal sequence is processed to obtain the lip feature sequence of the user when speaking. Then input the lip feature sequence into the lip language recognition model to obtain the text sequence output by the lip language recognition model.
  • The radar wave may be modulated in a frequency-modulated continuous wave (FMCW) format, in which the modulation follows a periodic sawtooth wave function.
  • This yields the modulated radar wave shown in Figure 7, in which the reflected signal s2 is delayed relative to the transmitted signal s1, and there is a frequency difference between the reflected and transmitted signals caused by the Doppler effect of the moving obstacle. The reflected signal is multiplied by the transmitted signal, and the product is low-pass filtered in the analog domain to obtain the beat signal.
  • A fast Fourier transform (FFT) is then performed on the beat signal and, after background removal such as filtering, the Range-Doppler Map (RDM) shown in Figure 8 is obtained.
  • Each grid cell in the RDM corresponds to an element in a matrix: the elements in each column represent the distance of the obstacle, and the elements in each row represent the speed of the obstacle.
  • the speed and distance of the obstacle at the current moment can be determined according to the RDM.
  • the black area in Figure 8 represents the speed and distance of the obstacle at the current moment.
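  • The NumPy sketch below walks through this chain numerically: it simulates the beat signal of a single reflector and applies FFTs across fast time (range) and slow time (Doppler) to form the RDM. All radar parameters are illustrative assumptions, not values from the patent.

      import numpy as np

      n_chirps, n_samples = 64, 256   # slow time x fast time
      fs = 1e6                        # ADC sample rate (assumed)
      t = np.arange(n_samples) / fs
      beat_freq, doppler_step = 50e3, 0.1  # one simulated reflector

      # Simulated beat signal: a fixed beat frequency (range) with a per-chirp
      # phase rotation (Doppler). In a real system this comes from mixing the
      # received chirp with the transmitted chirp and low-pass filtering.
      chirp_idx = np.arange(n_chirps)[:, None]
      beat = np.exp(2j * np.pi * (beat_freq * t[None, :] + doppler_step * chirp_idx))

      range_fft = np.fft.fft(beat, axis=1)  # fast-time FFT -> range bins
      rdm = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)  # slow-time FFT -> Doppler

      # The peak of |RDM| marks the reflector's range/speed cell, like the dark
      # area in Figure 8.
      doppler_bin, range_bin = np.unravel_index(np.argmax(np.abs(rdm)), rdm.shape)
      print(range_bin, doppler_bin - n_chirps // 2)  # ~13 and ~6 for these parameters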
  • In addition, the angle of arrival of the obstacle can be obtained; from the angle of arrival and the distance of the obstacle, its spatial position information is obtained, and a 3D reconstruction is then performed from this spatial position information to obtain a 3D depth map of the obstacle.
  • the 4D (3D space and velocity) vector signal of the obstacle can be obtained.
  • Therefore, after receiving the reflected signal, the electronic device performs the above processing on the reflected signal and the corresponding transmitted signal to obtain the four-dimensional vector signal of the lips, which is used as the lip feature.
  • From the four-dimensional vector signals of the lips at successive moments, the lip feature sequence is obtained. The lip feature sequence is then input into the lip language recognition model to obtain the text sequence it outputs. After the text sequence is obtained, the first letters are used to correct it, and the corrected candidate text sequences are obtained. The semantics of each candidate text sequence are then determined, the candidate with the highest probability is selected accordingly, and it is used as the text to be input by the user.
  • Since radar wave signals place few requirements on the environment, using them to obtain the lip feature sequence broadens the range of scenarios in which the electronic device can be used.
  • FIG. 9 shows a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100 .
  • In other embodiments, the electronic device 100 may include more or fewer components than shown, or some components may be combined, or some components may be split, or the components may be arranged differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or may be integrated in one or more processors.
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in the processor 110 is a cache, which may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can fetch them directly from the cache, avoiding repeated accesses and reducing the waiting time of the processor 110, thereby increasing the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • When the shutter is opened, light is transmitted through the lens to the camera's photosensitive element, which converts the optical signal into an electrical signal and transmits it to the ISP, where it is processed and converted into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the camera 193 is used to capture the face image of the user when the user's text input operation is detected.
  • a digital signal processor is used to process digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy and so on.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • For example, the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
  • The NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 170B close to the human ear.
  • The microphone 170C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
  • When making a sound, the user can bring the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • The capacitive pressure sensor may comprise at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • Touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction for viewing a short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed (see the sketch below).
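  • The two-branch rule above can be summarized in the following minimal sketch; the threshold value, names, and normalized intensity scale are illustrative assumptions, not taken from this application:

      FIRST_PRESSURE_THRESHOLD = 0.5  # normalized touch intensity; illustrative

      def handle_short_message_icon_touch(intensity):
          # Dispatch a touch on the short message application icon by intensity.
          if intensity < FIRST_PRESSURE_THRESHOLD:
              return "view_short_message"    # light press: view messages
          return "create_new_short_message"  # firm press: compose a new message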
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • The gyro sensor 180B detects the angle at which the electronic device 100 shakes, calculates, according to the angle, the distance that the lens module needs to compensate for, and lets the lens counteract the shake through reverse motion, thereby achieving image stabilization (as sketched below).
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
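  • The anti-shake computation described above can be sketched with a simple small-angle lens-shift model; the formula and names below are illustrative assumptions rather than the exact method of this application:

      import math

      def lens_shift_mm(shake_angle_deg, focal_length_mm):
          # A shake of angle theta displaces the image by roughly
          # f * tan(theta); shifting the lens the same distance in the
          # opposite direction cancels the shake.
          return -focal_length_mm * math.tan(math.radians(shake_angle_deg))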
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
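  • One conventional way to compute altitude from a measured air pressure is the international barometric formula for the standard atmosphere, sketched below (the application itself does not specify a formula):

      def altitude_from_pressure_m(pressure_hpa, sea_level_hpa=1013.25):
          # Standard-atmosphere model: h = 44330 * (1 - (P / P0) ** (1 / 5.255)).
          return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))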
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • The electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D, and can further set features such as automatic unlocking upon flip opening according to the detected opening or closing state of the holster or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to identify the posture of the electronic device, and is applicable to landscape/portrait switching, pedometers, and the like (see the sketch below).
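  • As sketched below, landscape/portrait switching can be derived from the gravity components reported by the acceleration sensor; this decision rule is an illustrative simplification:

      def screen_orientation(accel_x, accel_y):
          # With the device roughly upright, gravity projects mainly onto the
          # long (y) axis in portrait and onto the short (x) axis in landscape.
          return "portrait" if abs(accel_y) >= abs(accel_x) else "landscape"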
  • the electronic device 100 can measure the distance through radar, infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing. In some embodiments, the electronic device 100 may also measure the distance and speed of obstacles using the distance sensor 180F.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen (see the sketch below).
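  • The reflected-light decision and the in-call screen-off behavior described above can be sketched as follows; the threshold and all names are illustrative assumptions:

      REFLECTION_THRESHOLD = 0.6  # normalized photodiode reading; illustrative

      def object_nearby(photodiode_reading):
          # Sufficient reflected infrared light implies an object is close.
          return photodiode_reading >= REFLECTION_THRESHOLD

      def on_proximity_change(photodiode_reading, in_call):
          # Turn the screen off when the user holds the phone to the ear in a call.
          if in_call and object_nearby(photodiode_reading):
              return "screen_off"
          return "screen_on"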
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness (see the sketch below).
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touch.
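  • The adaptive brightness adjustment amounts to mapping ambient illuminance to a display brightness level. The linear ramp below is a minimal sketch with illustrative constants; real devices tune this curve:

      def display_brightness_nits(ambient_lux, min_nits=5.0, max_nits=500.0,
                                  max_lux=10000.0):
          # Clamp the ambient reading, then ramp brightness linearly between
          # the minimum and maximum panel brightness.
          ratio = min(max(ambient_lux, 0.0) / max_lux, 1.0)
          return min_nits + ratio * (max_nits - min_nits)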
  • the fingerprint sensor 180H is used to collect fingerprints.
  • The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, access application locks, take photos with a fingerprint, answer incoming calls with a fingerprint, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown of the electronic device 100 caused by the low temperature.
  • In some other embodiments, when the temperature is lower than still another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by the low temperature (the three branches of the strategy are sketched below).
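  • The three branches of the temperature processing strategy can be sketched as follows; the threshold values are illustrative, not from this application:

      def thermal_policy(temp_c, hot_limit=45.0, cold_limit=0.0,
                         very_cold_limit=-10.0):
          # Return the mitigations to apply at the current temperature.
          actions = []
          if temp_c > hot_limit:
              actions.append("throttle_nearby_processor")  # reduce performance
          if temp_c < cold_limit:
              actions.append("heat_battery")               # warm the battery
          if temp_c < very_cold_limit:
              actions.append("boost_battery_voltage")      # keep output voltage up
          return actions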
  • The touch sensor 180K is also called a "touch panel."
  • The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch-controlled screen."
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • The touch sensor can pass a detected touch operation to the application processor to determine the type of the touch event (see the sketch below).
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
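  • As a sketch of how an application processor might determine the type of a touch event passed on by the touch sensor, consider the following; the classification rule and constants are illustrative assumptions:

      LONG_PRESS_MS = 500  # illustrative duration threshold

      def classify_touch_event(down_time_ms, up_time_ms, moved):
          # A moving contact is a swipe; otherwise distinguish a tap from a
          # long press by how long the finger stayed down.
          if moved:
              return "swipe"
          duration = up_time_ms - down_time_ms
          return "long_press" if duration >= LONG_PRESS_MS else "tap"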
  • the bone conduction sensor 180M can acquire vibration signals.
  • The bone conduction sensor 180M can acquire the vibration signal of the bone that vibrates when a person speaks.
  • The bone conduction sensor 180M can also be in contact with the human pulse and receive a blood-pressure beating signal.
  • The bone conduction sensor 180M may also be disposed in an earphone to form a bone conduction earphone.
  • The audio module 170 can parse out a voice signal from the vibration signal, acquired by the bone conduction sensor 180M, of the bone that vibrates during speech, to implement a voice function.
  • The application processor can parse heart rate information from the blood-pressure beating signal acquired by the bone conduction sensor 180M, to implement heart rate detection (see the sketch below).
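  • Heart rate detection from the blood-pressure beating signal can be sketched as simple peak counting; the NumPy-based approach below is an illustrative assumption, not the method of this application:

      import numpy as np

      def heart_rate_bpm(pulse_signal, sample_rate_hz):
          # Count rising edges through the signal mean as beats, then scale
          # the beat count to beats per minute over the window duration.
          samples = np.asarray(pulse_signal, dtype=float)
          above = samples > samples.mean()
          beats = np.count_nonzero(above[1:] & ~above[:-1])
          duration_s = len(samples) / sample_rate_hz
          return beats * 60.0 / duration_s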
  • The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • The touch vibration feedback effect can also be customized (see the sketch below).
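  • One way to realize per-application, per-region, and user-customized vibration feedback is a simple lookup table, as in the sketch below; every key and effect name here is an illustrative assumption:

      VIBRATION_EFFECTS = {
          ("camera", "shutter_button"): "short_tick",
          ("messages", "keyboard"): "light_tap",
          ("clock", "alarm_dismiss"): "double_buzz",
      }

      def feedback_effect(app, region, user_overrides=None):
          # User customization takes precedence over the built-in defaults.
          table = {**VIBRATION_EFFECTS, **(user_overrides or {})}
          return table.get((app, region), "default_click")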
  • The indicator 192 may be an indicator light, which can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • The SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195.
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • The electronic device 100 may employ an eSIM, i.e., an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100 .
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments can be implemented by a computer program to instruct the relevant hardware.
  • The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium.
  • The software distribution medium can be, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • The shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces, or may be an indirect coupling or communication connection between apparatuses or units, and may be in electrical, mechanical, or other forms.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application relates to the field of artificial intelligence (AI), and provides a text input method, an electronic device, and a computer-readable storage medium. The text input method includes: upon detecting a text input operation of a user, acquiring lip change information of the user and character information entered by the user, the lip change information including a lip feature sequence of the user speaking the text to be input; and determining, according to the lip feature sequence and the character information, the text to be input by the user. Because the accuracy of text determined from the character information entered by the user is relatively high, combining the lip feature sequence with the character information to determine the text to be recognized can improve the accuracy of text input when voice input is inconvenient for the user.
PCT/CN2021/116515 2020-09-27 2021-09-03 Text input method, electronic device and computer-readable storage medium WO2022062884A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011036037.1A CN114356109A (zh) Text input method, electronic device and computer-readable storage medium
CN202011036037.1 2020-09-27

Publications (1)

Publication Number Publication Date
WO2022062884A1 true WO2022062884A1 (fr) 2022-03-31

Family

ID=80844894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116515 WO2022062884A1 (fr) 2020-09-27 2021-09-03 Text input method, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114356109A (fr)
WO (1) WO2022062884A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021076349A1 (fr) * 2019-10-18 2021-04-22 Google Llc End-to-end audiovisual automatic speech recognition of multiple speakers

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109637521A (zh) * 2018-10-29 2019-04-16 深圳壹账通智能科技有限公司 Deep learning-based lip language recognition method and apparatus
CN110427809B (zh) * 2019-06-21 2023-07-25 平安科技(深圳)有限公司 Deep learning-based lip language recognition method and apparatus, electronic device, and medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1704877A (zh) * 2004-05-26 2005-12-07 华为技术有限公司 Text input method and apparatus for a handheld device
CN102117115A (zh) * 2009-12-31 2011-07-06 上海量科电子科技有限公司 System for text input selection using lip language and implementation method thereof
JP2011186994A (ja) * 2010-03-11 2011-09-22 Fujitsu Ltd Character input device and character input method
JP2015172848A (ja) * 2014-03-12 2015-10-01 株式会社ゼンリンデータコム Lip-reading input device, lip-reading input method, and lip-reading input program
CN104217218A (zh) * 2014-09-11 2014-12-17 广州市香港科大霍英东研究院 Lip language recognition method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601575A (zh) * 2022-10-25 2023-01-13 扬州市职业大学(扬州开放大学) (CN) Method and system for assisting persons with aphasia and agraphia in expressing common phrases
CN115601575B (zh) * 2022-10-25 2023-10-31 扬州市职业大学(扬州开放大学) Method and system for assisting persons with aphasia and agraphia in expressing common phrases

Also Published As

Publication number Publication date
CN114356109A (zh) 2022-04-15

Similar Documents

Publication Publication Date Title
CN114365476A (zh) A photographing method and device
WO2022193989A1 (fr) Electronic device operation method and apparatus, and electronic device
CN111742539B (zh) Voice control command generation method and terminal
WO2022100685A1 (fr) Drawing command processing method and related device
CN114650363A (zh) Image display method and electronic device
CN113542580B (zh) Method and apparatus for removing eyeglass light spots, and electronic device
WO2022062884A1 (fr) Text input method, electronic device and computer-readable storage medium
CN114242037A (zh) Virtual character generation method and apparatus
WO2022042768A1 (fr) Index display method, electronic device and computer-readable storage medium
CN113672756A (zh) Visual positioning method and electronic device
CN114880251A (zh) Storage unit access method and apparatus, and terminal device
CN115589051A (zh) Charging method and terminal device
WO2022022319A1 (fr) Image processing method and system, electronic device, and chip system
CN111104295A (zh) Method and device for testing a page loading process
CN112584037B (zh) Image saving method and electronic device
CN113467735A (zh) Image adjustment method, electronic device, and storage medium
CN114822525A (zh) Voice control method and electronic device
CN115641867B (zh) Voice processing method and terminal device
CN109285563B (zh) Voice data processing method and apparatus in online translation
WO2022214004A1 (fr) Target user determination method, electronic device and computer-readable storage medium
WO2022095752A1 (fr) Frame demultiplexing method, electronic device and storage medium
WO2022007757A1 (fr) Cross-device voiceprint registration method, electronic device and storage medium
CN114120987B (зh) Voice wake-up method, electronic device, and chip system
CN115393676A (зh) Gesture control optimization method and apparatus, terminal, and storage medium
CN113391735A (зh) Display form adjustment method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21871256

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21871256

Country of ref document: EP

Kind code of ref document: A1