CN112669817B - Language identification method and device and electronic equipment - Google Patents


Info

Publication number
CN112669817B
CN112669817B
Authority
CN
China
Prior art keywords
target
spectral filters
information
spectral
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011563773.2A
Other languages
Chinese (zh)
Other versions
CN112669817A (en)
Inventor
李沛德
官龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202011563773.2A
Publication of CN112669817A
Application granted
Publication of CN112669817B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a language identification method, a language identification device and electronic equipment, belongs to the field of communication technology, and can solve the problems of low efficiency and low success rate when electronic equipment identifies information from a moving part of a face. The method comprises the following steps: obtaining target light corresponding to a face of a subject to be photographed through N spectral filters among M spectral filters, wherein M and N are positive integers and N is less than or equal to M; obtaining, according to the target light, a target image sequence of a target part through a real-sense pixel sensor, wherein the target part is a part of the face that is in a motion state; and outputting language information corresponding to the target part according to the target image sequence. The method and the device are applied to the process of identifying information from a moving part of a face with electronic equipment.

Description

Language identification method and device and electronic equipment
Technical Field
The application belongs to the technical field of communication, and particularly relates to a language identification method, a language identification device and electronic equipment.
Background
In general, a user may capture images with the lens of an electronic device to perform functions such as face recognition and lip-language recognition. Specifically, the electronic device may first reset the large number of photodiodes in the lens's sensor, then convert the optical signals from the user's face within a set exposure time to form a digital signal matrix (i.e., an image), and perform rolling-shutter exposure, in which the sensor scans and exposes the image line by line until all pixels have been exposed, so as to capture the image of the user, thereby obtaining the mouth-shape information of a user who is speaking and realizing the lip-language recognition function.
However, the above manner leads to a slow imaging speed of the lens, and when a moving object (e.g., a fast-moving object) is photographed, artifacts such as skew, wobble or partial exposure may occur; these are known as the jello (rolling-shutter) effect. The timeliness and accuracy of the captured image signal are therefore poor, and the efficiency and success rate with which the electronic equipment identifies information from a moving part of the face are low.
Disclosure of Invention
The embodiment of the application aims to provide a language identification method, a language identification device and electronic equipment, which can solve the problems of low efficiency and low success rate of information identification of a face movement part of the electronic equipment.
In order to solve the technical problems, the application is realized as follows:
In a first aspect, an embodiment of the present application provides a language identification method applied to an electronic device whose camera module includes M spectral filters and a real-sense pixel sensor. The language identification method includes: obtaining target light corresponding to a face of a subject to be photographed through N spectral filters among the M spectral filters, where M and N are positive integers and N is less than or equal to M; obtaining, according to the target light, a target image sequence of a target part through the real-sense pixel sensor, where the target part is a part of the face that is in a motion state; and outputting language information corresponding to the target part according to the target image sequence.
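The three claimed steps can be read as a simple pipeline. The following is a minimal sketch only; the stage functions (`select_filters`, `capture`, `decode`) are illustrative stubs standing in for the hardware and model stages described in the detailed embodiments, not the patent's implementation:

```python
# Sketch of the claimed three-step method; all stage functions are
# illustrative stubs, not the patent's actual implementation.

def identify_language(select_filters, capture, decode, m_filters, target_range):
    # Step 1: choose N of the M spectral filters for the target range.
    n_filters = select_filters(m_filters, target_range)
    assert 0 < len(n_filters) <= len(m_filters)  # N is positive and N <= M

    # Step 2: the real-sense pixel sensor yields an image sequence of
    # the moving part of the face (here simulated by the capture stub).
    frames = capture(n_filters)

    # Step 3: translate the image sequence into language information.
    return decode(frames)
```

A trivial invocation with stub stages would pass two of three filters through and decode a three-frame sequence.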
In a second aspect, an embodiment of the present application provides a language identification device whose camera module includes M spectral filters and a real-sense pixel sensor. The language identification device includes an acquisition module and an output module. The acquisition module is configured to obtain target light corresponding to a face of a subject to be photographed through N spectral filters among the M spectral filters, where M and N are positive integers and N is less than or equal to M, and to obtain, according to the target light, a target image sequence of a target part through the real-sense pixel sensor, where the target part is a part of the face that is in a motion state. The output module is configured to output language information corresponding to the target part according to the target image sequence obtained by the acquisition module.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, the program or instruction implementing the steps of the method according to the first aspect when executed by the processor.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In the embodiment of the application, the electronic device may obtain, through at least one spectral filter of the camera module, target light corresponding to the face of the subject to be photographed, and obtain, according to the target light, a target image sequence of a target part in a motion state through a real-sense pixel sensor, so as to output language information corresponding to the target part according to the target image sequence. Because the camera module of the electronic equipment includes at least one spectral filter and a real-sense pixel sensor, the electronic equipment can screen the incoming light through the at least one spectral filter (i.e., acquire only the light of the face) to reduce the amount of redundant data, so that the real-sense pixel sensor can quickly acquire a corresponding image based on the light of the face. In addition, the real-sense pixel sensor is designed to collect images of an object in a motion state and can capture high-precision dynamic objects in real time, so an image sequence of the target part in the motion state can be collected through the real-sense pixel sensor; that is, the motion of the target part can be obtained accurately. Accordingly, the electronic equipment can rapidly and accurately output the corresponding language information according to the image sequence, which improves the efficiency and success rate of information identification (e.g., lip-language recognition) for a moving part of the face.
Drawings
FIG. 1 is a flowchart of a language identification method according to an embodiment of the present application;
fig. 2 is a first schematic structural diagram of a lens module according to an embodiment of the present application;
FIG. 3 is a second schematic structural diagram of a lens module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a light processing module according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating the arrangement of real-sense pixels according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a language identification apparatus according to an embodiment of the present application;
fig. 7 is a first schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
fig. 8 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms "first", "second" and the like in the description and in the claims are used to distinguish between similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that the embodiments of the present application may be implemented in sequences other than those illustrated or described herein. The objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally means that the associated objects are in an "or" relationship.
The following explains some concepts and/or terms related to the language identification method, the language identification device and the electronic equipment provided in the embodiments of the present application.
Camera sensor (sensor): the sensor is the core of the camera and also its most critical technology. Sensors are generally divided into two types: one is the widely used CCD (charge-coupled device) element, and the other is the CMOS (complementary metal-oxide-semiconductor) device.
Like a CCD, the CMOS device used at present is a semiconductor capable of recording changes in light in a digital camera. By comparison, a conventional camera uses film as the carrier on which information is recorded, whereas the "film" of a digital camera is its imaging photosensitive element, an irreplaceable "film" that is integral with the camera. The general workflow of a CMOS sensor is that an optical signal is sensed by a large number of photodiodes (pixels) and converted into an electrical signal, formed into a digital signal matrix (i.e., an image) by an amplifying circuit and an A/D conversion circuit, and then compressed and stored by the image signal processor (ISP).
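The readout chain just described (photodiode signal, amplification, A/D conversion, digital matrix) can be illustrated numerically. A toy sketch with invented gain and reference values, not real sensor driver code:

```python
# Toy model of the CMOS workflow: pixels sense light, the signal is
# amplified, quantized by an ADC, and assembled into a digital matrix.

def adc(voltage, bits=8, v_ref=1.0):
    """Clamp an analog voltage to [0, v_ref] and quantize it to a code."""
    return int(min(max(voltage / v_ref, 0.0), 1.0) * (2 ** bits - 1))

def read_frame(photodiode_rows, gain=2.0):
    """Amplify each pixel's signal and convert it to a digital matrix."""
    return [[adc(sample * gain) for sample in row] for row in photodiode_rows]
```

For example, `read_frame([[0.125, 0.25], [0.375, 0.5]])` yields the 8-bit matrix `[[63, 127], [191, 255]]`, the "digital signal matrix" the text refers to.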
The earliest applications of multispectral, hyperspectral and even ultraspectral cameras or imagers came from aerial photography, that is, generally from satellite remote sensing. As the name suggests, multispectral imaging divides an incident full-band or broad-band optical signal into several narrow-band beams according to the spectral resolution (the minimum wavelength interval that can be resolved) and then images the beams on a sensor. If the spectral resolution is high enough, the incident spectral curve can be completely sampled by the multispectral technology, which has wide application in civil and military fields (unmanned aerial vehicle reconnaissance, agricultural pest and disease monitoring, soil fertility assessment, water pollution monitoring, etc.).
A lip-language recognition system uses machine-vision technology to continuously recognize human faces in images, determines the person who is speaking, and extracts that person's continuous mouth-shape variation features. It then feeds these continuous variation features into a lip-language recognition model to recognize the pronunciations corresponding to the speaker's mouth shapes, and finally computes the most likely natural-language sentences from the recognized pronunciations.
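The stages of such a system (face detection, mouth-feature extraction, a lip-reading model, language decoding) can be outlined as below. This is a hypothetical sketch: every stage is a stub parameter standing in for a real vision or language model, not the patent's implementation:

```python
# Outline of the lip-language recognition stages described above.
# Each stage is a stub standing in for a real machine-vision component.

def lip_reading_pipeline(frames, detect_face, extract_mouth, lip_model, language_model):
    features = []
    for frame in frames:
        face = detect_face(frame)                 # locate the speaking face
        if face is not None:
            features.append(extract_mouth(face))  # continuous mouth-shape features
    phonemes = lip_model(features)                # mouth shapes -> pronunciation
    return language_model(phonemes)               # pronunciation -> likely sentence
```

The pipeline skips frames in which no face is found, mirroring the "continuously recognize a human face" step in the text.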
The embodiment of the application can be applied to information-recognition scenarios for moving parts of the face, such as lip-language recognition and eye-movement information recognition. A camera module structure is designed based on a novel sensor: using multispectral and real-sense pixel technologies, the electronic equipment can screen the facial spectrum to reduce the amount of redundant data, and can then use the contour map of a high-precision dynamic object captured in real time by the real-sense pixels to accurately obtain the user's mouth-shape information (or eye-movement information) in real time and translate it into the corresponding language information by comparison against a mouth-shape information base (or eye-movement information base). This improves the success rate and efficiency of information identification (e.g., lip-language recognition) for moving parts of the face.
An embodiment of the application provides a language identification method. Fig. 1 shows a flowchart of the language identification method provided in the embodiment of the application; the method can be applied to electronic equipment. As shown in fig. 1, the language identification method provided in the embodiment of the present application may include the following steps 201 to 203.
Step 201, the electronic device obtains target light corresponding to a face of a person to be photographed through N spectral filters in the M spectral filters.
In the embodiment of the application, the camera module of the electronic equipment comprises M spectral filters and a real-sense pixel sensor; m and N are positive integers, and N is less than or equal to M.
In this embodiment of the present application, a user may start an application program having a face movement part recognition function (e.g., a lip recognition function and an eye recognition function), so as to collect, by using a camera of an electronic device, an image of a part in a movement state in an object to be photographed (e.g., a person to be photographed) based on multispectral and real-sense pixel technologies, so as to determine language information corresponding to the part according to the collected image of the part.
In this embodiment of the application, the electronic device includes a lens module, and the lens module includes at least the following components: lenses, filters (e.g., M spectral filters) and sensors (e.g., a real pixel sensor, a conventional sensor). The filter may be disposed outside the sensor (may be referred to as an out-sensor filter) or inside the sensor (may be referred to as an in-sensor filter).
In this embodiment of the present application, the electronic device may acquire light rays of a specific spectrum (for example, target light rays corresponding to a face portion of a person) through an optical filter and a multispectral technology, so as to implement image acquisition only on the face portion of the person.
It should be noted that the spectrum of a substance is unique: any substance has a unique spectral reflectance curve, i.e., a unique set of Raman scattering lines. Raman scattering directly reflects the vibrational and rotational energy levels of molecules or crystal lattices, and different substances differ in their constituent components or structures (molecules and atoms), so no two different substances can have identical Raman spectral lines; that is, the spectral reflectance curve of a substance is necessarily unique. Unwanted parts of the spectrum can therefore be filtered out by a filter to obtain light of a specific spectrum.
The sensor of a camera module senses light in a way similar to the human eye. The CFA (color filter array) covering the pixels simulates the three types of cone cells of the human eye: the three cone cells sample the spectral reflectance curve, a digital signal is formed, and it is then processed by the image signal processor (ISP) to finally become an image. In other words, the sensor images the incident spectral curve as three primary colors, i.e., three discrete data points (which can be understood as three-spectrum sampling), and finally mixes the three primary colors into color and brightness. As a result, both the human eye and the camera obtain only color and brightness, and the details of the spectral curve are lost (metamerism). The camera of an electronic device therefore cannot identify many material characteristics and attributes (such as the healthiness of skin color). Multispectral technology, which can be divided into time-domain multispectral and space-domain multispectral, can assist the camera's sensor in identifying such characteristics and attributes.
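The "three-spectrum sampling" above means a continuous reflectance curve is reduced to just three numbers, which is exactly why two different curves can produce the same color (metamerism). A simplified numeric illustration; the sensitivity curves and spectra below are invented for the example, not real cone or CFA responses:

```python
# Illustration of metamerism: a sensor that samples a spectral curve
# with only three sensitivity functions maps two *different* spectra
# to the *same* (B, G, R)-like triple. All numbers are invented.

def tristimulus(spectrum, sensitivities):
    """Dot-product a sampled spectrum against each sensitivity curve."""
    return tuple(sum(s * w for s, w in zip(spectrum, sens)) for sens in sensitivities)

# Three coarse sensitivity curves over five wavelength bins.
SENS = [
    [1, 1, 0, 0, 0],  # short-wavelength ("blue"-like)
    [0, 1, 1, 1, 0],  # medium-wavelength ("green"-like)
    [0, 0, 0, 1, 1],  # long-wavelength ("red"-like)
]

spectrum_a = [2, 0, 3, 0, 1]
spectrum_b = [0, 2, 0, 1, 0]  # a different curve, same triple below
```

Both spectra yield the triple `(2, 3, 1)` even though the curves differ, so the detail a spectral filter could distinguish is lost after three-channel sampling.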
It should be noted that the out-sensor filter mentioned above uses a spectroscope or a filter to perform spectral filtering before the sensor senses light, so as to obtain spectral information. That is, before the light reaches the sensor, it is filtered to obtain light of a specific spectrum (for example, the light of the face), so that after this light enters the sensor, the sensor can perform image acquisition based on it.
Optionally, in the embodiment of the present application, the out-sensor filter may be disposed outside the lens or inside the lens (i.e., between the lens and the sensor). The out-sensor filter may be arranged in a retractable manner, a wheel-disc manner, a spectroscope manner, or the like. The retractable manner can be controlled by the electronic equipment or by a mechanical system; the embodiment of the application is not limited in this respect.
Illustratively, as shown in fig. 2, the lens module of the electronic device includes a lens, M spectral filters (e.g., the 1# spectral filter to the M# spectral filter), a sensor (e.g., a real-sense pixel sensor), and an infrared filter. After the light of the subject to be photographed is filtered through some of the M spectral filters, the light enters the sensor, which then acquires the images.
It should be noted that the in-sensor filter mentioned above replaces the CFA, i.e., the RGB filter, on each pixel of a conventional sensor with a spectral filter, and each spectral filter corresponds to a spectral band that is allowed to pass through.
As shown in fig. 3, the lens module of the electronic device includes a lens and a sensor (e.g., a real-sense pixel sensor), and M spectral filters (not shown in fig. 3) are disposed in the sensor. After the light of the subject to be photographed enters the sensor, the spectral filters in the sensor filter the light, and image acquisition is performed based on the light of the specific spectrum obtained after filtering.
It should be noted that the real-sense pixel sensor mentioned above may be understood as a new sensor in which a few special pixels (which may be called real-sense pixels) are integrated into a conventional sensor. These pixels can independently output information about a moving object, so the real-sense pixels can capture a contour map of a high-precision dynamic object in real time. The method by which the real-sense pixel sensor acquires an image of an object in a motion state will be described in the following embodiments and is not repeated here.
Optionally, in an embodiment of the present application, the lens module may further include at least one of the following: lens protective cover, VCM, lens holder, infrared filter, flexible circuit board (FPC), lens module connector, etc.
Optionally, in the embodiment of the present application, before the step 201, the language identification method provided in the embodiment of the present application further includes a step 301 described below, and the step 201 may be specifically implemented by a step 201a described below.
Step 301, the electronic device receives a first input of a user.
In this embodiment of the present application, the first input is an input of a target spectrum range by a user, and a spectrum of the target light is within the target spectrum range.
In this embodiment of the present application, a user may input a target spectral range into the electronic device, so that when the electronic device detects light, light whose spectrum lies within the target spectral range (for example, light from the face) is allowed through the filter into the sensor while other light is filtered out, and the sensor performs image acquisition based on the received light.
In step 201a, the electronic device determines N spectral filters corresponding to the target spectral range from the M spectral filters in response to the first input, and performs filtering processing on light rays outside the target spectral range through the N spectral filters to obtain target light rays.
In this embodiment of the present application, each spectral filter corresponds to one spectral band (i.e., a band of a spectrum that allows transmission), and the electronic device may control, according to a target spectral range input by a user, the spectral filters (for example, N spectral filters) of the spectral band within the target spectral range to be in a working state, so that filtering processing of other light rays (i.e., light rays whose spectrum is outside the target spectral range) is performed through the spectral filters.
It should be noted that controlling the N spectral filters to be in the working state can be understood as: controlling the N spectral filters to rotate to a preset position so that they can receive light; or controlling the N spectral filters to be in an energized state so that they receive light.
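Steps 301 and 201a amount to: given a user-supplied spectral range, put exactly those filters whose pass band lies inside the range into the working state, and let only light within the range through. A sketch under stated assumptions; the filter and ray records, their field names, and the containment test are illustrative, not the patent's data model:

```python
# Sketch of step 201a: mark as "working" the N filters whose pass band
# falls inside the user's target range, then drop any ray whose
# wavelength lies outside the range. Record layouts are hypothetical.

def activate_filters(filters, target_range):
    lo, hi = target_range
    for f in filters:
        band_lo, band_hi = f["band"]
        f["working"] = lo <= band_lo and band_hi <= hi  # band inside range
    return [f for f in filters if f["working"]]

def filter_light(rays, target_range):
    lo, hi = target_range
    return [r for r in rays if lo <= r["wavelength"] <= hi]
```

With filters at 400-450 nm, 450-500 nm and 600-650 nm and a target range of 400-500 nm, the first two filters enter the working state and a 610 nm ray is filtered out.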
In the embodiment of the application, the user can select the specific spectrum to be acquired through the multispectral technology, so that the electronic equipment acquires images of the face only, eliminating redundant information; the real-sense pixel technology then identifies the dynamic object with high precision, so the language information corresponding to the moving region can be obtained rapidly and accurately.
Step 202, the electronic device obtains a target image sequence of the target part through the real-sense pixel sensor according to the target light.
In this embodiment of the present application, the above-mentioned real-sense pixel sensor is used to collect an image of an object in a motion state; the target part is a part in a motion state in the face part.
In this embodiment of the present application, the target image sequence includes multiple frame images of the target portion, that is, the target image sequence is an image of the target portion acquired in real time during the movement process of the target portion.
Alternatively, in the embodiment of the present application, the target portion may be a lip portion or an eye portion in a face portion.
In this embodiment of the present application, after an application program with a lip recognition function is started by a user, a real-sense pixel sensor in a camera module of an electronic device is in a working state, and when a lip portion of a person to be shot is detected to be in a motion state (for example, in a speaking state), a multi-frame image of the lip portion in the motion state can be obtained in real time, where each frame image in the multi-frame image corresponds to one piece of mouth shape information.
In the embodiment of the present application, the user may define, in advance, a correspondence between blink or eye movement information and language information in the electronic device, so as to store the correspondence in the eye movement information base. Therefore, after the application program with the eye recognition function is started by a user, the real-sense pixel sensor in the camera module of the electronic equipment is in a working state, and under the condition that the eye part of a shot person is detected to be in a motion state (for example, in a blinking state), multi-frame images of the eye part in the motion state can be acquired in real time, each frame of image in the multi-frame images corresponds to one eye motion information respectively, so that the electronic equipment can determine the corresponding language information according to the corresponding relation between the eye motion information and the language information.
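The user-defined correspondence between eye-movement information and language information described above is essentially a lookup table from recognized motion sequences to language information. A minimal sketch; `EYE_MOTION_BASE` and all of its entries are invented for illustration, not taken from the patent:

```python
# Sketch of an eye-movement information base: a user-defined mapping
# from recognized eye-motion sequences to language information.
# All entries are hypothetical examples.

EYE_MOTION_BASE = {
    ("blink", "blink"): "yes",
    ("blink", "look_left"): "no",
}

def translate_eye_motions(motions, base=EYE_MOTION_BASE):
    """Return the language information for a motion sequence, or None
    if the user has not defined a correspondence for it."""
    return base.get(tuple(motions))
```

An undefined sequence simply yields no language information, matching the idea that only user-registered correspondences are translated.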
It should be noted that a conventional sensor needs to integrate the light information over a period of time (related to the frame rate) and then read it out sequentially, whereas each real-sense pixel operates independently: at the pixel clock frequency (i.e., in each time unit), a real-sense pixel senses changes in external brightness in real time, converts the brightness change into a current change, and further converts the current change into a change of a digital signal. If the variation of the digital signal of a real-sense pixel exceeds a preset threshold (for example, the VH and VL described in the embodiments below), the system is notified that the corresponding image should be output (i.e., it is determined that the part corresponding to that real-sense pixel is in a motion state), and a data packet containing coordinate information, brightness information and time information is output. Therefore, compared with a conventional sensor, the real-sense pixel sensor has better real-time performance, less signal redundancy and higher precision, and can capture the motion information of a dynamic object.
It will be appreciated that the above-described real pixel sensor is used to capture images of an object in motion, whereas for an object in a stationary state, the real pixel sensor may not be used to capture images.
For example, as shown in fig. 4, the electronic device may amplify the input optical signal (the signal corresponding to the light) through the current amplification module, convert the amplified signal into a digital signal through the analog-to-digital conversion module, and then determine whether the variation of the digital signal (i.e., the change in intensity between the digital signal at the previous clock tick and the digital signal at the current clock tick) exceeds the preset thresholds VH and VL (in the figure, VH is the real-sense pixel's digital signal value at the previous clock tick plus the threshold, and VL is that value minus the threshold). When the variation of the digital signal exceeds the preset threshold, the corresponding signal (i.e., an image) is output through the signal control module, the signal output module, the multiplexing switch module, and so on.
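The VH/VL test in fig. 4 can be expressed compactly: a real-sense pixel fires only when its new digitized value leaves the band between the previous value minus the threshold and the previous value plus the threshold, and it then reports coordinates, brightness and time. A sketch only; the packet field names are assumptions, not the patent's actual format:

```python
# Sketch of a real-sense pixel's change detector (cf. fig. 4): an event
# is reported only when the digitized brightness leaves [VL, VH], where
# VH = prev + threshold and VL = prev - threshold. Packet fields are
# illustrative, not the patent's data format.

def detect_event(prev_code, new_code, threshold, coord, t):
    vh = prev_code + threshold   # upper bound from the previous clock tick
    vl = prev_code - threshold   # lower bound from the previous clock tick
    if new_code > vh or new_code < vl:
        return {"coord": coord, "brightness": new_code, "time": t}
    return None  # change too small: the pixel stays silent
```

A jump from code 100 to 112 with threshold 10 produces an event packet, while a jump to 105 does not; this is why static regions generate no redundant data.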
Optionally, in the embodiment of the present application, after the electronic device obtains the target light through the out-sensor filter, it may obtain the target image sequence of the target part through a density-arranged real-sense pixel sensor. Note that this sensor is a conventional sensor into which real-sense pixels are inserted according to a density pattern (as shown in fig. 5, the sensor of the electronic device includes conventional pixels, i.e., red, green and blue (RGB) pixels, together with real-sense pixels); that is, the sensor can accurately capture the outline of a high-speed moving object without affecting the normal output of a color image.
In the embodiment of the present application, acquiring the target image sequence with the out-sensor filter and the density-arranged real-sense pixel sensor in the above manner makes it possible to capture images of a high-speed moving object under a specific spectrum without losing the ability to shoot conventional color images; when the user subsequently needs the conventional color shooting function, the out-sensor filter can be fully retracted.
Alternatively, in embodiments of the present application, after the target light is obtained through the in-sensor filter (i.e., in the sensor described above, in which the CFA on each pixel of a conventional sensor is replaced with a spectral filter), the electronic device may continue to acquire the target image sequence through that same sensor. It should be noted that there are no conventional pixels in this sensor; that is, all the pixels in the sensor are real-sense pixels.
Illustratively, tables 1 and 2 show the filter distributions in two different real-sense pixel sensors (e.g., real-sense pixel sensor 1 and real-sense pixel sensor 2). The pixels in both sensors are all real-sense pixels.
TABLE 1
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2 s1 s2
s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4 s3 s4
In table 1, s1 to s4 represent filters of different wavelength bands in the sensor 1.
TABLE 2
s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3
s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6
s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9
s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3
s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6
s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9
s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3
s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6
s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9
s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3
s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6
s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9
s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3 s1 s2 s3
s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6 s4 s5 s6
s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9 s7 s8 s9
In Table 2, s1 to s9 represent filters of different wavelength bands in real-sense pixel sensor 2.
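The layouts in Tables 1 and 2 are periodic repetitions of a 2×2 unit cell (s1 to s4) and a 3×3 unit cell (s1 to s9), respectively. A minimal sketch of how such a unit cell tiles across the pixel grid (the helper function and the grid sizes are illustrative, not part of the patent):

```python
# Sketch of the periodic spectral-filter layouts in Tables 1 and 2.
# The band labels s1..sN follow the tables; the helper function and
# patch sizes are illustrative assumptions.

def filter_layout(unit_rows, unit_cols, rows, cols):
    """Tile a unit cell of band labels s1..s(unit_rows*unit_cols)
    across a rows x cols pixel grid."""
    unit = [[f"s{r * unit_cols + c + 1}" for c in range(unit_cols)]
            for r in range(unit_rows)]
    return [[unit[r % unit_rows][c % unit_cols] for c in range(cols)]
            for r in range(rows)]

# Table 1: 2x2 unit cell (s1..s4) tiled over a 16x16 patch.
sensor1 = filter_layout(2, 2, 16, 16)
# Table 2: 3x3 unit cell (s1..s9) tiled over a 15x15 patch.
sensor2 = filter_layout(3, 3, 15, 15)

print(sensor1[0][:4])  # ['s1', 's2', 's1', 's2']
print(sensor2[1][:3])  # ['s4', 's5', 's6']
```

Every pixel under such a layout is a real-sense pixel; only the passband assigned to it differs.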
In the embodiment of the application, in the manner of acquiring the target image sequence through the in-sensor optical filter and the real-sense pixel sensor, the lens module has a simple structure, requires no additional optical filtering structure, has low design difficulty, and offers high production yield and reliability.
Alternatively, in the embodiment of the present application, the above step 202 may be specifically implemented by the following steps 202a and 202b.
In step 202a, the electronic device determines, by using the real-sense pixel sensor, a portion corresponding to light rays satisfying a preset condition in the target light as the target portion.
In this embodiment of the present application, the preset condition is that a signal variation of the light is greater than or equal to a preset threshold.
It can be understood that, in the case that the signal variation of the light is greater than or equal to the preset threshold, it can be determined that the portion corresponding to the light is in a motion state. It should be noted that, for the relevant description of the signal variation of the light, reference may be made to the description in the above step 202, which is not repeated here.
Step 202b, the electronic device obtains pixel point information of the target part in each time unit, and generates a target image sequence according to the pixel point information.
In the embodiment of the application, the electronic device may acquire the pixel point information of the target portion once per time unit, generate a frame of image from the pixel point information acquired in the current time unit, and so on until the target portion is in a static state, thereby acquiring a multi-frame image, that is, the target image sequence.
Optionally, in an embodiment of the present application, the pixel point information may include at least one of the following: a pixel brightness value corresponding to the target part, pixel distribution information corresponding to the target part, a pixel color value corresponding to the target part, and the like.
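Steps 202a and 202b can be sketched as follows: threshold the per-pixel signal variation between consecutive readouts to locate the moving target portion, then record one frame of pixel point information per time unit until the portion goes static. The array shapes, the threshold value, and the simulated data below are assumptions for illustration only:

```python
# Illustrative sketch of steps 202a/202b: pixels whose signal change
# between consecutive readouts meets a preset threshold form the moving
# target portion; one frame is collected per time unit until no pixel
# still exceeds the threshold (static state).
import numpy as np

def target_mask(prev, curr, threshold):
    """Step 202a: pixels whose signal variation >= the preset threshold."""
    return np.abs(curr.astype(int) - prev.astype(int)) >= threshold

def capture_sequence(readouts, threshold=10):
    """Step 202b: one frame of pixel point information per time unit,
    stopping once the target portion is static."""
    frames = []
    for prev, curr in zip(readouts, readouts[1:]):
        mask = target_mask(prev, curr, threshold)
        if not mask.any():                       # target portion is static
            break
        frames.append(np.where(mask, curr, 0))   # keep moving-region pixels
    return frames

rng = np.random.default_rng(0)
still = rng.integers(0, 50, (8, 8), dtype=np.int64)
moving = still.copy()
moving[2:5, 2:5] += 100                          # simulated mouth motion
seq = capture_sequence([still, moving, moving])  # second diff is zero
print(len(seq))  # 1 frame captured before the portion goes static
```

The frame here retains pixel brightness values at the moving region, one of the kinds of pixel point information listed above; distribution and color information could be recorded analogously.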
Step 203, the electronic device outputs language information corresponding to the target part according to the target image sequence.
It can be understood that the target image sequence is a multi-frame image obtained while the target part is in a motion state, and each frame of image corresponds to one piece of mouth shape information. The electronic equipment can determine the corresponding text content according to the mouth shape information corresponding to each frame of image, so as to determine and output the language information corresponding to the target part according to all the text content corresponding to the multi-frame image.
Alternatively, in the embodiment of the present application, the above step 203 may be specifically implemented by the following steps 203a and 203b.
Step 203a, the electronic device determines motion information of the target part according to the target image sequence.
In this embodiment of the present application, the motion information is used to indicate a motion change condition of the target portion.
In this embodiment of the present application, the motion information indicates a difference condition (i.e., a mouth-shaped variation condition) between each frame of image in the target image sequence, and different motion information corresponds to different mouth-shaped information, and the electronic device may search, according to the motion information of the target portion, a corresponding mouth-shaped information set from a preset mouth-shaped information base, so as to obtain language information corresponding to the target portion.
Step 203b, the electronic device determines a mouth shape information set corresponding to the motion information from a preset mouth shape information base, and performs language identification processing on the mouth shape information set to output language information.
In the embodiment of the application, the electronic device may compare the target image sequence with a preset mouth shape information base, that is, for each frame of image in the target image sequence, search mouth shape information matched with one frame of image from the preset mouth shape information base to obtain a mouth shape information set corresponding to the target image sequence, and then the electronic device may identify the mouth shape information set and translate the mouth shape information set into corresponding language information.
Optionally, in the embodiment of the present application, the electronic device outputs the language information in a preset manner, where the preset manner includes at least one of the following: a text display mode, a voice playing mode and the like.
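The matching in steps 203a and 203b can be sketched as a lookup against a preset mouth shape information base, joining the matched entries into language information output as text. The base contents and the mouth-shape labels below are purely hypothetical placeholders:

```python
# Hedged sketch of step 203: look up each frame's mouth shape
# information in a preset mouth shape information base and join the
# matches into language information. The base and its keys are
# illustrative placeholders, not the patent's actual data.

MOUTH_SHAPE_BASE = {           # hypothetical preset mouth shape base
    "wide_open": "a",
    "rounded":   "o",
    "closed":    "m",
}

def recognize_language(mouth_shape_sequence, output_mode="text"):
    """Match each frame's mouth shape info against the base, then
    output the result as text (voice playback would hand the text
    to a TTS engine instead)."""
    matched = [MOUTH_SHAPE_BASE[s] for s in mouth_shape_sequence
               if s in MOUTH_SHAPE_BASE]     # skip unmatched frames
    language_info = "".join(matched)
    if output_mode == "text":
        return language_info
    raise NotImplementedError("voice playback not sketched here")

print(recognize_language(["closed", "wide_open"]))  # ma
```

A real mouth shape information base would map image-derived features rather than string labels, but the lookup-then-join structure is the same.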
In the embodiment of the application, the electronic equipment can acquire the image sequence corresponding to the target part in real time through the real-sense pixels to determine the corresponding motion information, thereby accurately acquiring the mouth shape information of the user in real time and translating it into the corresponding language information by comparison against the mouth shape information base, improving the success rate and efficiency of lip language identification.
According to the language identification method, the electronic equipment can acquire the target light corresponding to the face of the object to be shot through at least one spectral filter of the camera module, and acquire, according to the target light, a target image sequence of the target part in a motion state through the real-sense pixel sensor, so as to output language information corresponding to the target part according to the target image sequence. Because the camera module of the electronic equipment includes at least one spectral filter and a real-sense pixel sensor, the electronic equipment can screen the light of the face through the at least one spectral filter (that is, acquire only the light of the face part) to reduce the amount of redundant data, so that the real-sense pixel sensor can quickly acquire a corresponding image based on the light of the face part. In addition, the real-sense pixel sensor is used for collecting images of an object in a motion state and is characterized by capturing high-precision dynamic objects in real time, so that an image sequence of the target part in a motion state can be collected through the real-sense pixel sensor; that is, the motion change of the target part can be accurately obtained through the real-sense pixel sensor. Accordingly, the electronic equipment can rapidly and accurately output the corresponding language information according to the image sequence, improving the efficiency and success rate with which the electronic equipment performs information identification on the moving part of the human face.
It should be noted that, in the language identification method provided in the embodiment of the present application, the execution subject may be a language identification device, or a control module in the language identification device for executing the language identification method. In the embodiment of the present application, a language identification device executing the language identification method is taken as an example to describe the language identification device provided in the embodiment of the present application.
Fig. 6 shows a schematic diagram of a possible structure of the language identification apparatus according to the embodiment of the present application. As shown in fig. 6, the language recognition apparatus 60 may include: an acquisition module 61 and an output module 62.
The acquiring module 61 is configured to acquire, through N spectral filters of the M spectral filters, target light corresponding to the face portion of the object to be photographed, where M and N are both positive integers, and N is less than or equal to M; and to acquire, according to the target light, a target image sequence of a target part through the real-sense pixel sensor, where the target part is a part of the face part in a motion state. The output module 62 is configured to output language information corresponding to the target part according to the target image sequence acquired by the acquiring module 61.
In one possible implementation manner, the language identifying apparatus 60 provided in the embodiment of the present application includes: and a receiving module. The receiving module is configured to receive a first input of a user, where the first input is an input of the user to a target spectral range, and a spectrum of the target light is in the target spectral range, before the obtaining module 61 obtains, through N spectral filters in the M spectral filters, a target light corresponding to a face portion of a person to be photographed. The obtaining module 61 is specifically configured to determine, from the M spectral filters, N spectral filters corresponding to the target spectral range in response to the first input received by the receiving module, and perform filtering processing on light rays outside the target spectral range through the N spectral filters, so as to obtain the target light rays.
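The receiving-module and acquiring-module flow described above can be sketched as follows: given a user's target spectral range, select the N of M spectral filters whose passbands lie within it, then discard light outside that range. The filter passbands (in nm) below are invented for illustration:

```python
# Sketch of the first-input flow: from M spectral filters, pick the N
# whose passbands lie inside the user's target spectral range, then
# keep only light within that range. The passband wavelengths are
# assumptions, not values from the patent.

FILTERS = {                      # M = 4 hypothetical filters
    "s1": (400, 500),
    "s2": (500, 600),
    "s3": (600, 700),
    "s4": (700, 800),
}

def select_filters(target_range):
    """Determine the N spectral filters corresponding to the range."""
    lo, hi = target_range
    return [name for name, (f_lo, f_hi) in FILTERS.items()
            if f_lo >= lo and f_hi <= hi]

def filter_light(wavelengths, target_range):
    """Filter out light rays outside the target spectral range."""
    lo, hi = target_range
    return [w for w in wavelengths if lo <= w <= hi]

print(select_filters((500, 700)))                      # ['s2', 's3'], N = 2
print(filter_light([450, 550, 650, 750], (500, 700)))  # [550, 650]
```

The selection shows why N is at most M: only the filters whose passbands fit the user's target range participate in acquiring the target light.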
In a possible implementation manner, the acquiring module 61 is specifically configured to determine, by using the real-sense pixel sensor, a portion corresponding to light rays satisfying a preset condition in the target light as the target portion, where the preset condition is that the signal variation of the light is greater than or equal to a preset threshold; and to acquire pixel point information of the target part in each time unit, and generate the target image sequence according to the pixel point information.
In one possible implementation manner, the output module 62 is specifically configured to determine, according to the target image sequence, motion information of the target portion, where the motion information is used to indicate a motion change situation of the target portion; and determining a mouth shape information set corresponding to the motion information from a preset mouth shape information base, and carrying out language identification processing on the mouth shape information set to output language information.
The embodiment of the application provides a language identification device. Because the camera module of the electronic equipment includes at least one spectral filter and a real-sense pixel sensor, the electronic equipment can screen the face light through the at least one spectral filter (that is, acquire only the light of the face part) to reduce the amount of redundant data, so that the real-sense pixel sensor can quickly acquire a corresponding image based on the light of the face part. In addition, the real-sense pixel sensor is used for collecting images of an object in a motion state and is characterized by capturing high-precision dynamic objects in real time, so that an image sequence of the target part in a motion state can be collected through the real-sense pixel sensor; that is, the motion change of the target part can be accurately obtained through the real-sense pixel sensor. Accordingly, the electronic equipment can rapidly and accurately output the corresponding language information according to the image sequence, improving the efficiency and success rate with which the electronic equipment performs information identification on the moving part of the human face.
The language identification device in the embodiment of the application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a cell phone, tablet computer, notebook computer, palm computer, vehicle-mounted electronic device, wearable device, ultra-mobile personal computer (ultra-mobile personal computer, UMPC), netbook or personal digital assistant (personal digital assistant, PDA), etc., and the non-mobile electronic device may be a server, network attached storage (Network Attached Storage, NAS), personal computer (personal computer, PC), television (TV), teller machine or self-service machine, etc., and the embodiments of the present application are not limited in particular.
The language identification apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an ios operating system, or other possible operating systems, which are not specifically limited in the embodiments of the present application.
The language identification device provided in the embodiment of the present application can implement each process implemented by the above method embodiment, and can achieve the same technical effect, so that repetition is avoided, and details are not repeated here.
Optionally, as shown in fig. 7, the embodiment of the present application further provides an electronic device 90, including a processor 91, a memory 92, and a program or an instruction stored in the memory 92 and capable of running on the processor 91, where the program or the instruction implements each process of the embodiment of the method when executed by the processor 91, and the process can achieve the same technical effect, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 8 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 100 includes, but is not limited to: radio frequency unit 101, network module 102, audio output unit 103, input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, and processor 110.
Those skilled in the art will appreciate that the electronic device 100 may further include a power source (e.g., a battery) for powering the various components, and that the power source may be logically coupled to the processor 110 via a power management system to perform functions such as managing charging, discharging, and power consumption via the power management system. The electronic device structure shown in fig. 8 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than shown, or may combine certain components, or may be arranged in different components, which are not described in detail herein.
In this embodiment of the application, the camera module of the electronic device includes M spectral filters and a real-sense pixel sensor.
The processor 110 is configured to obtain, through N spectral filters of the M spectral filters, target light corresponding to the face portion of the object to be photographed, where M and N are positive integers, and N is less than or equal to M; to obtain, according to the target light, a target image sequence of a target part through the real-sense pixel sensor, where the target part is a part of the face part in a motion state; and to output language information corresponding to the target part according to the target image sequence.
The embodiment of the application provides electronic equipment. Because the camera module of the electronic equipment includes at least one spectral filter and a real-sense pixel sensor, the electronic equipment can screen the face light through the at least one spectral filter (that is, acquire only the light of the face part) to reduce the amount of redundant data, so that the real-sense pixel sensor can quickly acquire a corresponding image based on the light of the face part. In addition, the real-sense pixel sensor is used for collecting images of an object in a motion state and is characterized by capturing high-precision dynamic objects in real time, so that an image sequence of the target part in a motion state can be collected through the real-sense pixel sensor; that is, the motion change of the target part can be accurately obtained through the real-sense pixel sensor. Accordingly, the electronic equipment can rapidly and accurately output the corresponding language information according to the image sequence, improving the efficiency and success rate with which the electronic equipment performs information identification on the moving part of the human face.
Optionally, in the embodiment of the present application, the user input unit 107 is configured to receive a first input of a user before the processor 110 obtains, through N spectral filters of the M spectral filters, the target light corresponding to the face portion of the object to be photographed, where the first input is an input of the user to a target spectral range, and the spectrum of the target light is within the target spectral range. The processor 110 is specifically configured to determine, in response to the first input, N spectral filters corresponding to the target spectral range from the M spectral filters, and perform filtering processing on light rays outside the target spectral range through the N spectral filters to obtain the target light.
Optionally, in the embodiment of the present application, the processor 110 is specifically configured to determine, by using the real-sense pixel sensor, a portion corresponding to light rays satisfying a preset condition in the target light as the target portion, where the preset condition is that the signal variation of the light is greater than or equal to a preset threshold; and to acquire pixel point information of the target portion in each time unit, and generate the target image sequence according to the pixel point information.
Optionally, in the embodiment of the present application, the processor 110 is specifically configured to determine, according to the target image sequence, motion information of the target portion, where the motion information is used to indicate a motion change condition of the target portion; and determining a mouth shape information set corresponding to the motion information from a preset mouth shape information base, and carrying out language identification processing on the mouth shape information set to output language information.
The electronic device provided in the embodiment of the present application can implement each process implemented by the above method embodiment, and can achieve the same technical effects, so that repetition is avoided, and details are not repeated here.
It should be appreciated that in embodiments of the present application, the input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071 is also referred to as a touch screen. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein. Memory 109 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 110 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored, and when the program or the instruction is executed by a processor, the program or the instruction implement each process of the embodiment of the method, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
Wherein the processor is a processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
The embodiment of the application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled with the processor, and the processor is used for running a program or an instruction, implementing each process of the above method embodiment, and achieving the same technical effect, so as to avoid repetition, and not repeated here.
It should be understood that the chips referred to in the embodiments of the present application may also be referred to as system-on-chip chips, chip systems, or system-on-chip chips, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.

Claims (10)

1. The language recognition method is characterized by being applied to electronic equipment, wherein a camera module of the electronic equipment comprises M spectral filters and a real-sense pixel sensor, and the method comprises the following steps:
obtaining target light corresponding to a human face part of an object to be shot through N spectral filters in the M spectral filters, wherein M and N are positive integers, and N is smaller than or equal to M;
determining a part corresponding to the light rays meeting the preset conditions in the target light rays as a target part by the real-sense pixel sensor, and acquiring a target image sequence of the target part; the preset condition is that the signal variation of the light is larger than or equal to a preset threshold value, and the target part is a part in a motion state in the face part;
and outputting language information corresponding to the target part according to the target image sequence.
2. The method according to claim 1, wherein before the target light corresponding to the face of the person to be photographed is obtained through N spectral filters of the M spectral filters, the method further comprises:
receiving a first input of a user, wherein the first input is input of the user to a target spectrum range, and the spectrum of the target light is in the target spectrum range;
The obtaining, by the N spectral filters of the M spectral filters, a target light corresponding to a face of a person to be photographed includes:
and responding to the first input, determining N spectral filters corresponding to the target spectral range from the M spectral filters, and filtering light rays outside the target spectral range through the N spectral filters to acquire the target light rays.
3. The method of claim 1, wherein the acquiring the sequence of target images of the target site comprises:
and acquiring pixel point information of the target part in each time unit, and generating the target image sequence according to the pixel point information.
4. A method according to any one of claims 1 to 3, wherein outputting language information corresponding to the target site according to the target image sequence includes:
determining motion information of the target part according to the target image sequence, wherein the motion information is used for indicating motion change conditions of the target part;
determining a mouth shape information set corresponding to the motion information from a preset mouth shape information base, and carrying out language identification processing on the mouth shape information set so as to output the language information.
5. A language recognition device, characterized in that a camera module of the language recognition device comprises M spectral filters and a real-sense pixel sensor, and the language recognition device comprises: an acquisition module and an output module;
the acquisition module is used for acquiring target light corresponding to the face part of the person to be shot through N spectral filters in the M spectral filters, M and N are positive integers, and N is smaller than or equal to M; determining a part corresponding to the light rays meeting the preset conditions in the target light rays as a target part through the real sensing pixel sensor, and acquiring a target image sequence of the target part; the preset condition is that the signal variation of the light is larger than or equal to a preset threshold value, and the target part is a part in a motion state in the face part;
the output module is used for outputting language information corresponding to the target part according to the target image sequence acquired by the acquisition module.
6. The apparatus of claim 5, wherein the language identification means comprises: a receiving module;
the receiving module is configured to receive a first input of a user before the obtaining module obtains, through N spectral filters in the M spectral filters, a target light corresponding to a face portion of a person to be photographed, where the first input is an input of the user to a target spectral range, and a spectrum of the target light is in the target spectral range;
The acquisition module is specifically configured to determine, from the M spectral filters, the N spectral filters corresponding to the target spectral range in response to the first input received by the receiving module, and perform filtering processing on light rays outside the target spectral range through the N spectral filters, so as to acquire the target light rays.
7. The apparatus according to claim 5, wherein the obtaining module is specifically configured to obtain pixel point information of the target portion at each time unit, and generate the target image sequence according to the pixel point information.
8. The apparatus according to any one of claims 5 to 7, wherein the output module is configured to determine motion information of the target site, in particular according to the target image sequence, the motion information being configured to indicate a motion change situation of the target site; and determining a mouth shape information set corresponding to the motion information from a preset mouth shape information base, and carrying out language identification processing on the mouth shape information set so as to output the language information.
9. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the language identification method of any one of claims 1 to 4.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores thereon a program or instructions which, when executed by a processor, implement the steps of the language identification method according to any one of claims 1 to 4.
CN202011563773.2A 2020-12-25 2020-12-25 Language identification method and device and electronic equipment Active CN112669817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011563773.2A CN112669817B (en) 2020-12-25 2020-12-25 Language identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011563773.2A CN112669817B (en) 2020-12-25 2020-12-25 Language identification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112669817A CN112669817A (en) 2021-04-16
CN112669817B true CN112669817B (en) 2023-08-08

Family

ID=75410079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563773.2A Active CN112669817B (en) 2020-12-25 2020-12-25 Language identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112669817B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107045385A (en) * 2016-08-01 2017-08-15 深圳奥比中光科技有限公司 Lip reading interaction method and lip reading interaction device based on depth images
CN106774856B (en) * 2016-08-01 2019-08-30 深圳奥比中光科技有限公司 Interaction method and interaction device based on lip reading
US10785422B2 (en) * 2018-05-29 2020-09-22 Microsoft Technology Licensing, Llc Face recognition using depth and multi-spectral camera
CN111144169A (en) * 2018-11-02 2020-05-12 深圳比亚迪微电子有限公司 Face recognition method and device and electronic equipment
CN111611977B (en) * 2020-06-05 2021-10-15 吉林求是光谱数据科技有限公司 Face recognition monitoring system and recognition method based on spectrum and multiband fusion

Similar Documents

Publication Publication Date Title
CN112672054B (en) Focusing method and device and electronic equipment
CN109671106B (en) Image processing method, device and equipment
JP5744437B2 (en) TRACKING DEVICE, TRACKING METHOD, AND PROGRAM
CN105009568B System, method, and non-transitory machine-readable medium for processing visible spectrum images and infrared images
CN104519328B (en) Image processing equipment, image capture device and image processing method
CN102647941A (en) Method and system for carrying out photoplethysmography
US8760561B2 (en) Image capture for spectral profiling of objects in a scene
CN112822412B (en) Exposure method, exposure device, electronic equipment and storage medium
CN107770521A Camera shooting mode adjustment method
CN113554578B (en) Method, device, terminal and storage medium for determining spectral image
CN105578067A (en) Image generation method and device, terminal equipment
CN112651911B (en) High dynamic range imaging generation method based on polarized image
CN108833803A (en) Imaging method, device and electronic equipment
CN112672021B (en) Language identification method and device and electronic equipment
JP2020537456A Image sensor, imaging device, and image information processing method
CN113866782A (en) Image processing method and device and electronic equipment
CN112669817B (en) Language identification method and device and electronic equipment
CN115701128A (en) Image processing method and related equipment
CN107194340B Method and system for finding climbers in distress using computer vision
US20230408889A1 (en) Imaging apparatus and lens apparatus
CN114125319A (en) Image sensor, camera module, image processing method and device and electronic equipment
US20070147819A1 (en) System and method for indicating sharpness of images
CN116849624B (en) 4 CMOS-based image sensor fluorescence imaging method and system
CN108076284A Camera shooting mode adjustment system
CN114125296B (en) Image processing method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant