CN109145878B - Image extraction method and device - Google Patents


Info

Publication number: CN109145878B
Application number: CN201811159896.2A
Authority: CN (China)
Prior art keywords: video, video frame, face, video call, preset
Legal status: Active (an assumption by Google, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN109145878A
Inventor: 吴珂
Current and original assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201811159896.2A
Publication of CN109145878A; application granted; publication of CN109145878B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; scene-specific elements
    • G06V20/40 — Scenes; scene-specific elements in video content
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 — Television systems
    • H04N7/14 — Systems for two-way working

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image extraction method and device. The method includes: during a video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result; determining, according to the recognition result, whether the video frame meets a preset extraction condition; and when the video frame meets the preset extraction condition, extracting the video frame as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures and without interfering with the user's video call. This helps capture the user in a natural, relaxed state and thus improves image extraction efficiency.

Description

Image extraction method and device
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to an image extraction method and apparatus.
Background
Generally, a video call is a communication mode in which two or more terminal devices transmit voice and video to each other in real time over the Internet or the mobile Internet. When a user makes a video call with friends, many interesting interactions often occur. In the related art, the user can capture a moment considered a highlight only by manually invoking screen-capture software on the video call interface. However, manual capture often misses the highlight moment and adds operational complexity, which disturbs the user's video call and results in low image extraction efficiency and poor image quality.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an image extraction method and apparatus.
According to a first aspect of the embodiments of the present disclosure, there is provided an image extraction method, including:
during the video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result;
determining, according to the recognition result, whether the video frame meets a preset extraction condition;
and when the video frame meets the preset extraction condition, extracting the video frame as a picture.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at a designated position in the video frame;
the ratio of the area of the face to the area of the video frame is greater than a first threshold;
the shooting angle of the face relative to the lens meets an angle condition;
the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold;
a specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing;
a target object appears in the video frame.
In one possible implementation, the method further includes:
providing options of extraction conditions;
determining the selected extraction condition as the preset extraction condition.
In one possible implementation, the method further includes:
sending an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
and when authorization information returned by the video call peer in response to the authorization request is received, performing image recognition on video frames in the video generated by the video call peer.
According to a second aspect of the embodiments of the present disclosure, there is provided an image extraction apparatus including:
a recognition module, configured to perform image recognition on a video frame in the video generated by a video call during the video call to obtain a recognition result;
a judging module, configured to determine, according to the recognition result, whether the video frame meets a preset extraction condition;
and an extraction module, configured to extract the video frame as a picture when the video frame meets the preset extraction condition.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at a designated position in the video frame;
the ratio of the area of the face to the area of the video frame is greater than a first threshold;
the shooting angle of the face relative to the lens meets an angle condition;
the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold;
a specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing;
a target object appears in the video frame.
In one possible implementation, the apparatus further includes:
a display module, configured to provide options of extraction conditions;
and a determining module, configured to determine the selected extraction condition as the preset extraction condition.
In one possible implementation, the apparatus further includes:
a sending module, configured to send an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
and a receiving module, configured to perform image recognition on video frames in the video generated by the video call peer when authorization information returned by the video call peer in response to the authorization request is received.
According to a third aspect of the embodiments of the present disclosure, there is provided an image extraction apparatus including: a processor;
and a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions which, when executed by a processor, enable the processor to perform the above-described method.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: during a video call, image recognition is performed on a video frame in the video generated by the video call to obtain a recognition result; whether the video frame meets a preset extraction condition is determined according to the recognition result; and when the video frame meets the preset extraction condition, the video frame is extracted as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures. This rarely interferes with the user's video call, helps capture the user in a natural, relaxed state, and effectively improves image extraction efficiency and quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. The method can be applied to terminal equipment such as desktop computers, notebook computers, tablet computers, mobile phones and the like, and is not limited herein. As shown in fig. 1, the method may include:
Step 100: during the video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result;
Step 101: determining, according to the recognition result, whether the video frame meets a preset extraction condition;
Step 102: when the video frame meets the preset extraction condition, extracting the video frame as a picture.
In this example, generally speaking, image recognition may be described as a technique of processing, analyzing, and understanding images with a computer in order to recognize targets and objects of various different patterns.
In one possible implementation, the image recognition process may include: training different sample sets according to different extraction conditions to obtain a classifier (for example, the classifier may be generated based on a neural network); inputting a video frame into the classifier, whose output is the recognition result; and extracting the video frame as a picture when the recognition result meets the extraction condition. It should be noted that a person skilled in the art may also select another suitable recognition method (for example, a clustering algorithm) to perform image recognition on the video frame as needed; the present disclosure does not limit the specific image recognition manner.
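As an illustrative sketch only (the function and label names below are assumptions, not part of the disclosure), the recognize-judge-extract loop can be expressed in Python with the classifier stubbed out:

```python
# Minimal sketch of the recognize-judge-extract loop described above.
# `classify` stands in for a trained classifier (e.g. one generated from a
# neural network); here it is stubbed to return a fixed label and confidence.

def classify(frame):
    # A real implementation would run the frame through the trained model.
    return {"label": "smiling_face", "confidence": 0.92}

def meets_condition(result, condition):
    # Judge the recognition result against the preset extraction condition.
    return (result["label"] == condition["label"]
            and result["confidence"] >= condition["min_confidence"])

def extract_pictures(frames, condition):
    pictures = []
    for frame in frames:
        result = classify(frame)                # step 1: image recognition
        if meets_condition(result, condition):  # step 2: judge the result
            pictures.append(frame)              # step 3: extract as picture
    return pictures
```

Swapping `classify` for a clustering-based recognizer, as the paragraph notes, would leave the judge-and-extract steps unchanged.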
Video technology can generally be described as the technology of capturing, recording, processing, storing, transmitting, and reproducing moving images as electrical signals. A video comprises a series of video frames.
As an example of this embodiment, when detecting a video call, the terminal device may obtain all or some of the video frames in the video generated during the call (for example, the odd- or even-numbered frames; this is not limited herein). For each obtained video frame, the terminal device performs image recognition to obtain a recognition result. When the terminal device determines, according to the recognition result, that the video frame meets the preset extraction condition, it extracts the video frame as a picture; multiple extracted pictures may also be assembled into a picture set. The terminal device may store the picture or picture set, send it to other terminals, or share it through an internet platform.
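The frame-sampling step mentioned above (taking, say, only the odd- or even-numbered frames) can be sketched as follows; the function name is an illustrative assumption:

```python
def sample_frames(frames, step=2, offset=0):
    # Take every `step`-th frame starting at `offset`. With frames counted
    # from 1, offset=0 yields the odd-numbered frames and offset=1 the
    # even-numbered ones.
    return frames[offset::step]
```

Recognizing only a sampled subset halves (or better) the recognition workload at the cost of possibly missing a frame between samples.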
As an example of this embodiment, the video generated by the video call may include video captured at any one or more ends of the video call. For example, if terminals A, B, and C participate in a video call, the video generated during the call may include video shot by any one or more of terminals A, B, and C.
During the video call, image recognition is performed on a video frame in the video generated by the video call to obtain a recognition result; whether the video frame meets a preset extraction condition is determined according to the recognition result; and when the video frame meets the preset extraction condition, the video frame is extracted as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures. This rarely interferes with the user's video call, helps capture the user in a natural, relaxed state, and effectively improves image extraction efficiency and quality.
As an example of this embodiment, the image recognition may include face recognition, and the preset extraction condition includes any one or more of the following: a human face appears in the video frame; the face in the video frame is at a designated position in the video frame; the ratio of the area of the face to the area of the video frame is greater than a first threshold; the shooting angle of the face relative to the lens meets an angle condition; the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold; a specified expression appears in the video frame.
Face recognition may be described as a process of extracting facial feature information from a face in an image and performing recognition according to that feature information to obtain a recognition result.
For example, the terminal device determines to perform face recognition on the video frame when it detects that the extraction condition is any one or more of the following: a human face appears in the video frame; the face is at a designated position in the video frame; the ratio of the area of the face to the area of the video frame is greater than the first threshold; the shooting angle of the face relative to the lens meets the angle condition; the difference between the brightness of the face region and the brightness of the background region is smaller than the second threshold; or a specified expression appears in the video frame.
For example, if the extraction condition is that a face appears in the video frame, the contour features of the video frame may be extracted and compared for similarity with pre-stored face contour features. If the similarity obtained by the comparison is greater than a first similarity threshold, it may be determined that the video frame meets the extraction condition, and the video frame is extracted as a picture. In this way, video frames in which a face appears during the video call can be extracted automatically according to the user's settings.
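A hedged sketch of this similarity comparison, using cosine similarity over contour feature vectors (the feature representation and the threshold value are assumptions for illustration):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def face_appears(frame_contour, stored_contour, first_similarity_threshold=0.9):
    # The frame meets the condition when the contour similarity exceeds
    # the first similarity threshold.
    return cosine_similarity(frame_contour, stored_contour) > first_similarity_threshold
```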
For example, if the extraction condition is that the face is at a designated position in the video frame (for example, the middle of the video frame), it may first be determined whether a face appears in the video frame. When a face appears, the coordinate range it occupies is determined; if that coordinate range falls within a preset coordinate range, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the face is at a suitable position during the video call can be extracted automatically according to the user's settings.
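A minimal sketch of the designated-position check, assuming the recognizer reports a face bounding box in pixels (the central-band fractions are illustrative assumptions):

```python
def face_centered(face_box, frame_size, region=(0.3, 0.7)):
    # face_box: (x, y, w, h) in pixels; frame_size: (width, height).
    # The face centre must fall inside the central band of the frame
    # given by `region` (as fractions of frame width/height).
    x, y, w, h = face_box
    fw, fh = frame_size
    cx, cy = (x + w / 2) / fw, (y + h / 2) / fh
    return region[0] <= cx <= region[1] and region[0] <= cy <= region[1]
```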
For example, if the extraction condition is that the ratio of the area of the face to the area of the video frame is greater than the first threshold, it may first be determined whether a face appears in the video frame. When a face appears, the ratio of the area of the face region to the area of the video frame is determined; when that ratio is greater than the preset first threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the face has a suitable size during the video call can be extracted automatically according to the user's settings.
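The area-ratio test can be sketched as follows, again assuming a pixel bounding box; the default value of the first threshold is an assumption:

```python
def face_large_enough(face_box, frame_size, first_threshold=0.1):
    # The ratio of the face area to the frame area must exceed the
    # first threshold for the frame to meet the extraction condition.
    _, _, w, h = face_box
    frame_area = frame_size[0] * frame_size[1]
    return (w * h) / frame_area > first_threshold
```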
For example, if the extraction condition is that the shooting angle of the face meets the angle condition, it may first be determined whether a face appears in the video frame. When a face appears, the positions of the facial features may be determined, and the shooting angle of the face in the video frame (for example, 35 degrees) may be derived from those positions. If the shooting angle satisfies the angle condition (for example, 30 to 45 degrees), the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the shooting angle of the face matches the preset angle during the video call can be extracted automatically according to the user's settings.
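A crude illustrative heuristic for estimating the shooting angle from facial-feature positions (a real system would use a full head-pose model; the linear mapping below is an assumption):

```python
def estimate_yaw(left_eye_x, right_eye_x, nose_x):
    # Heuristic: when the head turns, the nose drifts away from the
    # midpoint between the eyes; map that offset to a rough angle in degrees.
    midpoint = (left_eye_x + right_eye_x) / 2
    eye_span = right_eye_x - left_eye_x
    return 90.0 * (nose_x - midpoint) / eye_span

def angle_condition_met(angle, low=30.0, high=45.0):
    # The angle condition from the text: e.g. 30 to 45 degrees.
    return low <= abs(angle) <= high
```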
For example, if the extraction condition is that the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than the second threshold, it may first be determined whether a face appears in the video frame. When a face appears, the difference (or, alternatively, the ratio) between the brightness of the face region and the brightness of the background region is computed; when that difference is smaller than the second threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the brightness of the face and the background is well balanced during the video call can be extracted automatically according to the user's settings.
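A sketch of the brightness comparison, assuming the face and background regions are available as flat lists of luminance values (the default threshold is illustrative):

```python
def brightness_contrast_ok(face_pixels, background_pixels, second_threshold=40.0):
    # Compare the mean luminance of the face region with that of the
    # background region; the difference must stay below the second threshold.
    face_mean = sum(face_pixels) / len(face_pixels)
    background_mean = sum(background_pixels) / len(background_pixels)
    return abs(face_mean - background_mean) < second_threshold
```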
For example, if the extraction condition is that a smiling expression appears in the video frame, the feature information of the face may be extracted when a face appears in the video frame and compared for similarity with preset smiling-face feature information. When the similarity between the two is greater than a second similarity threshold, a smiling expression is determined to appear in the video frame, and the video frame is extracted as a picture. In this way, video frames containing the specified expression during the video call can be extracted automatically according to the user's settings.
As an example of this embodiment, the image recognition may include target object recognition, and the preset extraction condition includes one or more of the following: the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing; a target object appears in the video frame.
Target object recognition may be described as the process of determining whether an object in a video frame belongs to a specified target (for example, whether the video frame contains a specified kind of animal or plant, a specified natural landscape, a person with a specified appearance, a specified color combination, and so on).
For example, when the terminal device detects that the extraction condition is that the clothing color and the background color in the video frame meet the preset matching condition, or that a target object appears in the video frame, it determines to perform target object recognition on the video frame.
For example, if the extraction condition is that the clothing color and the background color in the video frame meet the preset matching condition, and the preset matching condition includes a preset range of color codes, it may be determined during target object recognition whether clothing appears in the video frame; when clothing appears, the color codes of the clothing color and the background color are determined. When that pair of color codes falls within the preset range, the clothing color and the background color are determined to meet the preset matching condition, and the video frame is extracted as a picture. In this way, video frames in which the clothing matches the background color during the video call can be extracted automatically according to the user's settings.
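A sketch of the color-matching check, modelling the preset matching condition as a set of color-code pairs (the palette below is invented purely for illustration):

```python
# Assumed palette of (clothing, background) colour-code pairs considered
# to match; a real implementation would ship a curated table.
MATCHING_PAIRS = {
    ("navy", "beige"),
    ("white", "forest-green"),
    ("black", "light-grey"),
}

def clothing_matches_background(clothing_color, background_color):
    # The frame meets the condition when the detected pair of colour codes
    # falls within the preset matching table.
    return (clothing_color, background_color) in MATCHING_PAIRS
```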
For example, if the extraction condition is that a target object appears in the video frame and the target object is a designated person (e.g., a star or a relative of the user), facial feature information of the designated person may be preset. When a face is determined to appear in the video frame, the facial feature information of that face is extracted; if the video frame contains facial feature information whose similarity to that of the designated person is greater than a third similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the designated person during the video call can be extracted automatically according to the user's settings.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a person with specified appearance characteristics (e.g., a woman, a man, a clown, or the like), the specified appearance characteristic information may be preset. When a face is determined to appear in the video frame, the appearance characteristic information of one or more faces in the frame is extracted; if the video frame contains appearance characteristic information whose similarity to the specified information is greater than a fourth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing a person with the specified appearance during the video call can be extracted automatically according to the user's settings. The terminal device may also perform sample training on a group of pictures, predetermined by the user, showing the facial features the user specifies, so that the resulting specified facial feature information better matches the user's requirements.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a specified landscape (e.g., a seaside or a mountain), image feature information of the specified landscape may be preset and the image feature information of the video frame determined. If the similarity between the two is greater than a fifth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the specified landscape during the video call can be extracted automatically according to the user's settings.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a specified animal (e.g., a panda or an elephant), image feature information of the specified animal may be preset and the image feature information of the video frame determined. If the similarity between the two is greater than a sixth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the specified animal during the video call can be extracted automatically according to the user's settings.
FIG. 2 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. As shown in fig. 2, the difference between fig. 2 and fig. 1 is that the method may further include:
Step 200: providing options of extraction conditions.
Step 201: determining the selected extraction condition as the preset extraction condition.
For example, the terminal device may display a selection interface before recognizing the captured video frames (or before the video call; this is not limited herein). The selection interface may include multiple options for selecting extraction conditions (e.g., smiling-face recognition, beach recognition, child recognition). When the terminal detects one or more effective selection operations (e.g., a click or slide operation) on the selection interface, it sets the extraction conditions corresponding to those operations as the preset extraction conditions. In this way, pictures meeting the user's requirements can be extracted from the video according to the extraction conditions selected by the user, flexibly satisfying different user needs. In one possible implementation, a default extraction condition may also be preset; this is not limited herein.
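The selection logic can be sketched as follows; the option names and the default condition are assumptions for illustration:

```python
# Assumed set of extraction-condition options shown on the selection interface.
AVAILABLE_OPTIONS = {"smiling face", "beach", "child", "face centred"}

def set_extraction_conditions(selected, default=("smiling face",)):
    # Keep only valid selections; fall back to a preset default condition
    # when the user selects nothing (or only unknown options).
    chosen = {s for s in selected if s in AVAILABLE_OPTIONS}
    return chosen if chosen else set(default)
```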
FIG. 3 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. As shown in fig. 3, the difference between fig. 3 and fig. 1 is that the method may further include:
Step 300: sending an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
Step 301: when authorization information returned by the video call peer in response to the authorization request is received, performing image recognition on video frames in the video generated by the video call peer.
For example, before recognizing video frames of the video call peer, the terminal device may send an authorization request to the peer, where the authorization request is used to request recognition of video frames in the video generated by the peer. On receiving the authorization request, the peer may present an option to allow or deny the recognition, and may return authorization information to the terminal device when the option allowing recognition is triggered. When the terminal device receives the authorization information returned by the peer in response to the authorization request, it may acquire the video generated by the peer, perform image recognition on its video frames, and extract the frames judged to meet the preset condition as pictures according to the recognition result. Because the terminal device must obtain the peer's authorization before recognizing the peer's video, it cannot recognize that video without authorization; this effectively prevents the peer's video from being captured arbitrarily and helps protect the privacy of the user at the peer end.
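A toy model of the authorization handshake (the class and method names are illustrative assumptions, not the disclosure's actual protocol):

```python
class PeerTerminal:
    # Models the video call peer, which answers authorization requests.
    def __init__(self, user_allows_recognition):
        # In a real system the peer would show an allow/deny prompt to its
        # user; here the user's choice is modelled as a flag.
        self.user_allows_recognition = user_allows_recognition

    def handle_authorization_request(self):
        return "authorized" if self.user_allows_recognition else "denied"

def try_recognize_peer_video(peer):
    # Recognition of the peer's video frames proceeds only after the peer
    # returns authorization information.
    response = peer.handle_authorization_request()
    if response == "authorized":
        return "recognizing peer video frames"
    return "recognition not permitted"
```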
The authorization request may be sent to the video call opposite end before the video frames of the video call local end or the video call opposite end are identified, or after the video frames of the video call local end have been identified.
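The authorization exchange described above can be sketched as follows. This is a minimal illustration only: the patent does not specify message formats or an API, so the message names and the `VideoCallPeer` class are hypothetical.

```python
# Hypothetical message types for the authorization handshake; the patent
# does not define a wire format, so these names are illustrative only.
AUTH_REQUEST = "auth_request"
AUTH_GRANTED = "auth_granted"
AUTH_DENIED = "auth_denied"

class VideoCallPeer:
    """Simulated video call opposite end that answers authorization requests."""
    def __init__(self, allow_recognition):
        # Whether the user at the opposite end triggered the "allow" option.
        self.allow_recognition = allow_recognition

    def handle(self, message):
        if message == AUTH_REQUEST:
            return AUTH_GRANTED if self.allow_recognition else AUTH_DENIED
        raise ValueError("unknown message: %s" % message)

def request_recognition(peer):
    """Send the authorization request; return True only if the peer grants it."""
    reply = peer.handle(AUTH_REQUEST)
    return reply == AUTH_GRANTED
```

Only when `request_recognition` returns `True` would the terminal device proceed to image recognition on the opposite end's video frames.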
In an application example, the following description is made by taking a terminal device as a mobile phone as an example.
Before the video call, a user can enable video frame extraction for videos shot by the user or by the other party, and can select the extraction conditions for automatic snapshot during the video call according to factors such as the video call environment. For example: a human face in the video frame is at a specified position in the video frame; the proportion of the area of the face to the area of the video frame is larger than a first threshold; the shooting angle of the face relative to the lens meets an angle condition; the difference between the brightness of the face region and the brightness of the background region excluding the face region is smaller than a second threshold; a specified expression appears in the video frame; the clothing color and the background color in the video frame meet a preset matching condition; or a target object appears in the video frame (e.g., a user-specified person, a user-specified landscape such as a beach or a mountain, or a user-specified animal or thing).
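A few of the face-related extraction conditions above can be sketched as a simple predicate. This is an illustrative sketch only: the bounding-box representation, the centering tolerance, and the threshold values are assumptions, since the patent leaves the thresholds ("first threshold", etc.) unspecified.

```python
def face_meets_conditions(face_box, frame_size, smile_score,
                          area_ratio_threshold=0.01, smile_threshold=0.8):
    """Return True if a detected face satisfies the example extraction conditions.

    face_box    -- (x, y, w, h) bounding box of the face, in pixels
    frame_size  -- (width, height) of the video frame
    smile_score -- classifier confidence in [0, 1] that the expression is a smile
    The threshold values are illustrative, not taken from the patent.
    """
    x, y, w, h = face_box
    fw, fh = frame_size

    # Condition: the face is roughly centered in the frame
    # (within 10% of the frame size of the frame center, an assumed tolerance).
    face_cx, face_cy = x + w / 2, y + h / 2
    centered = (abs(face_cx - fw / 2) < fw * 0.1 and
                abs(face_cy - fh / 2) < fh * 0.1)

    # Condition: the face area exceeds a share of the frame area ("first threshold").
    area_ok = (w * h) / (fw * fh) > area_ratio_threshold

    # Condition: the specified expression (a smile here) appears.
    expression_ok = smile_score >= smile_threshold

    return centered and area_ok and expression_ok
```

The face box and smile score would come from whatever face-detection and expression-recognition models the terminal device uses; they are treated as given here.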
During the video call, the mobile phone can automatically capture the video frames meeting the extraction conditions selected by the user (that is, extract those video frames as pictures) and store the resulting pictures in the album.
For example, suppose a user makes a video call with a friend indoors through a mobile phone and sets the extraction conditions to: the face is in the middle of the picture, the face shows a smiling expression, and the shooting angle of the face is 30 degrees. During the video call, the mobile phone can extract any video frame in which the face is in the middle of the picture, shows a smiling expression, and is shot at a 30-degree angle as a picture, and store the picture in the album.
For another example, when the user travels by the seaside and makes a video call with a friend through the mobile phone, the user may set the extraction condition to the appearance of beach scenery in the video. During the video call, the mobile phone can extract the video frames in which beach scenery appears as pictures and store the pictures in the album.
In this way, beautiful backgrounds or interesting scenes appearing in a video call can be captured automatically, efficiently, and naturally according to the user's settings. No manual snapshot is required, the user's video call is barely affected, the user's snapshot needs are met flexibly, and the user is left with a good memory.
Fig. 4 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. As shown in fig. 4, the apparatus may include:
the identification module 41 is configured to perform image identification on a video frame in a video generated by a video call during the video call, so as to obtain an identification result.
And the judging module 42 is configured to judge whether the video frame meets a preset extraction condition according to the identification result.
An extracting module 43, configured to extract the video frame as a picture when the video frame meets the preset extracting condition.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a face appears in the video frame.
The face in the video frame is at a designated position in the video frame.
The proportion of the area of the face to the area of the video frame is larger than a first threshold value.
The shooting angle of the lens of the face meets the angle condition.
The difference between the brightness of the face region and the brightness of the background region excluding the face region is smaller than a second threshold value.
A specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame accord with preset matching conditions, wherein the target object comprises clothing.
A target object appears in the video frame.
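One possible form of the preset matching condition between clothing color and background color can be sketched as follows. The actual matching rule is left open by the patent; comparing hue distance on the color wheel is just one illustrative choice, and the `max_hue_gap` value is an assumption.

```python
import colorsys

def colors_match(clothing_rgb, background_rgb, max_hue_gap=60.0):
    """Illustrative matching condition: clothing and background hues lie
    within `max_hue_gap` degrees of each other on the color wheel.
    The patent does not fix the matching rule; this is one example.
    """
    def hue_degrees(rgb):
        r, g, b = (c / 255.0 for c in rgb)
        h, _, _ = colorsys.rgb_to_hls(r, g, b)
        return h * 360.0

    gap = abs(hue_degrees(clothing_rgb) - hue_degrees(background_rgb))
    gap = min(gap, 360.0 - gap)  # wrap around the color wheel
    return gap <= max_hue_gap
```

The average clothing and background colors would first be estimated from the target-object recognition result (e.g., mean pixel color inside and outside the detected clothing region).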
The method performs image recognition on video frames in the video generated by a video call during the call to obtain a recognition result, judges whether a video frame meets a preset extraction condition according to the recognition result, and extracts the video frame as a picture when it meets the preset extraction condition. In this way, video frames meeting the preset extraction conditions can be automatically extracted as pictures during the video call without the user manually capturing video pictures. The user's video call is not disturbed, the user's natural and relaxed state can be captured, and image extraction efficiency is improved.
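The recognize-judge-extract flow summarized above can be sketched as a simple loop. The `recognize` and `meets_condition` callables are placeholders for whatever image-recognition model and preset extraction condition are configured.

```python
def extract_pictures(frames, recognize, meets_condition):
    """Minimal sketch of the recognize -> judge -> extract pipeline.

    frames          -- iterable of video frames from the call
    recognize       -- function mapping a frame to a recognition result
    meets_condition -- predicate on the recognition result (the preset
                       extraction condition chosen by the user)
    Returns the frames judged to satisfy the condition, i.e. the extracted pictures.
    """
    pictures = []
    for frame in frames:
        result = recognize(frame)      # image recognition on the video frame
        if meets_condition(result):    # judge against the preset extraction condition
            pictures.append(frame)     # extract the video frame as a picture
    return pictures
```

In the apparatus of fig. 4, these three steps correspond to the identification module 41, the judging module 42, and the extracting module 43.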
Fig. 5 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. For convenience of explanation, only the portions related to the present embodiment are shown in fig. 5. Components in fig. 5 that are numbered the same as those in fig. 4 have the same functions, and detailed descriptions of these components are omitted for brevity. As shown in fig. 5, in one possible implementation, the apparatus further includes:
a display module 44 for providing options of extraction conditions.
A determining module 45, configured to determine the selected extraction condition as a preset extraction condition.
In one possible implementation, the apparatus may further include:
a sending module 46, configured to send an authorization request to a video call peer, where the authorization request is used to request to identify a video generated by the video call peer;
and the receiving module 47 is configured to perform image recognition on a video frame in a video generated by the video call opposite end when receiving the authorization information returned by the video call opposite end in response to the authorization request.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image extraction method, characterized by comprising:
in the video call process, sending an authorization request to a video call opposite terminal, wherein the authorization request is used for requesting to identify a video generated by the video call opposite terminal;
when receiving authorization information returned by a video call opposite terminal in response to the authorization request, carrying out image identification on video frames in videos generated by the video call to obtain an identification result, wherein the videos generated by the video call comprise videos generated by the opposite terminal;
judging whether the video frame meets a preset extraction condition or not according to the identification result;
when the video frame meets the preset extraction condition, extracting the video frame into a picture;
the video generated by the video call includes video captured at any one or more ends of the video call.
2. The method of claim 1, wherein the image recognition comprises face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at the designated position in the video frame;
the proportion of the area of the face to the area of the video frame is larger than a first threshold value;
the shooting angle of the lens of the face meets the angle condition;
the difference between the brightness of the face area and the brightness of the background area except the face area is smaller than a second threshold value;
a specified expression appears in the video frame.
3. The method of claim 1, wherein the image recognition comprises target object recognition,
the preset extraction conditions include one or more of the following:
the color of the clothes in the video frame and the color of the background accord with preset matching conditions, wherein the target object comprises clothes;
a target object appears in the video frame.
4. The method of claim 1, further comprising:
providing options for extraction conditions;
the selected extraction condition is determined as a preset extraction condition.
5. An image extraction device characterized by comprising:
the device comprises a sending module, a receiving module and a sending module, wherein the sending module is used for sending an authorization request to a video call opposite terminal in the video call process, and the authorization request is used for requesting to identify a video generated by the video call opposite terminal;
the identification module is used for carrying out image identification on video frames in videos generated by video calls to obtain identification results when authorization information returned by the video call opposite terminal in response to the authorization request is received, and the videos generated by the video calls comprise the videos generated by the opposite terminal;
the judging module is used for judging whether the video frame meets a preset extraction condition or not according to the identification result;
the extraction module is used for extracting the video frame into a picture when the video frame meets the preset extraction condition;
the video generated by the video call includes video captured at any one or more ends of the video call.
6. The apparatus of claim 5, wherein the image recognition comprises face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at the designated position in the video frame;
the proportion of the area of the face to the area of the video frame is larger than a first threshold value;
the shooting angle of the lens of the face meets the angle condition;
the difference between the brightness of the face area and the brightness of the background area except the face area is smaller than a second threshold value;
a specified expression appears in the video frame.
7. The apparatus of claim 5, wherein the image recognition comprises target object recognition,
the preset extraction conditions include one or more of the following:
the color of the clothes in the video frame and the color of the background accord with preset matching conditions, wherein the target object comprises clothes;
a target object appears in the video frame.
8. The apparatus of claim 5, further comprising:
a display module for providing options of extraction conditions;
and the determining module is used for determining the selected extraction condition as a preset extraction condition.
9. An image extraction device characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
-performing the method according to any of claims 1 to 4.
10. A non-transitory computer readable storage medium having instructions therein which, when executed by a processor, enable the processor to perform the method of any one of claims 1 to 4.
CN201811159896.2A 2018-09-30 2018-09-30 Image extraction method and device Active CN109145878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811159896.2A CN109145878B (en) 2018-09-30 2018-09-30 Image extraction method and device


Publications (2)

Publication Number Publication Date
CN109145878A CN109145878A (en) 2019-01-04
CN109145878B true CN109145878B (en) 2022-02-15

Family

ID=64814238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811159896.2A Active CN109145878B (en) 2018-09-30 2018-09-30 Image extraction method and device

Country Status (1)

Country Link
CN (1) CN109145878B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246243A (en) * 2019-05-09 2019-09-17 厦门中控智慧信息技术有限公司 Access control method, device and terminal device
CN110287949B (en) * 2019-07-30 2021-04-06 腾讯音乐娱乐科技(深圳)有限公司 Video clip extraction method, device, equipment and storage medium
CN111339842A (en) * 2020-02-11 2020-06-26 深圳壹账通智能科技有限公司 Video jamming identification method and device and terminal equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098379A (en) * 2010-12-17 2011-06-15 惠州Tcl移动通信有限公司 Terminal as well as method and device for acquiring real-time video images of terminal
CN102752727A (en) * 2012-05-30 2012-10-24 北京三星通信技术研究有限公司 Terminal remote guide method and terminal remote guide device
CN103716227A (en) * 2013-12-12 2014-04-09 北京京东尚科信息技术有限公司 Method and device for performing information interaction in instant messenger
CN105516883A (en) * 2014-09-22 2016-04-20 中兴通讯股份有限公司 Remote assistance method and device
CN105976444A (en) * 2016-04-28 2016-09-28 信阳师范学院 Video image processing method and apparatus
CN107506755A (en) * 2017-09-26 2017-12-22 云丁网络技术(北京)有限公司 Monitoring video recognition methods and device
CN107635110A (en) * 2017-09-30 2018-01-26 维沃移动通信有限公司 A kind of video interception method and terminal
CN107948506A (en) * 2017-11-22 2018-04-20 珠海格力电器股份有限公司 Image processing method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101240261B1 (en) * 2006-02-07 2013-03-07 엘지전자 주식회사 The apparatus and method for image communication of mobile communication terminal
EP1968320B1 (en) * 2007-02-27 2018-07-18 Accenture Global Services Limited Video call device control
CN104869347B (en) * 2015-05-18 2018-10-30 小米科技有限责任公司 Video call method and device
CN105635567A (en) * 2015-12-24 2016-06-01 小米科技有限责任公司 Shooting method and device
US9886640B1 (en) * 2016-08-08 2018-02-06 International Business Machines Corporation Method and apparatus to identify a live face image using a thermal radiation sensor and a visual radiation sensor
CN108471632B (en) * 2018-03-01 2021-02-23 Oppo广东移动通信有限公司 Information processing method and device, mobile terminal and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant