CN109145878B - Image extraction method and device - Google Patents


Info

Publication number: CN109145878B
Application number: CN201811159896.2A
Authority: CN (China)
Prior art keywords: video, video frame, face, video call, preset
Legal status: Active (an assumption by Google, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN109145878A
Inventor: 吴珂
Current and original assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201811159896.2A
Publication of CN109145878A; application granted; publication of CN109145878B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; scene-specific elements
    • G06V20/40 — Scenes; scene-specific elements in video content
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 — Television systems
    • H04N7/14 — Systems for two-way working

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to an image extraction method and device. The method includes: during a video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result; determining, according to the recognition result, whether the video frame meets a preset extraction condition; and when the video frame meets the preset extraction condition, extracting the video frame as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures and without interfering with the user's video call. This helps capture the user in a natural, relaxed state and thus improves image extraction efficiency.

Description

Image extraction method and device
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to an image extraction method and apparatus.
Background
Generally, a video call is a communication mode in which two or more terminal devices transmit voice and video to each other in real time over the Internet or the mobile Internet. When a user makes a video call with friends, many interesting interactions often occur. In the related art, the user can capture a moment considered a highlight only by manually invoking screen-capture software on the video call interface. However, manual capture often misses the highlight moment and adds operational complexity, which disturbs the user's video call and results in low image extraction efficiency and poor image quality.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides an image extraction method and apparatus.
According to a first aspect of the embodiments of the present disclosure, there is provided an image extraction method, including:
during the video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result;
determining, according to the recognition result, whether the video frame meets a preset extraction condition;
and when the video frame meets the preset extraction condition, extracting the video frame as a picture.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at a designated position in the video frame;
the ratio of the area of the face to the area of the video frame is greater than a first threshold;
the shooting angle of the face relative to the lens meets an angle condition;
the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold;
a specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing;
a target object appears in the video frame.
In one possible implementation, the method further includes:
providing options of extraction conditions;
determining the selected extraction condition as the preset extraction condition.
In one possible implementation, the method further includes:
sending an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
and when authorization information returned by the video call peer in response to the authorization request is received, performing image recognition on video frames in the video generated by the video call peer.
According to a second aspect of the embodiments of the present disclosure, there is provided an image extraction apparatus including:
a recognition module, configured to perform image recognition on a video frame in the video generated by a video call during the video call to obtain a recognition result;
a judging module, configured to determine, according to the recognition result, whether the video frame meets a preset extraction condition;
and an extraction module, configured to extract the video frame as a picture when the video frame meets the preset extraction condition.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at a designated position in the video frame;
the ratio of the area of the face to the area of the video frame is greater than a first threshold;
the shooting angle of the face relative to the lens meets an angle condition;
the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold;
a specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing;
a target object appears in the video frame.
In one possible implementation, the apparatus further includes:
a display module, configured to provide options of extraction conditions;
and a determining module, configured to determine the selected extraction condition as the preset extraction condition.
In one possible implementation, the apparatus further includes:
a sending module, configured to send an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
and a receiving module, configured to perform image recognition on video frames in the video generated by the video call peer when authorization information returned by the video call peer in response to the authorization request is received.
According to a third aspect of the embodiments of the present disclosure, there is provided an image extraction apparatus including: a processor;
and a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions which, when executed by a processor, enable the processor to perform the above-described method.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: during a video call, image recognition is performed on a video frame in the video generated by the video call to obtain a recognition result; whether the video frame meets a preset extraction condition is determined according to the recognition result; and when the video frame meets the preset extraction condition, the video frame is extracted as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures. This rarely interferes with the user's video call, helps capture the user in a natural, relaxed state, and effectively improves image extraction efficiency and quality.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating an image extraction method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. The method can be applied to terminal equipment such as desktop computers, notebook computers, tablet computers, mobile phones and the like, and is not limited herein. As shown in fig. 1, the method may include:
Step 100: during the video call, performing image recognition on a video frame in the video generated by the video call to obtain a recognition result;
Step 101: determining, according to the recognition result, whether the video frame meets a preset extraction condition;
Step 102: when the video frame meets the preset extraction condition, extracting the video frame as a picture.
In this example, generally speaking, image recognition may be described as a technique of processing, analyzing, and understanding images with a computer in order to recognize targets and objects of various different patterns.
In one possible implementation, the image recognition process may include: training different sample sets according to different extraction conditions to obtain a classifier (for example, the classifier may be generated based on a neural network); inputting a video frame into the classifier, whose output is the recognition result; and extracting the video frame as a picture when the recognition result meets the extraction condition. It should be noted that a person skilled in the art may also select another suitable recognition method (for example, a clustering algorithm) to perform image recognition on the video frame as needed; the present disclosure does not limit the specific image recognition manner.
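As an illustrative sketch only (the function and label names below are assumptions, not part of the disclosure), the recognize-judge-extract loop can be expressed in Python with the classifier stubbed out:

```python
# Minimal sketch of the recognize-judge-extract loop described above.
# `classify` stands in for a trained classifier (e.g. one generated from a
# neural network); here it is stubbed to return a fixed label and confidence.

def classify(frame):
    # A real implementation would run the frame through the trained model.
    return {"label": "smiling_face", "confidence": 0.92}

def meets_condition(result, condition):
    # Judge the recognition result against the preset extraction condition.
    return (result["label"] == condition["label"]
            and result["confidence"] >= condition["min_confidence"])

def extract_pictures(frames, condition):
    pictures = []
    for frame in frames:
        result = classify(frame)                # step 1: image recognition
        if meets_condition(result, condition):  # step 2: judge the result
            pictures.append(frame)              # step 3: extract as picture
    return pictures
```

Swapping `classify` for a clustering-based recognizer, as the paragraph notes, would leave the judge-and-extract steps unchanged.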
Video technology can generally be described as the technology of capturing, recording, processing, storing, transmitting, and reproducing moving images as electrical signals. A video comprises a series of video frames.
As an example of this embodiment, when detecting a video call, the terminal device may obtain all or some of the video frames in the video generated during the call (for example, the odd- or even-numbered frames; this is not limited herein). For each obtained video frame, the terminal device performs image recognition to obtain a recognition result. When the terminal device determines, according to the recognition result, that the video frame meets the preset extraction condition, it extracts the video frame as a picture; multiple extracted pictures may also be assembled into a picture set. The terminal device may store the picture or picture set, send it to other terminals, or share it through an internet platform.
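The frame-sampling step mentioned above (taking, say, only the odd- or even-numbered frames) can be sketched as follows; the function name is an illustrative assumption:

```python
def sample_frames(frames, step=2, offset=0):
    # Take every `step`-th frame starting at `offset`. With frames counted
    # from 1, offset=0 yields the odd-numbered frames and offset=1 the
    # even-numbered ones.
    return frames[offset::step]
```

Recognizing only a sampled subset halves (or better) the recognition workload at the cost of possibly missing a frame between samples.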
As an example of this embodiment, the video generated by the video call may include video captured at any one or more ends of the video call. For example, if terminals A, B, and C participate in a video call, the video generated during the call may include video shot by any one or more of terminals A, B, and C.
During the video call, image recognition is performed on a video frame in the video generated by the video call to obtain a recognition result; whether the video frame meets a preset extraction condition is determined according to the recognition result; and when the video frame meets the preset extraction condition, the video frame is extracted as a picture. In this way, video frames that meet the preset extraction condition during the video call can be extracted as pictures automatically, without the user manually capturing video pictures. This rarely interferes with the user's video call, helps capture the user in a natural, relaxed state, and effectively improves image extraction efficiency and quality.
As an example of this embodiment, the image recognition may include face recognition, and the preset extraction condition includes any one or more of the following: a human face appears in the video frame; the face in the video frame is at a designated position in the video frame; the ratio of the area of the face to the area of the video frame is greater than a first threshold; the shooting angle of the face relative to the lens meets an angle condition; the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than a second threshold; a specified expression appears in the video frame.
Face recognition may be described as a process of extracting facial feature information from a face in an image and performing recognition according to that feature information to obtain a recognition result.
For example, the terminal device determines to perform face recognition on the video frame when it detects that the extraction condition is any one or more of the following: a human face appears in the video frame; the face is at a designated position in the video frame; the ratio of the area of the face to the area of the video frame is greater than the first threshold; the shooting angle of the face relative to the lens meets the angle condition; the difference between the brightness of the face region and the brightness of the background region is smaller than the second threshold; or a specified expression appears in the video frame.
For example, if the extraction condition is that a face appears in the video frame, the contour features of the video frame may be extracted and compared for similarity with pre-stored face contour features. If the similarity obtained by the comparison is greater than a first similarity threshold, it may be determined that the video frame meets the extraction condition, and the video frame is extracted as a picture. In this way, video frames in which a face appears during the video call can be extracted automatically according to the user's settings.
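A hedged sketch of this similarity comparison, using cosine similarity over contour feature vectors (the feature representation and the threshold value are assumptions for illustration):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def face_appears(frame_contour, stored_contour, first_similarity_threshold=0.9):
    # The frame meets the condition when the contour similarity exceeds
    # the first similarity threshold.
    return cosine_similarity(frame_contour, stored_contour) > first_similarity_threshold
```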
For example, if the extraction condition is that the face is at a designated position in the video frame (for example, the middle of the video frame), it may first be determined whether a face appears in the video frame. When a face appears, the coordinate range it occupies is determined; if that coordinate range falls within a preset coordinate range, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the face is at a suitable position during the video call can be extracted automatically according to the user's settings.
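A minimal sketch of the designated-position check, assuming the recognizer reports a face bounding box in pixels (the central-band fractions are illustrative assumptions):

```python
def face_centered(face_box, frame_size, region=(0.3, 0.7)):
    # face_box: (x, y, w, h) in pixels; frame_size: (width, height).
    # The face centre must fall inside the central band of the frame
    # given by `region` (as fractions of frame width/height).
    x, y, w, h = face_box
    fw, fh = frame_size
    cx, cy = (x + w / 2) / fw, (y + h / 2) / fh
    return region[0] <= cx <= region[1] and region[0] <= cy <= region[1]
```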
For example, if the extraction condition is that the ratio of the area of the face to the area of the video frame is greater than the first threshold, it may first be determined whether a face appears in the video frame. When a face appears, the ratio of the area of the face region to the area of the video frame is determined; when that ratio is greater than the preset first threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the face has a suitable size during the video call can be extracted automatically according to the user's settings.
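The area-ratio test can be sketched as follows, again assuming a pixel bounding box; the default value of the first threshold is an assumption:

```python
def face_large_enough(face_box, frame_size, first_threshold=0.1):
    # The ratio of the face area to the frame area must exceed the
    # first threshold for the frame to meet the extraction condition.
    _, _, w, h = face_box
    frame_area = frame_size[0] * frame_size[1]
    return (w * h) / frame_area > first_threshold
```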
For example, if the extraction condition is that the shooting angle of the face meets the angle condition, it may first be determined whether a face appears in the video frame. When a face appears, the positions of the facial features may be determined, and the shooting angle of the face in the video frame (for example, 35 degrees) may be derived from those positions. If the shooting angle satisfies the angle condition (for example, 30 to 45 degrees), the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the shooting angle of the face matches the preset angle during the video call can be extracted automatically according to the user's settings.
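A crude illustrative heuristic for estimating the shooting angle from facial-feature positions (a real system would use a full head-pose model; the linear mapping below is an assumption):

```python
def estimate_yaw(left_eye_x, right_eye_x, nose_x):
    # Heuristic: when the head turns, the nose drifts away from the
    # midpoint between the eyes; map that offset to a rough angle in degrees.
    midpoint = (left_eye_x + right_eye_x) / 2
    eye_span = right_eye_x - left_eye_x
    return 90.0 * (nose_x - midpoint) / eye_span

def angle_condition_met(angle, low=30.0, high=45.0):
    # The angle condition from the text: e.g. 30 to 45 degrees.
    return low <= abs(angle) <= high
```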
For example, if the extraction condition is that the difference between the brightness of the face region and the brightness of the background region other than the face region is smaller than the second threshold, it may first be determined whether a face appears in the video frame. When a face appears, the difference (or, alternatively, the ratio) between the brightness of the face region and the brightness of the background region is computed; when that difference is smaller than the second threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames in which the brightness of the face and the background is well balanced during the video call can be extracted automatically according to the user's settings.
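A sketch of the brightness comparison, assuming the face and background regions are available as flat lists of luminance values (the default threshold is illustrative):

```python
def brightness_contrast_ok(face_pixels, background_pixels, second_threshold=40.0):
    # Compare the mean luminance of the face region with that of the
    # background region; the difference must stay below the second threshold.
    face_mean = sum(face_pixels) / len(face_pixels)
    background_mean = sum(background_pixels) / len(background_pixels)
    return abs(face_mean - background_mean) < second_threshold
```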
For example, if the extraction condition is that a smiling expression appears in the video frame, the feature information of the face may be extracted when a face appears in the video frame and compared for similarity with preset smiling-face feature information. When the similarity between the two is greater than a second similarity threshold, a smiling expression is determined to appear in the video frame, and the video frame is extracted as a picture. In this way, video frames containing the specified expression during the video call can be extracted automatically according to the user's settings.
As an example of this embodiment, the image recognition may include target object recognition, and the preset extraction condition includes one or more of the following: the clothing color and the background color in the video frame meet a preset matching condition, where the target object includes clothing; a target object appears in the video frame.
Target object recognition may be described as the process of determining whether an object in a video frame belongs to a specified target (for example, whether the video frame contains a specified kind of animal or plant, a specified natural landscape, a person with a specified appearance, a specified color combination, and so on).
For example, when the terminal device detects that the extraction condition is that the clothing color and the background color in the video frame meet the preset matching condition, or that a target object appears in the video frame, it determines to perform target object recognition on the video frame.
For example, if the extraction condition is that the clothing color and the background color in the video frame meet the preset matching condition, and the preset matching condition includes a preset range of color codes, it may be determined during target object recognition whether clothing appears in the video frame; when clothing appears, the color codes of the clothing color and the background color are determined. When that pair of color codes falls within the preset range, the clothing color and the background color are determined to meet the preset matching condition, and the video frame is extracted as a picture. In this way, video frames in which the clothing matches the background color during the video call can be extracted automatically according to the user's settings.
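A sketch of the color-matching check, modelling the preset matching condition as a set of color-code pairs (the palette below is invented purely for illustration):

```python
# Assumed palette of (clothing, background) colour-code pairs considered
# to match; a real implementation would ship a curated table.
MATCHING_PAIRS = {
    ("navy", "beige"),
    ("white", "forest-green"),
    ("black", "light-grey"),
}

def clothing_matches_background(clothing_color, background_color):
    # The frame meets the condition when the detected pair of colour codes
    # falls within the preset matching table.
    return (clothing_color, background_color) in MATCHING_PAIRS
```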
For example, if the extraction condition is that a target object appears in the video frame and the target object is a designated person (e.g., a star or a relative of the user), facial feature information of the designated person may be preset. When a face is determined to appear in the video frame, the facial feature information of that face is extracted; if the video frame contains facial feature information whose similarity to that of the designated person is greater than a third similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the designated person during the video call can be extracted automatically according to the user's settings.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a person with specified appearance characteristics (e.g., a woman, a man, a clown, or the like), the specified appearance characteristic information may be preset. When a face is determined to appear in the video frame, the appearance characteristic information of one or more faces in the frame is extracted; if the video frame contains appearance characteristic information whose similarity to the specified information is greater than a fourth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing a person with the specified appearance during the video call can be extracted automatically according to the user's settings. The terminal device may also perform sample training on a group of pictures, predetermined by the user, showing the facial features the user specifies, so that the resulting specified facial feature information better matches the user's requirements.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a specified landscape (e.g., a seaside or a mountain), image feature information of the specified landscape may be preset and the image feature information of the video frame determined. If the similarity between the two is greater than a fifth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the specified landscape during the video call can be extracted automatically according to the user's settings.
For example, if the extraction condition is that a target object appears in the video frame and the target object is a specified animal (e.g., a panda or an elephant), image feature information of the specified animal may be preset and the image feature information of the video frame determined. If the similarity between the two is greater than a sixth similarity threshold, the video frame is determined to meet the extraction condition and is extracted as a picture. In this way, video frames containing the specified animal during the video call can be extracted automatically according to the user's settings.
FIG. 2 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. As shown in fig. 2, the difference between fig. 2 and fig. 1 is that the method may further include:
Step 200: providing options of extraction conditions.
Step 201: determining the selected extraction condition as the preset extraction condition.
For example, the terminal device may display a selection interface before recognizing the captured video frames (or before the video call; this is not limited herein). The selection interface may include multiple options for selecting extraction conditions (e.g., smiling-face recognition, beach recognition, child recognition). When the terminal detects one or more effective selection operations (e.g., a click or slide operation) on the selection interface, it sets the extraction conditions corresponding to those operations as the preset extraction conditions. In this way, pictures meeting the user's requirements can be extracted from the video according to the extraction conditions selected by the user, flexibly satisfying different user needs. In one possible implementation, a default extraction condition may also be preset; this is not limited herein.
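The selection logic can be sketched as follows; the option names and the default condition are assumptions for illustration:

```python
# Assumed set of extraction-condition options shown on the selection interface.
AVAILABLE_OPTIONS = {"smiling face", "beach", "child", "face centred"}

def set_extraction_conditions(selected, default=("smiling face",)):
    # Keep only valid selections; fall back to a preset default condition
    # when the user selects nothing (or only unknown options).
    chosen = {s for s in selected if s in AVAILABLE_OPTIONS}
    return chosen if chosen else set(default)
```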
FIG. 3 is a flow diagram illustrating an image extraction method according to an exemplary embodiment. As shown in fig. 3, the difference between fig. 3 and fig. 1 is that the method may further include:
Step 300: sending an authorization request to the video call peer, where the authorization request is used to request recognition of the video generated by the video call peer;
Step 301: when authorization information returned by the video call peer in response to the authorization request is received, performing image recognition on video frames in the video generated by the video call peer.
For example, before recognizing video frames of the video call peer, the terminal device may send an authorization request to the peer, where the authorization request is used to request recognition of video frames in the video generated by the peer. On receiving the authorization request, the peer may present an option to allow or deny the recognition, and may return authorization information to the terminal device when the option allowing recognition is triggered. When the terminal device receives the authorization information returned by the peer in response to the authorization request, it may acquire the video generated by the peer, perform image recognition on its video frames, and extract the frames judged to meet the preset condition as pictures according to the recognition result. Because the terminal device must obtain the peer's authorization before recognizing the peer's video, it cannot recognize that video without authorization; this effectively prevents the peer's video from being captured arbitrarily and helps protect the privacy of the user at the peer end.
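A toy model of the authorization handshake (the class and method names are illustrative assumptions, not the disclosure's actual protocol):

```python
class PeerTerminal:
    # Models the video call peer, which answers authorization requests.
    def __init__(self, user_allows_recognition):
        # In a real system the peer would show an allow/deny prompt to its
        # user; here the user's choice is modelled as a flag.
        self.user_allows_recognition = user_allows_recognition

    def handle_authorization_request(self):
        return "authorized" if self.user_allows_recognition else "denied"

def try_recognize_peer_video(peer):
    # Recognition of the peer's video frames proceeds only after the peer
    # returns authorization information.
    response = peer.handle_authorization_request()
    if response == "authorized":
        return "recognizing peer video frames"
    return "recognition not permitted"
```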
The authorization request may be sent to the video call opposite end before the video frames of the video call local end or the video call opposite end are identified, or after the video frames of the video call local end have been identified.
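The authorization exchange described above can be sketched as follows. This is a minimal illustration only: the patent does not specify message formats or an API, so the message names and the `VideoCallPeer` class are hypothetical.

```python
# Hypothetical message types for the authorization handshake; the patent
# does not define a wire format, so these names are illustrative only.
AUTH_REQUEST = "auth_request"
AUTH_GRANTED = "auth_granted"
AUTH_DENIED = "auth_denied"

class VideoCallPeer:
    """Simulated video call opposite end that answers authorization requests."""
    def __init__(self, allow_recognition):
        # Whether the user at the opposite end triggered the "allow" option.
        self.allow_recognition = allow_recognition

    def handle(self, message):
        if message == AUTH_REQUEST:
            return AUTH_GRANTED if self.allow_recognition else AUTH_DENIED
        raise ValueError("unknown message: %s" % message)

def request_recognition(peer):
    """Send the authorization request; return True only if the peer grants it."""
    reply = peer.handle(AUTH_REQUEST)
    return reply == AUTH_GRANTED
```

Only when `request_recognition` returns `True` would the terminal device proceed to image recognition on the opposite end's video frames.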
In an application example, the following description is made by taking a terminal device as a mobile phone as an example.
Before the video call, a user can enable video frame extraction for videos shot by the user or by the other party, and can select the extraction conditions for automatic snapshot during the video call according to factors such as the video call environment. For example: a human face in the video frame is at a specified position in the video frame; the proportion of the area of the face to the area of the video frame is larger than a first threshold; the shooting angle of the face relative to the lens meets an angle condition; the difference between the brightness of the face region and the brightness of the background region excluding the face region is smaller than a second threshold; a specified expression appears in the video frame; the clothing color and the background color in the video frame meet a preset matching condition; or a target object appears in the video frame (e.g., a user-specified person, a user-specified landscape such as a beach or a mountain, or a user-specified animal or thing).
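A few of the face-related extraction conditions above can be sketched as a simple predicate. This is an illustrative sketch only: the bounding-box representation, the centering tolerance, and the threshold values are assumptions, since the patent leaves the thresholds ("first threshold", etc.) unspecified.

```python
def face_meets_conditions(face_box, frame_size, smile_score,
                          area_ratio_threshold=0.01, smile_threshold=0.8):
    """Return True if a detected face satisfies the example extraction conditions.

    face_box    -- (x, y, w, h) bounding box of the face, in pixels
    frame_size  -- (width, height) of the video frame
    smile_score -- classifier confidence in [0, 1] that the expression is a smile
    The threshold values are illustrative, not taken from the patent.
    """
    x, y, w, h = face_box
    fw, fh = frame_size

    # Condition: the face is roughly centered in the frame
    # (within 10% of the frame size of the frame center, an assumed tolerance).
    face_cx, face_cy = x + w / 2, y + h / 2
    centered = (abs(face_cx - fw / 2) < fw * 0.1 and
                abs(face_cy - fh / 2) < fh * 0.1)

    # Condition: the face area exceeds a share of the frame area ("first threshold").
    area_ok = (w * h) / (fw * fh) > area_ratio_threshold

    # Condition: the specified expression (a smile here) appears.
    expression_ok = smile_score >= smile_threshold

    return centered and area_ok and expression_ok
```

The face box and smile score would come from whatever face-detection and expression-recognition models the terminal device uses; they are treated as given here.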
During the video call, the mobile phone can automatically capture the video frames meeting the extraction conditions selected by the user (that is, extract those video frames as pictures) and store the resulting pictures in the album.
For example, suppose a user makes a video call with a friend indoors through a mobile phone and sets the extraction conditions to: the face is in the middle of the picture, the face shows a smiling expression, and the shooting angle of the face is 30 degrees. During the video call, the mobile phone can extract any video frame in which the face is in the middle of the picture, shows a smiling expression, and is shot at a 30-degree angle as a picture, and store the picture in the album.
For another example, when the user travels by the seaside and makes a video call with a friend through the mobile phone, the user may set the extraction condition to the appearance of beach scenery in the video. During the video call, the mobile phone can extract the video frames in which beach scenery appears as pictures and store the pictures in the album.
In this way, beautiful backgrounds or interesting scenes appearing in a video call can be captured automatically, efficiently, and naturally according to the user's settings. No manual snapshot is required, the user's video call is barely affected, the user's snapshot needs are met flexibly, and the user is left with a good memory.
Fig. 4 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. As shown in fig. 4, the apparatus may include:
the identification module 41 is configured to perform image identification on a video frame in a video generated by a video call during the video call, so as to obtain an identification result.
And the judging module 42 is configured to judge whether the video frame meets a preset extraction condition according to the identification result.
An extracting module 43, configured to extract the video frame as a picture when the video frame meets the preset extracting condition.
In one possible implementation, the video generated by the video call includes video captured at any one or more ends of the video call.
In one possible implementation, the image recognition includes face recognition,
the preset extraction conditions include any one or more of the following:
a face appears in the video frame.
The face in the video frame is at a designated position in the video frame.
The proportion of the area of the face to the area of the video frame is larger than a first threshold value.
The shooting angle of the lens of the face meets the angle condition.
The difference between the brightness of the face region and the brightness of the background region excluding the face region is smaller than a second threshold value.
A specified expression appears in the video frame.
In one possible implementation, the image recognition includes target object recognition,
the preset extraction conditions include one or more of the following:
the clothing color and the background color in the video frame accord with preset matching conditions, wherein the target object comprises clothing.
A target object appears in the video frame.
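One possible form of the preset matching condition between clothing color and background color can be sketched as follows. The actual matching rule is left open by the patent; comparing hue distance on the color wheel is just one illustrative choice, and the `max_hue_gap` value is an assumption.

```python
import colorsys

def colors_match(clothing_rgb, background_rgb, max_hue_gap=60.0):
    """Illustrative matching condition: clothing and background hues lie
    within `max_hue_gap` degrees of each other on the color wheel.
    The patent does not fix the matching rule; this is one example.
    """
    def hue_degrees(rgb):
        r, g, b = (c / 255.0 for c in rgb)
        h, _, _ = colorsys.rgb_to_hls(r, g, b)
        return h * 360.0

    gap = abs(hue_degrees(clothing_rgb) - hue_degrees(background_rgb))
    gap = min(gap, 360.0 - gap)  # wrap around the color wheel
    return gap <= max_hue_gap
```

The average clothing and background colors would first be estimated from the target-object recognition result (e.g., mean pixel color inside and outside the detected clothing region).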
The method performs image recognition on video frames in the video generated by a video call during the call to obtain a recognition result, judges whether a video frame meets a preset extraction condition according to the recognition result, and extracts the video frame as a picture when it meets the preset extraction condition. In this way, video frames meeting the preset extraction conditions can be automatically extracted as pictures during the video call without the user manually capturing video pictures. The user's video call is not disturbed, the user's natural and relaxed state can be captured, and image extraction efficiency is improved.
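The recognize-judge-extract flow summarized above can be sketched as a simple loop. The `recognize` and `meets_condition` callables are placeholders for whatever image-recognition model and preset extraction condition are configured.

```python
def extract_pictures(frames, recognize, meets_condition):
    """Minimal sketch of the recognize -> judge -> extract pipeline.

    frames          -- iterable of video frames from the call
    recognize       -- function mapping a frame to a recognition result
    meets_condition -- predicate on the recognition result (the preset
                       extraction condition chosen by the user)
    Returns the frames judged to satisfy the condition, i.e. the extracted pictures.
    """
    pictures = []
    for frame in frames:
        result = recognize(frame)      # image recognition on the video frame
        if meets_condition(result):    # judge against the preset extraction condition
            pictures.append(frame)     # extract the video frame as a picture
    return pictures
```

In the apparatus of fig. 4, these three steps correspond to the identification module 41, the judging module 42, and the extracting module 43.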
Fig. 5 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. For convenience of explanation, only the portions related to the present embodiment are shown in fig. 5. Components in fig. 5 that are numbered the same as those in fig. 4 have the same functions, and detailed descriptions of these components are omitted for brevity. As shown in fig. 5, in one possible implementation, the apparatus further includes:
a display module 44 for providing options of extraction conditions.
A determining module 45, configured to determine the selected extraction condition as a preset extraction condition.
In one possible implementation, the apparatus may further include:
a sending module 46, configured to send an authorization request to a video call peer, where the authorization request is used to request to identify a video generated by the video call peer;
and the receiving module 47 is configured to perform image recognition on a video frame in a video generated by the video call opposite end when receiving the authorization information returned by the video call opposite end in response to the authorization request.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating an image extraction apparatus according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 6, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in the position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An image extraction method, characterized by comprising:
in the video call process, sending an authorization request to a video call opposite terminal, wherein the authorization request is used for requesting to identify a video generated by the video call opposite terminal;
when receiving authorization information returned by a video call opposite terminal in response to the authorization request, carrying out image identification on video frames in videos generated by the video call to obtain an identification result, wherein the videos generated by the video call comprise videos generated by the opposite terminal;
judging whether the video frame meets a preset extraction condition or not according to the identification result;
when the video frame meets the preset extraction condition, extracting the video frame into a picture;
the video generated by the video call includes video captured at any one or more ends of the video call.
2. The method of claim 1, wherein the image recognition comprises face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at the designated position in the video frame;
the proportion of the area of the face to the area of the video frame is larger than a first threshold value;
the shooting angle of the lens of the face meets the angle condition;
the difference between the brightness of the face area and the brightness of the background area except the face area is smaller than a second threshold value;
a specified expression appears in the video frame.
3. The method of claim 1, wherein the image recognition comprises target object recognition,
the preset extraction conditions include one or more of the following:
the color of the clothes in the video frame and the color of the background accord with preset matching conditions, wherein the target object comprises clothes;
a target object appears in the video frame.
4. The method of claim 1, further comprising:
providing options for extraction conditions;
the selected extraction condition is determined as a preset extraction condition.
5. An image extraction device characterized by comprising:
the device comprises a sending module, a receiving module and a sending module, wherein the sending module is used for sending an authorization request to a video call opposite terminal in the video call process, and the authorization request is used for requesting to identify a video generated by the video call opposite terminal;
the identification module is used for carrying out image identification on video frames in videos generated by video calls to obtain identification results when authorization information returned by the video call opposite terminal in response to the authorization request is received, and the videos generated by the video calls comprise the videos generated by the opposite terminal;
the judging module is used for judging whether the video frame meets a preset extraction condition or not according to the identification result;
the extraction module is used for extracting the video frame into a picture when the video frame meets the preset extraction condition;
the video generated by the video call includes video captured at any one or more ends of the video call.
6. The apparatus of claim 5, wherein the image recognition comprises face recognition,
the preset extraction conditions include any one or more of the following:
a human face appears in the video frame;
the face in the video frame is at the designated position in the video frame;
the proportion of the area of the face to the area of the video frame is larger than a first threshold value;
the shooting angle of the lens of the face meets the angle condition;
the difference between the brightness of the face area and the brightness of the background area except the face area is smaller than a second threshold value;
a specified expression appears in the video frame.
7. The apparatus of claim 5, wherein the image recognition comprises target object recognition,
the preset extraction conditions include one or more of the following:
the color of the clothes in the video frame and the color of the background accord with preset matching conditions, wherein the target object comprises clothes;
a target object appears in the video frame.
8. The apparatus of claim 5, further comprising:
a display module for providing options of extraction conditions;
and the determining module is used for determining the selected extraction condition as a preset extraction condition.
9. An image extraction device characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
-performing the method according to any of claims 1 to 4.
10. A non-transitory computer readable storage medium having instructions therein which, when executed by a processor, enable the processor to perform the method of any one of claims 1 to 4.
CN201811159896.2A 2018-09-30 2018-09-30 Image extraction method and device Active CN109145878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811159896.2A CN109145878B (en) 2018-09-30 2018-09-30 Image extraction method and device


Publications (2)

Publication Number Publication Date
CN109145878A CN109145878A (en) 2019-01-04
CN109145878B true CN109145878B (en) 2022-02-15

Family

ID=64814238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811159896.2A Active CN109145878B (en) 2018-09-30 2018-09-30 Image extraction method and device

Country Status (1)

Country Link
CN (1) CN109145878B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246243A (en) * 2019-05-09 2019-09-17 厦门中控智慧信息技术有限公司 Access control method, device and terminal device
CN110287949B (en) * 2019-07-30 2021-04-06 腾讯音乐娱乐科技(深圳)有限公司 Video clip extraction method, device, equipment and storage medium
CN111339842A (en) * 2020-02-11 2020-06-26 深圳壹账通智能科技有限公司 Video jamming identification method and device and terminal equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098379A (en) * 2010-12-17 2011-06-15 惠州Tcl移动通信有限公司 Terminal as well as method and device for acquiring real-time video images of terminal
CN102752727A (en) * 2012-05-30 2012-10-24 北京三星通信技术研究有限公司 Terminal remote guide method and terminal remote guide device
CN103716227A (en) * 2013-12-12 2014-04-09 北京京东尚科信息技术有限公司 Method and device for performing information interaction in instant messenger
CN105516883A (en) * 2014-09-22 2016-04-20 中兴通讯股份有限公司 Remote assistance method and device
CN105976444A (en) * 2016-04-28 2016-09-28 信阳师范学院 Video image processing method and apparatus
CN107506755A (en) * 2017-09-26 2017-12-22 云丁网络技术(北京)有限公司 Monitoring video recognition methods and device
CN107635110A (en) * 2017-09-30 2018-01-26 维沃移动通信有限公司 A kind of video interception method and terminal
CN107948506A (en) * 2017-11-22 2018-04-20 珠海格力电器股份有限公司 Image processing method and device and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101240261B1 (en) * 2006-02-07 2013-03-07 엘지전자 주식회사 The apparatus and method for image communication of mobile communication terminal
EP1968320B1 (en) * 2007-02-27 2018-07-18 Accenture Global Services Limited Video call device control
CN104869347B (en) * 2015-05-18 2018-10-30 小米科技有限责任公司 Video call method and device
CN105635567A (en) * 2015-12-24 2016-06-01 小米科技有限责任公司 Shooting method and device
US9886640B1 (en) * 2016-08-08 2018-02-06 International Business Machines Corporation Method and apparatus to identify a live face image using a thermal radiation sensor and a visual radiation sensor
CN108471632B (en) * 2018-03-01 2021-02-23 Oppo广东移动通信有限公司 Information processing method and device, mobile terminal and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant