CN115811594A - Video call image processing method, device, equipment, system and storage medium - Google Patents


Info

Publication number
CN115811594A
CN115811594A (application CN202111074339.2A)
Authority
CN
China
Prior art keywords: image, current, frame, video call, scene
Prior art date
Legal status (assumed, not a legal conclusion): Pending
Application number
CN202111074339.2A
Other languages
Chinese (zh)
Inventor
刘钦鸿
李新良
李国盛
冉飞
Current Assignee (the listed assignee may be inaccurate): Beijing Xiaomi Mobile Software Co Ltd
Original Assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority application: CN202111074339.2A
Publication: CN115811594A

Landscapes

  • Studio Devices (AREA)

Abstract

The present disclosure relates to a video call image processing method, apparatus, device, system, and storage medium. The method comprises: in response to receiving an image acquisition request sent by an application program, acquiring a current image video stream; processing images in the current image video stream based on an image processing policy corresponding to the current video call scene; and returning the processed images to the application program for display. By determining the image processing policy from the current video call scene, the method improves the pertinence of the policy and thus the quality of the processed images, meets the requirement of the user on the quality of video call images, and improves the video call experience of the user.

Description

Video call image processing method, device, equipment, system and storage medium
Technical Field
The present disclosure relates to the field of terminal device technologies, and in particular, to a method, an apparatus, a device, a system, and a storage medium for processing video call images.
Background
With the global popularization of 4G/5G technology, video calls have become an important communication bridge between people. In practice, however, people make video calls in a variety of outdoor and indoor environments: the video call image may be too dark or too bright, and in backlit conditions the face may appear darkened. As a result, the requirement of the user on video call image quality cannot be met, which affects the video call experience of the user.
Disclosure of Invention
In order to overcome the problems in the related art, embodiments of the present disclosure provide a method, an apparatus, a device, a system, and a storage medium for processing video call images, so as to solve the defects in the related art.
According to a first aspect of the embodiments of the present disclosure, a method for processing an image in a video call is provided, the method including:
in response to receiving an image acquisition request sent by an application program, acquiring a current image video stream;
processing the image in the current image video stream based on an image processing strategy corresponding to the current video call scene;
and returning the processed image to the application program for displaying.
In an embodiment, the acquiring the current image video stream includes:
and in response to determining that the application program meets the set condition, acquiring the current image video stream based on the system video frame layer.
In one embodiment, the application satisfying the set condition includes:
the application program is in a pre-constructed application program white list, and the white list is used for recording the application programs supporting the image processing strategy.
In an embodiment, the method further includes determining an image processing policy corresponding to the current video call scenario based on:
acquiring current image frame information;
determining a current video call scene based on the current image frame information;
and determining an image processing strategy corresponding to the current video call scene based on a corresponding relation between a pre-constructed video call scene and the image processing strategy.
In an embodiment, the current image frame information includes at least one of an image sensor gain (Sensor Gain), an image signal processor gain (ISP Gain), an exposure value (EV), and a light sensitivity (ISO) value.
In an embodiment, the determining a current video call scene based on the current image frame information comprises at least one of:
determining the current video call scene as a bright scene in response to the fact that the numerical value of the current image frame information meets a preset bright scene condition;
and in response to determining that the value of the current image frame information meets a preset dim light scene condition, determining the current video call scene as a dim light scene.
In an embodiment, when the current video call scene is a bright scene, the processing the image in the current image video stream based on the image processing policy corresponding to the current video call scene includes:
reducing the output frame rate of the image sensor from a default value to a first set value;
controlling the image sensor to output a frame of long exposure image frame and a frame of short exposure image frame respectively within each frame of transmission time based on the output frame rate of the first set value, wherein the exposure duration of the long exposure image frame is longer than that of the short exposure image frame;
and carrying out fusion processing on the long exposure image frame and the short exposure image frame.
In an embodiment, when the current video call scene is a dim light scene, the processing the image in the current image video stream based on the image processing policy corresponding to the current video call scene includes:
increasing the output frame rate of the image sensor from a default value to a second set value;
controlling the image sensor to output a frame of target exposure image frame with a target exposure duration within each frame of transmission time based on the output frame rate of the second set value, wherein the target exposure duration is the exposure duration meeting the current environment brightness condition;
and processing the target exposure image frame in a preset mode, wherein the preset mode comprises at least one of brightness improvement and noise reduction.
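The dim light branch above (raise the frame rate, capture at a target exposure, then brighten and denoise) can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the gain value and the 3x3 mean-filter denoiser are assumptions chosen for clarity.

```python
import numpy as np

def process_dim_light_frame(frame: np.ndarray, gain: float = 1.6) -> np.ndarray:
    """Illustrative dim-light post-processing on a grayscale frame:
    brightness improvement followed by a simple 3x3 mean-filter denoise.
    The gain value and filter are assumptions, not taken from the patent."""
    # Brightness improvement: scale pixel values, clip to the valid 8-bit range.
    brightened = np.clip(frame.astype(np.float32) * gain, 0, 255)
    # Noise reduction: 3x3 box blur via edge-padded neighborhood averaging.
    padded = np.pad(brightened, 1, mode="edge")
    h, w = frame.shape
    denoised = np.zeros_like(brightened)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            denoised += padded[dy:dy + h, dx:dx + w]
    return (denoised / 9.0).astype(np.uint8)
```

A real implementation would run on the camera platform side, but the order of operations (lift brightness, then denoise) matches the preset mode described above.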
According to a second aspect of the embodiments of the present disclosure, there is provided a video call image processing apparatus, the apparatus including:
the video stream acquisition module is used for responding to an image acquisition request sent by a received application program and acquiring a current image video stream;
the image processing module is used for processing the image in the current image video stream based on the image processing strategy corresponding to the current video call scene;
and the image returning module is used for returning the processed image to the application program for displaying.
In an embodiment, the video stream acquiring module is further configured to acquire a current image video stream based on a system video frame layer in response to determining that the application satisfies a set condition.
In one embodiment, the application satisfying the setting condition includes:
the application program is in a pre-constructed application program white list, and the white list is used for recording the application programs supporting the image processing strategy.
In an embodiment, the apparatus further comprises a processing policy determination module;
the processing strategy determination module comprises:
the frame information acquisition unit is used for acquiring current image frame information;
a scene determining unit, configured to determine a current video call scene based on the current image frame information;
and the strategy determining unit is used for determining the image processing strategy corresponding to the current video call scene based on the corresponding relation between the pre-constructed video call scene and the image processing strategy.
In an embodiment, the current image frame information includes at least one of an image sensor gain (Sensor Gain), an image signal processor gain (ISP Gain), an exposure value (EV), and a light sensitivity (ISO) value.
In an embodiment, the scene determination unit is further configured to at least one of:
determining the current video call scene as a bright scene in response to determining that the value of the current image frame information meets a preset bright scene condition;
and in response to determining that the value of the current image frame information meets a preset dim light scene condition, determining the current video call scene as a dim light scene.
In an embodiment, in a case that the current video call scene is a bright scene, the image processing module includes:
a frame rate reduction unit for reducing an output frame rate of the image sensor from a default value to a first set value;
a long and short frame output unit, configured to control the image sensor, based on the output frame rate of the first set value, to output one long exposure image frame and one short exposure image frame within each frame transmission time, where the exposure duration of the long exposure image frame is longer than that of the short exposure image frame;
and the fusion processing unit is used for carrying out fusion processing on the long exposure image frame and the short exposure image frame.
In an embodiment, in a case that the current video call scene is a dim light scene, the image processing module includes:
a frame rate increasing unit for increasing the output frame rate of the image sensor from a default value to a second set value;
a target frame output unit, configured to control the image sensor to output a target exposure image frame with a target exposure duration within each frame transmission time based on the output frame rate of the second setting value, where the target exposure duration is an exposure duration satisfying a current ambient brightness condition;
and the preset processing unit is used for processing the target exposure image frame in a preset mode, wherein the preset mode comprises at least one of brightness improvement and noise reduction.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic apparatus, the apparatus comprising:
a processor and a memory for storing a computer program;
wherein the processor is configured, upon execution of the computer program, to implement:
responding to an image acquisition request sent by an application program, and acquiring a current image video stream;
processing the image in the current image video stream based on an image processing strategy corresponding to the current video call scene;
and returning the processed image to the application program for displaying.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements:
in response to receiving an image acquisition request sent by an application program, acquiring a current image video stream;
processing the image in the current image video stream based on an image processing strategy corresponding to the current video call scene;
and returning the processed image to the application program for displaying.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the method and the device, the current image video stream is obtained in response to the received image obtaining request sent by the application program, the image in the current image video stream is processed based on the image processing strategy corresponding to the current video call scene, and the processed image is returned to the application program to be displayed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a video call image processing method according to an exemplary embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a video call image processing method according to yet another exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating how an image processing policy corresponding to the current video call scenario is determined according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flow chart showing how images in the current image video stream are processed according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flow chart showing how images in the current image video stream are processed according to yet another exemplary embodiment of the present disclosure;
fig. 6 is a block diagram illustrating a video call image processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 7 is a block diagram illustrating yet another video call image processing apparatus according to an exemplary embodiment of the present disclosure;
fig. 8 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a video call image processing method in accordance with an exemplary embodiment; the method of the embodiment can be applied to terminal devices (such as smart phones, tablet computers, desktop computers, wearable devices and the like) with video call functions.
As shown in fig. 1, the method comprises the following steps S101-S103:
in step S101, in response to receiving an image acquisition request sent by an application program, a current image video stream is acquired.
In this embodiment, the terminal device may obtain a current image video stream, that is, an image video stream transmitted in a current video communication process, in response to receiving an image obtaining request sent by an application program. The application program may be a third-party application program that calls a shooting function in the terminal device.
For example, when a user of a terminal device performs a video call with another user at a communication peer through an application program, a video call function is triggered in the application program, so that the application program sends an image acquisition request to the terminal device, and the terminal device can acquire a current image video stream in response to the request.
In step S102, processing an image in the current image video stream based on an image processing policy corresponding to the current video call scene.
In this embodiment, after the terminal device obtains the current image video stream in response to receiving the image obtaining request sent by the application program, the terminal device may process the image in the current image video stream based on the image processing policy corresponding to the current video call scene.
For example, the terminal device may determine a current video call scene based on a preset manner, and further may match a corresponding image processing policy based on the current video call scene, so as to process an image in the current image video stream based on the image processing policy corresponding to the current video call scene.
The preset manner of determining the current video call scene may be selected from the related art based on actual needs, which is not limited in this embodiment. The embodiment shown in fig. 3 below gives an exemplary manner of determining the current video call scene, which is not detailed here.
In an embodiment, the terminal device may determine an image processing policy corresponding to a current video call scene based on a configuration file acquired and stored in advance, and further process an image in the current image video stream based on the image processing policy. It can be understood that the configuration file may record image processing policies corresponding to various preset video call scenes, and then the terminal device may determine a corresponding image processing policy based on the content recorded in the configuration file after determining the current video call scene, so that the image processing policy may be implemented in a targeted manner based on the current video call scene, and the quality of the video call image may be improved.
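A configuration file of this kind amounts to a scene-to-policy lookup table. The sketch below illustrates the idea; the scene names and policy fields are invented for illustration and are not taken from the patent.

```python
# Hypothetical policy table mirroring the configuration file described above.
# All keys and parameter values are illustrative assumptions.
SCENE_POLICIES = {
    "bright": {"frame_rate_action": "reduce", "exposure": "long+short", "fusion": True},
    "dim": {"frame_rate_action": "increase", "exposure": "target", "post": ["brighten", "denoise"]},
}

def policy_for_scene(scene: str) -> dict:
    """Look up the image processing policy recorded for the detected video call scene."""
    try:
        return SCENE_POLICIES[scene]
    except KeyError:
        raise ValueError(f"no image processing policy configured for scene {scene!r}")
```

Once the current scene is determined, the terminal simply queries this mapping instead of re-deriving processing parameters per frame.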
In another embodiment, the above-mentioned manner of processing the images in the current image video stream can also be referred to the following embodiments shown in fig. 4 or fig. 5, and will not be described in detail herein.
In step S103, the processed image is returned to the application program for display.
In this embodiment, after processing the image in the current image video stream based on the image processing policy corresponding to the current video call scene, the processed image may be returned to the application program for display.
For example, after the terminal device processes the images in the current image video stream based on the image processing policy, the processed images may be returned to the application program for real-time display. The processed images may be fed back to the application currently conducting the video call through the preset functional data stream channel that manages the camera (for example, the camera pipeline), so that the application renders them for display. That is, the image displayed on the application's video call interface is the preview image processed by the image processing policy.
As can be seen from the above description, the method of this embodiment acquires a current image video stream in response to receiving an image acquisition request sent by an application program, processes the images in the current image video stream based on the image processing policy corresponding to the current video call scene, and returns the processed images to the application program for display. Determining the image processing policy from the current video call scene makes the policy more targeted, improves the quality of the processed images, meets the requirement of the user on the quality of video call images, and improves the video call experience of the user.
Fig. 2 is a flowchart illustrating a video call image processing method according to yet another exemplary embodiment of the present disclosure; the method of the embodiment can be applied to terminal devices (such as smart phones, tablet computers, desktop computers, wearable devices and the like) with video call functions.
As shown in fig. 2, the method comprises the following steps S201-S204:
in step S201, in response to receiving an image acquisition request transmitted by an application, it is determined whether the application satisfies a setting condition.
In this embodiment, after the terminal device receives the image acquisition request sent by the application program, it may be determined whether the application program satisfies the setting condition.
The set condition may be chosen based on actual service needs, for example that the usage frequency of the application is higher than or equal to a set threshold, or that the application (for example, its identification information) is in a pre-built application whitelist. Applications to which the video call image processing scheme of this embodiment applies can then be recognized based on the set condition, improving the practicability of the scheme. The whitelist records the applications that support the image processing policy of this embodiment. The identification information of an application is information that uniquely corresponds to it and distinguishes it from other applications, such as its ID; this embodiment does not limit its form.
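The whitelist variant of the set condition reduces to a membership check on the application's identification information. The sketch below is illustrative only; the package names are made up, and a real whitelist would be loaded from system configuration rather than hard-coded.

```python
# Hypothetical pre-built whitelist of applications that support the
# image processing policy; the entries are invented for illustration.
APP_WHITELIST = {"com.example.videochat", "com.example.meeting"}

def app_satisfies_condition(app_id: str) -> bool:
    """An application satisfies the set condition when its unique
    identification information is found in the pre-built whitelist."""
    return app_id in APP_WHITELIST
```

Only when this check passes does the terminal proceed to acquire the current image video stream from the system video frame layer.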
In step S202, in response to determining that the application satisfies the setting condition, a current image video stream is acquired.
In this embodiment, after the terminal device determines that the application program satisfies the setting condition, the current image video stream may be acquired.
In step S203, processing an image in the current image video stream based on an image processing policy corresponding to the current video call scene.
In step S204, the processed image is returned to the application program for display.
For the explanation and explanation of steps S202 to S204, reference may be made to steps S101 to S103 in the embodiment shown in fig. 1, which is not repeated herein.
As can be seen from the above description, this embodiment determines whether the application satisfies the set condition and acquires the current image video stream only when it does. This ensures the smooth implementation of the video call image processing scheme, allows the image processing policy to be determined accurately and applied in a targeted manner, and thus improves the quality of the processed images, meets the requirement of the user on the quality of video call images, and improves the video call experience of the user.
FIG. 3 is a flowchart illustrating how an image processing policy corresponding to the current video call scenario is determined according to an exemplary embodiment of the present disclosure; the present embodiment takes how to determine the image processing policy corresponding to the current video call scene as an example based on the above embodiments. As shown in fig. 3, on the basis of the foregoing embodiment, the present embodiment further includes determining an image processing policy corresponding to the current video call scene based on the following steps S301 to S303:
in step S301, current image frame information is acquired.
In this embodiment, when the terminal device needs to determine the image processing policy corresponding to the current video call scene, the current image frame information may be acquired.
For example, when a user of a terminal device performs a video call with another user of a correspondent node through an application program, a video call function is triggered in the application program, so that the application program sends an image acquisition request to the terminal device, and the terminal device can acquire a current image video stream and current image frame information in response to the request. The current image frame information may be information of a current image frame displayed in a video call interface of an application program.
In an embodiment, the current image frame information may include at least one of an image sensor gain (Sensor Gain), an image signal processor gain (ISP Gain), an exposure value (EV), and a light sensitivity (ISO) value.
In step S302, a current video call scene is determined based on the current image frame information.
In this embodiment, after the terminal device obtains the current image frame information, the current video call scene may be determined based on the current image frame information.
In one embodiment, when the terminal device acquires current image frame information such as the sensor gain (Sensor Gain), the image signal processor gain (ISP Gain), the exposure value (EV), and the sensitivity (ISO) value in response to the image acquisition request, it may analyze this information and determine the current video call scene based on the analysis result.
For example, when the value of the current image frame information satisfies a preset bright scene condition, the current video call scene may be determined to be a bright scene; when it satisfies a preset dim light scene condition, the scene may be determined to be a dim light scene. Taking the ISO value as an example: when the ISO value is less than or equal to a set threshold (e.g., 800), the current video call scene may be determined to be a bright scene; conversely, when the ISO value is greater than the set threshold, it may be determined to be a dim light scene.
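The ISO-threshold example above can be expressed as a small classifier. The threshold of 800 comes from the example in the text; the function and scene names are assumptions for illustration.

```python
# Example threshold from the description: ISO <= 800 counts as a bright scene.
ISO_BRIGHT_THRESHOLD = 800

def classify_scene(iso: int) -> str:
    """Classify the current video call scene from the frame's ISO value:
    bright when at or below the threshold, dim light otherwise."""
    return "bright" if iso <= ISO_BRIGHT_THRESHOLD else "dim"
```

A production classifier would combine several of the frame-information values (Sensor Gain, ISP Gain, EV, ISO) rather than ISO alone, as the text notes.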
For another example, a deep learning model for determining a video call scene may be pre-constructed and trained, and then after the current image frame information is obtained, the information may be input into the trained deep learning model, and then the current video call scene may be determined based on an output result of the model. For example, the deep learning model may be obtained by training in advance based on a large amount of sample image frame information and a manual calibration manner, and a specific training process may refer to explanation and description in the related art, which is not limited in this embodiment.
It should be noted that the types of video call scenes may be set based on actual needs, for example as a bright light scene, a dim light scene, a portrait scene, a scenery scene, and the like, which is not limited in this embodiment.
In step S303, an image processing policy corresponding to the current video call scene is determined based on a correspondence relationship between a pre-constructed video call scene and the image processing policy.
In this embodiment, after the terminal device determines the current video call scene based on the current image frame information, the image processing policy corresponding to the current video call scene may be determined based on a correspondence between a pre-constructed video call scene and an image processing policy.
For example, the terminal device may store a correspondence between a pre-established video call scene and an image processing policy, where the correspondence includes image processing policies matched with various pre-established video call scenes determined by developers based on actual business experience and/or a large number of experiments, and then, after the current video call scene is determined, the image processing policy corresponding to the current video call scene may be queried based on the correspondence.
As can be seen from the above description, this embodiment acquires the current image frame information, determines the current video call scene based on that information, and then determines the corresponding image processing policy based on the pre-established correspondence between video call scenes and image processing policies. The current video call scene can thus be determined accurately, laying a foundation for accurately determining a targeted image processing policy, which effectively improves the quality of the processed images, meets the requirement of the user on the quality of video call images, and improves the video call experience of the user.
FIG. 4 is a flow chart illustrating how images in the current image video stream are processed according to an exemplary embodiment of the present disclosure; the present embodiment is based on the above embodiments and illustrates how to process the image in the current image video stream. As shown in fig. 4, when the current video call scene is a bright scene, the processing of the image in the current image video stream based on the image processing policy corresponding to the current video call scene in step S102 may include the following steps S401 to S403:
in step S401, the output frame rate of the image sensor is reduced from the current value to a first setting value.
In this embodiment, when the terminal device determines that the current video call scene meets the determination condition of the bright scene, the output frame rate (Framelength) of the image Sensor may be reduced from the current value to the first setting value. The first setting value may be set to a smaller value, such as 2000, based on actual needs, which is not limited in this embodiment.
In step S402, the image sensor is controlled to output a frame of long exposure image frame and a frame of short exposure image frame within each frame transmission time, respectively, based on the output frame rate of the first setting value. Wherein the exposure duration of the long-exposure image frame is longer than the exposure duration of the short-exposure image frame.
In this embodiment, after the terminal device reduces the output frame rate of the image Sensor from the current value to the first set value, the image Sensor may be controlled to output a frame of long exposure image frame and a frame of short exposure image frame within each frame of transmission time, respectively, based on the output frame rate of the first set value.
In step S403, the long exposure image frame and the short exposure image frame are subjected to fusion processing.
In this embodiment, after controlling the image sensor, based on the output frame rate of the first set value, to output one long exposure image frame and one short exposure image frame within each frame transmission time, the long exposure image frame and the short exposure image frame may be fused.
It can be understood that in bright light scenes such as outdoors, backlight is easy to occur, and the quality of video call images is affected. Therefore, in this embodiment, a frame of long exposure image frame and a frame of short exposure image frame are output within each frame of transmission time by controlling the image sensor to output the frame rate based on the first setting value, and then the long exposure image frame and the short exposure image frame within each frame of transmission time are fused subsequently, so as to improve the quality of the video call image based on the fused image frames.
For example, assume that the current setting parameter (Sensor setting) of the image Sensor is 30fps; that is, the image Sensor is controlled to output one long-exposure image frame and one short-exposure image frame within each frame transmission time (i.e., 33 ms). In other words, each image acquisition request acquires two image frames (one long-exposure and one short-exposure), which are output to the platform end within the same exposure period (33 ms) for subsequent fusion processing. It can be understood that by fusing the long-exposure and short-exposure image frames, the high dynamic range of the image video stream can be adjusted, so that an image video stream with improved quality in backlit scenes can be output.
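The fusion of a long-exposure and a short-exposure frame can be illustrated with a minimal sketch. This is a pure-Python illustration only: the function name, the exposure ratio, and the saturation threshold are assumptions for 8-bit grayscale values, not part of the disclosure, and a real implementation runs inside the ISP/platform layer.

```python
def fuse_long_short(long_frame, short_frame, exposure_ratio=4.0, threshold=200):
    """Naive exposure-fusion sketch: use the long-exposure pixel unless it
    is near saturation, in which case fall back to the short-exposure pixel
    scaled by the exposure ratio (the illustrative long/short time ratio)."""
    fused = []
    for lo, sh in zip(long_frame, short_frame):
        if lo >= threshold:  # highlight clipped in the long frame (e.g. a backlit window)
            fused.append(min(255, int(sh * exposure_ratio)))
        else:
            fused.append(lo)
    return fused

# 8-bit grayscale rows: the long frame clips the backlit region, the short frame keeps it
long_row = [40, 120, 255, 255]
short_row = [10, 30, 52, 60]
print(fuse_long_short(long_row, short_row))  # [40, 120, 208, 240]
```

The shadows keep the well-exposed long-frame values while the clipped highlights are recovered from the short frame, which is the effect the fusion step relies on to extend dynamic range.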
As can be seen from the above description, in this embodiment the output frame rate of the image sensor is reduced from its current value to the first set value; based on that frame rate, the sensor outputs a long-exposure image frame and a short-exposure image frame within each frame transmission time, and the two frames are fused. In this way, when the current video call scene is a bright light scene, the image in the current image video stream is processed with the image processing policy corresponding to that scene.
FIG. 5 is a flow chart showing how images in the current image video stream are processed according to yet another exemplary embodiment of the present disclosure; the present embodiment is based on the above embodiments and illustrates how to process the image in the current image video stream. As shown in fig. 5, when the current video call scene is a dim light scene, the processing of the image in the current image video stream based on the image processing policy corresponding to the current video call scene in step S102 may include the following steps S501 to S503:
in step S501, the output frame rate of the image sensor is increased from the current value to a second set value.
In this embodiment, when the terminal device determines that the current video call scene satisfies the determination condition of the dim light scene, the output frame rate (Framelength) of the image Sensor may be increased from the current value to the second setting value. The second setting value may be set to a larger value, such as 4000, based on actual needs, which is not limited in this embodiment.
In step S502, the image sensor is controlled to output a target exposure image frame of a target exposure duration for each frame transmission time based on the output frame rate of the second setting value. And the target exposure duration is the exposure duration meeting the current environment brightness condition.
In this embodiment, after the terminal device increases the output frame rate of the image Sensor from the current value to the second set value, the image Sensor may be controlled, based on that output frame rate, to output one target exposure image frame of the target exposure duration within each frame transmission time. The target exposure duration is an exposure duration that satisfies the current ambient brightness condition, that is, one that makes the picture appear neither too bright nor too dark under the current ambient brightness; unlike the embodiment shown in fig. 4, no separate long-exposure and short-exposure image frames are output.
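One way to read "an exposure duration meeting the current ambient brightness condition" is an auto-exposure rule that lengthens exposure as the scene darkens, capped by the per-frame transmission time so one target-exposure frame still fits in each frame period. The sketch below is illustrative only: the lux-based rule and the constant `k` are assumptions, not from the disclosure.

```python
def target_exposure_ms(ambient_lux, frame_time_ms=33.0, k=1000.0):
    """Pick an exposure time inversely proportional to ambient brightness,
    clamped to the frame transmission time (33 ms at 30fps)."""
    if ambient_lux <= 0:
        return frame_time_ms  # no light reading: expose for the whole frame period
    return min(frame_time_ms, k / ambient_lux)

print(target_exposure_ms(50))    # dim indoor scene -> 20.0 ms
print(target_exposure_ms(5000))  # bright scene -> 0.2 ms
```

Clamping to the frame period reflects the constraint in this step: exactly one target-exposure frame is produced per frame transmission time.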
In step S503, the target exposure image frame is processed in a preset manner.
The preset mode comprises at least one of brightness improvement and noise reduction.
In this embodiment, after controlling the image sensor to output a target exposure image frame of the target exposure duration within each frame transmission time based on the output frame rate of the second set value, at least one of brightness enhancement and noise reduction may be performed on the target exposure image frame.
For example, still assuming that the current Sensor setting is 30fps, one target exposure image frame of the target exposure duration is output within each frame transmission time (i.e., 33 ms); that is, each image acquisition request acquires one such frame. The frame can then be output to the platform end within one exposure period (33 ms) for subsequent brightness enhancement and/or noise reduction, for example by enabling the Video NR noise reduction module and sending the frame to an AI noise reduction module, so as to output an image video stream that improves a dim indoor environment and/or reduces noise.
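The "brightness enhancement and/or noise reduction" applied to the target exposure frame can be sketched as a digital gain followed by a temporal blend with the previous frame, a common video noise-reduction idea. The function name, gain, and blend factor are illustrative assumptions; the disclosure's Video NR and AI noise reduction modules are not specified at this level of detail.

```python
def enhance_dim_frame(pixels, gain=1.8, prev_frame=None, blend=0.5):
    """Dim-light sketch: apply a digital gain for brightness, then (if a
    previous frame is available) blend temporally to suppress noise."""
    boosted = [min(255, int(p * gain)) for p in pixels]  # brightness enhancement, clipped to 8-bit
    if prev_frame is None:
        return boosted
    # temporal noise reduction: average with the previous (already processed) frame
    return [int(blend * b + (1 - blend) * q) for b, q in zip(boosted, prev_frame)]

print(enhance_dim_frame([20, 40, 60], gain=2.0))                             # [40, 80, 120]
print(enhance_dim_frame([20, 40, 60], gain=2.0, prev_frame=[42, 78, 122]))   # [41, 79, 121]
```

In a real pipeline the blend weight would adapt to motion so that moving regions are not ghosted; this sketch keeps it constant for clarity.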
As can be seen from the above description, in this embodiment the output frame rate of the image sensor is increased from the default value to the second set value; based on that frame rate, the sensor outputs one target exposure image frame of the target exposure duration within each frame transmission time, and the frame is then processed in the preset manner. In this way, when the current video call scene is a dim light scene, the image in the current image video stream is processed with the image processing policy corresponding to that scene. Because a processing policy specific to dim light scenes is used, the processing quality of video call images in such scenes is improved, the user's requirement on video call image quality is met, and the user's video call experience is enhanced.
The following describes a video call image processing method according to the disclosed embodiment, taking a WeChat video call as an example.
Step 1: when a user opens WeChat and starts a video call, the operating system of the terminal device first judges whether the WeChat application is in a preset application program white list. If the application is in the white list, the camera module can be called (open camera), configuration information and attributes are set for the camera data stream (configStream), and the HDR mode (hdrMode) is set to true. Conversely, if the application is not in the white list, it temporarily does not support the image processing policy of this embodiment.
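The whitelist gate in step 1 can be sketched as follows. The package names and the configuration dictionary are hypothetical illustrations; in practice this check runs inside the operating system's camera service when the stream is configured.

```python
# Illustrative whitelist of applications that support this image processing policy
VIDEO_CALL_WHITELIST = {"com.tencent.mm", "com.example.videochat"}

def configure_stream(package_name):
    """Return the camera-stream configuration for an app: hdrMode is set to
    true only for whitelisted video-call applications (step 1 above)."""
    return {"hdrMode": package_name in VIDEO_CALL_WHITELIST}

print(configure_stream("com.tencent.mm"))  # {'hdrMode': True}
```

The system bottom layer in step 2 then only needs to read this single flag (via getKey) to decide whether the processing path of this embodiment applies.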
Step 2: the system bottom layer of the terminal device can judge whether the value of the hdrMode is true through the getKey method, and if the value of the hdrMode is true, the video call is determined to adopt the video call image processing method of the embodiment of the disclosure.
Step 3: when an image acquisition request (e.g., captureRequest) of the WeChat application is received, the system image framework layer of the terminal device may acquire current picture frame information through a scene recognition module (e.g., AIDetectModule), such as at least one of the image sensor gain (Sensor Gain), the image signal processor gain (ISP Gain), the exposure value (EV) and the sensitivity (ISO) value, and analyze this information to determine whether the current video call scene is a bright scene, a dim light scene, or the like. Different image processing policies may then be invoked according to the determination result. For example, when the ISO, Sensor Gain or ISP Gain satisfies the bright scene condition, the scene is judged to be a bright scene and the image processing strategy corresponding to a bright scene is adopted, as shown in step 4 below; conversely, when the ISO, Sensor Gain or ISP Gain satisfies the dim light scene condition, the scene is judged to be a dim light scene and the image processing strategy corresponding to a dim light scene is adopted, as shown in step 5 below.
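The scene judgment in step 3 can be sketched as a threshold comparison on the frame statistics. The thresholds and the gain-to-ISO conversion below are illustrative assumptions; the disclosure only requires that ISO, Sensor Gain or ISP Gain be compared against preset bright- and dim-scene conditions.

```python
def classify_scene(iso=None, sensor_gain=None, isp_gain=None,
                   bright_iso=200, dim_iso=1600):
    """Classify the current video call scene from frame statistics.
    Any supplied gain is converted to an equivalent ISO (ISO 100 at 1x gain
    is an assumption) and the largest equivalent value is compared against
    illustrative bright/dim thresholds."""
    iso_value = iso if iso is not None else 0
    if sensor_gain is not None:
        iso_value = max(iso_value, int(sensor_gain * 100))
    if isp_gain is not None:
        iso_value = max(iso_value, int(isp_gain * 100))
    if iso_value and iso_value <= bright_iso:
        return "bright"   # low ISO/gain: plenty of light
    if iso_value >= dim_iso:
        return "dim"      # high ISO/gain: the AE loop is compensating for darkness
    return "normal"

print(classify_scene(iso=100))   # bright
print(classify_scene(iso=3200))  # dim
```

Low gain means the exposure system needed little amplification (a bright scene), while high gain indicates it is compensating for darkness, which is why these statistics suffice for the bright/dim decision.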
Step 4: when a bright scene such as outdoors is determined, the output frame rate (framelength) of the image Sensor may be set to its minimum, for example 2000; the image Sensor then outputs one long exposure frame and one short exposure frame within each frame transmission time. In addition, a preset automatic exposure parameter (e.g., AE_TARGET_MODE) may be set to true, so that the video frame layer (P1Node), reading that the parameter is true, performs fusion processing on the long exposure frame and the short exposure frame output by the Sensor, thereby adjusting the high dynamic range of the video preview stream and outputting an image video stream that improves the current scene.
Step 5: when a dim scene such as indoors is determined, the parameter AE_TARGET_MODE may be set to false, and the output frame rate (framelength) of the image Sensor may be increased to a larger value, such as 4000; the image Sensor then outputs one target exposure image frame of the target exposure duration within each frame transmission time (the target exposure duration being an exposure duration that satisfies the current ambient brightness condition). The Video frame layer (P1Node), reading that AE_TARGET_MODE is false, enables Video NR noise reduction and sends the frame to the AI noise reduction module for brightness enhancement and/or noise reduction, thereby outputting an image video stream that improves the current scene.
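Steps 4 and 5 amount to a dispatch from the detected scene to a sensor/pipeline configuration. The sketch below summarizes that mapping with the values named in the example; the dictionary keys are illustrative stand-ins for the actual sensor registers and P1Node parameters.

```python
def apply_strategy(scene, sensor_config):
    """Dispatch sketch mapping the detected scene to the image processing
    strategy of steps 4 and 5. Keys are illustrative, values follow the
    example (framelength 2000 for bright, 4000 for dim)."""
    if scene == "bright":
        sensor_config["frame_length"] = 2000     # minimum framelength: long + short frames per period
        sensor_config["ae_target_mode"] = True   # P1Node fuses the long/short pair (HDR path)
    elif scene == "dim":
        sensor_config["frame_length"] = 4000     # larger framelength: one target-exposure frame
        sensor_config["ae_target_mode"] = False  # P1Node routes the frame to Video NR / AI NR
    return sensor_config

print(apply_strategy("bright", {}))  # {'frame_length': 2000, 'ae_target_mode': True}
```

Keeping the decision in one dispatch point mirrors the disclosure's design: a single flag (AE_TARGET_MODE) tells the downstream frame layer which processing branch to take.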
Step 6: the processed image video stream can then flow back along the camera pipeline to the WeChat application and be displayed as a textView image. It can be understood that the image displayed at this time is a preview image processed by the image processing strategy, and can be previewed by the user in real time.
It should be noted that VoLTE video call images are processed in the same way as the WeChat video call images described above, and the details are not repeated herein.
The video call image processing method of the present disclosure intercepts and processes the image video stream at the system bottom layer of the terminal device during a video call. It is not limited by data format type, processes the image video stream at a relatively early stage of the pipeline, and is not tied to any particular application program; it is therefore applicable to the video call functions of various third-party applications, ensuring the reliability and wide applicability of the video call image processing function. As a result, it can improve the image quality of the image video stream for various third-party video calls and VoLTE video calls, providing users with a high-quality video chat experience.
FIG. 6 is a block diagram illustrating a video call image processing apparatus in accordance with an exemplary embodiment; the apparatus of the embodiment can be applied to a terminal device (e.g., a smart phone, a tablet computer, a desktop computer, a wearable device, etc.) having a video call function. As shown in fig. 6, the apparatus may include: a video stream acquisition module 110, an image processing module 120, and an image return module 130, wherein:
a video stream acquiring module 110, configured to acquire a current image video stream in response to receiving an image acquisition request sent by an application program;
an image processing module 120, configured to process an image in the current image video stream based on an image processing policy corresponding to a current video call scene;
and an image returning module 130, configured to return the processed image to the application program for display.
As can be seen from the above description, the apparatus of this embodiment obtains a current image video stream in response to receiving an image obtaining request sent by an application program, processes an image in the current image video stream based on an image processing policy corresponding to a current video call scene, and then returns the processed image to the application program for display.
Fig. 7 is a block diagram illustrating a video call image processing apparatus according to yet another exemplary embodiment; the apparatus of the embodiment can be applied to terminal devices (e.g., smart phones, tablet computers, desktop computers, wearable devices, etc.) with video call functions. The video stream acquiring module 210, the image processing module 220, and the image returning module 230 have the same functions as the video stream acquiring module 110, the image processing module 120, and the image returning module 130 in the embodiment shown in fig. 6, and are not described herein again. As shown in fig. 7, the video stream acquiring module 210 may be further configured to acquire a current image video stream based on the system video frame layer in response to determining that the application satisfies the setting condition.
In one embodiment, the application satisfying the set condition may include:
the application program is in a pre-constructed application program white list, and the white list is used for recording the application programs supporting the image processing strategy.
In an embodiment, the apparatus may further include a processing policy determining module 240;
the processing policy determination module 240 may include:
a frame information obtaining unit 241 for obtaining current image frame information;
a scene determining unit 242, configured to determine a current video call scene based on the current image frame information;
the policy determining unit 243 is configured to determine an image processing policy corresponding to the current video call scene based on a correspondence between a pre-constructed video call scene and the image processing policy.
In an embodiment, the current image frame information may include at least one of an image sensor gain (Sensor Gain), an image signal processor gain (ISP Gain), an exposure value (EV), and a sensitivity (ISO) value.
In an embodiment, the scene determining unit 242 may be further configured to at least one of:
determining the current video call scene as a bright scene in response to the fact that the numerical value of the current image frame information meets a preset bright scene condition;
and in response to determining that the value of the current image frame information meets a preset dim light scene condition, determining the current video call scene as a dim light scene.
In an embodiment, in the case that the current video call scene is a bright scene, the image processing module 220 may include:
a frame rate reduction unit 221 configured to reduce an output frame rate of the image sensor from a default value to a first set value;
a long and short frame output unit 222, configured to control the image sensor, based on the output frame rate of the first setting value, to output a long exposure image frame and a short exposure image frame within each frame transmission time, where an exposure duration of the long exposure image frame is longer than an exposure duration of the short exposure image frame;
and a fusion processing unit 223, configured to perform fusion processing on the long-exposure image frame and the short-exposure image frame.
In an embodiment, in a case that the current video call scene is a dim light scene, the image processing module 220 may include:
a frame rate increasing unit 224 for increasing the output frame rate of the image sensor from a default value to a second set value;
a target frame output unit 225, configured to control the image sensor to output a target exposure image frame with a target exposure duration in each frame transmission time based on the output frame rate of the second set value, where the target exposure duration is an exposure duration meeting the current ambient brightness condition;
the preset processing unit 226 is configured to perform processing on the target exposure image frame in a preset manner, where the preset manner includes at least one of luminance enhancement and noise reduction.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 8 is a block diagram of an electronic device shown in accordance with an example embodiment. For example, the device 900 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, and the like.
Referring to fig. 8, device 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia components 908 include a screen that provides an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 further includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the device 900. For example, the sensor component 914 may detect an open/closed state of the device 900 and the relative positioning of components (such as the display and keypad of the device 900); it may also detect a change in the position of the device 900 or a component of the device 900, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and a change in the temperature of the device 900. The sensor assembly 914 may also include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the device 900 and other devices in a wired or wireless manner. The device 900 may access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A video call image processing method, the method comprising:
in response to receiving an image acquisition request sent by an application program, acquiring a current image video stream;
processing the image in the current image video stream based on an image processing strategy corresponding to the current video call scene;
and returning the processed image to the application program for displaying.
2. The method of claim 1, wherein the obtaining a current image video stream comprises:
and in response to determining that the application program meets the set condition, acquiring the current image video stream based on the system video frame layer.
3. The method of claim 2, wherein the application satisfying the set condition comprises:
the application program is in a pre-constructed application program white list, and the white list is used for recording the application programs supporting the image processing strategy.
4. The method of claim 1, further comprising determining an image processing policy corresponding to the current video call scenario based on:
acquiring current image frame information;
determining a current video call scene based on the current image frame information;
and determining an image processing strategy corresponding to the current video call scene based on the corresponding relation between the pre-constructed video call scene and the image processing strategy.
5. The method according to claim 4, wherein the current image frame information includes at least one of an image Sensor Gain, an image signal processor Gain, ISP Gain, an exposure EV, and a sensitivity ISO value.
6. The method of claim 4, wherein determining a current video call scene based on the current image frame information comprises at least one of:
determining the current video call scene as a bright scene in response to determining that the value of the current image frame information meets a preset bright scene condition;
and in response to determining that the value of the current image frame information meets a preset dim light scene condition, determining the current video call scene as a dim light scene.
7. The method according to claim 1, wherein in a case that the current video call scene is a bright scene, the processing the image in the current image video stream based on the image processing policy corresponding to the current video call scene comprises:
reducing the output frame rate of the image sensor from a default value to a first set value;
controlling the image sensor to output a frame of long exposure image frame and a frame of short exposure image frame respectively within each frame of transmission time based on the output frame rate of the first set value, wherein the exposure duration of the long exposure image frame is longer than that of the short exposure image frame;
and carrying out fusion processing on the long exposure image frame and the short exposure image frame.
8. The method according to claim 1, wherein in a case that the current video call scene is a dim light scene, the processing the image in the current image video stream based on the image processing policy corresponding to the current video call scene comprises:
increasing the output frame rate of the image sensor from a default value to a second set value;
controlling the image sensor to output a target exposure image frame of a frame of target exposure duration within each frame of transmission time based on the output frame rate of the second set value, wherein the target exposure duration is the exposure duration meeting the current environment brightness condition;
and processing the target exposure image frame in a preset mode, wherein the preset mode comprises at least one of brightness improvement and noise reduction.
9. An image processing apparatus for video call, the apparatus comprising:
the video stream acquisition module is used for responding to an image acquisition request sent by a received application program and acquiring a current image video stream;
the image processing module is used for processing the images in the current image video stream based on an image processing strategy corresponding to the current video call scene;
and the image returning module is used for returning the processed image to the application program for displaying.
10. The apparatus of claim 9, wherein the video stream acquiring module is further configured to acquire the current image video stream based on a system video frame layer in response to determining that the application satisfies a set condition.
11. The apparatus of claim 10, wherein the application satisfying a set condition comprises:
the application program is in a pre-constructed application program white list, and the white list is used for recording the application programs supporting the image processing strategy.
12. The apparatus of claim 9, further comprising a processing policy determination module;
the processing strategy determination module comprises:
the frame information acquisition unit is used for acquiring current image frame information;
a scene determining unit, configured to determine a current video call scene based on the current image frame information;
and the strategy determining unit is used for determining the image processing strategy corresponding to the current video call scene based on the corresponding relation between the pre-constructed video call scene and the image processing strategy.
13. The apparatus of claim 12, wherein the current image frame information comprises at least one of an image Sensor Gain, an image signal processor Gain ISP Gain, an exposure EV, and a sensitivity ISO value.
14. The apparatus of claim 12, wherein the scene determination unit is further configured to at least one of:
determining the current video call scene as a bright scene in response to determining that the value of the current image frame information meets a preset bright scene condition;
and in response to determining that the value of the current image frame information meets a preset dim light scene condition, determining the current video call scene as a dim light scene.
15. The apparatus of claim 9, wherein in the case that the current video call scene is a bright scene, the image processing module comprises:
a frame rate reduction unit for reducing an output frame rate of the image sensor from a default value to a first set value;
a long and short frame output unit, configured to control the image sensor, based on the output frame rate of the first setting value, to output a long exposure image frame and a short exposure image frame within each frame transmission time, wherein an exposure duration of the long exposure image frame is longer than an exposure duration of the short exposure image frame;
and the fusion processing unit is used for carrying out fusion processing on the long exposure image frame and the short exposure image frame.
16. The apparatus of claim 9, wherein, in the case that the current video call scene is a dim light scene, the image processing module comprises:
a frame rate increasing unit, configured to increase the output frame rate of the image sensor from a default value to a second set value;
a target frame output unit, configured to control the image sensor, based on the output frame rate of the second set value, to output one target-exposure image frame with a target exposure duration within each frame transmission time, where the target exposure duration is an exposure duration that meets the current ambient brightness condition; and
a preset processing unit, configured to process the target-exposure image frame in a preset manner, where the preset manner comprises at least one of brightness enhancement and noise reduction.
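Claim 16's preset processing unit applies brightness enhancement and/or noise reduction to the dim-scene frame. The sketch below shows a minimal version of both steps on a flat list of pixel values; the gain factor and the 3-tap moving-average denoiser are illustrative choices, not methods named in the patent.

```python
def enhance_dim_frame(pixels, gain=1.5):
    """Brightness lift followed by a simple 3-tap moving-average denoise.

    `pixels` is a flat list of 0-255 values; `gain` is an assumed
    brightness factor (the patent specifies no concrete method).
    """
    # Brightness enhancement: multiply and clamp to the 8-bit range.
    lifted = [min(255, round(p * gain)) for p in pixels]
    # Noise reduction: average each pixel with its immediate neighbours.
    denoised = []
    for i in range(len(lifted)):
        window = lifted[max(0, i - 1): i + 2]
        denoised.append(round(sum(window) / len(window)))
    return denoised
```

A production pipeline would use edge-preserving filters (e.g. bilateral or temporal denoising) rather than a plain moving average, which blurs detail.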
17. An electronic device, characterized in that the device comprises:
a processor and a memory for storing a computer program;
wherein the processor is configured to, when executing the computer program, implement:
acquiring a current image video stream in response to receiving an image acquisition request sent by an application program;
processing an image in the current image video stream based on an image processing strategy corresponding to the current video call scene; and
returning the processed image to the application program for display.
18. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements:
acquiring a current image video stream in response to receiving an image acquisition request sent by an application program;
processing an image in the current image video stream based on an image processing strategy corresponding to the current video call scene; and
returning the processed image to the application program for display.
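The three steps recited in claims 17 and 18 (acquire the stream, apply the scene-matched strategy, return the result) can be sketched as a small dispatch loop. The function and parameter names below are assumptions for illustration; the patent describes the flow abstractly without naming any API.

```python
def process_video_stream(frames, classify_scene, policies):
    """Apply the scene-matched image processing strategy to each frame.

    `frames` is an iterable of frames, `classify_scene` maps a frame to a
    scene label, and `policies` maps scene labels to processing callables.
    Frames with no matching policy pass through unchanged.
    """
    processed = []
    for frame in frames:
        scene = classify_scene(frame)
        strategy = policies.get(scene, lambda f: f)  # pass-through fallback
        processed.append(strategy(frame))
    return processed
```

In the patent's terms, the per-scene `policies` would hold the bright-scene fusion path of claim 15 and the dim-scene enhancement path of claim 16.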
CN202111074339.2A 2021-09-14 2021-09-14 Video call image processing method, device, equipment, system and storage medium Pending CN115811594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111074339.2A CN115811594A (en) 2021-09-14 2021-09-14 Video call image processing method, device, equipment, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111074339.2A CN115811594A (en) 2021-09-14 2021-09-14 Video call image processing method, device, equipment, system and storage medium

Publications (1)

Publication Number Publication Date
CN115811594A true CN115811594A (en) 2023-03-17

Family

ID=85481412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111074339.2A Pending CN115811594A (en) 2021-09-14 2021-09-14 Video call image processing method, device, equipment, system and storage medium

Country Status (1)

Country Link
CN (1) CN115811594A (en)

Similar Documents

Publication Publication Date Title
US10674088B2 (en) Method and device for acquiring image, terminal and computer-readable storage medium
CN106210797B (en) Network live broadcast method and device
CN107480785B (en) Convolutional neural network training method and device
CN107463052B (en) Shooting exposure method and device
CN109451341B (en) Video playing method, video playing device, electronic equipment and storage medium
CN112331158B (en) Terminal display adjusting method, device, equipment and storage medium
CN108600503B (en) Voice call control method and device
CN112669231B (en) Image processing method, training method, device and medium of image processing model
CN107809588B (en) Monitoring method and device
CN113241044B (en) Screen brightness adjusting method, device, equipment and storage medium
CN106775246B (en) Screen brightness adjusting method and device
CN106535147B (en) Communication signal processing method and device
CN114125528B (en) Video special effect processing method and device, electronic equipment and storage medium
CN114442792A (en) Method and device for adjusting operating frequency of processor and storage medium
CN110213531B (en) Monitoring video processing method and device
CN113315903B (en) Image acquisition method and device, electronic equipment and storage medium
CN111277754B (en) Mobile terminal shooting method and device
CN115811594A (en) Video call image processing method, device, equipment, system and storage medium
CN114442789A (en) Dark screen control method, device, equipment and storage medium
CN112086075B (en) Screen display parameter adjusting method and device and storage medium
CN108769513B (en) Camera photographing method and device
CN110263211B (en) Resource synchronization method and device
CN107707819B (en) Image shooting method, device and storage medium
CN112099364A (en) Intelligent interaction method for Internet of things household equipment
CN115695650B (en) Indoor and outdoor scene switching identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination