CN113031839A - Image processing method, device, equipment and medium in video call - Google Patents

Image processing method, device, equipment and medium in video call

Info

Publication number
CN113031839A
CN113031839A (application number CN202110206754.2A)
Authority
CN
China
Prior art keywords
image
target
target object
target image
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110206754.2A
Other languages
Chinese (zh)
Other versions
CN113031839B (en)
Inventor
莫铭锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202110206754.2A priority Critical patent/CN113031839B/en
Publication of CN113031839A publication Critical patent/CN113031839A/en
Application granted granted Critical
Publication of CN113031839B publication Critical patent/CN113031839B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F 3/0346 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the disclosure relate to an image processing method, apparatus, device and medium in a video call. The method includes: acquiring a target image captured by an image acquisition device and identifying a target object on the target image, where the target image is captured by the image acquisition device while a video call is established; acquiring a reference model corresponding to the target object; comparing features of the target object and the reference model, and determining an adjustment angle of the target object on the target image based on the feature comparison result; and adjusting the target image based on the adjustment angle and sending the adjusted target image to a receiving end. The embodiments provide a new, image-recognition-based method for adjusting video images during a video call, reducing the dependence on sensors with an angle-measuring function in the terminal during image adjustment and improving the picture effect seen by the video receiver on the video interface.

Description

Image processing method, device, equipment and medium in video call
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, device, and medium in a video call.
Background
The development of video call technology has reduced the cost of communication, enabling people to see each other at any time and in any place and reducing the influence of distance on face-to-face communication.
However, during a video call, the participants' terminals may be in different usage states, so the video picture displayed on a terminal may be deflected. For example, when a user reads an electronic book or plays a game during a video call, that terminal is in a landscape display state while the receiving terminal remains in its normal usage state, e.g., a portrait display state. After the sending terminal transmits the image collected by its camera to the receiving terminal, the receiving terminal displays the image sideways, so the picture effect is poor.
Disclosure of Invention
In order to solve the technical problem or at least partially solve the technical problem, embodiments of the present disclosure provide an image processing method, apparatus, device and medium in a video call.
In a first aspect, an embodiment of the present disclosure provides an image processing method in a video call, including:
acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call;
acquiring a reference model corresponding to the target object;
comparing the characteristics of the target object and the reference model, and determining the adjustment angle of the target object on the target image based on the characteristic comparison result;
and adjusting the target image based on the adjustment angle, and sending the adjusted target image to a receiving end.
In a second aspect, an embodiment of the present disclosure further provides an image processing apparatus in a video call, including:
the image acquisition and identification module is used for acquiring a target image acquired by the image acquisition device and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call;
a reference model obtaining module, configured to obtain a reference model corresponding to the target object;
the adjustment angle determining module is used for comparing the characteristics of the target object and the reference model and determining the adjustment angle of the target object on the target image based on the characteristic comparison result;
and the image adjusting and sending module is used for adjusting the target image based on the adjusting angle and sending the adjusted target image to a receiving end.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a memory and a processor, where: the memory stores a computer program, and when the computer program is executed by the processor, the electronic device is enabled to implement any one of the image processing methods in video call provided by the embodiments of the present disclosure.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a computing device, the computer program causes the computing device to implement any one of the image processing methods in video call provided in the embodiments of the present disclosure.
Compared with the prior art, the technical solution provided by the embodiments of the present disclosure has at least the following advantages. A reference model is preset for determining the adjustment angle of a target object on a target image acquired by an image acquisition device during a video call. If the feature comparison result between the target object on the target image and the reference model indicates that the display angle of the target object needs to be adjusted (that is, the adjustment angle is greater than zero), the target image is adjusted based on the adjustment angle and the adjusted target image is sent to the receiving end, so that the display angle of the target object displayed at the receiving end is not deflected. This provides a new, image-recognition-based method for adjusting video images during a video call, reduces the dependence on sensors with an angle-measuring function in the terminal during image adjustment, and improves the picture effect seen at the video receiving end: in most cases, the video receiver sees a video image in the normal picture direction on the video interface.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below; those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of an image processing method in a video call according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a terminal interface in a video call according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of another terminal interface in a video call provided in the embodiment of the present disclosure;
fig. 4 is a schematic diagram of a terminal interface in a video call according to another embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image processing apparatus in a video call according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Fig. 1 is a flowchart of an image processing method in a video call according to an embodiment of the present disclosure, where the method may be executed by an image processing apparatus in a video call, and the apparatus may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capability, for example, a mobile terminal, a tablet computer, a personal computer, and other user terminals.
As shown in fig. 1, an image processing method in a video call provided by an embodiment of the present disclosure may include:
s101, acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call.
In the embodiments of the disclosure, a video call between different terminals may be established on any instant messaging platform. Once a video call is established, there is a sending end and a receiving end of video images. The image acquisition device includes a front or rear camera integrated in the terminal, an independent camera connected to the terminal, or other equipment with an image acquisition function.
After the terminal acquires the target image acquired by the image acquisition device, a target object on the target image may be identified by using any available target identification method, such as a model with a target identification function, and the like, where the target object may include a target human face or a target environmental object, and the target environmental object may include, but is not limited to, a vehicle, a building, a road, and the like.
Optionally, identifying the target object on the target image includes: identifying the target object on the target image in response to the terminal screen being in a landscape display state. The landscape display state means that the horizontal size of the image displayed on the terminal screen is larger than the vertical size; correspondingly, the portrait display state means that the horizontal size is smaller than the vertical size. Either state can be determined by querying the Activity of the current interface through the terminal system interface. Given the characteristics of terminal screens, the portrait display state is the default usage state during a video call. If one terminal is in the landscape display state and therefore acquires the target image in landscape mode, the other terminal displays the received target image in an abnormal picture direction, for example with the face shown sideways; see the display effect of user B's face image in user A's terminal interface shown in fig. 3.
Therefore, the embodiments of the disclosure are applicable when a terminal in a landscape display state is in a video call with other terminals, for example when a user rotates the screen to landscape to read an electronic book, play a game or watch a video during the call. The adjustment angle of the target object is then determined by feature comparison between the target object on the target image and the reference model, and the target image is adjusted as a whole to correct the display angle of the target object, ensuring that the picture displayed at the receiving end looks natural.
Further, the embodiments of the present disclosure may also include the following. While the video call is established, the application running information recorded by the terminal system is acquired; this information may include, but is not limited to, the application name and its foreground or background running state. A target application running in the terminal is determined from this information, where the target application is an application other than the video call application, and may be a pre-specified application or the application currently running in the foreground. If the target application needs to run with the terminal screen in the landscape display state (as determined by how the application was developed), the operation of identifying the target object on the target image is performed. In other words, when the terminal screen is in the landscape display state, the terminal can decide, according to the currently running target application, whether to perform the adjustment of the target image automatically, avoiding interference with the target application and with the user experience during the video call. If the target application does not need to run in the landscape display state, a screen rotation prompt can be presented so that the user can manually rotate the terminal screen back to the portrait display state.
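The gating logic above (adjust automatically only when the screen is in landscape and the foreground target application requires landscape) can be sketched as follows. This is an illustrative sketch, not code from the patent; the application names and function names are assumptions.

```python
# Hypothetical set of applications known to require landscape mode.
LANDSCAPE_APPS = {"ebook_reader", "racing_game", "video_player"}

def is_landscape(width: int, height: int) -> bool:
    """Landscape display state: horizontal size exceeds vertical size."""
    return width > height

def should_adjust(width: int, height: int, foreground_app: str) -> bool:
    """Run the automatic image-adjustment pipeline only if the screen is in
    landscape AND the foreground app needs landscape; otherwise the terminal
    would instead prompt the user to rotate back to portrait."""
    return is_landscape(width, height) and foreground_app in LANDSCAPE_APPS

print(should_adjust(1920, 1080, "ebook_reader"))  # True: landscape + landscape app
print(should_adjust(1080, 1920, "ebook_reader"))  # False: portrait screen
```

In a real terminal, the width/height and foreground-app inputs would come from the system interfaces the patent mentions (the current Activity and the recorded application running information).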
And S102, acquiring a reference model corresponding to the target object.
The reference model is used to determine the adjustment angle of the target object on the target image acquired by the image acquisition device during the video call; in other words, the reference model has directivity and represents the standard display state of the target object at the receiving end in the video call state. The type of the reference model matches the type of the target object. For example, if the target object is a target face, the reference model may be a preset reference face in a standard position state in the video call state, or another reference object that can characterize the position state of the reference face. If the target object is a target environmental object, the reference model may be a preset reference environmental object associated with it and in a standard position state; for example, if the target environmental object includes a vehicle, building or road, the reference environmental object may likewise include a vehicle, building or road. The standard position state of the reference model refers to the position state in which a video image containing the reference model, as acquired by the image acquisition device, presents the normal picture direction in the specified display state of the terminal screen. Taking a terminal in the portrait display state under normal use (that is, the terminal screen is not upside down) as an example, a forward-facing face acquired by the image acquisition device can be used as the reference model; see the display effect of user A's face in user A's terminal interface in fig. 2 (or figs. 3 and 4).
S103, comparing the characteristics of the target object and the reference model, and determining the adjustment angle of the target object on the target image based on the characteristic comparison result.
The preset standard position state of the reference model in the video call state does not change with the landscape or portrait display state of the terminal screen. Through the feature comparison, the difference in display angle between the target object and the reference model on the receiving end's screen can be determined, and thus the adjustment angle of the target object.
For example, in the feature comparison between the target object and the reference model, whether the target image needs to be rotated may first be determined according to the display state of the terminal screen (landscape or portrait) when the image acquisition device captured the target image. If that display state is the same as the display state corresponding to the reference model, the display proportions of the target image and of the image corresponding to the reference model are considered the same (for example, both 3:4); the target image does not need to be rotated, and any available image recognition technology can directly identify the feature key points on the target object and the reference model and perform the feature comparison based on them. If the display states differ, for example the terminal screen was in the landscape display state when the target image was acquired while the reference model corresponds to the portrait display state, the display proportions differ (for example, 4:3 versus 3:4); the target image can then be rotated by 90 degrees so that its display proportion matches that of the image corresponding to the reference model, and the feature comparison between the target object and the reference model is performed on the rotated target image.
For example, it may also be pre-specified that the straight line on which a short side of the image lies (i.e., the side with the smaller size) is taken as the horizontal axis (or vertical axis) of the image coordinates, and the straight line on which a long side lies (i.e., the side with the larger size) as the vertical axis (or horizontal axis). That is, the image coordinate system of the target image and the image coordinate system corresponding to the reference model are established according to the same definition rule; feature key points are then identified on the target object and the reference model in their respective coordinate systems, and the feature comparison is performed based on the coordinates of the corresponding key points.
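The proportion-matching step described above can be sketched as follows. This is an illustrative sketch, not code from the patent; numpy arrays stand in for images, and the function name is an assumption.

```python
import numpy as np

def normalize_proportion(target: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Rotate the target image by 90 degrees when its display proportion
    (width:height) differs from the reference model's, so that the feature
    comparison is done under the same proportion (e.g. 3:4 vs 4:3)."""
    th, tw = target.shape[:2]
    rh, rw = reference.shape[:2]
    same = (tw > th) == (rw > rh)  # both landscape, or both portrait
    return target if same else np.rot90(target)

landscape = np.zeros((3, 4))   # 4:3 proportion, as when captured in landscape
portrait = np.zeros((4, 3))    # 3:4 proportion of the reference model's image
print(normalize_proportion(landscape, portrait).shape)  # (4, 3)
```

After this normalization, the key-point coordinates of the target object and the reference model live in coordinate systems defined by the same rule, as the paragraph above requires.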
For example, as shown in fig. 2, during the video call the front camera in user B's terminal acquires user B's face image while the terminal is in the portrait display state but deflected clockwise by a preset angle. Without image adjustment, user B's face image transmitted to user A's terminal is displayed as shown in user A's terminal interface before the image adjustment in fig. 2. If instead, after user B's face image is acquired, the face in the image is compared with the reference face and found to be deflected clockwise by the preset angle relative to it, the whole face image is rotated counterclockwise by that angle before being transmitted to user A's terminal. The final display effect is that shown in user A's terminal interface after the image adjustment in fig. 2, i.e., the display angle of the face has been corrected.
Similarly, as shown in fig. 3, during the video call the front camera of user B's terminal acquires user B's face image while the terminal is in the landscape display state. Without image adjustment, the image transmitted to user A's terminal is displayed as shown in user A's terminal interface before the image adjustment in fig. 3: user B's face appears sideways, and the picture does not look natural. If instead, after user B's face image is acquired, the face in the image is compared with the reference face and found to be rotated 90 degrees counterclockwise relative to it, the whole face image is rotated 90 degrees clockwise before being transmitted to user A's terminal. The final display effect is that shown after the image adjustment in fig. 3, i.e., user B's face is displayed in the normal picture direction.
And S104, adjusting the target image based on the adjustment angle, and sending the adjusted target image to a receiving end.
During image adjustment, the processing operations are not limited to rotation and may also include cropping, padding, scaling, and the like, as determined by actual processing requirements. After the target image is adjusted based on the adjustment angle, the target object presents the same display state at the receiving end as the reference model, reducing the cases in which the target object's display angle is deflected at the receiving end.
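A minimal sketch of applying the adjustment angle before sending the image, illustrative only and not the patent's implementation: it handles only multiples of 90 degrees with numpy, whereas a real implementation would support arbitrary angles and the cropping, padding and scaling mentioned above.

```python
import numpy as np

def adjust_image(img: np.ndarray, adjustment_deg: int) -> np.ndarray:
    """Rotate the target image by the adjustment angle before it is sent
    to the receiving end. Positive angles rotate counterclockwise."""
    if adjustment_deg % 90 != 0:
        raise NotImplementedError("arbitrary-angle rotation not sketched here")
    k = (adjustment_deg // 90) % 4        # number of 90-degree CCW turns
    return np.rot90(img, k)

frame = np.arange(6).reshape(2, 3)        # stand-in for a captured landscape frame
print(adjust_image(frame, 90).shape)      # (3, 2): now in portrait proportion
```

In the fig. 3 scenario above, the face is found rotated 90 degrees counterclockwise relative to the reference, so the sender would call this with a clockwise (negative) 90-degree adjustment before transmission.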
In the embodiments of the disclosure, a reference model is preset for determining the adjustment angle of the target object on the target image acquired by the image acquisition device during the video call. If the feature comparison result between the target object and the reference model indicates that the display angle of the target object needs to be adjusted (that is, the adjustment angle is greater than zero), the target image is adjusted based on the adjustment angle and the adjusted target image is sent to the receiving end, so that the display angle of the target object displayed at the receiving end is not deflected. This provides a new, image-recognition-based method for adjusting video images during a video call, reduces the dependence on sensors with an angle-measuring function in the terminal (such as a gravity sensor) during image adjustment, and improves the picture effect seen at the video receiving end: in most cases, the video receiver sees a video image in the normal picture direction on the video interface.
On the basis of the above technical solution, optionally, performing feature comparison on the target object and the reference model, and determining an adjustment angle of the target object on the target image based on a result of the feature comparison, includes:
determining at least two first preset key points on the target object and a first geometric figure determined by them. The first preset key points can be chosen flexibly depending on the target object; the embodiments of the disclosure do not limit them specifically, provided the first geometric figure can characterize the display state of the target object on the target image;
determining at least two second preset key points on the reference model and a second geometric figure determined by them, the second geometric figure being of the same type as the first. As with the first preset key points, the second preset key points can be chosen flexibly and are not specifically limited, provided the second geometric figure can characterize the standard position state of the reference model in the video call state;
calculating the position angle difference between the first geometric figure and the second geometric figure in a preset coordinate system as the adjustment angle of the target object on the target image. The image display proportion of the target image in the preset coordinate system is the same as that of the reference model; for example, if the display proportion of the reference model is 3:4, the display proportion of the target image is also 3:4 (if the original proportion of the target image is not 3:4, the proportion referred to here is the one obtained after the target image is rotated).
For example, a definition rule of the preset coordinate system may be set in advance, such as taking the straight line on which a short side of the image lies as the horizontal axis and the straight line on which a long side lies as the vertical axis; the image coordinate systems corresponding to the target image and the reference model are then established according to this rule, which facilitates the subsequent calculation of the position angle difference between the first and second geometric figures in the preset coordinate system. Alternatively, the image coordinate system corresponding to the reference model can itself be used as the preset coordinate system. If the display proportion of the acquired target image is the same as that of the reference model, the target image can directly adopt the reference model's image coordinate system. If the proportions differ, the target image is first rotated so that its display proportion matches that of the reference model; the reference model's image coordinate system can then be adopted for the target image, and the position angle difference between the first and second geometric figures can be calculated in the preset coordinate system.
The first geometric figure and the second geometric figure may each be a straight line or a polygon. To ensure the accuracy of the calculation, the two figures are preferably of the same type, for example both straight lines or both polygons. For the specific principle of calculating the position angle difference between geometric figures, reference can be made to existing implementations.
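As a minimal sketch of this step (assuming the key points are already available as 2-D coordinates in the preset coordinate system; the function names are illustrative, not from the source), the position angle difference between two straight-line figures can be computed from the orientations of the lines:

```python
import math

def line_angle(p1, p2):
    """Orientation of the line through p1 and p2, in degrees in [0, 180)."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0])) % 180.0

def position_angle_difference(first_line, second_line):
    """Signed angle (degrees) by which the first figure deviates from the
    second, normalized to (-90, 90] so the smaller rotation is reported."""
    diff = line_angle(*first_line) - line_angle(*second_line)
    if diff > 90.0:
        diff -= 180.0
    elif diff <= -90.0:
        diff += 180.0
    return diff
```

For polygon figures, the same idea can be applied to a designated edge of each polygon (for example, the base of a triangle).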
In an alternative embodiment, the target object comprises a target face, and the reference model comprises a reference face in a standard position state in a video call state;
the at least two first preset key points on the target object comprise at least two first preset key points positioned at different facial features on the target face;
the at least two second preset key points on the reference model comprise at least two second preset key points positioned at different facial features on the reference face.
For example, the different facial features include the left eye and the right eye; alternatively, the different facial features include the nose and the mouth; alternatively, the different facial features include the left eye, the right eye and the nose.
For the reference face, the line connecting key points at the same positions on the left eye and the right eye (for example, the line connecting the pupil centers of the two eyes) is parallel to the horizontal line; the line connecting the nose tip and the center of the mouth is perpendicular to the horizontal line; and the key points at the same positions on the left eye and the right eye, together with the nose tip key point, form a triangle whose base is parallel to the horizontal line. Therefore, the adjustment angle of the target face on the target image can be accurately determined by calculating, under the preset coordinate system, the position angle difference between the first geometric figure determined by the first preset key points at different facial features on the target face and the second geometric figure determined by the second preset key points at the corresponding facial features on the reference face.
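These geometric constraints on the reference face can be expressed directly on its key points. The coordinate values below are illustrative placeholders, not taken from the source:

```python
# Hypothetical reference-face key points in the preset coordinate system
# (x to the right, y downward, as is common for image coordinates).
REFERENCE_FACE = {
    "left_pupil":   (100.0, 120.0),
    "right_pupil":  (180.0, 120.0),   # same y: eye line parallel to the horizontal
    "nose_tip":     (140.0, 160.0),
    "mouth_center": (140.0, 200.0),   # same x: nose-mouth line perpendicular to it
}

def eye_line_is_horizontal(face, tol=1e-6):
    """True if the line through the two pupils is parallel to the horizontal."""
    return abs(face["left_pupil"][1] - face["right_pupil"][1]) < tol

def nose_mouth_line_is_vertical(face, tol=1e-6):
    """True if the nose-tip-to-mouth-center line is perpendicular to it."""
    return abs(face["nose_tip"][0] - face["mouth_center"][0]) < tol
```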
In addition, if the image display scale of the target image differs from that corresponding to the reference model, it may be preferable to have more preset key points participate in determining the adjustment angle of the target object on the target image. Illustratively, as shown in fig. 2, the terminal of user B is in a portrait screen display state, and the image display scale of the target image (i.e., the face image of user B) acquired by the image acquisition device is the same as that corresponding to the reference face. The key points at the same positions on the left eye and the right eye of the target face may be determined as the first preset key points, or the key points at the nose tip and the center of the mouth may be used instead, and the straight line determined by the first preset key points is taken as the first geometric figure. Correspondingly, the key points at the same positions on the left and right eyes of the reference face (or at the nose tip and the center of the mouth on the reference face) are determined as the second preset key points, and the straight line they determine is taken as the second geometric figure. The position angle difference between the first geometric figure and the second geometric figure corresponding to the same facial features (for example, the included angle between the two straight lines) is then calculated under the same coordinate system and used as the adjustment angle of the target face on the target image.
For example, if the included angle between the straight line determined by the eye key points of the target face and the straight line determined by the eye key points of the reference face is calculated to be a certain deflection angle in the clockwise direction, the target image can be rotated by that deflection angle in the counterclockwise direction and then sent to the receiving end for display.
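The correction rule in this example can be sketched as follows; the sign convention (clockwise-positive deflection) is an assumption for illustration:

```python
import math

def correction_angle(clockwise_deflection_deg):
    """A clockwise deflection of the face line is undone by rotating the
    target image counterclockwise by the same magnitude."""
    return -clockwise_deflection_deg

def rotate_point_ccw(p, center, degrees):
    """Rotate a key point counterclockwise about `center` (y-axis up);
    useful for checking that the deflection vanishes after correction."""
    t = math.radians(degrees)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))
```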
Illustratively, as shown in fig. 3, the terminal of user B is in a landscape display state, and the image display ratio of the target image (i.e., the face image of user B) acquired by the image acquisition device differs from that corresponding to the reference face. The target image may first be rotated by 90 degrees so that its display ratio after rotation matches that corresponding to the reference model. Then, in the image coordinate system corresponding to the reference face, i.e., the preset coordinate system, the key points at the same positions on the left eye and the right eye of the target face, together with the nose tip key point, are determined as the first preset key points, and the triangle they determine is taken as the first geometric figure. Correspondingly, the key points at the same positions on the left and right eyes of the reference face and the nose tip key point are determined as the second preset key points, and the triangle they determine is taken as the second geometric figure. The position angle difference between the first geometric figure and the second geometric figure is then calculated under the preset coordinate system and used as the adjustment angle of the target face on the target image. After the target image is adjusted based on this angle, the target face presents the same display state as the reference face at the receiving end; for example, after the image adjustment processing in fig. 3, the face of user B displayed in the terminal interface of user A has the same display effect as the reference face.
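The aspect-ratio check that decides whether this 90-degree pre-rotation is needed can be sketched as follows (helper names are illustrative, not from the source):

```python
def needs_pre_rotation(image_size, reference_size):
    """True when one of the frames is landscape and the other portrait,
    so the target image must be rotated 90 degrees before comparison.
    Sizes are (width, height) tuples."""
    return (image_size[0] > image_size[1]) != (reference_size[0] > reference_size[1])

def size_after_90_rotation(size):
    """Image size after a 90-degree rotation: width and height swap."""
    return (size[1], size[0])
```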
In an alternative embodiment, the target object comprises a target environment object, and the reference model comprises a reference environment object in a standard position state and associated with the target environment object in a video call state;
the first geometric figures determined by the at least two first preset key points comprise straight lines parallel to a road surface on the target image or polygons used for representing the position states of the target environment object on the target image;
the second geometric figures determined by the at least two second preset key points comprise reference lines parallel to the horizontal line or reference polygons for representing the standard position state of the reference environmental object.
That is, in the embodiment of the present disclosure, the target object on the target image is not limited to a human face and may also be a target environment object in a street-view environment. Taking a vehicle as the target environment object, the reference model may be a vehicle, a road, or the like. The second preset key points can be determined flexibly according to the reference model. Taking a vehicle as the reference model, the line connecting key points (i.e., second preset key points) at the same positions on the tires on the same side can serve as a reference line parallel to the horizontal line, and a rectangle or other polygon formed by key points at different positions on the vehicle body can serve as a reference polygon. Taking a road as the reference model, the line connecting key points on the same lane line can serve as a reference line parallel to the horizontal line, and so on.
For how to calculate the position angle difference between the first geometric figure and the second geometric figure under the preset coordinate system, reference may be made to the foregoing description.
On the basis of the foregoing technical solution, optionally, if the edges of the first geometric figure and the second geometric figure under the preset coordinate system are parallel, the method provided in the embodiment of the present disclosure further includes:
acquiring a screen rotation angle measured by an angle sensor installed in a terminal; the angle sensor may include, but is not limited to, a sensor having an angle measuring function such as a gravity sensor;
and adjusting the target image based on the screen rotation angle.
For example, when the terminal screen is rotated by 180 degrees relative to its direction in the normal use state, the first geometric figure and the second geometric figure are in a vertically inverted relationship, yet the position angle difference calculated between them under the preset coordinate system is 0; that is, the preliminarily determined adjustment angle of the target object is 0. If the rotation angle of the terminal screen were not additionally obtained, the target image could not be effectively adjusted, and the target object could not be displayed at the receiving end in the same display state as the reference model. Therefore, by acquiring the screen rotation angle of the terminal, the target image can be further adjusted based on that angle and the adjusted target image sent to the receiving end, ensuring that the target object is in the same display state as the reference model at the receiving end.
As shown in fig. 4, the terminal of user B is in a vertical screen display state, but its screen is turned upside down. Taking the straight line determined by the key points at the same positions on the left eye and the right eye of the target face on the target image (i.e., the face image of user B) as the first geometric figure, and the straight line determined by the corresponding key points on the reference face as the second geometric figure, the two straight lines are parallel, so the position angle difference calculated between them under the preset coordinate system is 0; that is, no image adjustment appears to be required. If the terminal of user B sent the face image as-is to the terminal of user A, the face of user B would be displayed upside down there. If the terminal of user B additionally acquires its screen rotation angle, for example a 180-degree counterclockwise rotation, the face image of user B can be rotated by 180 degrees clockwise before being sent to the terminal of user A, so that the face of user B is displayed there in a normal display state.
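The fallback logic of this example can be sketched as follows; the sensor interface and angle conventions (sensor rotation counterclockwise-positive, corrective rotation returned clockwise-positive) are assumptions for illustration:

```python
def corrective_rotation(angle_difference_deg, screen_rotation_deg):
    """When the key-point lines are parallel, the figure comparison cannot
    distinguish upright from upside-down, so defer to the screen rotation
    reported by the terminal's angle sensor (e.g. a gravity sensor)."""
    if angle_difference_deg % 180 == 0:
        # Ambiguous case: undo the sensor-reported counterclockwise
        # rotation with an equal clockwise rotation.
        return screen_rotation_deg % 360
    # Otherwise the figure comparison alone determines the correction.
    return (-angle_difference_deg) % 360
```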
Optionally, the method provided by the embodiment of the present disclosure further includes:
receiving a video image, and determining the display state of a target object on the video image in the current display state of a terminal screen; if the display state of the target object on the video image is not matched with the current display state of the terminal screen, determining the rotation angle of the target object on the video image based on the display state of the target object on the video image and the current display state of the terminal screen; and performing rotation processing on the video image based on the rotation angle, and displaying the image after the rotation processing.
After the video image is received, the target object on it (e.g., a target face or a target environment object) may be identified by any available target recognition method, and the display state of the target object under the current display state of the terminal screen may be determined based on the positional relationship of different parts of the target object. Illustratively, a face recognition method determines the target face on the video image, the positions of its facial features, and the positional relationships among them; the display state of the target face under the current display state of the terminal screen is then determined from those relationships. For example, with the terminal in a vertical screen display state, the face on the video image may be in an inverted display state; or, with the terminal in a horizontal screen display state, the face may be in a horizontal display state, with the mouth located to the right of the straight line through the two eyes. The video image is then rotated by the determined rotation angle, so that the target object on the video image displayed on the terminal interface is in a normal display state.
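A minimal sketch of this receiver-side classification, assuming eye and mouth key points in image coordinates (y grows downward) and a clockwise-positive rotation convention — both assumptions for illustration:

```python
def upright_rotation(left_eye, right_eye, mouth):
    """Clockwise rotation (degrees) that brings the received face to a
    normal, upright display state, inferred from the key-point layout."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    if abs(dx) >= abs(dy):                  # eye line roughly horizontal
        # Mouth below the eye line -> already upright; above -> inverted.
        return 0 if mouth[1] > left_eye[1] else 180
    # Eye line roughly vertical: the face is lying on its side; a mouth to
    # the right of the eye line calls for a 90-degree clockwise rotation.
    return 90 if mouth[0] > left_eye[0] else 270
```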
Taking a target face as the target object, as shown in fig. 2, after the terminal of user B receives the video image (i.e., the face image of user A) sent by the terminal of user A, it determines that the face of user A is in a normal display state under the vertical screen display state of the terminal; the display state of the target face matches the current display state of the terminal, and the video image does not need to be rotated. As shown in fig. 3, the terminal of user B is in a horizontal screen display state while the terminal of user A is in a vertical screen display state, so the face of user A would be displayed horizontally in the terminal of user B; the display state of the target face does not match the current display state of the terminal screen. In this case, based on the horizontal display state of the face of user A and the current display state of the terminal screen, the rotation angle of the face of user A can be determined to be 90 degrees clockwise, and the video image is displayed after rotation; the display effect of the face of user A in the terminal of user B is shown in the second row of sub-figures in fig. 3. As shown in fig. 4, the terminal of user B and the terminal of user A are oriented upside down relative to each other, so the face of user A would be in an inverted display state in the terminal of user B; the display state of the target face does not match the current display state of the terminal screen. In this case, based on the inverted display state of the face of user A and the current display state of the terminal screen, the rotation angle of the face of user A can be determined to be 180 degrees clockwise, and the video image is displayed after rotation; the display effect of the face of user A in the terminal of user B is shown in the second row of sub-figures in fig. 4, that is, the face of user A is displayed in a normal display state.
It should be noted that fig. 2, fig. 3, and fig. 4 are schematic diagrams of terminal interfaces in a video call provided by an embodiment of the present disclosure. They are intended to illustrate the embodiment and should not be construed as a specific limitation on it. For example, the display position of a face image on the video interface during a video call may be determined according to the actual situation, and the video interface may also be hidden at the sending end, depending on the application program running in the terminal; for instance, a user may play a game while carrying out a video call, in which case the game screen may be displayed on the terminal interface and the video interface may be hidden.
The embodiment of the disclosure provides a new method for adjusting and processing a video image in a video call process based on an image recognition technology, so that the dependence on a sensor with an angle measurement function in a terminal in the image adjustment process is reduced, the picture effect seen by a video receiver and a video sender on a video interface is improved, and the video receiver and the video sender can see the video image in a normal picture direction on the video interface under most conditions.
Fig. 5 is a schematic structural diagram of an image processing apparatus in a video call according to an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capability, for example, a user terminal such as a mobile terminal, a tablet computer, and a personal computer.
As shown in fig. 5, an image processing apparatus 500 in a video call provided by an embodiment of the present disclosure may include an image obtaining and identifying module 501, a reference model obtaining module 502, an adjustment angle determining module 503, and an image adjusting and sending module 504, where:
the image acquiring and identifying module 501 is configured to acquire a target image acquired by an image acquisition device and identify a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call;
a reference model obtaining module 502, configured to obtain a reference model corresponding to a target object;
an adjustment angle determining module 503, configured to perform feature comparison on the target object and the reference model, and determine an adjustment angle of the target object on the target image based on a result of the feature comparison;
and an image adjusting and sending module 504, configured to perform adjustment processing on the target image based on the adjustment angle, and send the adjusted target image to the receiving end.
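Under the assumption of simple stand-in interfaces for the detector, the reference-model store, and the transport (none of these names come from the source), the cooperation of the four modules can be sketched as:

```python
class VideoCallImageProcessor:
    """Minimal sketch wiring together the four modules described above."""

    def __init__(self, detector, reference_models, sender):
        self.detector = detector                  # module 501: acquire & identify
        self.reference_models = reference_models  # module 502: model lookup
        self.sender = sender                      # module 504: transport

    def process_frame(self, frame):
        target = self.detector.identify(frame)    # identify the target object
        if target is None:                        # nothing recognized: pass through
            self.sender.send(frame)
            return
        reference = self.reference_models[target.kind]
        angle = target.angle_difference(reference)  # module 503: compare features
        self.sender.send(frame.rotated(angle))      # adjust, then send
```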
Optionally, the adjustment angle determining module 503 includes:
the first key point and graph determining unit is used for determining at least two first preset key points on the target object and a first geometric graph determined by the at least two first preset key points;
the second key point and graph determining unit is used for determining at least two second preset key points on the reference model and a second geometric graph determined by the at least two second preset key points; the second geometric figure is of the same type as the first geometric figure;
the adjusting angle determining unit is used for calculating the position angle difference of the first geometric figure and the second geometric figure under a preset coordinate system and taking the position angle difference as the adjusting angle of the target object on the target image; the image display proportion of the target image in the preset coordinate system is the same as the image display proportion of the reference model in the preset coordinate system.
Optionally, the target object includes a target face, and the reference model includes a reference face in a standard position state in a video call state;
the at least two first preset key points on the target object comprise at least two first preset key points positioned at different facial features on the target face;
the at least two second preset key points on the reference model comprise at least two second preset key points positioned at different facial features on the reference face.
Optionally, the different facial features include the left eye and the right eye; or
The different facial features include the nose and the mouth; or
The different facial features include the left eye, the right eye and the nose.
Optionally, the target object includes a target environment object, and the reference model includes a reference environment object in a standard position state and associated with the target environment object in the video call state;
the first geometric figures determined by the at least two first preset key points comprise straight lines parallel to a road surface on the target image or polygons used for representing the position states of the target environment object on the target image;
the second geometric figures determined by the at least two second preset key points comprise reference lines parallel to the horizontal line or reference polygons for representing the standard position state of the reference environmental object.
Optionally, if the edges of the first geometric figure and the second geometric figure under the preset coordinate system are parallel, the apparatus 500 provided by the embodiment of the present disclosure further includes:
the screen rotation angle acquisition module is used for acquiring a screen rotation angle measured by an angle sensor installed in the terminal;
the image adjusting and sending module 504 is further configured to perform adjustment processing on the target image based on the screen rotation angle, and send the adjusted target image to the receiving end.
Optionally, the image obtaining and recognizing module 501 is specifically configured to:
acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image in response to the fact that a terminal screen is in a horizontal screen display state; the horizontal screen display state means that the horizontal size of the image displayed on the terminal screen is larger than the vertical size.
Optionally, the image acquisition and recognition module 501 comprises:
the target image acquisition unit is used for acquiring a target image acquired by the image acquisition device;
the terminal system comprises a target application program determining unit, a display unit and a display unit, wherein the target application program determining unit is used for responding to the situation that a terminal screen is in a horizontal screen display state, acquiring application program running information recorded by a terminal system, and determining a target application program running in the terminal according to the application program running information;
and the target object identification unit is used for identifying the target object on the target image if the target application program needs to run in the horizontal screen display state of the terminal screen.
Optionally, the apparatus 500 provided in the embodiment of the present disclosure further includes:
the video image receiving module is used for receiving the video image and determining the display state of a target object on the video image in the current display state of the terminal screen;
the rotation angle determining module is used for determining the rotation angle of the target object on the video image based on the display state of the target object on the video image and the current display state of the terminal screen if the display state of the target object on the video image is not matched with the current display state of the terminal screen;
and the image rotation module is used for performing rotation processing on the video image based on the rotation angle and displaying the image after the rotation processing.
The image processing apparatus in a video call provided by the embodiment of the present disclosure can execute the image processing method in a video call provided by any embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For details not described in the apparatus embodiments of the present disclosure, reference may be made to the description of any method embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, which is used to exemplarily illustrate an electronic device that implements an image processing method in a video call according to an embodiment of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, wearable electronic devices, servers, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 includes one or more processors 601 and memory 602.
The processor 601 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 600 to perform desired functions.
The memory 602 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory, and the like. Non-volatile memory can include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 601 may execute the program instructions to implement the image processing method in a video call provided by the embodiments of the present disclosure, and may also implement other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
The image processing method in the video call provided by the embodiment of the present disclosure may include: acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call; acquiring a reference model corresponding to a target object; comparing the characteristics of the target object and the reference model, and determining the adjustment angle of the target object on the target image based on the characteristic comparison result; and adjusting the target image based on the adjustment angle, and sending the adjusted target image to a receiving end. It should be understood that electronic device 600 may also perform other alternative embodiments provided by the disclosed method embodiments.
In one example, the electronic device 600 may further include: an input device 603 and an output device 604, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 603 may also include, for example, a keyboard, a mouse, and the like.
The output device 604 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 604 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 600 relevant to the present disclosure are shown in fig. 6, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 600 may include any other suitable components depending on the particular application.
In addition to the above methods and apparatus, the disclosed embodiments also provide a computer program product comprising a computer program or computer program instructions that, when executed by a computing device, cause the computing device to implement an image processing method in any video call provided by the disclosed embodiments.
The computer program product may write program code for performing the operations of embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device.
Furthermore, the disclosed embodiments may also provide a computer-readable storage medium having stored thereon computer program instructions that, when executed by a computing device, cause the computing device to implement an image processing method in any video call provided by the disclosed embodiments.
The image processing method in the video call provided by the embodiment of the present disclosure may include: acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call; acquiring a reference model corresponding to a target object; comparing the characteristics of the target object and the reference model, and determining the adjustment angle of the target object on the target image based on the characteristic comparison result; and adjusting the target image based on the adjustment angle, and sending the adjusted target image to a receiving end. It should be understood that the computer program instructions, when executed by a computing device, may also cause the computing device to implement other alternative embodiments provided by the disclosed method embodiments.
A computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. An image processing method in a video call, comprising:
acquiring a target image acquired by an image acquisition device, and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call;
acquiring a reference model corresponding to the target object;
comparing features of the target object with those of the reference model, and determining an adjustment angle of the target object on the target image based on the feature comparison result;
and adjusting the target image based on the adjustment angle, and sending the adjusted target image to a receiving end.
2. The method of claim 1, wherein comparing features of the target object with those of the reference model, and determining the adjustment angle of the target object on the target image based on the feature comparison result, comprises:
determining at least two first preset key points on the target object and a first geometric figure determined by the at least two first preset key points;
determining at least two second preset key points on the reference model and a second geometric figure determined by the at least two second preset key points; the second geometric figure is of the same type as the first geometric figure;
calculating the position angle difference between the first geometric figure and the second geometric figure under a preset coordinate system as the adjustment angle of the target object on the target image; wherein the image display proportion of the target image in the preset coordinate system is the same as the image display proportion of the reference model in the preset coordinate system.
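The position angle difference in claim 2 can be illustrated (hypothetically, not as the claimed implementation) with the 2-D Kabsch/Procrustes rotation, which handles both figure types the claims mention: a line segment (two key points) or a polygon (three or more), provided both point sets are expressed in the same preset coordinate system at the same display proportion:

```python
import numpy as np

def figure_angle_diff(pts_target, pts_ref):
    """Position angle difference (degrees) between two corresponding figures.

    pts_target, pts_ref: (N, 2) arrays of corresponding key points in the same
    preset coordinate system. Returns the angle that rotates the target figure
    onto the reference figure (counter-clockwise positive in y-up coordinates).
    """
    a = pts_target - pts_target.mean(axis=0)  # center both figures
    b = pts_ref - pts_ref.mean(axis=0)
    # Sums of 2-D cross and dot products give the optimal rotation directly.
    num = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])
    den = np.sum(a[:, 0] * b[:, 0] + a[:, 1] * b[:, 1])
    return float(np.degrees(np.arctan2(num, den)))
```

For exactly two key points this reduces to the difference of the two segment orientations; for a polygon it gives a least-squares best-fit rotation over all vertices.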
3. The method of claim 2, wherein the target object comprises a target face, and wherein the reference model comprises a reference face in a standard position state in the video call state;
the at least two first preset key points on the target object comprise at least two first preset key points located at different facial feature parts of the target face;
the at least two second preset key points on the reference model comprise at least two second preset key points located at the different facial feature parts of the reference face.
4. The method of claim 3, wherein:
the different facial feature parts comprise the left eye and the right eye; or
the different facial feature parts comprise the nose and the mouth; or
the different facial feature parts comprise the left eye, the right eye, and the nose.
5. The method of claim 2, wherein the target object comprises a target environmental object, and wherein the reference model comprises a reference environmental object in a standard position state and associated with the target environmental object in the video-call state;
the first geometric figure determined by the at least two first preset key points comprises a straight line parallel to a road surface on the target image, or a polygon used for representing a position state of the target environment object on the target image;
the second geometric figure determined by the at least two second preset key points comprises a reference line parallel to the horizontal line, or a reference polygon used for representing the standard position state of the reference environment object.
6. The method according to claim 2, wherein if the edges of the first geometric figure and the second geometric figure under the preset coordinate system are parallel, the method further comprises:
acquiring a screen rotation angle measured by an angle sensor installed in a terminal;
and adjusting the target image based on the screen rotation angle.
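The fallback in claim 6 can be sketched as a simple selection rule (hypothetical names and tolerance, not the claimed implementation): when the feature-based comparison yields no usable angle because the two figures' edges are already parallel, the screen rotation angle reported by the terminal's angle sensor is used instead:

```python
def choose_adjustment(figure_angle_diff_deg, sensor_angle_deg, tol=1.0):
    """Pick the rotation angle to apply to the target image.

    figure_angle_diff_deg: angle from comparing the first and second geometric
    figures; sensor_angle_deg: screen rotation angle measured by an angle
    sensor installed in the terminal; tol: parallelism tolerance in degrees
    (an assumed threshold). When the figures are parallel within tol, fall
    back to the sensor reading.
    """
    if abs(figure_angle_diff_deg) <= tol:
        return sensor_angle_deg
    return figure_angle_diff_deg
```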
7. The method of claim 1, wherein identifying a target object on the target image comprises:
identifying the target object on the target image in response to a terminal screen being in a horizontal screen display state; wherein the horizontal screen display state means that the horizontal size of the image displayed on the terminal screen is larger than its vertical size.
8. The method of claim 7, wherein identifying the target object on the target image comprises:
acquiring application program running information recorded by a terminal system, and determining a target application program running in the terminal according to the application program running information;
and if the target application program needs to run in the horizontal screen display state of the terminal screen, identifying a target object on the target image.
9. The method of claim 1, further comprising:
receiving a video image, and determining a display state of a target object on the video image and a current display state of a terminal screen;
if the display state of the target object on the video image is not matched with the current display state of the terminal screen, determining the rotation angle of the target object on the video image based on the display state of the target object on the video image and the current display state of the terminal screen;
and performing rotation processing on the video image based on the rotation angle, and displaying the image after the rotation processing.
10. An image processing apparatus in a video call, comprising:
the image acquisition and identification module is used for acquiring a target image acquired by the image acquisition device and identifying a target object on the target image; the target image is an image acquired by the image acquisition device in a state of establishing a video call;
a reference model obtaining module, configured to obtain a reference model corresponding to the target object;
the adjustment angle determining module is used for comparing features of the target object with those of the reference model, and determining the adjustment angle of the target object on the target image based on the feature comparison result;
and the image adjusting and sending module is used for adjusting the target image based on the adjusting angle and sending the adjusted target image to a receiving end.
11. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the electronic device to implement the image processing method in a video call of any one of claims 1-9.
12. A computer-readable storage medium storing a computer program which, when executed by a computing device, causes the computing device to implement the image processing method in a video call of any one of claims 1-9.
CN202110206754.2A 2021-02-22 2021-02-22 Image processing method, device, equipment and medium in video call Active CN113031839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110206754.2A CN113031839B (en) 2021-02-22 2021-02-22 Image processing method, device, equipment and medium in video call

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110206754.2A CN113031839B (en) 2021-02-22 2021-02-22 Image processing method, device, equipment and medium in video call

Publications (2)

Publication Number Publication Date
CN113031839A true CN113031839A (en) 2021-06-25
CN113031839B CN113031839B (en) 2022-08-16

Family

ID=76461100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110206754.2A Active CN113031839B (en) 2021-02-22 2021-02-22 Image processing method, device, equipment and medium in video call

Country Status (1)

Country Link
CN (1) CN113031839B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579023A (en) * 2021-12-13 2022-06-03 北京市建筑设计研究院有限公司 Modeling method and device and electronic equipment
CN115484412A (en) * 2022-09-21 2022-12-16 高创(苏州)电子有限公司 Image processing method and device, video call method, medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249074A1 (en) * 2010-04-07 2011-10-13 Cranfill Elizabeth C In Conference Display Adjustments
CN102695034A (en) * 2012-05-30 2012-09-26 青岛海信移动通信技术股份有限公司 Method and device for regulating end display of video image during video call
CN109788227A (en) * 2017-11-13 2019-05-21 腾讯科技(深圳)有限公司 A kind of image processing method and its device, equipment and storage medium
CN109922204A (en) * 2017-12-13 2019-06-21 中兴通讯股份有限公司 Image processing method and terminal
CN111885314A (en) * 2020-08-26 2020-11-03 深圳市雷鸟网络传媒有限公司 Screen adjustment method, device, terminal and computer readable storage medium

Also Published As

Publication number Publication date
CN113031839B (en) 2022-08-16

Similar Documents

Publication Publication Date Title
US11403757B2 (en) Sight line detection method and sight line detection device
US9075429B1 (en) Distortion correction for device display
US10950205B2 (en) Electronic device, augmented reality device for providing augmented reality service, and method of operating same
CN106781327B (en) Sitting posture correction method and mobile terminal
CN113031839B (en) Image processing method, device, equipment and medium in video call
US11069115B2 (en) Method of controlling display of avatar and electronic device therefor
US20200210742A1 (en) Electronic device and character recognition method thereof
CN110124305B (en) Virtual scene adjustment method, device, storage medium and mobile terminal
US20200204736A1 (en) Electronic device and method for providing information thereof
CN111325798A (en) Camera model correction method and device, AR implementation equipment and readable storage medium
CN108665510B (en) Rendering method and device of continuous shooting image, storage medium and terminal
CN111353458A (en) Text box marking method and device and storage medium
CN108055461B (en) Self-photographing angle recommendation method and device, terminal equipment and storage medium
WO2020259152A1 (en) Sticker generating method and apparatus, and medium and electronic device
CN110740315A (en) Camera correction method and device, electronic equipment and storage medium
US11910068B2 (en) Panoramic render of 3D video
CN111507139A (en) Image effect generation method and device and electronic equipment
CN114022570B (en) Method for calibrating external parameters between cameras and electronic equipment
CN114925667A (en) Content classification method, device, equipment and computer readable storage medium
WO2022027191A1 (en) Method and device for plane correction, computer-readable medium, and electronic device
EP3739897A1 (en) Information displaying method and electronic device therefor
CN116137025A (en) Video image correction method and device, computer readable medium and electronic equipment
CN111757146B (en) Method, system and storage medium for video splicing
WO2020042589A1 (en) User distance estimation method and apparatus, device, and storage medium
CN111445439A (en) Image analysis method, image analysis device, electronic device, and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.
