WO2023040616A1

WO2023040616A1 - Terminal device and video call method

Info

Publication number: WO2023040616A1
Application number: PCT/CN2022/114748
Authority: WO
Inventors: 梁震 (Liang Zhen)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2021-09-15
Filing date: 2022-08-25
Publication date: 2023-03-23
Also published as: CN115834813A

Abstract

Provided in the present disclosure are a terminal device and a video call method. The terminal device comprises: a transceiver unit configured to receive a peer image and transmit a local image; a display panel configured to display the peer image; an analysis unit configured to determine a viewpoint position in the peer image displayed on the display panel; a drive unit configured to drive an image acquisition unit to move to the viewpoint position; and the image acquisition unit, configured to acquire the local image along the light-emitting direction of the display panel.

Description

Terminal Device and Video Call Method

Cross-Reference to Related Applications

This application claims priority to Patent Application No. 202111077903.6 filed with the China Patent Office on September 15, 2021, the entire contents of which are hereby incorporated by reference.

Technical Field

The present disclosure relates to, but is not limited to, the field of display technology.

Background

Video calls can be made on mobile terminals (such as mobile phones and laptops): during the call, each user's mobile terminal captures images of its own user and sends them to the other party's mobile terminal. In addition to hearing each other's voices, both users can see the other's image (or video), an effect similar to "face-to-face" communication. The images can also be shown on a display device with a larger screen, such as a TV, to achieve a video-conferencing effect.

However, in actual face-to-face communication, a large share of information (for example, about 70%) may be conveyed through "non-verbal" means such as subtle eye contact and body language. Although video call technology can transmit images, it still cannot convey this information effectively.

Summary

The present disclosure provides a terminal device and a method for video calling.

In a first aspect, the present disclosure provides a terminal device, including: a transceiver unit configured to receive a peer image and send a local image; a display panel configured to display the peer image; an analysis unit configured to determine a viewpoint position in the peer image displayed on the display panel; a drive unit configured to drive an image acquisition unit to move to the viewpoint position; and the image acquisition unit, configured to acquire the local image along the light-emitting direction of the display panel.

In a second aspect, the present disclosure provides a video call method for any terminal device described in the present disclosure. The method includes: the transceiver unit receives a peer image; the display panel displays the peer image; the analysis unit determines the viewpoint position in the peer image displayed on the display panel; the drive unit drives the image acquisition unit to move to the viewpoint position; the image acquisition unit acquires the local image at the viewpoint position along the light-emitting direction of the display panel; and the transceiver unit sends the local image.

Brief Description of the Drawings

FIG. 1 is a schematic block diagram of a terminal device provided by the present disclosure.

FIG. 2 is a schematic workflow diagram of the analysis unit in a terminal device provided by the present disclosure.

FIG. 3 is a schematic diagram of a peer image displayed on a display panel in a terminal device provided by the present disclosure.

FIG. 4 is a schematic structural diagram of a drive unit and an image acquisition unit located inside a display panel in a terminal device provided by the present disclosure.

FIG. 5 is a schematic diagram of a peer image displayed on a display panel in another terminal device provided by the present disclosure.

FIG. 6 is a schematic flowchart of a video call method provided by the present disclosure.

Detailed Description

In order for those skilled in the art to better understand the technical solution of the present disclosure, the terminal device and the video call method provided by the embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

The present disclosure will be described more fully hereinafter with reference to the accompanying drawings, but the illustrated embodiments may be embodied in different forms, and the present disclosure should not be construed as limited to the embodiments set forth below. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The drawings of the embodiments of the present disclosure are provided for a further understanding of these embodiments and constitute a part of the specification; together with the detailed embodiments, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the detailed embodiments described with reference to the accompanying drawings.

The present disclosure may be described with reference to plan views and/or cross-sectional views by way of idealized schematic views of the present disclosure. Accordingly, the example illustrations may be modified according to manufacturing techniques and/or tolerances.

Where no conflict arises, the embodiments of the present disclosure and the features therein may be combined with one another.

The terms used in the present disclosure are for describing specific embodiments only and are not intended to limit the present disclosure. As used in this disclosure, the term "and/or" includes any and all combinations of one or more of the associated listed items. As used in this disclosure, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. As used in the present disclosure, the terms "comprising" and "made up of" specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless the disclosure expressly so defines them.

The present disclosure is not limited to the embodiments shown in the drawings, but includes modifications of configurations formed based on manufacturing processes. Accordingly, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate the specific shapes of the regions of the elements, but are not intended to be limiting.

In a first aspect, referring to FIGS. 1 to 5, the present disclosure provides a terminal device.

The terminal device provided by the present disclosure has a display function, an image collection function, an information transmission function, etc., and also has a voice collection function and a voice playback function, so that video calls can be realized.

Of course, the functions of the terminal device are not limited to video calls, and it can also implement other functions such as voice calls and local program running.

In some embodiments, the terminal device is a mobile terminal.

In one implementation of the present disclosure, the terminal device may be a mobile terminal, such as a mobile phone or a tablet computer, because mobile terminals are commonly used for video calls.

Of course, the terminal device is not limited to this type; it may also be another type of device, such as a notebook computer, a desktop computer, or dedicated video conferencing equipment.

Referring to FIG. 1, in one embodiment, a terminal device provided by the present disclosure includes: a display panel 1, a drive unit 2, an image acquisition unit 3, a transceiver unit 4, and an analysis unit 5.

The transceiver unit 4 is configured to receive the peer image and send the local image.

The display panel 1 is configured to display the peer image.

The analysis unit 5 is configured to determine the viewpoint position 92 in the peer image displayed by the display panel 1.

The drive unit 2 is configured to drive the image acquisition unit 3 to move to the viewpoint position 92.

The image acquisition unit 3 is configured to acquire a local image along the light emitting direction of the display panel 1 .

The terminal device in the embodiments of the present disclosure includes a transceiver unit 4 (such as a wireless communication unit or a wireless communication circuit) capable of remote information exchange. In a video call, the transceiver unit 4 receives a peer image from the peer terminal device. The peer image is the image of the counterpart user captured by the image acquisition unit 3 of the peer terminal device (so it is also the local image of the peer terminal device).

The display panel 1 (such as a liquid crystal display panel or an organic light-emitting diode display panel) displays the peer image, so that the local user can see the counterpart user.

The analysis unit 5 can then determine, from the peer image displayed on the display panel 1, the viewpoint position 92 of the counterpart user in the peer image (that is, the position from which the counterpart user's line of sight originates in the peer image), together with the corresponding physical location on the display surface of the local display panel 1. The analysis unit 5 may be an analysis circuit or the like.

Further, the drive unit 2 drives the image acquisition unit 3 (such as a camera) to the viewpoint position 92, where the image acquisition unit 3 captures the local image along the light-emitting direction of the display panel 1 (that is, it captures whoever faces the display surface of the display panel 1, namely the local user). The transceiver unit 4 sends the local image to the peer terminal device (so the local image is also the peer image of the peer terminal device), where it is displayed on that device's display panel 1 so that the counterpart user can see the local user. If the peer terminal device is also a terminal device according to the embodiments of the present disclosure, its image acquisition unit 3 can likewise move according to the viewpoint position 92 in the image it receives, capture the counterpart user's image, and send it to the local terminal device.
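The mapping from a viewpoint in the peer image to a physical location on the display surface depends on how the peer image is scaled onto the panel, which the disclosure does not specify. The following Python sketch assumes, hypothetically, that the peer image is scaled uniformly to fit the panel and centered (letterboxed), and that the panel's resolution and pixel pitch are known; the names `Panel` and `image_to_panel_mm` are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Panel:
    """Hypothetical description of the local display panel 1."""
    width_px: int          # horizontal resolution of the active area
    height_px: int         # vertical resolution of the active area
    pixel_pitch_mm: float  # assumed square pixels


def image_to_panel_mm(x_img: float, y_img: float,
                      img_w: int, img_h: int,
                      panel: Panel) -> Tuple[float, float]:
    """Map a point in the peer image (pixels) to the display surface (mm).

    Assumes aspect-fit scaling: the image is scaled uniformly until it fits
    the panel, then centered, leaving bars on two sides if aspect ratios differ.
    The origin of the result is the top-left corner of the active area.
    """
    scale = min(panel.width_px / img_w, panel.height_px / img_h)
    offset_x = (panel.width_px - img_w * scale) / 2
    offset_y = (panel.height_px - img_h * scale) / 2
    x_px = offset_x + x_img * scale
    y_px = offset_y + y_img * scale
    return x_px * panel.pixel_pitch_mm, y_px * panel.pixel_pitch_mm


# Example: a 720x1280 peer frame shown on a 1080x2400 panel with 0.06 mm pitch.
panel = Panel(width_px=1080, height_px=2400, pixel_pitch_mm=0.06)
print(image_to_panel_mm(360, 400, 720, 1280, panel))
```

With these assumed numbers the frame is scaled by 1.5 and shifted down by 240 panel pixels, so the example point lands at about (32.4 mm, 50.4 mm).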

During the video call, the transceiver unit 4 continuously receives peer images, so every frame of the peer image can be processed as described above; that is, the image acquisition unit 3 can "track" the viewpoint position 92 of the counterpart user in the peer image in real time.

When the viewpoint position 92 of the counterpart user remains unchanged across multiple received frames of the peer image, the position of the image acquisition unit 3 should also remain unchanged.

The received peer image may also contain no counterpart user (for example, when the counterpart user steps away temporarily), or contain no viewpoint position 92 of the counterpart user (for example, when the counterpart user turns his or her head). In this case, the display panel 1 should still display the peer image and the image acquisition unit 3 should still capture the local image, but the image acquisition unit 3 can either stay where it is or move to a default position.

Regardless of whether the counterpart user appears in the received peer image, the peer image may also contain other scenery; the display panel 1 displays this scenery, but the analysis unit 5 need not analyze it.

The terminal device provided by the embodiments of the present disclosure may also include other units, such as a voice playback unit (for example, a speaker) and a voice receiving unit (for example, a microphone), and its transceiver unit 4 may also send and receive other information (such as audio information). These are not described in detail here.

In the embodiments of the present disclosure, the image acquisition unit 3 continuously captures the local image (the image of the local user) at the viewpoint position 92 in the peer image (the image of the counterpart user) and sends it to the counterpart user. The captured local image therefore resembles what the counterpart user's own eyes would see, so the counterpart user experiences something close to "seeing directly". As a result, the local image conveys information such as subtle eye contact and body language more faithfully (for example, the counterpart user can easily tell from the local image whether the local user is looking straight at him or her or looking away), which increases the amount of information transmitted and makes the video call more like "face-to-face" communication.

Note that when the local user uses the terminal device provided by the embodiments of the present disclosure, it is the counterpart user who perceives the richer detail. Because a video call takes place between two users, both users need to use the terminal device provided by the embodiments of the present disclosure if each is to perceive this richer detail. Nevertheless, a video call in which only one user uses this terminal device and the other uses a conventional terminal device is also feasible.
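The per-frame behavior described in the last three paragraphs (follow the viewpoint, hold still when it does not change, and hold or park when no viewpoint is found) can be sketched as a small update rule. This is a hedged Python illustration, not the disclosed control logic: the function name, the `tolerance_mm` dead band used to decide that a viewpoint is "unchanged", and the `return_to_default` switch are all hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) on the display surface, in mm


def next_camera_target(current: Point,
                       viewpoint: Optional[Point],
                       default: Point,
                       tolerance_mm: float = 1.0,
                       return_to_default: bool = False) -> Point:
    """Return where the image acquisition unit should be for this frame."""
    if viewpoint is None:
        # No counterpart user, or no visible pupils: either stay put
        # or park at a default position (the text allows both).
        return default if return_to_default else current
    dx = viewpoint[0] - current[0]
    dy = viewpoint[1] - current[1]
    if dx * dx + dy * dy <= tolerance_mm ** 2:
        # Viewpoint effectively unchanged: keep the unit still
        # (the dead band is an assumption, used to avoid jitter).
        return current
    return viewpoint
```

A dead band of this kind is one common way to keep small detection noise from moving the unit every frame; its size would be a tuning choice.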
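A simple geometric model, offered as an illustration and not taken from the disclosure, shows why the camera's position matters. Suppose the local user looks at the counterpart's eyes at point P on the display, sits at distance D roughly in front of P, and the camera sits in the display plane at distance d from P. The angle between the user's line of sight and the direction to the camera is then approximately:

```latex
\theta = \arctan\left(\frac{d}{D}\right)
```

For example, d = 5 cm and D = 40 cm give θ = arctan(0.125) ≈ 7.1°, so the captured gaze is off-axis by about 7°; moving the camera to the viewpoint position 92 drives d, and hence θ, toward zero.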

In some implementations, referring to FIG. 2, determining the viewpoint position 92 in the peer image displayed on the display panel 1 includes steps S101 and S102.

In step S101, image analysis is performed on the peer image to determine the position of the pupil 91 on the human face.

In step S102, the viewpoint position 92 is determined according to the position of the pupil 91.

In one implementation of the present disclosure, the peer image can be analyzed with image analysis (such as object recognition) techniques to determine whether it contains a human face, whether the face contains pupils 91 (or eyes), and where the pupils 91 are; the viewpoint position 92 can then be determined from the position of the pupils 91.

In some embodiments, determining the viewpoint position 92 according to the position of the pupil 91 (S102) includes S1021: determining the position of the pupil 91 as the viewpoint position 92.

In one implementation of the present disclosure, the position of the pupil 91 determined above may be used directly as the viewpoint position 92.

For example, when the peer image contains two pupils 91 (the counterpart user's two eyes) and the local terminal device has only one image acquisition unit 3 (such as a monocular camera), the position of one of the pupils 91 can be selected as the viewpoint position 92. Referring to FIG. 5, if the local terminal device has two image acquisition units 3 (such as a binocular camera), the positions of the two pupils 91 can serve as the respective viewpoint positions 92 of the two image acquisition units 3; that is, the two image acquisition units 3 move to the two pupils 91, respectively.
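Step S1021 with one or two image acquisition units 3 can be sketched in Python as follows. Which pupil a single camera should use is not specified in the text; this sketch hypothetically picks the leftmost one, and it assumes two cameras are indexed left to right so that they pair with the pupils in the same order. `pupil_viewpoints` is an illustrative name.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # pupil position on the display surface, (x, y) in mm


def pupil_viewpoints(pupils: List[Point], num_cameras: int) -> List[Point]:
    """Use pupil positions directly as viewpoints (S1021).

    Returns one target per camera that should move, in camera order:
    - no pupils detected: [] (no viewpoint in this frame);
    - one camera: the leftmost pupil (an arbitrary, hypothetical choice);
    - two cameras, two pupils: left pupil for camera 0, right for camera 1;
    - two cameras, one pupil: a single target, so only camera 0 moves.
    """
    if num_cameras < 1:
        raise ValueError("at least one image acquisition unit is required")
    ordered = sorted(pupils)  # tuples sort by x first, i.e. left to right
    return ordered[:num_cameras]
```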

Alternatively, determining the viewpoint position 92 according to the position of the pupil 91 (S102) may include S1022: determining the midpoint of the line connecting the two pupils 91 of a human face as the viewpoint position 92.

Referring to FIG. 3, in another embodiment of the present disclosure, the middle position between the two pupils 91 (the midpoint of the line connecting them) can also be used as the viewpoint position 92.
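Step S1022 reduces to averaging the two pupil coordinates. In this hedged Python sketch, falling back to the single visible pupil when only one is detected is an assumption, not something the text specifies; `midpoint_viewpoint` is an illustrative name.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) on the display surface, in mm


def midpoint_viewpoint(pupils: List[Point]) -> Optional[Point]:
    """Viewpoint = midpoint of the line joining the two pupils (S1022)."""
    if len(pupils) == 2:
        (x1, y1), (x2, y2) = pupils
        return ((x1 + x2) / 2, (y1 + y2) / 2)
    if len(pupils) == 1:
        return pupils[0]  # hypothetical fallback: only one eye visible
    return None           # no pupils, or more than one face: not handled here
```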

In some embodiments, the image acquisition unit 3 and the drive unit 2 are arranged inside the display panel 1.

The image acquisition unit 3 must move to the viewpoint position 92 in the peer image. To prevent it from blocking the viewpoint position 92 (for example, the counterpart user's eyes) and impairing the local user's view, in one implementation of the present disclosure the image acquisition unit 3 and the drive unit 2 can be arranged "inside" the display panel 1; that is, the image acquisition unit 3 is located behind the display surface of the display panel 1, forming an "under-screen camera" or a similar arrangement. As a result, the image acquisition unit 3 and the drive unit 2 are "invisible" to the local user watching the display panel 1.

In some embodiments, the drive unit 2 includes a first track 21 and a second track 22; the second track 22 is movably arranged on the first track 21, and the image acquisition unit 3 is movably arranged on the second track 22; the extending direction of the first track 21 intersects the extending direction of the second track 22.

In some embodiments, the extending direction of the first track 21 is perpendicular to the extending direction of the second track 22, and both extending directions are parallel to the display surface of the display panel 1.

For example, referring to FIG. 4, in a more specific implementation of the present disclosure, the drive unit 2 may include two intersecting tracks (the first track 21 and the second track 22). The second track 22 can move along the first track 21, and the image acquisition unit 3 can move along the second track 22 (see the arrows in FIG. 4), so that the image acquisition unit 3 can move in two different directions.

Further, the two tracks may be perpendicular to each other and both parallel to the display surface of the display panel 1. The image acquisition unit 3 can then move quickly, and its distance from the display surface of the display panel 1, and hence from the local user, remains unchanged during movement, so no refocusing is needed after a move and operation is simple.

Of course, other forms or locations of the drive unit 2 and the image acquisition unit 3 are also feasible, as long as the image acquisition unit 3 can "track" the viewpoint position 92.
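Because the two tracks are perpendicular, a move decomposes into two independent travels: the second track 22 moves along the first track 21, and the image acquisition unit 3 moves along the second track 22. The Python sketch below assumes, hypothetically, that the first track runs along the x axis, that both axes are driven at the same time at constant speed (ignoring acceleration), and that positions are already expressed in track coordinates; the function name and speeds are illustrative. No z (focus) term appears, reflecting the constant distance to the display surface.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in track coordinates, mm


def plan_gantry_move(current: Point, target: Point,
                     speed_first_mm_s: float,
                     speed_second_mm_s: float) -> Tuple[float, float, float]:
    """Split a move into the two track travels and estimate its duration.

    Returns (dx, dy, seconds):
    dx: travel of the second track along the first track (x axis);
    dy: travel of the image acquisition unit along the second track (y axis);
    seconds: move time if both axes run simultaneously at constant speed.
    """
    if speed_first_mm_s <= 0 or speed_second_mm_s <= 0:
        raise ValueError("speeds must be positive")
    dx = target[0] - current[0]
    dy = target[1] - current[1]
    # Simultaneous axes: the slower axis determines the total time.
    seconds = max(abs(dx) / speed_first_mm_s, abs(dy) / speed_second_mm_s)
    return dx, dy, seconds
```

For instance, a move from (0, 0) to (30, 20) mm at assumed speeds of 100 mm/s and 50 mm/s gives dx = 30, dy = 20, and 0.4 s, limited by the slower axis.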

In some embodiments, the drive unit 2 is configured to drive the image acquisition unit 3 to move within a preset range 99, and the preset range 99 corresponds to a partial area of the display surface of the display panel 1.

Referring to FIGS. 3 to 5, the image acquisition unit 3 may move only within part of the display surface of the display panel 1 rather than over the entire display surface. This is because, in most video calls, the counterpart user's face in the peer image (and hence the viewpoint position 92) lies within a partial area of the display surface of the local display panel 1 (such as the upper-middle region) and does not deviate far from it. The drive unit 2 therefore only needs to move the image acquisition unit 3 within the preset range 99 corresponding to this partial area, which simplifies the product structure and reduces the impact of the drive unit 2 on the display effect.

In some embodiments, there are multiple image acquisition units 3; each image acquisition unit 3 has a corresponding drive unit, each drive unit has a corresponding preset range 99, and different preset ranges 99 are at least partially non-overlapping. Driving the image acquisition unit 3 to move to the viewpoint position 92 includes: determining at least one preset range 99 in which the viewpoint position 92 is located, and causing the drive unit corresponding to that preset range 99 to move the corresponding image acquisition unit 3 to the viewpoint position 92.

Referring to FIG. 5, when there are multiple image acquisition units 3, each can have its own drive unit and preset range 99, and the preset ranges 99 are not completely identical; that is, different image acquisition units 3 can reach different areas. Once the viewpoint position 92 is determined, the drive unit and image acquisition unit 3 to be moved can be selected according to the preset range 99 in which the viewpoint position 92 lies. (FIG. 5 shows two preset ranges 99 corresponding to the possible areas of the two pupils 91, i.e., of the two viewpoint positions 92; multiple preset ranges 99 may also together cover the possible area of a single viewpoint position 92.)
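The preset range 99 can be modeled as a rectangle in display-surface coordinates. The text does not say what happens when the viewpoint position 92 falls outside it; this hedged Python sketch hypothetically clamps the target to the nearest point inside the range. `PresetRange` and its methods are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) on the display surface, mm


@dataclass(frozen=True)
class PresetRange:
    """Axis-aligned region the drive unit can reach (hypothetical model)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max

    def clamp(self, p: Point) -> Point:
        """Nearest reachable point to p (assumed policy for out-of-range targets).

        For an axis-aligned rectangle, clamping each axis separately gives
        the closest point inside it.
        """
        return (min(max(p[0], self.x_min), self.x_max),
                min(max(p[1], self.y_min), self.y_max))
```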
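Selecting which image acquisition unit 3 to move can be sketched as a lookup over the preset ranges 99. Where the viewpoint lies in an overlap of several ranges, the text allows any of them; this Python sketch hypothetically picks the unit that is currently closest, to minimize travel. `Rect` and `select_unit` are illustrative names, and ranges are modeled as rectangles.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) on the display surface, mm


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangular model of one preset range 99."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max


def select_unit(viewpoint: Point, ranges: List[Rect],
                positions: List[Point]) -> Optional[int]:
    """Index of the unit to move, or None if no preset range covers the viewpoint.

    ranges[i] and positions[i] belong to image acquisition unit i.
    """
    candidates = [i for i, r in enumerate(ranges) if r.contains(viewpoint)]
    if not candidates:
        return None

    def squared_travel(i: int) -> float:
        dx = positions[i][0] - viewpoint[0]
        dy = positions[i][1] - viewpoint[1]
        return dx * dx + dy * dy

    # Overlap: prefer the unit with the shortest move (assumed tie-break).
    return min(candidates, key=squared_travel)
```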

In a second aspect, referring to FIG. 6 , the present disclosure provides a video call method, which is used for a terminal device in any one of the embodiments of the present disclosure.

The video calling method of the present disclosure implements video calling through the above terminal equipment.

It should be understood that a video call takes place between two users, and the method of the present disclosure describes the process on one user's side; that is, at least one of the users in the video call uses the terminal device of the embodiments of the present disclosure. A video call in which only one user uses this terminal device and the other uses a conventional terminal device is also feasible.

Referring to FIG. 6 , in one embodiment, the video call method of the present disclosure includes steps S201 to S206.

In step S201, the transceiver unit receives the image of the opposite end.

In step S202, the display panel displays the peer image.

In step S203, the analysis unit determines the position of the viewpoint in the peer image displayed on the display panel.

In step S204, the driving unit drives the image acquisition unit to move to the viewpoint position.

In step S205, the image acquisition unit acquires the local image at the viewpoint position along the light-emitting direction of the display panel.

In step S206, the transceiver unit sends the local image.

Of course, during the video call, in addition to the above process, the transceiver unit can also transmit the audio information of both users, etc., which will not be described in detail here.

In the video call method of the present disclosure, the position of the image acquisition unit is continuously adjusted according to the received peer image (the image of the counterpart user), so that the image acquisition unit captures a local image resembling what the counterpart user's own eyes would see and sends it to the counterpart user's terminal device; the counterpart user thus experiences something close to "seeing directly". The local image therefore conveys subtle information such as eye contact and body language more faithfully (for example, the counterpart user can easily tell whether the local user is looking straight at him or her or looking away, i.e., eye contact is achieved), increasing the amount of information transmitted and making the video call more like face-to-face communication.

It should be understood that during a video call, the transceiver unit continuously receives peer images and must also continuously send local images to the peer, so once the video call starts, the method of the present disclosure runs as a continuous loop until the video call ends.

In some embodiments, the transceiver unit receiving the peer image (S201) includes: the transceiver unit receives a peer video stream and obtains the peer image from it; and the transceiver unit sending the local image (S206) includes: the transceiver unit forms a local video stream from the local image and sends the local video stream.

Because this is a "video call", the transceiver unit usually sends and receives video streams: it obtains the peer image from the received peer video stream (for example, by video decoding), and forms the local video stream from the local image (for example, by video encoding) before sending it.

Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless explicitly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those of ordinary skill in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.
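The continuous loop over steps S201 to S206 can be sketched in Python with each unit passed in as a callable. The callables stand in for hardware and codec components and are hypothetical; in a real device, the iterable of peer images would come from decoding the peer video stream, and `send` would feed a video encoder. The sketch leaves the unit in place when no viewpoint is found, which is one of the two options the text allows.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

Point = Tuple[float, float]  # viewpoint on the display surface, mm


def run_video_call(peer_images: Iterable[Any],
                   display: Callable[[Any], None],
                   find_viewpoint: Callable[[Any], Optional[Point]],
                   move_camera: Callable[[Point], None],
                   capture: Callable[[], Any],
                   send: Callable[[Any], None]) -> int:
    """Run S201 to S206 once per received frame until the peer stream ends.

    Returns the number of frames processed.
    """
    frames = 0
    for peer_image in peer_images:              # S201: receive a peer image
        display(peer_image)                     # S202: show it on the panel
        viewpoint = find_viewpoint(peer_image)  # S203: locate the viewpoint
        if viewpoint is not None:
            move_camera(viewpoint)              # S204: drive the camera there
        local_image = capture()                 # S205: capture the local image
        send(local_image)                       # S206: send it to the peer
        frames += 1
    return frames
```

Running the steps strictly in sequence per frame is the simplest reading of FIG. 6; a real device might overlap them, which the text does not address.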

Claims

1. A terminal device, comprising:

a transceiver unit configured to receive a peer image and send a local image;

a display panel configured to display the peer image;

an analysis unit configured to determine a viewpoint position in the peer image displayed on the display panel;

a drive unit configured to drive an image acquisition unit to move to the viewpoint position; and

the image acquisition unit, configured to acquire the local image along the light-emitting direction of the display panel.

2. The terminal device according to claim 1, wherein determining the viewpoint position in the peer image displayed on the display panel comprises:

performing image analysis on the peer image to determine the position of a pupil on a human face; and

determining the viewpoint position according to the position of the pupil.
3. The terminal device according to claim 2, wherein determining the viewpoint position according to the position of the pupil comprises:

determining the position of the pupil as the viewpoint position.
4. The terminal device according to claim 2, wherein determining the viewpoint position according to the position of the pupil comprises:

determining the midpoint of the line connecting the two pupils of a human face as the viewpoint position.
5. The terminal device according to claim 1, wherein

the image acquisition unit and the drive unit are arranged inside the display panel.
6. The terminal device according to claim 5, wherein

the drive unit comprises a first track and a second track, the second track is movably arranged on the first track, and the image acquisition unit is movably arranged on the second track; and the extending direction of the first track intersects the extending direction of the second track.
7. The terminal device according to claim 6, wherein

the extending direction of the first track is perpendicular to the extending direction of the second track; and

both the extending direction of the first track and the extending direction of the second track are parallel to the display surface of the display panel.
8. The terminal device according to claim 1, wherein

the drive unit is configured to drive the image acquisition unit to move within a preset range, and the preset range corresponds to a partial area of the display surface of the display panel.
9. The terminal device according to claim 8, wherein

there are a plurality of image acquisition units, each image acquisition unit has a corresponding drive unit, each drive unit has a corresponding preset range, and different preset ranges are at least partially non-overlapping; and

driving the image acquisition unit to move to the viewpoint position comprises: determining at least one preset range in which the viewpoint position is located, and causing the drive unit corresponding to that preset range to move the corresponding image acquisition unit to the viewpoint position.
10. A video call method for the terminal device according to any one of claims 1 to 9, the method comprising:

the transceiver unit receives the peer image;

the display panel displays the peer image;

the analysis unit determines the viewpoint position in the peer image displayed on the display panel;

the drive unit drives the image acquisition unit to move to the viewpoint position;

the image acquisition unit acquires the local image at the viewpoint position along the light-emitting direction of the display panel; and

the transceiver unit sends the local image.
11. The method according to claim 10, wherein

receiving the peer image by the transceiver unit comprises: receiving, by the transceiver unit, a peer video stream, and obtaining the peer image from the peer video stream; and

sending the local image by the transceiver unit comprises: forming, by the transceiver unit, a local video stream from the local image, and sending the local video stream.