WO2019184499A1 - Video call method and device, and computer storage medium - Google Patents

Video call method and device, and computer storage medium Download PDF

Info

Publication number
WO2019184499A1
WO2019184499A1 (PCT/CN2018/124933)
Authority
WO
WIPO (PCT)
Prior art keywords
user
portrait
video frame
terminal
video
Prior art date
Application number
PCT/CN2018/124933
Other languages
French (fr)
Chinese (zh)
Inventor
Xiao Shushan (肖树山)
Ma Xiaojie (马小捷)
Shi Fanpan (石范潘)
Li Sinan (李斯楠)
Xia Yin (夏吟)
Original Assignee
Shanghai Zhangmen Technology Co., Ltd. (上海掌门科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhangmen Technology Co., Ltd. (上海掌门科技有限公司)
Publication of WO2019184499A1 publication Critical patent/WO2019184499A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Definitions

  • the present application relates to Internet application technologies, and in particular, to a video call method, device, and computer storage medium.
  • a video image recorded by the opposite camera is generally displayed in the video call interface; in some cases, a video image recorded by the local camera is also displayed.
  • During a video call between user A and user B, the video image recorded by user B's camera is displayed in user A's video call interface, and the video image recorded by user A's camera is displayed in user B's video call interface; therefore, both parties can intuitively see the video transmitted from the opposite end of the call.
  • This type of call has become so habitual in the field of video calling that those skilled in the art have not recognized that such a video call fails to create a more realistic call scene.
  • the present application provides a method, device, and computer storage medium for video calling.
  • Some embodiments of the present application provide a video call method performed at a terminal. The method includes: collecting, through a first camera, a video frame of the local scene of a first user; acquiring, based on data from a server end, a portrait of a second user who is in a video call with the first user; synthesizing the portrait of the second user with the video frame of the first user's local scene; and displaying the synthesized video frame on the video call interface.
  • Some embodiments of the present application provide a video call method performed at a server end. The method includes: receiving, by the server end, a video frame of a second user sent by the second user's terminal; and sending, according to that video frame, a portrait of the second user to the terminal of a first user who is in a video call with the second user, so that the first user's terminal synthesizes the portrait of the second user with a video frame of the first user's local scene and displays the synthesized video frame on the video call interface.
  • An apparatus comprising: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of the claims.
  • A storage medium comprising computer-executable instructions which, when executed by a computer processor, perform the method of any of the claims.
  • The above embodiments of the present application composite the portrait of one party of a video call into the video frame of the other party's local scene. Compared with the prior-art manner, in which the video image recorded by one party's camera is displayed unaltered in the other party's video call interface, the above embodiments can create a more realistic call environment for both parties and thereby improve the video call experience.
  • FIG. 1 is a structural diagram of a video call provided by some embodiments of the present application.
  • FIG. 2 is a flowchart of a method for performing a video call by a terminal according to some embodiments of the present disclosure
  • FIG. 3 is a flowchart of a method for performing a video call by a server according to some embodiments of the present disclosure
  • FIG. 4 is an interaction diagram of a video call provided by a system including a terminal and a server according to some embodiments of the present application;
  • FIG. 5 is a block diagram of a computer system/server provided by some embodiments of the present application.
  • The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • The core idea of some embodiments of the present application is as follows: when a user makes a video call, what is displayed in that user's video call interface is a video frame in which the other party's portrait has been synthesized into the user's local scene, and the video call proceeds on the basis of the synthesized video frames. In this manner, some embodiments of the present application can provide a more realistic call environment and improve the user's video call experience.
  • Some embodiments of the present application may implement a video call based on the following architecture. As shown in FIG. 1, the architecture includes a server, a first terminal, a second terminal, ..., and an nth terminal.
  • The terminal may include user equipment, such as mobile user equipment (e.g., a mobile phone or a tablet computer) and fixed user equipment (e.g., a desktop computer). In some embodiments, the terminal may include software running on the user equipment, such as a third-party application, a program or application that comes with the user equipment's system, a plug-in within an application, or a functional unit such as a Software Development Kit (SDK).
  • the server may include a centralized server, and may also include a distributed server; in some embodiments of the present application, the server includes a service server that serves video call services.
  • The number of terminals is not limited in this application; that is, some embodiments of the present application can implement a two-person or a multi-person video call.
  • In the following, a two-person video call is taken as an example for description.
  • FIG. 2 is a flowchart of a method for a video call performed by a terminal according to some embodiments of the present disclosure. As shown in FIG. 2, the method includes:
  • the terminal collects a video frame of the first user local scene through the first camera, and acquires a portrait of the second user who performs a video call with the first user based on data from the server end.
  • Suppose user A and user B make a video call. For user A, user A is the first user and user B is the second user; likewise, for user B, user B is the first user and user A is the second user.
  • In some embodiments, the terminal collects video frames of the first user's local scene through the first camera of the terminal device. That is, for user A, user A's terminal collects video frames of user A's local scene through the first camera of user A's terminal device; for user B, user B's terminal collects video frames of user B's local scene through the first camera of user B's terminal device.
  • the terminal acquires a portrait of the second user who makes a video call with the first user based on the data from the server side.
  • the data from the server side can be directly the portrait of the second user. That is to say, for the user A, the terminal of the user A receives the portrait of the user B from the server side; and for the user B, the terminal of the user B receives the portrait of the user A from the server side.
  • the terminal collects the portrait of the first user through the second camera and sends it to the server, and then the server sends the portrait of the second user who makes a video call with the first user to the terminal.
  • the data from the server side can also be the video frame of the second user.
  • In this case, user A's terminal receives user B's video frame from the server end and extracts user B's portrait from it; likewise, user B's terminal receives user A's video frame from the server end and extracts user A's portrait from it.
  • the first camera may be a rear camera of the terminal device, and the second camera may be a front camera of the terminal device. That is, in some embodiments of the present application, the video frame of the first user is captured by the front camera of the terminal device, and the video frame of the local scene of the first user is collected by the rear camera of the terminal device.
  • In some embodiments, when the terminal sends the portrait of the first user to the server end, the terminal may extract the portrait of the first user from the video frame of the first user collected by the second camera, and then send the extracted portrait to the server end.
  • Alternatively, the video frame of the first user collected by the second camera may be sent directly by the terminal to the server end, and the server end extracts the portrait of the first user from that video frame.
  • the portrait of the second user is combined with the video frame of the first user local scene, and the synthesized video frame is displayed on the video call interface.
  • That is, according to the video frame of the first user's local scene collected by the terminal and the portrait of the second user acquired based on data from the server end, the portrait of the second user is synthesized with the video frame of the local scene, so that the user makes the video call based on the synthesized video frames.
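The synthesis step above can be sketched as a simple mask-based overlay. The following Python sketch is illustrative only (the function name and the binary-mask representation of the extracted portrait are assumptions, not from the application):

```python
def composite_portrait(background, portrait, mask, top, left):
    """Overlay `portrait` onto `background` where `mask` is 1.

    background: H x W list of pixel values (e.g., RGB tuples or ints)
    portrait:   h x w list of pixel values (the second user's portrait)
    mask:       h x w list of 0/1 flags (1 = portrait pixel, 0 = transparent)
    (top, left): position of the portrait's top-left corner in the background
    """
    frame = [row[:] for row in background]  # copy so the input is untouched
    for y, (prow, mrow) in enumerate(zip(portrait, mask)):
        for x, (pix, m) in enumerate(zip(prow, mrow)):
            by, bx = top + y, left + x
            if m and 0 <= by < len(frame) and 0 <= bx < len(frame[0]):
                frame[by][bx] = pix
    return frame
```

A real implementation would operate on camera buffers (e.g., via an image library), but the per-pixel masking logic is the same.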
  • In some embodiments, when synthesizing the portrait of the second user with the video frame of the first user's local scene, the following manner may be adopted: determine the display size of the second user's portrait according to the eye distance of that portrait, and synthesize the portrait with the video frame of the local scene according to the determined display size.
  • The eye distance of the second user's portrait may be sent by the server end to the terminal. It can be understood that the size of the second user's portrait need not be adjusted based on the eye distance; the portrait may instead be scaled directly according to a preset ratio.
  • In some embodiments, the terminal acquires the screen size of the terminal device and determines the display size of the second user's portrait according to the relationship between the portrait's eye distance and the screen size.
  • The terminal may determine the screen size of the terminal device according to attribute information of the device, for example, its model information. For example, if the terminal device is an iPhone 7, it can be determined that the screen size is 4.7 inches.
  • The unit of the acquired eye distance may be kept consistent with the default unit of the screen size (for example, both in inches); alternatively, the units may be converted into one another: for example, if the eye distance is in centimeters and the screen size is in inches, centimeters can be converted to inches or inches to centimeters.
  • In some embodiments, the display size of the second user's portrait and the portion of the portrait to be displayed may be determined based on the relationship between the eye distance and the screen size. For example, if the eye distance is greater than E% (e.g., 20%) of the screen size, only the head and a body portion of F times (e.g., 3 times) the eye-distance length below the head are displayed, and the excess is not displayed; if the eye distance is less than or equal to E% (e.g., 20%) of the screen size, only the head and a body portion of G times (e.g., 4 times) the eye-distance length below the head are displayed, and the excess is not displayed.
  • E, F, and G are preset values; preferably, the value of F is smaller than the value of G.
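The E/F/G rule above can be expressed as a small helper. This is a hedged sketch: the function name is invented here, and the defaults E=20, F=3, G=4 merely echo the example figures in the text:

```python
def visible_body_multiple(eye_distance, screen_size, e=20, f=3, g=4):
    """Return how many eye-distance lengths of body (below the head) to show.

    Per the rule above: if the portrait's eye distance exceeds E% of the
    screen size, show the smaller multiple F; otherwise show the larger
    multiple G (F < G, since a larger face leaves less room for the body).
    """
    if eye_distance > screen_size * e / 100.0:
        return f
    return g
```

Both arguments are assumed to be in the same unit (e.g., inches), as the preceding paragraph on unit conversion requires.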
  • The portrait of the second user may also be scaled so that it can be displayed normally on the first user's terminal screen. For example, if the eye distance is so large that it exceeds the screen size (say, twice the screen size) and the screen cannot display the portrait, the portrait is scaled down, for example to half its original size, before being displayed. If the eye distance is too small, the portrait can be scaled up, for example to twice its original size, before being displayed.
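The scaling rule in the preceding paragraph might be sketched as follows. The application does not specify a threshold for "too small", so `min_ratio` below is an assumed illustrative value:

```python
def portrait_scale(eye_distance, screen_size, min_ratio=0.05):
    """Return a scale factor so the portrait displays normally on screen.

    min_ratio is an assumed cutoff (not from the application) below which
    the eye distance counts as "too small".
    """
    if eye_distance > screen_size:
        # Portrait too large to fit: shrink proportionally, e.g., an eye
        # distance of twice the screen size yields a factor of 0.5.
        return screen_size / eye_distance
    if eye_distance < screen_size * min_ratio:
        return 2.0  # too small: enlarge, per the example in the text
    return 1.0
```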
  • In some embodiments, when synthesizing the portrait of the second user with the video frame of the first user's local scene, the following manner may also be adopted. First, select N pixels from the portrait of the second user and M pixels from the video frame of the local scene, where N and M are positive integers greater than 0; the pixels may be selected randomly or according to preset positions. Then compute an intermediate color between the selected N pixels and M pixels, and stroke (outline) the second user's portrait with the computed intermediate color, so that the portrait blends more naturally into the first user's local scene.
  • Finally, the stroked portrait of the second user is synthesized with the video frame of the first user's local scene.
  • It can be understood that the portrait of the second user may also be superimposed directly into the video frame of the first user's local scene without stroking. In summary, in the stroked manner, the intermediate color corresponding to the portrait of the second user and the video frame of the local scene is acquired, the portrait is stroked with that intermediate color, and the stroked portrait is then synthesized with the video frame of the local scene.
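The intermediate-color computation described above (sample N portrait pixels and M scene pixels, then derive a color between them) can be sketched in Python. The equal-weight per-channel average is an assumption, since the application does not define how the intermediate color is computed:

```python
import random

def intermediate_color(portrait_pixels, scene_pixels, n=16, m=16, seed=0):
    """Average n random portrait pixels and m random scene pixels per RGB
    channel, yielding a stroke color between the two palettes."""
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    samples = (rng.sample(portrait_pixels, min(n, len(portrait_pixels)))
               + rng.sample(scene_pixels, min(m, len(scene_pixels))))
    return tuple(sum(p[i] for p in samples) // len(samples) for i in range(3))
```

The resulting color would then be drawn as an outline around the portrait before compositing.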
  • In some embodiments, the portrait of the second user may be synthesized with the video frame of the first user's local scene at a preset position. That is, in this embodiment the portrait of the second user is placed at a suitable position in the video frame of the local scene, rather than being placed at random.
  • For example, if the preset position is the middle of the bottom edge of the local scene video frame, the portrait of the second user is centered on the bottom edge of the video frame. The preset position may also be the left end, the right end, or another position in the video frame; the superimposition position of the second user's portrait is not limited in this application.
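Placing the portrait at the preset position (e.g., centered on the bottom edge) reduces to computing a top-left offset for the overlay; a minimal sketch, with an invented function name:

```python
def bottom_center_position(frame_w, frame_h, portrait_w, portrait_h):
    """Top-left corner (row, col) that centers the portrait on the frame's
    bottom edge, i.e., the example preset position described above."""
    top = frame_h - portrait_h          # flush with the bottom edge
    left = (frame_w - portrait_w) // 2  # horizontally centered
    return top, left
```

Other preset positions (left end, right end) would simply use different `left` formulas.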
  • FIG. 3 is a flowchart of a method for a video call performed by a server end according to an embodiment of the present disclosure. As shown in FIG. 3, the method includes:
  • the server receives the video frame of the second user sent by the terminal of the second user.
  • For example, user A and user B make a video call. From the perspective of user A's terminal, the video frame of the second user received by the server end is user B's video frame; likewise, from the perspective of user B's terminal, the video frame of the second user received by the server end is user A's video frame.
  • It can be understood that what the server end receives from the terminal of the second user may be the portrait of the second user already extracted by that terminal, or may be the raw video frame of the second user sent by that terminal.
  • According to the video frame of the second user, the portrait of the second user is sent to the terminal of the first user who is in a video call with the second user, so that the first user's terminal synthesizes the portrait of the second user with the video frame of the first user's local scene and displays the synthesized video frame on the video call interface.
  • If the server end receives the raw video frame of the second user, it first extracts the portrait of the second user from that video frame, and then sends the extracted portrait to the terminal of the first user who is in a video call with the second user.
  • In some embodiments, after acquiring the portrait of the second user, the server end may further detect the eye distance of that portrait and provide the detected eye-distance information to the terminal of the first user, so that the first user's terminal adjusts the display size of the second user's portrait based on the eye distance.
  • In some embodiments, if the server end cannot detect the eye distance of the second user's portrait, this indicates that the second user's terminal failed to accurately capture the second user's image, for example because the second user is not facing the camera or the camera is blocked. In that case, the server end returns a prompt message to the second user's terminal, prompting the user to re-collect the portrait.
  • In some embodiments, after eye distances meeting a preset requirement are determined, the determined eye distance and the portrait of the second user corresponding to it may be sent to the terminal of the first user. For example, among a plurality of detected eye distances, the largest eye distance and the portrait of the second user corresponding to it are selected and transmitted to the first user's terminal.
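The server-side selection of the largest qualifying eye distance might look like the following sketch; the function name and the convention of returning `None` to trigger a re-collection prompt are assumptions:

```python
def pick_best_portrait(candidates, min_eye_distance=0.0):
    """candidates: (eye_distance, portrait) pairs detected by the server end.

    Returns the pair with the largest eye distance above the preset minimum,
    or None if no portrait qualifies (in which case the caller would prompt
    the second user's terminal to re-collect the portrait).
    """
    qualifying = [c for c in candidates if c[0] > min_eye_distance]
    if not qualifying:
        return None
    return max(qualifying, key=lambda c: c[0])
```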
  • the terminal of the first user synthesizes the received portrait of the second user with the video frame of the first user local scene.
  • For example, user A and user B make a video call. For user A, user A is the first user and user B is the second user; for user B, user B is the first user and user A is the second user. Suppose the terminal corresponding to user A is terminal UA and the terminal corresponding to user B is terminal UB. Terminal UA sends user A's user image IA to the server end, and terminal UB sends user B's user image IB to the server end.
  • After acquiring user image IA and user image IB, the server end extracts user portrait Ia from user image IA and user portrait Ib from user image IB, then sends portrait Ia to terminal UB and portrait Ib to terminal UA. Terminal UA performs synthesis based on the received portrait Ib, and terminal UB performs synthesis based on the received portrait Ia.
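The cross-routing in this example (Ia goes to UB, Ib goes to UA) generalizes to any set of call pairs. A minimal sketch of the server-side routing, with illustrative names:

```python
def route_portraits(call_pairs, portraits):
    """For each (a, b) call pair, deliver a's portrait to b's terminal and
    b's portrait to a's terminal; returns {terminal: portrait_to_display}."""
    out = {}
    for a, b in call_pairs:
        out[b] = portraits[a]  # e.g., terminal UB receives user A's portrait Ia
        out[a] = portraits[b]  # e.g., terminal UA receives user B's portrait Ib
    return out
```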
  • FIG. 4 is a flow chart of interaction of a video call according to an embodiment of the present application.
  • Suppose user A performs a video call with user B, the terminal corresponding to user A is terminal UA, and the terminal corresponding to user B is terminal UB.
  • Terminal UA can collect user A's video frames with the front camera of user A's terminal device and user A's local scene video frames with the rear camera; likewise, terminal UB can collect user B's video frames with the front camera of user B's terminal device and user B's local scene video frames with the rear camera. Terminal UA and terminal UB then send the recorded video frames of user A and user B, respectively, to the server end. The server end processes the received video frames of user A and user B, obtaining user A's portrait Ia and user B's portrait Ib; it sends the obtained portrait Ia to terminal UB and the obtained portrait Ib to terminal UA. Terminal UA synthesizes the portrait Ib sent by the server end with user A's local scene video frames, and terminal UB synthesizes the portrait Ia sent by the server end with user B's local scene video frames. User A and user B thus each make the video call based on synthesized images, which makes the video call more realistic.
  • FIG. 5 illustrates a block diagram of an exemplary computer system/server 012 suitable for implementing some embodiments of the present application.
  • the computer system/server 012 shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • computer system/server 012 is represented in the form of a general purpose computing device.
  • Components of computer system/server 012 may include, but are not limited to, one or more processors or processing units 016, system memory 028, and bus 018 that connects different system components, including system memory 028 and processing unit 016.
  • Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • By way of example, these architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
  • Computer system/server 012 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer system/server 012, including volatile and non-volatile media, removable and non-removable media.
  • System memory 028 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032.
  • Computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 034 can be used to read and write non-removable, non-volatile magnetic media (not shown in Figure 5, commonly referred to as a "hard disk drive").
  • Although not shown in FIG. 5, a disk drive for reading and writing a removable non-volatile magnetic disk (e.g., a "floppy disk") may be provided, as well as an optical drive for reading and writing a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media).
  • each drive can be coupled to bus 018 via one or more data medium interfaces.
  • Memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the various embodiments of the present application.
  • Program/utility 040, having a set (at least one) of program modules 042, may be stored, for example, in memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more applications, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • Program module 042 typically performs the functions and/or methods of the embodiments described herein.
  • Computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, pointing device, or display 024); in some embodiments of the present application, computer system/server 012 communicates with an external radar device. It may further communicate with one or more devices that enable a user to interact with computer system/server 012, and/or with any device (e.g., a network card or modem) that enables computer system/server 012 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 022.
  • Computer system/server 012 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 020. As shown, network adapter 020 communicates with other modules of computer system/server 012 via bus 018. It should be understood that although not shown in the figures, other hardware and/or software modules may be utilized in connection with computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
  • The processing unit 016, by executing programs stored in the system memory 028, performs various functional applications and data processing, for example, implementing a video call method, which may include:
  • the terminal collects a video frame of the first user local scene through the first camera, and acquires a portrait of the second user that performs a video call with the first user based on data from the server end;
  • a method of video calling can also be implemented, including:
  • the server receives the video frame of the second user sent by the terminal of the second user;
  • The computer program described above may be provided in a computer storage medium encoded with a computer program that, when executed by one or more computers, causes the one or more computers to perform the method flows and/or device operations described in the above embodiments of the present application, which may include:
  • the terminal collects a video frame of the first user local scene through the first camera, and acquires a portrait of the second user that performs a video call with the first user based on data from the server end;
  • the server receives the video frame of the second user sent by the terminal of the second user;
  • The distribution of computer programs is no longer limited to tangible media; programs can also be downloaded directly from a network. Any combination of one or more computer readable media can be utilized.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer, partly on the remote computer, or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional unit described above is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Studio Circuits (AREA)

Abstract

Provided is a video call method executed at a terminal. The method comprises: a terminal collecting a video frame of a local scene of a first user via a first camera, and acquiring, based on data from a server end, a portrait of a second user who is in a video call with the first user; and superimposing the portrait of the second user on the video frame of the local scene of the first user for synthesis, and displaying the synthesized video frame on a video call interface. Also provided is a video call method executed at a server end. The method comprises: a server end receiving a video frame of a second user sent by a terminal of the second user; and sending, according to the video frame of the second user, a portrait of the second user to a terminal of a first user who is in a video call with the second user, so that the terminal of the first user synthesizes the portrait of the second user with a video frame of the local scene of the first user and displays the synthesized video frame on a video call interface. The present application can improve the call experience of a video call.

Description

Method, Device and Computer Storage Medium for Video Call

[Technical Field]
The present application relates to Internet application technologies, and in particular, to a video call method, device, and computer storage medium.
[Background Art]
In the prior art, when a video call is made, the video image recorded by the camera at the opposite end is generally displayed in the video call interface; in some cases, the video image recorded by the local camera is also displayed. For example, if user A and user B make a video call, the video image recorded by user B's camera is displayed in user A's video call interface, and the video image recorded by user A's camera is displayed in user B's video call interface. In this way, both parties can intuitively see the video transmitted from the opposite end of the call, which facilitates communication. This mode of calling has become such a habit in the field of video calling that those skilled in the art have not noticed that it fails to create a more realistic call scene.
[Summary of the Invention]
In view of this, the present application provides a video call method, device, and computer storage medium.
Some embodiments of the present application provide a video call method, the method comprising: a terminal collecting, through a first camera, a video frame of the local scene of a first user, and acquiring, based on data from a server end, a portrait of a second user who is in a video call with the first user; and synthesizing the portrait of the second user with the video frame of the local scene of the first user, and displaying the synthesized video frame on the video call interface.
Some embodiments of the present application provide a video call method, the method comprising: a server end receiving a video frame of a second user sent by a terminal of the second user; and sending, according to the video frame of the second user, a portrait of the second user to a terminal of a first user who is in a video call with the second user, so that the terminal of the first user synthesizes the portrait of the second user with a video frame of the local scene of the first user and displays the synthesized video frame on the video call interface.
A device, comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any of the claims.
A storage medium containing computer executable instructions which, when executed by a computer processor, perform the method of any of the claims.
As can be seen from the above technical solutions, the above embodiments of the present application synthesize the portrait of either user in a video call into the video frame of the other user's local scene. Compared with the prior-art approach of displaying the video image recorded by one party's camera in its entirety in the other party's video call interface, the above embodiments of the present application can create a more realistic call environment for both parties of the video call and improve the video call experience.
[Description of the Drawings]
FIG. 1 is an architecture diagram of a video call provided by some embodiments of the present application;
FIG. 2 is a flowchart of a method for performing a video call by a terminal according to some embodiments of the present application;
FIG. 3 is a flowchart of a method for performing a video call by a server according to some embodiments of the present application;
FIG. 4 is an interaction diagram of a video call provided by a system including a terminal and a server according to some embodiments of the present application;
FIG. 5 is a block diagram of a computer system/server provided by some embodiments of the present application.
[Detailed Description]
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
The terms used in the following embodiments of the present application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "an", "the", and "said" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, A and B both exist, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
The core idea of some embodiments of the present application includes: when a user makes a video call, what is displayed in the user's video call interface is a video frame in which the other party's portrait has been synthesized into this user's local scene; both parties of the video call converse on the basis of such synthesized video frames. By adopting this manner, some embodiments of the present application can provide a more realistic call environment and improve the user's video call experience. Some embodiments of the present application may implement a video call based on the following architecture. As shown in FIG. 1, the architecture includes a server end and a first terminal, a second terminal, ..., an nth terminal. In some embodiments of the present application, a terminal may include user equipment, such as mobile user equipment (e.g. a mobile phone or a tablet computer) or fixed user equipment (e.g. a desktop computer); in some embodiments, a terminal may include software or a client running on the user equipment, such as a third-party application, a program or application that comes with the user equipment's system, or a functional unit such as a plug-in within an application or a software development kit (SDK). The server may be a centralized server or a distributed server; in some embodiments of the present application, the server includes a service server that serves the video call service.

It can be understood that the number of terminals is not limited in this application; that is, some embodiments of the present application can implement either a two-person video call or a multi-person video call. In some of the following embodiments of the present application, a two-person video call is taken as an example for description.
FIG. 2 is a flowchart of a method for a video call performed by a terminal according to some embodiments of the present application. As shown in FIG. 2, the method includes:
In 201, the terminal collects a video frame of the first user's local scene through a first camera, and acquires, based on data from the server end, a portrait of a second user who is in a video call with the first user.
It can be understood that if user A and user B make a video call, then for user A, user A is the first user and user B is the second user; likewise, for user B, user B is the first user and user A is the second user.
In this step, the terminal collects the video frame of the first user's local scene through the first camera of the terminal device. That is, for user A, user A's terminal collects the video frame of user A's local scene through the first camera of user A's terminal device; and for user B, user B's terminal collects the video frame of user B's local scene through the first camera of user B's terminal device.
Meanwhile, in this step, the terminal acquires, based on data from the server end, the portrait of the second user who is in a video call with the first user. The data from the server end may directly be the second user's portrait. That is, for user A, user A's terminal receives user B's portrait from the server end; and for user B, user B's terminal receives user A's portrait from the server end. It can be understood that the terminal collects the first user's portrait through a second camera and sends it to the server end, and the server end then sends to the terminal the portrait of the second user who is in a video call with the first user. The data from the server end may also be a video frame of the second user. That is, for user A, user A's terminal receives user B's video frame from the server end, and user A's terminal extracts user B's portrait from the received video frame; and for user B, user B's terminal receives user A's video frame from the server end, and user B's terminal extracts user A's portrait from the received video frame.
The first camera may be the rear camera of the terminal device, and the second camera may be the front camera of the terminal device. That is, in some embodiments of the present application, the front camera of the terminal device collects the video frame of the first user, and the rear camera of the terminal device collects the video frame of the first user's local scene.
Optionally, when the terminal sends the first user's portrait to the server end, the terminal may extract the first user's portrait from the first user's video frame collected by the second camera, and then send the extracted portrait to the server end. Alternatively, the terminal may directly send the first user's video frame collected by the second camera to the server end, for the server end to extract the first user's portrait from the video frame.
In 202, the second user's portrait is synthesized with the video frame of the first user's local scene, and the synthesized video frame is displayed on the video call interface.
In this step, according to the video frame of the first user's local scene collected by the terminal and the portrait of the second user (who is in a video call with the first user) acquired based on the data from the server end, the second user's portrait is synthesized with the video frame of the first user's local scene, so that the user makes the video call based on the synthesized video frames.
In some embodiments of the present application, when synthesizing the second user's portrait with the video frame of the first user's local scene, the following manner may be adopted: determine the display size of the second user's portrait according to the eye distance in the second user's portrait, and synthesize the second user's portrait with the video frame of the first user's local scene according to the determined display size. The eye distance of the second user's portrait may be sent by the server end to the terminal. It can be understood that, instead of adjusting the display size of the second user's portrait based on its eye distance, the portrait may also be resized directly according to a preset ratio.
Optionally, when determining the display size of the second user's portrait according to the portrait's eye distance, the following manner may be adopted: the terminal acquires the screen size of the terminal device, and determines the display size of the second user's portrait according to the relationship between the portrait's eye distance and the screen size. The terminal may determine the screen size of the terminal device according to attribute information of the terminal device, for example its model information. For example, if the terminal device is an iPhone 7, it can be determined that its screen size is 4.7 inches. In addition, it can be understood that the unit of the acquired eye distance may be the same as the default unit of the screen size, for example both in inches; the units may also be converted to agree, for example, if the eye distance is in centimeters and the screen size is in inches, centimeters may be converted to inches, or inches to centimeters.
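The unit conversion and sizing step above can be sketched as a small helper. This is a minimal illustrative sketch, not the patent's actual implementation: the function names and the target ratio of on-screen eye distance to screen size are assumed values.

```python
# Hypothetical sketch: convert the received eye distance into the screen's
# unit, then derive a scale factor so the portrait's on-screen eye distance
# becomes a fixed fraction (target_ratio, an assumed constant) of the screen size.
CM_PER_INCH = 2.54

def cm_to_inches(length_cm: float) -> float:
    """Convert a length in centimeters to inches."""
    return length_cm / CM_PER_INCH

def display_scale(eye_distance_inch: float, screen_inch: float,
                  target_ratio: float = 0.15) -> float:
    """Scale factor to apply to the second user's portrait so that its
    eye distance occupies target_ratio of the screen size."""
    return (target_ratio * screen_inch) / eye_distance_inch

# e.g. an eye distance reported as 2.54 cm, displayed on a 4.7-inch screen
print(round(display_scale(cm_to_inches(2.54), 4.7), 3))  # 0.705
```

A terminal would multiply the portrait bitmap's width and height by this scale before compositing.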
In some embodiments, the display size of the second user's portrait and the portion of the portrait to be displayed may be determined according to the relationship between the portrait's eye distance and the screen size. For example, if the portrait's eye distance is greater than E% (e.g. 20%) of the screen size, only the part of the second user's body within F times (e.g. 3 times) the eye-distance length below the top of the head is displayed, and the rest is not displayed; if the portrait's eye distance is less than or equal to E% (e.g. 20%) of the screen size, only the part of the body within G times (e.g. 4 times) the eye-distance length below the top of the head is displayed, and the rest is not displayed. In the embodiments of the present application, E, F, and G are all preset values, and preferably the value of F is smaller than the value of G. It can be understood that when the portrait's eye distance is too large or too small, the second user's portrait may also be scaled so that it can be displayed normally on the first user's terminal screen. For example, if the eye distance is so large that it exceeds the screen size (e.g. the eye distance is twice the screen size, so the screen cannot display the portrait), the second user's portrait is scaled down, for example to half its original size, before being displayed. If the eye distance is too small, the second user's portrait may be scaled up, for example to twice its original size, before being displayed.
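The E%/F/G rule above can be sketched as follows. This is an illustrative sketch under assumptions: E, F, and G take the example values from the text, the oversized case is handled by repeated halving as in the example, and all names are hypothetical.

```python
# Illustrative sketch of the E%/F/G cropping-and-scaling rule described above.
def fit_portrait(eye_distance: float, screen_size: float,
                 e: float = 20.0, f: float = 3.0, g: float = 4.0):
    """Return (scale, visible_height): the zoom applied to the portrait and
    how much of the body below the top of the head to show, measured in the
    scaled eye-distance length."""
    scale = 1.0
    # An oversized portrait is halved until its eye distance fits the screen.
    while eye_distance * scale > screen_size:
        scale *= 0.5
    eye = eye_distance * scale
    # Large face relative to the screen: show F eye-lengths of body; else G.
    multiplier = f if eye > screen_size * e / 100.0 else g
    return scale, multiplier * eye

print(fit_portrait(2.0, 4.7))   # (1.0, 6.0)  - fits as-is, large face -> F = 3
print(fit_portrait(10.0, 4.7))  # (0.25, 7.5) - halved twice, then F = 3
```

The small-face scale-up case (e.g. doubling) could be added symmetrically with a lower bound on `eye`.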
In some embodiments of the present application, when synthesizing the second user's portrait with the video frame of the first user's local scene, the following manner may also be adopted: first, select N pixels from the second user's portrait and M pixels from the video frame of the first user's local scene, where N and M are positive integers greater than 0; the pixels may be selected randomly or at preset positions. Then compute the intermediate color between the selected N pixels and M pixels, and stroke (outline) the second user's portrait based on the computed intermediate color, so that the second user's portrait blends better into the video frame of the first user's local scene when superimposed; finally, synthesize the stroked portrait of the second user with the video frame of the first user's local scene. It can be understood that the second user's portrait may also be superimposed directly on the video frame of the first user's local scene without stroking. In other words, the intermediate color corresponding to the second user's portrait and the video frame of the first user's local scene is obtained, the second user's portrait is stroked with the obtained intermediate color, and the stroked portrait is synthesized with the video frame of the first user's local scene.
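The intermediate-color step above can be sketched as follows. This is a hypothetical sketch: the patent does not specify the averaging formula, so a plain per-channel mean over the sampled pixels is assumed, and all names are illustrative.

```python
# Hypothetical sketch of the stroke-color step: sample N portrait pixels and
# M scene pixels, and average them into one intermediate RGB color that will
# be used to outline the portrait.
import random

def intermediate_color(portrait_pixels, scene_pixels, n=16, m=16, seed=0):
    """Average n random portrait pixels and m random scene pixels
    (RGB tuples) into a single intermediate color."""
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    samples = (rng.sample(portrait_pixels, min(n, len(portrait_pixels)))
               + rng.sample(scene_pixels, min(m, len(scene_pixels))))
    return tuple(sum(px[i] for px in samples) // len(samples)
                 for i in range(3))

# A dark portrait against a bright scene yields a mid-gray stroke color.
print(intermediate_color([(0, 0, 0)], [(255, 255, 255)]))  # (127, 127, 127)
```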
In addition, when superimposing the second user's portrait, the portrait may be synthesized with the video frame of the first user's local scene at a preset position. That is to say, in this embodiment the second user's portrait is placed at a suitable position in the video frame of the first user's local scene, rather than placed arbitrarily. For example, if the preset position is the middle of the bottom edge of the video frame of the first user's local scene, the second user's portrait is superimposed, centered, on the bottom edge of that video frame; the preset position may also be the left end, the right end, etc. of the video frame of the first user's local scene. The present application does not limit the superimposing position of the second user's portrait.
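The preset-position rule above reduces to computing an overlay coordinate. A minimal sketch, with assumed anchor names and a top-left pixel convention:

```python
# Illustrative sketch: top-left pixel at which to overlay the portrait on the
# scene frame, for a few preset anchor positions like those described above.
def paste_position(frame_w: int, frame_h: int,
                   portrait_w: int, portrait_h: int,
                   anchor: str = "bottom-center") -> tuple:
    """Return (x, y) of the portrait's top-left corner within the frame."""
    if anchor == "bottom-center":
        return ((frame_w - portrait_w) // 2, frame_h - portrait_h)
    if anchor == "bottom-left":
        return (0, frame_h - portrait_h)
    if anchor == "bottom-right":
        return (frame_w - portrait_w, frame_h - portrait_h)
    raise ValueError(f"unknown anchor: {anchor}")

# A 400x600 portrait centered on the bottom edge of a 1920x1080 frame:
print(paste_position(1920, 1080, 400, 600))  # (760, 480)
```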
After the second user's portrait is synthesized with the video frame of the first user's local scene, the synthesized video frame is displayed on the video call interface, and both parties of the video call converse based on their respectively synthesized video frames, thereby improving the call experience of the video call.
FIG. 3 is a flowchart of a method for a video call performed at the server end according to an embodiment of the present application. As shown in FIG. 3, the method includes:
In 301, the server end receives the second user's video frame sent by the second user's terminal.
In this step, the server end receives the second user's video frame sent by the second user's terminal. For example, user A and user B make a video call. For user A, the second user's video frame received by the server end is user B's video frame; likewise, for user B, the second user's video frame received by the server end is user A's video frame.
It can be understood that what the server end receives from the second user's terminal may be the second user's portrait already extracted by that terminal, or the second user's raw video frame sent by that terminal.
In 302, according to the second user's video frame, the second user's portrait is sent to the terminal of the first user who is in a video call with the second user, so that the first user's terminal synthesizes the second user's portrait with the video frame of the first user's local scene and displays the synthesized video frame on the video call interface.
In this step, if the server end has received the second user's video frame, it also needs to extract the second user's portrait from that video frame, and then send the extracted portrait to the terminal of the first user who is in a video call with the second user. In addition, after obtaining the second user's portrait, the server end may further detect the eye distance in the second user's portrait, and then provide the detected eye-distance information to the first user's terminal, so that the first user's terminal can adjust the display size of the second user's portrait according to the eye distance.
In this step, if the server end cannot detect the eye distance in the second user's portrait, it indicates that the second user's terminal has failed to accurately capture the second user's image, for example because the second user is not facing the camera or the camera is blocked. In this case, the server end returns prompt information to the second user's terminal, prompting the user to re-capture the portrait.
In this step, if multiple eye distances are detected in the second user's portrait, then after determining the eye distance that meets a preset requirement, the determined eye distance and the portrait of the second user corresponding to that eye distance may be sent to the first user's terminal. For example, the portrait of the second user with the largest of the multiple eye distances, together with that largest eye distance, is sent to the first user's terminal.
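The selection rule above can be sketched as follows, assuming the "preset requirement" is "largest eye distance" as in the example; the data shapes and names are illustrative.

```python
# Minimal sketch of the server-side selection step: among the faces detected
# in the second user's frame, keep the one with the largest eye distance.
def pick_portrait(detections):
    """detections: list of (eye_distance, portrait) pairs, one per detected
    face. Returns the pair with the largest eye distance, or None if nothing
    was detected (the server would then prompt the terminal to re-capture)."""
    if not detections:
        return None
    return max(detections, key=lambda d: d[0])

print(pick_portrait([(1.2, "face-1"), (2.5, "face-2"), (0.8, "face-3")]))
# (2.5, 'face-2')
```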
After the server end sends the second user's portrait to the first user's terminal, the first user's terminal synthesizes the received portrait of the second user with the video frame of the first user's local scene.
The above process is illustrated with an example below. User A and user B make a video call; for user A, user A is the first user and user B is the second user, and likewise, for user B, user B is the first user and user A is the second user. Suppose user A's terminal is terminal UA and user B's terminal is terminal UB. Terminal UA sends user A's user image IA to the server end after acquiring it, and terminal UB sends user B's user image IB to the server end after acquiring it. After obtaining user image IA and user image IB, the server end extracts user portrait Ia from user image IA and user portrait Ib from user image IB, then sends the extracted user portrait Ia to terminal UB and the extracted user portrait Ib to terminal UA. Terminal UA then performs synthesis based on the received user portrait Ib, and terminal UB performs synthesis based on the received user portrait Ia.
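The exchange in this example can be sketched as a routing function on the server end. This is a hypothetical sketch: the dict-based interface and the extractor callback are assumptions for illustration, not the patent's actual server API.

```python
# Hypothetical sketch: the server end extracts a portrait from each party's
# uploaded frame and routes it to the opposite party's terminal.
def route_portraits(frames, extract):
    """frames: {terminal_id: raw frame} for the two call parties.
    extract: callable that pulls the user's portrait out of a frame.
    Returns {terminal_id: peer's extracted portrait} to push to each terminal."""
    users = list(frames)
    if len(users) != 2:
        raise ValueError("this sketch handles a two-person call only")
    a, b = users
    return {a: extract(frames[b]), b: extract(frames[a])}

# Terminal UA uploads image IA, terminal UB uploads image IB:
out = route_portraits({"UA": "IA", "UB": "IB"},
                      lambda img: "portrait(" + img + ")")
print(out)  # {'UA': 'portrait(IB)', 'UB': 'portrait(IA)'}
```

A multi-person call would instead send each terminal the portraits of all of its peers.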
FIG. 4 is an interaction flowchart of a video call according to an embodiment of the present application.
As shown in FIG. 4, user A makes a video call with user B; the terminal corresponding to user A is terminal UA, and the terminal corresponding to user B is terminal UB. First, terminal UA may use the front camera of user A's terminal device to collect user A's video frame, and use the rear camera of the terminal device to collect the video frame of user A's local scene; terminal UB may use the front camera of user B's terminal device to collect user B's video frame, and use the rear camera of the terminal device to collect the video frame of user B's local scene. Terminal UA and terminal UB then send the collected video frames of user A and user B, respectively, to the server end. The server end processes the received video frames of user A and user B, and extracts user A's portrait Ia and user B's portrait Ib. The server end sends the extracted portrait Ia of user A to terminal UB, and the extracted portrait Ib of user B to terminal UA. Terminal UA synthesizes user B's portrait Ib sent by the server end with the video frame of user A's local scene, and terminal UB synthesizes user A's portrait Ia sent by the server end with the video frame of user B's local scene. Therefore, user A and user B each make the video call based on the synthesized images, making the video call more realistic.
FIG. 5 shows a block diagram of an exemplary computer system/server 012 suitable for implementing some embodiments of the present application. The computer system/server 012 shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present application.
如图5所示,计算机系统/服务器012以通用计算设备的形式表现。计算机系统/服务器012的组件可以包括但不限于:一个或者多个处理器或者处理单元016,系统存储器028,连接不同系统组件(包括系统存储器028和处理单元016)的总线018。As shown in Figure 5, computer system/server 012 is represented in the form of a general purpose computing device. Components of computer system/server 012 may include, but are not limited to, one or more processors or processing units 016, system memory 028, and bus 018 that connects different system components, including system memory 028 and processing unit 016.
The bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 012 typically includes a variety of computer-system-readable media. Such media may be any available media accessible to the computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.
The system memory 028 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 034 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard disk drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided. In such cases, each drive may be connected to the bus 018 via one or more data media interfaces. The memory 028 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the present application.
A program/utility 040 having a set (at least one) of program modules 042 may be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 042 generally carry out the functions and/or methods of the embodiments described in the present application.
The computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.); in some embodiments of the present application, the computer system/server 012 communicates with an external radar device. It may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may take place via input/output (I/O) interfaces 022. Furthermore, the computer system/server 012 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 020. As shown, the network adapter 020 communicates with the other modules of the computer system/server 012 via the bus 018. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 016 executes various functional applications and performs data processing by running programs stored in the system memory 028, for example implementing a video call method that may include:
a terminal capturing video frames of a first user's local scene through a first camera, and acquiring, based on data from a server side, a portrait of a second user who is in a video call with the first user;
compositing the portrait of the second user with the video frames of the first user's local scene, and displaying the resulting video frames on the video call interface.
A video call method may also be implemented that includes:
a server side receiving video frames of a second user sent by the second user's terminal;
based on the video frames of the second user, sending the portrait of the second user to the terminal of a first user who is in a video call with the second user, so that the first user's terminal composites the portrait of the second user with video frames of the first user's local scene and displays the resulting video frames on the video call interface.
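The server-side portrait extraction above can be sketched with a simple background-subtraction matte. This is an illustrative stand-in only; the application does not specify the matting algorithm, and the clean background plate and the threshold of 60 are assumptions of the sketch.

```python
import numpy as np

def extract_portrait(frame, background):
    """Matte out the portrait by differencing against a known clean
    background plate of the same size.

    frame, background: HxWx3 uint8 arrays of the same shape
    Returns (portrait, mask): portrait pixels and a 0/1 float mask.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # A pixel belongs to the portrait if its summed channel difference
    # exceeds an assumed threshold of 60.
    mask = (diff.sum(axis=2) > 60).astype(np.float32)
    portrait = frame * mask[..., None].astype(np.uint8)
    return portrait, mask

# Demo: flat gray background; the "person" occupies the center pixel.
bg = np.full((3, 3, 3), 100, dtype=np.uint8)
frame = bg.copy()
frame[1, 1] = [200, 50, 50]
portrait, mask = extract_portrait(frame, bg)
```

The resulting mask is exactly what the compositing step on the first user's terminal needs to blend the portrait into the local scene.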
The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present application. For example, the method flow executed by the one or more processors may include:
a terminal capturing video frames of a first user's local scene through a first camera, and acquiring, based on data from a server side, a portrait of a second user who is in a video call with the first user;
compositing the portrait of the second user with the video frames of the first user's local scene, and displaying the resulting video frames on the video call interface.
It may also include:
a server side receiving video frames of a second user sent by the second user's terminal;
based on the video frames of the second user, sending the portrait of the second user to the terminal of a first user who is in a video call with the second user, so that the first user's terminal composites the portrait of the second user with video frames of the first user's local scene and displays the resulting video frames on the video call interface.
As time and technology progress, the meaning of "medium" has become increasingly broad; the distribution of a computer program is no longer limited to tangible media, and a program may also be downloaded directly from a network, among other channels. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code carried therein. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out the operations of the present application may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet via an Internet service provider).
With the technical solution provided by the present application, superimposing the portrait of one party of a video call onto the video frames of the other party's local scene creates a more immersive call environment for both parties, thereby achieving the purpose of improving the effect of the video call.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division into units is only a division by logical function, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing are merely preferred embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (15)

  1. A video call method, wherein the method comprises:
    a terminal capturing video frames of a first user's local scene through a first camera, and acquiring, based on data from a server side, a portrait of a second user who is in a video call with the first user;
    compositing the portrait of the second user with the video frames of the first user's local scene, and displaying the resulting video frames on a video call interface.
  2. The method according to claim 1, wherein the method further comprises:
    the terminal capturing a portrait of the first user through a second camera and sending it to the server side.
  3. The method according to claim 2, wherein the first camera is a rear camera and the second camera is a front camera.
  4. The method according to claim 2, wherein the terminal capturing a portrait of the first user through a second camera and sending it to the server side comprises:
    the terminal capturing video frames of the first user through the second camera, extracting the portrait from the video frames, and sending the extracted portrait to the server side; or,
    the terminal capturing video frames of the first user through the second camera and sending the video frames to the server side, for the server side to extract the portrait from the video frames.
  5. The method according to claim 1, wherein compositing the portrait of the second user with the video frames of the first user's local scene comprises:
    determining a display size for the portrait of the second user according to the eye distance of the portrait of the second user;
    compositing the portrait of the second user with the video frames of the first user's local scene according to the determined display size.
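As an illustration only (not part of the claims), the eye-distance-based sizing of claim 5 could be sketched as below. The 60 px reference eye distance and the linear scaling rule are assumptions of the sketch; the claim does not fix a particular mapping from eye distance to display size.

```python
def portrait_display_size(portrait_w, portrait_h, eye_distance_px,
                          reference_eye_distance_px=60.0):
    """Scale a portrait so its apparent eye distance matches a reference.

    A larger measured eye distance means the face fills more of the
    captured frame, so the portrait is scaled down proportionally
    (and vice versa). The 60 px reference is an assumed constant.
    """
    scale = reference_eye_distance_px / eye_distance_px
    return round(portrait_w * scale), round(portrait_h * scale)

# A 400x600 portrait captured with a 120 px eye distance is shown at half size.
w, h = portrait_display_size(400, 600, eye_distance_px=120.0)
```

Normalizing by eye distance keeps the remote user's face at a stable, natural size in the composited scene regardless of how close they sit to their camera.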
  6. The method according to claim 1, wherein compositing the portrait of the second user with the video frames of the first user's local scene comprises:
    selecting N pixels from the portrait of the second user, and selecting M pixels from the video frames of the first user's local scene, where N and M are positive integers greater than 0;
    computing an intermediate color between the selected N pixels and the M pixels, and stroking the portrait of the second user based on the computed intermediate color;
    compositing the stroked portrait of the second user with the video frames of the first user's local scene.
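As an illustration only (not part of the claims), one way to realize the intermediate color of claim 6 is to average the sampled pixels from both images. The plain mean is an assumed choice for the sketch, since the claim does not specify how the intermediate color is computed or how the stroke is drawn.

```python
def intermediate_color(portrait_pixels, scene_pixels):
    """Average N portrait pixels and M local-scene pixels into one RGB
    stroke color. Each pixel is an (r, g, b) tuple; the mean of the
    combined samples lies between the colors of the two images, so an
    outline drawn with it softens the portrait/background boundary.
    """
    samples = list(portrait_pixels) + list(scene_pixels)
    n = len(samples)
    return tuple(round(sum(p[ch] for p in samples) / n) for ch in range(3))

# N = 2 samples from a red portrait, M = 2 samples from a blue scene.
color = intermediate_color([(255, 0, 0), (255, 0, 0)],
                           [(0, 0, 255), (0, 0, 255)])
```

The resulting purple falls between the red portrait and the blue scene, which is what makes the stroke blend rather than ring the portrait with a visibly foreign color.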
  7. The method according to claim 1, wherein compositing the portrait of the second user with the video frames of the first user's local scene comprises:
    superimposing the portrait of the second user onto the video frames of the first user's local scene at a preset position.
  8. A video call method, wherein the method comprises:
    a server side receiving video frames of a second user sent by the second user's terminal;
    based on the video frames of the second user, sending the portrait of the second user to the terminal of a first user who is in a video call with the second user, so that the first user's terminal composites the portrait of the second user with video frames of the first user's local scene and displays the resulting video frames on a video call interface.
  9. The method according to claim 8, wherein the video frames of the second user are the portrait of the second user as extracted by the second user's terminal.
  10. The method according to claim 8, wherein, based on the video frames of the second user, sending the portrait of the second user to the terminal of the first user who is in a video call with the second user comprises:
    extracting the portrait of the second user from the video frames of the second user;
    sending the extracted portrait of the second user to the terminal of the first user who is in a video call with the second user.
  11. The method according to claim 10, wherein the method further comprises:
    detecting the eye distance of the portrait of the second user, and providing the detected eye-distance information to the first user's terminal.
  12. The method according to claim 11, wherein the method further comprises:
    if the server side cannot detect the eye distance of the portrait of the second user, returning to the second user's terminal a prompt indicating that the eye distance of the second user's portrait cannot be obtained.
  13. The method according to claim 11, wherein the method further comprises:
    if multiple eye distances are detected for the second user, determining an eye distance that satisfies a preset requirement, and then sending the determined eye distance and the portrait of the second user corresponding to that eye distance to the first user's terminal.
  14. A device, wherein the device comprises:
    one or more processors; and
    a storage apparatus for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-13.
  15. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method according to any one of claims 1-13.
PCT/CN2018/124933 2018-03-29 2019-01-31 Video call method and device, and computer storage medium WO2019184499A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810272244.3 2018-03-29
CN201810272244.3A CN108259810A (en) 2018-03-29 2018-03-29 A kind of method of video calling, equipment and computer storage media

Publications (1)

Publication Number Publication Date
WO2019184499A1 true WO2019184499A1 (en) 2019-10-03

Family

ID=62746463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124933 WO2019184499A1 (en) 2018-03-29 2019-01-31 Video call method and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN108259810A (en)
WO (1) WO2019184499A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111669534A (en) * 2020-04-28 2020-09-15 视联动力信息技术股份有限公司 Communication method, first client and instant communication system
CN113473239A (en) * 2020-07-15 2021-10-01 青岛海信电子产业控股股份有限公司 Intelligent terminal, server and image processing method

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
CN108259810A (en) * 2018-03-29 2018-07-06 上海掌门科技有限公司 A kind of method of video calling, equipment and computer storage media
CN110536075B (en) * 2019-09-20 2023-02-21 上海掌门科技有限公司 Video generation method and device
WO2022001635A1 (en) * 2020-07-03 2022-01-06 海信视像科技股份有限公司 Display device and display method
US20240007590A1 (en) * 2020-09-30 2024-01-04 Beijing Zitiao Network Technology Co., Ltd. Image processing method and apparatus, and electronic device, and computer readable medium
CN112363658B (en) * 2020-10-27 2022-08-12 维沃移动通信有限公司 Interaction method and device for video call
CN114915722B (en) * 2021-02-09 2023-08-22 华为技术有限公司 Method and device for processing video
US20240185530A1 (en) * 2021-03-30 2024-06-06 Beijing Boe Technology Development Co., Ltd. Information interaction method, computer-readable storage medium and communication terminal
CN116941235A (en) * 2021-04-26 2023-10-24 深圳市大疆创新科技有限公司 Shooting method, control device and shooting equipment storage medium
CN115250340A (en) * 2021-04-26 2022-10-28 海信集团控股股份有限公司 MV recording method and display device
CN115484466A (en) * 2021-05-31 2022-12-16 海信集团控股股份有限公司 Display method and server for on-line singing video
CN113973178A (en) * 2021-10-24 2022-01-25 云景文旅科技有限公司 Interactive photographing processing method and device in travel process

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101039200A (en) * 2006-03-13 2007-09-19 阿尔卡特朗讯公司 Context enriched communication system and method
CN101562682A (en) * 2008-04-14 2009-10-21 鸿富锦精密工业(深圳)有限公司 Video image processing system, server, user side and video image processing method thereof
CN103716537A (en) * 2013-12-18 2014-04-09 宇龙计算机通信科技(深圳)有限公司 Photograph synthesizing method and terminal
US20150091891A1 (en) * 2013-09-30 2015-04-02 Dumedia, Inc. System and method for non-holographic teleportation
CN106331569A (en) * 2016-08-23 2017-01-11 广州华多网络科技有限公司 Method and system for transforming figure face in instant video picture
CN106534716A (en) * 2016-11-17 2017-03-22 三星电子(中国)研发中心 Methods for transmitting and displaying panoramic videos
CN108259810A (en) * 2018-03-29 2018-07-06 上海掌门科技有限公司 A kind of method of video calling, equipment and computer storage media

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN101350845B (en) * 2007-07-20 2012-05-09 中兴通讯股份有限公司 Method for simulating talking scene of mobile phone visible telephone
CN101610421B (en) * 2008-06-17 2011-12-21 华为终端有限公司 Video communication method, video communication device and video communication system
CN101677386A (en) * 2008-08-01 2010-03-24 中兴通讯股份有限公司 System capable of selecting real-time virtual call background and video call method
CN101547333A (en) * 2009-04-20 2009-09-30 中兴通讯股份有限公司 Method and terminal for switching front and back scene during viewable call
CN101931621A (en) * 2010-06-07 2010-12-29 上海那里网络科技有限公司 Device and method for carrying out emotional communication in virtue of fictional character
CN102307292A (en) * 2011-09-01 2012-01-04 宇龙计算机通信科技(深圳)有限公司 Visual communication method and visual terminal
CN103686050A (en) * 2012-09-18 2014-03-26 联想(北京)有限公司 Method and electronic equipment for simulating call scenes

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN101039200A (en) * 2006-03-13 2007-09-19 阿尔卡特朗讯公司 Context enriched communication system and method
CN101562682A (en) * 2008-04-14 2009-10-21 鸿富锦精密工业(深圳)有限公司 Video image processing system, server, user side and video image processing method thereof
US20150091891A1 (en) * 2013-09-30 2015-04-02 Dumedia, Inc. System and method for non-holographic teleportation
CN103716537A (en) * 2013-12-18 2014-04-09 宇龙计算机通信科技(深圳)有限公司 Photograph synthesizing method and terminal
CN106331569A (en) * 2016-08-23 2017-01-11 广州华多网络科技有限公司 Method and system for transforming figure face in instant video picture
CN106534716A (en) * 2016-11-17 2017-03-22 三星电子(中国)研发中心 Methods for transmitting and displaying panoramic videos
CN108259810A (en) * 2018-03-29 2018-07-06 上海掌门科技有限公司 A kind of method of video calling, equipment and computer storage media

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN111669534A (en) * 2020-04-28 2020-09-15 视联动力信息技术股份有限公司 Communication method, first client and instant communication system
CN113473239A (en) * 2020-07-15 2021-10-01 青岛海信电子产业控股股份有限公司 Intelligent terminal, server and image processing method
CN113473239B (en) * 2020-07-15 2023-10-13 青岛海信电子产业控股股份有限公司 Intelligent terminal, server and image processing method

Also Published As

Publication number Publication date
CN108259810A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
WO2019184499A1 (en) Video call method and device, and computer storage medium
CN109635621B (en) System and method for recognizing gestures based on deep learning in first-person perspective
CN109313812B (en) Shared experience with contextual enhancements
CN108933915B (en) Video conference device and video conference management method
CN111541845B (en) Image processing method and device and electronic equipment
CN106575361B (en) Method for providing visual sound image and electronic equipment for implementing the method
US20190222806A1 (en) Communication system and method
WO2021057267A1 (en) Image processing method and terminal device
US11527242B2 (en) Lip-language identification method and apparatus, and augmented reality (AR) device and storage medium which identifies an object based on an azimuth angle associated with the AR field of view
JP6986187B2 (en) Person identification methods, devices, electronic devices, storage media, and programs
KR101768532B1 (en) System and method for video call using augmented reality
CN108307106B (en) Image processing method and device and mobile terminal
CN111629253A (en) Video processing method and device, computer readable storage medium and electronic equipment
US11556605B2 (en) Search method, device and storage medium
US20200162792A1 (en) Generating an interactive digital video content item
WO2019184498A1 (en) Video interactive method, computer device and storage medium
JP2016181018A (en) Information processing system and information processing method
CN109766006B (en) Virtual reality scene display method, device and equipment
CN110673811A (en) Panoramic picture display method and device based on sound information positioning and storage medium
CN112887654B (en) Conference equipment, conference system and data processing method
CN111985252A (en) Dialogue translation method and device, storage medium and electronic equipment
US11756302B1 (en) Managing presentation of subject-based segmented video feed on a receiving device
WO2022151687A1 (en) Group photo image generation method and apparatus, device, storage medium, computer program, and product
JP2020112895A (en) Control program of information processing apparatus, control method of information processing apparatus, and information processing apparatus
CN108932142A (en) A kind of picture catching method and terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18911653

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18911653

Country of ref document: EP

Kind code of ref document: A1