WO2022147698A1 - Three-dimensional video call method and electronic device - Google Patents

Three-dimensional video call method and electronic device

Info

Publication number
WO2022147698A1
Authority
WO
WIPO (PCT)
Prior art keywords
subunit
face
image
dimensional
sub
Prior art date
Application number
PCT/CN2021/070536
Other languages
English (en)
French (fr)
Inventor
雷涛
石洁珂
李宗岩
檀珠峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2021/070536 (WO2022147698A1)
Priority to CN202180087392.8A (CN116711303A)
Publication of WO2022147698A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10 Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/194 Transmission of image signals

Definitions

  • the present application relates to the field of communications, and in particular, to a three-dimensional video call method and electronic device.
  • The video sending device first obtains the two-dimensional image and the depth image, and then compresses them and sends them to the server.
  • The server decodes the received two-dimensional image and depth image, generates a three-dimensional image from the decoded images, and then compresses the three-dimensional image and sends it to the video receiving device to implement a 3D video call between users.
  • However, because the server must decode the received two-dimensional image and depth image and then encode the generated three-dimensional image, the delay before the video receiving device obtains the three-dimensional video increases.
  • Embodiments of the present application provide a method and an electronic device for a 3D video call, which can reduce the delay in acquiring a 3D video during a video call.
  • the present application adopts the following technical solutions.
  • In a first aspect, an electronic device is provided. The electronic device includes: a face image acquisition module, a video encoding module and a network transmission module.
  • The face image acquisition module is used to: obtain a face depth image and a face two-dimensional image; divide the face depth image into a plurality of subunits including a first subunit and a second subunit; divide the face two-dimensional image into a plurality of subunits including a third subunit and a fourth subunit; send the first subunit and the third subunit to the video encoding module; and, after sending the first subunit and the third subunit, send the second subunit and the fourth subunit to the video encoding module.
  • The first subunit corresponds to the third subunit, and the second subunit corresponds to the fourth subunit.
  • The video encoding module is configured to: obtain a first encoding unit according to the first subunit and the third subunit, and send the first encoding unit to the network transmission module; and, after obtaining and sending the first encoding unit, obtain a second encoding unit according to the second subunit and the fourth subunit, and send the second encoding unit to the network transmission module.
  • The network transmission module is used to send the first encoding unit to a second electronic device and, after sending the first encoding unit, to send the second encoding unit to the second electronic device.
  • The face image acquisition module of the electronic device divides the face depth image and the face two-dimensional image into a plurality of subunits, and sends one pair of subunits to the video encoding module before sending the next pair. Each pair includes one subunit of the face depth image and the corresponding subunit of the face two-dimensional image. In this way, the time the video encoding module waits to receive images can be shortened.
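
The pairing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that a subunit is a horizontal strip of rows and that images are plain lists of pixel rows; `split_rows` and `paired_subunits` are hypothetical names.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # an image as a list of pixel rows (assumption)


def split_rows(image: Image, n_subunits: int) -> List[Image]:
    """Split an image into n_subunits horizontal strips.

    The last strip absorbs any remainder rows. Row strips are an
    assumption; the source only says the image is divided into subunits.
    """
    if n_subunits <= 0 or n_subunits > len(image):
        raise ValueError("n_subunits must be between 1 and the row count")
    base = len(image) // n_subunits
    strips = []
    for i in range(n_subunits):
        start = i * base
        end = (i + 1) * base if i < n_subunits - 1 else len(image)
        strips.append(image[start:end])
    return strips


def paired_subunits(
    depth: Image, color: Image, n_subunits: int
) -> Iterator[Tuple[int, Image, Image]]:
    """Yield (index, depth_subunit, color_subunit) one pair at a time.

    Yielding lazily lets the next stage start on pair 0 before later
    pairs are handed over.
    """
    if len(depth) != len(color):
        raise ValueError("depth and 2D images must have the same height")
    depth_strips = split_rows(depth, n_subunits)
    color_strips = split_rows(color, n_subunits)
    for index, (d, c) in enumerate(zip(depth_strips, color_strips)):
        yield index, d, c
```

Because `paired_subunits` is a generator, a consumer such as an encoder loop can process the first pair while later pairs have not yet been handed over, which is the ordering the text describes.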
  • After the video encoding module obtains one encoding unit according to a pair of subunits and sends it to the network transmission module, it performs the same processing on the next pair of subunits, which can shorten the time the network transmission module waits to receive an encoding unit. After the network transmission module receives one encoding unit and sends it to the second electronic device, it receives the next encoding unit and sends it to the second electronic device, which can shorten the time the second electronic device waits to receive an encoding unit, thereby reducing the delay for the second electronic device to obtain the 3D video.
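
A rough way to see the latency benefit is to compare a whole-frame hand-off with a per-subunit pipeline. The model below is a sketch under simplifying assumptions that are not in the source: every subunit takes the same fixed time in a given stage, stages run concurrently, and buffering between stages is unlimited. The function names and the example timings are hypothetical.

```python
from typing import Sequence


def whole_frame_latency(n_subunits: int, stage_times: Sequence[int]) -> int:
    """Latency when each stage finishes all subunits of the frame before
    handing anything to the next stage."""
    return sum(n_subunits * t for t in stage_times)


def pipelined_latency(n_subunits: int, stage_times: Sequence[int]) -> int:
    """Latency when each stage forwards a subunit as soon as it is done.

    For identical subunits this is the usual pipeline bound: one pass
    through every stage, plus the remaining subunits draining at the
    pace of the slowest stage.
    """
    return sum(stage_times) + (n_subunits - 1) * max(stage_times)


# Hypothetical example: 4 subunits, three stages (acquire, encode, send),
# each taking 2 ms per subunit.
stages_ms = [2, 2, 2]
frame_ms = whole_frame_latency(4, stages_ms)  # 24
pipe_ms = pipelined_latency(4, stages_ms)     # 12
```

Under these assumptions the pipelined hand-off halves the latency for the example frame; the real gain depends on actual stage timings, which the source does not give.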
  • The face image acquisition module may specifically be used to: receive face depth information; receive face two-dimensional information; and obtain the face depth image according to the face depth information and the face two-dimensional image according to the face two-dimensional information.
  • the face image acquisition module processes the face depth information to obtain a face depth image, so that a real three-dimensional video can be displayed, and a three-dimensional video call can be realized.
  • The video coding module may be specifically configured to: encode the third subunit to obtain a third coding unit; obtain the first coding unit according to the first subunit and the third coding unit; encode the fourth subunit to obtain a fourth coding unit; and obtain the second coding unit according to the second subunit and the fourth coding unit. That is to say, the video coding module can first encode a subunit of the face two-dimensional image to obtain a coding unit, and then perform mixed coding of a subunit of the face depth image and that coding unit. In this way, the subunit of the face depth image and the subunit of the face two-dimensional image are encoded into the same coding unit, which can reduce the complexity of transmission.
  • The video encoding module of the electronic device may first encode the first subunit, and then perform mixed encoding of the encoded first subunit and the third subunit to obtain the first encoding unit.
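
The first variant (encode the two-dimensional subunit, then mix it with the depth subunit) might look like the sketch below. Everything concrete here is an assumption: the byte layout is invented for illustration and is not the code stream structure of FIG. 12, `zlib` merely stands in for a real video encoder, and `make_coding_unit` is a hypothetical name.

```python
import struct
import zlib

# Hypothetical header: subunit index, depth byte length, encoded 2D byte length.
_HEADER = struct.Struct("!III")


def make_coding_unit(index: int, depth_subunit: bytes, color_subunit: bytes) -> bytes:
    """Build one coding unit from a corresponding pair of subunits.

    Step 1: encode the 2D subunit (zlib is only a placeholder codec).
    Step 2: mix the raw depth subunit and the encoded 2D subunit into a
            single unit so both travel together.
    """
    encoded_color = zlib.compress(color_subunit)
    header = _HEADER.pack(index, len(depth_subunit), len(encoded_color))
    return header + depth_subunit + encoded_color
```

Carrying the lengths in a fixed-size header is one simple way to let a receiver separate the two parts again without any out-of-band information.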
  • the electronic device described in the first aspect may further include: a face three-dimensional generation module and a display module.
  • The face 3D generation module is used to: obtain a first face three-dimensional sub-image according to the first subunit and the third subunit, and send the first face three-dimensional sub-image to the display module; and, after obtaining and sending the first face three-dimensional sub-image, obtain a second face three-dimensional sub-image according to the second subunit and the fourth subunit, and send the second face three-dimensional sub-image to the display module.
  • The display module is used to superimpose the first face three-dimensional sub-image and the scene two-dimensional image and, after that superimposition, to superimpose the second face three-dimensional sub-image and the scene two-dimensional image.
  • The face three-dimensional generation module obtains a face three-dimensional sub-image according to a pair of subunits and sends it to the display module, and then performs the same processing on the next pair of subunits. This can shorten the waiting time of the display module, reducing the delay for the electronic device to obtain the face 3D image and, in turn, the delay for the electronic device to obtain the 3D video.
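
The source does not say how a face three-dimensional sub-image is computed from a pair of subunits. One common technique, shown here purely as an illustrative stand-in, is pinhole back-projection of each depth pixel into a colored 3D point. The intrinsics `fx`, `fy`, `cx`, `cy`, the row-strip layout, and the function name are all assumptions.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Point = Tuple[float, float, float, Color]


def subunit_to_points(
    depth_rows: Sequence[Sequence[float]],
    color_rows: Sequence[Sequence[Color]],
    row_offset: int,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project one depth subunit and its matching 2D subunit.

    row_offset is the image row where this strip starts, so points from
    different subunits land in one shared coordinate frame.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for dv, (d_row, c_row) in enumerate(zip(depth_rows, color_rows)):
        v = row_offset + dv
        for u, (z, color) in enumerate(zip(d_row, c_row)):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, float(z), color))
    return points
```

Because each call only needs one pair of subunits, the resulting points can be handed to the display stage before the next pair has been processed.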
  • In a second aspect, an electronic device is provided. The electronic device includes: a network transmission module, a video decoding module, a three-dimensional face generation module and a display module.
  • The network transmission module is configured to: receive a first encoding unit from a first electronic device and send the first encoding unit to the video decoding module; and, after receiving and sending the first encoding unit, receive a second encoding unit from the first electronic device and send the second encoding unit to the video decoding module.
  • The video decoding module is configured to: obtain a first subunit and a third subunit according to the first coding unit; and, after obtaining the first subunit and the third subunit, obtain a second subunit and a fourth subunit according to the second coding unit.
  • The first subunit and the second subunit are subunits of a face depth image, the third subunit and the fourth subunit are subunits of a face two-dimensional image, the first subunit corresponds to the third subunit, and the second subunit corresponds to the fourth subunit.
  • The three-dimensional face generation module is used to: obtain a first face three-dimensional sub-image according to the first subunit and the third subunit, and send the first face three-dimensional sub-image to the display module; and, after obtaining and sending the first face three-dimensional sub-image, obtain a second face three-dimensional sub-image according to the second subunit and the fourth subunit, and send the second face three-dimensional sub-image to the display module.
  • The display module is used to superimpose the first face three-dimensional sub-image and the scene two-dimensional image and, after that superimposition, to superimpose the second face three-dimensional sub-image and the scene two-dimensional image.
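
The incremental superimposition can be pictured with the sketch below. It assumes, beyond what the source states, that each face three-dimensional sub-image has already been rendered to a small grid of pixels, that `None` marks a transparent pixel, and that the caller knows where the strip belongs in the scene; `superimpose` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def superimpose(
    scene: List[List[int]],
    face_sub_image: Sequence[Sequence[Optional[int]]],
    top: int,
    left: int,
) -> List[List[int]]:
    """Draw one rendered face sub-image onto the scene image in place.

    Transparent (None) pixels and pixels falling outside the scene are
    skipped, so the scene shows through around the face.
    """
    for dy, row in enumerate(face_sub_image):
        for dx, pixel in enumerate(row):
            if pixel is None:
                continue
            y, x = top + dy, left + dx
            if 0 <= y < len(scene) and 0 <= x < len(scene[y]):
                scene[y][x] = pixel
    return scene


# The first sub-image is drawn as soon as it arrives; the second follows later.
scene = [[0] * 3 for _ in range(3)]
superimpose(scene, [[None, 7]], top=0, left=0)
superimpose(scene, [[8, None]], top=1, left=0)
```

Each call touches only the region covered by one sub-image, so the display can be updated strip by strip instead of waiting for the complete face.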
  • After the network transmission module of the electronic device receives one encoding unit and sends it to the video decoding module, it receives the next encoding unit and sends it to the video decoding module, which can shorten the waiting time of the video decoding module.
  • The video decoding module decodes one coding unit to obtain a subunit of the face depth image and a subunit of the face two-dimensional image, sends them to the 3D face generation module, and then performs the same processing on the next coding unit, which can shorten the waiting time of the 3D face generation module.
  • The 3D face generation module obtains a face 3D sub-image according to a pair of subunits and sends it to the display module, and then performs the same processing on the next pair of subunits. This can shorten the waiting time of the display module, reducing the delay for the electronic device to obtain the face 3D image and, in turn, the delay for the electronic device to obtain the 3D video.
  • The video decoding module is further configured to: parse the first coding unit to obtain the first subunit and the third coding unit; decode the third coding unit to obtain the third subunit; after decoding the third coding unit, parse the second coding unit to obtain the second subunit and the fourth coding unit; and decode the fourth coding unit to obtain the fourth subunit.
  • the video decoding module can decode a pair of sub-units from one coding unit, which can reduce the complexity of acquiring the sub-unit of the depth image of the face and the sub-unit of the two-dimensional image of the face.
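
The parse-then-decode order on the receiving side can be sketched as follows. The byte layout is a hypothetical one chosen for illustration (a 12-byte header carrying the subunit index and two lengths), and `zlib` is only a placeholder for a real video decoder; none of this is specified in the source.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header: subunit index, depth byte length, encoded 2D byte length.
_HEADER = struct.Struct("!III")


def parse_coding_unit(unit: bytes) -> Tuple[int, bytes, bytes]:
    """Recover (index, depth_subunit, color_subunit) from one coding unit.

    Parsing splits the unit into the depth subunit and the still-encoded
    2D part; decoding then turns that part back into the 2D subunit.
    """
    if len(unit) < _HEADER.size:
        raise ValueError("coding unit is shorter than its header")
    index, depth_len, encoded_len = _HEADER.unpack_from(unit, 0)
    body = unit[_HEADER.size:]
    if len(body) != depth_len + encoded_len:
        raise ValueError("coding unit length does not match its header")
    depth_subunit = body[:depth_len]        # parse step
    encoded_color = body[depth_len:]        # the inner coding unit
    color_subunit = zlib.decompress(encoded_color)  # decode step
    return index, depth_subunit, color_subunit
```

Since one call yields a complete pair of corresponding subunits, the result can be passed straight to 3D generation without waiting for the next coding unit.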
  • the electronic device provided in the second aspect may further include: a touch module.
  • the touch module is used to detect the adjustment action.
  • the display module is used for adjusting and displaying the angle of the face in the three-dimensional image of the face according to the adjustment action. In this way, the electronic device can display different angles of the face in the three-dimensional video.
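
As an illustration of adjusting the displayed face angle, the sketch below maps a horizontal swipe to a yaw angle and rotates face points about the vertical axis. The mapping constants, the choice of a swipe as the adjustment action, and both function names are assumptions; the source only says that an adjustment action is detected and the angle is adjusted.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def swipe_to_yaw(dx_pixels: float, degrees_per_pixel: float = 0.25,
                 max_degrees: float = 45.0) -> float:
    """Convert a horizontal swipe distance to a clamped yaw angle in degrees."""
    yaw = dx_pixels * degrees_per_pixel
    return max(-max_degrees, min(max_degrees, yaw))


def rotate_yaw(points: Sequence[Vec3], yaw_degrees: float) -> List[Vec3]:
    """Rotate points about the vertical (y) axis by yaw_degrees."""
    angle = math.radians(yaw_degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]
```

For example, `rotate_yaw(face_points, swipe_to_yaw(80))` would turn a hypothetical `face_points` list by 20 degrees before it is rendered.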
  • In a third aspect, a three-dimensional video calling method is provided. The method includes: acquiring a face depth image and a face two-dimensional image; dividing the face depth image into a plurality of subunits including a first subunit and a second subunit; and dividing the face two-dimensional image into a plurality of subunits including a third subunit and a fourth subunit.
  • A first coding unit is obtained according to the first subunit and the third subunit; after the first coding unit is obtained, a second coding unit is obtained according to the second subunit and the fourth subunit.
  • The first subunit corresponds to the third subunit, and the second subunit corresponds to the fourth subunit.
  • Obtaining the face depth image and the face two-dimensional image may include: receiving face depth information; receiving face two-dimensional information; and obtaining the face depth image according to the face depth information and the face two-dimensional image according to the face two-dimensional information.
  • The three-dimensional video calling method described in the third aspect may further include: encoding the third subunit to obtain a third coding unit; obtaining the first coding unit according to the first subunit and the third coding unit; encoding the fourth subunit to obtain a fourth coding unit; and obtaining the second coding unit according to the second subunit and the fourth coding unit.
  • The three-dimensional video calling method described in the third aspect may further include: obtaining a first face three-dimensional sub-image according to the first subunit and the third subunit; and, after obtaining the first face three-dimensional sub-image, obtaining a second face three-dimensional sub-image according to the second subunit and the fourth subunit.
  • the first face three-dimensional sub-image and the scene two-dimensional image are superimposed; after the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed.
  • In a fourth aspect, a method for a three-dimensional video call is provided. The method includes: receiving a first encoding unit from a first electronic device; and, after receiving the first encoding unit, receiving a second encoding unit from the first electronic device.
  • A first subunit and a third subunit are obtained according to the first coding unit; after the first subunit and the third subunit are obtained, a second subunit and a fourth subunit are obtained according to the second coding unit. The first subunit and the second subunit are subunits of a face depth image, the third subunit and the fourth subunit are subunits of a face two-dimensional image, the first subunit corresponds to the third subunit, and the second subunit corresponds to the fourth subunit.
  • the first three-dimensional face sub-image is obtained according to the first subunit and the third subunit; after the first three-dimensional face sub-image is obtained, the second three-dimensional face sub-image is obtained according to the second subunit and the fourth subunit.
  • the first face three-dimensional sub-image and the scene two-dimensional image are superimposed; after the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed.
  • The 3D video calling method described in the fourth aspect may further include: parsing the first coding unit to obtain the first subunit and the third coding unit; decoding the third coding unit to obtain the third subunit; after decoding the third coding unit, parsing the second coding unit to obtain the second subunit and the fourth coding unit; and decoding the fourth coding unit to obtain the fourth subunit.
  • the three-dimensional video calling method described in the fourth aspect may further include: detecting an adjustment action, and in response to the adjustment action, adjusting the angle of the displayed face in the three-dimensional image of the face.
  • In a fifth aspect, an electronic device is provided, comprising a processor coupled to a memory.
  • The memory is used for storing a computer program.
  • the processor is configured to execute the computer program stored in the memory, so that the electronic device executes the three-dimensional video calling method according to any one of the possible implementation manners of the third aspect to the fourth aspect.
  • the electronic device described in the fifth aspect may further include a transceiver.
  • the transceiver may be a transceiver circuit or an input/output port.
  • the transceiver may be used for the electronic device to communicate with other devices.
  • the electronic device described in the fifth aspect may be an electronic device, or a chip or a chip system provided inside the electronic device.
  • In a sixth aspect, a three-dimensional video call system is provided. The system includes the electronic device described in any possible implementation manner of the first aspect, and the electronic device described in any possible implementation manner of the second aspect.
  • A seventh aspect provides a computer-readable storage medium on which a computer program or instructions are stored. When the computer program or instructions are run on a computer, the computer is made to execute the three-dimensional video calling method described in any one of the possible implementations of the third aspect to the fourth aspect.
  • A computer program product is provided, comprising a computer program or instructions. When the computer program or instructions are run on a computer, the computer is made to execute the three-dimensional video calling method described in any one of the possible implementations of the third aspect to the fourth aspect.
  • FIG. 1 is a schematic structural diagram of a 3D video call system provided by an embodiment of the present application.
  • FIG. 2 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a face image acquisition module provided by an embodiment of the present application.
  • FIG. 6 is a third schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of a three-dimensional video call method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a face depth image and a face two-dimensional image provided by an embodiment of the present application.
  • FIG. 9 is a first application schematic diagram of the first electronic device provided by the embodiment of the present application.
  • FIG. 10 is a second application schematic diagram of the first electronic device provided by the embodiment of the present application.
  • FIG. 11 is a second schematic flowchart of a method for a 3D video call provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a code stream provided by an embodiment of the present application.
  • FIG. 13 is an application schematic diagram of the second electronic device provided by the embodiment of the present application.
  • FIG. 14 is a schematic diagram of a three-dimensional image of a face provided by an embodiment of the present application.
  • references in the description of the present application to the terms “comprising” and “having” and any variations thereof are intended to cover non-exclusive inclusion.
  • A process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes other unlisted steps or units, or optionally also includes other steps or units inherent to such processes, methods, products or devices.
  • FIG. 1 is a schematic structural diagram of a three-dimensional video call system to which the three-dimensional video call method provided by the embodiment of the present application is applicable.
  • the three-dimensional video call system applicable to the embodiments of the present application will be described in detail by taking the three-dimensional video call system shown in FIG. 1 as an example.
  • The solutions in the embodiments of the present application can also be applied to other three-dimensional video call systems, such as one first electronic device to a plurality of second electronic devices, or a plurality of first electronic devices to a plurality of second electronic devices, and the corresponding names can also be replaced by the names of the corresponding functions in those systems.
  • the three-dimensional video call system includes at least two electronic devices, such as a first electronic device and a second electronic device.
  • the embodiments of the present application are described by taking the first electronic device as the sending end of the 3D video and the second electronic device as the receiving end of the 3D video as an example.
  • The electronic device may specifically be a mobile phone, a tablet computer, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an artificial intelligence (AI) device, a wearable device, or another terminal device with a video calling function. The wearable device may be a smart watch, a smart bracelet, smart glasses, a smart helmet, etc.
  • the embodiments of the present application do not limit any specific types of electronic devices.
  • FIG. 2 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a sensor module 190, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby increasing the efficiency of the system.
  • The electronic device 100 may use the processor 110 to obtain a face depth image and a face two-dimensional image, divide the face depth image into multiple subunits, and divide the face two-dimensional image into multiple subunits.
  • The electronic device 100 may use the processor 110 to obtain a face three-dimensional image according to the face depth image and the face two-dimensional image.
  • The electronic device 100 may use the processor 110 to obtain a face three-dimensional sub-image according to a subunit of the face depth image and a subunit of the face two-dimensional image, where the subunit of the face depth image corresponds to the subunit of the face two-dimensional image.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • The electronic device 100 may use the mobile communication module 150 to send the encoded face depth image and face two-dimensional image to other electronic devices, and/or receive an encoded face depth image and face two-dimensional image from other electronic devices.
  • The electronic device 100 may use the mobile communication module 150 to send the encoded subunit of the face depth image and subunit of the face two-dimensional image to other electronic devices, and/or receive an encoded subunit of the face depth image and subunit of the face two-dimensional image from other electronic devices.
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the electronic device 100 may use the GPU to superimpose the three-dimensional sub-image of the face and the two-dimensional image of the scene.
  • Display screen 194 is used to display images, videos, and the like.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may include 1 or N cameras 193 .
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the camera 193 is used to capture still images or video.
  • The camera 193 may include a time of flight (TOF) sensor, a three-dimensional structured light sensor, a red green blue (RGB) color sensor, and the like.
  • the electronic device 100 may use the camera 193 to collect a depth image of a human face and a two-dimensional image of the human face.
  • the electronic device 100 may use a video codec to encode the human face depth image and the human face two-dimensional image, and/or obtain the human face depth image and the human face two-dimensional image through decoding.
  • The electronic device 100 may use a video codec to encode the subunits of the face depth image and the subunits of the face two-dimensional image, and/or obtain the subunits of the face depth image and the subunits of the face two-dimensional image by decoding.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121, and/or instructions stored in a memory provided in the processor.
  • the internal memory 121 may be used to store an artificial intelligence algorithm model, and/or a three-dimensional face generation algorithm model, and the like.
  • the audio module 170 includes speakers, receivers, microphones, headphone jacks, and the like.
  • the audio module 170 is used for converting digital audio data into analog audio electrical signal output, and also for converting analog audio electrical signal input into digital audio data, and the audio module 170 may include an analog/digital converter and a digital/analog converter.
  • the electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, an application processor, and the like.
  • the sensor module 190 may include a pressure sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
  • the electronic device 100 may detect an adjustment action by using a touch sensor, so as to adjust the angle of the face in the three-dimensional image of the face displayed on the display screen 194 .
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
  • FIG. 3 is a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime and system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, calendar, map, WLAN, music, short message, gallery, call, and navigation.
  • the call application can be used to implement a three-dimensional video call.
  • the application framework layer provides an application programming interface (API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • the 3D video call can also be implemented as a module in the application framework layer of the electronic device, such as a 3D video call module.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100, for example the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple functional modules, for example a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • FIG. 4 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 400 shown in FIG. 4 may be the first electronic device, that is, the electronic device 400 may be the sending end of the 3D video.
  • the first electronic device 400 provided in this embodiment of the present application may include a face image acquisition module 410 , a video encoding module 420 , and a network transmission module 430 .
  • the first electronic device 400 may further include a face three-dimensional generation module 440 and a display module 450.
  • the modules shown in FIG. 4 can be implemented by electronic hardware, computer software, or a combination of computer software and electronic hardware.
  • the module shown in FIG. 4 can be implemented as a call application in the application layer shown in FIG. 3 , or the module shown in FIG. 4 can also be implemented as the 3D video call module in the application framework layer shown in FIG. 3 .
  • the face image acquisition module 410, the video encoding module 420, and the three-dimensional face generation module 440 can be implemented as the processor 110 shown in FIG. 2, and the network transmission module 430 can be implemented as the mobile communication module 150 shown in FIG. 2.
  • the display module 450 may be implemented as the display screen 194 shown in FIG. 2 .
  • the above-described manner of using software implementation and the manner of using hardware implementation may be combined, which will not be repeated in this embodiment of the present application.
  • the face image acquisition module 410 can be used to obtain a face depth image and a two-dimensional face image, divide the face depth image into a plurality of subunits including a first subunit and a second subunit, and divide the two-dimensional face image into a plurality of subunits including a third subunit and a fourth subunit.
  • the face image acquisition module 410 can be configured to send the first subunit and the third subunit to the video encoding module 420 described below and, after sending the first subunit and the third subunit, send the second subunit and the fourth subunit to the video encoding module 420.
  • the first subunit corresponds to the third subunit
  • the second subunit corresponds to the fourth subunit.
  • the face image acquisition module 410 can be configured to send the first subunit and the third subunit to the three-dimensional face generation module 440 described below and, after sending the first subunit and the third subunit, send the second subunit and the fourth subunit to the three-dimensional face generation module 440.
  • the face image acquisition module 410 may be configured to acquire a two-dimensional image of the scene, and send the two-dimensional image of the scene to the following three-dimensional face generation module 440 and/or video encoding module 420.
  • the two-dimensional image of the scene includes the scene image in the current video scene.
  • the face image acquisition module 410 may be specifically configured to receive face depth information, receive two-dimensional face information, obtain a face depth image according to the face depth information, and obtain a two-dimensional face image according to the two-dimensional face information.
  • the face depth information may be collected by a high-precision depth camera, such as a TOF sensor, a three-dimensional structured light sensor, and the like.
  • the two-dimensional face information may include face information and scene information in the current video scene.
  • alternatively, the two-dimensional face information includes only face information and does not include scene information.
  • the face image acquisition module 410 may be configured to receive two-dimensional scene information, and obtain a scene image according to the two-dimensional scene information.
  • both the face two-dimensional information and the scene two-dimensional information may be collected by a two-dimensional camera, such as an RGB sensor.
  • the face image acquisition module 410 may be specifically configured to collect face depth information, collect two-dimensional face information, obtain a face depth image according to the face depth information, and obtain a two-dimensional face image according to the two-dimensional face information. That is, the face image acquisition module 410 may include a module for acquiring depth information of a human face and a module for acquiring two-dimensional information of a human face. Optionally, the face image acquisition module 410 may be specifically configured to acquire two-dimensional scene information.
  • FIG. 5 is a schematic structural diagram of a face image acquisition module provided by an embodiment of the present application.
  • the face image acquisition module 410 may include: a face depth image acquisition sub-module 411 , a two-dimensional image acquisition sub-module 412 and an image signal processing (ISP) sub-module 413 .
  • the face depth image collection sub-module 411 can be used to collect the face depth information in the current video scene, and send it to the following image signal processing sub-module 413 .
  • the face depth image acquisition sub-module 411 may be a high-precision depth camera, which may include, but is not limited to, TOF sensors and three-dimensional structured light sensors.
  • the TOF sensor can continuously send light pulses to the target object, and then receive the light returned from the target object, obtain the distance from itself to the target object by detecting the flight (round-trip) time of sending and receiving light pulses, and generate depth information.
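  • for illustration only, the round-trip relation used by a TOF sensor can be sketched as follows. The pulse travels to the target and back, so the distance is half the path covered at the speed of light. The function name `tof_distance_m` is hypothetical and is not taken from the present application.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact, by definition of the metre


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the target given the round-trip flight time of a light pulse."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse covers the sensor-to-target distance twice, so halve the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# A pulse that returns after 4 nanoseconds corresponds to roughly 0.6 m.
distance = tof_distance_m(4e-9)
```

  • repeating this measurement per pixel yields the depth information from which the face depth image is later obtained.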
  • the three-dimensional structured light sensor obtains the depth information of the surface of the target object by projecting structured light on the surface of the target object and receiving the light reflected from the surface of the target object.
  • the two-dimensional image acquisition sub-module 412 can be used to acquire the two-dimensional information of the face in the current video scene, and send it to the following image signal processing sub-module 413 .
  • the two-dimensional image acquisition sub-module 412 may be an RGB sensor, etc., which is not limited in this application.
  • the two-dimensional image acquisition sub-module 412 can be configured to acquire two-dimensional information of the scene in the current video scene, and send the information to the following image signal processing sub-module 413 .
  • the ISP sub-module 413 can be used to receive the face depth information from the face depth image collection sub-module 411 and the face two-dimensional information from the two-dimensional image collection sub-module 412, and obtain the face depth image according to the face depth information, A two-dimensional image of the face is obtained according to the two-dimensional information of the face.
  • the ISP sub-module 413 may be configured to obtain a two-dimensional image of the scene according to the two-dimensional information of the scene.
  • the ISP sub-module 413 can be used to divide the face depth image into multiple subunits including the first subunit and the second subunit, and divide the two-dimensional face image into multiple subunits including the third subunit and the fourth subunit.
  • the ISP sub-module 413 can be used to send the first subunit and the third subunit to the video encoding module 420 described below and, after sending the first subunit and the third subunit, send the second subunit and the fourth subunit to the video encoding module 420.
  • the ISP sub-module 413 may be configured to send a two-dimensional image of the scene to the video encoding module 420 described below.
  • the ISP sub-module 413 can be used to send the first subunit and the third subunit to the three-dimensional face generation module 440 described below and, after sending the first subunit and the third subunit, send the second subunit and the fourth subunit to the three-dimensional face generation module 440.
  • the ISP sub-module 413 can be used to send the two-dimensional image of the scene to the following three-dimensional face generation module 440 .
  • the video encoding module 420 can be configured to obtain the first coding unit according to the first subunit and the third subunit, and send the first coding unit to the network transmission module 430; after obtaining and sending the first coding unit, obtain the second coding unit according to the second subunit and the fourth subunit, and send the second coding unit to the network transmission module 430 .
  • the video encoding module 420 encodes, in a pipelined manner and at the granularity of a pair of subunits, the multiple subunits of the face depth image and the multiple subunits of the two-dimensional face image, where a pair of subunits includes a subunit of the face depth image and the corresponding subunit of the two-dimensional face image.
  • this can reduce the delay in obtaining the three-dimensional face image and improve the efficiency of acquiring the three-dimensional video.
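  • as a rough sketch of what pair-granularity pipelining means, the following chains two Python generators so that each pair of subunits is encoded and handed on before the next pair is touched. The stage names, the `events` log, and the byte-concatenation "codec" are all placeholders assumed for illustration; a single-threaded generator chain only shows the ordering, whereas a real device would presumably run the stages concurrently.

```python
from typing import Iterable, Iterator, List, Tuple

events: List[str] = []  # records the order in which the stages run


def encode_stage(pairs: Iterable[Tuple[bytes, bytes]]) -> Iterator[bytes]:
    """Yield one coding unit per (depth subunit, 2D subunit) pair, lazily."""
    for index, (depth_subunit, color_subunit) in enumerate(pairs, start=1):
        events.append(f"encode pair {index}")
        yield depth_subunit + color_subunit  # placeholder for a real coding unit


def send_stage(coding_units: Iterable[bytes]) -> List[bytes]:
    """Consume coding units one at a time, as soon as each is produced."""
    sent = []
    for index, unit in enumerate(coding_units, start=1):
        events.append(f"send unit {index}")
        sent.append(unit)
    return sent


pairs = [(b"D1", b"C1"), (b"D2", b"C2")]
sent_units = send_stage(encode_stage(pairs))
```

  • after the run, `events` alternates between encoding and sending, which shows that the first coding unit leaves before the second pair is encoded, rather than after the whole image has been encoded.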
  • the video encoding module 420 is specifically configured to: encode the third subunit to obtain the third coding unit; obtain the first coding unit according to the first subunit and the third coding unit; encode the fourth subunit to obtain the fourth coding unit; and obtain the second coding unit according to the second subunit and the fourth coding unit. That is, the video encoding module 420 can encode a subunit of the two-dimensional face image to obtain the third coding unit, and then encode the corresponding subunit of the face depth image together with the third coding unit to obtain the first coding unit.
  • this embodiment of the present application does not limit the order in which the electronic device encodes the first subunit and the third subunit.
  • for example, the first subunit may be encoded first, and the encoded first subunit may then be mixed-coded with the third subunit to obtain the first coding unit.
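  • a minimal sketch of the two-step encoding described above follows, under stated assumptions: `zlib` stands in for the video codec, and the 8-byte length header is a hypothetical container layout. Neither is specified by the present application, and the function names are illustrative only.

```python
import struct
import zlib


def encode_color_subunit(color_subunit: bytes) -> bytes:
    """Encode a 2D-image subunit (zlib is only a stand-in for a video codec)."""
    return zlib.compress(color_subunit)


def build_coding_unit(depth_subunit: bytes, color_subunit: bytes) -> bytes:
    """Combine a depth subunit with the encoded 2D subunit into one coding unit."""
    encoded_color = encode_color_subunit(color_subunit)  # the "third coding unit"
    # Assumed header: two big-endian 32-bit lengths so a receiver can split the unit.
    header = struct.pack(">II", len(depth_subunit), len(encoded_color))
    return header + depth_subunit + encoded_color        # the "first coding unit"


first_coding_unit = build_coding_unit(b"\x01\x02\x03", b"rgb" * 10)
```

  • carrying both lengths in the header keeps each coding unit self-describing, so it can be parsed on arrival without waiting for any other unit.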
  • the video encoding module 420 may encode the two-dimensional scene image, and send the encoded two-dimensional scene image to the network transmission module 430 .
  • the network transmission module 430 can be configured to send the encoded depth image of the face and the two-dimensional image of the face to the second electronic device.
  • the network transmission module 430 may be configured to send the first encoding unit to the second electronic device; after sending the first encoding unit, send the second encoding unit to the second electronic device.
  • the network transmission module 430 can send the coding units corresponding to multiple pairs of subunits to the second electronic device in a pipelined manner, so that the second electronic device can obtain the three-dimensional face sub-images in a pipelined manner, which can reduce the delay for the second electronic device to obtain the three-dimensional face image and thereby reduce the delay in obtaining the three-dimensional video.
  • the network transmission module 430 may be configured to send the encoded two-dimensional image of the scene to the second electronic device.
  • the three-dimensional face generation module 440 may be configured to obtain a three-dimensional face image according to the face depth image and the two-dimensional face image.
  • the face 3D generation module 440 can be configured to obtain the first face 3D sub-image according to the first subunit and the third subunit, and send the first face 3D sub-image to the following display module 450; After the first three-dimensional face sub-image is obtained, the second three-dimensional face sub-image is obtained according to the second sub-unit and the fourth sub-unit, and the second three-dimensional face sub-image is sent to the display module 450 described below.
  • the three-dimensional face generation module 440 may be configured to receive a two-dimensional image of the scene from the face image acquisition module 410 .
  • the three-dimensional face generation module 440 can be used to obtain a two-dimensional image of the scene according to the two-dimensional image of the face, and send it to the display module 450 described below.
  • the two-dimensional image of the face includes a scene image and a face image.
  • the three-dimensional face generation module 440 may be configured to send the two-dimensional image of the scene to the display module 450 .
  • the display module 450 is configured to superimpose the first face three-dimensional sub-image and the scene two-dimensional image. After the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed. In this way, the display module 450 can superimpose a plurality of three-dimensional sub-images of the human face and the two-dimensional image of the scene in a pipelined manner to display the three-dimensional image of the human face.
  • FIG. 6 is a third schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 600 shown in FIG. 6 may be the second electronic device, that is, the electronic device 600 may be the receiving end of the 3D video.
  • the second electronic device 600 provided in this embodiment of the present application may include a network transmission module 610 , a video decoding module 620 , a three-dimensional face generation module 630 , and a display module 640 .
  • the second electronic device 600 may further include a touch module 650 .
  • the modules shown in FIG. 6 can be implemented by electronic hardware, computer software, or a combination of computer software and electronic hardware.
  • the module shown in FIG. 6 can be implemented as a call application in the application layer shown in FIG. 3 , or the module shown in FIG. 6 can also be implemented as the 3D video call module in the application framework layer shown in FIG. 3 .
  • the video decoding module 620 and the three-dimensional face generation module 630 may be implemented as the processor 110 shown in FIG. 2
  • the network transmission module 610 may be implemented as the mobile communication module 150 shown in FIG. 2
  • the display module 640 may be implemented as the processor 110 shown in FIG. 2 .
  • the touch module 650 may be implemented as the sensor module 190 shown in FIG. 2 .
  • when implemented by a combination of computer software and electronic hardware, the above-described manner of using software implementation and the manner of using hardware implementation may be combined, which will not be repeated in this embodiment of the present application.
  • the network transmission module 610 can be used to receive the encoded depth image of the face and the two-dimensional image of the face.
  • the network transmission module 610 can be configured to receive the first coding unit from the first electronic device and send the first coding unit to the video decoding module 620; after receiving and sending the first coding unit, receive the second coding unit from the first electronic device and send the second coding unit to the video decoding module 620 .
  • the network transmission module 610 may be configured to receive the encoded two-dimensional image of the scene and send it to the video decoding module 620 .
  • the video decoding module 620 can be used for decoding the encoded face depth image and face two-dimensional image. Specifically, the video decoding module 620 can be configured to obtain the first subunit and the third subunit according to the first coding unit, and send the first subunit and the third subunit to the three-dimensional face generation module 630 described below; after obtaining and sending the first subunit and the third subunit, obtain the second subunit and the fourth subunit according to the second coding unit, and send the second subunit and the fourth subunit to the three-dimensional face generation module 630.
  • the first subunit and the second subunit are respectively subunits in the face depth image
  • the third subunit and the fourth subunit are respectively subunits in the face two-dimensional image
  • the first subunit corresponds to the third subunit
  • the second subunit corresponds to the fourth subunit.
  • the video decoding module 620 may be specifically configured to parse the first coding unit to obtain the first subunit and the third coding unit; and decode the third coding unit to obtain the third subunit. After the third coding unit is decoded, the second coding unit is parsed to obtain the second subunit and the fourth coding unit; the fourth coding unit is decoded to obtain the fourth subunit. It should be noted that the decoding method of the video decoding module 620 corresponds to the encoding method of the video encoding module 420, and the present application does not limit the specific decoding method of the video decoding module 620.
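  • the parse-then-decode order described above can be sketched as follows. The sketch assumes a hypothetical coding-unit layout (two big-endian 32-bit lengths, then the depth subunit, then the encoded 2D subunit) and uses `zlib` as a stand-in for the video codec; the present application does not fix either choice, and the function names are illustrative.

```python
import struct
import zlib
from typing import Tuple

HEADER = struct.Struct(">II")  # assumed layout: depth length, encoded 2D length


def parse_coding_unit(unit: bytes) -> Tuple[bytes, bytes]:
    """Split a coding unit into the depth subunit and the still-encoded 2D subunit."""
    depth_len, color_len = HEADER.unpack_from(unit, 0)
    start = HEADER.size
    depth_subunit = unit[start:start + depth_len]
    encoded_color = unit[start + depth_len:start + depth_len + color_len]
    if len(depth_subunit) != depth_len or len(encoded_color) != color_len:
        raise ValueError("truncated coding unit")
    return depth_subunit, encoded_color


def decode_coding_unit(unit: bytes) -> Tuple[bytes, bytes]:
    """Parse the unit, then decode the 2D part (zlib stands in for the codec)."""
    depth_subunit, encoded_color = parse_coding_unit(unit)
    return depth_subunit, zlib.decompress(encoded_color)


# Build a sample unit by hand in the assumed layout, then decode it.
payload = zlib.compress(b"rgbrgb")
sample_unit = HEADER.pack(2, len(payload)) + b"\x10\x20" + payload
depth, color = decode_coding_unit(sample_unit)
```

  • the decoder mirrors whatever layout the encoder used, which is why the decoding method has to correspond to the encoding method.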
  • the video decoding module 620 may be configured to decode the encoded two-dimensional scene image to obtain the two-dimensional scene image, and send the two-dimensional scene image to the face three-dimensional generation module 630 .
  • the face 3D generation module 630 can be configured to obtain a face 3D image according to the face depth image and the face 2D image.
  • the three-dimensional face generation module 630 may be configured to obtain the first three-dimensional face sub-image according to the first subunit and the third subunit, and send the first three-dimensional face sub-image to the display module 640 .
  • after the first three-dimensional face sub-image is obtained, the second three-dimensional face sub-image is obtained according to the second subunit and the fourth subunit, and the second three-dimensional face sub-image is sent to the display module 640 .
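  • the present application does not describe the algorithm that turns a depth subunit and its corresponding 2D subunit into a three-dimensional sub-image. Purely as an illustrative substitute, the sketch below uses standard pinhole-camera back-projection to build a coloured point list; the intrinsics `fx`, `fy`, `cx`, `cy`, the `row_offset` parameter, and the function name are all assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, int]  # x, y, z, colour value


def backproject_subunit(depth_rows: Sequence[Sequence[float]],
                        color_rows: Sequence[Sequence[int]],
                        row_offset: int,
                        fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Turn one depth subunit and its matching 2D subunit into coloured 3D points."""
    points: List[Point] = []
    for dv, (depth_row, color_row) in enumerate(zip(depth_rows, color_rows)):
        v = row_offset + dv  # row index within the full image
        for u, (z, colour) in enumerate(zip(depth_row, color_row)):
            if z <= 0:  # treat non-positive depth as "no measurement"
                continue
            # Pinhole model: pixel (u, v) at depth z maps to camera-space (x, y, z).
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, float(z), colour))
    return points


sub_image = backproject_subunit([[1.0, 2.0, 0.0]], [[10, 20, 30]],
                                row_offset=0, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

  • because each call needs only one pair of corresponding subunits, a sub-image can be produced as soon as that pair has been decoded, without waiting for the rest of the frame.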
  • the three-dimensional face generation module 630 can be used to obtain a two-dimensional image of the scene according to the two-dimensional image of the face.
  • the two-dimensional image of the face includes a scene image and a face image.
  • the three-dimensional face generation module 630 may be configured to send a two-dimensional image of the scene to the display module 640 described below.
  • the display module 640 is configured to superimpose the first face three-dimensional sub-image and the scene two-dimensional image. After the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed. In this way, the display module 640 can superimpose a plurality of three-dimensional sub-images of the human face and the two-dimensional image of the scene in a pipelined manner to display the three-dimensional image of the human face.
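  • the incremental superimposition can be sketched as follows, assuming that each three-dimensional face sub-image has already been rendered to a band of pixels in which `None` marks transparent positions. The rendering step, the pixel values, and the name `overlay_band` are hypothetical.

```python
from typing import List, Optional, Sequence

Pixel = int


def overlay_band(canvas: List[List[Pixel]],
                 band: Sequence[Sequence[Optional[Pixel]]],
                 row_offset: int) -> None:
    """Draw one rendered face band onto the canvas in place; None is transparent."""
    for dv, band_row in enumerate(band):
        target_row = canvas[row_offset + dv]
        for u, pixel in enumerate(band_row):
            if pixel is not None:
                target_row[u] = pixel


scene = [[0, 0, 0] for _ in range(4)]        # stand-in for the scene two-dimensional image
canvas = [row[:] for row in scene]           # copy, so the scene itself stays unchanged
first_band = [[None, 7, None], [7, 7, 7]]    # rendered first face sub-image (rows 0-1)
second_band = [[7, 7, 7], [None, 7, None]]   # rendered second face sub-image (rows 2-3)

overlay_band(canvas, first_band, row_offset=0)   # drawn as soon as it is available
overlay_band(canvas, second_band, row_offset=2)  # drawn when the next sub-image arrives
```

  • each call touches only the rows covered by its band, so the first band can be shown while later sub-images are still being generated.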
  • the touch module 650 is used to detect the adjustment action.
  • the display module 640 may be configured to adjust and display the angle of the face in the overall three-dimensional image according to the adjustment action. That is, the electronic device 600 can display different angles of the face in the three-dimensional video by adjusting.
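  • one possible way to realise such an adjustment, given here only as an assumption, is to map a horizontal drag to a yaw angle and rotate the points of the three-dimensional face about the vertical axis. The sensitivity of 0.25 degrees per pixel and both function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]


def drag_to_yaw_degrees(drag_dx_pixels: float, degrees_per_pixel: float = 0.25) -> float:
    """Map a horizontal drag distance to a yaw angle (assumed linear sensitivity)."""
    return drag_dx_pixels * degrees_per_pixel


def rotate_about_y(points: Iterable[Point3D], yaw_degrees: float) -> List[Point3D]:
    """Rotate points about the vertical (y) axis by the given yaw angle."""
    theta = math.radians(yaw_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t + z * sin_t, y, -x * sin_t + z * cos_t) for x, y, z in points]


yaw = drag_to_yaw_degrees(360)                    # a 360-pixel drag gives 90 degrees
rotated = rotate_about_y([(1.0, 2.0, 0.0)], yaw)  # approximately (0.0, 2.0, -1.0)
```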
  • FIG. 7 is a first schematic flowchart of a three-dimensional video call method provided by an embodiment of the present application.
  • the three-dimensional video call method includes the following steps:
  • the first electronic device acquires a face depth image and a two-dimensional face image, divides the face depth image into a plurality of subunits including a first subunit and a second subunit, and divides the two-dimensional face image into a plurality of subunits including a third subunit and a fourth subunit.
  • the first subunit corresponds to the third subunit
  • the second subunit corresponds to the fourth subunit. That is to say, the subunits of the depth image of the face correspond to the subunits of the two-dimensional image of the face one-to-one.
  • FIG. 8 is a schematic diagram of a depth image of a face and a two-dimensional image of a face provided by an embodiment of the present application. Take the face depth image and face two-dimensional image divided into 4 subunits as an example. As shown in Figure 8, the depth image of the face is divided into subunit 1, subunit 2, subunit 3 and subunit 4, and the two-dimensional face image is divided into subunit 5, subunit 6, subunit 7 and subunit 8 . Wherein, subunit 1 corresponds to subunit 5, subunit 2 corresponds to subunit 6, subunit 3 corresponds to subunit 7, and subunit 4 corresponds to subunit 8.
  • FIG. 8 shows only one way, proposed by the embodiment of the present application, of dividing the face depth image and the two-dimensional face image into a plurality of subunits.
  • the face depth image and the two-dimensional face image can also be divided into a plurality of subunits along the vertical direction, which is not limited in this application.
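  • the four-way division and one-to-one correspondence in the FIG. 8 example can be sketched as follows. The sketch assumes that images are lists of pixel rows and that the division is into horizontal bands of near-equal height; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image buffer


def split_rows(image: Sequence[Sequence[int]], count: int) -> List[Image]:
    """Split an image into `count` horizontal bands of near-equal height."""
    height = len(image)
    if not 1 <= count <= height:
        raise ValueError("count must be between 1 and the image height")
    # Integer arithmetic keeps the band boundaries strictly increasing.
    bounds = [i * height // count for i in range(count + 1)]
    return [[list(row) for row in image[bounds[i]:bounds[i + 1]]] for i in range(count)]


def pair_subunits(depth: Sequence[Sequence[int]],
                  color: Sequence[Sequence[int]],
                  count: int) -> List[Tuple[Image, Image]]:
    """Pair the i-th depth subunit with the i-th 2D subunit (one-to-one)."""
    if len(depth) != len(color):
        raise ValueError("depth and 2D images must have the same height")
    return list(zip(split_rows(depth, count), split_rows(color, count)))


depth_image = [[r, r] for r in range(8)]              # 8 rows, 2 columns
color_image = [[100 + r, 100 + r] for r in range(8)]
pairs = pair_subunits(depth_image, color_image, 4)    # (1, 5), (2, 6), (3, 7), (4, 8)
```

  • splitting both images with the same boundaries is what guarantees that the subunits in each pair cover the same region of the face.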
  • the first electronic device acquires the depth image of the face and the two-dimensional image of the face, which may include the following steps 1 to 3.
  • Step 1 the first electronic device receives face depth information.
  • the face depth information may be collected by a high-precision depth camera, such as a TOF sensor, a three-dimensional structured light sensor, and the like.
  • FIG. 9 is a first application schematic diagram of the first electronic device provided by the embodiment of the present application.
  • the face image acquisition module 410 may receive face depth information.
  • Step 2 the first electronic device receives the two-dimensional information of the face.
  • the face two-dimensional information may include face information.
  • the two-dimensional face information may also include scene information in the current video scene.
  • the first electronic device may receive scene two-dimensional information.
  • the two-dimensional scene information may include scene information.
  • both the face two-dimensional information and the scene two-dimensional information may be collected by a two-dimensional camera, such as an RGB sensor.
  • the face image acquisition module 410 can receive two-dimensional information of the face.
  • the face image acquisition module 410 may also receive two-dimensional scene information.
  • Step 3 the first electronic device obtains a face depth image according to the face depth information, and obtains a face two-dimensional image according to the face two-dimensional information.
  • the face image acquisition module 410 converts the face depth information into a face depth image, and converts the face two-dimensional information into a face two-dimensional image.
  • if the two-dimensional face information includes face information and does not include scene information, the two-dimensional face image includes the face image; if the two-dimensional face information includes both face information and scene information, the two-dimensional face image includes the face image and the scene image.
  • the face image acquisition module 410 can convert the two-dimensional information of the scene into a two-dimensional image of the scene.
  • the embodiment of the present application does not limit the execution order of the above steps 1 to 3, provided that the face depth image and the two-dimensional face image can be obtained.
  • the face image acquisition module 410 divides the face depth image into multiple subunits including the first subunit and the second subunit, and divides the two-dimensional face image into multiple subunits including the third subunit and the fourth subunit.
  • the face image acquisition module 410 transmits multiple subunits of the face depth image to the video encoding module 420 and/or the face three-dimensional generation module 440 through the face depth image buffer.
  • the face image acquisition module 410 transmits the multiple subunits of the face two-dimensional image and/or the scene two-dimensional image to the video encoding module 420 and/or the face three-dimensional generation module 440 through the face two-dimensional image cache.
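  • as one possible reading of the two caches, the sketch below models them as first-in-first-out buffers from which a consumer takes a pair only when a subunit is available on both sides. The class name, method names, and the pairing rule are assumptions and do not come from the present application.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class SubunitBuffers:
    """Two FIFO buffers: one for depth subunits, one for 2D-image subunits."""

    def __init__(self) -> None:
        self.depth_buffer: Deque[bytes] = deque()
        self.color_buffer: Deque[bytes] = deque()

    def push_depth(self, subunit: bytes) -> None:
        self.depth_buffer.append(subunit)

    def push_color(self, subunit: bytes) -> None:
        self.color_buffer.append(subunit)

    def pop_pair(self) -> Optional[Tuple[bytes, bytes]]:
        """Return the oldest corresponding pair, or None if either side is empty."""
        if self.depth_buffer and self.color_buffer:
            return self.depth_buffer.popleft(), self.color_buffer.popleft()
        return None


buffers = SubunitBuffers()
buffers.push_depth(b"D1")
waiting = buffers.pop_pair()   # None: the matching 2D subunit has not arrived yet
buffers.push_color(b"C1")
ready = buffers.pop_pair()     # (b"D1", b"C1")
```

  • because both buffers preserve insertion order, corresponding subunits stay aligned as long as the producer writes them in the same order on both sides.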
  • the first electronic device acquires the depth image of the face and the two-dimensional image of the face, which may include the following steps 4 to 6.
  • Step 4 The first electronic device collects face depth information.
  • FIG. 10 is a second application schematic diagram of the first electronic device provided by the embodiment of the present application.
  • the face image acquisition module 410 may include: a face depth image acquisition sub-module 411 , a two-dimensional image acquisition sub-module 412 and an image signal processing sub-module 413 .
  • the face depth image collection sub-module 411 may collect face depth information, and send the face depth information to the image signal processing sub-module 413 .
  • the difference between the first electronic device shown in FIG. 10 and the first electronic device shown in FIG. 9 is that the structure of the face image acquisition module 410 is different, and other parts are the same.
  • the following specific descriptions about the video encoding module 420 , the network transmission module 430 , the three-dimensional face generation module 440 and the display module 450 are applicable to the first electronic device shown in FIG. 9 and FIG. 10 .
  • Step 5 the first electronic device collects two-dimensional information of the face.
  • the two-dimensional image acquisition sub-module 412 may acquire two-dimensional face information and/or scene two-dimensional information, and send the face two-dimensional information and/or scene two-dimensional information to the image signal processing sub-module 413 .
  • For a specific description, reference may be made to step 2 above, which will not be repeated here.
  • Step 6 The first electronic device obtains a face depth image according to the face depth information, and obtains a face two-dimensional image according to the face two-dimensional information.
  • the image signal processing sub-module 413 may receive the face depth information from the face depth image acquisition sub-module 411, and convert the face depth information into a face depth image.
  • the image signal processing sub-module 413 can receive the two-dimensional face information and/or scene two-dimensional information from the two-dimensional image acquisition sub-module 412, convert the face two-dimensional information into a face two-dimensional image, and convert the scene two-dimensional information into a scene two-dimensional image.
  • For the specific description of the two-dimensional image of the face and the two-dimensional image of the scene, reference may be made to step 3 above.
  • This embodiment of the present application does not limit the execution order of steps 4 to 6 above, provided that the face depth image and the face two-dimensional image can be obtained.
  • the image signal processing sub-module 413 in the face image acquisition module 410 divides the face depth image into a plurality of sub-units including a first sub-unit and a second sub-unit, and divides the face two-dimensional image into a plurality of sub-units including a third sub-unit and a fourth sub-unit.
  • the image signal processing sub-module 413 transmits multiple sub-units of the face depth image to the video encoding module 420 and/or the face 3D generation module 440 through the face depth image buffer.
  • the image signal processing sub-module 413 transmits multiple subunits of the two-dimensional face image and/or two-dimensional scene images to the video encoding module 420 and/or the three-dimensional face generation module 440 through the two-dimensional face image cache.
  • the first electronic device performs pipeline processing on the face depth image and the face two-dimensional image with a pair of subunits as the granularity.
  • a pair of subunits includes a subunit of a face depth image and a subunit of a two-dimensional face image corresponding to the subunit of the face depth image.
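  • The pairing of subunits described above can be sketched as follows. This is only an illustrative sketch: the source does not say how an image is divided, so splitting into horizontal bands of rows, and the names `split_rows` and `pair_subunits`, are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # an image modelled as rows of pixel values


def split_rows(image: Sequence[Sequence[int]], count: int) -> List[Image]:
    """Split an image into `count` horizontal bands (assumed subunit shape)."""
    height = len(image)
    if not 1 <= count <= height:
        raise ValueError("count must be between 1 and the image height")
    # Integer boundaries so that band heights differ by at most one row.
    bounds = [height * i // count for i in range(count + 1)]
    return [
        [list(row) for row in image[bounds[i]:bounds[i + 1]]]
        for i in range(count)
    ]


def pair_subunits(
    depth: Sequence[Sequence[int]],
    two_d: Sequence[Sequence[int]],
    count: int,
) -> List[Tuple[Image, Image]]:
    """Pair the i-th depth subunit with the i-th two-dimensional subunit."""
    if len(depth) != len(two_d):
        raise ValueError("both images must have the same height")
    return list(zip(split_rows(depth, count), split_rows(two_d, count)))


if __name__ == "__main__":
    depth_image = [[r] * 2 for r in range(8)]
    face_image = [[r + 100] * 2 for r in range(8)]
    for depth_part, face_part in pair_subunits(depth_image, face_image, 4):
        print(depth_part, face_part)
```

  • With four subunits per image, the first pair here plays the role of subunit 1 and subunit 5, the second pair that of subunit 2 and subunit 6, and so on.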
  • FIG. 11 is a second schematic flowchart of a method for a 3D video call provided by an embodiment of the present application.
  • the sub-slice may include a pair of sub-units, or information related to the pair of sub-units. With reference to FIG. 8 and FIG. 11, sub-slice 1 may include sub-unit 1 and sub-unit 5, or sub-slice 1 may be coding unit 1a obtained by processing sub-unit 1 and sub-unit 5, or face 3D sub-image 1.
  • Sub-slice 2 may include sub-unit 2 and sub-unit 6, or, sub-slice 2 may be coding unit 2a obtained after sub-unit 2 and sub-unit 6 are processed, or three-dimensional face sub-image 2.
  • Sub-slice 3 may include sub-unit 3 and sub-unit 7, or sub-slice 3 may be coding unit 3a obtained by processing sub-unit 3 and sub-unit 7, or three-dimensional face sub-image 3.
  • the sub-slice 4 may include the sub-unit 4 and the sub-unit 8, or the sub-slice 4 may be the coding unit 4a obtained after processing the sub-unit 4 and the sub-unit 8, or the three-dimensional face sub-image 4.
  • the period for processing video is T, and the face image acquisition module 410 can obtain subunit 1 and subunit 5 within the first T time, and send them to the video encoding module 420 .
  • Within the second T time, subunit 2 and subunit 6 are obtained and sent to the video encoding module 420 .
  • Within the third T time, subunit 3 and subunit 7 are obtained and sent to the video encoding module 420 .
  • Within the fourth T time, subunit 4 and subunit 8 are obtained and sent to the video encoding module 420 .
  • the face image acquisition module 410 can, with a pair of subunits as the granularity, transmit multiple subunits of the depth image of the face and multiple subunits of the two-dimensional image of the face to the face three-dimensional generation module 440 in a pipelined manner. This is not shown in FIG. 11 and will not be repeated here.
  • the time period T shown in FIG. 11 is equal to the maximum value of the time for the module of the electronic device to process each sub-slice.
  • this embodiment of the present application does not limit the size of the time period T.
  • the modules of the electronic device include, but are not limited to, the modules shown in FIGS. 4-6 .
  • the electronic device may process the individual sub-slices using a pipelined processing scheme.
  • the module of the electronic device may process the sub-slice 1 within a time T, and then process the sub-slice 2.
  • the face image acquisition module 410 may acquire subunit 1 and subunit 5 at the beginning of the first T time, and send them to the video encoding module 420, and complete the process at three-quarters of the first T time.
  • subunit 2 and subunit 6 are acquired and sent to the video encoding module 420, and the process is completed at half of the second T time.
  • Similarly, each module of the electronic device completes the processing of each sub-slice, which will not be listed one by one here.
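  • The pipelined timing can be illustrated with a small schedule calculation. This is a sketch under stated assumptions: the stage names and the one-period-per-stage model are hypothetical, chosen to be consistent with the periods named in this description (acquisition of sub-slice 1 in the first T time, decoding in the fifth, three-dimensional generation in the sixth, and display in the seventh to tenth).

```python
from typing import Dict, List

# Assumed ordering of processing stages across both devices; each stage is
# assumed to take one period T per sub-slice.
STAGES = ["acquire", "encode", "send", "receive", "decode", "generate_3d", "display"]


def period_index(stage: str, sub_slice: int) -> int:
    """Return the 1-based period T in which `stage` handles `sub_slice`."""
    if sub_slice < 1:
        raise ValueError("sub_slice is 1-based")
    return STAGES.index(stage) + sub_slice


def schedule(num_sub_slices: int) -> Dict[str, List[int]]:
    """Map every stage to the periods in which it handles sub-slices 1..n."""
    return {
        stage: [period_index(stage, k) for k in range(1, num_sub_slices + 1)]
        for stage in STAGES
    }


if __name__ == "__main__":
    for stage, periods in schedule(4).items():
        print(f"{stage:12s} {periods}")
```

  • Under this model a frame of four sub-slices is fully displayed after len(STAGES) + 4 - 1 = 10 periods, because each stage starts on a sub-slice as soon as the previous stage has released it.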
  • the electronic device may use a timed processing scheme to process each sub-slice.
  • When the time period T is greater than the time for each module in the electronic device to process each sub-slice, one or more of the modules of the electronic device may process each sub-slice periodically. For example, after the processing of sub-slice 1 is completed at three-quarters of the first T time, sub-slice 2 is not processed immediately; its processing starts only when the second T time begins.
  • the face image acquisition module 410 acquires subunit 1 and subunit 5 at the beginning of the first T time, and sends them to the video encoding module 420, and the process is completed at three-quarters of the first T time.
  • subunit 2 and subunit 6 are acquired and sent to the video encoding module 420, and the process is completed at three quarters of the second T time. Processing of the next sub-slice then starts only when the third T time begins. Similarly, each module of the electronic device completes the processing of each sub-slice, which will not be listed one by one here.
  • This embodiment of the present application does not require every module of the electronic device to process the sub-slices periodically, nor does it limit whether some or all of the modules do so: some modules in the electronic device may process each sub-slice periodically, or every module in the electronic device may process each sub-slice periodically, or every module in the electronic device may process each sub-slice aperiodically.
  • the electronic device may process each sub-slice in the following manner, provided that the electronic device operates normally. For example, for the pipelined processing scheme, if the time for processing M sub-slices is less than a first sub-slice threshold, the modules of the electronic device may process each sub-slice in a pipelined manner, where the first sub-slice threshold may be a preset maximum time for processing the M sub-slices; otherwise, the modules of the electronic device may process some of the M sub-slices and discard the processing of the others (for example, the processing results of the sub-slices corresponding to the previous frame of image may be used instead). M is an integer greater than 1.
  • For example, the face image acquisition module 410 may discard the processing of sub-slice 1 (for example, the processing result of sub-slice 1 corresponding to the previous frame of image may be used instead), and start processing sub-slice 2 on schedule at the beginning of the second T time.
  • The case in which one or more modules of the electronic device take longer than the time period T to process one or more sub-slices may be caused by, for example, a failure of one or more modules of the electronic device, which is not limited in this application.
  • Discarding the processing of the sub-slice includes: terminating the processing of the sub-slice when the processing of the sub-slice has been started but no processing result is obtained.
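  • The discard rule for the timed scheme can be sketched as below. This is a simplified model rather than the method of the embodiment: the function name `timed_results`, the use of measured durations, and the rule "fall back when the duration exceeds T" are assumptions made to illustrate substituting the previous frame's result.

```python
from typing import List, Sequence, TypeVar

R = TypeVar("R")


def timed_results(
    durations: Sequence[float],
    period: float,
    new_results: Sequence[R],
    previous_frame_results: Sequence[R],
) -> List[R]:
    """Pick, per sub-slice, the new result or the previous frame's result.

    A sub-slice whose processing would exceed the period T is treated as
    discarded, and the result of the same sub-slice from the previous frame
    is used instead, so the next sub-slice can start on schedule.
    """
    if not (len(durations) == len(new_results) == len(previous_frame_results)):
        raise ValueError("all inputs must describe the same sub-slices")
    chosen: List[R] = []
    for duration, new, old in zip(durations, new_results, previous_frame_results):
        chosen.append(new if duration <= period else old)
    return chosen


if __name__ == "__main__":
    print(timed_results(
        durations=[0.75, 1.5, 0.5, 0.9],
        period=1.0,
        new_results=["new1", "new2", "new3", "new4"],
        previous_frame_results=["old1", "old2", "old3", "old4"],
    ))
```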
  • the first electronic device obtains the first coding unit according to the first subunit and the third subunit; after obtaining the first coding unit, obtains the second coding unit according to the second subunit and the fourth subunit.
  • the above S702 may include the following steps seven to ten.
  • Step 7 The first electronic device encodes the third subunit to obtain a third encoding unit.
  • the third coding unit may be a video coding layer (VCL) type network abstraction layer (NAL) unit.
  • FIG. 12 is a schematic structural diagram of a code stream provided by an embodiment of the present application.
  • the code stream may include a header (NALU Header), a sequence parameter set (SPS), a picture parameter set (PPS), and at least one coding unit.
  • the first electronic device may encode the subunit 5 as an encoding unit 1b, which is a VCL type NAL unit.
  • the video encoding module 420 may encode the third subunit to obtain the third encoding unit.
  • Step 8 the first electronic device obtains the first encoding unit according to the first subunit and the third encoding unit.
  • the first coding unit may be a NAL unit of supplemental enhancement information (SEI) type.
  • the first electronic device can fill the subunit 1 into the preset field of the coding unit 1b to obtain the coding unit 1a, and the coding unit 1a is an SEI type NAL unit.
  • the preset field may be a supplemental enhancement information SEI field.
  • the video encoding module 420 may encode the first subunit and the third coding unit to obtain the first coding unit.
  • this embodiment of the present application does not limit the order in which the electronic device encodes the first subunit and the third subunit.
  • For example, the first subunit may be encoded first, and then the encoded first subunit and the third subunit may be jointly encoded to obtain the first coding unit.
  • the first electronic device encodes the subunit of the depth image of the face and the subunit of the two-dimensional image of the face into the same encoding unit, so that the two subunits do not have to be transmitted independently, which can reduce the complexity added by transmitting multiple code streams and synchronizing them in time.
  • Step 9 the first electronic device encodes the fourth subunit to obtain the fourth encoding unit.
  • the fourth coding unit may be a VCL type NAL unit.
  • the first electronic device may encode the subunit 6 as a coding unit 2b, and the coding unit 2b is a VCL type NAL unit.
  • the video encoding module 420 may encode the fourth subunit to obtain the fourth encoding unit.
  • Step 10: the first electronic device obtains the second encoding unit according to the second subunit and the fourth encoding unit.
  • the second coding unit may be an SEI type NAL unit.
  • the first electronic device can fill the subunit 2 into the preset field of the coding unit 2b to obtain the coding unit 2a, and the coding unit 2a is an SEI type NAL unit.
  • the preset field may be a supplemental enhancement information SEI field.
  • the video encoding module 420 may encode the second subunit and the fourth encoding unit to obtain the second encoding unit.
  • the first electronic device can obtain the encoding unit 3b, the encoding unit 3a, the encoding unit 4b, and the encoding unit 4a in the manner described in the above steps seven to eight, which will not be described in detail here.
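  • Steps seven to ten can be sketched in simplified form as follows. This is a toy model, not real H.264/H.265 syntax: `zlib` stands in for the video encoder, and the length-prefixed field that carries the depth subunit is a hypothetical stand-in for the preset SEI field. The function names are likewise assumptions.

```python
import struct
import zlib
from typing import Iterable, Iterator, Tuple


def encode_2d_subunit(subunit_2d: bytes) -> bytes:
    """Produce unit 'b' from a two-dimensional subunit (zlib as a stand-in)."""
    return zlib.compress(subunit_2d)


def fill_depth_into_unit(depth_subunit: bytes, unit_b: bytes) -> bytes:
    """Produce unit 'a': a 4-byte length, the depth subunit, then unit 'b'."""
    return struct.pack(">I", len(depth_subunit)) + depth_subunit + unit_b


def encode_pairs(pairs: Iterable[Tuple[bytes, bytes]]) -> Iterator[bytes]:
    """Encode (depth, 2D) pairs one at a time, yielding each unit 'a' as soon
    as it is ready so that it can be sent before the next pair is encoded."""
    for depth_subunit, subunit_2d in pairs:
        unit_b = encode_2d_subunit(subunit_2d)
        yield fill_depth_into_unit(depth_subunit, unit_b)


if __name__ == "__main__":
    pairs = [(b"depth-1", b"face-rows-1" * 4), (b"depth-2", b"face-rows-2" * 4)]
    for index, unit_a in enumerate(encode_pairs(pairs), start=1):
        print(f"coding unit {index}a: {len(unit_a)} bytes")
```

  • Writing `encode_pairs` as a generator mirrors the ordering constraint in the text: the second coding unit is produced only after the first one has been obtained.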
  • the video encoding module of the first electronic device can perform pipelined encoding on the subunits of the depth image of the face and the subunits of the two-dimensional face image that it receives in a pipelined manner, thereby reducing the delay with which the second electronic device obtains the three-dimensional image of the face.
  • the video encoding module 420 may encode the first subunit and the third subunit to obtain the first encoding unit; after obtaining the first encoding unit, it encodes the second subunit and the fourth subunit to obtain the second encoding unit.
  • the video encoding module 420 may send the encoded first encoding unit and the second encoding unit to the network transmission module 430 through the encoding unit buffer.
  • the video encoding module 420 may encode the subunit 1 and the subunit 5 to obtain the encoding unit 1a, and send it to the network transmission module 430 .
  • the video encoding module 420 encodes the sub-unit 2 and the sub-unit 6 to obtain the encoding unit 2a, and sends it to the network transmission module 430.
  • the video encoding module 420 can encode the sub-unit 3 and the sub-unit 7, obtain the encoding unit 3a, and send it to the network transmission module 430.
  • the video encoding module 420 may encode the sub-unit 4 and the sub-unit 8, obtain the encoding unit 4a, and send it to the network transmission module 430.
  • the first electronic device sends the first encoding unit to the second electronic device; after sending the first encoding unit, sends the second encoding unit to the second electronic device.
  • the network transmission module 430 may send the first coding unit to the second electronic device; after sending the first coding unit, send the second coding unit to the second electronic device.
  • the first electronic device may send the encoded face depth image subunit and the face two-dimensional image subunit to the second electronic device in a pipelined manner, so that the second electronic device can obtain the three-dimensional image of the face in a pipelined manner, which can reduce the time delay for the second electronic device to obtain the three-dimensional image of the face, and improve the efficiency of obtaining the three-dimensional video.
  • the network transmission module 430 may send the encoding unit 1a to the second electronic device.
  • the network transmission module 430 sends the encoding unit 2a to the second electronic device.
  • the network transmission module 430 sends the encoding unit 3a to the second electronic device.
  • the network transmission module 430 sends the encoding unit 4a to the second electronic device.
  • the first electronic device may periodically send the first coding unit and the second coding unit to the second electronic device.
  • the network transmission module 430 may periodically send the first encoding unit and the second encoding unit to the second electronic device.
  • the first electronic device may display a three-dimensional video of the current video scene of the user of the first electronic device, and the first electronic device may perform pipeline processing on multiple subunits of the depth image of the face and multiple subunits of the two-dimensional image of the face to improve the efficiency of acquiring 3D video.
  • the three-dimensional video calling method may further include: the first electronic device obtains the first face three-dimensional sub-image according to the first subunit and the third subunit; after the first face three-dimensional sub-image is obtained, a second three-dimensional sub-image of the face is obtained according to the second sub-unit and the fourth sub-unit.
  • the three-dimensional face generation module 440 may obtain the first three-dimensional face sub-image according to the first subunit and the third subunit, and send the first three-dimensional face sub-image to the display module 450 . After the first three-dimensional face sub-image is obtained and sent, the second three-dimensional face sub-image is obtained according to the second sub-unit and the fourth sub-unit, and the second three-dimensional face sub-image is sent to the display module 450 .
  • the 3D face generation module 440 may use an artificial intelligence algorithm model or a 3D face generation algorithm model to obtain a 3D face sub-image according to the subunits of the depth image of the face and the subunits of the 2D face image.
  • the three-dimensional face generation module 440 may generate a three-dimensional face sub-image 1 according to the sub-unit 1 and the sub-unit 5, and send it to the display module 450. Then, the three-dimensional sub-image 2 of the face is generated according to the sub-unit 2 and the sub-unit 6 and sent to the display module 450 . The three-dimensional face sub-image 3 is generated according to the sub-unit 3 and the sub-unit 7 and sent to the display module 450 . Finally, the three-dimensional sub-image 4 of the face is generated according to the sub-unit 4 and the sub-unit 8 and sent to the display module 450 . It should be noted that the three-dimensional face generation module 440 shown in FIG. 10 can implement the same function, which will not be repeated here.
  • the three-dimensional face generation module 440 may send the two-dimensional image of the scene to the display module 450 .
  • When the two-dimensional face image includes a scene image and a face image, the two-dimensional scene image may be obtained by the three-dimensional face generation module 440 according to the two-dimensional face image.
  • the two-dimensional image of the scene may be sent by the face image acquisition module 410 to the three-dimensional face generation module 440 .
  • the three-dimensional video calling method provided in the embodiment of the present application may further include: the first electronic device superimposes the first face three-dimensional sub-image and the scene two-dimensional image; after the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed.
  • the display module 450 may superimpose the three-dimensional sub-image of the first face and the two-dimensional image of the scene. After the first face three-dimensional sub-image and the scene two-dimensional image are superimposed, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed.
  • the display subsystem of the display module 450 can receive multiple face 3D sub-images and the scene 2D image through the preview cache in a pipelined manner, superimpose each face 3D sub-image on the scene 2D image in a pipelined manner to obtain the three-dimensional image of the face, and then transmit the three-dimensional image to the display for display.
  • the display module 450 may superimpose the face 3D sub-image 1 with the scene 2D image, then superimpose the face 3D sub-image 2 with the scene 2D image, then superimpose the face 3D sub-image 3 with the scene 2D image, and finally superimpose the face 3D sub-image 4 with the scene 2D image, so as to obtain a three-dimensional image of the face.
  • the first electronic device can obtain multiple frames of three-dimensional images of the face by performing the above-mentioned S701-S704 multiple times, so that the three-dimensional video of the user of the first electronic device can be displayed.
  • Pipelined processing of the face depth image and the two-dimensional image of the face can reduce the time delay for the first electronic device to obtain the three-dimensional image of the face, thereby improving the efficiency of displaying the three-dimensional video.
  • the second electronic device receives the first encoding unit from the first electronic device; after receiving the first encoding unit, receives the second encoding unit from the first electronic device.
  • FIG. 13 is a schematic diagram of the application of the second electronic device provided by the embodiment of the present application.
  • the network transmission module 610 can be used to receive the first coding unit from the first electronic device, and send the first coding unit to the video decoding module 620 .
  • the second coding unit from the first electronic device is received, and the second coding unit is sent to the video decoding module 620 .
  • the network transmission module 610 obtains the coding unit 1a and sends the coding unit 1a to the video decoding module 620 .
  • the encoding unit 2a is acquired and sent to the video decoding module 620.
  • the encoding unit 3a is acquired and sent to the video decoding module 620.
  • the encoding unit 4a is acquired and sent to the video decoding module 620.
  • the second electronic device obtains the first subunit and the third subunit according to the first coding unit; after obtaining the first subunit and the third subunit, obtains the second subunit and the fourth subunit according to the second coding unit .
  • the video decoding module 620 may decode the first coding unit to obtain the first sub-unit and the third sub-unit.
  • the video decoding module 620 may transmit the first subunit to the face 3D generation module 630 through the face depth image buffer, and transmit the third subunit to the face 3D generation module 630 through the face 2D image buffer.
  • the video decoding module 620 may decode the second coding unit to obtain the second subunit and the fourth subunit.
  • the video decoding module 620 may transmit the second subunit to the face 3D generation module 630 through the face depth image buffer, and transmit the fourth subunit to the face 3D generation module 630 through the face 2D image buffer.
  • the video decoding module 620 decodes the coding unit 1a to obtain the sub-unit 1 and the sub-unit 5 in the fifth time T, and sends them to the face three-dimensional generation module 630 .
  • the sub-unit 2 and the sub-unit 6 are obtained by decoding the coding unit 2a, and sent to the face three-dimensional generating module 630.
  • the sub-unit 3 and the sub-unit 7 are obtained by decoding the encoding unit 3a, and sent to the face three-dimensional generating module 630.
  • the sub-unit 4 and the sub-unit 8 are obtained by decoding the coding unit 4a, and sent to the face three-dimensional generating module 630.
  • the above S705 may include the following steps eleven to fourteen.
  • Step 11: the second electronic device parses the first coding unit to obtain the first sub-unit and the third coding unit.
  • the video decoding module 620 may parse the first coding unit to obtain the first sub-unit and the third coding unit.
  • the video decoding module 620 may parse the coding unit 1a to obtain the sub-unit 1 and the coding unit 1b.
  • Step 12: the second electronic device decodes the third coding unit to obtain the third sub-unit.
  • the video decoding module 620 may decode the third coding unit to obtain the third sub-unit.
  • the video decoding module 620 may decode the coding unit 1b to obtain the sub-unit 5 .
  • Step 13 After decoding the third coding unit, the second electronic device parses the second coding unit to obtain the second subunit and the fourth coding unit.
  • the video decoding module 620 may parse the second coding unit to obtain the second sub-unit and the fourth coding unit.
  • the video decoding module 620 may parse the coding unit 2a to obtain the sub-unit 2 and the coding unit 2b.
  • Step 14: the second electronic device decodes the fourth coding unit to obtain the fourth sub-unit.
  • the video decoding module 620 may decode the fourth coding unit to obtain the fourth sub-unit.
  • the video decoding module 620 may decode the coding unit 2b to obtain the sub-unit 6 .
  • the second electronic device may obtain the subunit 3, the subunit 7, the subunit 4, and the subunit 8 in the manner described in the above steps eleven to twelve, which will not be described in detail here.
  • the video decoding module 620 parses the coding unit 1a in the fifth T time, obtains the subunit 1 and the coding unit 1b, decodes the coding unit 1b to obtain the subunit 5, and then sends the subunit 1 and the subunit 5 to the face three-dimensional generation module 630 .
  • the coding unit 2a is parsed to obtain the subunit 2 and the coding unit 2b, the coding unit 2b is decoded to obtain the subunit 6, and then the subunit 2 and the subunit 6 are sent to the face 3D generation module 630 .
  • the coding unit 3a is parsed to obtain the subunit 3 and the coding unit 3b, the coding unit 3b is decoded to obtain the subunit 7, and then the subunit 3 and the subunit 7 are sent to the face three-dimensional generation module 630 .
  • the encoding unit 4a is parsed to obtain the subunit 4 and the encoding unit 4b, the encoding unit 4b is decoded to obtain the subunit 8, and then the subunit 4 and the subunit 8 are sent to the face three-dimensional generation module 630 .
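  • The parse-then-decode order of steps eleven to fourteen can be sketched as follows. As before, this is a toy model and not a real bitstream parser: the 4-byte length prefix marking the depth field and the use of `zlib` in place of a video decoder are assumptions for illustration, as are the function names.

```python
import struct
import zlib
from typing import Iterable, Iterator, Tuple


def parse_unit(unit_a: bytes) -> Tuple[bytes, bytes]:
    """Split unit 'a' into the depth subunit and the embedded unit 'b'."""
    if len(unit_a) < 4:
        raise ValueError("unit is too short to hold the length field")
    (depth_len,) = struct.unpack(">I", unit_a[:4])
    if len(unit_a) < 4 + depth_len:
        raise ValueError("depth field is truncated")
    return unit_a[4:4 + depth_len], unit_a[4 + depth_len:]


def decode_unit_b(unit_b: bytes) -> bytes:
    """Recover the two-dimensional subunit (zlib as a stand-in decoder)."""
    return zlib.decompress(unit_b)


def decode_units(units: Iterable[bytes]) -> Iterator[Tuple[bytes, bytes]]:
    """Handle units in arrival order, yielding each (depth, 2D) pair before
    the next unit is parsed, so downstream generation can start early."""
    for unit_a in units:
        depth_subunit, unit_b = parse_unit(unit_a)
        yield depth_subunit, decode_unit_b(unit_b)


if __name__ == "__main__":
    sample = struct.pack(">I", 3) + b"d-1" + zlib.compress(b"face-rows-1")
    for depth_subunit, subunit_2d in decode_units([sample]):
        print(depth_subunit, subunit_2d)
```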
  • the second electronic device obtains the first face three-dimensional sub-image according to the first subunit and the third subunit; after obtaining the first face three-dimensional sub-image, obtains the second face three-dimensional sub-image according to the second subunit and the fourth subunit.
  • the three-dimensional face generation module 630 may obtain the first three-dimensional face sub-image according to the first subunit and the third subunit, and send the first three-dimensional face sub-image to the display module 640 .
  • the second three-dimensional face sub-image is obtained according to the second sub-unit and the fourth sub-unit, and the second three-dimensional face sub-image is sent to the display module 640 .
  • the 3D face generation module 630 can generate the 3D face sub-image 1 according to the subunit 1 and the subunit 5 within the sixth T time, and send it to the display module 640 .
  • the three-dimensional face sub-image 2 is generated according to the sub-unit 2 and the sub-unit 6, and is sent to the display module 640.
  • the three-dimensional face sub-image 3 is generated according to the sub-unit 3 and the sub-unit 7, and is sent to the display module 640.
  • the three-dimensional face sub-image 4 is generated according to the sub-unit 4 and the sub-unit 8, and is sent to the display module 640.
  • the three-dimensional face generation module 630 may send the two-dimensional image of the scene to the display module 640 .
  • When the 2D face image includes a scene image, the 2D scene image may be obtained by the 3D face generation module 630 according to the 2D face image.
  • the two-dimensional scene image may be sent to the face three-dimensional generation module 630 by the first electronic device.
  • the second electronic device superimposes the first face three-dimensional sub-image and the scene two-dimensional image; after superimposing the first face three-dimensional sub-image and the scene two-dimensional image, the second face three-dimensional sub-image and the scene two-dimensional image are superimposed.
  • the display module 640 can superimpose the first face three-dimensional sub-image and the scene two-dimensional image; after superimposing the first face three-dimensional sub-image and the scene two-dimensional image, it superimposes the second face three-dimensional sub-image and the scene two-dimensional image.
  • the display subsystem of the display module 640 can receive a plurality of face 3D sub-images and the scene 2D image through the preview cache in a pipelined manner, superimpose each face 3D sub-image on the scene 2D image in a pipelined manner to obtain the three-dimensional image of the face, and then transmit the three-dimensional image to the display for display.
  • the display module 640 may superimpose the three-dimensional sub-images of the face and the two-dimensional image of the scene to obtain the three-dimensional image of the face. Specifically, with reference to FIG. 11 , in the seventh T time, the display module 640 may superimpose the three-dimensional sub-image 1 of the face and the two-dimensional image of the scene. During the eighth T time, the three-dimensional sub-image 2 of the face and the two-dimensional image of the scene are superimposed. During the ninth T time, the three-dimensional sub-image 3 of the face and the two-dimensional image of the scene are superimposed. During the tenth T time, the three-dimensional sub-image 4 of the face and the two-dimensional image of the scene are superimposed, thereby obtaining a three-dimensional image of the face.
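  • The step-by-step superimposition can be sketched as below. The representation is an assumption made for illustration only: the scene is a grid of pixel values, each rendered face sub-image is a band of rows placed at a given row offset, and `None` marks a transparent pixel through which the scene remains visible. The names `superimpose` and `compose_frame` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = int
Band = Sequence[Sequence[Optional[Pixel]]]


def superimpose(
    frame: Sequence[Sequence[Pixel]], band: Band, top_row: int
) -> List[List[Pixel]]:
    """Return a copy of `frame` with `band` drawn starting at `top_row`."""
    result = [list(row) for row in frame]
    for r, row in enumerate(band):
        for c, pixel in enumerate(row):
            if pixel is not None:  # None keeps the scene pixel visible
                result[top_row + r][c] = pixel
    return result


def compose_frame(
    scene: Sequence[Sequence[Pixel]], bands: Sequence[Tuple[int, Band]]
) -> List[List[Pixel]]:
    """Superimpose the sub-images one after another, in arrival order."""
    frame = [list(row) for row in scene]
    for top_row, band in bands:
        frame = superimpose(frame, band, top_row)
    return frame


if __name__ == "__main__":
    scene = [[0, 0], [0, 0], [0, 0], [0, 0]]
    bands = [(0, [[1, None]]), (1, [[None, 2]]), (2, [[3, 3]]), (3, [[None, None]])]
    for row in compose_frame(scene, bands):
        print(row)
```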
  • the three-dimensional video calling method provided by the embodiment of the present application may further include: detecting an adjustment action, and in response to the adjustment action, adjusting the angle of the displayed face in the three-dimensional image of the face.
  • the touch module 650 is used to detect the adjustment action, and the display module 640 can be used to adjust the angle of the face in the three-dimensional image of the displayed face in response to the adjustment action.
  • the adjustment action may be a left deviation angle or a right deviation angle set by the user.
  • the electronic device may define the frontal display angle of the three-dimensional face image as 0 degrees, and the display interface of the electronic device may include a left deviation angle setting area and a right deviation angle setting area. The values set in these angle setting areas are used to adjust the angle at which the face is displayed.
  • the adjustment action may be a rotation action of the user acting on the touch screen.
  • the user places two fingers on the touch screen to rotate clockwise or counterclockwise.
  • For example, before the adjustment, the frontal 3D image of the face is displayed; after the display angle of the face is adjusted as shown in FIG. 14, a 3D image of the left side of the face is displayed.
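  • One way a two-finger rotation could be turned into a display angle is sketched below. Everything beyond "a rotation action adjusts the displayed angle, with the frontal view at 0 degrees" is an assumption: the function names, the use of screen coordinates with the y axis pointing down, the mapping of clockwise rotation to positive angles, and the 90-degree limit are all hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def finger_angle(first: Point, second: Point) -> float:
    """Angle in degrees of the line from the first finger to the second."""
    return math.degrees(math.atan2(second[1] - first[1], second[0] - first[0]))


def rotation_delta(before: Tuple[Point, Point], after: Tuple[Point, Point]) -> float:
    """Signed rotation of the two-finger line, normalized to [-180, 180).

    With the y axis pointing down (screen coordinates), a positive value
    corresponds to a clockwise rotation as seen by the user.
    """
    delta = finger_angle(*after) - finger_angle(*before)
    return (delta + 180.0) % 360.0 - 180.0


def adjust_display_angle(current: float, delta: float, limit: float = 90.0) -> float:
    """Apply the rotation to the display angle and clamp it to +/- limit."""
    return max(-limit, min(limit, current + delta))


if __name__ == "__main__":
    start = ((0.0, 0.0), (1.0, 0.0))
    end = ((0.0, 0.0), (1.0, 1.0))
    turn = rotation_delta(start, end)
    print(turn, adjust_display_angle(0.0, turn))
```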
  • the three-dimensional video call method provided by the above embodiments of the present application is described by taking the three-dimensional display of the face in the call video as an example. The three-dimensional video call method provided by the embodiment of the present application can also display both the face and the scene in three dimensions, and the specific implementation is similar to the above-mentioned S701-S707.
  • In that case, the above-mentioned face depth information may include the depth information of the face and the scene, and the face depth image may include the depth image of the face and the scene. The details are not repeated here.
  • the first electronic device divides the face depth image and the face two-dimensional image into a plurality of sub-units, encodes the face depth image and the face two-dimensional image in a pipelined manner with a pair of sub-units as the granularity, and sends them to the second electronic device in a pipelined manner; the second electronic device receives and decodes them in a pipelined manner to obtain the subunits of the depth image of the face and the subunits of the two-dimensional image of the face, and obtains the face three-dimensional sub-images in a pipelined manner.
  • In this way, the delay in acquiring the three-dimensional image of the face by the second electronic device can be reduced, thereby reducing the delay in acquiring the three-dimensional video.
  • Embodiments of the present application provide a three-dimensional video call system.
  • the system includes one or more first electronic devices as described above, and one or more second electronic devices.
  • the embodiments of the present application provide a computer-readable storage medium, where computer programs or instructions are stored on the computer-readable storage medium, and when the computer programs or instructions are run on a computer, the computer is made to execute the three-dimensional video call method described in the foregoing method embodiments.
  • the embodiment of the present application provides a computer program product that includes a computer program or instructions; when the computer program or instructions run on a computer, the computer is made to execute the three-dimensional video call method described in the above method embodiments.
  • the above embodiments may be implemented in whole or in part by software, hardware (e.g., circuits), firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be an electronic device, a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server, a data center, or the like containing one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • “At least one” means one or more, and “plurality” means two or more.
  • “At least one of the following items” or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • for example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the above-mentioned units or modules is only a logical function division.
  • multiple units or modules may be combined.
  • the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units/modules, and may be in electrical, mechanical, or other forms.
  • the units/modules described as separate components may or may not be physically separated, and components shown as units/modules may or may not be physical units/modules; that is, they may be located in one place or distributed over multiple network units/modules. Some or all of the units/modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit/module in each embodiment of the present application may be integrated into one processing unit/module, or each unit/module may exist physically alone, or two or more units/modules may be integrated into one unit/module.
  • the functions, if implemented in the form of software functional units/modules and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the embodiments may refer to each other.
  • for example, methods and/or terms may be cross-referenced between method embodiments, and functions and/or terms may be cross-referenced between apparatus embodiments.
  • likewise, functions and/or terms may be cross-referenced between an apparatus embodiment and a method embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

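This application provides a three-dimensional video call method and an electronic device that implement three-dimensional video calls and can reduce the delay in obtaining three-dimensional video. In the electronic device, a face image acquisition module is configured to: divide a face depth image into a plurality of sub-units including a first sub-unit and a second sub-unit; divide a two-dimensional face image into a plurality of sub-units including a third sub-unit and a fourth sub-unit; send the first sub-unit and the third sub-unit to a video encoding module; and then send the second sub-unit and the fourth sub-unit to the video encoding module. The video encoding module is configured to obtain a first coding unit from the first sub-unit and the third sub-unit and send it to a network transmission module, and then obtain a second coding unit from the second sub-unit and the fourth sub-unit and send it to the network transmission module. The network transmission module is configured to send the first coding unit to a second electronic device and, after sending the first coding unit, send the second coding unit to the second electronic device.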

Description

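Three-Dimensional Video Call Method and Electronic Device
Technical Field
This application relates to the field of communications, and in particular, to a three-dimensional video call method and an electronic device.
Background
With the development of video coding technology, video calling has become one of the more popular ways of socializing. Existing video call solutions can already implement three-dimensional video calls. Specifically, a video sending device first obtains a two-dimensional image and a depth image, compresses them, and sends them to a server. The server decodes the received two-dimensional image and depth image, generates a three-dimensional image from the decoded images, compresses the three-dimensional image, and sends it to a video receiving device, thereby implementing a three-dimensional video call between users.
In this solution, the server's decoding of the received two-dimensional image and depth image, and its encoding of the generated three-dimensional image, increase the delay with which the video receiving device obtains the three-dimensional video.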
发明内容
本申请实施例提供一种三维视频通话方法及电子设备,可以在视频通话的过程中,降低获取三维视频的时延。为达到上述目的,本申请采用如下技术方案。
第一方面,提供一种电子设备。该电子设备包括:人脸图像采集模块、视频编码模块和网络传输模块。其中,人脸图像采集模块,用于获取人脸深度图像和人脸二维图像;将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元;将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元;向视频编码模块发送第一子单元和第三子单元;在发送第一子单元和第三子单元后,向视频编码模块发送第二子单元和第四子单元。其中,第一子单元对应于第三子单元,第二子单元对应于第四子单元。视频编码模块,用于根据第一子单元和第三子单元获得第一编码单元,向网络传输模块发送第一编码单元;在获得并发送第一编码单元后,根据第二子单元和第四子单元获得第二编码单元,向网络传输模块发送第二编码单元。网络传输模块,用于向第二电子设备发送第一编码单元;在发送第一编码单元后,向第二电子设备发送第二编码单元。
基于第一方面所述的电子设备,该电子设备的人脸图像采集模块将人脸深度图像和人脸二维图像分别划分为多个子单元,向视频编码模块发送一对子单元后再发送下一对子单元,该一对子单元包括人脸深度图像的一个子单元和与该人脸深度图像的一个子单元对应的人脸二维图像的一个子单元。如此,可以缩短视频编码模块等待接收图像的时间。视频编码模块根据一对子单元获得一个编码单元并发送给网络传输模块后,再对下一对子单元做相同处理,可以缩短网络传输模块等待接收编码单元的时间。网络传输模块接收一个编码单元并向第二电子设备发送后,再接收下一个编码单元并向第二电子设备发送,可以缩短第二电子设备等待接收编码单元的时间,从而可以降低第二电子设备获取三维视频的时延。
在一种可能的设计中,人脸图像采集模块,具体可用于接收人脸深度信息;接收 人脸二维信息;根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。如此,人脸图像采集模块对人脸深度信息进行处理获得人脸深度图像,从而可以显示真实的三维视频,实现三维视频通话。
在一种可能的设计中,视频编码模块,具体可用于:对第三子单元进行编码,以获得第三编码单元;根据第一子单元和第三编码单元,获得第一编码单元;以及,对第四子单元进行编码,以获得第四编码单元;根据第二子单元和第四编码单元,获得第二编码单元。也就是说,视频编码模块可以先对人脸二维图像的一个子单元进行编码,获得一个编码单元,再将人脸深度图像的一个子单元与该编码单元进行混合编码。如此,将人脸深度图像的子单元与人脸二维图像的子单元编码至同一编码单元中,可以降低传输的复杂度。
需要说明的是,本申请不对电子设备的视频编码模块对人脸二维图像的子单元和人脸深度图像的子单元进行编码的顺序进行限定,例如,视频编码模块可以先对第一子单元进行编码,再将编码后的第一子单元与第三子单元进行混合编码,获得第一编码单元。
在一种可能的设计中,第一方面所述的电子设备还可以包括:人脸三维生成模块和显示模块。其中,人脸三维生成模块,用于根据第一子单元和第三子单元获得第一人脸三维子图像,向显示模块发送第一人脸三维子图像;在获得并发送第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像,向显示模块发送第二人脸三维子图像。显示模块,用于将第一人脸三维子图像与场景二维图像进行叠加;在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。如此,人脸三维生成模块根据一对子单元获得一个人脸三维子图像,并向显示模块发送后,再对下一对子单元进行相同的处理,可以缩短显示模块的等待时间,从而可以降低电子设备获得人脸三维图像的时延,进而可以降低电子设备获取三维视频的时延。
第二方面,提供一种电子设备。该电子设备包括:网络传输模块、视频解码模块、三维人脸生成模块和显示模块。其中,网络传输模块,用于接收来自第一电子设备的第一编码单元,向视频解码模块发送第一编码单元;在接收并发送第一编码单元后,接收来自第一电子设备的第二编码单元,向视频解码模块发送第二编码单元。视频解码模块,用于根据第一编码单元获得第一子单元和第三子单元;在获得第一子单元和第三子单元后,根据第二编码单元获得第二子单元和第四子单元。其中,第一子单元和第二子单元分别为人脸深度图像中的子单元,第三子单元和第四子单元分别为人脸二维图像中的子单元,第一子单元对应于第三子单元,第二子单元对应于第四子单元。三维人脸生成模块,用于根据第一子单元和第三子单元获得第一人脸三维子图像,向显示模块发送第一人脸三维子图像;在获得并发送第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像,向显示模块发送第二人脸三维子图像。显示模块,用于将第一人脸三维子图像与场景二维图像进行叠加;在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。
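Based on the electronic device according to the second aspect, the network transmission module receives one coding unit and sends it to the video decoding module before receiving the next coding unit and sending it to the video decoding module, which shortens the time the video decoding module spends waiting. The video decoding module decodes one coding unit to obtain one sub-unit of the face depth image and one sub-unit of the two-dimensional face image, sends them to the three-dimensional face generation module, and then processes the next coding unit in the same way, which shortens the time the three-dimensional face generation module spends waiting. The three-dimensional face generation module obtains one three-dimensional face sub-image from a pair of sub-units, sends it to the display module, and then processes the next pair of sub-units in the same way, which shortens the time the display module spends waiting. The delay with which the electronic device obtains the three-dimensional face image, and hence the three-dimensional video, is thereby reduced.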
在一种可能的设计中,视频解码模块,还用于对第一编码单元进行解析,以获得第一子单元和第三编码单元;对第三编码单元进行解码,以获得第三子单元;在对第三编码单元进行解码后,对第二编码单元进行解析,以获得第二子单元和第四编码单元;对第四编码单元进行解码,以获得第四子单元。如此,视频解码模块可从一个编码单元中解码出一对子单元,可以降低获取人脸深度图像的子单元和人脸二维图像的子单元的复杂度。
在一种可能的设计中,第二方面提供的电子设备,还可以包括:触控模块。其中,触控模块,用于检测调整动作。显示模块,用于根据调整动作,调整显示人脸三维图像中人脸的角度。如此,电子设备可以实现显示三维视频中人脸的不同角度。
第三方面,提供一种三维视频通话方法。该三维视频通话方法包括:获取人脸深度图像和人脸二维图像;将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元;将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元。根据第一子单元和第三子单元获得第一编码单元;在获得第一编码单元后,根据第二子单元和第四子单元获得第二编码单元。向第二电子设备发送第一编码单元;在发送第一编码单元后,向第二电子设备发送第二编码单元。其中,第一子单元对应于第三子单元,第二子单元对应于第四子单元。
在一种可能的设计中,上述获取人脸深度图像和人脸二维图像,可以包括:接收人脸深度信息;接收人脸二维信息;根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。
在一种可能的设计中,第三方面所述的三维视频通话方法,还可以包括:对第三子单元进行编码,以获得第三编码单元;根据第一子单元和第三编码单元,获得第一编码单元;以及,对第四子单元进行编码,以获得第四编码单元;根据第二子单元和第四编码单元,获得第二编码单元。
在一种可能的设计中,第三方面所述的三维视频通话方法,还可以包括:根据第一子单元和第三子单元获得第一人脸三维子图像;在获得第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像。将第一人脸三维子图像与场景二维图像进行叠加;在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。
此外,第三方面所述的三维视频通话方法的技术效果可以参考第一方面中的任意一种实现方式所述的电子设备的技术效果,此处不再赘述。
第四方面,提供一种三维视频通话方法。该三维视频通话方法包括:接收来自第一电子设备的第一编码单元;在接收第一编码单元后,接收来自第一电子设备的第二编码单元。根据第一编码单元获得第一子单元和第三子单元;在获得第一子单元和第 三子单元后,根据第二编码单元获得第二子单元和第四子单元;其中,第一子单元和第二子单元分别为人脸深度图像中的子单元,第三子单元和第四子单元分别为人脸二维图像中的子单元,第一子单元对应于第三子单元,第二子单元对应于第四子单元。根据第一子单元和第三子单元获得第一人脸三维子图像;在获得第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像。将第一人脸三维子图像与场景二维图像进行叠加;在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。
在一种可能的设计中,第四方面所述的三维视频通话方法,还可以包括:对第一编码单元进行解析,以获得第一子单元和第三编码单元;对第三编码单元进行解码,以获得第三子单元;在对第三编码单元进行解码后,对第二编码单元进行解析,以获得第二子单元和第四编码单元;对第四编码单元进行解码,以获得第四子单元。
在一种可能的设计中,第四方面所述的三维视频通话方法,还可以包括:检测调整动作,响应于调整动作,调整显示人脸三维图像中人脸的角度。
此外,第四方面所述的三维视频通话方法的技术效果可以参考第二方面中的任意一种实现方式所述的电子设备的技术效果,此处不再赘述。
第五方面,提供一种电子设备,该电子设备包括:处理器,处理器与存储器耦合。存储器,用于存储计算机程序。处理器,用于执行存储器中存储的计算机程序,以使得电子设备执行如第三方面至第四方面中任意一种可能的实现方式所述的三维视频通话方法。
在一种可能的设计中,第五方面所述的电子设备还可以包括收发器。该收发器可以为收发电路或输入/输出端口。所述收发器可以用于该电子设备与其他设备通信。
在本申请中,第五方面所述的电子设备可以为电子设备,或者设置于电子设备内部的芯片或芯片系统。
此外,第五方面所述的电子设备的技术效果可以参考第三方面至第四方面中任意一种实现方式所述的三维视频通话方法的技术效果,此处不再赘述。
第六方面,提供一种三维视频通话系统。该三维视频通话系统包括第一方面中任意一种可能的实现方式所述的电子设备,以及第二方面中任意一种可能的实现方式所述的电子设备。
第七方面,提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序或指令,当计算机程序或指令在计算机上运行时,使得计算机执行第三方面至第四方面中任意一种可能的实现方式所述的三维视频通话方法。
第八方面,提供一种计算机程序产品,该计算机程序产品包括:计算机程序或指令,当计算机程序或指令在计算机上运行时,使得计算机执行第三方面至第四方面中任意一种可能的实现方式所述的三维视频通话方法。
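Brief Description of the Drawings
Figure 1 is a schematic architectural diagram of a three-dimensional video call system according to an embodiment of this application;
Figure 2 is a first schematic structural diagram of an electronic device according to an embodiment of this application;
Figure 3 is a block diagram of the software structure of an electronic device according to an embodiment of this application;
Figure 4 is a second schematic structural diagram of an electronic device according to an embodiment of this application;
Figure 5 is a schematic structural diagram of a face image acquisition module according to an embodiment of this application;
Figure 6 is a third schematic structural diagram of an electronic device according to an embodiment of this application;
Figure 7 is a first schematic flowchart of a three-dimensional video call method according to an embodiment of this application;
Figure 8 is a schematic diagram of a face depth image and a two-dimensional face image according to an embodiment of this application;
Figure 9 is a first schematic application diagram of a first electronic device according to an embodiment of this application;
Figure 10 is a second schematic application diagram of a first electronic device according to an embodiment of this application;
Figure 11 is a second schematic flowchart of a three-dimensional video call method according to an embodiment of this application;
Figure 12 is a schematic structural diagram of a bitstream according to an embodiment of this application;
Figure 13 is a schematic application diagram of a second electronic device according to an embodiment of this application;
Figure 14 is a schematic diagram of a three-dimensional face image according to an embodiment of this application.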
具体实施方式
下面结合附图对本申请实施例提供的三维视频通话方法及电子设备进行详细地描述。
本申请的描述中所提到的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括其他没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
需要说明的是,本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
在本申请的描述中,除非另有说明,“多个”的含义是指两个或两个以上。本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
本申请实施例描述的应用场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着网络架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
图1为本申请实施例提供的三维视频通话方法所适用的一种三维视频通话系统的架构示意图。为便于理解本申请实施例,以图1中示出的三维视频通话系统为例详细说明适用于本申请实施例的三维视频通话系统。应当指出的是,本申请实施例中的方案还可以应用于其他三维视频通话系统中,如第一电子设备对多个第二电子设备、或多个第一电子设备对多个第二电子设备的视频通话场景,相应的名称也可以用其他三维视频通话系统中的对应功能的名称进行替代。
如图1所示,该三维视频通话系统包括至少两个电子设备,如第一电子设备和第二电子设备。其中,本申请实施例以第一电子设备作为三维视频的发送端、第二电子设备作为三维视频的接收端为例进行阐述。其中,电子设备具体可以是手机、平板电脑、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、 上网本、个人数字助理(personal digital assistant,PDA)、人工智能(artificial intelligence)设备、可穿戴设备等具有视频通话功能的终端设备,可穿戴设备可以是智能手表、智能手环、智能眼镜、智能头盔等。本申请实施例对电子设备的具体类型不作任何限制。
图2为本申请实施例提供的电子设备的结构示意图一。如图2所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,传感器模块190,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在本申请的一些实施例中,电子设备100可以利用处理器110获取人脸深度图像和人脸二维图像,将人脸深度图像划分为多个子单元,将人脸二维图像划分为多个子单元。可选地,电子设备100可以利用处理器110根据人脸深度图像和人脸二维图像获得人脸三维子图像。具体地,电子设备100可以利用处理器110根据人脸深度图像的一个子单元和人脸二维图像的一个子单元获得一个人脸三维子图像,该人脸深度图像的一个子单元与该人脸二维图像的一个子单元相对应。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。
电源管理模块141用于连接电池142、充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1、天线2、移动通信模块150、无线通信模块160、调制解调处理器以及基带处理器等实现。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。
在本申请的一些实施例中,电子设备100可以利用移动通信模块150向其他电子设备发送编码后的人脸深度图像和人脸二维图像,和/或,接收来自其他电子设备的编码后的人脸深度图像和人脸二维图像。示例性地,电子设备100可以利用移动通信模块150向其他电子设备发送编码后的人脸深度图像的子单元和人脸二维图像的子单元,和/或,接收来自其他电子设备的编码后的人脸深度图像的子单元和人脸二维图像的子单元。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
在本申请的一些实施例中,电子设备100可以利用GPU将人脸三维子图像与场景二维图像进行叠加。
显示屏194用于显示图像,视频等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。在一些实施例中,电子设备100可以包括1个或N个摄像头193。
电子设备100可以通过ISP、摄像头193、视频编解码器、GPU、显示屏194以及应用处理器等实现拍摄功能。
摄像头193用于捕获静态图像或视频。摄像头193可以包括飞行时间(time of flight,TOF)传感器、三维结构光传感器、以及颜色(red green blue,RGB)传感器等。
在本申请的一些实施例中,电子设备100可以利用摄像头193采集人脸深度图像和人脸二维图像。
在本申请的一些实施例中,电子设备100可以利用视频编解码器对人脸深度图像和人脸二维图像进行编码,和/或,通过解码获得人脸深度图像和人脸二维图像。示例性地,电子设备100可以利用视频编解码器对人脸深度图像的子单元和人脸二维图像的子单元进行编码,和/或,通过解码获得人脸深度图像的子单元和人脸二维图像的子单元。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储 器的指令,执行电子设备100的各种功能应用以及数据处理。在本申请的一些实施例中,内部存储器121可以用于存储人工智能算法模型、和/或三维人脸生成算法模型等。
音频模块170包括扬声器,受话器,麦克风,耳机接口等。
音频模块170用于将数字音频数据转换成模拟音频电信号输出,也用于将模拟音频电信号输入转换为数字音频数据,音频模块170可以包括模/数转换器和数/模转换器。
在一些实施例中,电子设备100可以通过音频模块170,以及应用处理器等实现音频功能。例如音乐播放,录音等。
传感器模块190可以包括压力传感器,陀螺仪传感器,气压传感器,磁传感器,加速度传感器,距离传感器,接近光传感器,指纹传感器,温度传感器,触摸传感器,环境光传感器,骨传导传感器等。
在本申请的一些实施例中,电子设备100可以利用触摸传感器检测调整动作,以调整显示屏194显示人脸三维图像中人脸的角度。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
电子设备100的软件系统可以采用分层架构、事件驱动架构、微核架构,微服务架构、或云架构。本发明实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图3为本申请实施例提供的电子设备的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图3所示,应用程序包可以包括相机,日历,地图,WLAN,音乐,短信息,图库,通话,导航等应用程序。
其中,通话应用可用于实现三维视频通话。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图3所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
在一些实施例中,三维视频通话也可以实现为电子设备应用程序框架层中的模块,如三维视频通话模块。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
图4为本申请实施例提供的电子设备的结构示意图二。
图4所示的电子设备400可以为第一电子设备,即电子设备400可以为三维视频的发送端。如图4所示,本申请实施例提供的第一电子设备400可以包括人脸图像采集模块410、视频编码模块420和网络传输模块430。可选地,第一电子 设备400还可以包括人脸三维生成模块440和显示模块450。
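It should be noted that the modules shown in Figure 4 can be implemented in electronic hardware, computer software, or a combination of the two. For example, when implemented in software, the modules shown in Figure 4 may be implemented as the call application in the application layer shown in Figure 3, or as the three-dimensional video call module in the application framework layer shown in Figure 3. When implemented in hardware, the face image acquisition module 410, the video encoding module 420, and the three-dimensional face generation module 440 may be implemented as the processor 110 shown in Figure 2, the network transmission module 430 as the mobile communication module 150 shown in Figure 2, and the display module 450 as the display screen 194 shown in Figure 2. When implemented as a combination of computer software and electronic hardware, the software and hardware implementations above may be combined; details are not repeated in the embodiments of this application.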
示例性地,人脸图像采集模块410可用于获取人脸深度图像和人脸二维图像,将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元,将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元。示例性地,人脸图像采集模块410可用于向下述视频编码模块420发送第一子单元和第三子单元;在发送第一子单元和第三子单元后,向下述视频编码模块420发送第二子单元和第四子单元。其中,第一子单元对应于第三子单元,第二子单元对应于第四子单元。
可选地,人脸图像采集模块410可以用于向下述人脸三维生成模块440发送第一子单元和第三子单元;在发送第一子单元和第三子单元后,向下述人脸三维生成模块440发送第二子单元和第四子单元。
可选地,人脸图像采集模块410可以用于获取场景二维图像,向下述人脸三维生成模块440和/或视频编码模块420发送该场景二维图像。示例性地,场景二维图像中包括当前视频场景中的场景图像。
在一些实施例中,人脸图像采集模块410可具体用于接收人脸深度信息,接收人脸二维信息,根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。示例性地,人脸深度信息可以是高精度深度摄像头采集的,如TOF传感器、三维结构光传感器等。
示例性地,人脸二维信息可以包括人脸信息和当前视频场景中的场景信息。或者,示例性地,人脸二维信息只包括人脸信息,不包括场景信息。人脸图像采集模块410可用于接收场景二维信息,根据场景二维信息获得场景图像。可选地,人脸二维信息和场景二维信息均可以是二维摄像头采集的,如RGB传感器等。
在另一些实施例中,人脸图像采集模块410可具体用于采集人脸深度信息,采集人脸二维信息,根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。也就是说,人脸图像采集模块410可以包括用于采集人脸深度信息的模块和用于采集人脸二维信息的模块。可选地,人脸图像采集模块410可具体用于采集场景二维信息。
示例性地,图5为本申请实施例提供的人脸图像采集模块的结构示意图。如图5所示,人脸图像采集模块410可以包括:人脸深度图像采集子模块411、二维图像采集子模块412和图像信号处理(image signal processing,ISP)子模块413。
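The face depth image acquisition sub-module 411 may be configured to acquire face depth information in the current video scene and send it to the image signal processing sub-module 413 described below. For example, the face depth image acquisition sub-module 411 may be a high-precision depth camera, which may include, but is not limited to, a TOF sensor and a three-dimensional structured light sensor. A TOF sensor continuously sends light pulses to a target object, receives the light returned from the target object, obtains the distance from itself to the target object by measuring the round-trip time of flight of the light pulses, and generates depth information. A three-dimensional structured light sensor projects structured light onto the surface of the target object and receives the light reflected from that surface to obtain depth information of the surface.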
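The time-of-flight relation described above can be illustrated with a short sketch. It is not taken from the application: the function name `tof_distance_m` is hypothetical, and the sketch assumes an ideal sensor that reports the round-trip time of a single pulse directly.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by SI definition


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the target, given the round-trip time of a light pulse."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the target and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

A round trip of 4 ns therefore corresponds to a target roughly 0.6 m away, which is the order of magnitude of a face held in front of a phone.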
二维图像采集子模块412,可用于采集当前视频场景中人脸二维信息,并发送给下述图像信号处理子模块413。示例性地,二维图像采集子模块412可以为RGB传感器等,本申请对此不进行限定。可选地,二维图像采集子模块412,可用于采集当前视频场景中场景二维信息,并发送给下述图像信号处理子模块413。
ISP子模块413,可用于接收来自人脸深度图像采集子模块411的人脸深度信息和来自二维图像采集子模块412的人脸二维信息,并根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。可选地,ISP子模块413,可用于根据场景二维信息获得场景二维图像。
具体地,ISP子模块413,可用于将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元;将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元。ISP子模块413,可用于向下述视频编码模块420发送第一子单元和第三子单元;在发送第一子单元和第三子单元后,向下述视频编码模块420发送第二子单元和第四子单元。可选地,ISP子模块413,可用于向下述视频编码模块420发送场景二维图像。
可选地,ISP子模块413,可用于向下述人脸三维生成模块440发送第一子单元和第三子单元;在发送第一子单元和第三子单元后,向人脸三维生成模块440发送第二子单元和第四子单元。ISP子模块413,可用于向下述人脸三维生成模块440发送场景二维图像。
视频编码模块420,可用于根据第一子单元和第三子单元获得第一编码单元,向网络传输模块430发送第一编码单元;在获得并发送第一编码单元后,根据第二子单元和第四子单元获得第二编码单元,向网络传输模块430发送第二编码单元。如此,视频编码模块420以一对子单元为粒度流水线式接收并编码人脸深度图像包括的多个子单元和人脸二维图像包括的多个子单元,该一对子单元包括人脸深度图像的一个子单元和与该人脸深度图像的一个子单元对应的人脸二维图像的一个子单元,从而流水线式获得并发送每一对子单元分别对应的编码单元,可以降低第二电子设备获得人脸三维图像的时延,提高获取三维视频的效率。
可选地,视频编码模块420,具体用于:对第三子单元进行编码,以获得第三编码单元;根据第一子单元和第三编码单元,获得第一编码单元;以及,对第四子单元进行编码,以获得第四编码单元;根据第二子单元和第四编码单元,获得第二编码单元。也就是说,视频编码模块420可对人脸二维图像的子单元进行编码,获得第三编码单元,再将人脸深度图像的子单元与第三编码单元进行编码,获得第一编码单元。
需要说明的是,本申请实施例不对电子设备对第一子单元和第三子单元进行编码的顺序进行限定,例如,可以先对第一子单元进行编码,再将编码后的第一子单元与 第三子单元进行混合编码,获得第一编码单元。
可选地,视频编码模块420可对场景二维图像进行编码,将编码后的场景二维图像发送给网络传输模块430。
网络传输模块430,可用于向第二电子设备发送编码后的人脸深度图像和人脸二维图像。具体地,网络传输模块430可用于向第二电子设备发送第一编码单元;在发送第一编码单元后,向第二电子设备发送第二编码单元。如此,网络传输模块430可以流水线式向第二电子设备发送多对子单元分别对应的编码单元,以使第二电子设备流水线式获得人脸三维子图像,可以降低第二电子设备获得人脸三维图像的时延,降低获得三维视频的时延。
可选地,网络传输模块430,可用于向第二电子设备发送编码后的场景二维图像。
可选地,人脸三维生成模块440,可用于根据人脸深度图像和人脸二维图像获得人脸三维图像。具体地,人脸三维生成模块440,可用于根据第一子单元和第三子单元获得第一人脸三维子图像,向下述显示模块450发送第一人脸三维子图像;在获得并发送第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像,向下述显示模块450发送第二人脸三维子图像。
可选地,人脸三维生成模块440可用于接收来自人脸图像采集模块410的场景二维图像。或者,人脸三维生成模块440可用于根据人脸二维图像获得的场景二维图像,并发送给下述显示模块450。其中,人脸二维图像包括场景图像和人脸图像。
可选地,人脸三维生成模块440可用于向显示模块450发送场景二维图像。
可选地,显示模块450,用于将第一人脸三维子图像与场景二维图像进行叠加。在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。如此,显示模块450可流水线式将多个人脸三维子图像与场景二维图像进行叠加,以显示人脸三维图像。
图6为本申请实施例提供的电子设备的结构示意图三。
图6所示的电子设备600可以为第二电子设备,即电子设备600可以为三维视频的接收端。如图6所示,本申请实施例提供的第二电子设备600可以包括网络传输模块610、视频解码模块620、人脸三维生成模块630和显示模块640。可选地,第二电子设备600还可以包括触控模块650。
需要说明的是,图6所示的模块能够以电子硬件、计算机软件、或者计算机软件和电子硬件的结合来实现。示例性地,当使用软件实现时,图6所示的模块可以实现为图3所示的应用程序层中的通话应用,或者,图6所示的模块也可以实现为图3所示的应用程序框架层中的三维视频通话模块。当使用硬件实现时,视频解码模块620、人脸三维生成模块630可以实现为图2所示的处理器110,网络传输模块610可以实现为图2所示的移动通信模块150,显示模块640可以实现为图2所示的显示屏194,触控模块650可以实现为图2所示的传感器模块190。当以计算机软件和电子硬件的结合来实现时,可将上述使用软件实现的方式和使用硬件实现的方式进行结合,本申请实施例不再赘述。
其中,网络传输模块610,可用于接收编码后的人脸深度图像和人脸二维图像。 具体地,网络传输模块610,可用于接收来自第一电子设备的第一编码单元,向视频解码模块620发送第一编码单元;在接收并发送第一编码单元后,接收来自第一电子设备的第二编码单元,向视频解码模块620发送第二编码单元。可选地,网络传输模块610,可用于接收编码后的场景二维图像,并发送给视频解码模块620。
视频解码模块620,可用于对编码后的人脸深度图像和人脸二维图像进行解码。具体地,视频解码模块620,可用于根据第一编码单元获得第一子单元和第三子单元,向下述人脸三维生成模块630发送第一子单元和第三子单元;在获得并发送第一子单元和第三子单元后,根据第二编码单元获得第二子单元和第四子单元,向下述人脸三维生成模块630发送第二子单元和第四子单元。其中,第一子单元和第二子单元分别为人脸深度图像中的子单元,第三子单元和第四子单元分别为人脸二维图像中的子单元,第一子单元对应于第三子单元,第二子单元对应于第四子单元。
可选地,视频解码模块620,可具体用于对第一编码单元进行解析,以获得第一子单元和第三编码单元;对第三编码单元进行解码,以获得第三子单元。在对第三编码单元进行解码后,对第二编码单元进行解析,以获得第二子单元和第四编码单元;对第四编码单元进行解码,以获得第四子单元。需要说明的是,视频解码模块620解码的方式与视频编码模块420编码的方式相对应,本申请不对视频解码模块620的具体解码方式进行限定。
可选地,视频解码模块620,可用于对编码后的场景二维图像进行解码,以获得场景二维图像,并向人脸三维生成模块630发送场景二维图像。
人脸三维生成模块630,可用于根据人脸深度图像和人脸二维图像获得人脸三维图像。具体地,人脸三维生成模块630,可用于根据第一子单元和第三子单元获得第一人脸三维子图像,向显示模块640发送第一人脸三维子图像。在获得并发送第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像,向显示模块640发送第二人脸三维子图像。
可选地,人脸三维生成模块630,可用于根据人脸二维图像获得的场景二维图像。其中,人脸二维图像包括场景图像和人脸图像。
可选地,人脸三维生成模块630,可用于向下述显示模块640发送场景二维图像。
显示模块640,用于将第一人脸三维子图像与场景二维图像进行叠加。在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。如此,显示模块640可流水线式将多个人脸三维子图像与场景二维图像进行叠加,以显示人脸三维图像。
可选地,触控模块650,用于检测调整动作。
可选地,显示模块640,可用于根据调整动作,调整显示整体三维图像中人脸的角度。也就是说,电子设备600可以通过调整显示三维视频中人脸的不同角度。
下面将结合图7-图14对本申请实施例提供的三维视频通话方法进行具体阐述。
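Figure 7 is a first schematic flowchart of the three-dimensional video call method according to an embodiment of this application.
As shown in Figure 7, the three-dimensional video call method includes the following steps:
S701: The first electronic device obtains a face depth image and a two-dimensional face image, divides the face depth image into a plurality of sub-units including a first sub-unit and a second sub-unit, and divides the two-dimensional face image into a plurality of sub-units including a third sub-unit and a fourth sub-unit.
Specifically, the first sub-unit corresponds to the third sub-unit, and the second sub-unit corresponds to the fourth sub-unit. That is, the sub-units of the face depth image are in one-to-one correspondence with the sub-units of the two-dimensional face image.
Figure 8 is a schematic diagram of a face depth image and a two-dimensional face image according to an embodiment of this application, taking division of each image into four sub-units as an example. As shown in Figure 8, the face depth image is divided into sub-unit 1, sub-unit 2, sub-unit 3, and sub-unit 4, and the two-dimensional face image is divided into sub-unit 5, sub-unit 6, sub-unit 7, and sub-unit 8. Sub-unit 1 corresponds to sub-unit 5, sub-unit 2 to sub-unit 6, sub-unit 3 to sub-unit 7, and sub-unit 4 to sub-unit 8.
It should be noted that Figure 8 shows only one way of dividing the face depth image and the two-dimensional face image into a plurality of sub-units. For example, each image may instead be divided into a plurality of sub-units along the vertical direction. This application does not limit the manner of division.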
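The pairing of sub-units can be sketched as follows. This is a minimal illustration under assumptions that the application does not state: images are modelled as lists of pixel rows, both images have the same height, and the division is into horizontal strips. The names `split_rows` and `pair_sub_units` are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image buffer


def split_rows(image: Sequence[Sequence[int]], parts: int) -> List[Image]:
    """Split an image into `parts` horizontal strips of near-equal height."""
    height = len(image)
    if not 1 <= parts <= height:
        raise ValueError("parts must be between 1 and the image height")
    # Strip i covers rows bounds[i] .. bounds[i + 1] - 1.
    bounds = [height * i // parts for i in range(parts + 1)]
    return [
        [list(row) for row in image[bounds[i]:bounds[i + 1]]]
        for i in range(parts)
    ]


def pair_sub_units(depth: Image, colour: Image, parts: int) -> List[Tuple[Image, Image]]:
    """Pair the i-th depth strip with the i-th strip of the 2D image."""
    if len(depth) != len(colour):
        raise ValueError("this sketch assumes both images have the same height")
    return list(zip(split_rows(depth, parts), split_rows(colour, parts)))
```

With `parts=4`, the first element of the result plays the role of the pair (sub-unit 1, sub-unit 5), the second of (sub-unit 2, sub-unit 6), and so on.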
在一种可能的设计方式中,上述S701中,第一电子设备获取人脸深度图像和人脸二维图像,可以包括下述步骤一至步骤三。
步骤一,第一电子设备接收人脸深度信息。
示例性地,人脸深度信息可以是高精度深度摄像头采集的,如TOF传感器、三维结构光传感器等。图9为本申请实施例提供的第一电子设备的应用示意图一。结合图9,人脸图像采集模块410可接收人脸深度信息。
步骤二,第一电子设备接收人脸二维信息。
示例性地,人脸二维信息可以包括人脸信息。可选地,人脸二维信息还可以包括当前视频场景中的场景信息。
在一些实施例中,第一电子设备可以接收场景二维信息。其中,场景二维信息可以包括场景信息。
示例性地,人脸二维信息和场景二维信息均可以是二维摄像头采集的,如RGB传感器等。结合图9,人脸图像采集模块410可接收人脸二维信息。可选地,人脸图像采集模块410还可以接收场景二维信息。
步骤三,第一电子设备根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。
结合图9,人脸图像采集模块410将人脸深度信息转化为人脸深度图像,将人脸二维信息转化为人脸二维图像。当人脸二维信息包括人脸信息,不包括场景信息时,人脸二维图像包括人脸图像;当人脸二维信息包括人脸信息和场景信息时,人脸二维图像中包括人脸图像和场景图像。
可选地,人脸图像采集模块410可以将场景二维信息转化为场景二维图像。
需要说明的是,本申请实施例不对上述步骤一至步骤三的执行顺序进行限定,以能够获取人脸深度图像和人脸二维图像为准。
结合图9,人脸图像采集模块410将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元,将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元。可选地,人脸图像采集模块410通过人脸深度图像缓存,将人脸深度图像的多个子单元传输至视频编码模块420和/或人脸三维生成模块440。人脸图像采集模块410通过人脸二维图像缓存,将人脸二维图像的多个子单元和/或场景二维图像传输至 视频编码模块420和/或人脸三维生成模块440。
在另一种可能的设计方式中,上述S701中,第一电子设备获取人脸深度图像和人脸二维图像,可以包括下述步骤四至步骤六。
步骤四,第一电子设备采集人脸深度信息。
图10为本申请实施例提供的第一电子设备的应用示意图二。结合图10,人脸图像采集模块410可以包括:人脸深度图像采集子模块411、二维图像采集子模块412和图像信号处理子模块413。其中,人脸深度图像采集子模块411可采集人脸深度信息,并将该人脸深度信息发送给图像信号处理子模块413。
需要说明的是,图10所示的第一电子设备与图9所示的第一电子设备的区别在于,人脸图像采集模块410的结构不相同,其它部分均相同。下述关于视频编码模块420、网络传输模块430、人脸三维生成模块440和显示模块450的具体阐述,对于图9以及图10所示的第一电子设备均适用。
步骤五,第一电子设备采集人脸二维信息。
结合图10,二维图像采集子模块412可采集人脸二维信息和/或场景二维信息,将人脸二维信息和/或场景二维信息发送给图像信号处理子模块413。关于人脸二维信息和场景二维信息的具体阐述可参照上述步骤二,此处不再赘述。
步骤六,第一电子设备根据人脸深度信息获得人脸深度图像,根据人脸二维信息获得人脸二维图像。
结合图10,图像信号处理子模块413可接收来自人脸深度图像采集子模块411的人脸深度信息,并将人脸深度信息转化为人脸深度图像。图像信号处理子模块413可接收来自二维图像采集子模块412的人脸二维信息和/或场景二维信息,将人脸二维信息转化为人脸二维图像,将场景二维信息转化为场景二维图像。
需要说明的是,关于人脸二维图像和场景二维图像的具体阐述可参照上述步骤三。本申请实施例不对上述步骤四至步骤六的执行顺序进行限定,以能够获取人脸深度图像和人脸二维图像为准。
结合图10,人脸图像采集模块410中的图像信号处理子模块413将人脸深度图像划分为包括第一子单元和第二子单元的多个子单元,将人脸二维图像划分为包括第三子单元和第四子单元的多个子单元。可选地,图像信号处理子模块413通过人脸深度图像缓存,将人脸深度图像的多个子单元传输至视频编码模块420和/或人脸三维生成模块440。图像信号处理子模块413通过人脸二维图像缓存,将人脸二维图像的多个子单元和/或场景二维图像传输至视频编码模块420和/或人脸三维生成模块440。
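In the embodiments of this application, the first electronic device processes the face depth image and the two-dimensional face image in a pipelined manner at the granularity of one pair of sub-units. A pair of sub-units consists of one sub-unit of the face depth image and the corresponding sub-unit of the two-dimensional face image.
Figure 11 is a second schematic flowchart of the three-dimensional video call method according to an embodiment of this application. A slice may comprise a pair of sub-units, or information derived from that pair. With reference to Figures 8 and 11, slice 1 may comprise sub-unit 1 and sub-unit 5, or may be coding unit 1a or three-dimensional face sub-image 1 obtained by processing sub-unit 1 and sub-unit 5. Slice 2 may comprise sub-unit 2 and sub-unit 6, or may be coding unit 2a or three-dimensional face sub-image 2 obtained by processing sub-unit 2 and sub-unit 6. Similarly, slice 3 may comprise sub-unit 3 and sub-unit 7, or may be coding unit 3a or three-dimensional face sub-image 3 obtained by processing them; and slice 4 may comprise sub-unit 4 and sub-unit 8, or may be coding unit 4a or three-dimensional face sub-image 4 obtained by processing them.
With reference to Figure 11, let the video processing period be T. The face image acquisition module 410 may obtain sub-unit 1 and sub-unit 5 in the first period T and send them to the video encoding module 420. In the second period T, it obtains sub-unit 2 and sub-unit 6 and sends them to the video encoding module 420. In the third period T, it obtains sub-unit 3 and sub-unit 7 and sends them to the video encoding module 420. In the fourth period T, it obtains sub-unit 4 and sub-unit 8 and sends them to the video encoding module 420.
Similarly, the face image acquisition module 410 may send the sub-units of the face depth image and of the two-dimensional face image to the three-dimensional face generation module 440 in a pipelined manner at the granularity of one pair of sub-units. This is not shown in Figure 11, and details are not repeated here.
It should be noted that, for ease of description, the period T shown in Figure 11 equals the maximum time that any module of the electronic device takes to process a slice. In practice, T may be greater than that processing time; the embodiments of this application do not limit the length of T. The modules of the electronic device include, but are not limited to, those shown in Figures 4 to 6.
In some embodiments, the electronic device may process the slices using a pipelined processing scheme. For example, if T is greater than the time a module takes to process a slice, the module may finish slice 1 within one period T and then proceed directly to slice 2. For instance, the face image acquisition module 410 may start obtaining sub-unit 1 and sub-unit 5 at the beginning of the first period T, send them to the video encoding module 420, and complete this at three quarters of the first period. It then immediately obtains sub-unit 2 and sub-unit 6, sends them to the video encoding module 420, and completes this at the halfway point of the second period. The other modules of the electronic device process the slices in the same way, which is not enumerated here.
In other embodiments, the electronic device may process the slices using a timed processing scheme. For example, if T is greater than the time each module takes to process a slice, one or more of the modules may process the slices on a timer: after finishing slice 1 at three quarters of the first period T, the module does not start slice 2 immediately but waits until the second period begins. For instance, the face image acquisition module 410 starts obtaining sub-unit 1 and sub-unit 5 at the beginning of the first period, sends them to the video encoding module 420, and completes this at three quarters of the first period. It then waits, and only when the second period begins does it obtain sub-unit 2 and sub-unit 6 and send them to the video encoding module 420, completing this at three quarters of the second period. It waits again and starts the next slice when the third period begins. The other modules of the electronic device process the slices in the same way, which is not enumerated here.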
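The difference between the two schemes can be checked with a small calculation. The sketch below is illustrative only: the function names are hypothetical, times are expressed in units of the period T, and every slice is assumed to take the same processing time within one module.

```python
from typing import List, Tuple


def pipelined(n_slices: int, work: float) -> List[Tuple[float, float]]:
    """(start, end) per slice when each slice starts as soon as the previous one ends."""
    return [(i * work, (i + 1) * work) for i in range(n_slices)]


def timed(n_slices: int, work: float, period: float) -> List[Tuple[float, float]]:
    """(start, end) per slice when slice i starts at the beginning of period i + 1."""
    if work > period:
        raise ValueError("the timed scheme assumes work <= period")
    return [(i * period, i * period + work) for i in range(n_slices)]
```

With `work = 0.75` and `period = 1.0`, the pipelined scheme finishes the second slice at 1.5 (halfway through the second period), while the timed scheme finishes it at 1.75 (three quarters of the way through the second period), matching the two examples in the text.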
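It should be noted that the embodiments of this application do not limit whether each module of the electronic device processes the slices on a timer, nor whether some or all of the modules do so: some modules may process the slices on a timer, all modules may do so, or none may.
Optionally, if one or more modules take longer than the period T to process one or more slices, the electronic device may handle the slices as follows, provided that the electronic device continues to operate normally. For example, under the pipelined processing scheme, if the time to process M slices is less than a first slice threshold, the modules process the slices in a pipelined manner, where the first slice threshold may be a preset maximum time for processing M slices; otherwise, the modules may process some of the M slices and drop the rest (for example, substituting the processing result of the corresponding slice of the previous frame), where M is an integer greater than 1. As another example, under the timed processing scheme, if T is less than the time the face image acquisition module 410 takes to process slice 1, the module may drop slice 1 (for example, substituting the processing result of slice 1 of the previous frame) and start processing slice 2 on schedule when the second period begins. A module may take longer than T to process a slice because, for example, one or more modules are faulty; this application does not limit the cause. Dropping a slice includes aborting processing that has started but has not yet produced a result.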
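One way to picture the fallback to the previous frame's result is sketched below. It is a simplification, not the application's mechanism: the class `SliceProcessor` and its parameters are hypothetical, and the overrun is decided from an estimated cost supplied by the caller rather than by interrupting processing that is already under way.

```python
from typing import Any, Callable, Dict, Optional


class SliceProcessor:
    """Processes slices under a per-slice time budget (hypothetical helper)."""

    def __init__(self, process: Callable[[Any], Any], budget: float) -> None:
        self._process = process
        self._budget = budget
        # Slice index -> most recent successfully computed result.
        self._last_result: Dict[int, Any] = {}

    def handle(self, index: int, payload: Any, estimated_cost: float) -> Optional[Any]:
        """Process one slice, or reuse the previous frame's result on overrun."""
        if estimated_cost > self._budget:
            # Drop this slice; returns None if no earlier frame produced a result.
            return self._last_result.get(index)
        result = self._process(payload)
        self._last_result[index] = result
        return result
```

Keeping one cached result per slice index means that a dropped slice 1 is replaced by slice 1 of the previous frame, not by a neighbouring slice of the current frame.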
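S702: The first electronic device obtains a first coding unit from the first sub-unit and the third sub-unit; after obtaining the first coding unit, it obtains a second coding unit from the second sub-unit and the fourth sub-unit.
In a possible design, S702 may include the following steps 7 to 10.
Step 7: The first electronic device encodes the third sub-unit to obtain a third coding unit.
For example, the third coding unit may be a network abstraction layer (NAL) unit of the video coding layer (VCL) type.
Figure 12 is a schematic structural diagram of a bitstream according to an embodiment of this application. As shown in Figure 12, the bitstream may include a header (NALU header), a sequence parameter set (SPS), a picture parameter set (PPS), and at least one coding unit. The first electronic device may encode sub-unit 5 into coding unit 1b, which is a VCL-type NAL unit.
With reference to Figure 9 or Figure 10, the video encoding module 420 may encode the third sub-unit to obtain the third coding unit.
Step 8: The first electronic device obtains the first coding unit from the first sub-unit and the third coding unit.
For example, the first coding unit may be a NAL unit of the supplemental enhancement information (SEI) type.
With reference to Figure 12, the first electronic device may fill sub-unit 1 into a preset field of coding unit 1b to obtain coding unit 1a, which is an SEI-type NAL unit. Optionally, the preset field may be the SEI field.
With reference to Figure 9 or Figure 10, the video encoding module 420 may encode the first sub-unit and the third coding unit to obtain the first coding unit.
It should be noted that the embodiments of this application do not limit the order in which the electronic device encodes the first sub-unit and the third sub-unit. For example, the first sub-unit may be encoded first, and the encoded first sub-unit may then be jointly encoded with the third sub-unit to obtain the first coding unit.
In the embodiments of this application, the first electronic device encodes a sub-unit of the face depth image and a sub-unit of the two-dimensional face image into the same coding unit rather than transmitting them independently, which avoids the added complexity of transmitting multiple bitstreams and synchronizing them in time.
Step 9: The first electronic device encodes the fourth sub-unit to obtain a fourth coding unit.
For example, the fourth coding unit may be a VCL-type NAL unit.
With reference to Figure 12, the first electronic device may encode sub-unit 6 into coding unit 2b, which is a VCL-type NAL unit.
With reference to Figure 9 or Figure 10, the video encoding module 420 may encode the fourth sub-unit to obtain the fourth coding unit.
Step 10: The first electronic device obtains the second coding unit from the second sub-unit and the fourth coding unit.
For example, the second coding unit may be an SEI-type NAL unit.
With reference to Figure 12, the first electronic device may fill sub-unit 2 into a preset field of coding unit 2b to obtain coding unit 2a, which is an SEI-type NAL unit. Optionally, the preset field may be the SEI field.
With reference to Figure 9 or Figure 10, the video encoding module 420 may encode the second sub-unit and the fourth coding unit to obtain the second coding unit.
Similarly, with reference to Figure 12, the first electronic device may obtain coding units 3b and 3a and coding units 4b and 4a in the manner described in steps 7 and 8; details are not repeated here.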
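The idea of carrying a depth sub-unit and an encoded two-dimensional sub-unit in a single unit can be sketched with a toy container. The layout below is an assumption made purely for illustration and is not H.264/H.265 SEI or NAL syntax: a 4-byte big-endian length, the raw depth bytes, then the encoded image bytes. `zlib` stands in for a real video encoder, and the function names are hypothetical.

```python
import struct
import zlib
from typing import Tuple


def encode_pair(depth_sub_unit: bytes, image_sub_unit: bytes) -> bytes:
    """Build one combined unit from a depth sub-unit and a 2D sub-unit."""
    # Steps 7 / 9: encode the 2D sub-unit (zlib is only a placeholder codec).
    coded_image = zlib.compress(image_sub_unit)
    # Steps 8 / 10: place the depth sub-unit in a length-prefixed field
    # ahead of the encoded image data.
    header = struct.pack(">I", len(depth_sub_unit))
    return header + depth_sub_unit + coded_image


def parse_pair(coding_unit: bytes) -> Tuple[bytes, bytes]:
    """Recover (depth sub-unit, 2D sub-unit) from a unit built by encode_pair."""
    (depth_len,) = struct.unpack_from(">I", coding_unit, 0)
    depth = coding_unit[4:4 + depth_len]
    if len(depth) != depth_len:
        raise ValueError("truncated depth field")
    return depth, zlib.decompress(coding_unit[4 + depth_len:])
```

Because both payloads travel in one unit, the receiver obtains a matched pair from a single parse and needs no separate mechanism to align a depth stream with an image stream in time.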
在本申请实施例中,第一电子设备的视频编码模块可以对流水线式接收的人脸深度图像的子单元和人脸二维图像的子单元进行流水线式编码,从而降低第二电子设备获得人脸三维图像的时延。
结合图9或图10,视频编码模块420可以对第一子单元和第三子单元进行编码,获得第一编码单元;在获得第一编码单元后,对第二子单元和第四子单元进行编码,获得第二编码单元。可选地,视频编码模块420可将编码后的第一编码单元和第二编码单元通过编码单元缓存发送给网络传输模块430。
结合图11,在第二个T时间内,视频编码模块420可对子单元1和子单元5进行编码,获得编码单元1a,并发送给网络传输模块430。在第三个T时间内,视频编码模块420对子单元2和子单元6进行编码,获得编码单元2a,并发送给网络传输模块430。类似地,在第四个T时间内,视频编码模块420可对子单元3和子单元7进行编码,获得编码单元3a,并发送给网络传输模块430。在第五个T时间内,视频编码模块420可对子单元4和子单元8进行编码,获得编码单元4a,并发送给网络传输模块430。
S703,第一电子设备向第二电子设备发送第一编码单元;在发送第一编码单元后,向第二电子设备发送第二编码单元。
结合图9或图10,网络传输模块430可以向第二电子设备发送第一编码单元;在发送第一编码单元后,向第二电子设备发送第二编码单元。
在本申请实施例中,第一电子设备可以流水线式向第二电子设备发送编码后的人脸深度图像子单元和人脸二维图像子单元,以使第二电子设备流水线式获取人脸三维图像,可降低第二电子设备获得人脸三维图像的时延,提高获得三维视频的效率。
结合图11,在第三个T时间内,网络传输模块430可以向第二电子设备发送编码单元1a。在第四个T时间内,网络传输模块430向第二电子设备发送编码单元2a。在第五个T时间内,网络传输模块430向第二电子设备发送编码单元3a。在第六个T时间内,网络传输模块430向第二电子设备发送编码单元4a。
在一些实施例中,第一电子设备可以定时向第二电子设备发送第一编码单元和第 二编码单元。
结合图9或图10,网络传输模块430可以定时向第二电子设备发送第一编码单元和第二编码单元。
在一些实施例中,第一电子设备可以显示第一电子设备的用户当前的视频场景的三维视频,第一电子设备可以对人脸深度图像的多个子单元和人脸二维图像的多个子单元进行流水线式处理,从而提高获取三维视频的效率。
可选地,本申请实施例提供的三维视频通话方法,还可以包括:第一电子设备根据第一子单元和第三子单元获得第一人脸三维子图像;在获得第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像。
结合图9或图10,人脸三维生成模块440可以根据第一子单元和第三子单元获得第一人脸三维子图像,向显示模块450发送第一人脸三维子图像。在获得并发送第一人脸三维子图像后,根据第二子单元和第四子单元获得第二人脸三维子图像,向显示模块450发送第二人脸三维子图像。
可选地,人脸三维生成模块440可以采用人工智能算法模型、或三维人脸生成算法模型,根据人脸深度图像的子单元和人脸二维图像的子单元获得人脸三维子图像。
结合图8和图9,人脸三维生成模块440可以根据子单元1和子单元5生成人脸三维子图像1,并将其发送给显示模块450。然后,根据子单元2和子单元6生成人脸三维子图像2,并将其发送给显示模块450。根据子单元3和子单元7生成人脸三维子图像3,并将其发送给显示模块450。最后,根据子单元4和子单元8生成人脸三维子图像4,并将其发送给显示模块450。需要说明的是,图10所示的人脸三维生成模块440可实现相同的功能,此处不再赘述。
可选地,人脸三维生成模块440可以将场景二维图像发送给显示模块450。示例性地,当人脸二维图像包括场景图像和人脸图像时,该场景二维图像可以是人脸三维生成模块440根据人脸二维图像获得的。或者,该场景二维图像可以是人脸图像采集模块410发送给人脸三维生成模块440的。
可选地,本申请实施例提供的三维视频通话方法,还可以包括:第一电子设备将第一人脸三维子图像与场景二维图像进行叠加;在将第一人脸三维子图像与场景二维图像进行叠加后,将第二人脸三维子图像与场景二维图像进行叠加。
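With reference to Figure 9 or Figure 10, the display module 450 may overlay the first three-dimensional face sub-image on the two-dimensional scene image and, after doing so, overlay the second three-dimensional face sub-image on the two-dimensional scene image. Optionally, the display subsystem of the display module 450 may receive the three-dimensional face sub-images and the two-dimensional scene image in a pipelined manner through a preview buffer, overlay the three-dimensional face sub-images on the two-dimensional scene image in a pipelined manner to obtain the three-dimensional face image, and then transfer it to the display screen for display.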
示例性地,显示模块450可以将人脸三维子图像1与场景二维图像进行叠加,然后,将人脸三维子图像2与场景二维图像进行叠加,将人脸三维子图像3与场景二维图像进行叠加,最后,将人脸三维子图像4与场景二维图像进行叠加,从而获得人脸三维图像。
如此,第一电子设备通过多次执行上述S701-S704,可以获得多帧人脸三维图像 从而可以显示第一电子设备的用户的三维视频,采用以一对子单元为粒度流水线式对人脸深度图像和人脸二维图像进行处理,可以降低第一电子设备获取人脸三维图像的时延,从而提高显示三维视频的效率。
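S704: The second electronic device receives the first coding unit from the first electronic device; after receiving the first coding unit, it receives the second coding unit from the first electronic device.
Figure 13 is a schematic application diagram of the second electronic device according to an embodiment of this application. With reference to Figure 13, the network transmission module 610 may be configured to receive the first coding unit from the first electronic device and send it to the video decoding module 620 and, after receiving and sending the first coding unit, receive the second coding unit from the first electronic device and send it to the video decoding module 620.
With reference to Figure 11, in the fourth period T, the network transmission module 610 obtains coding unit 1a and sends it to the video decoding module 620. In the fifth period T, it obtains coding unit 2a and sends it to the video decoding module 620. In the sixth period T, it obtains coding unit 3a and sends it to the video decoding module 620. In the seventh period T, it obtains coding unit 4a and sends it to the video decoding module 620.
S705: The second electronic device obtains the first sub-unit and the third sub-unit from the first coding unit; after obtaining the first sub-unit and the third sub-unit, it obtains the second sub-unit and the fourth sub-unit from the second coding unit.
With reference to Figure 13, the video decoding module 620 may decode the first coding unit to obtain the first sub-unit and the third sub-unit. Optionally, the video decoding module 620 may transfer the first sub-unit to the three-dimensional face generation module 630 through a face depth image buffer, and the third sub-unit to the three-dimensional face generation module 630 through a two-dimensional face image buffer. Similarly, the video decoding module 620 may decode the second coding unit to obtain the second sub-unit and the fourth sub-unit, and may optionally transfer the second sub-unit through the face depth image buffer and the fourth sub-unit through the two-dimensional face image buffer to the three-dimensional face generation module 630.
With reference to Figure 11, in the fifth period T, the video decoding module 620 decodes coding unit 1a to obtain sub-unit 1 and sub-unit 5 and sends them to the three-dimensional face generation module 630. In the sixth period T, it decodes coding unit 2a to obtain sub-unit 2 and sub-unit 6 and sends them to the three-dimensional face generation module 630. In the seventh period T, it decodes coding unit 3a to obtain sub-unit 3 and sub-unit 7 and sends them to the three-dimensional face generation module 630. In the eighth period T, it decodes coding unit 4a to obtain sub-unit 4 and sub-unit 8 and sends them to the three-dimensional face generation module 630.
In some embodiments, S705 may include the following steps 11 to 14.
Step 11: The second electronic device parses the first coding unit to obtain the first sub-unit and the third coding unit.
With reference to Figure 13, the video decoding module 620 may parse the first coding unit to obtain the first sub-unit and the third coding unit. For example, the video decoding module 620 may parse coding unit 1a to obtain sub-unit 1 and coding unit 1b.
Step 12: The second electronic device decodes the third coding unit to obtain the third sub-unit.
With reference to Figure 13, the video decoding module 620 may decode the third coding unit to obtain the third sub-unit. For example, the video decoding module 620 may decode coding unit 1b to obtain sub-unit 5.
Step 13: After decoding the third coding unit, the second electronic device parses the second coding unit to obtain the second sub-unit and the fourth coding unit.
With reference to Figure 13, the video decoding module 620 may parse the second coding unit to obtain the second sub-unit and the fourth coding unit. For example, the video decoding module 620 may parse coding unit 2a to obtain sub-unit 2 and coding unit 2b.
Step 14: The second electronic device decodes the fourth coding unit to obtain the fourth sub-unit.
With reference to Figure 13, the video decoding module 620 may decode the fourth coding unit to obtain the fourth sub-unit. For example, the video decoding module 620 may decode coding unit 2b to obtain sub-unit 6.
Similarly, the second electronic device may obtain sub-units 3 and 7 and sub-units 4 and 8 in the manner described in steps 11 and 12; details are not repeated here.
With reference to Figure 11, in the fifth period T, the video decoding module 620 parses coding unit 1a to obtain sub-unit 1 and coding unit 1b, decodes coding unit 1b to obtain sub-unit 5, and then sends sub-unit 1 and sub-unit 5 to the three-dimensional face generation module 630. In the sixth period T, it parses coding unit 2a to obtain sub-unit 2 and coding unit 2b, decodes coding unit 2b to obtain sub-unit 6, and then sends sub-unit 2 and sub-unit 6 to the three-dimensional face generation module 630. In the seventh period T, it parses coding unit 3a to obtain sub-unit 3 and coding unit 3b, decodes coding unit 3b to obtain sub-unit 7, and then sends sub-unit 3 and sub-unit 7 to the three-dimensional face generation module 630. In the eighth period T, it parses coding unit 4a to obtain sub-unit 4 and coding unit 4b, decodes coding unit 4b to obtain sub-unit 8, and then sends sub-unit 4 and sub-unit 8 to the three-dimensional face generation module 630.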
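S706: The second electronic device obtains a first 3D face sub-image according to the first sub-unit and the third sub-unit and, after obtaining the first 3D face sub-image, obtains a second 3D face sub-image according to the second sub-unit and the fourth sub-unit.
With reference to FIG. 13, the 3D face generation module 630 may obtain the first 3D face sub-image according to the first sub-unit and the third sub-unit, and send the first 3D face sub-image to the display module 640. After obtaining and sending the first 3D face sub-image, it obtains the second 3D face sub-image according to the second sub-unit and the fourth sub-unit, and sends the second 3D face sub-image to the display module 640.
With reference to FIG. 11, in the sixth period T, the 3D face generation module 630 may generate 3D face sub-image 1 according to sub-unit 1 and sub-unit 5, and send it to the display module 640. Then, in the seventh period T, it generates 3D face sub-image 2 according to sub-unit 2 and sub-unit 6, and sends it to the display module 640. In the eighth period T, it generates 3D face sub-image 3 according to sub-unit 3 and sub-unit 7, and sends it to the display module 640. In the ninth period T, it generates 3D face sub-image 4 according to sub-unit 4 and sub-unit 8, and sends it to the display module 640.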
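As a rough illustration of how one depth sub-unit and its corresponding 2D sub-unit could be combined, the following Python sketch builds a coloured point list from a pair of image strips. The representation of a 3D face sub-image as `(x, y, z, color)` points, the convention that depth `0` means "no measurement", and the function name are all assumptions; the application does not prescribe a particular 3D representation.

```python
def make_3d_subimage(depth_rows, color_rows, row_offset=0):
    """Combine a depth strip and the matching colour strip into 3D points.

    depth_rows / color_rows: lists of rows with identical shape.
    row_offset: index of the strip's first row within the full image.
    """
    if len(depth_rows) != len(color_rows):
        raise ValueError("depth and colour sub-units must cover the same rows")
    points = []
    for dy, (d_row, c_row) in enumerate(zip(depth_rows, color_rows)):
        if len(d_row) != len(c_row):
            raise ValueError("depth and colour rows must have the same width")
        for x, (z, color) in enumerate(zip(d_row, c_row)):
            if z > 0:  # assumed convention: 0 means no depth was measured
                points.append((x, row_offset + dy, z, color))
    return points


# Sub-unit 2 (depth) and sub-unit 6 (colour) covering rows 2-3 of the frame.
sub_image_2 = make_3d_subimage([[0, 5], [7, 0]], [["a", "b"], ["c", "d"]], row_offset=2)
```

Because each call needs only one pair of corresponding sub-units, it can run as soon as that pair has been decoded, without waiting for the rest of the frame.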
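Optionally, the 3D face generation module 630 may send the 2D scene image to the display module 640. When the 2D face image includes a scene image, the 2D scene image may be obtained by the 3D face generation module 630 from the 2D face image. When the 2D face image does not include a scene image, the 2D scene image may be sent by the first electronic device to the 3D face generation module 630.
S707: The second electronic device overlays the first 3D face sub-image on the 2D scene image and, after overlaying the first 3D face sub-image on the 2D scene image, overlays the second 3D face sub-image on the 2D scene image.
With reference to FIG. 13, the display module 640 may overlay the first 3D face sub-image on the 2D scene image and, after doing so, overlay the second 3D face sub-image on the 2D scene image. Optionally, the display subsystem of the display module 640 may receive the multiple 3D face sub-images and the 2D scene image in a pipelined manner through a preview buffer, overlay the 3D face sub-images on the 2D scene image in a pipelined manner to obtain the 3D face image, and then transmit the 3D face image to the display screen for display.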
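The incremental overlay can be sketched in Python as follows. The sketch assumes that each 3D face sub-image has already been rendered to a strip of pixels, that `None` marks a transparent pixel, and that images are plain lists of rows; these conventions and the function name are illustrative assumptions, not details from the application.

```python
def overlay_subimage(scene, sub_rows, row_offset):
    """Overlay one rendered face strip onto the scene image, in place.

    scene: list of pixel rows (the 2D scene image).
    sub_rows: rows of the rendered face sub-image; None means transparent.
    row_offset: row of the scene where the strip starts.
    """
    for dy, row in enumerate(sub_rows):
        target = scene[row_offset + dy]
        for x, pixel in enumerate(row):
            if pixel is not None:
                target[x] = pixel
    return scene


scene = [["s", "s"], ["s", "s"]]
overlay_subimage(scene, [[None, "f1"]], row_offset=0)  # face sub-image 1
overlay_subimage(scene, [["f2", None]], row_offset=1)  # face sub-image 2
```

Each call touches only the rows covered by one sub-image, so the display side can start compositing as soon as the first sub-image arrives.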
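For example, the display module 640 may overlay the multiple 3D face sub-images on the 2D scene image to obtain the 3D face image. Specifically, with reference to FIG. 11, in the seventh period T, the display module 640 may overlay 3D face sub-image 1 on the 2D scene image. In the eighth period T, it overlays 3D face sub-image 2 on the 2D scene image. In the ninth period T, it overlays 3D face sub-image 3 on the 2D scene image. In the tenth period T, it overlays 3D face sub-image 4 on the 2D scene image, thereby obtaining the 3D face image.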
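The receiver-side timing described with reference to FIG. 11 follows a regular pattern: each stage handles the k-th pair of sub-units one period T after the previous stage did. The Python sketch below reproduces that timetable. The starting periods (4, 5, 6, 7) are taken from the description; the stage names and function names are labels chosen here for illustration.

```python
# First period T (1-based) in which each receiver stage handles pair 1.
FIRST_SLOT = {"receive": 4, "decode": 5, "generate_3d": 6, "overlay": 7}


def slot(stage: str, pair: int) -> int:
    """Period T in which `stage` processes the pair-th coding unit (1-based)."""
    return FIRST_SLOT[stage] + (pair - 1)


def schedule(num_pairs: int = 4) -> dict:
    """Timetable of all stages for num_pairs pairs of sub-units."""
    return {
        stage: [slot(stage, k) for k in range(1, num_pairs + 1)]
        for stage in FIRST_SLOT
    }


timetable = schedule()
for stage, slots in timetable.items():
    print(f"{stage:12s} {slots}")
```

With four pairs of sub-units, the last overlay falls in the tenth period T, which matches the description above.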
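In a possible design, the 3D video call method provided in the embodiments of this application may further include: detecting an adjustment action and, in response to the adjustment action, adjusting the angle at which the face in the 3D face image is displayed.
With reference to FIG. 13, the touch module 650 is configured to detect the adjustment action, and the display module 640 may be configured to adjust, in response to the adjustment action, the angle at which the face in the 3D face image is displayed.
For example, the adjustment action may be a left deflection angle or a right deflection angle set by the user. For example, the electronic device may define the angle at which the 3D face image is displayed frontally as 0 degrees, and the display interface of the electronic device may include a left deflection angle setting area and a right deflection angle setting area. The user can adjust the angle at which the face is displayed by setting the value in the left deflection angle setting area or the right deflection angle setting area.
For example, the adjustment action may be a rotation gesture performed by the user on the touchscreen. For example, the user places two fingers on the touchscreen and rotates them clockwise or counterclockwise. As shown in (a) in FIG. 14, the current 3D video displays a frontal 3D image of the face. After the user places two fingers on the touchscreen and rotates them clockwise, the display angle of the face is adjusted as shown in (b) in FIG. 14, and a 3D image of the left side of the face is displayed.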
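One way a two-finger rotation could be converted into a display angle is sketched below in Python. The use of `atan2` on the line joining the two touch points, the clamp to ±90 degrees, and the sign convention (positive means clockwise in screen coordinates, where y grows downward) are assumptions for illustration; the application does not specify how the gesture is mapped to an angle.

```python
import math


def finger_angle(p1, p2):
    """Angle in degrees of the line from touch point p1 to touch point p2."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def rotation_delta(start, end):
    """Signed rotation between two finger pairs, wrapped into [-180, 180)."""
    delta = finger_angle(*end) - finger_angle(*start)
    return (delta + 180.0) % 360.0 - 180.0


def adjust_face_angle(current, delta, limit=90.0):
    """Apply the gesture to the displayed face angle; 0 is the frontal view."""
    return max(-limit, min(limit, current + delta))


# Fingers start side by side and end one above the other (y grows downward),
# which is a quarter turn clockwise on the screen.
delta = rotation_delta(((0, 0), (1, 0)), ((0, 0), (0, 1)))
new_angle = adjust_face_angle(0.0, delta)
```

Whether a positive angle corresponds to showing the left or the right side of the face is a design choice left open here.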
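It should be noted that the 3D video call method provided in the foregoing embodiments of this application is described by using an example in which the face in the call video is displayed in 3D. The 3D video call method provided in the embodiments of this application may also display both the face and the scene in the call video in 3D. The specific implementation is similar to S701-S707 above: the face depth information may include depth information of the face and of the scene, and the face depth image may include a depth image of the face and a depth image of the scene. Details are not repeated here.
Based on the 3D video call method shown in FIG. 7, the first electronic device divides the face depth image and the 2D face image each into multiple sub-units, encodes the face depth image and the 2D face image in a pipelined manner at the granularity of one pair of sub-units, and sends the results to the second electronic device in a pipelined manner. The second electronic device receives and decodes them in a pipelined manner to obtain the sub-units of the face depth image and the sub-units of the 2D face image, and obtains the 3D face sub-images in a pipelined manner. In this way, the latency with which the second electronic device obtains a 3D face image is reduced, and therefore the latency of obtaining the 3D video is reduced.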
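The latency benefit can be made concrete with a small back-of-the-envelope calculation in Python. It assumes that every stage needs one period T per pair of sub-units and that, without pipelining, a stage would need one period per pair for the whole frame before the next stage could start; both are simplifying assumptions, and the function names are hypothetical.

```python
def pipelined_periods(stages: int, pairs: int) -> int:
    """Periods needed when stages overlap at sub-unit-pair granularity."""
    return stages + pairs - 1


def sequential_periods(stages: int, pairs: int) -> int:
    """Periods needed when each stage processes the whole frame in turn."""
    return stages * pairs


# Receiver in FIG. 11: receive, decode, generate, overlay; four pairs.
pipelined = pipelined_periods(stages=4, pairs=4)
sequential = sequential_periods(stages=4, pairs=4)
```

Under these assumptions the receiver needs 7 periods, consistent with the fourth through tenth period T described with reference to FIG. 11, compared with 16 periods for frame-at-a-time processing.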
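An embodiment of this application provides a 3D video call system. The system includes one or more of the first electronic devices described above and one or more of the second electronic devices described above.
An embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program or instructions. When the computer program or instructions are run on a computer, the computer is enabled to perform the 3D video call method described in the foregoing method embodiments.
An embodiment of this application provides a computer program product. The computer program product includes a computer program or instructions. When the computer program or instructions are run on a computer, the computer is enabled to perform the 3D video call method described in the foregoing method embodiments.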
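The foregoing embodiments may be implemented in whole or in part by software, hardware (such as a circuit), firmware, or any combination thereof. When software is used, the foregoing embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be an electronic device, a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that contains one or more sets of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.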
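In this application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be singular or plural.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
A person of ordinary skill in the art may be aware that the units or modules and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units or modules described above, reference may be made to the corresponding processes in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units or modules is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or modules may be combined or integrated into another system, or some units or modules may be omitted, or their corresponding functions may not be performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units/modules, and may be electrical, mechanical, or in other forms.
The units/modules described as separate components may or may not be physically separate, and the components shown as units/modules may or may not be physical units/modules; that is, they may be located in one place or distributed over multiple network units/modules. Some or all of the units/modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units/modules in the embodiments of this application may be integrated into one processing unit/module, or each unit/module may exist physically on its own, or two or more units/modules may be integrated into one unit/module.
If the functions are implemented in the form of software functional units/modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the embodiments of this application, provided that there is no logical contradiction, the embodiments may reference one another. For example, methods and/or terms may be cross-referenced between method embodiments, functions and/or terms may be cross-referenced between apparatus embodiments, and functions and/or terms may be cross-referenced between apparatus embodiments and method embodiments.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.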

Claims (15)

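  1. An electronic device, comprising: a face image acquisition module, a video encoding module, and a network transmission module, wherein
    the face image acquisition module is configured to: obtain a face depth image and a two-dimensional face image; divide the face depth image into a plurality of sub-units including a first sub-unit and a second sub-unit; divide the two-dimensional face image into a plurality of sub-units including a third sub-unit and a fourth sub-unit; send the first sub-unit and the third sub-unit to the video encoding module; and after sending the first sub-unit and the third sub-unit, send the second sub-unit and the fourth sub-unit to the video encoding module, wherein the first sub-unit corresponds to the third sub-unit, and the second sub-unit corresponds to the fourth sub-unit;
    the video encoding module is configured to: obtain a first coding unit according to the first sub-unit and the third sub-unit, and send the first coding unit to the network transmission module; and after obtaining and sending the first coding unit, obtain a second coding unit according to the second sub-unit and the fourth sub-unit, and send the second coding unit to the network transmission module; and
    the network transmission module is configured to: send the first coding unit to a second electronic device; and after sending the first coding unit, send the second coding unit to the second electronic device.
  2. The electronic device according to claim 1, wherein
    the face image acquisition module is specifically configured to: receive face depth information; receive two-dimensional face information; obtain the face depth image according to the face depth information; and obtain the two-dimensional face image according to the two-dimensional face information.
  3. The electronic device according to claim 1 or 2, wherein
    the video encoding module is specifically configured to:
    encode the third sub-unit to obtain a third coding unit, and obtain the first coding unit according to the first sub-unit and the third coding unit; and
    encode the fourth sub-unit to obtain a fourth coding unit, and obtain the second coding unit according to the second sub-unit and the fourth coding unit.
  4. The electronic device according to any one of claims 1 to 3, further comprising: a three-dimensional face generation module and a display module, wherein
    the three-dimensional face generation module is configured to: obtain a first three-dimensional face sub-image according to the first sub-unit and the third sub-unit, and send the first three-dimensional face sub-image to the display module; and after obtaining and sending the first three-dimensional face sub-image, obtain a second three-dimensional face sub-image according to the second sub-unit and the fourth sub-unit, and send the second three-dimensional face sub-image to the display module; and
    the display module is configured to: overlay the first three-dimensional face sub-image on a two-dimensional scene image; and after overlaying the first three-dimensional face sub-image on the two-dimensional scene image, overlay the second three-dimensional face sub-image on the two-dimensional scene image.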
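  5. An electronic device, comprising: a network transmission module, a video decoding module, a three-dimensional face generation module, and a display module, wherein
    the network transmission module is configured to: receive a first coding unit from a first electronic device, and send the first coding unit to the video decoding module; and after receiving and sending the first coding unit, receive a second coding unit from the first electronic device, and send the second coding unit to the video decoding module;
    the video decoding module is configured to: obtain a first sub-unit and a third sub-unit according to the first coding unit; and after obtaining the first sub-unit and the third sub-unit, obtain a second sub-unit and a fourth sub-unit according to the second coding unit, wherein the first sub-unit and the second sub-unit are each sub-units of a face depth image, the third sub-unit and the fourth sub-unit are each sub-units of the two-dimensional face image, the first sub-unit corresponds to the third sub-unit, and the second sub-unit corresponds to the fourth sub-unit;
    the three-dimensional face generation module is configured to: obtain a first three-dimensional face sub-image according to the first sub-unit and the third sub-unit, and send the first three-dimensional face sub-image to the display module; and after obtaining and sending the first three-dimensional face sub-image, obtain a second three-dimensional face sub-image according to the second sub-unit and the fourth sub-unit, and send the second three-dimensional face sub-image to the display module; and
    the display module is configured to: overlay the first three-dimensional face sub-image on a two-dimensional scene image; and after overlaying the first three-dimensional face sub-image on the two-dimensional scene image, overlay the second three-dimensional face sub-image on the two-dimensional scene image.
  6. The electronic device according to claim 5, wherein
    the video decoding module is further configured to: parse the first coding unit to obtain the first sub-unit and a third coding unit; decode the third coding unit to obtain the third sub-unit; after decoding the third coding unit, parse the second coding unit to obtain the second sub-unit and a fourth coding unit; and decode the fourth coding unit to obtain the fourth sub-unit.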
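  7. A three-dimensional video call method, comprising:
    obtaining a face depth image and a two-dimensional face image; dividing the face depth image into a plurality of sub-units including a first sub-unit and a second sub-unit; and dividing the two-dimensional face image into a plurality of sub-units including a third sub-unit and a fourth sub-unit, wherein the first sub-unit corresponds to the third sub-unit, and the second sub-unit corresponds to the fourth sub-unit;
    obtaining a first coding unit according to the first sub-unit and the third sub-unit; and after obtaining the first coding unit, obtaining a second coding unit according to the second sub-unit and the fourth sub-unit; and
    sending the first coding unit to a second electronic device; and after sending the first coding unit, sending the second coding unit to the second electronic device.
  8. The three-dimensional video call method according to claim 7, wherein the obtaining a face depth image and a two-dimensional face image comprises:
    receiving face depth information; receiving two-dimensional face information; obtaining the face depth image according to the face depth information; and obtaining the two-dimensional face image according to the two-dimensional face information.
  9. The three-dimensional video call method according to claim 7 or 8, further comprising:
    encoding the third sub-unit to obtain a third coding unit, and obtaining the first coding unit according to the first sub-unit and the third coding unit; and
    encoding the fourth sub-unit to obtain a fourth coding unit, and obtaining the second coding unit according to the second sub-unit and the fourth coding unit.
  10. The three-dimensional video call method according to any one of claims 7 to 9, further comprising:
    obtaining a first three-dimensional face sub-image according to the first sub-unit and the third sub-unit; and after obtaining the first three-dimensional face sub-image, obtaining a second three-dimensional face sub-image according to the second sub-unit and the fourth sub-unit; and
    overlaying the first three-dimensional face sub-image on a two-dimensional scene image; and after overlaying the first three-dimensional face sub-image on the two-dimensional scene image, overlaying the second three-dimensional face sub-image on the two-dimensional scene image.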
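  11. A three-dimensional video call method, comprising:
    receiving a first coding unit from a first electronic device; and after receiving the first coding unit, receiving a second coding unit from the first electronic device;
    obtaining a first sub-unit and a third sub-unit according to the first coding unit; and after obtaining the first sub-unit and the third sub-unit, obtaining a second sub-unit and a fourth sub-unit according to the second coding unit, wherein the first sub-unit and the second sub-unit are each sub-units of a face depth image, the third sub-unit and the fourth sub-unit are each sub-units of the two-dimensional face image, the first sub-unit corresponds to the third sub-unit, and the second sub-unit corresponds to the fourth sub-unit;
    obtaining a first three-dimensional face sub-image according to the first sub-unit and the third sub-unit; and after obtaining the first three-dimensional face sub-image, obtaining a second three-dimensional face sub-image according to the second sub-unit and the fourth sub-unit; and
    overlaying the first three-dimensional face sub-image on a two-dimensional scene image; and after overlaying the first three-dimensional face sub-image on the two-dimensional scene image, overlaying the second three-dimensional face sub-image on the two-dimensional scene image.
  12. The three-dimensional video call method according to claim 11, further comprising:
    parsing the first coding unit to obtain the first sub-unit and a third coding unit; decoding the third coding unit to obtain the third sub-unit; after decoding the third coding unit, parsing the second coding unit to obtain the second sub-unit and a fourth coding unit; and decoding the fourth coding unit to obtain the fourth sub-unit.
  13. A three-dimensional video call system, comprising the electronic device according to any one of claims 1 to 4 and the electronic device according to any one of claims 5 and 6.
  14. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or instructions, and when the computer program or instructions are run on a computer, the computer is enabled to perform the three-dimensional video call method according to any one of claims 7 to 12.
  15. A computer program product, comprising a computer program or instructions, wherein when the computer program or instructions are run on a computer, the computer is enabled to perform the three-dimensional video call method according to any one of claims 7 to 12.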
PCT/CN2021/070536 2021-01-06 2021-01-06 Three-dimensional video call method and electronic device WO2022147698A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/070536 WO2022147698A1 (zh) 2021-01-06 2021-01-06 Three-dimensional video call method and electronic device
CN202180087392.8A CN116711303A (zh) 2021-01-06 2021-01-06 Three-dimensional video call method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/070536 WO2022147698A1 (zh) 2021-01-06 2021-01-06 三维视频通话方法及电子设备

Publications (1)

Publication Number Publication Date
WO2022147698A1 true WO2022147698A1 (zh) 2022-07-14

Family

ID=82357025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070536 WO2022147698A1 (zh) 2021-01-06 2021-01-06 三维视频通话方法及电子设备

Country Status (2)

Country Link
CN (1) CN116711303A (zh)
WO (1) WO2022147698A1 (zh)

Also Published As

Publication number Publication date
CN116711303A (zh) 2023-09-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21916756

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180087392.8

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21916756

Country of ref document: EP

Kind code of ref document: A1