CN117041231A - Video transmission method, system, storage medium and device for online conference

Video transmission method, system, storage medium and device for online conference

Info

Publication number
CN117041231A
CN117041231A (application CN202310848320.1A)
Authority
CN
China
Prior art keywords
face image
moment
image
key points
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310848320.1A
Other languages
Chinese (zh)
Inventor
蒙浩程
张定乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qishuo Shenzhen Technology Co ltd
Original Assignee
Qishuo Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qishuo Shenzhen Technology Co ltd filed Critical Qishuo Shenzhen Technology Co ltd
Priority to CN202310848320.1A priority Critical patent/CN117041231A/en
Publication of CN117041231A publication Critical patent/CN117041231A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/1066 Session management
    • H04L 65/1083 In-session procedures
    • H04L 65/1089 In-session procedures by adding media; by removing media
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the application discloses a video transmission method for online conferences, comprising the following steps: a first terminal acquires a face image at time t; the face image at time t is transmitted, as a reference image, to at least one corresponding second terminal; the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n; and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points. The method effectively reduces the amount of data transmitted and the transmission time, and maintains video picture quality while lowering the demand on network bandwidth.

Description

Video transmission method, system, storage medium and device for online conference
Technical Field
The application relates to the technical field of digital image processing and intelligent recognition, in particular to a video transmission method, a system, a storage medium and equipment for online conferences.
Background
With the development of modern infrastructure, remote communities are gradually being upgraded and data communication is becoming increasingly necessary: infrastructure in remote regions is improving, and demand for modern communication is growing. Electronic communication serves a variety of purposes, including online medical services, emergency response, evacuation coordination, law enforcement, business, entertainment, and video teleconferencing.
In electronic communication, particularly video image transmission, images are typically transmitted in data packets or frame by frame. When the picture changes, some portions of a frame are identical to the corresponding portions of an adjacent frame, so transmitting an entire data frame containing the same data as its neighbor wastes both bandwidth and time.
In addition, when video frames are transmitted over a wireless communication network, bandwidth limitations may cause insufficient transmission data, video frame delay and similar problems, thereby degrading the quality of video communication.
Disclosure of Invention
Based on this, to address the above problems, a video transmission method for an online conference is proposed.
A video transmission method for an online conference, the method comprising the steps of:
a first terminal acquires an original face image at time t;
the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal;
the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n;
and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points.
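The four steps can be sketched as a sender-side encoder. This is a minimal illustration, not the patent's implementation; names such as `encode_stream`, `Packet` and the `detect_keypoints` callback are hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, List, Tuple


@dataclass
class Packet:
    kind: str      # "reference" (full image) or "keypoints" (coordinates only)
    payload: Any


def encode_stream(frames: Iterable[Any],
                  detect_keypoints: Callable[[Any], List[Tuple[float, float]]]
                  ) -> Iterator[Packet]:
    """Emit the first frame whole as the reference image (time t), then
    only the face key points for each subsequent frame (times t+1 .. t+n)."""
    it = iter(frames)
    yield Packet("reference", next(it))                 # full image, sent once
    for frame in it:                                    # frames at time t+i
        yield Packet("keypoints", detect_keypoints(frame))
```

On the receiving side, each keypoint packet would be combined with the stored reference image by the generative model to reconstruct the frame at time t+i.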
In the above scheme, acquiring the face image at time t specifically includes: capturing the original face image in real time as the face image at time t, or querying an existing face database for a face image to use as the face image at time t.
In the above scheme, capturing the face image in real time with an imaging device as the face image at time t further includes:
capturing the original face image in real time with an external imaging device or the first terminal;
obtaining the foreground and background portions of the original face image with the GrabCut algorithm and separating them;
and taking the foreground portion as the face image at time t.
In the above scheme, the face key points include internal key points and contour key points.
In the above scheme, the face key points include internal key points and contour key points, specifically: the internal key points describe the shape, position and size of the eyebrows, eyes, nose and mouth; the contour key points describe the shape, position and size of the face contour and the ears.
In the above scheme, determining the face image at time t+i from the reference image and the face image key points specifically includes: combining the reference image with the face image key points using a StyleGAN network model to generate the face image at time t+i.
The application also proposes a video transmission system for an online conference, characterized in that it comprises: the device comprises a face image acquisition unit, a face image key point acquisition unit, a face image determination unit and a transmission acquisition unit;
the face image acquisition unit is used for acquiring the original face image at time t and taking it as the reference image;
the face image key point acquisition unit is used for acquiring the face image key points at time t+i;
the transmission acquisition unit is used for transmitting the original face image at time t and the face image key points at time t+i to at least one corresponding second terminal;
the face image determination unit is used for determining the face image at time t+i from the reference image and the face image key points.
The application also proposes a readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
a first terminal acquires an original face image at time t;
the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal;
the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n;
and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points.
The application also proposes a computer device comprising a memory and a processor, said memory storing a computer program, said computer program being executed by said processor to perform the steps of:
a first terminal acquires an original face image at time t;
the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal;
the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n;
and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points.
The embodiments of the application have the following beneficial effects. First, the first terminal acquires an original face image at time t; the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal; the first terminal then acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n; finally, the at least one corresponding second terminal determines the face image at time t+i from the reference image and the face image key points. This resolves the large data volume and low transmission speed of traditional video transmission methods, reduces the network bandwidth requirement while maintaining video picture quality, and helps improve the efficiency of online video conferencing.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is a flow chart of a video transmission method for online conferencing in one embodiment;
FIG. 2 is a schematic flow chart of acquiring an original face image in one embodiment;
fig. 3 is a schematic flow chart of at least one second terminal generating the face image at time t+i according to the StyleGAN network model in an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it is to be understood that the terms "comprises" and/or "comprising" when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
Alternative embodiments of the application are described in detail below, however, the application may have other implementations in addition to these detailed descriptions.
As shown in fig. 1, in one embodiment, a video transmission method for an online conference is provided, comprising steps S101 to S104, described in detail as follows:
s101, a first terminal acquires an original face image at a moment t;
in some embodiments, acquiring the face image at time t specifically includes: capturing the original face image in real time as the face image at time t, or querying an existing face database for a face image to use as the face image at time t.
As shown in fig. 2, in some embodiments, capturing the face image in real time as the original face image at time t further includes the following steps:
s110, shooting an original face image in real time by using external image equipment or a first terminal;
generally, the first terminal can capture the picture directly; if higher picture accuracy is required, a professional imaging device can be used to capture a high-definition original face image. The real-time capture should at least ensure that the facial features are clearly distinguishable and free of occlusion.
S111, obtaining the foreground and background portions of the original face image with the GrabCut algorithm and separating them;
specifically, the GrabCut algorithm is an image segmentation algorithm based on graph cuts. The user supplies a bounding box marking the target position, and the algorithm separates the target from the background. Its implementation steps are as follows:
defining one or more rectangles containing the target in the original face picture, the area outside the rectangles being automatically treated as background;
within the user-defined rectangular region, using the background data to distinguish the foreground and background regions inside it;
modeling the background and foreground with a Gaussian mixture model (GMM) and labeling undefined pixels as probable foreground or probable background;
treating each pixel in the image as connected to its surrounding pixels by virtual edges, each edge carrying a probability of belonging to the foreground or the background; each pixel (i.e. a node in the graph) is additionally connected to a foreground terminal and a background terminal, and if an edge joins nodes attached to different terminals (i.e. one node belongs to the foreground and the other to the background), that edge is cut, thereby segmenting the image.
S112, taking the foreground portion as the face image at time t.
In addition, an original image captured in real time is limited by various conditions and subject to random disturbance, so the captured face often exhibits pose changes, occlusion by lighting, even overexposure and image noise. Under such complex real-world conditions, the expression transmitted from the first terminal to the other terminals is prone to error.
Appropriate preprocessing can reduce the impact of poor image quality on the recognition result. Therefore, to ensure from the source that the reference image (the original face image) has high reference value, the scheme also preprocesses the captured original face image, yielding a face image with high picture contrast and clearly visible skin and facial features.
Preprocessing the captured original face image includes: light compensation, gray-level transformation, histogram equalization, normalization, geometric correction, filtering, sharpening, face alignment, and the like.
The image-noise removal (image filtering) process is specifically: while retaining as much observable information of the original face image as possible, detect the noise present in the image and filter it out with a structure-preserving image filter, so that noise is removed efficiently, the features of the image target are retained, and the image contours and edges are not damaged.
Preferably, for noise that frequently appears in the picture, its characteristics can be found by statistical means, and a general-purpose filter, such as a mean, median, box or bilateral filter, can then be applied to remove it.
The geometric correction of the original face image is specifically: the acquired face may be somewhat deformed by imaging, acquisition angle and other causes; such deformation barely disturbs the naked eye but matters greatly to a computer. After the original face image is acquired, the pixel coordinates are transformed, changing the arrangement among pixels, and geometric transformations such as scaling, flipping, affine transformation and mapping are applied to eliminate image distortion as far as possible, so that the features of the image content can be extracted.
The face alignment of the original face image is specifically: obtain the key points of the original face image, calibrate or align the image according to those key points, and then apply an affine transformation to uniformly correct the face, eliminating errors caused by differing poses.
S102, taking the original face image at time t as the reference image and transmitting it to at least one corresponding second terminal;
in some embodiments, the first terminal communicates with at least one second terminal; this may be a multi-person video conference within the same area, or a multi-person video conference across areas.
S103, the first terminal acquires key points of the face image at time t+i and transmits them to the corresponding second terminal(s), where 1 ≤ i ≤ n;
in some embodiments, the face key points include internal key points and contour key points, specifically: the internal key points describe the shape, position and size of the eyebrows, eyes, nose and mouth; the contour key points describe the shape, position and size of the face contour and the ears.
In some embodiments, depending on the required image resolution, no more than 250 key points need to be transmitted per frame. Each key point is a two-dimensional coordinate, i.e. two floating-point numbers of 4 bytes each, so the data volume per frame is at most 250 × 2 × 4 = 2000 bytes, greatly reducing the amount of data transmitted and the transmission time.
In addition, because the number of face key points is relatively fixed, the transmitted data volume does not grow as video definition increases.
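The byte arithmetic above can be checked with a small serialization sketch (the wire format is illustrative; the patent does not specify one):

```python
import struct


def pack_keypoints(points):
    """Pack (x, y) key points as little-endian float32 pairs: 8 bytes each."""
    return b"".join(struct.pack("<ff", x, y) for x, y in points)


# 250 key points -> 250 * 2 floats * 4 bytes = 2000 bytes per frame,
# regardless of the video resolution the key points were detected in.
frame_payload = pack_keypoints([(float(i), float(i) + 0.5) for i in range(250)])
```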
S104, the second terminal determines the face image at the moment t+i according to the reference image and the key points of the face image.
Preferably, determining the face image at time t+i from the reference image and the face image key points specifically includes: combining the reference image with the face image key points using a StyleGAN network model to generate the face image at time t+i.
The StyleGAN network model can produce stochastic variation by feeding different noise into the same underlying image; the noise affects only stochastic aspects, while high-level features such as overall structure, identity and face are preserved. It can generate higher-quality, high-resolution images, achieves unsupervised separation of high-level attributes (face pose, identity) from stochastic variation (e.g. freckles, hair), and allows control of attributes at specific scales in the generated image, which guarantees the accuracy of combining the reference image with the face key points.
As shown in fig. 3, in some embodiments, the at least one second terminal generating the face image at time t+i according to the StyleGAN network model specifically includes:
S401, inputting the time-t reference image and the time-t+i face key points transmitted by the first terminal into the StyleGAN network model;
S402, extracting, through the StyleGAN network model, the face features at the same positions of the reference image and the face key points, and dividing them into global features, intermediate features and detail features;
S403, replacing the global, intermediate and detail features of the time-t reference image with the global, intermediate and detail features from the time-t+i face key points;
S404, outputting the time-t reference image with its global, intermediate and detail features replaced as the face image at time t+i.
Preferably, the global features mainly comprise facial pose, hairstyle, face shape and the like, at a resolution of 8×8; the intermediate features mainly comprise finer facial features, hairstyle details, eye opening and closing and the like, at a resolution of 32×32; the detail features mainly comprise texture and colour details of the eyes, hair, skin and the like, at a resolution of 64×64.
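The three-band replacement of S401-S404 can be sketched abstractly as StyleGAN-style per-layer mixing of style codes. This is a conceptual illustration only: the layer ranges, layer count and code dimension below are typical StyleGAN values, not taken from the patent:

```python
import numpy as np

# One style vector per generator layer; in StyleGAN-style models, coarse
# layers drive pose and face shape (~8x8), middle layers finer facial
# features (~32x32), fine layers texture and colour detail (~64x64).
COARSE, MID, FINE = range(0, 4), range(4, 8), range(8, 14)


def replace_band(w_ref: np.ndarray, w_new: np.ndarray, band: range) -> np.ndarray:
    """Copy the chosen band of per-layer styles from w_new into a copy of
    w_ref, leaving the other layers' codes untouched."""
    w = w_ref.copy()
    w[band.start:band.stop] = w_new[band.start:band.stop]
    return w


w_ref = np.zeros((14, 512))   # codes from the time-t reference image
w_new = np.ones((14, 512))    # codes derived from the t+i key points
mixed = replace_band(w_ref, w_new, COARSE)  # swap only the coarse band
```

Replacing all three bands, as in S403, would transfer pose, facial configuration and detail from the key-point-derived codes while the generator itself supplies the photorealistic rendering learned from the reference.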
In summary, in the scheme of the application, the first terminal first acquires an original face image at time t; the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal; the first terminal then acquires key points of the face image at time t+i and transmits them to the corresponding second terminal, where 1 ≤ i ≤ n; finally, the second terminal determines the face image at time t+i from the reference image and the face image key points. The method resolves the large data volume and low transmission speed of traditional video transmission methods, reduces the network bandwidth requirement, and makes video transmission more efficient and rapid.
The application also provides a video transmission system for online conference, which comprises: the device comprises a face image acquisition unit, a face image key point acquisition unit, a face image determination unit and a transmission acquisition unit;
the face image acquisition unit is used for acquiring the original face image at time t and taking it as the reference image;
the face image key point acquisition unit is used for acquiring the face image key points at time t+i;
the transmission acquisition unit is used for transmitting the original face image at time t and the face image key points at time t+i to at least one corresponding second terminal;
and the face image determination unit is used for determining the face image at time t+i from the reference image and the face image key points.
The application also proposes a readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
a first terminal acquires an original face image at time t;
the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal;
the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n;
and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points.
The application also proposes a computer device comprising a memory and a processor, the memory storing a computer program, the computer program being executed by the processor to perform the steps of:
a first terminal acquires an original face image at time t;
the original face image at time t is transmitted, as a reference image, to at least one corresponding second terminal;
the first terminal acquires key points of the face image at time t+i and transmits them to the at least one corresponding second terminal, where 1 ≤ i ≤ n;
and the at least one second terminal determines the face image at time t+i from the reference image and the face image key points.
The foregoing describes certain embodiments of the present disclosure, other embodiments being within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings do not necessarily have to be in the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, devices, non-transitory computer readable storage medium embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to portions of the description of method embodiments being relevant.
The apparatus, the device, the nonvolatile computer readable storage medium and the method provided in the embodiments of the present disclosure correspond to each other, and therefore, the apparatus, the device, and the nonvolatile computer storage medium also have similar advantageous technical effects as those of the corresponding method, and since the advantageous technical effects of the method have been described in detail above, the advantageous technical effects of the corresponding apparatus, device, and nonvolatile computer storage medium are not described herein again.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function. Of course, when this specification is implemented, the functions of the units may be realized in one or more pieces of software and/or hardware. It will be appreciated by those skilled in the art that embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this specification may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, for the system embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, refer to the corresponding portions of the method embodiment descriptions.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (9)

1. A video transmission method for an online conference, the method comprising the steps of:
acquiring, by a first terminal, a face image at time t;
transmitting the face image at time t, as a reference image, to at least one corresponding second terminal;
acquiring, by the first terminal, key points of the face image at time t+i and transmitting the key points to the at least one corresponding second terminal, wherein 1 ≤ i ≤ n;
determining, by the at least one second terminal, the face image at time t+i according to the reference image and the face image key points.
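The send-reference-once, then-keypoints-only flow of claim 1 can be sketched as a minimal simulation. All names here are hypothetical, and the reconstruction step is a stub standing in for the receiver-side generative model:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A face image, represented here as raw pixel bytes (stand-in)."""
    pixels: bytes

def reconstruct(reference: Frame, keypoints: list) -> Frame:
    """Stub for the receiver-side generator (claim 6 names StyleGAN);
    here it merely tags the reference with the key-point count."""
    return Frame(pixels=reference.pixels + bytes([len(keypoints) % 256]))

def sender_stream(reference: Frame, keypoint_frames: list):
    """First terminal: send the full reference image once (time t),
    then only key points for each subsequent time t+i."""
    yield ("image", reference)
    for kps in keypoint_frames:
        yield ("keypoints", kps)

def receiver(stream):
    """Second terminal: cache the reference, then rebuild every later
    frame from the reference plus the received key points."""
    reference = None
    frames = []
    for kind, payload in stream:
        if kind == "image":
            reference = payload
            frames.append(payload)
        else:
            frames.append(reconstruct(reference, payload))
    return frames

ref = Frame(pixels=b"\x00" * 16)
kps = [[(1.0, 2.0)] * 68, [(1.5, 2.5)] * 68]  # two frames of 68 key points each
frames = receiver(sender_stream(ref, kps))
print(len(frames))  # 3: the reference frame plus two reconstructed frames
```

Only the first message carries image data; every later frame costs a key-point list, which is the bandwidth saving the claim is built around.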
2. The video transmission method for an online conference according to claim 1, wherein acquiring the face image at time t specifically comprises: capturing an original face image in real time as the face image at time t, or querying a face image from an existing face database as the face image at time t.
3. The video transmission method for an online conference according to claim 2, wherein capturing the face image in real time as the face image at time t further comprises:
capturing an original face image in real time using an external imaging device or the first terminal;
obtaining a foreground portion and a background portion of the original face image according to the GrabCut algorithm, and separating the foreground portion from the background portion;
taking the foreground portion of the face image as the face image at time t.
4. The video transmission method for an online conference according to claim 3, wherein the face key points comprise interior key points and contour key points.
5. The video transmission method for an online conference according to claim 4, wherein the interior key points describe the shape, position, and size of the eyebrows, eyes, nose, and mouth, and the contour key points describe the shape, position, and size of the face contour and the ears.
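Under the widely used 68-point landmark annotation scheme (an assumption for illustration — the patent does not fix a point count, and that scheme covers the jaw but not the ears), indices 0–16 trace the face contour and 17–67 cover the eyebrows, eyes, nose, and mouth. The interior/contour split of claims 4–5 can then be expressed as:

```python
# dlib/iBUG 68-landmark convention, assumed here for illustration:
# indices 0-16 form the jaw/face contour, 17-67 the interior features.
CONTOUR_IDX = range(0, 17)
INTERIOR_IDX = range(17, 68)

def split_keypoints(points):
    """Partition a 68-point landmark list into contour and interior groups."""
    assert len(points) == 68, "expects the 68-point annotation scheme"
    contour = [points[i] for i in CONTOUR_IDX]
    interior = [points[i] for i in INTERIOR_IDX]
    return contour, interior

# Dummy landmarks: point k sits at coordinate (k, k).
landmarks = [(float(k), float(k)) for k in range(68)]
contour, interior = split_keypoints(landmarks)
print(len(contour), len(interior))  # 17 51
```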
6. The video transmission method for an online conference according to any one of claims 1-5, wherein determining the face image at time t+i according to the reference image and the face image key points specifically comprises: combining the reference image and the face image key points using a StyleGAN network model to generate the face image at time t+i.
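The bandwidth rationale behind claims 1 and 6 — after the one-time reference image, each frame costs only its key points — can be quantified with illustrative numbers. The frame size, key-point count, and coordinate encoding below are assumptions, not values from the patent:

```python
# Assumed parameters for illustration only.
WIDTH, HEIGHT, CHANNELS = 640, 480, 3  # raw reference frame, 8-bit RGB
N_KEYPOINTS = 68                        # e.g. the common 68-point scheme
BYTES_PER_COORD = 4                     # one float32 per x and per y coordinate

frame_bytes = WIDTH * HEIGHT * CHANNELS             # sent once, at time t
keypoint_bytes = N_KEYPOINTS * 2 * BYTES_PER_COORD  # sent for every frame t+i

print(frame_bytes)                    # 921600 bytes for the raw reference frame
print(keypoint_bytes)                 # 544 bytes per subsequent frame
print(frame_bytes // keypoint_bytes)  # each key-point payload is ~1694x smaller
```

Even against a compressed video stream the per-frame payload stays tiny, which is what lets the second terminal regenerate the frames locally instead of receiving them.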
7. A video transmission system for an online conference, the system comprising: a face image acquisition unit, a face image key point acquisition unit, a face image determination unit, and a transmission acquisition unit;
wherein the face image acquisition unit is configured to acquire an original face image at time t as a reference image;
the face image key point acquisition unit is configured to acquire face image key points at time t+i;
the transmission acquisition unit is configured to transmit the original face image at time t and the face image key points at time t+i to at least one corresponding second terminal; and
the face image determination unit is configured to determine the face image at time t+i according to the reference image and the face image key points.
8. A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 6.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method as claimed in any one of claims 1 to 6.
CN202310848320.1A 2023-07-11 2023-07-11 Video transmission method, system, storage medium and device for online conference Pending CN117041231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310848320.1A CN117041231A (en) 2023-07-11 2023-07-11 Video transmission method, system, storage medium and device for online conference


Publications (1)

Publication Number Publication Date
CN117041231A true CN117041231A (en) 2023-11-10

Family

ID=88625247


Country Status (1)

Country Link
CN (1) CN117041231A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607554A (en) * 2013-10-21 2014-02-26 无锡易视腾科技有限公司 Fully-automatic face seamless synthesis-based video synthesis method
CN108133483A (en) * 2017-12-22 2018-06-08 辽宁师范大学 Red tide method of real-time based on computer vision technique
CN113033442A (en) * 2021-03-31 2021-06-25 清华大学 StyleGAN-based high-freedom face driving method and device
CN113886644A (en) * 2021-09-30 2022-01-04 深圳追一科技有限公司 Digital human video generation method and device, electronic equipment and storage medium
CN114373043A (en) * 2021-12-16 2022-04-19 聚好看科技股份有限公司 Head three-dimensional reconstruction method and equipment
WO2022078066A1 (en) * 2020-10-13 2022-04-21 北京字节跳动网络技术有限公司 Video processing method and system, terminal, and storage medium
CN114841851A (en) * 2022-03-28 2022-08-02 北京达佳互联信息技术有限公司 Image generation method, image generation device, electronic equipment and storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
三分明月落: "From GMM to GrabCut", Retrieved from the Internet <URL:https://blog.csdn.net/qq_40755643/article/details/89480003> *
蒲亮; 冯子亮: "Video segmentation algorithm based on long-edge detection", Optoelectronics · Laser *
言有三: "[100 Battles of GAN] StyleGAN principles explained, with hands-on face image generation code", Retrieved from the Internet <URL:https://blog.csdn.net/qq_40755643/article/details/89480003> *
辛月兰: "GrabCut color image segmentation based on superpixels", Computer Technology and Development, no. 07, 8 April 2013 (2013-04-08) *

Similar Documents

Publication Publication Date Title
CN106778928B (en) Image processing method and device
CN109952594B (en) Image processing method, device, terminal and storage medium
CN109558864A (en) Face critical point detection method, apparatus and storage medium
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
WO2022179401A1 (en) Image processing method and apparatus, computer device, storage medium, and program product
CN112308866B (en) Image processing method, device, electronic equipment and storage medium
CN112348747A (en) Image enhancement method, device and storage medium
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN109005367B (en) High dynamic range image generation method, mobile terminal and storage medium
CN111383232A (en) Matting method, matting device, terminal equipment and computer-readable storage medium
WO2022135574A1 (en) Skin color detection method and apparatus, and mobile terminal and storage medium
Chen et al. Face swapping: realistic image synthesis based on facial landmarks alignment
CN110853071A (en) Image editing method and terminal equipment
CN111310724A (en) In-vivo detection method and device based on deep learning, storage medium and equipment
CN111860380A (en) Face image generation method, device, server and storage medium
CN111145086A (en) Image processing method and device and electronic equipment
CN112766028A (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN116612263B (en) Method and device for sensing consistency dynamic fitting of latent vision synthesis
CN113658065A (en) Image noise reduction method and device, computer readable medium and electronic equipment
CN112597911A (en) Buffing processing method and device, mobile terminal and storage medium
CN114862729A (en) Image processing method, image processing device, computer equipment and storage medium
CN117041231A (en) Video transmission method, system, storage medium and device for online conference
CN116012418A (en) Multi-target tracking method and device
CN111652792A (en) Image local processing method, image live broadcasting method, image local processing device, image live broadcasting equipment and storage medium
CN111161299A (en) Image segmentation method, computer program, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination