WO2023030321A1 - Line-of-sight angle adjustment method and apparatus, electronic device, and storage medium - Google Patents

Line-of-sight angle adjustment method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023030321A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
sight angle
model
line
Prior art date
Application number
PCT/CN2022/115862
Other languages
English (en)
French (fr)
Inventor
李冰川
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023030321A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013: Eye tracking input arrangements

Definitions

  • the present disclosure relates to the field of computer technology, for example, to a line-of-sight angle adjustment method and apparatus, an electronic device, and a storage medium.
  • the text to be broadcast can be displayed through the teleprompter, so that the anchor user can interact with other users based on the content on the teleprompter.
  • the present disclosure provides a line-of-sight angle adjustment method and apparatus, an electronic device, and a storage medium, so that when it is determined from the collected images that the user's line-of-sight angle is inconsistent with the target line-of-sight angle, the angle is adjusted to the target line-of-sight angle to obtain a target facial image, and the target facial image is sent to at least one client, thereby improving interaction efficiency.
  • the present disclosure provides a line-of-sight angle adjustment method, the method comprising:
  • collecting a facial image to be processed of a target user; processing the facial image to be processed based on a target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to a target angle; and displaying the target facial image to at least one client.
  • the present disclosure also provides a line-of-sight angle adjustment device, which includes:
  • an image acquisition module, configured to collect a facial image to be processed of a target user;
  • an image processing module, configured to process the facial image to be processed based on a target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to a target angle;
  • the image display module is configured to display the target facial image to at least one client.
  • the present disclosure also provides an electronic device, the electronic device comprising:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the line-of-sight angle adjustment method described above.
  • the present disclosure also provides a storage medium containing computer-executable instructions, the computer-executable instructions are used to perform the above-mentioned line-of-sight angle adjustment method when executed by a computer processor.
  • the present disclosure also provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the line-of-sight angle adjustment method described above.
  • FIG. 1 is a schematic flowchart of a method for adjusting a line of sight angle provided by Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic diagram of a result of line-of-sight angle adjustment provided by Embodiment 1 of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for adjusting a line of sight angle provided by Embodiment 2 of the present disclosure
  • FIG. 4 is a schematic flowchart of a method for adjusting a line of sight angle provided by Embodiment 3 of the present disclosure
  • FIG. 5 is a schematic structural diagram of a line-of-sight angle adjustment device provided by Embodiment 4 of the present disclosure
  • FIG. 6 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present disclosure.
  • the steps recorded in the method embodiments of the present disclosure may be executed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 is a schematic flow diagram of a method for adjusting a line of sight angle provided by Embodiment 1 of the present disclosure.
  • this embodiment of the present disclosure can be applied in real-time interactive application scenarios supported by the Internet, and can also be applied in non-real-time interactive scenarios, in situations where the target user's line of sight is to be focused to a particular angle.
  • the method can be performed by a line-of-sight angle adjustment apparatus, which can be implemented in the form of software and/or hardware, for example, by an electronic device, and the electronic device can be a mobile terminal, a personal computer (Personal Computer, PC), or a server.
  • the real-time interactive application scenario can usually be implemented by the cooperation of the client and the server.
  • the method provided in this embodiment can be executed by the client, the server, or both.
  • the method provided by the embodiments of the present disclosure can be integrated in any application program or mobile terminal. If the method is integrated in an application program, the integrated line-of-sight angle adjustment method can be loaded automatically when the application program starts, and when the user's facial image information is obtained, the facial image can be processed based on the line-of-sight angle adjustment method. If the method is integrated on a terminal device, the line-of-sight focusing method can run as a background method, and when facial image information is collected, the facial image can be processed based on the line-of-sight angle adjustment method. That is to say, as long as facial image information is collected, the technical solution of the present disclosure can be used to focus the line of sight.
  • the user can also set whether to call the line-of-sight angle adjustment method according to actual needs.
  • the user can manually set whether to invoke the line-of-sight angle adjustment method. If the user manually enables it, the line of sight in the facial image can be focused to the target angle when the facial image is collected; if the user does not manually enable it, no processing needs to be performed on the line of sight in the facial image when it is collected.
  • the technical solutions provided by the embodiments of the present disclosure can be applied in real-time interactive scenarios, for example, live broadcast, video conferencing, and the like.
  • the anchor user can interact with other users through the terminal device.
  • in the process of interaction, when the multimedia data stream of the target user is delivered to other clients, the line-of-sight angle in the facial image to be processed corresponding to the target user can be adjusted to the target line-of-sight angle to obtain the target facial image, so that other users can see, through their clients, the target user with the line-of-sight angle adjusted to the target angle.
  • for example, the anchor user's line-of-sight angle can be adjusted based on this technical solution, so that other users see a target user whose line-of-sight angle is always at the target angle. If applied in a non-real-time interactive scenario, for example, when a camera is used to photograph a user, the photographed user's line-of-sight angle can be adjusted to the target angle based on this technical solution.
  • the method includes:
  • the line-of-sight angle adjustment method may be integrated in the terminal, or an application program installed on the terminal may integrate the line-of-sight angle adjustment method.
  • the camera on the terminal may be called to capture the user's facial image, and the user corresponding to the facial image is used as the target user.
  • the captured facial image is used as the facial image to be processed.
  • for example, after user B triggers target application A and enters the main page, user B can trigger the shooting control on the main page.
  • at this point, the camera device can be called to capture a facial image including user B, and this facial image is used as the facial image to be processed; correspondingly, user B is the target user.
  • collecting the to-be-processed facial image of the target user includes: collecting the to-be-processed facial image of the target user among the at least one user when at least one user interacts based on the real-time interactive interface.
  • the real-time interactive interface is any interactive interface in the real-time interactive application scenario.
  • the real-time interactive scene can be realized through the Internet and computer means, for example, an interactive application program realized through a native program or a web program.
  • the real-time interactive application scenarios can be live broadcast scenarios, video conference scenarios, voice broadcast scenarios, and recorded and broadcast video scenarios.
  • the live broadcast scenario can include live sales broadcasts within an application, as well as live broadcasts based on a live broadcast platform; the voice broadcast scenario can be one in which an anchor at a TV station broadcasts content, and the multimedia data stream of the broadcast is sent to at least one client based on a camera.
  • facial images to be processed of the target user may be collected periodically. In order to improve the processing accuracy of the facial image, it is also possible to choose to collect the facial image to be processed of the target user in real time.
  • the facial image to be processed of the target user may be collected periodically or in real time, and then the facial image to be processed may be processed to obtain a target facial image corresponding to the facial image to be processed.
  • the real-time interactive interface is generated based on a scene broadcast by an online video.
  • the video broadcast includes the anchor and the viewers who watch the anchor's broadcast.
  • the camera device can capture the face image of the anchor in real time or every few seconds, such as 5 seconds, to obtain the facial image to be processed.
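  • purely as an illustrative sketch of this collection step (OpenCV's VideoCapture is assumed as the camera interface, and the capture period and the process_frame callback are hypothetical names, not part of the disclosure):

```python
import time

import cv2  # assumed camera interface; any capture API would do

CAPTURE_PERIOD_S = 5  # periodic capture to save resources; use 0 for (near) real time

def capture_faces(process_frame, period_s=CAPTURE_PERIOD_S):
    """Grab frames from the default camera and hand each one to process_frame,
    which is expected to run the gaze check/adjustment described in this text."""
    cam = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            process_frame(frame)  # e.g. gaze feature check + adjustment
            if period_s:
                time.sleep(period_s)
    finally:
        cam.release()
```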
  • there can be one or more anchor users in a real-time interactive scene.
  • for example, when there are two anchor users, one anchor mainly faces the viewing users, while the other anchor mainly plays a coordinating role, so the coordinating anchor's line of sight may not be the focus of the users' attention.
  • in this case, the line-of-sight angle adjustment can mainly be applied to the anchor user who faces the viewing users.
  • the main anchor user and the secondary anchor user can be set in advance, and when the facial images including the two anchor users are captured, the sight angle can only be adjusted for the main anchor user;
  • if the anchor user is not included in the video, it is not necessary to adjust the line-of-sight angle in the collected facial images.
  • alternatively, all the anchor users can be used as target users; in this case, as long as the facial images to be processed of the target users are collected, the target users' line-of-sight angles can be adjusted.
  • collecting the facial image to be processed of the target user among the at least one user includes: when at least one user interacts based on the real-time interactive interface, determining the current speaking user and taking the speaking user as the target user; and collecting the facial image to be processed of the target user based on the camera module.
  • in a video conference scenario, each participating user can be used as a target user; in order to improve the fun and watchability of live video broadcasts, multiple anchors may connect with each other via microphone-linking ("lian mai"), and in this case the connected users are the target users.
  • in this case, this technical solution can be used to collect the facial image of each target user, adjust the line-of-sight angle in the facial images, and then send the focused target facial images in the form of a multimedia data stream to the client of at least one user, so that the viewing users see target users whose lines of sight have been adjusted to the target angle.
  • the speaking user may be determined in real time, and the speaking user may be used as a target user.
  • the facial image collected by the camera device corresponding to the target user may be used as the facial image to be processed.
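  • the disclosure does not specify how the current speaking user is determined; a simple assumed heuristic, sketched below, is to compare the audio energy of each participant's latest audio chunk (the function name and threshold are hypothetical):

```python
from typing import Dict, Optional

import numpy as np

SPEECH_RMS_THRESHOLD = 0.02  # hypothetical threshold on normalized audio energy

def pick_speaking_user(audio_chunks: Dict[str, np.ndarray]) -> Optional[str]:
    """Pick the participant whose latest audio chunk has the highest RMS
    energy, provided it exceeds a speech threshold; returns None if all
    participants are effectively silent."""
    best_user, best_energy = None, 0.0
    for user, chunk in audio_chunks.items():
        energy = float(np.sqrt(np.mean(np.square(chunk))))
        if energy > best_energy:
            best_user, best_energy = user, energy
    return best_user if best_energy > SPEECH_RMS_THRESHOLD else None
```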
  • collecting the facial image to be processed of the target user may be: when a preset trigger event is detected, collecting the facial image to be processed of the target user based on the camera module.
  • the preset event may be that a wake-up word is triggered, that a line-of-sight adjustment control is triggered, or that a user is detected in front of the display screen; in any of these cases, it can be determined that the preset event is triggered.
  • this indicates that the target user has activated the line-of-sight angle adjustment model, and the facial image collected by the camera device can be used as the facial image to be processed.
  • S120 Process the facial image to be processed based on the target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed.
  • when the user's facial image is captured by the camera device, the facial image usually includes the user's facial features; in this embodiment, more attention is paid to the user's line-of-sight features.
  • because of where the camera device is installed on the terminal, the captured facial image will have a certain line-of-sight angle deviation, so the user's line-of-sight angle in the facial image captured by the camera device may differ from the target angle.
  • for example, when the application is used in a voice broadcast scenario, the user reads the content on the teleprompter for broadcasting; when the user looks at the teleprompter content, a line-of-sight angle deviation occurs, resulting in a poor viewing experience for users watching the broadcast.
  • the target gaze angle adjustment model is a pre-trained model for adjusting the user's gaze angle in facial images to a target angle.
  • the target facial image is an image obtained after adjusting the gaze angle in the facial image to be processed to the target angle through the target gaze angle adjustment model. That is, the user's sight angle in the target facial image is adjusted to a preset target angle.
  • the target angle may be an angle at which the user's line of sight is perpendicular to the display screen, that is, an angle at which the user's line of sight looks squarely at the display screen.
  • the target angle can be any preset angle.
  • the target angle may be an angle at which the sight line of the target user and the camera device are on a horizontal line.
  • the facial image to be processed can be input into the target line-of-sight angle adjustment model for line-of-sight angle adjustment, so that the line-of-sight angle in the facial image to be processed is adjusted to the target angle; that is, a non-front-view line-of-sight angle (or a front-view one) is adjusted to the target angle.
  • the user's line of sight angle in the facial image to be processed may be consistent with the target angle, or may not be consistent with the target angle.
  • therefore, after the facial image to be processed is obtained, it can first be determined whether the line-of-sight angle in the facial image to be processed is consistent with the target angle.
  • based on a feature detection module, it is determined whether the line-of-sight feature in the facial image to be processed matches a preset line-of-sight feature; if the line-of-sight feature in the facial image to be processed does not match the preset line-of-sight feature, the facial image to be processed is processed based on the target line-of-sight angle adjustment model to obtain the target facial image.
  • the feature detection module is used to detect the features of the user's line of sight, and is mainly used to determine whether the user's line of sight angle is consistent with the target angle.
  • a preset line-of-sight feature is a feature that matches the target angle.
  • the preset line-of-sight features may be features such as eyelids and pupils, for example, whether the pupils are in the center of the eyes or not.
  • the facial image to be processed can be processed based on the feature detection module to determine whether the line-of-sight feature in the facial image to be processed matches the preset line-of-sight feature; if the line-of-sight feature in the facial image to be processed is inconsistent with the preset line-of-sight feature, it means that the target user's line-of-sight angle is inconsistent with the target angle.
  • the face image to be processed can be processed based on the target line-of-sight angle adjustment model.
  • the target gaze angle adjustment model is a model that adjusts the user's gaze angle in the facial image to be processed to the target angle, so the output image based on the target gaze angle adjustment model is the target facial image consistent with the target angle.
  • in the target facial image, only the user's line-of-sight feature differs from that in the facial image to be processed; all other facial features are exactly the same.
  • the processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image includes: inputting the facial image to be processed into the target line-of-sight angle adjustment model to obtain the target facial image; where the line-of-sight angle in the target facial image is different from the line-of-sight angle in the image to be processed.
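  • as a sketch of this gating and adjustment flow (the disclosure does not fix the detector or the model; gaze_matches_target, the pupil-offset threshold, and the model callable below are assumptions for illustration):

```python
import numpy as np

PUPIL_OFFSET_THRESHOLD = 0.08  # hypothetical tolerance, as a fraction of eye width

def gaze_matches_target(eye_corners: np.ndarray, pupil: np.ndarray) -> bool:
    """Stand-in for the feature detection module: checks whether the pupil
    lies near the geometric center of the eye, the kind of pupil-position
    feature mentioned above. eye_corners is a (2, 2) array of the two eye
    corner coordinates; pupil is the (x, y) pupil center."""
    center = eye_corners.mean(axis=0)
    eye_width = float(np.linalg.norm(eye_corners[0] - eye_corners[1]))
    offset = float(np.linalg.norm(pupil - center)) / eye_width
    return offset < PUPIL_OFFSET_THRESHOLD

def adjust_if_needed(face_image, eye_corners, pupil, model):
    """Run the gaze-angle adjustment model only when the line-of-sight
    feature does not match the preset feature; otherwise pass through."""
    if gaze_matches_target(eye_corners, pupil):
        return face_image  # line of sight already consistent with the target angle
    return model(face_image)  # target facial image, gaze adjusted to target angle
```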
  • the number of at least one client can be one or more.
  • the client may be a client to which the target user belongs, or a client associated with the target user.
  • for example, if the application scenario is a live broadcast scenario, the facial image to be processed is the facial image of the live broadcast user, and the target facial image may be an image obtained after the line-of-sight angle in the facial image to be processed is adjusted to the target line-of-sight angle.
  • the client can be the client of each viewing user watching the live broadcast, that is, after determining the target facial image corresponding to the anchor, the target facial image can be sent to each user watching the live broadcast in the form of a data stream, and at the same time , may also present the target facial image on the target client to which the target user belongs.
  • the displaying the target facial image to at least one client includes: sending the multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
  • the multimedia data stream corresponding to the target facial image is sent to the clients of other associated users, so that other users see the target user who is looking straight ahead, Thereby, the effect of interacting with the target user is improved.
  • the facial image to be processed is a non-frontal image.
  • the face image to be processed can be input into the pre-trained target gaze angle adjustment model to obtain the front view image as shown in FIG. 2 , and the gaze angle in the front view image is consistent with the target gaze angle.
  • in the technical solution of this embodiment, the facial image to be processed is processed based on the pre-trained target line-of-sight angle adjustment model, so that the user's line of sight in the facial image to be processed is focused to the target line-of-sight angle, and the target facial image focused to the target line-of-sight angle is displayed to other clients. This solves the problem in related technologies of poor interaction caused by line-of-sight deviation or an unfocused line of sight during voice-based broadcasting, and achieves the effect that, when the target user interacts with other users through the terminal, the user's line of sight can be automatically focused to the target line-of-sight angle, thereby improving the efficiency of the target user's interaction with other users.
  • Fig. 3 is a schematic flow chart of a method for adjusting a line of sight angle provided by Embodiment 2 of the present disclosure.
  • on the basis of the foregoing embodiment, the target line-of-sight angle adjustment model can be obtained through training; technical terms that are the same as or corresponding to those in the above embodiment will not be repeated here.
  • the method includes:
  • the training sample set includes a plurality of training samples, and each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model.
  • the line-of-sight angle of the user in the target line-of-sight angle image is consistent with the preset line-of-sight angle.
  • the non-target line-of-sight angle image is a facial image in which the user's line-of-sight is inconsistent with the target line-of-sight angle.
  • the target sample generation model can be understood as a model that generates training samples.
  • a target sample generation model can be obtained by training first.
  • the target sample generation model includes a positive sample generation sub-model and a negative sample generation sub-model.
  • the positive sample generation sub-model is used to generate the target line-of-sight angle image in the training sample.
  • the user's line-of-sight angle in the target line-of-sight angle image is consistent with the target line-of-sight angle.
  • the negative sample generation sub-model is used to generate the non-target gaze angle image in the training sample, and the user's gaze angle in the non-target gaze angle image is inconsistent with the target gaze angle.
  • the line-of-sight angle adjustment model to be trained can be trained according to each training sample in the training sample set, so as to obtain the target line-of-sight angle adjustment model.
  • each non-target line-of-sight angle image in the training samples can be used as the input of the line-of-sight angle adjustment model to be trained, and the target line-of-sight angle image corresponding to that non-target line-of-sight angle image is compared with the output of the model, so as to train the model parameters of the line-of-sight angle adjustment model to be trained.
  • when it is detected that the loss function of the line-of-sight angle adjustment model to be trained converges, it is determined that training of the target line-of-sight angle adjustment model is complete.
  • S230: Determine a loss value according to the actual output image of the current training sample and the target line-of-sight angle image, and adjust the model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and the preset loss function of the line-of-sight angle adjustment model to be trained.
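  • a minimal sketch of this training step (the disclosure does not fix the network architecture or the loss; the PyTorch modules, L1 reconstruction loss, optimizer, and convergence test below are assumptions):

```python
import torch
import torch.nn as nn

def train_gaze_adjustment(model: nn.Module, sample_pairs, epochs: int = 10,
                          lr: float = 1e-4, tol: float = 1e-4):
    """sample_pairs is a list of (non_target_image, target_image) tensor
    pairs, i.e. the negative/positive pair from each training sample."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # assumed preset loss; the text only requires convergence
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for non_target, target in sample_pairs:
            output = model(non_target)        # actual output image
            loss = loss_fn(output, target)    # compared with target gaze-angle image
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if abs(prev - total) < tol:           # crude stand-in for loss convergence
            break
        prev = total
    return model
```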
  • S250 Collect the to-be-processed facial image of the target user.
  • S260 Process the facial image to be processed based on the target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed.
  • the target line-of-sight angle adjustment model is used to adjust the line-of-sight angle of the user in the facial image to the target angle;
  • each collected facial image to be processed can be processed, and the obtained target facial image can be sent to other clients in the form of a multimedia data stream.
  • in this way, processing of the captured video is more flexible, and each viewing user can see an image in which the line of sight is always focused to the target line-of-sight angle, improving the user's viewing experience.
  • in the technical solution of this embodiment, before the facial image to be processed is processed based on the target line-of-sight angle adjustment model, the model can first be trained, so that the facial image to be processed collected by the camera device can be processed using the target line-of-sight angle adjustment model to obtain a target facial image with a focused line of sight, and the target facial image can be sent to at least one client; in this way, each user watches the image of the target user after line-of-sight focusing, and a more interactive video stream is obtained.
  • Fig. 4 is a schematic flow chart of a method for adjusting a line of sight angle provided by Embodiment 3 of the present disclosure.
  • on the basis of the foregoing embodiments, the corresponding training samples can be generated based on the target sample generation model; correspondingly, before the training samples are obtained, the target sample generation model can be trained first.
  • technical terms that are the same as or corresponding to those in the foregoing embodiments will not be repeated here.
  • the method includes:
  • the pre-collected Gaussian distribution vector and original non-front-view sample image are input into the non-target line-of-sight angle image generation sub-model to be trained, and an error value is obtained; based on the error value and the loss function of the non-target line-of-sight angle image generation sub-model to be trained, the model parameters of the sub-model are corrected; convergence of the loss function is used as the training objective to obtain the non-target line-of-sight angle image generation sub-model, and the non-target line-of-sight angle images in the training samples are generated based on this sub-model.
  • inputting the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain the error value includes:
  • processing the Gaussian distribution vector based on the generator in the non-target line-of-sight angle image generation sub-model to be trained to obtain an image to be compared; and processing the original non-front-view sample image and the image to be compared based on the discriminator in the sub-model to obtain the error value.
  • a Gaussian distribution vector can be randomly sampled noise.
  • a facial image of the user can be collected while the user is not looking straight ahead, so as to obtain the original non-front-view sample image.
  • the model parameters in the non-target line-of-sight angle image generation sub-model to be trained are default parameter values.
  • the Gaussian distribution vector and the original non-front-view sample image can be used as the input of the non-target line-of-sight angle image generation sub-model to be trained to obtain the actual output result, that is, the actual output image; according to the actual output image and the original non-front-view sample image, the error value can be obtained.
  • based on the error value and the loss function, the model parameters of the sub-model can be corrected.
  • the convergence of the loss function can be used as the training goal to obtain the non-target line-of-sight angle image generation sub-model.
  • the adversarial training can be as follows: the non-target line-of-sight angle image generation sub-model includes a generator and a discriminator.
  • the generator is used to process the Gaussian distribution vector to generate the corresponding image.
  • the discriminator is used to determine the similarity between the generated image and the original image, so as to adjust the model parameters in the generator and the discriminator according to the error until the training of the non-target line-of-sight angle image generation sub-model is completed.
  • the generator in the non-target line-of-sight angle image generation sub-model processes the Gaussian distribution vector to obtain the image to be compared corresponding to the Gaussian distribution vector.
  • the image to be compared and the original non-front-view sample image can be input into the discriminator, and the discriminator performs discriminative processing on the two images to obtain an output result.
  • the model parameters in the generator and discriminator can be modified.
  • when the training is completed, the obtained model can be used as the non-target line-of-sight angle image generation sub-model.
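  • a minimal sketch of one such adversarial update, assuming PyTorch modules for the generator and discriminator and logit outputs from the discriminator (all assumptions, since the disclosure does not fix the architecture or loss):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def adversarial_step(generator: nn.Module, discriminator: nn.Module,
                     g_opt: torch.optim.Optimizer, d_opt: torch.optim.Optimizer,
                     real_images: torch.Tensor, z_dim: int = 128):
    """One assumed GAN-style update: the generator maps a Gaussian vector to
    an 'image to be compared', and the discriminator scores it against the
    original non-front-view samples."""
    batch = real_images.size(0)
    z = torch.randn(batch, z_dim)  # Gaussian distribution vector (sampled noise)

    # Discriminator step: real non-front-view samples vs. generated images.
    fake = generator(z).detach()
    d_loss = (bce(discriminator(real_images), torch.ones(batch, 1))
              + bce(discriminator(fake), torch.zeros(batch, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: push the discriminator toward accepting generated images.
    g_loss = bce(discriminator(generator(z)), torch.ones(batch, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()  # error values driving the corrections
```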
  • after the non-target line-of-sight angle image generation sub-model is obtained, the target line-of-sight angle image generation sub-model can be trained. For example, the model parameters of the non-target line-of-sight angle image generation sub-model are obtained and multiplexed into the target line-of-sight angle image generation sub-model to be trained; the target line-of-sight angle image generation sub-model to be trained is then trained to obtain the target line-of-sight angle image generation sub-model.
  • the target line-of-sight angle image generation sub-model to be trained is also trained in an adversarial manner; that is, this sub-model also includes a generator and a discriminator.
  • the functions of the generator and the discriminator on the above sub-models are the same, and the training method to obtain the sub-model of the target line-of-sight angle image generation is the same as the method of obtaining the non-target line-of-sight angle image generation sub-model, and will not be repeated here.
  • the model parameters of the non-target line-of-sight angle image generation sub-model can be reused as the initial model parameters for training the target line-of-sight angle image generation sub-model.
  • the target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model, taken as a whole, can be used as the target sample generation model. It is also possible to package the two sub-models together, so that two images are output for a given input, with the user's line-of-sight angles differing between the two images.
  • a general problem in training models is that a large number of samples need to be collected, and sample collection is difficult to a certain extent. For example, this embodiment requires a large number of images of users at the target line-of-sight angle and at non-target line-of-sight angles, and such samples are difficult to collect and inconsistent in standard. Based on this technical solution, randomly sampled noise can be directly processed to obtain images of the same user at different line-of-sight angles, thereby obtaining the corresponding samples, which improves the convenience and generality of determining samples and further improves the convenience of training the model.
  • multiple Gaussian distribution vectors are processed sequentially to obtain target line-of-sight angle images and non-target line-of-sight angle images in the training samples.
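  • a sketch of the parameter multiplexing and of paired sample generation, assuming both sub-models are PyTorch modules with a shared architecture (the function names and latent size are hypothetical):

```python
import torch

def init_target_from_non_target(non_target_gen: torch.nn.Module,
                                target_gen: torch.nn.Module) -> None:
    """Multiplex the trained non-target sub-model's parameters into the target
    sub-model to be trained (assumes both generators share one architecture)."""
    target_gen.load_state_dict(non_target_gen.state_dict())

def build_training_pairs(non_target_gen, target_gen, num_samples, z_dim=128):
    """Feed the same Gaussian vector to both packaged sub-models, so that each
    draw yields a (non-target, target) gaze-angle image pair, ideally of the
    same synthetic user at two different line-of-sight angles."""
    pairs = []
    with torch.no_grad():
        for _ in range(num_samples):
            z = torch.randn(1, z_dim)  # randomly sampled noise
            pairs.append((non_target_gen(z), target_gen(z)))
    return pairs
```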
  • S350 Collect the to-be-processed facial image of the target user.
  • S360 Process the to-be-processed facial image based on the target line-of-sight angle adjustment model to obtain a target facial image corresponding to the to-be-processed facial image.
  • the target line-of-sight angle adjustment model is used to adjust the line-of-sight angle of the user in the facial image to the target angle;
  • in the technical solution of this embodiment, the target sample generation model obtained through pre-training can process randomly sampled noise to obtain a large number of training samples for training the target line-of-sight angle adjustment model, which improves the convenience and uniformity of obtaining training samples.
  • FIG. 5 is a schematic structural diagram of a line-of-sight angle adjustment device provided by Embodiment 4 of the present disclosure.
  • the device includes: an image acquisition module 410 , an image processing module 420 and an image display module 430 .
  • the image acquisition module 410 is configured to collect the facial image to be processed of the target user;
  • the image processing module 420 is configured to process the facial image to be processed based on the target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed.
  • the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in the facial image to the target line-of-sight angle;
  • the image display module 430 is configured to display the target facial image to at least one client.
  • the image acquisition module 410 is configured to collect the facial image to be processed of the target user among the at least one user when at least one user interacts based on the real-time interactive interface; or, when it is detected that a preset event is triggered, to collect the facial image to be processed of the target user based on the camera module.
  • the image acquisition module 410 is configured to determine the current speaking user and use the current speaking user as the target user when at least one user interacts based on the real-time interactive interface, and to collect, based on the camera module, the facial image to be processed of the target user.
  • the real-time interactive interface includes a voice broadcast interactive interface, a live video interactive interface or a group chat interactive interface.
  • the image processing module 420 is configured to determine whether the line of sight feature in the facial image to be processed matches the preset line of sight feature based on the feature detection module; if the line of sight in the facial image to be processed If the features do not match the preset line-of-sight features, the face image to be processed is processed based on the target line-of-sight angle adjustment model to obtain the target face image.
  • the image processing module 420 is configured to input the facial image to be processed into the target line-of-sight angle adjustment model to obtain the target facial image, where the line-of-sight angle in the target facial image is different from the line-of-sight angle in the image to be processed.
  • the image display module 430 is configured to send the multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
  • the device also includes a model training module, configured to: obtain a training sample set, where the training sample set includes a plurality of training samples, each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model; for each training sample, input the non-target line-of-sight angle image in the current training sample into the line-of-sight angle adjustment model to be trained to obtain an actual output image corresponding to the current training sample; determine a loss value according to the actual output image of the current training sample and the target line-of-sight angle image, and adjust the model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and the preset loss function of the model; and use convergence of the preset loss function of the line-of-sight angle adjustment model to be trained as the training objective to obtain the target line-of-sight angle adjustment model.
  • the device also includes a sample model generation module, configured to obtain the non-target line-of-sight angle image generation sub-model in the target sample generation model through training in the following manner: input the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain an error value; correct the model parameters of the non-target line-of-sight angle image generation sub-model to be trained based on the error value and the loss function of the sub-model; and use convergence of the loss function of the non-target line-of-sight angle image generation sub-model to be trained as the training objective to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on this sub-model.
  • the sample model generation module is configured to input the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain the error value in the following manner:
  • processing the Gaussian distribution vector based on the generator in the non-target line-of-sight angle image generation sub-model to be trained to obtain an image to be compared;
  • processing the original non-front-view sample image and the image to be compared based on the discriminator in the sub-model to obtain the error value.
  • the sample model generation module is also configured to obtain the target line-of-sight angle image generation sub-model in the target sample generation model through training in the following manner: obtain the model parameters of the non-target line-of-sight angle image generation sub-model and multiplex the model parameters into the target line-of-sight angle image generation sub-model to be trained; and train the target line-of-sight angle image generation sub-model to be trained based on the pre-collected Gaussian distribution vector and the original front-view sample image to obtain the target line-of-sight angle image generation sub-model, so as to generate the target line-of-sight angle images in the training samples based on this sub-model.
  • in the technical solution of this embodiment, the facial image to be processed is processed based on the pre-trained target line-of-sight angle adjustment model, so that the user's line of sight in the facial image to be processed is focused to the target line-of-sight angle, and the target facial image focused to the target line-of-sight angle is displayed to other clients. This solves the problem in related technologies that when a user performs a voice broadcast based on a teleprompter, line-of-sight deviation or an unfocused line of sight leads to a poor interaction effect, and achieves the result that, when the target user interacts with other users through the terminal, the user's line of sight can be automatically focused to the target line-of-sight angle. The target line-of-sight angle can be the angle perpendicular to the camera device of the user's terminal, thereby improving the efficiency of interaction between the target user and other users.
  • the line-of-sight angle adjustment device provided by the embodiments of the present disclosure can execute the line-of-sight angle adjustment method provided by any embodiment of the present disclosure, and has corresponding functional modules and effects for executing the method.
  • the multiple units and modules included in the above device are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of the embodiments of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by Embodiment 5 of the present disclosure.
  • the terminal device 500 in the embodiment of the present disclosure may include but not limited to mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA), tablet computers (PAD), portable multimedia players (Portable Media Player, PMP), mobile terminals such as vehicle-mounted terminals (eg, vehicle-mounted navigation terminals), and fixed terminals such as digital televisions (Television, TV), desktop computers, and the like.
  • the electronic device 500 shown in FIG. 6 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 500 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 501, which may execute various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 502 or a program loaded from a storage device 508 into a random access memory (Random Access Memory, RAM) 503.
  • in the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • an input/output (Input/Output, I/O) interface 505 is also connected to the bus 504.
  • the following devices may be connected to the I/O interface 505: an input device 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 507 including, for example, a liquid crystal display (Liquid Crystal Display, LCD), a speaker, a vibrator, etc.; a storage device 508 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 509. Although FIG. 6 shows the electronic device 500 with various devices, it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502.
  • when the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the electronic device provided by this embodiment of the present disclosure is based on the same concept as the line-of-sight angle adjustment method provided by the above embodiments.
  • technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
  • Embodiment 6 of the present disclosure provides a computer storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for adjusting the viewing angle provided in the foregoing embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM, or flash memory), an optical fiber, a portable compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol, such as the Hypertext Transfer Protocol (HyperText Transfer Protocol, HTTP), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • examples of communication networks include local area networks (Local Area Network, LAN), wide area networks (Wide Area Network, WAN), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: collects the facial image to be processed of the target user; processes the facial image to be processed based on the target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in the facial image to a target angle; and displays the target facial image to at least one client.
  • computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (for example via the Internet using an Internet Service Provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware.
  • the name of the unit does not constitute a limitation of the unit itself in one case, for example, the image display module can also be described as "a module for displaying the target facial image to at least one client".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (Field Programmable Gate Arrays, FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (Application Specific Standard Parts, ASSP), System on Chip (System on Chip, SOC), Complex Programmable Logic Device (Complex Programmable Logic Device, CPLD) and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • Example 1 provides a line of sight angle adjustment method, the method including:
  • collecting a facial image to be processed of a target user; processing the facial image to be processed based on a target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to a target angle; and displaying the target facial image to at least one client.
  • Example 2 provides a line of sight angle adjustment method, the method including:
  • the collecting the facial image to be processed of the target user includes: when at least one user interacts based on a real-time interactive interface, collecting the facial image to be processed of the target user among the at least one user; or, when it is detected that a preset event is triggered, collecting the facial image to be processed of the target user based on the camera module.
  • Example 3 provides a line of sight angle adjustment method, which includes:
  • collecting the to-be-processed facial image of the target user among the at least one user includes:
  • when the at least one user interacts based on the real-time interactive interface, determining the current speaking user and using the current speaking user as the target user;
  • the to-be-processed facial image of the target user is collected based on the camera module.
  • Example 4 provides a line of sight angle adjustment method, the method including:
  • the real-time interactive interface includes a voice broadcast interactive interface, a live video interactive interface or a group chat interactive interface.
  • Example 5 provides a line of sight angle adjustment method, the method including:
  • before the facial image to be processed is processed based on the pre-trained target line-of-sight angle adjustment model to obtain the target facial image corresponding to the facial image to be processed, the method also includes: determining, based on a feature detection module, whether the line-of-sight feature in the facial image to be processed matches a preset line-of-sight feature;
  • if the line-of-sight feature in the facial image to be processed does not match the preset line-of-sight feature, the facial image to be processed is processed based on the target line-of-sight angle adjustment model to obtain the target facial image.
  • Example 6 provides a line of sight angle adjustment method, the method including:
  • the processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image includes: inputting the facial image to be processed into the target line-of-sight angle adjustment model to obtain the target facial image, where the line-of-sight angle in the target facial image is different from the line-of-sight angle in the image to be processed.
  • Example 7 provides a line of sight angle adjustment method, the method including:
  • the displaying the target facial image to at least one client includes: sending the multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
  • Example 8 provides a method for adjusting a line of sight angle, the method including:
  • obtaining a training sample set, where the training sample set includes a plurality of training samples, each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model;
  • for each training sample, inputting the non-target line-of-sight angle image in the current training sample into the line-of-sight angle adjustment model to be trained to obtain an actual output image corresponding to the current training sample;
  • determining a loss value according to the actual output image of the current training sample and the target line-of-sight angle image, and adjusting the model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and the preset loss function of the model;
  • using convergence of the preset loss function of the line-of-sight angle adjustment model to be trained as the training objective to obtain the target line-of-sight angle adjustment model.
  • Example 9 provides a line of sight angle adjustment method, the method including:
  • training to obtain the non-target line-of-sight angle image generation sub-model in the target sample generation model includes:
  • inputting the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain an error value;
  • correcting the model parameters of the non-target line-of-sight angle image generation sub-model to be trained based on the error value and the loss function of the sub-model;
  • using convergence of the loss function of the non-target line-of-sight angle image generation sub-model to be trained as the training objective to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on this sub-model.
  • Example 10 provides a line of sight angle adjustment method, the method including:
  • the inputting the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain the error value includes:
  • processing the Gaussian distribution vector based on the generator in the non-target line-of-sight angle image generation sub-model to be trained to obtain an image to be compared; and processing the original non-front-view sample image and the image to be compared based on the discriminator in the sub-model to obtain the error value.
  • Example Eleven provides a method for adjusting a sight angle, the method including:
  • training to obtain the target line-of-sight angle image generation sub-model in the target sample generation model includes:
  • obtaining the model parameters of the non-target line-of-sight angle image generation sub-model, and multiplexing the model parameters into the target line-of-sight angle image generation sub-model to be trained;
  • training the target line-of-sight angle image generation sub-model to be trained based on the pre-collected Gaussian distribution vector and the original front-view sample image, so as to obtain the target line-of-sight angle image generation sub-model and generate the target line-of-sight angle images in the training samples based on it.
  • Example 12 provides a line of sight angle adjustment method, the method comprising:
  • a plurality of Gaussian distribution vectors are respectively input into the target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model, so as to obtain the front-view sample images and the non-front-view sample images in the training samples.
  • Example 13 provides a sight angle adjustment device, which includes:
  • an image acquisition module, configured to collect a facial image to be processed of a target user;
  • an image processing module, configured to process the facial image to be processed based on a target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to a target angle;
  • the image display module is configured to display the target facial image to at least one client.

Abstract

Disclosed herein are a line-of-sight angle adjustment method and apparatus, an electronic device, and a storage medium. The line-of-sight angle adjustment method includes: collecting a facial image to be processed of a target user; processing the facial image to be processed based on a target line-of-sight angle adjustment model to obtain a target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to a target angle; and displaying the target facial image to at least one client.

Description

Line-of-Sight Angle Adjustment Method and Apparatus, Electronic Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 202111013443.0, filed with the China National Intellectual Property Administration on August 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and for example to a line-of-sight angle adjustment method and apparatus, an electronic device, and a storage medium.
Background
With the development of smart terminals and Internet technology, more and more anchor users interact with other users through smart terminals.
To improve interaction efficiency, text to be broadcast can be displayed on a teleprompter, so that the anchor user can interact with other users based on the content on the teleprompter.
However, while the anchor user is using the teleprompter, line-of-sight deviation and loss of focus easily occur, greatly degrading the interaction effect.
Summary
The present disclosure provides a line-of-sight angle adjustment method and apparatus, an electronic device, and a storage medium, so that, when it is determined from a collected image that the user's line-of-sight angle is inconsistent with a target line-of-sight angle, the line of sight is adjusted to the target line-of-sight angle to obtain a target facial image, and the target facial image is sent to at least one client, thereby achieving the technical effect of improving interaction efficiency.
The present disclosure provides a line-of-sight angle adjustment method, the method including:
collecting a facial image to be processed of a target user;
processing the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; where the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
displaying the target facial image to at least one client.
The present disclosure further provides a line-of-sight angle adjustment apparatus, the apparatus including:
an image collection module configured to collect a facial image to be processed of a target user;
an image processing module configured to process the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; where the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
an image display module configured to display the target facial image to at least one client.
The present disclosure further provides an electronic device, the electronic device including:
one or more processors;
a storage apparatus configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the line-of-sight angle adjustment method described above.
The present disclosure further provides a storage medium containing computer-executable instructions, the computer-executable instructions, when executed by a computer processor, being used to perform the line-of-sight angle adjustment method described above.
The present disclosure further provides a computer program product, including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the line-of-sight angle adjustment method described above.
Brief Description of Drawings
FIG. 1 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment One of the present disclosure;
FIG. 2 is a schematic diagram of a line-of-sight angle adjustment result provided in Embodiment One of the present disclosure;
FIG. 3 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment Two of the present disclosure;
FIG. 4 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment Three of the present disclosure;
FIG. 5 is a schematic structural diagram of a line-of-sight angle adjustment apparatus provided in Embodiment Four of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment Five of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the drawings. Although some embodiments of the present disclosure are shown in the drawings, the present disclosure may be implemented in many forms and should not be construed as limited to the embodiments set forth here; the drawings and embodiments of the present disclosure are for exemplary purposes only.
The multiple steps recited in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. In addition, the method embodiments may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
Concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
The modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Embodiment One
FIG. 1 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment One of the present disclosure. This embodiment of the present disclosure is applicable to real-time interactive application scenarios supported by the Internet, and can also be applied to non-real-time interactive scenarios, for focusing the target user's line of sight to a particular angle. The method may be performed by a line-of-sight angle adjustment apparatus, which may be implemented in the form of software and/or hardware, for example by an electronic device, which may be a mobile terminal, a personal computer (PC), or a server. A real-time interactive application scenario is usually implemented by a client and a server in cooperation; the method provided in this embodiment may be performed by the client, by the server, or by both in cooperation.
Before introducing the technical solution, the application scenario may be described by way of example. The method provided in this embodiment of the present disclosure may be integrated in any application program or mobile terminal. If the method is integrated in an application program, the integrated line-of-sight angle adjustment method can be loaded automatically when the application program starts, and when the user's facial image information is acquired, the facial image is processed based on the line-of-sight angle adjustment method. If the method is integrated on a terminal device, the line-of-sight focusing method may run in the background, and when facial image information is collected, the facial image may be processed based on the line-of-sight angle adjustment method. That is, as long as facial image information is collected, the technical solution of the present disclosure can be used for line-of-sight focusing.
In application, the user may also set whether to invoke the line-of-sight angle adjustment method according to actual needs. The user may manually set whether the method needs to be invoked: if the user has manually set the method to be invoked, the line of sight in a facial image can be focused to the target angle as soon as the facial image is collected; if the user has not manually set the method to be invoked, the line of sight in the collected facial image need not be processed at all.
The technical solution provided in this embodiment of the present disclosure can be applied in real-time interactive scenarios, such as livestreaming and video conferencing. In a livestreaming scenario, an anchor user can interact with other users through a terminal device. During the interaction, when the target user's multimedia data stream is delivered to other clients, the line-of-sight angle in the facial image to be processed corresponding to the target user can be adjusted to the target line-of-sight angle to obtain the target facial image, so that other users can see, through their clients, the target user with the line-of-sight angle adjusted to the target angle. For example, the anchor user's line-of-sight angle can be adjusted based on this technical solution, so that other users see a target user whose line-of-sight angle is always at the target angle. If applied in a non-real-time interactive scenario, for example when photographing a user with a camera, the photographed user's line-of-sight angle can be adjusted to the target angle based on this technical solution.
As shown in FIG. 1, the method includes:
S110. Collect a facial image to be processed of a target user.
The line-of-sight angle adjustment method may be integrated inside the terminal, or an application program installed on the terminal may integrate the method. When a user shoots his or her facial image with the camera device on the terminal, the user matching the facial image is taken as the target user.
When it is detected that the user triggers target application program A and triggers the shooting control, the camera device on the terminal may be invoked to shoot the user's facial image, and the user corresponding to the facial image is taken as the target user. Correspondingly, the captured facial image is taken as the facial image to be processed.
For example, after user B triggers target application program A and enters the home page, user B may trigger the shooting control on the home page; at this point the camera device may be invoked to shoot a face image including user B, and this face image is taken as the facial image to be processed; correspondingly, user B is the target user.
In this embodiment, collecting the facial image to be processed of the target user includes: when at least one user interacts based on a real-time interactive interface, collecting the facial image to be processed of the target user among the at least one user.
The real-time interactive interface is any interactive interface in a real-time interactive application scenario. Real-time interactive scenarios can be implemented by means of the Internet and computers, for example, interactive applications implemented through native programs or web programs. Real-time interactive application scenarios may be livestreaming scenarios, video conferencing scenarios, voice broadcast scenarios, and recorded-video scenarios. Livestreaming scenarios may include sales livestreams within applications, as well as livestreaming based on a livestreaming platform; a voice broadcast scenario may be one in which an anchor at a television station broadcasts corresponding content and the multimedia data stream of the anchor's broadcast is delivered to at least one client based on a camera. To save resources, the facial image to be processed of the target user may be collected periodically. To improve the processing precision of facial images, the facial image to be processed of the target user may instead be collected in real time.
When interaction takes place based on the real-time interactive interface, the facial image to be processed of the target user may be collected periodically or in real time, and then processed to obtain the corresponding target facial image.
For example, the real-time interactive interface is generated based on an online video broadcast scenario. The video broadcast involves an anchor and viewing users watching the anchor's broadcast. While the anchor broadcasts based on preset broadcast text, the camera device may collect the anchor's facial image in real time, or once every few seconds, for example every 5 seconds, to obtain the facial image to be processed.
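A minimal capture loop for the periodic strategy just described might look as follows. This is an illustrative sketch only, assuming OpenCV (cv2) as the camera API; the 5-second interval mirrors the example above and is not mandated by the disclosure.

    # Sketch of periodic face-image capture (assumed OpenCV API; the
    # 5-second interval is illustrative, set it to 0 for real-time capture).
    import time
    import cv2

    CAPTURE_INTERVAL_S = 5.0

    def capture_frames(camera_index: int = 0):
        cap = cv2.VideoCapture(camera_index)
        last_capture = 0.0
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                now = time.time()
                if now - last_capture >= CAPTURE_INTERVAL_S:
                    last_capture = now
                    # Hand the frame to the gaze-adjustment pipeline.
                    yield frame
        finally:
            cap.release()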
There may be one or more anchor users in a real-time interactive scenario. For example, there are two anchors in one livestreaming room who cooperate with each other; in this case one anchor mainly faces the viewing users while the other mainly plays a supporting role, so the supporting anchor's line of sight may not be what users pay attention to, and line-of-sight angle adjustment may then be performed mainly on the anchor user facing the viewing users. For example, before the anchor users start livestreaming, a primary anchor user and a secondary anchor user may be preset; when a facial image including both anchor users is captured, line-of-sight angle adjustment may be performed only on the primary anchor user. Alternatively, when the primary anchor user is not in the video frame, no line-of-sight angle adjustment need be performed on the collected facial image. When there are multiple anchors in the same livestreaming room, all anchors may also be taken as target users, in which case the line-of-sight angle is adjusted as soon as a facial image to be processed of any target user is collected.
In this embodiment, when at least one user interacts based on the real-time interactive interface, collecting the facial image to be processed of the target user among the at least one user includes: when at least one user interacts based on the real-time interactive interface, determining the currently speaking user and taking the speaking user as the target user; and collecting the facial image to be processed of the target user based on a camera module.
In application scenarios such as video conferencing, every participating user may be taken as a target user; to make video livestreams more interesting and watchable, multiple anchors may co-stream, in which case the multiple co-streaming users are target users. In such cases, this technical solution can be used to collect facial images of every target user, adjust the line-of-sight angle in the facial images, and deliver the focused target facial images in the form of a multimedia data stream to at least one client, so that the line of sight of every target user seen by the viewing users has undergone line-of-sight angle adjustment.
In scenarios where the real-time interactive interface includes multiple interacting users, the speaking user may be determined in real time and taken as the target user. When it is detected that the user corresponding to the current device is the target user, the facial image collected by the camera device corresponding to the target user may be taken as the facial image to be processed.
On the basis of the above technical solution, collecting the facial image to be processed of the target user may be: when a preset event is detected to be triggered, collecting the facial image to be processed of the target user based on the camera module.
The preset event may be triggering a wake word, triggering a line-of-sight adjustment control, or detecting that a user appears in front of the display screen; any of these can be determined as triggering the preset event.
In practical applications, when one or more of the above preset events are detected to be triggered, it can be considered that the target user has enabled the line-of-sight angle adjustment model, and the facial image collected by the camera device may be taken as the facial image to be processed.
S120. Process the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image corresponding to the facial image to be processed.
When a user's facial image is shot by the camera device, the facial image usually includes the user's facial features; in this embodiment, the user's line-of-sight features are of particular interest. In practical applications, even when the user is directly facing the terminal's display screen, the facial image shot by the camera device on the terminal exhibits a certain line-of-sight angle difference, so that the user's line-of-sight angle in the captured facial image is not front-facing, which tends to degrade the interactive user experience. Alternatively, in a voice broadcast scenario where the user gazes at the content in a teleprompter while broadcasting, a line-of-sight angle deviation appears when the user looks at the teleprompter content, leading to a poor experience for users watching the broadcast.
The target line-of-sight angle adjustment model is a pre-trained model for adjusting the user's line-of-sight angle in a facial image to the target angle. The target facial image is the image obtained after the target line-of-sight angle adjustment model adjusts the line-of-sight angle in the facial image to be processed to the target angle. That is, the user's line-of-sight angle in the target facial image has been adjusted to the preset target angle. The target angle may be the angle at which the user's line of sight is perpendicular to the display screen, i.e., the angle at which the user looks straight at the display screen. The target angle may be any preset angle. To improve the interaction efficiency between the anchor user and other users, the target angle may be the angle at which the target user's line of sight is level with the camera device.
Whenever a facial image to be processed is collected, it can be input into the target line-of-sight angle adjustment model for line-of-sight angle adjustment, so as to adjust the line-of-sight angle in the facial image to be processed to the target angle; that is, both non-front-facing and front-facing line-of-sight angles can be adjusted to the target angle.
Usually, the user's line-of-sight angle in the facial image to be processed may or may not be consistent with the target angle. To save processing resources, after the facial image to be processed is acquired, whether its line-of-sight angle is consistent with the target angle may be determined in advance.
In one embodiment, based on a feature detection module, it is determined whether the line-of-sight features in the facial image to be processed match preset line-of-sight features; if the line-of-sight features in the facial image to be processed do not match the preset line-of-sight features, the facial image to be processed is processed based on the target line-of-sight angle adjustment model, to obtain the target facial image.
The feature detection module is used to detect the user's line-of-sight features, and is mainly used to determine whether the user's line-of-sight angle is consistent with the target angle. The preset line-of-sight features are features matching the target angle. The preset line-of-sight features may be features such as the eyelids and the pupils, for example, whether the pupils are at the centre of the eyes.
After the facial image to be processed is acquired, it may be processed based on the feature detection module to determine whether its line-of-sight features match the preset line-of-sight features. If the line-of-sight features in the facial image to be processed are inconsistent with the preset line-of-sight features, the target user's line-of-sight angle is inconsistent with the target angle, and the facial image to be processed may then be processed based on the target line-of-sight angle adjustment model.
The target line-of-sight angle adjustment model adjusts the user's line-of-sight angle in the facial image to be processed to the target angle, so the image output by the model is a target facial image consistent with the target angle. In this case, the user's line-of-sight features in the target facial image differ from those in the facial image to be processed, while the other facial image features are identical.
Processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image includes: inputting the facial image to be processed into the target line-of-sight angle adjustment model to obtain the target facial image, where the line-of-sight angle in the target facial image differs from the line-of-sight angle in the image to be processed.
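To make the gate-then-adjust flow above concrete, the following minimal sketch assumes a PyTorch-style adjustment model; the feature detector's matches_target_gaze interface is a hypothetical stand-in, since the disclosure does not fix the API of either component.

    # Hedged sketch: run the adjustment model only when the detected
    # line-of-sight features do not match the preset ones.
    import torch

    def process_face_image(face_img, feature_detector, gaze_adjust_model):
        # face_img: a (1, 3, H, W) image tensor from the camera device.
        # Skip the heavier adjustment model when the gaze already matches
        # the preset features, e.g. pupils centred in the eyes.
        if feature_detector.matches_target_gaze(face_img):
            return face_img
        with torch.no_grad():
            # The model redirects the gaze to the target angle while
            # leaving the other facial features unchanged.
            return gaze_adjust_model(face_img)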
S130. Display the target facial image to at least one client.
There may be one or more clients. A client may be the client to which the target user belongs, or a client associated with the target user. For example, when the application scenario is livestreaming, the facial image to be processed is the anchor user's facial image, and the target facial image may be the image obtained after the line-of-sight angle in the facial image to be processed is adjusted to the target line-of-sight angle. The clients may be those of every viewing user watching the livestream; that is, after the target facial image corresponding to the anchor is determined, it can be delivered in the form of a data stream to every user watching the livestream, and the target facial image can also be presented on the target client to which the target user belongs.
Displaying the target facial image to at least one client includes: sending the multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
After the target user's image to be processed is converted into the target facial image, the multimedia data stream corresponding to the target facial image is delivered to the clients of other associated users, so that what the other users see is the target user looking straight ahead, thereby improving the effect of interacting with the target user.
To understand the technical effect achieved by this technical solution, refer to the schematic diagram shown in FIG. 2. When the detected user line-of-sight angle in the facial image to be processed is inconsistent with the target line-of-sight angle, i.e., the facial image to be processed is a non-front-view image, the facial image to be processed can be input into the pre-trained target line-of-sight angle adjustment model to obtain the front-view image shown in FIG. 2, in which the line-of-sight angle is consistent with the target line-of-sight angle. The data stream corresponding to the front-view image is delivered to the clients watching the target user's livestream, and is also displayed on the target user's own client.
In the technical solution of this embodiment of the present disclosure, when the facial image to be processed of the target user is collected, the facial image to be processed is processed based on the pre-trained target line-of-sight angle adjustment model, the user's line of sight in the facial image to be processed is focused to the target line-of-sight angle, and the target facial image focused to the target line-of-sight angle is displayed to other clients. This solves the problem in the related art that, during voice broadcasting, line-of-sight deviation or lack of focus leads to poor interaction effects, and achieves the technical effect that, when the target user interacts with other users through terminals, the user's line of sight can be automatically focused to the target line-of-sight angle, thereby improving the interaction efficiency between the target user and other interacting users.
Embodiment Two
FIG. 3 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment Two of the present disclosure. On the basis of the foregoing embodiment, before the facial image to be processed is processed based on the target line-of-sight angle adjustment model, the target line-of-sight angle adjustment model may first be obtained through training. Technical terms identical or corresponding to those in the above embodiment are not repeated here.
As shown in FIG. 3, the method includes:
S210. Obtain a training sample set.
Before the target line-of-sight angle adjustment model is obtained through training, training samples need to be obtained first, so that training can be based on them. To improve the model's accuracy, training samples should be obtained in as large a quantity and variety as possible.
The training sample set includes multiple training samples; each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model. The user's line-of-sight angle in a target line-of-sight angle image is consistent with the preset line-of-sight angle. A non-target line-of-sight angle image is a facial image in which the user's line of sight is inconsistent with the target line-of-sight angle. The target sample generation model can be understood as a model that generates training samples.
A target sample generation model may first be obtained through training. The target sample generation model includes a positive-sample generation sub-model and a negative-sample generation sub-model. The positive-sample generation sub-model is used to generate the target line-of-sight angle images in the training samples; the user's line-of-sight angle in such an image is consistent with the target line-of-sight angle. Correspondingly, the negative-sample generation sub-model is used to generate the non-target line-of-sight angle images in the training samples; the user's line-of-sight angle in such an image is inconsistent with the target line-of-sight angle.
S220. For each training sample, input the non-target line-of-sight angle image in the current training sample into the line-of-sight angle adjustment model to be trained, to obtain the actual output image corresponding to the current training sample.
The line-of-sight angle adjustment model to be trained may be trained on each training sample in the training sample set to obtain the target line-of-sight angle adjustment model. Each non-target line-of-sight angle image in the training samples may be taken as the input of the model to be trained, and the target line-of-sight angle image corresponding to the non-target line-of-sight angle image is compared with the model's output to adjust the model parameters of the model to be trained. When it is detected that the loss function of the model to be trained converges, it is determined that the target line-of-sight angle adjustment model has been obtained through training.
S230. Determine a loss value according to the actual output image and the target line-of-sight angle image of the current training sample, and adjust the model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and the preset loss function of the line-of-sight angle adjustment model to be trained.
S240. Take convergence of the preset loss function as the training objective, to obtain the target line-of-sight angle adjustment model.
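A compact illustration of S220 to S240 follows, under stated assumptions: the model architecture is left unspecified, the L1 loss stands in for the patent's unspecified preset loss, and the convergence test is a simple change-in-loss threshold. sample_pairs is assumed to be a re-iterable collection (e.g. a DataLoader) of (non-target image, target image) batches.

    # Illustrative training loop for the gaze-adjustment model.
    import torch
    from torch import nn

    def train_gaze_adjust_model(model: nn.Module, sample_pairs,
                                epochs: int = 10, lr: float = 1e-4,
                                tol: float = 1e-4) -> nn.Module:
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.L1Loss()  # stand-in for the unspecified preset loss
        prev_total = float("inf")
        for _ in range(epochs):
            total = 0.0
            for non_target_img, target_img in sample_pairs:
                actual_output = model(non_target_img)      # actual output image
                loss = loss_fn(actual_output, target_img)  # loss value vs. target
                opt.zero_grad()
                loss.backward()
                opt.step()                                 # adjust model parameters
                total += loss.item()
            if abs(prev_total - total) < tol:              # crude convergence check
                break
            prev_total = total
        return model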
S250. Collect the facial image to be processed of the target user.
S260. Process the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image corresponding to the facial image to be processed.
The target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to the target angle.
S270. Display the target facial image to at least one client.
In a real-time interactive scenario, every collected facial image to be processed may be processed, and the resulting target facial images may be delivered to other clients in the form of a multimedia data stream. First, this makes the captured video more lively and interactive; second, it lets every viewing user see images whose line of sight is always focused to the target line-of-sight angle, improving the users' viewing experience.
In the technical solution of this embodiment of the present disclosure, before the facial image to be processed is processed based on the target line-of-sight angle adjustment model, the target line-of-sight angle adjustment model may first be obtained through training, so that the facial image to be processed collected by the camera device is processed based on the model to obtain the line-of-sight-focused target facial image, and the target facial image is sent to at least one client, so that what every user sees is the image of the target user after line-of-sight focusing, yielding a more interactive video stream.
Embodiment Three
FIG. 4 is a flowchart of a line-of-sight angle adjustment method provided in Embodiment Three of the present disclosure. On the basis of the foregoing embodiments, before the target line-of-sight angle adjustment model is obtained through training, the corresponding training samples may be generated based on the target sample generation model; correspondingly, before the training samples are obtained, the target sample generation model may first be obtained through training. Technical terms identical or corresponding to those in the above embodiments are not repeated here.
As shown in FIG. 4, the method includes:
S310. Train to obtain the non-target line-of-sight angle image generation sub-model in the target sample generation model.
A pre-collected Gaussian distribution vector and an original non-front-view sample image are input into the non-target line-of-sight angle image generation sub-model to be trained, to obtain an error value; based on the error value and the loss function in the sub-model to be trained, the model parameters in the sub-model to be trained are corrected; convergence of the loss function is taken as the training objective to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on the sub-model.
In this embodiment, inputting the pre-collected Gaussian distribution vector and the original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain the error value includes:
processing the Gaussian distribution vector based on the generator in the sub-model to be trained, to obtain an image to be compared; and processing the original non-front-view sample image and the image to be compared based on the discriminator in the sub-model to be trained, to obtain the error value.
The Gaussian distribution vector may be randomly sampled noise. The user's facial image may be collected while the user is not looking straight ahead, to obtain the original non-front-view sample image. The model parameters in the sub-model to be trained take default values. The Gaussian distribution vector and the original non-front-view sample image may be taken as the input of the sub-model to be trained, to obtain the actual output result, i.e., the actual output image. According to the actual output image and the original non-front-view sample image, the error value can be obtained. Based on the error value and the preset loss function in the non-target line-of-sight angle image generation sub-model, the model parameters in the sub-model can be corrected. Convergence of the loss function may be taken as the training objective to obtain the non-target line-of-sight angle image generation sub-model.
All the models disclosed in this technical solution may be trained in an adversarial manner. Adversarial training may be as follows: the non-target line-of-sight angle image generation sub-model includes a generator and a discriminator. The generator is used to process the Gaussian distribution vector to generate a corresponding image. The discriminator is used to determine the similarity between the generated image and the original image, so that the model parameters in the generator and the discriminator are adjusted according to the error until the training of the sub-model is completed.
The generator in the sub-model processes the Gaussian distribution vector to obtain the image to be compared corresponding to the Gaussian distribution vector. Meanwhile, the image to be compared and the original non-front-view sample image may be input into the discriminator, which discriminates between the two images to obtain an output result. According to the output result, the model parameters in the generator and the discriminator can be corrected. When it is detected that the model's loss function converges, the resulting model may be taken as the non-target line-of-sight angle image generation sub-model.
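One adversarial training step of the kind just described can be sketched as follows. The BCE objectives and all network internals are assumptions; the disclosure fixes only the generator/discriminator division of labour and the use of the error value to correct both.

    # Sketch of one adversarial update for the non-front-view generation
    # sub-model (generator gen, discriminator disc, their optimizers,
    # Gaussian vector z, and an original non-front-view sample real_img).
    import torch
    from torch import nn

    def adversarial_step(gen, disc, opt_g, opt_d, z, real_img):
        bce = nn.BCEWithLogitsLoss()
        fake_img = gen(z)  # image to be compared, generated from z

        # Discriminator: tell the original sample from the generated image;
        # its output plays the role of the error value.
        real_score = disc(real_img)
        fake_score = disc(fake_img.detach())
        d_loss = (bce(real_score, torch.ones_like(real_score)) +
                  bce(fake_score, torch.zeros_like(fake_score)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator: correct its parameters so that generated images are
        # scored as real by the discriminator.
        gen_score = disc(fake_img)
        g_loss = bce(gen_score, torch.ones_like(gen_score))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()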
S320. Obtain the model parameters in the non-target line-of-sight angle image generation sub-model, reuse the model parameters in the target line-of-sight angle image generation sub-model to be trained, and train the target line-of-sight angle image generation sub-model to be trained based on a pre-collected Gaussian distribution vector and an original front-view sample image, to obtain the target line-of-sight angle image generation sub-model.
After the non-target line-of-sight angle generation sub-model is obtained, the target line-of-sight angle generation sub-model may be obtained through training. For example: obtain the model parameters in the non-target line-of-sight angle image generation sub-model, and reuse the model parameters in the target line-of-sight angle image generation sub-model to be trained; train the target line-of-sight angle image generation sub-model to be trained based on the pre-collected Gaussian distribution vector and the original front-view sample image, to obtain the target line-of-sight angle image generation sub-model.
The target line-of-sight angle image generation sub-model to be trained is also trained in the adversarial manner; that is, this sub-model likewise includes a generator and a discriminator. The generator and the discriminator play the same roles as in the above sub-model, and the way the target sub-model is obtained through training is the same as the way the non-target sub-model is obtained, which is not repeated here.
To make it more convenient to train the target line-of-sight angle image generation sub-model, after the training of the non-target sub-model is completed, the model parameters in the non-target sub-model may be reused as the initial model parameters for training the target sub-model.
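A sketch of this parameter-reuse step is given below. The placeholder architecture is an assumption; the disclosure implies, but does not state, that the two sub-models share a structure, which load_state_dict requires.

    # Initialise the target (front-view) generator from the trained
    # non-front-view one before fine-tuning on front-view samples.
    import torch
    from torch import nn

    def make_generator(z_dim: int = 128) -> nn.Module:
        # Placeholder architecture; the disclosure does not specify one.
        return nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                             nn.Linear(256, 3 * 64 * 64), nn.Tanh())

    non_target_gen = make_generator()  # assume already trained adversarially
    target_gen = make_generator()
    # Reuse the trained parameters as the target sub-model's initial values,
    # then train target_gen adversarially on original front-view samples.
    target_gen.load_state_dict(non_target_gen.state_dict())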
S330. Input multiple Gaussian distribution vectors respectively into the target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model, to obtain the target line-of-sight angle images and non-target line-of-sight angle images in the training samples.
The target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model may together be taken as the target sample generation model. Alternatively, the two sub-models may be packaged together so that, given an input, two images can be output; in this case the user's line-of-sight angles in the two images differ.
A general problem in training models is that a large number of samples need to be collected, and sample collection is difficult to a certain extent. For example, collecting a large number of images of users at the target line-of-sight angle and at non-target line-of-sight angles, as in this embodiment, suffers from difficult collection and inconsistent standards. Based on this technical solution, randomly sampled noise can be processed directly to obtain images of the same user at different line-of-sight angles, thereby obtaining the corresponding samples, which improves the convenience and generality of determining samples and hence the convenience of training the model.
Based on the target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model in the target sample generation model, multiple Gaussian distribution vectors are processed in turn to obtain the target line-of-sight angle images and non-target line-of-sight angle images in the training samples.
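The sample-generation step might then look like the sketch below. Feeding the same Gaussian vector to both sub-models, so that each pair depicts the same synthetic user at two gaze angles, is an assumption consistent with the packaged-model description above; the batch and latent sizes are likewise illustrative.

    # Generate (target-angle, non-target-angle) training pairs from noise.
    import torch

    def generate_training_samples(target_gen, non_target_gen,
                                  n_samples: int, z_dim: int = 128):
        samples = []
        with torch.no_grad():
            for _ in range(n_samples):
                z = torch.randn(1, z_dim)            # randomly sampled noise
                samples.append((target_gen(z),       # front-view (target) image
                                non_target_gen(z)))  # non-front-view image
        return samples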
S340. Train based on the multiple training samples to obtain the target line-of-sight angle adjustment model.
S350. Collect the facial image to be processed of the target user.
S360. Process the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image corresponding to the facial image to be processed.
The target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to the target angle.
S370. Display the target facial image to at least one client.
In the technical solution of this embodiment of the present disclosure, the pre-trained target sample generation model can process randomly sampled noise to obtain a large number of training samples for training the target line-of-sight angle adjustment model, achieving the technical effect of improving the convenience and uniformity of obtaining training samples.
Embodiment Four
FIG. 5 is a schematic structural diagram of a line-of-sight angle adjustment apparatus provided in Embodiment Four of the present disclosure. The apparatus includes: an image collection module 410, an image processing module 420, and an image display module 430. The image collection module 410 is configured to collect the facial image to be processed of the target user; the image processing module 420 is configured to process the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to the target line-of-sight angle; the image display module 430 is configured to display the target facial image to at least one client.
On the basis of the above technical solution, the image collection module 410 is configured to: when at least one user interacts based on the real-time interactive interface, collect the facial image to be processed of the target user among the at least one user; or, when a preset event is detected to be triggered, collect the facial image to be processed of the target user based on the camera module.
On the basis of the above technical solution, the image collection module 410 is configured to: when at least one user interacts based on the real-time interactive interface, determine the currently speaking user and take the currently speaking user as the target user; and collect the facial image to be processed of the target user based on the camera module.
On the basis of the above technical solution, the real-time interactive interface includes a voice broadcast interactive interface, a video livestreaming interactive interface, or a group chat interactive interface.
On the basis of the above technical solution, the image processing module 420 is configured to determine, based on the feature detection module, whether the line-of-sight features in the facial image to be processed match the preset line-of-sight features; and, if the line-of-sight features in the facial image to be processed do not match the preset line-of-sight features, process the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image.
On the basis of the above technical solution, the image processing module 420 is configured to input the facial image to be processed into the target line-of-sight angle adjustment model to obtain the target facial image, where the line-of-sight angle in the target facial image differs from the line-of-sight angle in the image to be processed.
On the basis of the above technical solution, the image display module 430 is configured to send the multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
On the basis of the above technical solution, the apparatus further includes a model training module configured to: obtain a training sample set, where the training sample set includes multiple training samples, each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model; for each training sample, input the non-target line-of-sight angle image in the current training sample into the line-of-sight angle adjustment model to be trained, to obtain the actual output image corresponding to the current training sample; determine a loss value according to the actual output image and the target line-of-sight angle image of the current training sample, and adjust the model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and the preset loss function of the line-of-sight angle adjustment model to be trained; and take convergence of the preset loss function of the line-of-sight angle adjustment model to be trained as the training objective, to obtain the target line-of-sight angle adjustment model.
On the basis of the above technical solution, the apparatus further includes a sample model generation module configured to train to obtain the non-target line-of-sight angle image generation sub-model in the target sample generation model in the following manner: input a pre-collected Gaussian distribution vector and an original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained, to obtain an error value; correct the model parameters in the sub-model to be trained based on the error value and the loss function in the sub-model to be trained; and take convergence of the loss function in the sub-model to be trained as the training objective, to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on the sub-model.
On the basis of the above technical solution, the sample model generation module is configured to input the pre-collected Gaussian distribution vector and the original non-front-view sample image into the sub-model to be trained to obtain the error value in the following manner: process the Gaussian distribution vector based on the generator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain an image to be compared; and process the original non-front-view sample image and the image to be compared based on the discriminator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain the error value.
On the basis of the above technical solution, the sample model generation module is further configured to train to obtain the target line-of-sight angle image generation sub-model in the target sample generation model in the following manner: obtain the model parameters in the non-target line-of-sight angle image generation sub-model and reuse the model parameters in the target line-of-sight angle image generation sub-model to be trained; and train the target line-of-sight angle image generation sub-model to be trained based on a pre-collected Gaussian distribution vector and an original front-view sample image, to obtain the target line-of-sight angle image generation sub-model, so as to generate the target line-of-sight angle images in the training samples based on it.
In the technical solution of this embodiment of the present disclosure, when the facial image to be processed of the target user is collected, the facial image to be processed is processed based on the pre-trained target line-of-sight angle adjustment model, the user's line of sight in the facial image to be processed is focused to the target line-of-sight angle, and the target facial image focused to the target line-of-sight angle is displayed to other clients. This solves the problem in the related art that, when a user performs voice broadcasting based on a teleprompter, line-of-sight deviation or lack of focus leads to poor interaction effects, and achieves the technical effect that, when the target user interacts with other users through terminals, the user's line of sight can be automatically focused to the target line-of-sight angle, which may be the angle at which the user's line of sight is perpendicular to the terminal's camera device, thereby improving the interaction efficiency between the target user and other interacting users.
The line-of-sight angle adjustment apparatus provided in this embodiment of the present disclosure can perform the line-of-sight angle adjustment method provided in any embodiment of the present disclosure, and has functional modules and effects corresponding to the performed method.
The multiple units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the names of the multiple functional units are only for ease of mutual distinction and are not used to limit the protection scope of the embodiments of the present disclosure.
Embodiment Five
FIG. 6 is a schematic structural diagram of an electronic device provided in Embodiment Five of the present disclosure. Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 6) 500 suitable for implementing the embodiments of the present disclosure. The terminal device 500 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital televisions (TVs) and desktop computers. The electronic device 500 shown in FIG. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 500 may include a processing apparatus (such as a central processing unit or a graphics processing unit) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following apparatuses may be connected to the I/O interface 505: input apparatuses 506 including, for example, a touchscreen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output apparatuses 507 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage apparatuses 508 including, for example, a magnetic tape and hard disk; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows the electronic device 500 with multiple apparatuses, it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
According to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 509, or installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the above functions defined in the methods of the embodiments of the present disclosure are performed.
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
The electronic device provided in this embodiment of the present disclosure belongs to the same concept as the line-of-sight angle adjustment method provided in the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same effects as the above embodiments.
Embodiment Six
Embodiment Six of the present disclosure provides a computer storage medium on which a computer program is stored; when the program is executed by a processor, the line-of-sight angle adjustment method provided in the above embodiments is implemented.
The computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fibre, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (RF), or any suitable combination of the above.
In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internet (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device is caused to: collect the facial image to be processed of the target user; process the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image corresponding to the facial image to be processed, where the target line-of-sight angle adjustment model is used to adjust the user's line-of-sight angle in a facial image to the target angle; and display the target facial image to at least one client.
The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in one case, constitute a limitation on the unit itself; for example, the image display module may also be described as "a module that displays the target facial image to at least one client".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the above. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, EPROM or flash memory, optical fibre, CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example One] provides a line-of-sight angle adjustment method, the method including:
collecting a facial image to be processed of a target user;
processing the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; where the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
displaying the target facial image to at least one client.
According to one or more embodiments of the present disclosure, [Example Two] provides a line-of-sight angle adjustment method, the method including:
the collecting a facial image to be processed of a target user includes:
when at least one user interacts based on a real-time interactive interface, collecting the facial image to be processed of the target user among the at least one user; or, when a preset event is detected to be triggered, collecting the facial image to be processed of the target user based on a camera module.
According to one or more embodiments of the present disclosure, [Example Three] provides a line-of-sight angle adjustment method, the method including:
the collecting, when at least one user interacts based on the real-time interactive interface, the facial image to be processed of the target user among the at least one user includes:
when the at least one user interacts based on the real-time interactive interface, determining a currently speaking user and taking the currently speaking user as the target user;
collecting the facial image to be processed of the target user based on a camera module.
According to one or more embodiments of the present disclosure, [Example Four] provides a line-of-sight angle adjustment method, the method including:
the real-time interactive interface includes a voice broadcast interactive interface, a video livestreaming interactive interface, or a group chat interactive interface.
According to one or more embodiments of the present disclosure, [Example Five] provides a line-of-sight angle adjustment method, the method including:
before the processing the facial image to be processed based on the pre-trained target line-of-sight angle adjustment model to obtain the target facial image corresponding to the facial image to be processed, the method further includes:
determining, based on a feature detection module, whether line-of-sight features in the facial image to be processed match preset line-of-sight features;
if the line-of-sight features in the facial image to be processed do not match the preset line-of-sight features, processing the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image.
According to one or more embodiments of the present disclosure, [Example Six] provides a line-of-sight angle adjustment method, the method including:
the processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image includes:
inputting the facial image to be processed into the target line-of-sight angle adjustment model, to obtain the target facial image; where the line-of-sight angle in the target facial image differs from the line-of-sight angle in the image to be processed.
According to one or more embodiments of the present disclosure, [Example Seven] provides a line-of-sight angle adjustment method, the method including:
the displaying the target facial image to at least one client includes:
sending a multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
According to one or more embodiments of the present disclosure, [Example Eight] provides a line-of-sight angle adjustment method, the method including:
obtaining a training sample set; where the training sample set includes a plurality of training samples, each training sample includes a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model;
for each training sample, inputting the non-target line-of-sight angle image in the current training sample into a line-of-sight angle adjustment model to be trained, to obtain an actual output image corresponding to the current training sample;
determining a loss value according to the actual output image and the target line-of-sight angle image of the current training sample, and adjusting model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and a preset loss function of the line-of-sight angle adjustment model to be trained;
taking convergence of the preset loss function of the line-of-sight angle adjustment model to be trained as a training objective, to obtain the target line-of-sight angle adjustment model.
According to one or more embodiments of the present disclosure, [Example Nine] provides a line-of-sight angle adjustment method, the method including:
training to obtain the non-target line-of-sight angle image generation sub-model in the target sample generation model, including:
inputting a pre-collected Gaussian distribution vector and an original non-front-view sample image into a non-target line-of-sight angle image generation sub-model to be trained, to obtain an error value;
correcting model parameters in the non-target line-of-sight angle image generation sub-model to be trained based on the error value and a loss function in the non-target line-of-sight angle image generation sub-model to be trained;
taking convergence of the loss function in the non-target line-of-sight angle image generation sub-model to be trained as a training objective, to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on the non-target line-of-sight angle image generation sub-model.
According to one or more embodiments of the present disclosure, [Example Ten] provides a line-of-sight angle adjustment method, the method including:
the inputting a pre-collected Gaussian distribution vector and an original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain an error value includes:
processing the Gaussian distribution vector based on a generator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain an image to be compared;
processing the original non-front-view sample image and the image to be compared based on a discriminator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain the error value.
According to one or more embodiments of the present disclosure, [Example Eleven] provides a line-of-sight angle adjustment method, the method including:
training to obtain the target line-of-sight angle image generation sub-model in the target sample generation model, including:
obtaining the model parameters in the non-target line-of-sight angle image generation sub-model, and reusing the model parameters in a target line-of-sight angle image generation sub-model to be trained;
training the target line-of-sight angle image generation sub-model to be trained based on a pre-collected Gaussian distribution vector and an original front-view sample image, to obtain the target line-of-sight angle image generation sub-model, so as to generate the target line-of-sight angle images in the training samples based on the target line-of-sight angle image generation sub-model.
According to one or more embodiments of the present disclosure, [Example Twelve] provides a line-of-sight angle adjustment method, the method including:
inputting a plurality of Gaussian distribution vectors respectively into the target line-of-sight angle image generation sub-model and the non-target line-of-sight angle image generation sub-model, to obtain the front-view sample images and non-front-view sample images in the training samples.
According to one or more embodiments of the present disclosure, [Example Thirteen] provides a line-of-sight angle adjustment apparatus, the apparatus including:
an image collection module configured to collect a facial image to be processed of a target user;
an image processing module configured to process the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; where the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
an image display module configured to display the target facial image to at least one client.
The above description is merely embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although multiple operations are depicted in a particular order, this should not be understood as requiring these operations to be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although multiple implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Some features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (14)

  1. A line-of-sight angle adjustment method, comprising:
    collecting a facial image to be processed of a target user;
    processing the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; wherein the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
    displaying the target facial image to at least one client.
  2. The method according to claim 1, wherein the collecting a facial image to be processed of a target user comprises:
    in a case where at least one user interacts based on a real-time interactive interface, collecting the facial image to be processed of the target user among the at least one user; or, when a preset event is detected to be triggered, collecting the facial image to be processed of the target user based on a camera module.
  3. The method according to claim 2, wherein the collecting, in a case where the at least one user interacts based on the real-time interactive interface, the facial image to be processed of the target user among the at least one user comprises:
    in a case where the at least one user interacts based on the real-time interactive interface, determining a currently speaking user and taking the currently speaking user as the target user;
    collecting the facial image to be processed of the target user based on a camera module.
  4. The method according to claim 1, wherein the target line-of-sight angle adjustment model is obtained through pre-training, and before the processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image corresponding to the facial image to be processed, the method further comprises:
    determining, based on a feature detection module, whether line-of-sight features in the facial image to be processed match preset line-of-sight features;
    in response to the line-of-sight features in the facial image to be processed not matching the preset line-of-sight features, processing the facial image to be processed based on the target line-of-sight angle adjustment model, to obtain the target facial image.
  5. The method according to claim 4, wherein the processing the facial image to be processed based on the target line-of-sight angle adjustment model to obtain the target facial image comprises:
    inputting the facial image to be processed into the target line-of-sight angle adjustment model, to obtain the target facial image; wherein a line-of-sight angle in the target facial image differs from a line-of-sight angle in the image to be processed.
  6. The method according to claim 1, wherein the displaying the target facial image to at least one client comprises:
    sending a multimedia data stream corresponding to the target facial image to at least one client associated with the target user for display.
  7. The method according to claim 1, further comprising:
    obtaining a training sample set; wherein the training sample set comprises a plurality of training samples, each training sample comprises a target line-of-sight angle image and a non-target line-of-sight angle image, and the training samples are determined based on a pre-trained target sample generation model;
    for each training sample, inputting the non-target line-of-sight angle image in the current training sample into a line-of-sight angle adjustment model to be trained, to obtain an actual output image corresponding to the current training sample;
    determining a loss value according to the actual output image and the target line-of-sight angle image of the current training sample, and adjusting model parameters of the line-of-sight angle adjustment model to be trained based on the loss value and a preset loss function of the line-of-sight angle adjustment model to be trained;
    taking convergence of the preset loss function of the line-of-sight angle adjustment model to be trained as a training objective, to obtain the target line-of-sight angle adjustment model.
  8. The method according to claim 7, wherein training to obtain a non-target line-of-sight angle image generation sub-model in the target sample generation model comprises:
    inputting a pre-collected Gaussian distribution vector and an original non-front-view sample image into a non-target line-of-sight angle image generation sub-model to be trained, to obtain an error value;
    correcting model parameters in the non-target line-of-sight angle image generation sub-model to be trained based on the error value and a loss function in the non-target line-of-sight angle image generation sub-model to be trained;
    taking convergence of the loss function in the non-target line-of-sight angle image generation sub-model to be trained as a training objective, to obtain the non-target line-of-sight angle image generation sub-model, so as to generate the non-target line-of-sight angle images in the training samples based on the non-target line-of-sight angle image generation sub-model.
  9. The method according to claim 8, wherein the inputting a pre-collected Gaussian distribution vector and an original non-front-view sample image into the non-target line-of-sight angle image generation sub-model to be trained to obtain an error value comprises:
    processing the Gaussian distribution vector based on a generator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain an image to be compared;
    processing the original non-front-view sample image and the image to be compared based on a discriminator in the non-target line-of-sight angle image generation sub-model to be trained, to obtain the error value.
  10. The method according to claim 8, wherein training to obtain a target line-of-sight angle image generation sub-model in the target sample generation model comprises:
    obtaining the model parameters in the non-target line-of-sight angle image generation sub-model, and reusing the model parameters in a target line-of-sight angle image generation sub-model to be trained;
    training the target line-of-sight angle image generation sub-model to be trained based on a pre-collected Gaussian distribution vector and an original front-view sample image, to obtain the target line-of-sight angle image generation sub-model, so as to generate the target line-of-sight angle images in the training samples based on the target line-of-sight angle image generation sub-model.
  11. A line-of-sight angle adjustment apparatus, comprising:
    an image collection module configured to collect a facial image to be processed of a target user;
    an image processing module configured to process the facial image to be processed based on a target line-of-sight angle adjustment model, to obtain a target facial image corresponding to the facial image to be processed; wherein the target line-of-sight angle adjustment model is used to adjust a user's line-of-sight angle in a facial image to a target angle;
    an image display module configured to display the target facial image to at least one client.
  12. An electronic device, comprising:
    at least one processor;
    a storage apparatus configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the line-of-sight angle adjustment method according to any one of claims 1-10.
  13. A storage medium containing computer-executable instructions, the computer-executable instructions, when executed by a computer processor, being used to perform the line-of-sight angle adjustment method according to any one of claims 1-10.
  14. A computer program product, comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the line-of-sight angle adjustment method according to any one of claims 1-10.
PCT/CN2022/115862 2021-08-31 2022-08-30 Line-of-sight angle adjustment method and apparatus, electronic device and storage medium WO2023030321A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111013443.0A CN113641247A (zh) 2021-08-31 2021-08-31 Line-of-sight angle adjustment method and apparatus, electronic device and storage medium
CN202111013443.0 2021-08-31

Publications (1)

Publication Number Publication Date
WO2023030321A1 (zh)

Family

ID=78424583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/115862 WO2023030321A1 (zh) Line-of-sight angle adjustment method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113641247A (zh)
WO (1) WO2023030321A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641247A (zh) * 2021-08-31 2021-11-12 北京字跳网络技术有限公司 视线角度调整方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120120264A1 (en) * 2010-11-12 2012-05-17 Samsung Electronics Co., Ltd. Method and apparatus for video stabilization by compensating for view direction of camera
CN104699124A (zh) * 2015-03-24 2015-06-10 天津通信广播集团有限公司 Television angle adjustment method based on detection of the viewing angle of the line of sight
CN111353336A (zh) * 2018-12-21 2020-06-30 华为技术有限公司 Image processing method, apparatus and device
CN113222857A (zh) * 2021-05-27 2021-08-06 Oppo广东移动通信有限公司 Image processing method, model training method and apparatus, medium and electronic device
CN113641247A (zh) * 2021-08-31 2021-11-12 北京字跳网络技术有限公司 Line-of-sight angle adjustment method and apparatus, electronic device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598765B (zh) * 2019-08-28 2023-05-26 腾讯科技(深圳)有限公司 Sample generation method and apparatus, computer device and storage medium
CN112733794B (zh) * 2021-01-22 2021-10-15 腾讯科技(深圳)有限公司 Line-of-sight correction method, apparatus, device and storage medium for face images
CN112733795B (zh) * 2021-01-22 2022-10-11 腾讯科技(深圳)有限公司 Line-of-sight correction method, apparatus, device and storage medium for face images


Also Published As

Publication number Publication date
CN113641247A (zh) 2021-11-12

Similar Documents

Publication Publication Date Title
WO2022121557A1 Livestream interaction method, apparatus, device and medium
WO2018010682A1 Livestreaming method, live data stream display method, and terminal
CN113411642B Screen casting method and apparatus, electronic device and storage medium
CN112714330A Gift-giving method and apparatus based on co-streaming livestreams, and electronic device
WO2022033494A1 Control method, apparatus, system, device and medium for interactive co-streaming livestreams
CN111064987B Information display method and apparatus, and electronic device
WO2022048651A1 Co-shooting method and apparatus, electronic device and computer-readable storage medium
CN112291502B Information interaction method, apparatus and system, and electronic device
JP2023528958A Video composite shooting method, apparatus, electronic device and computer-readable medium
WO2023030121A1 Data processing method and apparatus, electronic device and storage medium
CN111818383B Video data generation method, system, apparatus, electronic device and storage medium
WO2023030321A1 Line-of-sight angle adjustment method and apparatus, electronic device and storage medium
WO2023040749A1 Image processing method and apparatus, electronic device and storage medium
US11553255B2 Systems and methods for real time fact checking during stream viewing
WO2024001802A1 Image processing method and apparatus, electronic device and storage medium
WO2023226814A1 Video processing method and apparatus, electronic device and storage medium
CN112243157A Livestream control method and apparatus, electronic device and computer-readable medium
WO2023125366A1 Image processing method and apparatus, electronic device and storage medium
CN114125358A Cloud conference subtitle display method, system, apparatus, electronic device and storage medium
CN113891168A Subtitle processing method and apparatus, electronic device and storage medium
CN113905177A Video generation method, apparatus, device and storage medium
TWI792444B Camera control method, apparatus, medium and electronic device
WO2024032111A1 Data processing method, apparatus, device, medium and product for online conferences
WO2023088461A1 Image processing method and apparatus, electronic device and storage medium
WO2022213979A1 Special effect display method, apparatus, device, storage medium and product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863456

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE