CN114422862A - Service video generation method, device, equipment, storage medium and program product


Info

Publication number
CN114422862A
Authority
CN
China
Prior art keywords
template
digital
information
service
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111598400.3A
Other languages
Chinese (zh)
Inventor
秦小波
银星茜
李旭佳
石广洲
曹继印
李锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd filed Critical Shanghai Pudong Development Bank Co Ltd
Priority to CN202111598400.3A priority Critical patent/CN114422862A/en
Publication of CN114422862A publication Critical patent/CN114422862A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application relates to a service video generation method, apparatus, device, storage medium, and program product. The method comprises the following steps: capturing a live video stream of a human agent; processing the video stream to obtain expression information and/or action information; acquiring a pre-configured digital human template; and mapping the expression information and/or action information onto the digital human template, and rendering the mapped template to obtain a service video. With this method, the user experience can be improved.

Description

Service video generation method, device, equipment, storage medium and program product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for generating a service video.
Background
Digital human technology uses methods from information science to virtually simulate the shape and function of the human body at different levels. The technology for generating simulated virtual digital humans is relatively mature, and many digital human figures on the market are already applied in life scenarios such as games and entertainment, bringing static virtual visual enjoyment to users. When applied to scenarios in which an enterprise provides services to users, a digital human often acts as a virtual digital employee of the enterprise and communicates with users. At present, most digital humans communicate with users through human-machine dialogue and visualization of a digital figure.
However, current human-machine dialogue and digital-figure visualization operate independently of live human customer service, and the user experience is poor.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a service video generation method, apparatus, device, storage medium, and program product capable of improving user experience.
In a first aspect, the present application provides a service video generation method applied to a smart camera, the method comprising:
capturing a live video stream of a human agent;
processing the video stream to obtain expression information and/or action information;
acquiring a pre-configured digital human template; and
mapping the expression information and/or action information onto the digital human template, and rendering the mapped template to obtain a service video.
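The four steps above can be sketched end to end as follows (a minimal, hypothetical illustration; the stub functions and field names are assumptions, not the patented implementation):

```python
# Minimal sketch of the claimed pipeline: capture -> extract -> map -> render.
# All function bodies are hypothetical stand-ins for the smart camera's modules.

def capture_stream():
    # Stand-in for the camera feed: a short list of "frames".
    return [{"frame_id": i} for i in range(3)]

def extract_features(frame):
    # Stand-in for expression/action extraction on one frame.
    return {"expression": [0.1, 0.2], "action": [0.3]}

def load_template():
    # Pre-configured digital-human template with a standard vector.
    return {"standard_vector": [0.0, 0.0], "image_params": {"style": "bank-teller"}}

def render(template, features):
    # Map the extracted vector onto the template's standard vector, then "render".
    mapped = [s + e for s, e in zip(template["standard_vector"], features["expression"])]
    return {"mapped_vector": mapped, "style": template["image_params"]["style"]}

def generate_service_video():
    template = load_template()
    return [render(template, extract_features(f)) for f in capture_stream()]

video = generate_service_video()
```

The point of the sketch is the data flow, not the models: every frame is reduced to vectors, and only the vectors touch the template.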
In one embodiment, after mapping the expression information and/or action information onto the digital human template and rendering the mapped template to obtain the service video, the method includes:
sending the service video to a customer service platform over the transmission channel originally used by the conventional camera, so that the customer service platform sends the service video to a user terminal under a preset condition.
In one embodiment, before acquiring the pre-configured digital human template, the method further includes:
periodically downloading a digital human template from a digital human configuration center, the template having been pre-configured based on service information.
In one embodiment, processing the video stream to obtain expression information includes:
performing face recognition on a target frame of the video stream to obtain a face region; and
extracting an expression vector of the face region as the expression information.
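The face-region-to-expression-vector step can be illustrated with a toy sketch (the detector and the band-averaging "embedding" below are illustrative stand-ins; a real system would use trained models):

```python
# Hypothetical sketch: detect a face region in a frame, then reduce the crop
# to a fixed-length "expression vector". Both steps are toy stand-ins.

def detect_face(frame):
    # Pretend the detector returns a centered bounding box (x, y, w, h).
    h, w = len(frame), len(frame[0])
    return (w // 4, h // 4, w // 2, h // 2)

def crop(frame, box):
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]

def expression_vector(face, dims=4):
    # Toy embedding: mean intensity of `dims` horizontal bands of the crop.
    bands = [face[i * len(face) // dims:(i + 1) * len(face) // dims] for i in range(dims)]
    return [sum(sum(r) for r in b) / max(1, sum(len(r) for r in b)) for b in bands]

# An 8x8 synthetic grayscale frame.
frame = [[(i + j) % 256 for j in range(8)] for i in range(8)]
vec = expression_vector(crop(frame, detect_face(frame)))
```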
In one embodiment, the method further comprises:
performing motion capture on a target frame of the video stream; and
extracting a motion vector of the captured motion as the action information.
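The "motion vector of the captured motion" can be illustrated by computing joint angles from captured joint positions (the joint names and 2-D chain below are assumptions for illustration; the patent does not fix a pose format):

```python
import math

# Hypothetical sketch: turn 2-D joint positions from a frame into a motion
# vector of joint angles, per the description's "action, rotation angle of
# each joint".

def joint_angle(parent, joint, child):
    # Angle at `joint` between the parent->joint and joint->child segments.
    a = (parent[0] - joint[0], parent[1] - joint[1])
    b = (child[0] - joint[0], child[1] - joint[1])
    dot = a[0] * b[0] + a[1] * b[1]
    na, nb = math.hypot(*a), math.hypot(*b)
    return math.degrees(math.acos(dot / (na * nb)))

def motion_vector(pose):
    # pose: dict of joint -> (x, y); one chain shoulder -> elbow -> wrist.
    return [joint_angle(pose["shoulder"], pose["elbow"], pose["wrist"])]

pose = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (1.0, 1.0)}
vec = motion_vector(pose)
```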
In one embodiment, the digital human template carries a standard vector and image data. Mapping the expression information and/or action information onto the digital human template and rendering the mapped template to obtain the service video includes:
mapping the expression vector corresponding to the expression information to the standard vector and/or mapping the action vector corresponding to the action information to the standard vector; and
rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
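The standard-vector mapping and rendering described above can be sketched as follows (the field names and the blend rule are assumptions; the patent does not specify the mapping function):

```python
# Illustrative sketch: a template carries a standard (neutral) vector plus
# image parameters; extracted vectors are mapped onto the standard vector and
# the result is rendered frame by frame. Names are invented for illustration.

def map_to_standard(standard, extracted, weight=1.0):
    # Blend the extracted vector onto the template's standard vector.
    return [s + weight * (e - s) for s, e in zip(standard, extracted)]

def render_frame(mapped, image_params):
    # Stand-in renderer: package the mapped vector with the template's look.
    return {"pose": mapped, "look": image_params}

template = {"standard_vector": [0.0, 0.0, 0.0], "image_params": {"outfit": "uniform"}}
expr = [0.4, 0.1, 0.9]
frame = render_frame(map_to_standard(template["standard_vector"], expr),
                     template["image_params"])
```

With `weight=1.0` the mapped pose follows the agent exactly; a smaller weight would damp the motion toward the template's neutral pose.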
In a second aspect, the present application further provides a service video generation apparatus, including:
an acquisition module, configured to capture a live video stream of a human agent;
a processing module, configured to process the video stream to obtain expression information and/or action information;
a template acquisition module, configured to acquire a pre-configured digital human template; and
a mapping module, configured to map the expression information and/or action information onto the digital human template and render the mapped template to obtain a service video.
In a third aspect, the present application further provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method in any one of the above embodiments when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method in any of the above-described embodiments.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method in any of the embodiments described above.
With the above service video generation method, apparatus, device, storage medium, and program product, a live video stream of a human agent is captured, the stream is analyzed to obtain expression information and/or action information, and that information is mapped onto a digital human template to obtain a service video. The digital human thus stays highly consistent with the agent's dynamic expressions and actions, the agent can drive the digital human in real time or offline, human and machine appear together on the user's screen, and the user experience is improved.
Drawings
FIG. 1 is a diagram of an application environment of a service video generation method in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for service video generation, according to one embodiment;
FIG. 3 is a diagram illustrating an expression information obtaining step according to an embodiment;
FIG. 4 is a flowchart illustrating a service video generation method in another embodiment;
FIG. 5 is a block diagram showing the construction of a service video generating apparatus according to an embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The service video generation method provided by the embodiments of the application can be applied in the environment shown in fig. 1. The smart camera 102 communicates with the customer service platform 104 and the digital human configuration center 106, respectively. A data storage system may store the data the customer service platform 104 needs to process; it may be integrated on the customer service platform 104 or located on the cloud or another network server. The smart camera 102 captures a live video stream of a human agent; processes the stream to obtain expression information and/or action information; acquires a pre-configured digital human template; and maps the expression information and/or action information onto the digital human template, rendering the mapped template to obtain a service video. The digital human therefore stays highly consistent with the agent's dynamic expressions and actions, the agent can drive the digital human in real time or offline, human and machine appear together on the user's screen, and the user experience is improved.
The smart camera 102 is obtained by modifying a conventional camera: software implementing the service video generation method is added, together with hardware that supports running it, such as a CPU and memory. The customer service platform 104 and the digital human configuration center 106 may each be implemented as an independent server or as a cluster of multiple servers.
In one embodiment, as shown in fig. 2, a service video generation method is provided, which is described by taking the method as an example applied to the smart camera 102 in fig. 1, and includes the following steps:
s202: and collecting the dynamic video stream of the real seat.
Specifically, the dynamic video stream is obtained by shooting a real person seat when the intelligent camera provides seat service, wherein in order to drive a digital person through the real person seat, the digital person is mapped with a digital person shape image completed by three-dimensional modeling through real person dynamic image data acquisition such as body motion capture and facial expression extraction of the real person, and then the real person is highly restored through technologies such as voice synthesis, so that the digital person has the dynamic expression and motion keeping high consistency with the real person, and the effect that the real person can drive the digital person in real time or non-real time and is displayed on the same screen of a human machine of a user is achieved.
In order to reduce the cost and realize the method for driving the digital human by the human seat, the dynamic video stream is processed by the intelligent camera, so that the video stream is prevented from being sent to the cloud end and the cloud end processing result is prevented from being sent to the intelligent camera, the traditional camera is only required to be replaced by the intelligent camera, and even a cloud end server is replaced to perform analysis, rendering and other work, so that the cost is reduced, and the use experience is improved.
S204: process the video stream to obtain expression information and/or action information.
Specifically, the expression information refers to facial information of the human agent, which may represent the agent's emotion and the mouth movement corresponding to speech. The action information refers to the agent's movements, including limb actions such as the motion and rotation angle of each joint.
The smart camera may process the video stream with a pre-trained model to obtain the expression information and/or action information.
It should be noted that the smart camera may process every frame of the video stream to obtain expression information and/or action information. In other embodiments, the smart camera may instead identify target video frames in the stream and apply the model only to those target frames.
Because the smart camera itself extracts expressions and limb actions, only ordinary camera hardware needs to be modified at the hardware level to integrate the hardware this extraction requires. At the software level, capabilities such as data collection, data analysis, AI application, data conversion, and video rendering are realized and chained together automatically, end to end, while the externally exposed input and output remain unchanged, which greatly lowers the threshold for use and allows rapid adoption.
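The target-frame idea mentioned above, processing only selected frames instead of every frame, can be realized in its simplest form by sampling every Nth frame (the stride is an assumed parameter; the patent does not specify how target frames are chosen):

```python
# One simple way to pick "target frames": keep every Nth frame of the stream.
# Real selection could instead be motion- or content-driven.

def target_frames(stream, stride=3):
    return [f for i, f in enumerate(stream) if i % stride == 0]

frames = list(range(10))      # stand-in for decoded frames
targets = target_frames(frames)
```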
S206: acquire a pre-configured digital human template.
Specifically, the digital human template is a preset set of unified figure parameters, which can be configured per service type; for example, different digital human templates may be set for different service types. Further, the digital human configuration center may pre-configure multiple digital human templates and let the user select one as needed, so that when the user requests customer service, the template corresponding to that user is combined with the human agent to provide the service.
S208: map the expression information and/or action information onto the digital human template, and render the mapped template to obtain the service video.
Specifically, the smart camera first maps the expression information and/or action information onto the digital human template. When face driving and body driving are complete, real-time rendering of the digital human is triggered automatically, generating an agent-driven digital human video. This completes the whole flow from live video in to digital human video out. The output video can be transmitted through the smart camera's USB interface to a PC for direct use.
In practical application, the smart camera integrates everything the ordinary camera provided; no additional hardware is needed, and no extra cloud server is required for video analysis and digital human rendering, so the overall hardware cost is low.
With this service video generation method, a live video stream of the human agent is captured, the stream is analyzed to obtain expression information and/or action information, and that information is mapped onto a digital human template to obtain a service video, so that the digital human stays highly consistent with the agent's dynamic expressions and actions, the agent can drive the digital human in real time or offline, human and machine appear together on the user's screen, and the user experience is improved.
In one embodiment, after mapping the expression information and/or action information onto the digital human template and rendering the mapped template to obtain the service video, the method includes: sending the service video to the customer service platform over the transmission channel originally used by the conventional camera, so that the customer service platform sends the service video to the user terminal under a preset condition.
Specifically, this embodiment applies to an enterprise's customer-facing service: when a traditional customer service agent provides face-to-face video service to a customer (such as remote banking), the live video must be transmitted to a PC over USB and integrated into the customer service platform.
In this embodiment, the smart camera takes video in and outputs the final required video, so no real-time transmission to a cloud server is needed; this eliminates the bandwidth problem, shortens latency, and improves the experience. Because the smart camera is used exactly like a traditional camera, the traditional camera's integration scheme in the customer service system can be reused without secondary development or integration work, greatly reducing development cost.
In one embodiment, before acquiring the pre-configured digital human template, the method further includes: periodically downloading the digital human template from the digital human configuration center, the template having been pre-configured based on service information.
The digital human figure details used in driving are kept current by having the device regularly contact the digital human configuration center, which can push or serve the latest unified figure parameters. Pushing means that when the configuration center configures a new digital human template, it actively delivers it to the corresponding smart camera. Pulling means the smart camera actively requests new templates from the configuration center, for example periodically: it sends a template-update request each period, the configuration center checks whether templates changed during that period (additions, modifications, or deletions), and if so feeds the changes back to the smart camera. The templates on the smart camera can thus be kept up to date so that the generated service videos meet current requirements.
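The periodic pull described above, with the center reporting additions, modifications, and deletions, can be sketched as a sync step (the data shapes are invented for illustration):

```python
# Sketch of one sync cycle: reconcile the camera's local template set with
# the configuration center's current set. Wire format is an assumption.

def sync_templates(local, remote):
    # remote: template_id -> template dict; absence means deleted at the center.
    for tid, tpl in remote.items():
        local[tid] = tpl                      # add or modify
    for tid in list(local):
        if tid not in remote:
            del local[tid]                    # deleted at the center
    return local

local = {"t1": {"v": 1}, "t2": {"v": 1}}
remote = {"t1": {"v": 2}, "t3": {"v": 1}}     # t1 modified, t2 deleted, t3 added
synced = sync_templates(local, remote)
```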
When face driving and body driving are complete, real-time rendering of the digital human is triggered automatically, generating an agent-driven digital human video; this completes the flow from live video in to digital human video out, and the output video can be transmitted through the smart camera's USB interface to a PC for direct use.
The cloud configuration center paired with the smart camera is preferably a platform developed in Java for managing digital human figure design, production, and configuration. It delivers configuration updates in real time over TCP, supports multiple cameras operating in parallel, and supports unified figure management.
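Since configuration updates travel over TCP, a stream protocol, messages need framing; a common choice (assumed here, the patent does not specify the wire format) is a length prefix before each payload:

```python
import struct

# Length-prefixed framing for configuration updates over a TCP stream:
# a 4-byte big-endian length, then the payload bytes.

def frame(payload: bytes) -> bytes:
    return struct.pack(">I", len(payload)) + payload

def unframe(buf: bytes):
    # Returns (payload, remaining bytes of the stream buffer).
    (n,) = struct.unpack(">I", buf[:4])
    return buf[4:4 + n], buf[4 + n:]

msg = frame(b'{"template":"t1"}')
payload, rest = unframe(msg)
```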
In one embodiment, processing the video stream to obtain expression information includes: performing face recognition on a target frame of the video stream to obtain a face region; and extracting an expression vector of the face region as the expression information.
Specifically, in this embodiment, the expression information may be computed by a pre-trained model.
As shown in fig. 3, key points of the agent's face are first extracted, the agent's expression in the current frame is determined, and that expression is mapped to the digital human's expression to complete face driving. This process first requires face recognition: the video stream can be processed with conventional techniques or a newer face recognition model to locate the face in the stream, namely the agent's face. In other embodiments, recognition can be performed directly against the agent's photo on file to improve efficiency and accuracy. Once a face is detected in the image, the face region is cropped and passed to a feature extraction algorithm, which creates a multi-dimensional face-embedding vector representing the facial features. After the multi-dimensional vectors of the stream are collected, they are mapped to the face-embedding vectors the digital human supports, achieving expression-consistent driving.
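Mapping a face embedding onto the expressions the digital human supports can be illustrated as a nearest-neighbour lookup (the expression catalogue and 2-D embeddings below are illustrative, not the patent's data):

```python
import math

# Sketch: pick the supported expression whose embedding is closest to the
# live face embedding. Real embeddings would be high-dimensional.

def nearest_expression(embedding, supported):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(supported, key=lambda name: dist(embedding, supported[name]))

supported = {"neutral": [0.0, 0.0], "smile": [1.0, 0.2], "surprise": [0.1, 1.0]}
label = nearest_expression([0.9, 0.3], supported)
```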
In one embodiment, the method further includes: performing motion capture on a target frame of the video stream; and extracting a motion vector of the captured motion as the action information.
For limb-action extraction, the smart camera first extracts the motion and rotation angle of each joint of the agent's body, performs action analysis, and retargets the result onto the digital human's skeleton, restoring the digital human's body motion in real time and realizing body driving. Similar to the expression mapping principle, the smart camera captures the agent's motion features, converts them into multi-dimensional vectors, and maps those onto the body-motion vectors the digital human supports, achieving action-consistent driving. Because information such as bone lengths differs between the agent and the digital human, a deep learning model can be used in the retargeting process to transfer the motion trend of the skeleton, producing a smoother conversion.
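The bone-length problem in retargeting can be illustrated geometrically: transfer each bone's direction and rescale it to the target skeleton's length (a sketch of the geometric core only; the learned transfer mentioned above is not modeled):

```python
# Sketch of bone retargeting: the agent's and the digital human's bone
# lengths differ, so keep the source bone's direction but rescale it to the
# target skeleton's bone length.

def retarget_bone(src_start, src_end, dst_length):
    dx, dy = src_end[0] - src_start[0], src_end[1] - src_start[1]
    src_length = (dx * dx + dy * dy) ** 0.5
    scale = dst_length / src_length
    return (src_start[0] + dx * scale, src_start[1] + dy * scale)

# A 5-unit forearm pointing along +x, retargeted onto a 2-unit digital forearm.
new_end = retarget_bone((0.0, 0.0), (5.0, 0.0), 2.0)
```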
In one embodiment, the digital human template carries a standard vector and image data. Mapping the expression information and/or action information onto the template and rendering the mapped template to obtain the service video includes: mapping the expression vector corresponding to the expression information to the standard vector and/or mapping the action vector corresponding to the action information to the standard vector; and rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
Specifically, when face driving and body driving are complete, real-time rendering of the digital human is triggered automatically, generating an agent-driven digital human video; this completes the flow from live video in to digital human video out, and the output video can be transmitted through the smart camera's USB interface to a PC for direct use.
For ease of understanding, fig. 4 is a schematic diagram of the service video generation method in one embodiment. By introducing the smart camera, this embodiment solves, end to end, the cost, quality, and experience problems of driving a digital human figure from a live human figure in real time, and requires no secondary development, lowering the threshold for using the whole technical scheme in real service scenarios and enabling rapid rollout.
Specifically, with reference to fig. 4, the input of the overall scheme is a real-time live video of the agent, and the output is the digital human video generated by real-time driving, whose expressions and actions match the agent's in real time. The output video can be provided to ordinary PCs and other terminal devices through USB and integrated and used directly.
This embodiment is described as two main modules: a purely software cloud module and the smart camera, integrated into one scheme. The cloud module serves digital human figure design, production, and configuration management; the smart camera module undertakes all the work of driving the digital human from the agent, which is the key point of the scheme. The smart camera includes a traditional camera's basic video capture, storage, management, and transmission capabilities, plus the AI capability for end-to-end automation of the live-driving technique, and correspondingly integrates the hardware required to run that AI capability with the camera hardware. The automated operation inside the smart camera is as follows:
as with using traditional camera, after the installation intelligent camera, the real person sits in its the place ahead, opens the camera and can get into video acquisition. The collected video stream enters expression and motion capture in real time:
the expression information is obtained by firstly extracting key points of the real human face, judging the real human expression of the current frame, and mapping the real human expression with the expression of the digital human to finish the face driving; as shown in the following figure, this process requires advanced face recognition (whether the current recognition is a face or not, the model is mature), and after the face in the image is detected, we crop the face and send it to a feature extraction algorithm, which will create a multidimensional vector of face embedding (representing the face features). After the multi-dimensional vectors of the video are collected, the multi-dimensional vectors are mapped with the embedded multi-dimensional vectors of the face which can be supported by the digital person, and therefore expression consistent driving is achieved.
Limb-action capture comprises: extracting the motion and rotation angle of each joint of the agent's body, performing action analysis, and retargeting onto the digital human's skeleton to restore the digital human's body motion in real time and realize body driving. Similar to the expression mapping principle, the agent's motion features are captured, converted into multi-dimensional vectors, and mapped onto the body-motion vectors the digital human supports, achieving action-consistent driving. Because information such as bone lengths differs between the agent and the digital human, a deep learning model can be used in the retargeting process to transfer the motion trend of the skeleton, producing a smoother conversion.
Throughout this driving process, the device regularly contacts the cloud digital human configuration center, which can push or serve the latest unified figure parameters. When face driving and body driving are complete, real-time rendering of the digital human is triggered automatically, generating an agent-driven digital human video; this completes the flow from live video in to digital human video out, and the output video can be transmitted through the smart camera's USB interface to a PC for direct use.
The system is practically applied to the customer service of an enterprise, when the traditional customer service personnel provide face-to-face video service (such as a remote bank) for customers, real-time video of a real person needs to be transmitted to a PC (personal computer) through a USB (universal serial bus) and is integrated into a customer service platform, the input and output mode of the scheme is not different from that of the common camera, only the output content is changed from the original real-person video into the driven digital human video, the whole system is not changed for the butt joint of a customer service system, the system can be used after being opened, and the development cost is greatly reduced.
The above embodiment implements a smart camera for extracting expressions and body motions. At the hardware level, ordinary camera hardware is augmented with the hardware required for expression and motion extraction; at the software level, capabilities such as data acquisition, data analysis, AI application, data conversion, and video rendering are realized end to end and chained together automatically. The externally exposed input and output remain unchanged, which greatly lowers the threshold for use and enables rapid adoption. A matching cloud configuration center is also implemented: a platform, developed in Java, for managing the design, production, and configuration of digital-person avatars. It delivers configuration updates in real time over TCP, supports multiple cameras operating in parallel, and provides unified avatar management.
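The periodic synchronization with the cloud configuration center can be pictured as the camera merging versioned avatar-parameter updates into a local cache. The message format, field names, and versioning rule below are assumptions for illustration; the patent only specifies that updates are delivered over TCP:

```python
import json

# Illustrative sketch of how the smart camera might merge unified avatar
# parameters pushed by the cloud configuration center. The JSON shape and
# the "version" field are assumptions, not specified by the patent.

def apply_config_update(current_params, raw_message):
    """Merge a JSON configuration message into the locally cached avatar
    parameters, keeping only the newest version."""
    update = json.loads(raw_message)
    if update.get("version", -1) <= current_params.get("version", -1):
        return current_params            # stale update: keep current parameters
    merged = dict(current_params)
    merged.update(update)
    return merged

local = {"version": 3, "hairstyle": "short", "uniform": "blue"}
msg = '{"version": 4, "uniform": "navy"}'
local = apply_config_update(local, msg)
# local now carries version 4 with the updated uniform colour
```

Version-gated merging like this is one way to let many cameras poll the same center in parallel while still converging on a single unified avatar.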
In this way, the embodiment only requires upgrading the ordinary camera used in practice to an integrated smart camera; no other hardware is added, and no additional cloud server is needed for video analysis or digital-person rendering, so the overall hardware cost is low. Because the smart camera takes video in and outputs the final required video, no real-time transmission to a cloud server is needed, which eliminates the bandwidth problem, shortens latency, and improves the experience. The smart camera is used in exactly the same way as a traditional camera, so the traditional camera's integration scheme in the customer-service system can be reused without secondary development or integration work, greatly reducing development cost.
Compared with motion-capture-device schemes, no wearable device is needed, so the real customer-service agent's normal work is unaffected, a good state of mind is maintained, and overall service quality is improved.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a service video generation device for implementing the service video generation method mentioned above. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the service video generation device provided below can be referred to the limitations of the service video generation method in the foregoing, and details are not described herein again.
In one embodiment, as shown in fig. 5, there is provided a service video generating apparatus including: an acquisition module 501, a processing module 502, a template acquisition module 503 and a mapping module 504, wherein:
the acquisition module 501 is used for acquiring a dynamic video stream of a real-person agent;
a processing module 502, configured to process the dynamic video stream to obtain expression information and/or motion information;
the template obtaining module 503 is configured to obtain a preset configured digital human template;
and the mapping module 504 is configured to map the expression information and/or the action information to a digital human template, and render the mapped digital human template to obtain a service video.
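The four modules above can be sketched as a minimal pipeline. The class and method names mirror fig. 5, but the feature-extraction and rendering bodies are stand-ins for the real models:

```python
# Minimal sketch of the four-module device. The vector arithmetic is a
# placeholder for the actual expression/motion models and renderer.

class AcquisitionModule:
    def collect(self, camera_frames):
        return list(camera_frames)        # dynamic video stream of the agent

class ProcessingModule:
    def process(self, stream):
        # Stand-in feature extractor: one expression/action record per frame.
        return [{"expression": frame["face"], "action": frame.get("pose")}
                for frame in stream]

class TemplateModule:
    def fetch(self):
        # Preset digital-person template (standard vector + avatar parameters).
        return {"standard_vector": [0.0, 0.0], "image_params": {"style": "agent"}}

class MappingModule:
    def render(self, features, template):
        # Map each feature onto the template and "render" a video frame.
        return [{"frame": f, "params": template["image_params"]} for f in features]

stream = AcquisitionModule().collect([{"face": [0.2, 0.8]}])
features = ProcessingModule().process(stream)
video = MappingModule().render(features, TemplateModule().fetch())
```

The separation matters mostly for the device claim: acquisition, processing, template fetching, and mapping/rendering can each be replaced independently.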
In one embodiment, the service video generating apparatus further includes:
and the sending module is used for sending the service video to the customer service platform through a transmission channel between the original camera and the customer service platform, so that the customer service platform sends the service video to the user terminal under the preset condition.
In one embodiment, the service video generating apparatus further includes:
and the downloading module is used for periodically downloading the digital human template from the digital human configuration center, and the digital human template is obtained by pre-configuration based on the service information.
In one embodiment, the processing module 502 includes:
the first identification unit is used for carrying out face identification on a target frame in the dynamic video stream to obtain a face area;
the first extraction unit is used for extracting expression vectors of the face area as expression information;
in one embodiment, the processing module 502 further includes:
the second identification unit is used for capturing the motion of the target frame in the dynamic video stream;
a second extraction unit for extracting a motion vector of the captured motion as motion information.
In one embodiment, the mapping module 504 includes:
the mapping unit is used for mapping the expression vector corresponding to the expression information with the standard vector and/or mapping the action vector corresponding to the action information with the standard vector;
and the rendering unit is used for rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
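One plausible reading of the mapping unit is that the captured expression vector is expressed as weights over the standard vectors supported by the digital person template, in the spirit of blendshape coefficients, and those weights are then handed to the renderer. The per-axis least-squares projection below is an illustrative choice, not something the patent mandates:

```python
# Hedged sketch of the mapping unit: project a captured expression vector
# onto the digital person's supported "standard" expression basis.
# The orthogonal-basis assumption is for illustration only.

def map_to_standard(expression_vec, standard_basis):
    """Express the captured vector as weights over the standard basis using
    a simple per-axis least-squares fit (basis assumed orthogonal here)."""
    weights = []
    for basis_vec in standard_basis:
        dot = sum(e * b for e, b in zip(expression_vec, basis_vec))
        norm = sum(b * b for b in basis_vec)
        weights.append(dot / norm if norm else 0.0)
    return weights

smile = [1.0, 0.0]                  # captured expression vector (illustrative)
basis = [[2.0, 0.0], [0.0, 1.0]]    # digital person's supported expressions
weights = map_to_standard(smile, basis)
```

The rendering unit would then combine these weights with the template's image parameters to produce the final frame.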
The modules in the service video generation device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a service video generation method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
Those skilled in the art will appreciate that the structure shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, and the processor implementing the following steps when executing the computer program: collecting a dynamic video stream of a real-person agent; processing the dynamic video stream to obtain expression information and/or action information; acquiring a preset configured digital person template; and mapping the expression information and/or the action information to the digital person template, and rendering the mapped digital person template to obtain a service video.
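The four steps executed by the processor can be sketched end to end as follows; the extraction and rendering callables are placeholders for the model-dependent parts and are not defined by the patent:

```python
# End-to-end sketch of the four claimed steps. The callables passed in are
# assumptions standing in for the real feature extractor and renderer.

def generate_service_video(video_stream, extract_features, template, render):
    features = [extract_features(frame) for frame in video_stream]   # step 2
    mapped = [{**template, "feature": f} for f in features]          # step 4a: map onto template
    return [render(m) for m in mapped]                               # step 4b: render

video = generate_service_video(
    video_stream=[{"face": "smile"}],                      # step 1: collected stream
    extract_features=lambda fr: {"expression": fr["face"]},
    template={"avatar": "bank-agent"},                     # step 3: preset template
    render=lambda m: ("frame", m["feature"]["expression"]),
)
```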
In one embodiment, after the mapping of the expression information and/or the action information to the digital person template and the rendering of the mapped digital person template to obtain the service video, the processor, when executing the computer program, further implements: sending the service video to the customer service platform through a transmission channel between the original camera and the customer service platform, so that the customer service platform sends the service video to the user terminal under a preset condition.
In one embodiment, before the obtaining of the preset configured digital human template, the processor, when executing the computer program, further includes: and periodically downloading the digital human template from the digital human configuration center, wherein the digital human template is obtained by pre-configuration based on the service information.
In one embodiment, the processing of the dynamic video stream to obtain the expression information, implemented when the processor executes the computer program, includes: performing face recognition on a target frame in the dynamic video stream to obtain a face area; and extracting an expression vector of the face area as the expression information.
in one embodiment, the processor, when executing the computer program, further performs the steps of: performing motion capture on a target frame in the dynamic video stream; a motion vector of the captured motion is extracted as motion information.
In one embodiment, the processor executes the computer program with a digital human template carrying standard vectors and image data; the mapping of expression information and/or action information to a digital human template and the rendering of the mapped digital human template to obtain a service video, which are realized when a processor executes a computer program, comprises: mapping the expression vector corresponding to the expression information with a standard vector and/or mapping the action vector corresponding to the action information with a standard vector; and rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which, when executed by a processor, implements the following steps: collecting a dynamic video stream of a real-person agent; processing the dynamic video stream to obtain expression information and/or action information; acquiring a preset configured digital person template; and mapping the expression information and/or the action information to the digital person template, and rendering the mapped digital person template to obtain a service video.
In one embodiment, after the computer program, when executed by the processor, maps the expression information and/or the action information to the digital person template and renders the mapped digital person template to obtain the service video, the following is further implemented: sending the service video to the customer service platform through a transmission channel between the original camera and the customer service platform, so that the customer service platform sends the service video to the user terminal under a preset condition.
In one embodiment, before the computer program is executed by a processor to obtain the preset configured digital human template, the method further includes: and periodically downloading the digital human template from the digital human configuration center, wherein the digital human template is obtained by pre-configuration based on the service information.
In one embodiment, the processing of the dynamic video stream to obtain the expression information, implemented when the computer program is executed by the processor, includes: performing face recognition on a target frame in the dynamic video stream to obtain a face area; and extracting an expression vector of the face area as the expression information.
in one embodiment, the computer program when executed by the processor further performs the steps of: performing motion capture on a target frame in the dynamic video stream; a motion vector of the captured motion is extracted as motion information.
In one embodiment, the digital human template involved in the execution of the computer program by the processor carries a standard vector and image data; the computer program, when executed by a processor, maps expression information and/or motion information to a digital human template, and renders the mapped digital human template to obtain a service video, including: mapping the expression vector corresponding to the expression information with a standard vector and/or mapping the action vector corresponding to the action information with a standard vector; and rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
In one embodiment, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the following steps: collecting a dynamic video stream of a real-person agent; processing the dynamic video stream to obtain expression information and/or action information; acquiring a preset configured digital person template; and mapping the expression information and/or the action information to the digital person template, and rendering the mapped digital person template to obtain a service video.
In one embodiment, after the computer program, when executed by the processor, maps the expression information and/or the action information to the digital person template and renders the mapped digital person template to obtain the service video, the following is further implemented: sending the service video to the customer service platform through a transmission channel between the original camera and the customer service platform, so that the customer service platform sends the service video to the user terminal under a preset condition.
In one embodiment, before the computer program is executed by a processor to obtain the preset configured digital human template, the method further includes: and periodically downloading the digital human template from the digital human configuration center, wherein the digital human template is obtained by pre-configuration based on the service information.
In one embodiment, the processing of the dynamic video stream to obtain the expression information, implemented when the computer program is executed by the processor, includes: performing face recognition on a target frame in the dynamic video stream to obtain a face area; and extracting an expression vector of the face area as the expression information.
in one embodiment, the computer program when executed by the processor further performs the steps of: performing motion capture on a target frame in the dynamic video stream; a motion vector of the captured motion is extracted as motion information.
In one embodiment, the digital human template involved in the execution of the computer program by the processor carries a standard vector and image data; the computer program, when executed by a processor, maps expression information and/or motion information to a digital human template, and renders the mapped digital human template to obtain a service video, including: mapping the expression vector corresponding to the expression information with a standard vector and/or mapping the action vector corresponding to the action information with a standard vector; and rendering according to the mapped standard vector and the image parameters of the digital human template to obtain the service video.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, etc., without limitation.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A service video generation method is applied to an intelligent camera, and comprises the following steps:
collecting a dynamic video stream of a real-person agent;
processing the dynamic video stream to obtain expression information and/or action information;
acquiring a preset configured digital human template;
and mapping the expression information and/or the action information to a digital person template, and rendering the mapped digital person template to obtain a service video.
2. The method of claim 1, wherein after the mapping of the expression information and/or the action information to the digital person template and the rendering of the mapped digital person template to obtain the service video, the method further comprises:
and sending the service video to a customer service platform through a transmission channel between the original camera and the customer service platform, so that the customer service platform sends the service video to a user terminal under a preset condition.
3. The method of claim 1, wherein before obtaining the preset configuration of the digital human template, further comprising:
and periodically downloading a digital person template from a digital person configuration center, wherein the digital person template is obtained by pre-configuration based on the service information.
4. The method according to any one of claims 1 to 3, wherein the processing the dynamic video stream to obtain expression information comprises:
carrying out face recognition on a target frame in the dynamic video stream to obtain a face area;
and extracting the expression vector of the face area as expression information.
5. The method of claim 4, further comprising:
performing motion capture on a target frame in the dynamic video stream;
a motion vector of the captured motion is extracted as motion information.
6. The method of claim 5, wherein the digital human template carries a standard vector and image data; the mapping the expression information and/or the action information to a digital person template, and rendering the mapped digital person template to obtain a service video includes:
mapping an expression vector corresponding to the expression information with the standard vector and/or mapping an action vector corresponding to the action information with the standard vector;
and rendering according to the mapped standard vector and the image parameters of the digital human template to obtain a service video.
7. A service video generation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a dynamic video stream of a real-person agent;
the processing module is used for processing the dynamic video stream to obtain expression information and/or action information;
the template acquisition module is used for acquiring a preset configured digital human template;
and the mapping module is used for mapping the expression information and/or the action information to a digital human template and rendering the mapped digital human template to obtain a service video.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202111598400.3A 2021-12-24 2021-12-24 Service video generation method, device, equipment, storage medium and program product Pending CN114422862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111598400.3A CN114422862A (en) 2021-12-24 2021-12-24 Service video generation method, device, equipment, storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111598400.3A CN114422862A (en) 2021-12-24 2021-12-24 Service video generation method, device, equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN114422862A true CN114422862A (en) 2022-04-29

Family

ID=81269335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111598400.3A Pending CN114422862A (en) 2021-12-24 2021-12-24 Service video generation method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN114422862A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023241289A1 (en) * 2022-06-13 2023-12-21 中兴通讯股份有限公司 Method and device for generating virtual reality service video, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109285208A (en) * 2018-09-29 2019-01-29 吉林动画学院 Virtual role expression cartooning algorithm based on expression dynamic template library
CN110349232A (en) * 2019-06-17 2019-10-18 达闼科技(北京)有限公司 Generation method, device, storage medium and the electronic equipment of image
CN110557625A (en) * 2019-09-17 2019-12-10 北京达佳互联信息技术有限公司 live virtual image broadcasting method, terminal, computer equipment and storage medium
CN110688911A (en) * 2019-09-05 2020-01-14 深圳追一科技有限公司 Video processing method, device, system, terminal equipment and storage medium
CN110889382A (en) * 2019-11-29 2020-03-17 深圳市商汤科技有限公司 Virtual image rendering method and device, electronic equipment and storage medium
CN110971930A (en) * 2019-12-19 2020-04-07 广州酷狗计算机科技有限公司 Live virtual image broadcasting method, device, terminal and storage medium
CN111372113A (en) * 2020-03-05 2020-07-03 成都威爱新经济技术研究院有限公司 User cross-platform communication method based on digital human expression, mouth shape and sound synchronization
CN112667068A (en) * 2019-09-30 2021-04-16 北京百度网讯科技有限公司 Virtual character driving method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110850983B (en) Virtual object control method and device in video live broadcast and storage medium
CN108961369B (en) Method and device for generating 3D animation
US20200234034A1 (en) Systems and methods for face reenactment
WO2021155686A1 (en) Animation production method and apparatus, computing device, and storage medium
US11670015B2 (en) Method and apparatus for generating video
WO2021143278A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN110390704A (en) Image processing method, device, terminal device and storage medium
JP2022503647A (en) Cross-domain image conversion
JP7412847B2 (en) Image processing method, image processing device, server, and computer program
CN107257403A (en) Use the communication of interaction incarnation
CN109242940B (en) Method and device for generating three-dimensional dynamic image
CN109472764B (en) Method, apparatus, device and medium for image synthesis and image synthesis model training
CN109035415B (en) Virtual model processing method, device, equipment and computer readable storage medium
CN115346262A (en) Method, device and equipment for determining expression driving parameters and storage medium
CN114202615A (en) Facial expression reconstruction method, device, equipment and storage medium
CN109413152A (en) Image processing method, device, storage medium and electronic equipment
CN114422862A (en) Service video generation method, device, equipment, storage medium and program product
CN114399424A (en) Model training method and related equipment
US11430158B2 (en) Intelligent real-time multiple-user augmented reality content management and data analytics system
CN113822114A (en) Image processing method, related equipment and computer readable storage medium
CN113610953A (en) Information processing method and device and computer readable storage medium
CN113962417A (en) Video processing method and device, electronic equipment and storage medium
CN116630508A (en) 3D model processing method and device and electronic equipment
CN108958571B (en) Three-dimensional session data display method and device, storage medium and computer equipment
CN115690276A (en) Video generation method and device of virtual image, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination