CN111260756A - Method and apparatus for transmitting information - Google Patents

Method and apparatus for transmitting information

Info

Publication number
CN111260756A
CN111260756A (application CN201811459739.3A; granted publication CN111260756B)
Authority
CN
China
Prior art keywords
image
face image
sample
expression
face
Prior art date
Legal status
Granted
Application number
CN201811459739.3A
Other languages
Chinese (zh)
Other versions
CN111260756B (en)
Inventor
朱祥祥
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from CN201811459739.3A
Publication of CN111260756A
Application granted
Publication of CN111260756B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/60: Editing figures and text; Combining figures or text
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition

Abstract

The embodiment of the application discloses a method and a device for sending information. One embodiment of the method comprises: receiving an original image containing a first face image and information to be processed composed of at least two static images; for each static image of the at least two static images, performing the following operations: in response to determining that the static image contains a face image, taking the face image contained in the static image as a second face image; processing the first face image based on the second face image; and replacing the second face image in the static image with the processed first face image; and in response to determining that replacement of each static image containing a face image in the information to be processed is completed, sending the replaced information to be processed. The embodiment thus replaces the second face image contained in the information to be processed based on the first face image contained in the original image.

Description

Method and apparatus for transmitting information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for sending information.
Background
With the continuous development of internet technology, information resources have become increasingly rich. At present, users can obtain various pictures and videos through the internet. In practice, some of the information displayed in these pictures and videos may need to be replaced according to actual needs. Taking a picture of a person as an example, a picture of a specific part (for example, a head, a face, a body, and the like) can be used to directly replace the corresponding part of the person in the picture, and this approach can achieve a fairly good replacement effect for a still image. However, since the positions of the persons in moving pictures and videos change dynamically, directly replacing a specific part with a picture often causes the replaced moving picture or video to lack vividness.
Disclosure of Invention
The embodiment of the application provides a method and a device for sending information.
In a first aspect, an embodiment of the present application provides a method for sending information, where the method includes: receiving an original image containing a first face image and information to be processed composed of at least two static images; for a static image of the at least two static images, performing the following operations: in response to determining that the static image contains a face image, taking the face image contained in the static image as a second face image; processing the first face image based on the second face image; replacing the second face image in the static image with the processed first face image; and in response to determining that replacement of each static image containing a face image in the information to be processed is completed, sending the replaced information to be processed.
In some embodiments, the processing the first face image based on the second face image includes: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some embodiments, the processing the first face image based on the second face image includes: inputting the first facial image and the second facial image into a pre-established expression recognition model respectively to obtain expression categories of the first facial image and the second facial image, wherein the expression recognition model is used for representing the corresponding relation between the facial image and the expression categories; and in response to determining that the expression categories of the first facial image and the second facial image are matched, taking the first facial image as a processed first facial image.
In some embodiments, the processing the first face image based on the second face image further includes: and in response to determining that the expression types of the first face image and the second face image are not matched, inputting the expression types of the first face image and the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing the corresponding relation between the face image and the expression types and the generated face image.
In some embodiments, the expression recognition model is trained by: acquiring a first training sample set, wherein the first training sample set comprises a face image and an expression category corresponding to the face image; and taking the facial image of the first training sample in the first training sample set as input, taking the expression type corresponding to the input facial image as expected output, and training to obtain the expression recognition model.
In some embodiments, the image generation model is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking the sample face image and the sample expression category of the second training sample in the second training sample set as input, taking the sample generation face image corresponding to the input sample face image and the sample expression category as expected output, and training to obtain the image generation model.
In a second aspect, an embodiment of the present application provides an apparatus for transmitting information, where the apparatus includes: a receiving unit configured to receive an original image including a first face image and information to be processed composed of at least two still images; an execution unit configured to execute a predetermined operation with respect to a still image of the at least two still images, wherein the execution unit includes: a determination unit configured to, in response to determining that the face image is included in the still image, take the face image included in the still image as a second face image; a processing unit configured to process the first face image based on the second face image; a replacement unit configured to replace the second face image in the still image with the processed first face image; and the sending unit is configured to respond to the fact that the replacement of each static image containing the face image in the information to be processed is completed, and send the replaced information to be processed.
In some embodiments, the processing unit is further configured to: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some embodiments, the processing unit is further configured to: inputting the first facial image and the second facial image into a pre-established expression recognition model respectively to obtain expression categories of the first facial image and the second facial image, wherein the expression recognition model is used for representing the corresponding relation between the facial image and the expression categories; and in response to determining that the expression categories of the first facial image and the second facial image are matched, taking the first facial image as a processed first facial image.
In some embodiments, the processing unit is further configured to: and in response to determining that the expression types of the first face image and the second face image are not matched, inputting the expression types of the first face image and the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing the corresponding relation between the face image and the expression types and the generated face image.
In some embodiments, the expression recognition model is trained by: acquiring a first training sample set, wherein the first training sample set comprises a face image and an expression category corresponding to the face image; and taking the facial image of the first training sample in the first training sample set as input, taking the expression type corresponding to the input facial image as expected output, and training to obtain the expression recognition model.
In some embodiments, the image generation model is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking the sample face image and the sample expression category of the second training sample in the second training sample set as input, taking the sample generation face image corresponding to the input sample face image and the sample expression category as expected output, and training to obtain the image generation model.
In a third aspect, an embodiment of the present application provides an apparatus, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
The method and the device for sending information provided by the embodiment of the application first receive an original image containing a first face image and information to be processed composed of at least two static images, then perform the following operations on each static image of the at least two static images: in response to determining that the static image contains a face image, taking the face image contained in the static image as a second face image, processing the first face image based on the second face image, and replacing the second face image in the static image with the processed first face image; and finally, in response to determining that replacement of each static image containing a face image in the information to be processed is completed, send the replaced information to be processed, so that replacement of the second face image contained in the information to be processed based on the first face image contained in the original image is realized.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for transmitting information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for transmitting information according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for transmitting information according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for transmitting information in accordance with the present application;
FIG. 6 is a block diagram of a computer system suitable for use in implementing the apparatus of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which a method for transmitting information or an apparatus for transmitting information of an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an image processing application, a video editing application, a web browser application, a search application, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image processing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for information displayed on the terminal devices 101, 102, 103. The backend server may analyze and otherwise process the received information such as the image, and feed back the processing result (e.g., the processed information) to the terminal apparatuses 101, 102, and 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
The method for sending information provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105. Accordingly, the means for transmitting information may be provided in the terminal devices 101, 102, 103, or in the server 105. This is not limited in this application.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for transmitting information in accordance with the present application is shown. The method for transmitting information comprises the following steps:
step 201, receiving an original image containing a first face image and information to be processed composed of at least two static images.
In the present embodiment, the execution subject of the method for transmitting information (e.g., the terminal apparatus 101, 102, 103 or the server 105 shown in fig. 1) may receive the original image and the information to be processed in various ways. For example, when the execution subject is a terminal device, an original image and information to be processed input by a user may be directly received. For another example, when the execution subject is a server, the original image and the information to be processed may be received from a terminal device with which the user inputs information. Here, the original image may include a first face image, and the original image may be, for example, a full-body photograph, a half-body photograph, or the like of a person. The information to be processed may be composed of at least two still images, and as an example, the information to be processed may be a moving picture such as a GIF (Graphics Interchange Format) image. A motion picture is a picture that produces some dynamic effect when a specific set of still images is switched at a specified frequency. As another example, the information to be processed may also be a piece of video.
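As a purely illustrative sketch, the step of decomposing a moving picture such as a GIF into its constituent still images may, for example, be realized in Python with the Pillow library as follows; the library choice, file name and function names are assumptions for illustration only and are not part of this embodiment.

```python
# Illustrative sketch: splitting a GIF-style piece of "information to be
# processed" into its constituent still images, assuming Pillow is available.
from PIL import Image, ImageSequence

def split_into_still_images(gif_path):
    """Return the still frames that make up a moving picture (GIF)."""
    with Image.open(gif_path) as moving_picture:
        # Each frame of the GIF corresponds to one "still image"
        # in the sense used by this embodiment.
        return [frame.convert("RGB")
                for frame in ImageSequence.Iterator(moving_picture)]

frames = split_into_still_images("to_be_processed.gif")  # hypothetical file
print(f"the information to be processed consists of {len(frames)} still images")
```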
In practice, after the execution subject receives the original image, the execution subject may perform face detection on the original image, so as to obtain a first face image. It should be noted that the face detection technology is a well-known technology widely studied and applied at present, and is not described herein again.
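A minimal sketch of this face detection step, assuming OpenCV's Haar-cascade detector as one example of the well-known face detection technology mentioned above; all names are illustrative.

```python
# Illustrative sketch: obtaining the first face image from the original image
# via face detection, using OpenCV's Haar-cascade detector as one example.
import cv2

def extract_first_face_image(original_bgr):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # the original image contains no detectable face
    x, y, w, h = faces[0]
    return original_bgr[y:y + h, x:x + w]  # crop containing the first face image
```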
In step 202, a predetermined operation is performed on a static image of the at least two static images.
In the present embodiment, the execution main body may execute a predetermined operation for each of at least two still images constituting information to be processed. Wherein the predetermined operation may include the steps of:
step 2021, in response to determining that the still image contains the face image, taking the face image contained in the still image as the second face image.
In this embodiment, the executing entity may perform face detection on the still image, and determine whether the still image includes a face image according to a face detection result. And in response to determining that the static image contains the face image, taking the face image contained in the static image as a second face image.
Step 2022, process the first facial image based on the second facial image.
In this embodiment, the execution subject may perform various kinds of processing on the first face image based on the second face image. As an example, the execution subject may adjust information such as the angle of the face, the state of the eyes, the state of the mouth, and the lighting in the first face image according to the corresponding information in the second face image, for example the angle of the face (e.g., frontal face, profile, head raised, head lowered, and the like), the state of the eyes (e.g., eyes open, squinting, eyes closed, and the like), the state of the mouth (e.g., mouth open, mouth closed, and the like), and the lighting. Taking the angle of the face as an example, assuming that the face in the second face image is shown in profile and the face in the first face image is shown frontally, the face pose of the first face image can be corrected by means of affine transformation, perspective transformation and the like, so that the angle of the face in the first face image is adjusted to a profile view. It should be noted that face pose correction based on affine transformation, perspective transformation, and the like is a well-known technology that is widely studied and applied at present, and is not described here again.
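As an illustrative sketch of this pose-adjustment idea, assuming that corresponding face key points are available for both face images, an affine transform can be estimated and applied with OpenCV; this is only one possible realization, and all names are assumptions.

```python
# Illustrative sketch: correct the pose of the first face image so that it
# approaches the pose of the second face image, using an affine transform
# estimated from corresponding face key points (N x 2 arrays, same order).
import cv2
import numpy as np

def align_first_to_second(first_face, first_pts, second_pts, out_size):
    first_pts = np.asarray(first_pts, dtype=np.float32)
    second_pts = np.asarray(second_pts, dtype=np.float32)
    # Partial affine (rotation, scale, translation) mapping first -> second.
    matrix, _ = cv2.estimateAffinePartial2D(first_pts, second_pts)
    return cv2.warpAffine(first_face, matrix, out_size)  # out_size = (width, height)
```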
In some optional implementations of this embodiment, the step 2022 may specifically include the following:
in step S1, the executing entity may input the first facial image and the second facial image into a pre-established expression recognition model respectively to obtain expression categories of the first facial image and the second facial image.
In this implementation, the expression category may be used to represent the category of the expression presented by the face in the face image. As an example, facial expressions can be classified into different categories in advance according to actual needs, for example, into neutral, happy, surprised, afraid, angry and the like according to the emotion expressed. It can be understood that the more finely the facial expressions are divided into categories, the better the effect of the information to be processed obtained after the face image is replaced.
Here, the expression recognition model may be used to represent the correspondence between face images and expression categories. As an example, the expression recognition model described above may include a feature extraction section and a first correspondence table. The feature extraction part can be used to extract feature information from a face image. The first correspondence table may be a correspondence table in which correspondences between a plurality of pieces of feature information and expression categories are stored, prepared by a technician based on statistics of a large amount of feature information and expression categories. In this way, for a given face image, the expression recognition model may first extract feature information of the face image using the feature extraction unit and use the obtained feature information as target feature information. The target feature information is then compared with the feature information in the first correspondence table, and if the target feature information is the same as or similar to a certain piece of feature information in the first correspondence table, the expression category corresponding to that piece of feature information in the first correspondence table is taken as the expression category of the face image.
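A minimal sketch of this table-lookup variant, assuming a placeholder feature extraction function and a "first correspondence table" stored as parallel arrays; the names and the nearest-neighbour comparison are illustrative assumptions.

```python
# Illustrative sketch of the table-lookup variant of the expression
# recognition model: a feature extractor plus a "first correspondence table"
# mapping stored feature vectors to expression categories.
import numpy as np

def recognize_expression(face_image, table_features, table_categories,
                         extract_features):
    target = extract_features(face_image)            # target feature information
    dists = np.linalg.norm(table_features - target, axis=1)
    nearest = int(np.argmin(dists))                   # most similar table entry
    return table_categories[nearest]                  # e.g. "happy"
```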
In some optional implementations, the expression recognition model may be trained by: first, a first training sample set is obtained, wherein the first training sample may include a facial image and an expression category corresponding to the facial image. Then, the facial image of the first training sample in the first training sample set is used as input, the expression category corresponding to the input facial image is used as expected output, and the expression recognition model is obtained through training.
In this implementation manner, the execution subject of the training expression recognition model may be the same as or different from the above subject. As an example, an executive who trains the expression recognition model may first determine a first initial model and model parameters of the first initial model. Here, the first initial model may be used to represent the correspondence between the facial image and the expression category, and the first initial model may be a convolutional neural network, a deep neural network, or other various machine learning models. Then, the facial image in the first training sample set may be input into the first initial model to obtain an expression category of the facial image, the expression category corresponding to the facial image is taken as an expected output of the first initial model, and the first initial model is trained by using a machine learning method. Specifically, the difference between the resulting expression category and the desired output may first be calculated using a preset loss function. Then, based on the calculated difference, the model parameters of the first initial model are adjusted, and in the case that a preset training end condition is met, the training is ended, so that the expression recognition model is obtained. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds the preset duration, the training times exceeds the preset times, the prediction accuracy of the first initial model is greater than a preset accuracy threshold, and the like.
Here, various implementations may be employed to adjust model parameters of the first initial model based on differences between the generated expression categories and the desired output. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm may be used to adjust the model parameters of the first initial model.
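A minimal PyTorch-style sketch of this training procedure, assuming the first initial model is an arbitrary classification network and the first training sample set is wrapped in a data loader; the loss function, optimizer and end-of-training condition shown are illustrative choices, not part of this embodiment.

```python
# Illustrative sketch: training the first initial model with a preset loss
# function and SGD, adjusting model parameters by back propagation.
import torch
import torch.nn as nn

def train_expression_recognition_model(model, data_loader,
                                        num_epochs=10, lr=0.01):
    loss_fn = nn.CrossEntropyLoss()                 # preset loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(num_epochs):                     # simple end-of-training condition
        for face_images, expression_labels in data_loader:
            predictions = model(face_images)        # predicted expression categories
            loss = loss_fn(predictions, expression_labels)
            optimizer.zero_grad()
            loss.backward()                         # back propagation
            optimizer.step()                        # adjust model parameters
    return model
```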
Step S2, in response to determining that the expression categories of the first face image and the second face image match, takes the first face image as a processed first face image.
Here, the execution subject may determine whether expression categories of the first face image and the second face image match. In response to determining that the expression categories of the first facial image and the second facial image match (e.g., are the same), the first facial image may be treated as a processed first facial image. That is, when the expression categories of the first facial image and the second facial image are matched, the facial information in the first facial image does not need to be adjusted, and the first facial image is directly used as the processed first facial image.
In some optional implementations, the step 2022 may further include the following:
step S3, in response to determining that the expression categories of the first face image and the second face image do not match, inputting the expression categories of the first face image and the second face image into a pre-established image generation model to obtain a generated face image, and using the generated face image as the processed first face image.
Here, the image generation model may be used to represent the correspondence between input information and generated face images, where the input information may include a face image and an expression category. As an example, the image generation model described above may include a feature extraction section and a second correspondence table. The feature extraction part may be configured to extract feature information from the face image. The second correspondence table may be a correspondence table in which correspondences between a plurality of pieces of input information and generated images are stored, prepared by a technician based on statistics of a large amount of input information and generated images, the input information including the feature information and the expression category of the face image. In this way, for a given input face image and expression category, the image generation model may first extract feature information from the face image using the feature extraction unit to obtain input information, and use the obtained input information as target input information. The target input information is then compared with the input information in the second correspondence table, and if the target input information is the same as or similar to a certain piece of input information in the second correspondence table, the generated image corresponding to that piece of input information in the second correspondence table is taken as the generated image for the input.
Optionally, the image generation model may be trained in the following manner: first, a second training sample set is obtained, where the second training sample may include a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category. The sample face image and the sample generated face image are face images of the same person, and the expression category of the sample generated face image is matched with (for example, the same as) the sample expression category. Then, taking the sample face image and the sample expression category of the second training sample in the second training sample set as input, taking the sample generation face image corresponding to the input sample face image and the sample expression category as expected output, and training to obtain an image generation model.
As an example, an executive who trains the image generation model may first determine a second initial model and model parameters of the second initial model. Here, the second initial model may be used to characterize the correspondence between input information and generated face images, where the input information may include a sample face image and a sample expression category. The second initial model may be a convolutional neural network, a deep neural network, or another of various machine learning models. Then, the sample face image and the sample expression category in the second training sample set may be input into the second initial model to obtain a generated face image, the sample generation face image corresponding to the input sample face image and sample expression category is taken as the expected output of the second initial model, and the second initial model is trained using a machine learning method. Specifically, the difference between the resulting generated face image and the desired output may first be calculated using a preset loss function. Then, based on the calculated difference, the model parameters of the second initial model may be adjusted, and the training is ended when a preset training end condition is satisfied, so as to obtain the image generation model. For example, the preset training end condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration, the number of training iterations exceeds a preset number, the generation accuracy of the second initial model is greater than a preset accuracy threshold, and the like.
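As an illustrative sketch only, one possible form of the second initial model is an encoder-decoder network that conditions on the target expression category; the layer sizes and the number of expression categories below are assumptions, not part of this embodiment.

```python
# Illustrative sketch of one possible second initial model: an encoder-decoder
# that takes a face image plus a target expression category and outputs a
# generated face image. Layer sizes and category count are assumptions.
import torch
import torch.nn as nn

class ConditionalFaceGenerator(nn.Module):
    def __init__(self, num_expressions=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
        self.embed = nn.Embedding(num_expressions, 64)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, face_image, expression_category):
        features = self.encoder(face_image)                 # B x 64 x H/4 x W/4
        expr = self.embed(expression_category)              # B x 64
        expr = expr[:, :, None, None].expand_as(features)   # broadcast over space
        return self.decoder(torch.cat([features, expr], dim=1))
```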
At step 2023, the processed first facial image is used to replace the second facial image in the still image.
In this embodiment, the executing subject may replace the second face image in the still image with the first face image processed in step 2022. It will be appreciated that the size or the like of the processed first face image may need to be adjusted before replacing the second face image in the still image with the processed first face image. After the replacement, the replaced image can be subjected to seamless fusion, sharpening and other processing, so that the fusion effect between the edge of the face image and the background image is ensured.
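A minimal sketch of this replacement and seamless-fusion step, assuming the face rectangle from the earlier detection step and using OpenCV's Poisson (seamless) cloning as one possible fusion method; all names are illustrative.

```python
# Illustrative sketch: paste the processed first face image over the second
# face region and blend it with Poisson (seamless) cloning.
import cv2
import numpy as np

def replace_face(still_image, processed_first_face, face_rect):
    x, y, w, h = face_rect
    resized = cv2.resize(processed_first_face, (w, h))     # match the region size
    mask = np.full((h, w), 255, dtype=np.uint8)            # blend the whole patch
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(resized, still_image, mask,
                             center, cv2.NORMAL_CLONE)
```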
And step 203, in response to determining that the replacement of each static image containing the face image in the information to be processed is completed, sending the replaced information to be processed.
In this embodiment, the execution subject may determine whether or not replacement of each still image containing a face image in the information to be processed has been completed. In response to determining that the replacement of each still image containing a face image in the information to be processed is completed, the execution subject may send the information to be processed after the replacement is completed. As an example, when the execution subject is a terminal device, the information to be processed after completion of replacement may be sent to a display device for display. When the execution subject is a server, the replaced information to be processed may be sent to the terminal device that the user used to send the original image and the information to be processed.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for transmitting information according to the present embodiment. In the application scenario of fig. 3, the terminal device 301 first receives an original image containing a first face image and a moving picture composed of 5 still images sent by a user. Then, for each still image in the moving picture, the terminal device 301, in response to determining that a face image is included in the still image, takes the face image included in the still image as a second face image, and processes the first face image based on the second face image, and replaces the second face image in the still image with the processed first face image. And finally, in response to the fact that the replacement of each static image containing the face image in the dynamic image is completed, sending the replaced dynamic image to a display for displaying.
The method provided by the above embodiment of the present application replaces the second face image included in the to-be-processed information based on the first face image included in the original image, and since the first face image is processed according to the second face image in each still image included in the to-be-processed information during replacement, the replaced to-be-processed information can be vivid and natural.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for transmitting information is shown. The process 400 of the method for transmitting information includes the steps of:
step 401, receiving an original image containing a first face image and information to be processed composed of at least two static images.
In this embodiment, step 401 is similar to step 201 of the embodiment shown in fig. 2, and is not described here again.
Step 402, for the static image of at least two static images, executing a predetermined operation.
In the present embodiment, the execution main body may execute a predetermined operation for each of at least two still images constituting information to be processed. Wherein the predetermined operation comprises the steps of:
step 4021, in response to determining that the static image contains the face image, taking the face image contained in the static image as a second face image.
In this embodiment, step 4021 is similar to step 2021 of the embodiment shown in fig. 2, and is not described herein again.
Step 4022, performing face key point detection on the first face image and the second face image respectively to obtain key point information of the first face image and the second face image.
In this embodiment, the executing entity may perform face key point detection on the first face image and the second face image respectively, so as to obtain key point information of the face key points of the first face image and the second face image, for example, position information of each face key point. In practice, the face key points may be divided into interior key points and contour key points, and the interior key points may include key points of eyebrows, eyes, nose, mouth, and the like. By detecting the key points of the human face image, the position information of eyebrows, eyes, nose, mouth and the like of the human face in the human face image can be positioned. It should be noted that, performing face key point detection on a face image is a well-known technology widely studied and applied at present, and is not described herein again.
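A minimal sketch of face key point detection, assuming dlib's 68-point landmark predictor as one common realization of the well-known technique mentioned above; the model file path is an assumption.

```python
# Illustrative sketch: detect face key points with dlib's 68-point landmark
# predictor. The .dat model file must be downloaded separately.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_key_points(face_image_rgb):
    """Return a list of (x, y) key points for the first detected face."""
    faces = detector(face_image_rgb, 1)
    if not faces:
        return []
    shape = predictor(face_image_rgb, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
```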
And step 4023, adjusting the key point information of the first face image according to the key point information of the second face image.
In this embodiment, the executing entity may adjust the key point information of the first face image according to the key point information of the second face image. As an example, the execution subject may adjust the position information of the key points of the first face image according to the position information of the key points in the second face image. Taking the mouth as an example, the executing main body may determine an opening and closing angle of the mouth of the human face in the second face image according to the position information of the plurality of key points related to the mouth in the second face image, and use the determined opening and closing angle as the target opening and closing angle. According to the target opening and closing angle, the execution main body can adjust the position information of a plurality of key points related to the mouth in the first face image, so that the opening and closing angle of the face mouth in the adjusted first face image is the same as or close to the target opening and closing angle.
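As an illustrative sketch of the mouth example above, assuming the 68-point landmark indexing from the previous sketch, the opening and closing angle can be estimated from the second face image's key points and the first face image's mouth key points adjusted accordingly; the index choices and the vertical-scaling heuristic are assumptions.

```python
# Illustrative sketch: estimate the mouth opening angle from the second face's
# key points and move the first face's mouth key points so that their opening
# angle approaches it. Indices follow the dlib 68-point scheme (assumption).
import numpy as np

def mouth_opening_angle(pts):
    pts = np.asarray(pts, dtype=np.float64)
    left, right = pts[48], pts[54]          # mouth corners
    top, bottom = pts[62], pts[66]          # inner upper / lower lip
    width = np.linalg.norm(right - left)
    opening = np.linalg.norm(bottom - top)
    return 2.0 * np.arctan2(opening / 2.0, width / 2.0)   # radians

def adjust_mouth(first_pts, second_pts, mouth_idx=range(48, 68)):
    first_pts = np.asarray(first_pts, dtype=np.float64).copy()
    target = mouth_opening_angle(second_pts)
    current = mouth_opening_angle(first_pts)
    scale = np.tan(target / 2.0) / max(np.tan(current / 2.0), 1e-6)
    center_y = first_pts[list(mouth_idx), 1].mean()
    # Stretch or compress the mouth key points vertically about their center
    # so that the resulting opening angle matches the target.
    for i in mouth_idx:
        first_pts[i, 1] = center_y + (first_pts[i, 1] - center_y) * scale
    return first_pts
```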
And step 4024, replacing the second face image in the static image with the processed first face image.
In this embodiment, step 4024 is similar to step 2023 of the embodiment shown in fig. 2, and is not described herein again.
And step 403, in response to determining that the replacement of each static image containing the face image in the information to be processed is completed, sending the replaced information to be processed.
In this embodiment, step 403 is similar to step 203 of the embodiment shown in fig. 2, and is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for sending information in the present embodiment highlights a step of adjusting the key point information of the first face image according to the key point information of the second face image. Therefore, the facial action of the face in the first face image is close to the facial action of the face in the second face image, and the facial expression of each static image of the replaced information to be processed is more natural and vivid.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for sending information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for transmitting information of the present embodiment includes: a receiving unit 501, an execution unit 502 and a sending unit 503. The receiving unit 501 is configured to receive an original image including a first face image and information to be processed composed of at least two still images; the execution unit 502 is configured to execute a predetermined operation on a still image of the at least two still images, wherein the execution unit 502 includes: a determination unit 5021 configured to respond to the determination that the static image contains the face image, and take the face image contained in the static image as a second face image; a processing unit 5022 configured to process the first face image based on the second face image; a replacing unit 5023 configured to replace the second face image in the still image with the processed first face image; the transmission unit 503 is configured to transmit the replaced information to be processed in response to determining that replacement of each still image including the face image in the information to be processed is completed.
In this embodiment, specific processes of the receiving unit 501, the executing unit 502, and the sending unit 503 of the apparatus 500 for sending information and technical effects brought by the processes can refer to the related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the processing unit 5022 is further configured to: respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image; and adjusting the key point information of the first face image according to the key point information of the second face image.
In some optional implementations of this embodiment, the processing unit 5022 is further configured to: inputting the first facial image and the second facial image into a pre-established expression recognition model respectively to obtain expression categories of the first facial image and the second facial image, wherein the expression recognition model is used for representing the corresponding relation between the facial image and the expression categories; and in response to determining that the expression categories of the first facial image and the second facial image are matched, taking the first facial image as a processed first facial image.
In some optional implementations of the present embodiment, the processing unit 5022 is further configured to: and in response to determining that the expression types of the first face image and the second face image are not matched, inputting the expression types of the first face image and the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing the corresponding relation between the face image and the expression types and the generated face image.
In some optional implementation manners of this embodiment, the expression recognition model is obtained by training in the following manner: acquiring a first training sample set, wherein the first training sample set comprises a face image and an expression category corresponding to the face image; and taking the facial image of the first training sample in the first training sample set as input, taking the expression type corresponding to the input facial image as expected output, and training to obtain the expression recognition model.
In some optional implementations of the present embodiment, the image generation model is trained by: acquiring a second training sample set, wherein the second training sample comprises a sample face image and a sample expression category, and a sample generation face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generation face image are face images of the same person, and the expression category of the sample generation face image is matched with the sample expression category; and taking the sample face image and the sample expression category of the second training sample in the second training sample set as input, taking the sample generation face image corresponding to the input sample face image and the sample expression category as expected output, and training to obtain the image generation model.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the apparatus of an embodiment of the present application. The apparatus shown in fig. 6 is only an example, and should not bring any limitation to the function and use range of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes a receiving unit, an execution unit, and a transmitting unit. Here, the names of the units do not constitute a limitation to the unit itself in some cases, and for example, the receiving unit may also be described as a "unit that receives an original image containing a first face image and information to be processed composed of at least two still images".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receiving an original image containing a first face image and information to be processed consisting of at least two static images; for a static image of the at least two static images, performing the following operations: in response to determining that the static image contains a face image, taking the face image contained in the static image as a second face image; processing the first face image based on the second face image; replacing a second face image in the static image with the processed first face image; and responding to the fact that the replacement of each static image containing the face image in the information to be processed is completed, and sending the replaced information to be processed.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for transmitting information, comprising:
receiving an original image containing a first face image and information to be processed consisting of at least two static images;
for a static image of the at least two static images, performing the following operations: in response to determining that the static image contains a face image, taking the face image contained in the static image as a second face image; processing the first facial image based on the second facial image; replacing a second face image in the static image with the processed first face image;
and in response to determining that the replacement of each static image containing the face image in the information to be processed is completed, sending the replaced information to be processed.
2. The method of claim 1, wherein said processing the first facial image based on the second facial image comprises:
respectively carrying out face key point detection on the first face image and the second face image to obtain key point information of the first face image and the second face image;
and adjusting the key point information of the first face image according to the key point information of the second face image.
3. The method of claim 1, wherein said processing the first facial image based on the second facial image comprises:
inputting the first facial image and the second facial image into a pre-established expression recognition model respectively to obtain expression categories of the first facial image and the second facial image, wherein the expression recognition model is used for representing the corresponding relation between the facial image and the expression categories;
in response to determining that the expression categories of the first facial image and the second facial image match, treating the first facial image as a processed first facial image.
4. The method of claim 3, wherein said processing the first facial image based on the second facial image further comprises:
and in response to the fact that the expression types of the first face image and the second face image are not matched, inputting the expression types of the first face image and the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing the corresponding relation between the face image and the expression types and the generated face image.
5. The method of claim 3, wherein the expression recognition model is trained by:
acquiring a first training sample set, wherein a first training sample in the first training sample set comprises a face image and an expression category corresponding to the face image;
and taking the face image of a first training sample in the first training sample set as an input, taking the expression category corresponding to the input face image as an expected output, and training to obtain the expression recognition model.
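Claim 5 describes a standard supervised training procedure: the face image of each first training sample is the input and its expression category is the expected output. A minimal PyTorch training loop along those lines might look as follows; the dataset object, batch size, and optimizer settings are assumptions.

```python
# Minimal supervised training loop for the expression recognition model.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_expression_model(model, dataset, epochs=10, lr=1e-3):
    """dataset yields (face_tensor, expression_label) pairs: the face image is the
    input, the corresponding expression category is the expected output."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for faces, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(faces), labels)
            loss.backward()
            optimizer.step()
    return model
```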
6. The method of claim 4, wherein the image generation model is trained by:
acquiring a second training sample set, wherein a second training sample in the second training sample set comprises a sample face image, a sample expression category, and a sample generated face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generated face image are face images of the same person, and the expression category of the sample generated face image matches the sample expression category;
and taking the sample face image and the sample expression category of a second training sample in the second training sample set as inputs, taking the sample generated face image corresponding to the input sample face image and sample expression category as an expected output, and training to obtain the image generation model.
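Claim 6 again describes supervised training, this time with the sample face image and sample expression category as inputs and the corresponding sample generated face image as the expected output. The sketch below uses a plain L1 reconstruction loss for brevity; the claim does not specify the training objective, and an adversarial loss would be an equally plausible choice.

```python
# Minimal training loop for the image generation model of claim 6.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_generation_model(generator, dataset, epochs=10, lr=1e-4):
    """dataset yields (sample_face, sample_expression_id, sample_generated_face) triples:
    the first two are the inputs, the third is the expected output."""
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    criterion = nn.L1Loss()  # simple reconstruction loss, chosen here only for brevity
    generator.train()
    for _ in range(epochs):
        for face, expression_id, target_face in loader:
            optimizer.zero_grad()
            loss = criterion(generator(face, expression_id), target_face)
            loss.backward()
            optimizer.step()
    return generator
```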
7. An apparatus for transmitting information, comprising:
a receiving unit configured to receive an original image containing a first face image and information to be processed composed of at least two static images;
an execution unit configured to perform predetermined operations for a static image of the at least two static images, wherein the execution unit comprises: a determination unit configured to, in response to determining that the static image contains a face image, take the face image contained in the static image as a second face image; a processing unit configured to process the first face image based on the second face image; and a replacement unit configured to replace the second face image in the static image with the processed first face image;
and a sending unit configured to, in response to determining that the replacement has been completed for each static image containing a face image in the information to be processed, send the replaced information to be processed.
8. The apparatus of claim 7, wherein the processing unit is further configured to:
carrying out face key point detection on the first face image and the second face image respectively to obtain key point information of the first face image and the second face image;
and adjusting the key point information of the first face image according to the key point information of the second face image.
9. The apparatus of claim 7, wherein the processing unit is further configured to:
inputting the first face image and the second face image respectively into a pre-established expression recognition model to obtain expression categories of the first face image and the second face image, wherein the expression recognition model is used for representing a correspondence between a face image and an expression category;
in response to determining that the expression categories of the first face image and the second face image match, taking the first face image as the processed first face image.
10. The apparatus of claim 9, wherein the processing unit is further configured to:
and in response to determining that the expression categories of the first face image and the second face image do not match, inputting the first face image and the expression category of the second face image into a pre-established image generation model to obtain a generated face image, and taking the generated face image as the processed first face image, wherein the image generation model is used for representing a correspondence between a face image together with an expression category and a generated face image.
11. The apparatus of claim 9, wherein the expression recognition model is trained by:
acquiring a first training sample set, wherein a first training sample in the first training sample set comprises a face image and an expression category corresponding to the face image;
and taking the face image of a first training sample in the first training sample set as an input, taking the expression category corresponding to the input face image as an expected output, and training to obtain the expression recognition model.
12. The apparatus of claim 10, wherein the image generation model is trained by:
acquiring a second training sample set, wherein a second training sample in the second training sample set comprises a sample face image, a sample expression category, and a sample generated face image corresponding to the sample face image and the sample expression category, wherein the sample face image and the sample generated face image are face images of the same person, and the expression category of the sample generated face image matches the sample expression category;
and taking the sample face image and the sample expression category of a second training sample in the second training sample set as inputs, taking the sample generated face image corresponding to the input sample face image and sample expression category as an expected output, and training to obtain the image generation model.
13. An apparatus, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201811459739.3A 2018-11-30 2018-11-30 Method and device for transmitting information Active CN111260756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811459739.3A CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811459739.3A CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Publications (2)

Publication Number Publication Date
CN111260756A true CN111260756A (en) 2020-06-09
CN111260756B CN111260756B (en) 2023-09-26

Family

ID=70950192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811459739.3A Active CN111260756B (en) 2018-11-30 2018-11-30 Method and device for transmitting information

Country Status (1)

Country Link
CN (1) CN111260756B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053315A (en) * 2020-09-14 2020-12-08 北京百度网讯科技有限公司 Method and apparatus for processing character image data
CN113486694A (en) * 2020-10-26 2021-10-08 青岛海信电子产业控股股份有限公司 Face image processing method and terminal equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014085796A (en) * 2012-10-23 2014-05-12 Sony Corp Information processing device and program
CN106599817A (en) * 2016-12-07 2017-04-26 腾讯科技(深圳)有限公司 Face replacement method and device
CN107316020A (en) * 2017-06-26 2017-11-03 司马大大(北京)智能系统有限公司 Face replacement method, device and electronic equipment
CN108174237A (en) * 2017-12-28 2018-06-15 北京奇虎科技有限公司 Image combining method and device
CN108229276A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Neural metwork training and image processing method, device and electronic equipment
CN108259788A (en) * 2018-01-29 2018-07-06 努比亚技术有限公司 Video editing method, terminal and computer readable storage medium
CN108509941A (en) * 2018-04-20 2018-09-07 北京京东金融科技控股有限公司 Emotional information generation method and device
CN108550176A (en) * 2018-04-19 2018-09-18 咪咕动漫有限公司 Image processing method, equipment and storage medium
CN108647560A (en) * 2018-03-22 2018-10-12 中山大学 A kind of face transfer method of the holding expression information based on CNN
CN108875633A (en) * 2018-06-19 2018-11-23 北京旷视科技有限公司 Expression detection and expression driving method, device and system and storage medium

Also Published As

Publication number Publication date
CN111260756B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
CN107578017B (en) Method and apparatus for generating image
CN107633218B (en) Method and apparatus for generating image
CN108830235B (en) Method and apparatus for generating information
CN111476871B (en) Method and device for generating video
US11151765B2 (en) Method and apparatus for generating information
US11436863B2 (en) Method and apparatus for outputting data
WO2020000879A1 (en) Image recognition method and apparatus
CN107609506B (en) Method and apparatus for generating image
CN109993150B (en) Method and device for identifying age
CN110162670B (en) Method and device for generating expression package
CN109981787B (en) Method and device for displaying information
CN110298319B (en) Image synthesis method and device
CN111368685A (en) Key point identification method and device, readable medium and electronic equipment
CN109784304B (en) Method and apparatus for labeling dental images
CN109189544B (en) Method and device for generating dial plate
CN111369427A (en) Image processing method, image processing device, readable medium and electronic equipment
CN110059623B (en) Method and apparatus for generating information
CN110472558B (en) Image processing method and device
US11232560B2 (en) Method and apparatus for processing fundus image
CN112364144A (en) Interaction method, device, equipment and computer readable medium
CN110046571B (en) Method and device for identifying age
CN110008926B (en) Method and device for identifying age
CN111260756B (en) Method and device for transmitting information
CN109241930B (en) Method and apparatus for processing eyebrow image
CN108921138B (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant