WO2019019927A1 - Video processing method, network device, and storage medium - Google Patents

Video processing method, network device, and storage medium

Info

Publication number
WO2019019927A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
dimensional image
depth information
occlusion model
feature point
Prior art date
Application number
PCT/CN2018/095564
Other languages
English (en)
French (fr)
Inventor
程培
傅斌
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2019019927A1 publication Critical patent/WO2019019927A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G06T2207/30201 - Face
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 - Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 - Indexing scheme for editing of 3D models
    • G06T2219/2016 - Rotation, translation, scaling

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a video processing method, a network device, and a storage medium.
  • Augmented reality (AR) technology computes the position and angle of the camera image in real time and overlays corresponding imagery, so that the virtual world and the real world can be combined and interacted with on screen.
  • In the related art, to make videos more entertaining, a real-time two-dimensional (2D) dynamic sticker effect can be added to each frame while the user is recording.
  • For example, face recognition can be used to obtain the facial feature points of the face contained in the current frame, and a two-dimensional sticker, such as rabbit ears, cat ears, or a beard, is then drawn at specified points using those feature points.
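The related-art 2D effect described above amounts to alpha-blending a sticker image at a detected landmark. The following is a minimal sketch of that idea (not part of the disclosure); it assumes an RGBA sticker image and a landmark coordinate are already available.

```python
import numpy as np

def overlay_2d_sticker(frame, sticker_rgba, anchor_xy):
    """Alpha-blend an RGBA sticker onto a BGR frame, centred on a landmark point."""
    h, w = sticker_rgba.shape[:2]
    x, y = int(anchor_xy[0] - w / 2), int(anchor_xy[1] - h / 2)
    # Clip the sticker so it stays inside the frame.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame.shape[1]), min(y + h, frame.shape[0])
    sticker = sticker_rgba[y0 - y:y1 - y, x0 - x:x1 - x]
    alpha = sticker[:, :, 3:4].astype(np.float32) / 255.0
    roi = frame[y0:y1, x0:x1].astype(np.float32)
    frame[y0:y1, x0:x1] = (alpha * sticker[:, :, :3] + (1 - alpha) * roi).astype(np.uint8)
    return frame
```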
  • In the course of research and practice on the related art, the inventors of the present disclosure have found that, although the two-dimensional dynamic sticker effects added by the related-art scheme are somewhat entertaining, they blend poorly with the original image, so the video processing quality is poor.
  • Embodiments of the present disclosure provide a video processing method, a network device, and a storage medium that can add a three-dimensional image effect to an image, improving the fusion between the added effect and the captured original image, improving the video processing quality, realizing AR effects in rich forms, and enriching the video processing modes.
  • An embodiment of the present disclosure provides a video processing method, including: collecting video data, and determining, from the video data, an object to be processed; detecting feature points of the object, and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional (3D) image according to the feature points and the Euler angle; and rendering the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  • an embodiment of the present disclosure further provides a video processing apparatus, including:
  • An acquisition unit configured to collect video data, and determine an object that needs to be processed from the video data
  • a detecting unit, configured to detect feature points of the object and acquire an Euler angle of a target part of the object;
  • An acquiring unit configured to acquire depth information of the target three-dimensional image according to the feature point and the Euler angle
  • a drawing unit for rendering the target three-dimensional image on the object based on depth information of the target three-dimensional image.
  • In addition, an embodiment of the present disclosure further provides a storage medium storing a plurality of instructions, where the instructions are suitable for being loaded by a processor to perform the steps of any of the video processing methods provided by the embodiments of the present disclosure.
  • An embodiment of the present disclosure further provides a network device, including one or more processors and one or more memories, where the memory stores at least one application, and the at least one application is suitable for being loaded by the processor to perform the following operations: collecting video data, and determining, from the video data, an object to be processed; detecting feature points of the object, and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and rendering the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  • The embodiments of the present disclosure may determine, from collected video data, an object to be processed, detect feature points of the object, acquire an Euler angle of a target part of the object, acquire depth information of a target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, the fusion between the added effect and the captured original image can be greatly improved, thereby improving the overall video processing quality, realizing AR effects in rich forms, and enriching the video processing modes and results.
  • FIG. 1a is a schematic diagram of a scenario of a video processing method according to an embodiment of the present disclosure;
  • FIG. 1b is a schematic diagram of a scenario of a video processing method according to an embodiment of the present disclosure;
  • FIG. 1c is a flowchart of a video processing method according to an embodiment of the present disclosure;
  • FIG. 1d is a schematic diagram of face detection in a video processing method according to an embodiment of the present disclosure;
  • FIG. 2a is another flowchart of a video processing method according to an embodiment of the present disclosure;
  • FIG. 2b is a schematic diagram of the Euler angle of a human head pose in a video processing method according to an embodiment of the present disclosure;
  • FIG. 2c is an example diagram of the Euler angle in a video processing method according to an embodiment of the present disclosure;
  • FIG. 3a is still another flowchart of a video processing method according to an embodiment of the present disclosure;
  • FIG. 3b is a schematic diagram of an occlusion model in a video processing method according to an embodiment of the present disclosure;
  • FIG. 3c is a schematic diagram of writing occlusion-model depth information in a video processing method according to an embodiment of the present disclosure;
  • FIG. 3d is a schematic diagram of writing three-dimensional-helmet depth information in a video processing method according to an embodiment of the present disclosure;
  • FIG. 3e is a rendering effect diagram of a three-dimensional helmet in a video processing method according to an embodiment of the present disclosure;
  • FIG. 4a is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;
  • FIG. 4b is another schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;
  • FIG. 5 is a schematic structural diagram of a network device according to an embodiment of the present disclosure.
  • Embodiments of the present disclosure provide a video processing method, apparatus, and storage medium.
  • The video processing apparatus can be integrated in a network device, such as a server or a terminal. The terminal may be a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), or the like; the embodiments of the present disclosure do not specifically limit this.
  • Taking the video processing apparatus integrated in a terminal as an example (see FIG. 1a), the terminal may collect video data, determine an object to be processed from the video data, and detect the feature points of the object and the Euler angle of the object's target part; for example, the terminal can detect the facial feature points of a portrait in a video frame and the Euler angle of the head pose. The terminal then acquires the depth information of a target three-dimensional image according to the feature points and the Euler angle, and draws the three-dimensional image on the object based on the depth information of the three-dimensional image, for example adding a three-dimensional helmet to the portrait.
  • Taking the video processing apparatus integrated in a server as an example (see FIG. 1b), after collecting the video data the terminal may provide it to the server; the server determines, from the video data, the object to be processed, detects the feature points of the object and the Euler angle of the object's target part, then acquires the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draws the three-dimensional image on the object based on that depth information. Optionally, the server may then return the video data with the drawn three-dimensional image to the terminal.
  • the flow of the video processing method can be as follows:
  • the terminal collects video data, such as the terminal shooting through the camera, or reading the video data from the local (ie, the terminal), and the like.
  • the server receives the video data sent by the terminal, that is, the terminal sends the video data to the server after collecting the video data.
  • the video processing device can determine from the video data the object that needs to be processed.
  • the type of the object may be determined according to the requirements of the actual application.
  • the object may be a person, an animal, or even an object, and the like, which is not specifically limited in the embodiment of the present disclosure.
  • the number of the objects may also be determined according to the requirements of the actual application, and the objects may be single or multiple, and the embodiments of the present disclosure also do not specifically limit the same.
  • The setting of the feature points and the target part may also be determined according to the needs of the actual application. Taking the object as a portrait, the feature points may be set to the facial features of the person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the contour of the face, and the target part may be set to the head. That is, detecting the feature points of the object and acquiring the Euler angle of the target part can be implemented as follows:
  • the face detection method is used to perform face recognition on the face of the object, and the facial feature point of the object is obtained, and the head posture of the object is detected to obtain an Euler angle of the head of the object.
  • the facial feature points may include feature points such as facial features and facial contours.
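As one possible way to obtain such facial feature points, the sketch below uses OpenCV's Haar face detector together with the LBF facemark model from opencv-contrib. It is only an illustration of the kind of face detection technology mentioned here; the pre-trained model file path is an assumption.

```python
import cv2
import numpy as np

# Haar cascade face detector shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# LBF facial-landmark model (68 points); requires opencv-contrib and a
# pre-trained model file such as "lbfmodel.yaml" (the path is an assumption).
facemark = cv2.face.createFacemarkLBF()
facemark.loadModel("lbfmodel.yaml")

def detect_feature_points(frame_bgr):
    """Return 68 facial feature points (x, y) for the first detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    ok, landmarks = facemark.fit(gray, np.array(faces))
    return landmarks[0][0] if ok else None
```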
  • An Euler angle is a set of three independent angular parameters used to describe the orientation of a rigid body rotating about a fixed point, consisting of the nutation angle θ, the precession angle ψ, and the rotation (spin) angle φ. In other words, by acquiring the Euler angle of the object's target part over time, for example the Euler angle of the person's head (including the nutation, precession, and rotation angles), the motion of the person's head can be known.
  • the depth information of the target three-dimensional image can be obtained as follows:
  • At least one of scaling, rotating, and shifting may be performed on the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, angle, and position.
  • the three-dimensional image may be selected according to the needs of the actual application or the user's preference, for example, may be a three-dimensional helmet, a three-dimensional rabbit ear, a three-dimensional cat ear, three-dimensional glasses, or a three-dimensional headscarf, and the like.
  • There may be multiple ways of determining whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the two satisfy a certain functional relationship in size, position, and angle; or it may be set that they match when the three-dimensional image is consistent or substantially consistent with the object in size, position, and angle (that is, the error is less than a preset range), and so on.
  • Taking a person as the object and 3D glasses as the three-dimensional image, the 3D glasses may be shifted according to the person's facial feature points and the Euler angle of the head so that the glasses are roughly consistent with the person's face in position, and scaled and rotated according to the same feature points and Euler angle so that the glasses are roughly consistent with the person's face in size and angle, and so on.
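A rough sketch of how such scaling, rotation, and shifting could be combined into a single model transform is given below. It assumes the classical Z-X-Z Euler convention (precession, nutation, spin) and leaves the mapping from landmarks to scale and translation to the application; none of this is prescribed by the disclosure.

```python
import numpy as np

def euler_to_rotation(nutation, precession, spin):
    """Z-X-Z rotation matrix built from the three Euler angles (radians)."""
    cz1, sz1 = np.cos(precession), np.sin(precession)
    cx, sx = np.cos(nutation), np.sin(nutation)
    cz2, sz2 = np.cos(spin), np.sin(spin)
    Rz1 = np.array([[cz1, -sz1, 0], [sz1, cz1, 0], [0, 0, 1]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Rz2 = np.array([[cz2, -sz2, 0], [sz2, cz2, 0], [0, 0, 1]])
    return Rz1 @ Rx @ Rz2

def model_matrix(scale, euler_angles, translation):
    """4x4 transform that scales, rotates, and shifts the 3D asset toward the face."""
    M = np.eye(4)
    M[:3, :3] = euler_to_rotation(*euler_angles) * scale
    M[:3, 3] = translation
    return M

# Example (illustrative): derive scale from the inter-ocular landmark distance
# and translation from the nose-tip landmark, then apply model_matrix to the asset.
```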
  • When the three-dimensional image matches the object in size, position, and angle, the depth information of the three-dimensional image in that state is extracted (an object has corresponding depth information in each state), yielding the depth information of the three-dimensional image.
  • depth information is the premise for human stereo vision.
  • perspective projection is a many-to-one relationship. Any point on the projection line can correspond to the same image point. If two cameras (equivalent to human eyes) are used, this many-to-one situation can be eliminated. Thereby, the value of the third-dimensional coordinate Z can be determined, and this value is called depth information.
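For illustration only, the two-camera relation described here can be written as Z = f * B / d (focal length in pixels times baseline over per-pixel disparity). A small sketch, assuming rectified inputs:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Recover depth Z from the disparity between two rectified camera views.

    Z = f * B / d, where f is the focal length in pixels, B the distance
    between the two cameras, and d the per-pixel disparity in pixels.
    """
    d = np.asarray(disparity, dtype=np.float32)
    depth = np.full_like(d, np.inf)       # zero disparity means "at infinity"
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth
```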
  • When the three-dimensional image is drawn on the object, it may be rendered on the frame where the object is located according to the depth information of the three-dimensional image, for example drawing three-dimensional glasses, a three-dimensional helmet, or three-dimensional rabbit ears on the head of the portrait, and so on.
  • Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model (occluder) may be set according to the part of the object that needs to remain exposed (that is, the part that should not be blocked by the three-dimensional image), so that the drawing of the three-dimensional image avoids that part accordingly. That is, before rendering the three-dimensional image on the object based on its depth information, the video processing method may further include: acquiring depth information of a target occlusion model, superimposing the occlusion model and the three-dimensional image according to the depth information of both, and setting the color of the overlapping portion to transparent to obtain processed depth information; the three-dimensional image is then drawn on the object according to the processed depth information.
  • the process of obtaining the depth information of the target occlusion model is similar to the process of acquiring the depth information of the three-dimensional image.
  • For example, it can be implemented as follows: acquire the target occlusion model, adjust the target occlusion model according to the feature points and the Euler angle so that it matches the object, and acquire the depth information of the target occlusion model in the state where the target occlusion model matches the object.
  • At least one of scaling, rotating, and shifting may be performed on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, angle, and position.
  • The target occlusion model may be set according to the part of the object that needs to remain exposed. For example, if the part to remain exposed is a human face, a model of a human head may be built as the occlusion model, and so on.
  • Note that, to simplify the algorithm and improve processing efficiency, multiple objects of the same type can share the same occlusion model. Continuing with the example in which the exposed part is a face and the three-dimensional image to be drawn is a three-dimensional helmet: if the target part of the object is user A's head, occlusion model A can be used to keep user A's face from being occluded when the three-dimensional helmet is drawn; if the target part of the object is user B's head, the same occlusion model A can also be used to keep user B's face from being occluded, and so on.
  • Optionally, to improve drawing accuracy and the processing result, the occlusion model may instead be built for a specific object. Continuing the same example: if the target part of the object is user A's head, occlusion model A can be built from user A's head and used to keep user A's face from being occluded when the three-dimensional helmet is drawn; if the target part of the object is user B's head, occlusion model B can be built from user B's head and used to keep user B's face from being occluded, and so on.
  • Optionally, since not every three-dimensional image to be drawn will occlude the object, the three-dimensional image may be judged first: if it belongs to a preset type, an occlusion model is needed; otherwise, the three-dimensional image may be drawn directly. That is, before the step of "acquiring the depth information of the target occlusion model", the video processing method may further include: determining whether the type of the three-dimensional image satisfies a target condition; if it does, performing the step of acquiring the depth information of the target occlusion model; if it does not, performing the step of drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
  • the target condition may be whether the type of the three-dimensional image belongs to a preset type.
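A trivial sketch of this target-condition check is shown below; the set of occluding types and the helper names `apply_occluder` and `render` are illustrative assumptions rather than anything defined in the disclosure.

```python
# Types of 3D assets that would cover the part of the object that must stay
# exposed; the membership of this set is an application-level assumption.
OCCLUDING_TYPES = {"helmet", "headscarf"}

def needs_occlusion_model(image_type: str) -> bool:
    """Target condition: only preset types that would block the face need an occluder."""
    return image_type in OCCLUDING_TYPES

def draw_effect(frame, asset, depth_info, occluder=None):
    # apply_occluder and render are hypothetical helpers standing in for the
    # occlusion-processing and rendering steps described in the text.
    if needs_occlusion_model(asset.type):
        processed_depth = apply_occluder(depth_info, occluder)
        return render(frame, asset, processed_depth)
    return render(frame, asset, depth_info)
```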
  • As can be seen from the above, the embodiment of the present disclosure can determine, from collected video data, the object to be processed, detect the feature points of the object, acquire the Euler angle of the target part, acquire the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, on the one hand the fusion between the added effect and the original image can be greatly improved, improving the overall video processing quality; on the other hand, AR effects in 3D form can be realized, enriching the functions of the video processing apparatus.
  • the video processing device is integrated into the network device as an example.
  • the network device may be a terminal, or may be a device such as a server.
  • a video processing method can be as follows:
  • the network device collects video data, and determines an object to be processed from the video data.
  • the type of the object may be determined according to the needs of the actual application.
  • the object may be a person, an animal, or even an object, and the like, and the number of the object may also be determined according to the needs of the actual application. It can be single or multiple, and will not be described here.
  • Taking the network device as a terminal and the object to be processed as a "portrait" as an example, the terminal can capture the user's face through its camera to collect video data, and then determine from the video data the object to be processed, for example the "portrait" to which a three-dimensional image needs to be added.
  • the video data can be collected by the terminal, and then the video data is provided by the terminal to the server, and the server determines, from the video data, an object that needs to be processed, for example, determining that a three-dimensional image needs to be added. "Portrait", and so on.
  • Optionally, to ensure the validity of the video data, when collecting the video data the terminal may also generate corresponding prompt information to prompt the user that a face needs to be captured, so that the user can shoot in a better posture and the terminal can obtain more usable video data.
  • the network device detects a feature point of the object.
  • the network device performs face recognition on the face of the object by using a face detection technology to obtain a facial feature point of the object.
  • The face detection technology may include OpenCV (a cross-platform computer vision library) face detection, the face detection provided by each mobile operating system, Face++ face detection, SenseTime face detection, and the like.
  • the network device acquires an Euler angle of the target part of the object.
  • the network device can detect the head posture of the portrait in real time, and obtain the Euler angle of the head of the portrait.
  • For example, referring to FIG. 2b, the tip of the portrait's nose can be taken as the fixed point "o" about which rotation occurs, giving a set of independent angular parameters of the portrait's head rotating about the fixed point "o" (that is, the tip of the nose), namely the nutation angle θ, the precession angle ψ, and the rotation (spin) angle φ, from which the Euler angle of the portrait's head is obtained.
  • the detailed acquisition method of Euler angle can be as follows:
  • As shown in FIG. 2c, a fixed coordinate system oxyz can be constructed about the fixed point o (the positions of the x, y, and z axes and the relationship between the three axes can also be seen in FIG. 2b), together with a coordinate system ox'y'z' attached to the portrait's head. The line oN perpendicular to the plane zoz' is called the line of nodes; it is also the intersection of the basic planes ox'y' and oxy. Taking the axes oz and oz' as the basic axes and their perpendicular planes oxy and ox'y' as the basic planes, the angle from axis oz to oz' gives the nutation angle θ; viewed from the positive end of oN, θ is measured counterclockwise. In addition, the angle from the fixed axis ox to the line of nodes oN can be measured to obtain the precession angle ψ, and the angle from the line of nodes oN to the moving axis ox' gives the rotation angle φ; viewed from the positive ends of the axes oz and oz', ψ and φ are also measured counterclockwise.
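In practice, head-pose angles are often estimated from the detected 2D landmarks with a generic 3D face model and a PnP solver; the sketch below shows one such recipe using OpenCV. The 3D reference points, the crude focal-length guess, and the pitch/yaw/roll parameterization are assumptions for illustration, not the geometric derivation given above.

```python
import cv2
import numpy as np

# Rough 3D reference positions (in millimetres) of six face landmarks:
# nose tip, chin, outer eye corners, mouth corners. These values are a commonly
# used approximation, not taken from the disclosure.
MODEL_POINTS = np.array([
    [0.0, 0.0, 0.0],          # nose tip (the fixed point "o")
    [0.0, -330.0, -65.0],     # chin
    [-225.0, 170.0, -135.0],  # left eye outer corner
    [225.0, 170.0, -135.0],   # right eye outer corner
    [-150.0, -150.0, -125.0], # left mouth corner
    [150.0, -150.0, -125.0],  # right mouth corner
], dtype=np.float64)

def head_pose_euler(image_points, frame_size):
    """Estimate head-pose angles (degrees) from six detected 2D landmarks."""
    h, w = frame_size
    f = w  # crude focal-length guess; a calibrated camera matrix is better
    camera_matrix = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points, camera_matrix, None)
    rotation, _ = cv2.Rodrigues(rvec)
    angles, *_ = cv2.RQDecomp3x3(rotation)  # pitch, yaw, roll in degrees
    return angles
```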
  • Note that the Euler angle changes as the pose of the portrait's head changes, and the angle of the three-dimensional image to be added subsequently depends on the Euler angle; therefore, the three-dimensional image can also be made to change with the pose of the portrait's head, which is described in detail in step 204.
  • steps 202 and 203 may be in no particular order.
  • the network device performs at least one of scaling, rotating, and shifting the target three-dimensional image according to the feature point and the Euler angle, so that the three-dimensional image matches the object in size, angle, and position.
  • the three-dimensional image may be selected according to the needs of the actual application or the user's preference, for example, may be a three-dimensional rabbit ear, three-dimensional cat ears, three-dimensional glasses, or a three-dimensional headscarf, and the like.
  • There may be multiple ways of determining whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the two satisfy a certain functional relationship in size, position, and angle; or it may be set that they match when the three-dimensional image is consistent or substantially consistent with the object in size, position, and angle (that is, the error is less than a preset range), and so on.
  • For ease of description, in this embodiment the matching condition is that the three-dimensional image and the object are consistent or substantially consistent in size, position, and angle. Taking the object as a portrait and the three-dimensional image as 3D glasses as an example, the 3D glasses may be shifted according to the portrait's facial feature points and the Euler angle of the head so that the glasses are roughly consistent with the person's face in position, and scaled and rotated according to the same feature points and Euler angle so that the glasses are roughly consistent with the person's face in size and angle, and so on.
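A small sketch of the second matching criterion (errors below a preset range) follows; the state objects, their fields, and the thresholds are illustrative assumptions.

```python
def matches(asset_state, face_state, max_err=(0.05, 5.0, 3.0)):
    """Second matching criterion from the text: size, position, and angle agree
    to within a preset error range (thresholds here are illustrative).

    asset_state and face_state are hypothetical objects with a scalar .scale,
    a NumPy array .position (pixels), and a 3-tuple .euler (degrees)."""
    size_err = abs(asset_state.scale - face_state.scale) / face_state.scale
    pos_err = ((asset_state.position - face_state.position) ** 2).sum() ** 0.5
    angle_err = max(abs(a - b) for a, b in zip(asset_state.euler, face_state.euler))
    return size_err < max_err[0] and pos_err < max_err[1] and angle_err < max_err[2]
```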
  • the network device acquires depth information of the three-dimensional image in a state that the three-dimensional image matches the object.
  • the matching of the three-dimensional image with the object refers to that the three-dimensional image matches the object in size, position and angle.
  • Continuing with the object to be processed being a "portrait" and the three-dimensional image to be added being 3D glasses, when the 3D glasses are roughly consistent with the person's face in position, size, and angle, the network device acquires the depth information of the 3D glasses and performs step 206.
  • the network device draws the three-dimensional image on the object according to the depth information of the three-dimensional image.
  • the network device can draw the three-dimensional glasses on the face of the portrait according to the depth information of the three-dimensional glasses obtained in step 205.
  • As can be seen from the above, the embodiment of the present disclosure can determine, from collected video data, the object to be processed, detect the feature points of the object and the Euler angle of the object's target part, acquire the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, the fusion between the added effect and the original image can be greatly improved, thereby improving the overall video processing quality.
  • In this embodiment, the description again takes the video processing apparatus integrated in a network device as an example, and takes drawing another type of three-dimensional image, a three-dimensional helmet, as an example.
  • a video processing method can be as follows:
  • the network device collects video data, and determines an object to be processed from the video data.
  • the network device detects a feature point of the object.
  • the network device detects an Euler angle of the target part of the object.
  • For the execution of steps 301-303, refer to the related description of steps 201-203 in the previous embodiment.
  • the target occlusion model may be set according to the part that the object needs to be exposed. For example, taking the part that needs to be exposed as a human face, as shown in FIG. 3b, a model of the human head may be established as the target occlusion model, and the like.
  • Note that, to simplify the algorithm and improve processing efficiency, multiple objects of the same type can share the same occlusion model. For example, continuing with the exposed part being a face and the three-dimensional image to be drawn being a three-dimensional helmet: if the target part of the object is user A's head, occlusion model A can be used to keep user A's face from being occluded when the three-dimensional helmet is drawn; if the target part of the object is user B's head, the same occlusion model A can also be used to keep user B's face from being occluded, and so on.
  • Optionally, to improve drawing accuracy and the processing result, the target occlusion model can also be built for a specific object. For example, continuing the same example: if the target part of the object is user A's head, occlusion model A can be built from user A's head and then used to keep user A's face from being occluded when the three-dimensional helmet is drawn; if the target part of the object is user B's head, occlusion model B can be built from user B's head and then used to keep user B's face from being occluded, and so on.
  • Optionally, the three-dimensional image may be judged first. If the three-dimensional image belongs to a preset type, for example if the 3D image to be drawn is a 3D helmet, the occlusion model is needed; if it does not belong to the preset type, for example if the 3D image to be drawn is 3D glasses, the 3D image can be drawn directly. See the related description in the previous embodiment.
  • the preset type refers to the type that will occlude the object.
  • the network device adjusts the target occlusion model according to the feature point and the Euler angle, so that the target occlusion model matches the object, and obtains depth information of the target occlusion model in a state where the target occlusion model matches the object.
  • For example, the network device may perform at least one of scaling, rotating, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, position, and angle. The depth information of the target occlusion model in that matching state is then extracted, yielding the depth information of the target occlusion model. For example, FIG. 3c is a schematic diagram of the depth information of the target occlusion model.
  • the method for determining whether the target occlusion model matches the object may be various.
  • For example, it may be set that the target occlusion model matches the object in size, position, and angle when the two satisfy a certain functional relationship in size, position, and angle; or it may be set that they match when the target occlusion model is consistent or substantially consistent with the object in size, position, and angle (that is, the error is less than a preset range), and so on.
  • The network device performs at least one of scaling, rotating, and shifting on the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, angle, and position, and then acquires the depth information of the three-dimensional image in the state where the three-dimensional image matches the object.
  • FIG. 3d is a schematic diagram of depth information written into a three-dimensional helmet.
  • If the three-dimensional helmet were drawn directly according to its depth information, the face of the portrait would be blocked. Therefore, the face needs to be left exposed, and this is achieved by performing step 307.
  • steps 304 and 306 may be in no particular order.
  • the network device superimposes the target occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and sets the color of the overlapping portion to be transparent, and obtains the processed depth information.
  • For example, after superimposing the target occlusion model and the three-dimensional image, the network device may set the color of the overlapping portion to vec(0.0, 0.0, 0.0, 0.0) (that is, fully transparent) and rewrite the depth information of the three-dimensional helmet (that is, update the depth information of the three-dimensional helmet obtained in step 306), thereby obtaining the processed depth information.
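The overlap-transparency step described here behaves like a per-pixel depth test between the occluder and the helmet. Below is a minimal NumPy sketch of that compositing, assuming the occluder and helmet have already been rasterized to depth maps and an RGBA layer (these inputs and helper names are assumptions, not the disclosed implementation).

```python
import numpy as np

def apply_occluder(helmet_depth, helmet_rgba, occluder_depth):
    """Per-pixel depth test between the 3D helmet and the head occluder.

    Wherever the occluder (the user's head) is closer to the camera than the
    helmet, the helmet fragment is made fully transparent (vec(0,0,0,0)), so
    the face underneath stays visible when the helmet layer is composited onto
    the video frame. Inputs are HxW depth maps (smaller = closer) and an
    HxWx4 RGBA layer.
    """
    hidden = occluder_depth < helmet_depth              # occluded helmet fragments
    out_rgba = helmet_rgba.copy()
    out_rgba[hidden] = 0                                # colour set to transparent
    out_depth = np.where(hidden, np.inf, helmet_depth)  # updated ("processed") depth
    return out_rgba, out_depth

def composite(frame_rgb, helmet_rgba):
    """Alpha-blend the processed helmet layer over the captured frame."""
    alpha = helmet_rgba[:, :, 3:4].astype(np.float32) / 255.0
    return (alpha * helmet_rgba[:, :, :3] + (1 - alpha) * frame_rgb).astype(np.uint8)
```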
  • the network device draws the three-dimensional image on the object according to the processed depth information.
  • the network device can draw a three-dimensional helmet on the head of the portrait according to the processed depth information obtained in step 307.
  • Fig. 3e is a drawing effect diagram of the three-dimensional helmet, it can be seen that a three-dimensional helmet is added to the head of the portrait, and the face of the portrait has been revealed and clearly visible.
  • As can be seen from the above, the embodiment of the present disclosure can determine, from collected video data, the object to be processed, detect the feature points of the object and the Euler angle of the object's target part, acquire the depth information of the occlusion model and the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, the fusion between the added effect and the original image can be greatly improved, thereby improving the overall video processing quality.
  • In addition, by setting the occlusion model, the scheme can improve how the three-dimensional image is added and prevent the object from being occluded, which improves the flexibility of the implementation and further improves the fusion between the added effect and the original image as well as the video processing quality.
  • the embodiment of the present disclosure further provides a video processing device, which may be integrated in a network device, such as a server or a terminal.
  • The terminal may specifically be a mobile phone, a tablet computer, a laptop computer, and/or a PC.
  • the video processing apparatus may include an acquisition unit 401, a detection unit 402, an acquisition unit 403, and a rendering unit 404, as follows:
  • the collecting unit 401 is configured to collect video data, and determine an object that needs to be processed from the video data.
  • the collecting unit 401 may be specifically configured to perform shooting by a camera, or read video data or the like from a local (ie, terminal), and determine an object to be processed from the video data.
  • the collecting unit 401 may be specifically configured to receive video data sent by the terminal, and determine an object that needs to be processed from the video data.
  • the type of the object may be determined according to the needs of the actual application.
  • the object may be a person, an animal, or even an object, and the like, and the number of the object may also be determined according to the needs of the actual application. It can be single or multiple, and will not be described here.
  • The detecting unit 402 is configured to detect feature points of the object and acquire an Euler angle of a target part of the object.
  • The setting of the feature points and the target part may be determined according to the needs of the actual application. Taking the object as a portrait, the feature points may be set to the facial features of the person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the contour of the face, and the target part may be set to the head. That is, when the object is a portrait and the target part is the head, the detecting unit 402 can be configured to: perform face recognition on the face of the object using a face detection technology to obtain the facial feature points of the object, and detect the head pose of the object to obtain the Euler angle of the object's head.
  • the facial feature points may include feature points such as facial features and facial contours. For details, refer to the previous method embodiments, and details are not described herein.
  • the obtaining unit 403 is configured to acquire depth information of the target three-dimensional image according to the feature point and the Euler angle.
  • the obtaining unit 403 may include an adjustment subunit and an extraction subunit, as follows:
  • the adjustment subunit may be configured to adjust the target three-dimensional image according to the feature point and the Euler angle such that the three-dimensional image matches the object.
  • the extracting subunit may be configured to extract depth information of the three-dimensional image in a state in which the three-dimensional image matches the object.
  • the adjustment subunit may be specifically configured to perform at least one of scaling, rotating, and shifting the target three-dimensional image according to the feature point and the Euler angle, such that the three-dimensional image and the object are in size, position, and angle. Match on both.
  • the three-dimensional image may be selected according to the needs of the actual application or the user's preference, for example, may be a three-dimensional helmet, a three-dimensional rabbit ear, a three-dimensional cat ear, three-dimensional glasses, or a three-dimensional headscarf, and the like.
  • There may be multiple ways of determining whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the two satisfy a certain functional relationship in size, position, and angle; or it may be set that they match when the three-dimensional image is consistent or substantially consistent with the object in size, position, and angle (that is, the error is less than a preset range), and so on.
  • the drawing unit 404 is configured to draw the three-dimensional image on the object based on the depth information of the three-dimensional image.
  • the drawing unit 404 may be specifically configured to render the three-dimensional image on the frame where the object is located according to the depth information of the three-dimensional image, such as drawing a three-dimensional eyeglass, a three-dimensional helmet, or a three-dimensional rabbit ear on the head, and the like.
  • Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model may be set according to the part of the object that needs to remain exposed (that is, the part that should not be blocked by the three-dimensional image), so that the drawing of the three-dimensional image can avoid the exposed part of the object accordingly. That is, as shown in FIG. 4b, the video processing apparatus may further include an occlusion acquisition unit 405 and an occlusion adjustment unit 406, as follows:
  • the occlusion acquisition unit 405 can be configured to acquire depth information of the target occlusion model.
  • the occlusion adjustment unit 406 can be configured to superimpose the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and set the color of the overlapping portion to be transparent, and obtain the processed depth information. .
  • the drawing unit 404 is specifically configured to draw the three-dimensional image on the object according to the processed depth information obtained by the occlusion adjustment unit.
  • Acquiring the depth information of the target occlusion model is similar to acquiring the depth information of the three-dimensional image. For example, it may be implemented as follows:
  • the occlusion acquisition unit 405 is specifically configured to acquire a target occlusion model, and adjust the target occlusion model according to the feature point and the Euler angle, so that the target occlusion model matches the object; and the target occlusion model matches the object. Get the depth information of the target occlusion model.
  • For example, the occlusion acquisition unit 405 may be configured to acquire a target occlusion model, and perform at least one of scaling, rotating, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the occlusion model matches the object in size, position, and angle.
  • the target occlusion model may be set according to the part that the object needs to be exposed. For example, if the part that needs to be exposed is a human face, for example, a model of the human head may be established as the occlusion model, and the like.
  • the occlusion model may be established according to a specific object. For details, refer to the foregoing method embodiments, and details are not described herein again.
  • the determining unit 407 may be configured to determine whether the type of the three-dimensional image satisfies a target condition, and if the type of the three-dimensional image satisfies the target condition, triggering the occlusion acquiring unit 405 to perform an operation of acquiring depth information of the target occlusion model; if the three-dimensional image The type of the image does not satisfy the target condition, and the trigger drawing unit 404 performs an operation of drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
  • The foregoing units may each be implemented as a separate entity, or may be combined arbitrarily and implemented as one or more entities. For details, refer to the foregoing method embodiments; details are not described herein again.
  • As can be seen from the above, the embodiment of the present disclosure can determine, from collected video data, the object to be processed, detect the feature points of the object and the Euler angle of the object's target part, acquire the depth information of the occlusion model and the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, the fusion between the added effect and the original image can be greatly improved, thereby improving the overall video processing quality.
  • In addition, by setting the occlusion model, the scheme can improve how the three-dimensional image is added and prevent the object from being occluded, which improves the flexibility of the implementation and further improves the fusion between the added effect and the original image as well as the video processing quality.
  • An embodiment of the present disclosure further provides a network device, which may be a terminal or a server.
  • Referring to FIG. 5, which shows a schematic structural diagram of a network device according to an embodiment of the present disclosure:
  • The network device can include a processor 501 with one or more processing cores, a memory 502 with one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will understand that the network device structure illustrated in FIG. 5 does not limit the network device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:
  • The processor 501 is the control center of the network device, and connects the various parts of the entire network device through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 502 and invoking the data stored in the memory 502, the processor 501 performs the various functions of the network device and processes data, thereby monitoring the network device as a whole.
  • the processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 501.
  • the memory 502 can be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by running software programs and modules stored in the memory 502.
  • the memory 502 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored according to Data created by the use of network devices, etc.
  • memory 502 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, memory 502 can also include a memory controller to provide processor 501 access to memory 502.
  • the network device also includes a power source 503 that supplies power to the various components.
  • the power source 503 can be logically coupled to the processor 501 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
  • the power supply 503 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
  • the network device can also include an input unit 504 that can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
  • the network device may further include a display unit or the like, and details are not described herein again.
  • Specifically, in this embodiment, the processor 501 in the network device loads the executable files corresponding to the processes of one or more applications into the memory 502 according to the following instructions, and the processor 501 runs the applications stored in the memory 502 to implement the following functions:
  • Collecting video data and determining an object to be processed from the video data; detecting feature points of the object and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
  • The setting of the feature points and the target part may be determined according to the needs of the actual application. Taking the object as a portrait, the feature points may be set to the facial features of the person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the contour of the face. That is, the processor 501 can also run the application stored in the memory 502 to implement the following functions:
  • the face detection method is used to perform face recognition on the face of the object, and the facial feature point of the object is obtained, and the head posture of the object is detected to obtain an Euler angle of the head of the object.
  • the facial feature points may include facial features, feature points such as facial contours, and the like.
  • Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model may be set according to the part of the object that needs to remain exposed, so that the drawing of the three-dimensional image can avoid that part accordingly. That is, the processor 501 can also run the application stored in the memory 502 to implement the following functions: acquiring depth information of a target occlusion model; superimposing the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and setting the color of the overlapping portion to transparent to obtain processed depth information; and drawing the three-dimensional image on the object according to the processed depth information.
  • Acquiring the depth information of the target occlusion model is similar to acquiring the depth information of the three-dimensional image. For details, refer to the previous embodiments, which are not described herein again.
  • As can be seen from the above, the embodiment of the present disclosure can determine, from collected video data, the object to be processed, detect the feature points of the object and the Euler angle of the object's target part, acquire the depth information of the occlusion model and the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the captured original image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, the fusion between the added effect and the original image can be greatly improved, thereby improving the overall video processing quality.
  • In addition, by setting the occlusion model, the scheme can improve how the three-dimensional image is added and prevent the object from being occluded, which improves the flexibility of the implementation and further improves the fusion between the added effect and the original image as well as the video processing quality.
  • An embodiment of the present disclosure further provides a storage medium storing a plurality of instructions that can be loaded by a processor to perform the following steps: collecting video data and determining an object to be processed from the video data; detecting feature points of the object and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
  • The setting of the feature points and the target part may be determined according to the needs of the actual application. For example, the feature points may be set to the facial features of the person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the contour of the face; that is, the instructions can also perform the following steps:
  • the face detection method is used to perform face recognition on the face of the object, and the facial feature point of the object is obtained, and the head posture of the object is detected to obtain an Euler angle of the head of the object.
  • the facial feature points may include facial features, feature points such as facial contours, and the like.
  • Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model may be set according to the part of the object that needs to remain exposed, so that the drawing of the three-dimensional image can avoid that part accordingly; that is, the instructions can also perform the following steps: acquiring depth information of a target occlusion model; superimposing the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and setting the color of the overlapping portion to transparent to obtain processed depth information; and drawing the three-dimensional image on the object according to the processed depth information.
  • the storage medium may include: a read only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • Since the instructions stored in the storage medium can perform the steps of any video processing method provided by the embodiments of the present disclosure, they can achieve the beneficial effects of any of those methods; for details, see the previous embodiments, which are not described herein again.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present disclosure disclose a video processing method, a network device, and a storage medium. The embodiments may collect video data and determine, from the video data, an object to be processed; detect feature points of the object and acquire an Euler angle of a target part of the object; acquire depth information of a target three-dimensional image according to the feature points and the Euler angle; and draw the target three-dimensional image on the object based on the depth information of the target three-dimensional image. The scheme can add a three-dimensional image effect to an image, improving the fusion between the added effect and the captured original image, improving the video processing quality, realizing AR effects in rich forms, and enriching the video processing modes.

Description

Video processing method, network device, and storage medium
This application claims priority to Chinese Patent Application No. 2017106230119, entitled "Video processing method, apparatus, and storage medium", filed with the State Intellectual Property Office of China on July 27, 2017, which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video processing method, a network device, and a storage medium.
Background
In recent years, with the development of computer technology, augmented reality (AR) technology has been applied more and more widely. AR technology computes the position and angle of the camera image in real time and overlays corresponding imagery, so that the virtual world and the real world can be combined and interacted with on screen.
Taking video processing as an example, in the related art, to make videos more entertaining, a real-time two-dimensional (2D) dynamic sticker effect can be added to each frame while the user is recording. For example, face recognition can be used to obtain the facial feature points of the face contained in the current frame, and a two-dimensional sticker, such as rabbit ears, cat ears, or a beard, is then drawn at specified points using these feature points.
In the course of research and practice on the related art, the inventors of the present disclosure have found that, although the two-dimensional dynamic sticker effects added by the related-art scheme are somewhat entertaining, they blend poorly with the original image, so the video processing quality is poor.
Summary
Embodiments of the present disclosure provide a video processing method, a network device, and a storage medium that can add a three-dimensional image effect to an image, improving the fusion between the added effect and the captured original image, improving the video processing quality, realizing AR effects in rich forms, and enriching the video processing modes.
An embodiment of the present disclosure provides a video processing method, including:
collecting video data, and determining, from the video data, an object to be processed;
detecting feature points of the object, and acquiring an Euler angle of a target part of the object;
acquiring depth information of a target three-dimensional (3D) image according to the feature points and the Euler angle; and
rendering the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
Correspondingly, an embodiment of the present disclosure further provides a video processing apparatus, including:
a collecting unit, configured to collect video data and determine, from the video data, an object to be processed;
a detecting unit, configured to detect feature points of the object and acquire an Euler angle of a target part of the object;
an acquiring unit, configured to acquire depth information of a target three-dimensional image according to the feature points and the Euler angle; and
a drawing unit, configured to render the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
In addition, an embodiment of the present disclosure further provides a storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to perform the steps of any of the video processing methods provided by the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a network device, including one or more processors and one or more memories, where the memory stores at least one application, and the at least one application is suitable for being loaded by the processor to perform the following operations:
collecting video data, and determining, from the video data, an object to be processed;
detecting feature points of the object, and acquiring an Euler angle of a target part of the object;
acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and
rendering the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
The embodiments of the present disclosure may determine, from collected video data, an object to be processed, detect feature points of the object, acquire an Euler angle of a target part of the object, acquire depth information of a target three-dimensional image according to the feature points and the Euler angle, and draw the three-dimensional image on the object based on the depth information, thereby adding a three-dimensional image (such as a three-dimensional object) to the image. Compared with related-art schemes that can only add two-dimensional dynamic sticker effects, this scheme can greatly improve the fusion between the added effect and the captured original image, thereby improving the overall video processing quality, realizing AR effects in rich forms, and enriching the video processing modes with good results.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings show merely some embodiments of the present disclosure, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a scenario of a video processing method according to an embodiment of the present disclosure;
FIG. 1b is a schematic diagram of a scenario of a video processing method according to an embodiment of the present disclosure;
FIG. 1c is a flowchart of a video processing method according to an embodiment of the present disclosure;
FIG. 1d is a schematic diagram of face detection in a video processing method according to an embodiment of the present disclosure;
FIG. 2a is another flowchart of a video processing method according to an embodiment of the present disclosure;
FIG. 2b is a schematic diagram of the Euler angle of a human head pose in a video processing method according to an embodiment of the present disclosure;
FIG. 2c is an example diagram of the Euler angle in a video processing method according to an embodiment of the present disclosure;
FIG. 3a is still another flowchart of a video processing method according to an embodiment of the present disclosure;
FIG. 3b is a schematic diagram of an occlusion model in a video processing method according to an embodiment of the present disclosure;
FIG. 3c is a schematic diagram of writing occlusion-model depth information in a video processing method according to an embodiment of the present disclosure;
FIG. 3d is a schematic diagram of writing three-dimensional-helmet depth information in a video processing method according to an embodiment of the present disclosure;
FIG. 3e is a rendering effect diagram of a three-dimensional helmet in a video processing method according to an embodiment of the present disclosure;
FIG. 4a is a schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;
FIG. 4b is another schematic structural diagram of a video processing apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a network device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Embodiments of the present disclosure provide a video processing method, apparatus, and storage medium.
The video processing apparatus may be integrated in a network device, such as a server or a terminal. The terminal may be a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), or the like; the embodiments of the present disclosure do not specifically limit this.
For example, taking the video processing apparatus integrated in a terminal as an example, referring to FIG. 1a, the terminal may collect video data, determine from the video data an object to be processed, and detect the feature points of the object and the Euler angle of the object's target part; for example, the terminal may detect the facial feature points of a portrait in a video frame and the Euler angle of the head pose. The terminal then acquires the depth information of a target three-dimensional image according to the feature points and the Euler angle, and draws the three-dimensional image on the object based on its depth information, for example adding a three-dimensional helmet to the portrait.
For another example, taking the video processing apparatus integrated in a server as an example, referring to FIG. 1b, after collecting video data the terminal may provide the video data to the server; the server determines from the video data the object to be processed, detects the feature points of the object and the Euler angle of the object's target part, acquires the depth information of the target three-dimensional image according to the feature points and the Euler angle, and draws the three-dimensional image on the object based on its depth information. Optionally, the server may then return the video data with the drawn three-dimensional image to the terminal.
The solution of the embodiments of the present disclosure can achieve an augmented-reality effect in which the virtual world and the real world are combined and interacted with on screen. The solutions provided by the present disclosure are described in detail below.
As shown in FIG. 1c, the flow of the video processing method may be as follows:
101. Collect video data, and determine from the video data an object to be processed.
Taking the video processing apparatus integrated in a terminal as an example, the terminal collects the video data, for example by shooting with its camera or by reading video data stored locally (that is, on the terminal). Taking the apparatus integrated in a server as an example, the server receives the video data sent by the terminal, that is, the terminal sends the video data to the server after collecting it.
After the video data is collected, the video processing apparatus can determine from the video data the object to be processed. The type of the object may be determined according to the needs of the actual application; for example, the object may be a person, an animal, or even a thing, which is not specifically limited in the embodiments of the present disclosure. The number of objects may also be determined according to the needs of the actual application; there may be a single object or multiple objects, which is likewise not specifically limited.
For ease of description, in the embodiments of the present disclosure the object is a single person.
102. Detect the feature points of the object, and acquire the Euler angle of the object's target part.
The setting of the feature points and the target part may also be determined according to the needs of the actual application. Taking the object as a portrait, the feature points may be set to the facial features of the person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the contour of the face, and the target part may be set to the head. That is, detecting the feature points of the object and acquiring the Euler angle of the object's target part can be implemented as follows:
Perform face recognition on the face of the object using a face detection technology to obtain the facial feature points of the object, and detect the head pose of the object to obtain the Euler angle of the object's head.
Referring to FIG. 1d, the facial feature points may include feature points such as the five facial features and the facial contour.
An Euler angle is a set of three independent angular parameters used to describe the orientation of a rigid body rotating about a fixed point, consisting of the nutation angle θ, the precession angle ψ, and the rotation (spin) angle φ. In other words, by acquiring the Euler angle of the object's target part over time, for example the Euler angle of the person's head (including the nutation, precession, and rotation angles), the motion of the portrait's head can be known.
103. Acquire the depth information of the target three-dimensional image according to the feature points and the Euler angle.
For example, the depth information of the target three-dimensional image can be acquired as follows:
(1) Adjust the target three-dimensional image according to the feature points and the Euler angle so that the three-dimensional image matches the object.
When adjusting the target three-dimensional image, at least one of scaling, rotation, and shifting may be performed on it according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, angle, and position.
The three-dimensional image may be selected according to the needs of the actual application or the user's preference; for example, it may be a three-dimensional helmet, three-dimensional rabbit ears, three-dimensional cat ears, three-dimensional glasses, or a three-dimensional headscarf.
There may be multiple ways of judging whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the two satisfy a certain functional relationship in size, position, and angle; or it may be set that they match when the three-dimensional image is consistent or substantially consistent with the object in size, position, and angle (that is, the error is less than a preset range), and so on.
Taking the object as a person and the three-dimensional image as 3D glasses as an example, the 3D glasses may be shifted according to the person's facial feature points and the Euler angle of the head so that the glasses are roughly consistent with the face in position, and scaled and rotated so that they are roughly consistent with the face in size and angle.
(2) Acquire the depth information of the three-dimensional image in the state where it matches the object.
When the three-dimensional image matches the object in size, position, and angle, the depth information of the three-dimensional image in that state is extracted (an object has corresponding depth information in each state), yielding the depth information of the three-dimensional image.
Depth information is what enables human stereo vision. As is well known, perspective projection is a many-to-one mapping: every point along a projection ray maps to the same image point. Using two cameras (analogous to the two human eyes) removes this ambiguity, so the value of the third coordinate Z can be determined; this value is called the depth information.
104. Draw the three-dimensional image on the object based on the depth information of the three-dimensional image.
When drawing the three-dimensional image on the object, the three-dimensional image may be rendered on the frame where the object is located according to the depth information of the three-dimensional image, for example drawing three-dimensional glasses, a three-dimensional helmet, or three-dimensional rabbit ears on the head of the portrait.
可选的,为了避免所绘制的三维图像对该对象造成遮挡,还可以根据该对象需要裸露的部分(即避免被三维图像遮挡的部分)设置相匹配的遮挡模型(occluder),以便该三维图像在绘制时,能够相应地避开该对象需要裸露的部分;即,在基于该三维图像的深度信息在该对象上绘制该三维图像之前,该视频处理方法还可以包括:
获取目标遮挡模型的深度信息,根据该目标遮挡模型的深度信息和该三维图像的深度信息,对该遮挡模型和三维图像进行叠加,并将重合部分的颜色设置为透明,得到处理后的深度信息;相应地,在基于该三维图像的深度信息在该对象上绘制该三维图像时,具体是根据处理后的深度信息在该对象上绘制该三维图像。
The process of acquiring the depth information of the target occlusion model is similar to that of acquiring the depth information of the three-dimensional image. For example, it may be implemented as follows: acquiring the target occlusion model; adjusting the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object; and acquiring the depth information of the target occlusion model in the state in which the target occlusion model matches the object.
When the target occlusion model is adjusted, at least one of scaling, rotation, and shifting may be performed on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, angle, and position.
The target occlusion model may be set according to the part of the object that needs to remain exposed. For example, if the part that needs to remain exposed is a human face, a model of a human head may be built and used as the occlusion model, and so on.
It should be noted that, to simplify the algorithm and improve processing efficiency, multiple different objects of the same type may use the same occlusion model. Continuing with the example in which the part that needs to remain exposed is a human face and the three-dimensional image to be drawn is a three-dimensional helmet: if the target part of the object is the head of user A, occlusion model A may be used to prevent the face of user A from being occluded when the three-dimensional helmet is drawn; if the target part of the object is the head of user B, occlusion model A may likewise be used to prevent the face of user B from being occluded when the three-dimensional helmet is drawn, and so on.
Optionally, to improve drawing accuracy and the processing effect, the occlusion model may also be built for a specific object. For example, continuing with the example in which the part that needs to remain exposed is a human face and the three-dimensional image to be drawn is a three-dimensional helmet: if the target part of the object is the head of user A, occlusion model A may be built according to the head of user A, and then occlusion model A is used to prevent the face of user A from being occluded when the three-dimensional helmet is drawn; if the target part of the object is the head of user B, occlusion model B may be built according to the head of user B, and then occlusion model B is used to prevent the face of user B from being occluded when the three-dimensional helmet is drawn, and so on.
Optionally, because not every three-dimensional image to be drawn will occlude the object, to improve flexibility, the three-dimensional image may be evaluated before the depth information of the target occlusion model is acquired: if the three-dimensional image belongs to a preset type, the occlusion model is needed; otherwise, the three-dimensional image may be drawn directly. That is, before the step of acquiring the depth information of the target occlusion model, the video processing method may further include:
determining whether the type of the three-dimensional image satisfies a target condition; if the type of the three-dimensional image satisfies the target condition, performing the step of acquiring the depth information of the target occlusion model; and if the type of the three-dimensional image does not satisfy the target condition, performing the step of drawing the three-dimensional image on the object based on the depth information of the three-dimensional image. The target condition may be whether the type of the three-dimensional image belongs to the preset type.
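A tiny illustrative check of this optional branching is shown below; the set of "occluding" prop types and the function name are assumptions, since the disclosure only requires that a preset type triggers the occluder path.

```python
# Only props of a preset "occluding" type trigger the occlusion-model path;
# others (e.g. glasses) are drawn directly. The concrete type names are
# illustrative assumptions.
OCCLUDING_TYPES = {"helmet", "full_head_mask"}

def needs_occluder(prop_type: str) -> bool:
    return prop_type in OCCLUDING_TYPES

print(needs_occluder("helmet"))   # True  -> acquire occluder depth first
print(needs_occluder("glasses"))  # False -> draw the 3D image directly
```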
As can be seen from the above, in the embodiments of the present disclosure, an object to be processed can be determined from captured video data; the feature points of the object are then detected and the Euler angle of the target part of the object is acquired; the depth information of a target three-dimensional image is then acquired according to the feature points and the Euler angle, and the three-dimensional image is drawn on the object based on the depth information, so that a three-dimensional image, for example a three-dimensional article, is added to the captured original image. Compared with the related-art solution that can only add a two-dimensional dynamic sticker effect, on the one hand, this solution can greatly improve the degree of fusion between the added effect and the original image, thereby improving the overall video processing quality; on the other hand, it can also implement AR effects in 3D form, enriching the functions of the video processing apparatus, with a better effect.
The method described in the foregoing embodiment is further described in detail below by using examples.
In this embodiment of the present disclosure, the description uses an example in which the video processing apparatus is integrated in a network device. The network device may be a terminal, or may be a device such as a server, which is not specifically limited in this embodiment of the present disclosure.
As shown in FIG. 2a, a video processing method may proceed as follows:
201. The network device captures video data, and determines, from the video data, an object to be processed.
The type of the object may be determined according to the requirements of the actual application; for example, the object may be a person, an animal, or even an article. In addition, the number of objects may also be determined according to the requirements of the actual application; there may be a single object or multiple objects. Details are not described herein again.
Using an example in which the network device is a terminal and the object to be processed is a "portrait", the terminal may shoot the face of a user with a camera to capture video data, and the terminal then determines, from the video data, the object to be processed, for example, determines the "portrait" to which a three-dimensional image needs to be added.
Using an example in which the network device is a server, a terminal may capture video data and provide the video data to the server, and the server then determines, from the video data, the object to be processed, for example, determines the "portrait" to which a three-dimensional image needs to be added, and so on.
Optionally, to ensure the validity of the video data, the terminal may further generate corresponding prompt information when capturing the video data, to prompt the user that a face needs to be shot, so that the user can shoot in a better posture and the terminal can obtain video data of higher validity.
202. The network device detects feature points of the object.
Continuing with the example in which the object to be processed is a "portrait", as shown in FIG. 1d, the network device performs face recognition on the face of the object by using a face detection technology, to obtain the facial feature points of the object.
The face detection technology may include the OpenCV (a cross-platform computer vision library) face detection technology, the face detection technologies built into various mobile terminal systems, the Face++ face detection technology, the SenseTime face detection technology, and the like.
203. The network device acquires an Euler angle of a target part of the object.
Continuing with the example in which the object to be processed is a "portrait" and the target part is the "head", the network device may detect the head pose of the portrait in real time, to obtain the Euler angle of the head of the portrait. For example, referring to FIG. 2b, the tip of the nose of the portrait may be used as the fixed point "o" about which rotation takes place, and a set of independent angular parameters of the head of the portrait rotating about the fixed point "o" (that is, the tip of the nose) is obtained, including the nutation angle θ, the precession angle ψ, and the spin angle φ, so that the Euler angle of the head of the portrait is obtained. The Euler angles may be obtained in detail as follows:
As shown in FIG. 2c, a fixed coordinate system oxyz (the positions of the x-axis, y-axis, and z-axis, and the relationship between these three coordinate axes, may also be seen in FIG. 2b) and a coordinate system ox′y′z′ fixed to the head of the portrait may be constructed based on the fixed point o. The perpendicular line oN of the plane zoz′ is called the line of nodes, and it is also the intersection line of the base planes ox′y′ and oxy. Taking the axes oz and oz′ as the base axes and their perpendicular planes oxy and ox′y′ as the base planes, the angle from the axis oz to oz′ is measured, giving the nutation angle θ. In this coordinate system, viewed from the positive end of oN, the nutation angle θ is measured counterclockwise. In addition, the angle from the fixed axis ox to the line of nodes oN is measured to obtain the precession angle ψ, and the angle from the line of nodes oN to the moving axis ox′ is measured to obtain the spin angle φ. Viewed from the positive ends of the axes oz and oz′, the angles ψ and φ are also measured counterclockwise.
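In practice, such head-pose angles are often estimated directly from the detected facial feature points. The following is a minimal sketch under assumed values: the generic 3D reference points, the landmark indices, and the approximate camera intrinsics are illustrative assumptions rather than values mandated by the disclosure.

```python
# Illustrative head-pose estimation from 68-point facial landmarks.
import cv2
import numpy as np

MODEL_POINTS = np.array([            # generic head model, arbitrary units
    (0.0, 0.0, 0.0),                 # nose tip
    (0.0, -330.0, -65.0),            # chin
    (-225.0, 170.0, -135.0),         # left eye outer corner
    (225.0, 170.0, -135.0),          # right eye outer corner
    (-150.0, -150.0, -125.0),        # left mouth corner
    (150.0, -150.0, -125.0),         # right mouth corner
])

def head_pose_euler(landmarks, frame_width, frame_height):
    """Return (pitch, yaw, roll) in radians from detected landmarks."""
    image_points = np.array([landmarks[i] for i in (30, 8, 36, 45, 48, 54)],
                            dtype=np.float64)
    focal = frame_width                       # rough pinhole approximation
    camera = np.array([[focal, 0, frame_width / 2],
                       [0, focal, frame_height / 2],
                       [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera,
                               np.zeros((4, 1)))
    rot, _ = cv2.Rodrigues(rvec)
    pitch = np.arctan2(rot[2, 1], rot[2, 2])
    yaw = np.arctan2(-rot[2, 0], np.hypot(rot[2, 1], rot[2, 2]))
    roll = np.arctan2(rot[1, 0], rot[0, 0])
    return pitch, yaw, roll
```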
It should be noted that the Euler angle may change as the head pose of the portrait changes, and the angle of the three-dimensional image to be added subsequently depends on this Euler angle; therefore, the three-dimensional image can also be made to change as the head pose of the portrait changes, which is described in detail in step 204.
Steps 202 and 203 may be performed in any order.
204. The network device performs at least one of scaling, rotation, and shifting on a target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, angle, and position.
The three-dimensional image may be selected according to the requirements of the actual application or the preference of the user; for example, it may be three-dimensional rabbit ears, three-dimensional cat ears, three-dimensional glasses, a three-dimensional headscarf, or the like.
There may be multiple manners of determining whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the three-dimensional image and the object satisfy a particular functional relationship in size, position, and angle; alternatively, it may be set that the three-dimensional image matches the object in size, position, and angle when the three-dimensional image and the object are consistent or substantially consistent (that is, the error is less than a preset range) in size, position, and angle, and so on.
For ease of description, in this embodiment of the present disclosure, the matching condition is that the three-dimensional image and the object are consistent or substantially consistent in size, position, and angle. Using an example in which the object is a portrait and the three-dimensional image is three-dimensional glasses, the three-dimensional glasses may be shifted according to the facial feature points of the portrait and the Euler angle of the head, so that the three-dimensional glasses are substantially consistent with the face of the person in position, and the three-dimensional glasses may be scaled and rotated according to the facial feature points of the portrait and the Euler angle of the head, so that the three-dimensional glasses are substantially consistent with the face of the person in size and angle, and so on.
205. The network device acquires depth information of the three-dimensional image in the state in which the three-dimensional image matches the object.
Here, that the three-dimensional image matches the object means that the three-dimensional image matches the object in size, position, and angle.
Continuing with the example in which the object to be processed is a "portrait" and the three-dimensional image to be added is three-dimensional glasses, when the three-dimensional glasses are substantially consistent with the face of the person in position, size, and angle, the network device acquires the depth information of the three-dimensional glasses and performs step 206.
206. The network device draws the three-dimensional image on the object according to the depth information of the three-dimensional image.
Continuing with the example in which the object to be processed is a "portrait" and the three-dimensional image to be added is three-dimensional glasses, the network device may draw the three-dimensional glasses on the face of the portrait according to the depth information of the three-dimensional glasses obtained in step 205.
As can be seen from the above, in this embodiment of the present disclosure, an object to be processed can be determined from captured video data; the feature points of the object and the Euler angle of the target part of the object are then detected; the depth information of a target three-dimensional image is acquired according to the feature points and the Euler angle, and the three-dimensional image is drawn on the object based on the depth information, so that a three-dimensional image, for example a three-dimensional article, is added to the captured original image. Compared with the related-art solution that can only add a two-dimensional dynamic sticker effect, this solution can greatly improve the degree of fusion between the added effect and the original image, thereby improving the overall video processing quality.
In addition, AR effects in rich forms can be implemented, enriching the functions of the network device, with a better effect.
As in the previous embodiment, in this embodiment of the present disclosure, the description likewise uses an example in which the video processing apparatus is integrated in a network device. Unlike the previous embodiment, this embodiment of the present disclosure is described by using an example of drawing another type of three-dimensional image, such as a three-dimensional helmet.
As shown in FIG. 3a, a video processing method may proceed as follows:
301. The network device captures video data, and determines, from the video data, an object to be processed.
302. The network device detects feature points of the object.
303. The network device detects an Euler angle of a target part of the object.
For the execution of steps 301 to 303, reference may be made to the related descriptions of steps 201 to 203 in the previous embodiment.
304. The network device acquires a target occlusion model.
The target occlusion model may be set according to the part of the object that needs to remain exposed. For example, if the part that needs to remain exposed is a human face, as shown in FIG. 3b, a model of a human head may be built and used as the target occlusion model, and so on.
It should be noted that, to simplify the algorithm and improve processing efficiency, multiple different objects of the same type may use the same occlusion model. For example, continuing with the example in which the part that needs to remain exposed is a human face and the three-dimensional image to be drawn is a three-dimensional helmet: if the target part of the object is the head of user A, occlusion model A may be used to prevent the face of user A from being occluded when the three-dimensional helmet is drawn; if the target part of the object is the head of user B, occlusion model A may likewise be used to prevent the face of user B from being occluded when the three-dimensional helmet is drawn, and so on.
Optionally, to improve drawing accuracy and the processing effect, the target occlusion model may also be built for a specific object. For example, continuing with the example in which the part that needs to remain exposed is a human face and the three-dimensional image to be drawn is a three-dimensional helmet: if the target part of the object is the head of user A, occlusion model A may be built according to the head of user A, and occlusion model A is then used to prevent the face of user A from being occluded when the three-dimensional helmet is drawn; if the target part of the object is the head of user B, occlusion model B may be built according to the head of user B, and occlusion model B is then used to prevent the face of user B from being occluded when the three-dimensional helmet is drawn, and so on.
Optionally, because not every three-dimensional image to be drawn will occlude the object, to improve flexibility, the three-dimensional image may be evaluated before the target occlusion model is acquired. If the three-dimensional image belongs to the preset type, for example, if the three-dimensional image to be drawn is a three-dimensional helmet, the occlusion model is needed; if the three-dimensional image does not belong to the preset type, for example, if the three-dimensional image to be drawn is three-dimensional glasses, the three-dimensional image may be drawn directly. Reference may be made to the related description in the previous embodiment.
That is, the preset type refers to a type that will occlude the object.
305. The network device adjusts the target occlusion model according to the feature points and the Euler angle so that the target occlusion model matches the object, and acquires depth information of the target occlusion model in the state in which the target occlusion model matches the object.
When adjusting the target occlusion model, the network device may perform at least one of scaling, rotation, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, position, and angle, and then extract the depth information of the target occlusion model in the matched state, to obtain the depth information of the target occlusion model. For example, see FIG. 3c, which is a schematic diagram of the depth information of the target occlusion model.
There may be multiple manners of determining whether the target occlusion model matches the object. For example, it may be set that the target occlusion model matches the object in size, position, and angle when the target occlusion model and the object satisfy a particular functional relationship in size, position, and angle; alternatively, it may be set that the target occlusion model matches the object in size, position, and angle when the target occlusion model and the object are consistent or substantially consistent (that is, the error is less than a preset range) in size, position, and angle, and so on.
306. The network device performs at least one of scaling, rotation, and shifting on a target three-dimensional image according to the feature points and the Euler angle so that the three-dimensional image matches the object in size, angle, and position, and acquires depth information of the three-dimensional image in the state in which the three-dimensional image matches the object.
There may be multiple manners of determining whether the three-dimensional image matches the object; reference may be made to the related descriptions of steps 204 and 205 in the previous embodiment.
For example, see FIG. 3d, which is a schematic diagram of the written depth information of the three-dimensional helmet. As can be seen from FIG. 3d, if the three-dimensional helmet is drawn directly according to its depth information, the face of the portrait will be occluded; therefore, the face needs to be "exposed", and this operation can be implemented by performing step 307.
Steps 304 and 306 may be performed in any order.
307. The network device superimposes the target occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and sets the color of the overlapping part to transparent, to obtain processed depth information.
For example, after superimposing the target occlusion model and the three-dimensional image, the network device may set the color of the overlapping part to vec(0.0.0.0) and write the depth information of the three-dimensional helmet (that is, update the depth information of the three-dimensional helmet obtained in step 306), to obtain the processed depth information. In this way, when the helmet is subsequently drawn, the pixels of the helmet occluded by the target occlusion model are discarded, which is equivalent to digging a hole in the drawn helmet to expose the face of the portrait.
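The following numpy sketch illustrates this per-pixel behavior (in practice the discard would typically happen in a shader); the array names and the depth convention (smaller value = closer to the camera, non-covered pixels marked as infinity) are assumptions, not the disclosure's prescribed data layout.

```python
# Helmet pixels that lie behind the head occluder are discarded (treated as
# fully transparent), which "digs a hole" in the drawn helmet so the face
# stays visible.
import numpy as np

def composite_helmet(frame_rgb, helmet_rgb, helmet_depth, occluder_depth):
    """Blend the helmet over the frame wherever it is not occluded."""
    helmet_valid = np.isfinite(helmet_depth)                 # helmet covers pixel
    occluded = np.isfinite(occluder_depth) & (occluder_depth < helmet_depth)
    visible = helmet_valid & ~occluded                       # pixels kept
    out = frame_rgb.copy()
    out[visible] = helmet_rgb[visible]
    return out
```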
308. The network device draws the three-dimensional image on the object according to the processed depth information.
Continuing with the example in which the object to be processed is a "portrait" and the three-dimensional image to be added is a three-dimensional helmet, the network device may draw the three-dimensional helmet on the head of the portrait according to the processed depth information obtained in step 307.
Because the processed depth information is used at this point, the pixels occluded by the occlusion model are discarded when the three-dimensional helmet is drawn, which prevents the face of the portrait from being occluded by the drawn three-dimensional helmet. For example, see FIG. 3e, which shows the rendering effect of the three-dimensional helmet: a three-dimensional helmet is added to the head of the portrait, and the face of the portrait is exposed and clearly visible.
As can be seen from the above, in this embodiment of the present disclosure, an object to be processed can be determined from captured video data; the feature points of the object and the Euler angle of the target part of the object are then detected; the depth information of the occlusion model and the depth information of a target three-dimensional image are acquired according to the feature points and the Euler angle, and the three-dimensional image is drawn on the object based on this depth information, thereby adding a three-dimensional image effect (for example, a three-dimensional article) to the captured original image. Compared with the related-art solution that can only add a two-dimensional dynamic sticker effect, this solution can greatly improve the degree of fusion between the added effect and the original image, thereby improving the overall video processing quality.
In addition, this solution can also improve the effect of adding the three-dimensional image by setting the occlusion model, avoiding occlusion of the object; therefore, the flexibility of implementation is improved, and the degree of fusion between the added effect and the original image, as well as the video processing quality, is further improved.
Moreover, AR effects in rich forms can be implemented, enriching the video processing manner, with a better effect.
To better implement the foregoing method, an embodiment of the present disclosure further provides a video processing apparatus. The video processing apparatus may be specifically integrated in a network device, for example, a device such as a server or a terminal; the terminal may specifically be a mobile phone, a tablet computer, a notebook computer, and/or a PC.
For example, as shown in FIG. 4a, the video processing apparatus may include a capturing unit 401, a detection unit 402, an acquisition unit 403, and a drawing unit 404, as follows:
(1) Capturing unit 401
The capturing unit 401 is configured to capture video data, and determine, from the video data, an object to be processed.
For example, the capturing unit 401 may be specifically configured to shoot with a camera, or read video data locally (that is, from the terminal), and determine, from the video data, the object to be processed.
Alternatively, the capturing unit 401 may be specifically configured to receive video data sent by the terminal, and determine, from the video data, the object to be processed.
The type of the object may be determined according to the requirements of the actual application; for example, the object may be a person, an animal, or even an article. In addition, the number of objects may also be determined according to the requirements of the actual application; there may be a single object or multiple objects. Details are not described herein again.
(2) Detection unit 402
The detection unit 402 is configured to detect feature points of the object, and acquire an Euler angle of a target part of the object.
The feature points and the target part may be set according to the requirements of the actual application. Using an example in which the object is a portrait, the feature points may be set to the five sense organs of a person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the facial contour. That is, when the object is a portrait and the target part is the head, the detection unit 402 may be specifically configured to:
perform face recognition on the face of the object by using a face detection technology, to obtain facial feature points of the object; and detect the head pose of the object, to obtain the Euler angle of the head of the object.
The facial feature points may include feature points such as the five sense organs and the facial contour. For details, reference may be made to the foregoing method embodiments, and details are not described herein again.
(3) Acquisition unit 403
The acquisition unit 403 is configured to acquire depth information of a target three-dimensional image according to the feature points and the Euler angle.
For example, the acquisition unit 403 may include an adjustment subunit and an extraction subunit, as follows:
The adjustment subunit may be configured to adjust the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object.
The extraction subunit may be configured to extract the depth information of the three-dimensional image in the state in which the three-dimensional image matches the object.
For example, the adjustment subunit may be specifically configured to perform at least one of scaling, rotation, and shifting on the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, position, and angle.
The three-dimensional image may be selected according to the requirements of the actual application or the preference of the user; for example, it may be a three-dimensional helmet, three-dimensional rabbit ears, three-dimensional cat ears, three-dimensional glasses, a three-dimensional headscarf, or the like.
There may be multiple manners of determining whether the three-dimensional image matches the object. For example, it may be set that the three-dimensional image matches the object in size, position, and angle when the three-dimensional image and the object satisfy a particular functional relationship in size, position, and angle; alternatively, it may be set that the three-dimensional image matches the object in size, position, and angle when the three-dimensional image and the object are consistent or substantially consistent (that is, the error is less than a preset range) in size, position, and angle, and so on.
(4) Drawing unit 404
The drawing unit 404 is configured to draw the three-dimensional image on the object based on the depth information of the three-dimensional image.
For example, the drawing unit 404 may be specifically configured to render, according to the depth information of the three-dimensional image, the three-dimensional image on the frame in which the object is located, for example, drawing three-dimensional glasses, a three-dimensional helmet, or three-dimensional rabbit ears on the head, and so on.
Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model (occluder) may further be set according to the part of the object that needs to remain exposed (that is, the part that is to avoid being occluded by the three-dimensional image), so that the part of the object that needs to remain exposed can be correspondingly avoided when the three-dimensional image is drawn. That is, as shown in FIG. 4b, the video processing apparatus may further include an occlusion acquisition unit 405 and an occlusion adjustment unit 406, as follows:
The occlusion acquisition unit 405 may be configured to acquire depth information of a target occlusion model.
The occlusion adjustment unit 406 may be configured to superimpose the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and set the color of the overlapping part to transparent, to obtain processed depth information.
Then, the drawing unit 404 may be specifically configured to draw the three-dimensional image on the object according to the processed depth information obtained by the occlusion adjustment unit.
Acquiring the depth information of the target occlusion model is similar to acquiring the depth information of the three-dimensional image; for example, it may specifically be as follows:
The occlusion acquisition unit 405 may be specifically configured to acquire the target occlusion model, adjust the target occlusion model according to the feature points and the Euler angle so that the target occlusion model matches the object, and acquire the depth information of the target occlusion model in the state in which the target occlusion model matches the object.
For example, the occlusion acquisition unit 405 may be specifically configured to acquire the target occlusion model, and perform at least one of scaling, rotation, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the occlusion model matches the object in size, position, and angle.
The target occlusion model may be set according to the part of the object that needs to remain exposed. For example, if the part that needs to remain exposed is a human face, a model of a human head may be built and used as the occlusion model, and so on.
It should be noted that, to simplify the algorithm and improve processing efficiency, multiple different objects of the same type may use the same occlusion model. Optionally, to improve drawing accuracy and the processing effect, the occlusion model may also be built for a specific object. For details, reference may be made to the foregoing method embodiments, and details are not described herein again.
Optionally, because not every three-dimensional image to be drawn will occlude the object, to improve flexibility, the three-dimensional image may be evaluated before the depth information of the target occlusion model is acquired: if the three-dimensional image belongs to the preset type, the occlusion model is needed; otherwise, the three-dimensional image may be drawn directly. That is, as shown in FIG. 4b, the video processing apparatus may further include a determining unit 407, as follows:
The determining unit 407 may be configured to determine whether the type of the three-dimensional image satisfies a target condition; if the type of the three-dimensional image satisfies the target condition, trigger the occlusion acquisition unit 405 to perform the operation of acquiring the depth information of the target occlusion model; and if the type of the three-dimensional image does not satisfy the target condition, trigger the drawing unit 404 to perform the operation of drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
In specific implementation, the foregoing units may be implemented as independent entities, or may be combined in any manner and implemented as one or several entities. For the specific implementation of the foregoing units, reference may be made to the foregoing method embodiments, and details are not described herein again.
As can be seen from the above, in the embodiments of the present disclosure, an object to be processed can be determined from captured video data; the feature points of the object and the Euler angle of the target part of the object are then detected; the depth information of the occlusion model and the depth information of a target three-dimensional image are acquired according to the feature points and the Euler angle, and the three-dimensional image is drawn on the object based on this depth information, thereby adding a three-dimensional image effect (for example, a three-dimensional article) to the captured original image. Compared with the related-art solution that can only add a two-dimensional dynamic sticker effect, this solution can greatly improve the degree of fusion between the added effect and the original image, thereby improving the overall video processing quality.
In addition, this solution can also improve the effect of adding the three-dimensional image by setting the occlusion model, avoiding occlusion of the object; therefore, the flexibility of implementation is improved, and the degree of fusion between the added effect and the original image, as well as the video processing quality, is further improved.
Moreover, AR effects in rich forms can be implemented, enriching the video processing manner, with a better effect.
Correspondingly, an embodiment of the present disclosure further provides a network device. The network device may be a terminal, or may be a server. For example, FIG. 5 shows a schematic structural diagram of the network device involved in this embodiment of the present disclosure. Specifically:
The network device may include components such as a processor 501 with one or more processing cores, a memory 502 with one or more computer-readable storage media, a power supply 503, and an input unit 504. A person skilled in the art can understand that the network device structure shown in FIG. 5 does not constitute a limitation on the network device, and the network device may include more or fewer components than shown, or combine some components, or have a different component arrangement. Specifically:
The processor 501 is the control center of the network device, and connects various parts of the entire network device by using various interfaces and lines. By running or executing software programs and/or modules stored in the memory 502 and invoking data stored in the memory 502, the processor 501 performs various functions of the network device and processes data, thereby monitoring the network device as a whole. Optionally, the processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 501.
The memory 502 may be configured to store software programs and modules, and the processor 501 runs the software programs and modules stored in the memory 502 to perform various functional applications and data processing. The memory 502 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, an application program required by at least one function (such as a sound playback function and an image playback function), and the like; and the data storage area may store data created according to the use of the network device, and the like. In addition, the memory 502 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory 502 may further include a memory controller, to provide the processor 501 with access to the memory 502.
The network device further includes the power supply 503 for supplying power to the components. Preferably, the power supply 503 may be logically connected to the processor 501 by using a power management system, so that functions such as charging, discharging, and power consumption management are implemented by using the power management system. The power supply 503 may further include any component such as one or more direct current or alternating current power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The network device may further include the input unit 504. The input unit 504 may be configured to receive input digit or character information, and generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
Although not shown, the network device may further include a display unit and the like, which are not described herein again. Specifically, in this embodiment of the present disclosure, the processor 501 in the network device loads executable files corresponding to processes of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502 to implement various functions, as follows:
capturing video data, and determining, from the video data, an object to be processed; detecting feature points of the object, and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
For example, at least one of scaling, rotation, and shifting may specifically be performed on the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, position, and angle, and the depth information of the three-dimensional image is then extracted in the state in which the three-dimensional image matches the object.
The feature points and the target part may be set according to the requirements of the actual application. Using an example in which the object is a portrait, the feature points may be set to the five sense organs of a person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the facial contour. That is, the processor 501 may further run the application programs stored in the memory 502 to implement the following functions:
performing face recognition on the face of the object by using a face detection technology, to obtain facial feature points of the object; and detecting the head pose of the object, to obtain the Euler angle of the head of the object. The facial feature points may include feature points such as the five sense organs and the facial contour.
Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model may further be set according to the part of the object that needs to remain exposed, so that the part of the object that needs to remain exposed can be correspondingly avoided when the three-dimensional image is drawn. That is, the processor 501 may further run the application programs stored in the memory 502 to implement the following functions:
acquiring depth information of a target occlusion model; superimposing the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and setting the color of the overlapping part to transparent, to obtain processed depth information; and then drawing the three-dimensional image on the object according to the processed depth information.
Acquiring the depth information of the target occlusion model is similar to acquiring the depth information of the three-dimensional image; for details, reference may be made to the foregoing embodiments, and details are not described herein again.
For the specific implementation of the foregoing operations, reference may be made to the foregoing embodiments, and details are not described herein again.
As can be seen from the above, in the embodiments of the present disclosure, an object to be processed can be determined from captured video data; the feature points of the object and the Euler angle of the target part of the object are then detected; the depth information of the occlusion model and the depth information of a target three-dimensional image are acquired according to the feature points and the Euler angle, and the three-dimensional image is drawn on the object based on this depth information, thereby adding a three-dimensional image effect (for example, a three-dimensional article) to the captured original image. Compared with the related-art solution that can only add a two-dimensional dynamic sticker effect, this solution can greatly improve the degree of fusion between the added effect and the original image, thereby improving the overall video processing quality.
In addition, this solution can also improve the effect of adding the three-dimensional image by setting the occlusion model, avoiding occlusion of the object; therefore, the flexibility of implementation is improved, and the degree of fusion between the added effect and the original image, as well as the video processing quality, is further improved.
Moreover, AR effects in rich forms can be implemented, enriching the video processing manner, with a better effect.
A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be completed by instructions, or completed by instructions controlling relevant hardware; the instructions may be stored in a computer-readable storage medium, and loaded and executed by a processor.
To this end, an embodiment of the present disclosure provides a storage medium storing a plurality of instructions. The instructions can be loaded by a processor to perform the steps in any one of the video processing methods provided in the embodiments of the present disclosure. For example, the instructions may perform the following steps:
capturing video data, and determining, from the video data, an object to be processed; detecting feature points of the object, and acquiring an Euler angle of a target part of the object; acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and drawing the three-dimensional image on the object based on the depth information of the three-dimensional image.
For example, at least one of scaling, rotation, and shifting may specifically be performed on the target three-dimensional image according to the feature points and the Euler angle, so that the three-dimensional image matches the object in size, position, and angle, and the depth information of the three-dimensional image is then extracted in the state in which the three-dimensional image matches the object.
The feature points and the target part may be set according to the requirements of the actual application. For example, using an example in which the object is a portrait, the feature points may be set to the five sense organs of a person, such as the eyebrows, eyes, nose, mouth, and ears, as well as the facial contour. That is, the instructions may further perform the following steps:
performing face recognition on the face of the object by using a face detection technology, to obtain facial feature points of the object; and detecting the head pose of the object, to obtain the Euler angle of the head of the object. The facial feature points may include feature points such as the five sense organs and the facial contour.
Optionally, to prevent the drawn three-dimensional image from occluding the object, a matching occlusion model may further be set according to the part of the object that needs to remain exposed, so that the part of the object that needs to remain exposed can be correspondingly avoided when the three-dimensional image is drawn. That is, the instructions may further perform the following steps:
acquiring depth information of a target occlusion model; superimposing the occlusion model and the three-dimensional image according to the depth information of the target occlusion model and the depth information of the three-dimensional image, and setting the color of the overlapping part to transparent, to obtain processed depth information; and then drawing the three-dimensional image on the object according to the processed depth information.
For the specific implementation of the foregoing operations, reference may be made to the foregoing embodiments, and details are not described herein again.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can perform the steps in any one of the video processing methods provided in the embodiments of the present disclosure, the beneficial effects that can be achieved by any one of the video processing methods provided in the embodiments of the present disclosure can be achieved. For details, reference may be made to the foregoing embodiments, and details are not described herein again.
The video processing method, apparatus, and storage medium provided in the embodiments of the present disclosure have been described in detail above. The principles and implementations of the present disclosure have been described herein by using specific examples; the descriptions of the foregoing embodiments are merely intended to help understand the method of the present disclosure and its core idea. Meanwhile, a person skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present disclosure. In conclusion, the content of this specification should not be construed as a limitation on the present disclosure.

Claims (19)

  1. A video processing method, wherein the method is applied to a network device and comprises:
    capturing video data, and determining, from the video data, an object to be processed;
    detecting feature points of the object, and acquiring an Euler angle of a target part of the object;
    acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and
    drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  2. The method according to claim 1, wherein the acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle comprises:
    adjusting the target three-dimensional image according to the feature points and the Euler angle, so that the target three-dimensional image matches the object; and
    acquiring the depth information of the target three-dimensional image in a state in which the target three-dimensional image matches the object.
  3. The method according to claim 2, wherein the adjusting the target three-dimensional image according to the feature points and the Euler angle, so that the target three-dimensional image matches the object, comprises:
    performing at least one of scaling, rotation, and shifting on the target three-dimensional image according to the feature points and the Euler angle, so that the target three-dimensional image matches the object in size, angle, and position.
  4. The method according to claim 1, wherein the method further comprises:
    acquiring depth information of a target occlusion model; and
    superimposing the target occlusion model and the target three-dimensional image according to the depth information of the target occlusion model and the depth information of the target three-dimensional image, and setting a color of an overlapping part to transparent, to obtain processed depth information,
    wherein the drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image comprises: drawing the target three-dimensional image on the object according to the processed depth information.
  5. The method according to claim 4, wherein the acquiring depth information of a target occlusion model comprises:
    acquiring the target occlusion model;
    adjusting the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object; and
    acquiring the depth information of the target occlusion model in a state in which the target occlusion model matches the object.
  6. The method according to claim 5, wherein the adjusting the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object, comprises:
    performing at least one of scaling, rotation, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, angle, and position.
  7. The method according to claim 4, wherein the method further comprises:
    determining whether a type of the target three-dimensional image satisfies a target condition; and
    if the type of the target three-dimensional image satisfies the target condition, performing the step of acquiring the depth information of the target occlusion model.
  8. The method according to claim 7, wherein the method further comprises:
    if the type of the target three-dimensional image does not satisfy the target condition, performing the step of drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  9. The method according to any one of claims 1 to 8, wherein the object is a portrait, the target part is a head, and the detecting feature points of the object, and acquiring an Euler angle of a target part of the object comprises:
    performing face recognition on a face of the object, to obtain facial feature points of the object; and
    detecting a head pose of the object, to obtain an Euler angle of the head of the object.
  10. A network device, comprising one or more processors and one or more memories, the memory storing at least one application program, and the at least one application program being adapted to be loaded by the processor to perform the following operations:
    capturing video data, and determining, from the video data, an object to be processed;
    detecting feature points of the object, and acquiring an Euler angle of a target part of the object;
    acquiring depth information of a target three-dimensional image according to the feature points and the Euler angle; and
    drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  11. The network device according to claim 10, wherein the processor loads the at least one application program to perform the following operations:
    adjusting the target three-dimensional image according to the feature points and the Euler angle, so that the target three-dimensional image matches the object; and
    acquiring the depth information of the target three-dimensional image in a state in which the target three-dimensional image matches the object.
  12. The network device according to claim 11, wherein the processor loads the at least one application program to perform the following operation:
    performing at least one of scaling, rotation, and shifting on the target three-dimensional image according to the feature points and the Euler angle, so that the target three-dimensional image matches the object in size, angle, and position.
  13. The network device according to claim 10, wherein the processor loads the at least one application program to perform the following operations:
    acquiring depth information of a target occlusion model; and
    superimposing the target occlusion model and the target three-dimensional image according to the depth information of the target occlusion model and the depth information of the target three-dimensional image, and setting a color of an overlapping part to transparent, to obtain processed depth information,
    wherein the drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image comprises: drawing the target three-dimensional image on the object according to the processed depth information.
  14. The network device according to claim 13, wherein the processor loads the at least one application program to perform the following operations:
    acquiring the target occlusion model;
    adjusting the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object; and
    acquiring the depth information of the target occlusion model in a state in which the target occlusion model matches the object.
  15. The network device according to claim 14, wherein the processor loads the at least one application program to perform the following operation:
    performing at least one of scaling, rotation, and shifting on the target occlusion model according to the feature points and the Euler angle, so that the target occlusion model matches the object in size, angle, and position.
  16. The network device according to claim 13, wherein the processor loads the at least one application program to perform the following operations:
    determining whether a type of the target three-dimensional image satisfies a target condition; and
    if the type of the target three-dimensional image satisfies the target condition, performing the step of acquiring the depth information of the target occlusion model.
  17. The network device according to claim 16, wherein the processor loads the at least one application program to perform the following operation:
    if the type of the target three-dimensional image does not satisfy the target condition, performing the step of drawing the target three-dimensional image on the object based on the depth information of the target three-dimensional image.
  18. The network device according to any one of claims 10 to 17, wherein the object is a portrait, the target part is a head, and the processor loads the at least one application program to perform the following operations:
    performing face recognition on a face of the object, to obtain facial feature points of the object; and
    detecting a head pose of the object, to obtain an Euler angle of the head of the object.
  19. A storage medium, storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps in the video processing method according to any one of claims 1 to 9.
PCT/CN2018/095564 2017-07-27 2018-07-13 Video processing method, network device and storage medium WO2019019927A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710623011.9 2017-07-27
CN201710623011.9A CN107341827B (zh) Video processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2019019927A1 true WO2019019927A1 (zh) 2019-01-31

Family

ID=60216460

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/095564 WO2019019927A1 (zh) 2018-07-13 2017-07-27 Video processing method, network device and storage medium

Country Status (3)

Country Link
CN (1) CN107341827B (zh)
TW (1) TWI678099B (zh)
WO (1) WO2019019927A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114710472A (zh) * 2020-12-16 2022-07-05 中国移动通信有限公司研究院 AR video call processing method and apparatus, and communication device

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341827B (zh) * 2017-07-27 2023-01-24 腾讯科技(深圳)有限公司 Video processing method and apparatus, and storage medium
CN108764135B (zh) * 2018-05-28 2022-02-08 北京微播视界科技有限公司 Image generation method and apparatus, and electronic device
CN108986042A (zh) * 2018-06-15 2018-12-11 Oppo广东移动通信有限公司 Sticker sharing method and apparatus
CN108830928A (zh) * 2018-06-28 2018-11-16 北京字节跳动网络技术有限公司 Three-dimensional model mapping method and apparatus, terminal device, and readable storage medium
CN110798677B (zh) * 2018-08-01 2021-08-31 Oppo广东移动通信有限公司 Three-dimensional scene modeling method and apparatus, electronic apparatus, readable storage medium, and computer device
WO2020037679A1 (zh) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 Video processing method and apparatus, and electronic device
CN111710044A (zh) * 2019-03-18 2020-09-25 北京京东尚科信息技术有限公司 Image processing method and apparatus, and computer-readable storage medium
CN112927343B (zh) * 2019-12-05 2023-09-05 杭州海康威视数字技术股份有限公司 Image generation method and apparatus
KR20210091571A (ko) * 2020-01-14 2021-07-22 엘지전자 주식회사 Artificial intelligence device for estimating head pose and method therefor
CN111540060B (zh) * 2020-03-25 2024-03-08 深圳奇迹智慧网络有限公司 Display calibration method and apparatus for augmented reality device, and electronic device
CN112770185B (zh) * 2020-12-25 2023-01-20 北京达佳互联信息技术有限公司 Sprite image processing method and apparatus, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012071463A2 (en) * 2010-11-24 2012-05-31 Aria Glassworks, Inc. System and method for presenting virtual and augmented reality scenes to a user
US20160035133A1 (en) * 2014-07-31 2016-02-04 Ulsee Inc. 2d image-based 3d glasses virtual try-on system
US20160246078A1 (en) * 2015-02-23 2016-08-25 Fittingbox Process and method for real-time physically accurate and realistic-looking glasses try-on
CN106373182A (zh) * 2016-08-18 2017-02-01 苏州丽多数字科技有限公司 Augmented reality face interactive entertainment method
CN107341827A (zh) * 2017-07-27 2017-11-10 腾讯科技(深圳)有限公司 Video processing method and apparatus, and storage medium

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006202115A (ja) * 2005-01-21 2006-08-03 National Institute Of Advanced Industrial & Technology Image processing method and image processing program
JP4757142B2 (ja) * 2006-08-10 2011-08-24 キヤノン株式会社 Imaging environment calibration method and information processing apparatus
JP5297677B2 (ja) * 2008-04-08 2013-09-25 株式会社フローベル Detection apparatus and method, program, recording medium, and simulation system
CN102308276B (zh) * 2008-12-03 2014-12-17 轩江 Displaying an object using certain visual effects
CN101794459A (zh) * 2010-02-09 2010-08-04 北京邮电大学 Seamless fusion method for a stereoscopic vision image and a three-dimensional virtual object
CN101964064B (zh) * 2010-07-27 2013-06-19 上海摩比源软件技术有限公司 Face comparison method
TWI544447B (zh) * 2011-11-29 2016-08-01 財團法人資訊工業策進會 Augmented reality method and system
CN103489214A (zh) * 2013-09-10 2014-01-01 北京邮电大学 Virtual-real occlusion handling method based on virtual model preprocessing in an augmented reality system
CN106157358A (zh) * 2015-03-26 2016-11-23 成都理想境界科技有限公司 Object fusion method based on video images, and terminal
JP6491517B2 (ja) * 2015-03-31 2019-03-27 Kddi株式会社 Image recognition AR device, and pose estimation device and pose tracking device therefor
CN106157282A (zh) * 2015-03-31 2016-11-23 深圳迈瑞生物医疗电子股份有限公司 Image processing system and method
CN105657408B (zh) * 2015-12-31 2018-11-30 北京小鸟看看科技有限公司 Method for implementing a virtual reality scene, and virtual reality apparatus
CN105898561B (zh) * 2016-04-13 2019-06-18 腾讯科技(深圳)有限公司 Video image processing method and apparatus
CN106851092B (zh) * 2016-12-30 2018-02-09 中国人民解放军空军预警学院监控系统工程研究所 Infrared video stitching method and apparatus


Also Published As

Publication number Publication date
CN107341827B (zh) 2023-01-24
CN107341827A (zh) 2017-11-10
TWI678099B (zh) 2019-11-21
TW201840179A (zh) 2018-11-01

