CN108111911B

CN108111911B - Video data real-time processing method and device based on self-adaptive tracking frame segmentation

Info

Publication number: CN108111911B
Application number: CN201711423802.3A
Authority: CN
Inventors: 赵鑫; 邱学侃; 颜水成
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2017-12-25
Filing date: 2017-12-25
Publication date: 2020-07-28
Anticipated expiration: 2037-12-25
Also published as: CN108111911A

Abstract

The invention discloses a video data real-time processing method and device based on adaptive tracking frame segmentation, a computing device, and a computer storage medium. The method comprises the following steps: acquiring a t-th frame image containing a specific object in a group of frame images and a tracking frame corresponding to the (t-1)-th frame image; adjusting, according to the t-th frame image, the tracking frame corresponding to the (t-1)-th frame image to obtain a tracking frame corresponding to the t-th frame image; performing, according to the tracking frame corresponding to the t-th frame image, scene segmentation processing on a partial area of the t-th frame image to obtain a segmentation result corresponding to the t-th frame image; determining a second foreground image of the t-th frame image according to the segmentation result; adding a personalized special effect according to the second foreground image to obtain a processed t-th frame image; covering the original t-th frame image with the processed t-th frame image to obtain processed video data; and displaying the processed video data. The technical scheme can add the personalized special effect to the frame image more accurately and quickly.

Description

Video data real-time processing method and device based on self-adaptive tracking frame segmentation

Technical Field

The invention relates to the technical field of image processing, in particular to a video data real-time processing method and device based on self-adaptive tracking frame segmentation, computing equipment and a computer storage medium.

Background

In the prior art, when a user needs to perform personalized processing on a video, such as replacing the background or adding special effects, an image segmentation method is often used to perform scene segmentation on the frame images of the video; deep-learning-based image segmentation methods can achieve pixel-level segmentation. However, existing image segmentation methods perform scene segmentation on the entire content of a frame image, so the amount of data to be processed is large and the processing efficiency is low. In addition, existing methods do not consider the proportion of the frame image occupied by the foreground image. When that proportion is small, pixels that actually belong to the edge of the foreground image are easily assigned to the background image, so the resulting segmentation has low precision and a poor effect. The prior-art image segmentation methods therefore suffer from a large data processing load, low processing efficiency and low segmentation precision, so the segmentation result cannot be used to add personalized special effects to the frame images of a video accurately, and the processed video data displays poorly.

Disclosure of Invention

In view of the above, the present invention has been made to provide a method, an apparatus, a computing device and a computer storage medium for real-time processing of video data based on adaptive tracking frame segmentation that overcome or at least partially solve the above-mentioned problems.

According to an aspect of the present invention, there is provided a video data real-time processing method based on adaptive tracking frame segmentation, the method is used for processing groups of frame images obtained by dividing every n frames in a video, and for one group of frame images, the method includes:

acquiring a t frame image containing a specific object in a group of frame images and a tracking frame corresponding to a t-1 frame image, wherein t is larger than 1; the tracking frame corresponding to the 1 st frame image is determined according to the segmentation result corresponding to the 1 st frame image;

according to the t frame image, adjusting the tracking frame corresponding to the t-1 frame image to obtain a tracking frame corresponding to the t frame image; according to the tracking frame corresponding to the t frame image, carrying out scene segmentation processing on a partial area of the t frame image to obtain a segmentation result corresponding to the t frame image;

determining a second foreground image of the t frame image according to a segmentation result corresponding to the t frame image;

adding a personalized special effect according to the second foreground image to obtain a processed t frame image;

covering the original t-th frame image with the processed t-th frame image to obtain processed video data;

and displaying the processed video data.

Further, adding a personalized special effect according to the second foreground image, and obtaining the processed tth frame image further comprises:

extracting key information of a region to be processed from the second foreground image;

drawing an effect map according to the key information;

fusing the effect map, the second foreground image and a preset background image to obtain the processed t-th frame image; or fusing the effect map, the second foreground image and the second background image determined according to the segmentation result corresponding to the t-th frame image to obtain the processed t-th frame image.
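As a minimal sketch of the fusion step above, the following Python example composites three layers per pixel. It is illustrative only: the function name `fuse_layers`, the single-channel pixel representation, the per-pixel `effect_alpha` weights, and the layering order (background, then foreground, then effect map) are assumptions and are not specified by the invention.

```python
from typing import List

Gray = List[List[int]]      # single-channel image, values 0..255 (assumed layout)
Mask = List[List[int]]      # 1 = second foreground pixel, 0 = background pixel
Alpha = List[List[float]]   # effect-map opacity per pixel, 0.0..1.0 (assumed)


def fuse_layers(frame: Gray, mask: Mask, background: Gray,
                effect: Gray, effect_alpha: Alpha) -> Gray:
    """Blend the effect map over (foreground where mask == 1, else background)."""
    fused: Gray = []
    for y, row in enumerate(frame):
        out_row = []
        for x, pixel in enumerate(row):
            # Keep the frame pixel inside the foreground, otherwise use the background.
            base = pixel if mask[y][x] else background[y][x]
            a = effect_alpha[y][x]
            out_row.append(round(a * effect[y][x] + (1.0 - a) * base))
        fused.append(out_row)
    return fused


result = fuse_layers(
    frame=[[10, 20, 30]],
    mask=[[1, 0, 1]],
    background=[[100, 100, 100]],
    effect=[[200, 200, 200]],
    effect_alpha=[[0.0, 0.0, 0.5]],
)
print(result)  # [[10, 100, 115]]
```

Passing the second background image as `background`, rather than a preset one, would correspond to the second alternative described above.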

Further, the key information is key point information; according to the key information, drawing the effect map further comprises:

searching a basic effect map corresponding to the key point information; or acquiring a basic effect map specified by a user;

calculating position information between at least two key points with a symmetrical relation according to the key point information;

and processing the basic effect map according to the position information to obtain the effect map.
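One possible reading of the steps above can be sketched in Python. The sketch assumes that the "position information" between two symmetric key points is their distance and the angle of the line joining them, and that the basic effect map was designed for a reference distance `base_distance`; these interpretations and the name `effect_map_transform` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def effect_map_transform(left: Point, right: Point,
                         base_distance: float) -> Tuple[float, float]:
    """Return (scale, rotation in degrees) to apply to the basic effect map."""
    if base_distance <= 0:
        raise ValueError("base_distance must be positive")
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    # Scale follows the distance between the symmetric key points.
    scale = math.hypot(dx, dy) / base_distance
    # Rotation follows the tilt of the line joining them.
    angle = math.degrees(math.atan2(dy, dx))
    return scale, angle


scale, angle = effect_map_transform((40.0, 50.0), (140.0, 50.0), base_distance=50.0)
print(scale, angle)  # 2.0 0.0
```

The returned scale and rotation would then be applied to the basic effect map by whatever image library is in use.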

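Further, adding a personalized special effect according to the second foreground image to obtain the processed t-th frame image further comprises:

extracting key information of a region to be identified from the second foreground image;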

recognizing the posture of the specific object according to the key information to obtain a posture recognition result of the specific object;

and determining, according to the posture recognition result of the specific object, a corresponding effect processing command to be responded to for the t-th frame image, to obtain the processed t-th frame image.

Further, determining the corresponding effect processing command to be responded to for the t-th frame image according to the posture recognition result of the specific object, to obtain the processed t-th frame image, further includes:

and determining the corresponding effect processing command to be responded to for the t-th frame image according to the posture recognition result of the specific object and interaction information with an interaction object contained in the t-th frame image, to obtain the processed t-th frame image.

Further, the effect processing command to be responded includes an effect map processing command, a stylization processing command, a brightness processing command, a light processing command, and/or a tone processing command.

Further, according to the t-th frame image, the adjusting the tracking frame corresponding to the t-1-th frame image further includes:

identifying the t frame image, and determining a first foreground image aiming at a specific object in the t frame image;

applying a tracking frame corresponding to the t-1 th frame image to the t-th frame image;

and adjusting the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t-frame image.

Further, according to the first foreground image in the t-th frame image, the adjusting the tracking frame corresponding to the t-1-th frame image further includes:

calculating the proportion of pixel points belonging to the first foreground image in the t-th frame image among all pixel points in the tracking frame corresponding to the (t-1)-th frame image, and determining the proportion as the first foreground pixel proportion of the t-th frame image;

acquiring a second foreground pixel proportion of the t-1 frame image, wherein the second foreground pixel proportion of the t-1 frame image is the proportion of pixel points belonging to the first foreground image in the t-1 frame image in all pixel points in a tracking frame corresponding to the t-1 frame image;

calculating a difference value between the first foreground pixel proportion of the t-th frame image and the second foreground pixel proportion of the (t-1)-th frame image;

judging whether the difference value exceeds a preset difference threshold value or not; if yes, adjusting the size of the tracking frame corresponding to the t-1 frame image according to the difference value.

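Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image further includes:

calculating the distance between the first foreground image in the t-th frame image and each border of the tracking frame corresponding to the (t-1)-th frame image;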

and adjusting the size of the tracking frame corresponding to the t-1 frame image according to the distance and a preset distance threshold.

Further, adjusting the tracking frame corresponding to the (t-1)-th frame image according to the first foreground image in the t-th frame image further includes:

determining the central point position of the first foreground image in the t-th frame image according to the first foreground image in the t-th frame image;

and adjusting the position of the tracking frame corresponding to the (t-1)-th frame image according to the central point position of the first foreground image in the t-th frame image, so that the central point position of the tracking frame corresponding to the (t-1)-th frame image coincides with the central point position of the first foreground image in the t-th frame image.

Further, performing scene segmentation processing on a partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image, and obtaining a segmentation result corresponding to the t-th frame image further includes:

extracting an image to be segmented from a partial region of the t frame image according to a tracking frame corresponding to the t frame image;

carrying out scene segmentation processing on an image to be segmented to obtain a segmentation result corresponding to the image to be segmented;

and obtaining a segmentation result corresponding to the t frame image according to the segmentation result corresponding to the image to be segmented.

Further, extracting an image to be segmented from a partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image further includes:

and extracting an image in a tracking frame corresponding to the t frame image from the t frame image, and determining the extracted image as an image to be segmented.

Further, the scene segmentation processing is performed on the image to be segmented, and obtaining a segmentation result corresponding to the image to be segmented further includes:

and inputting the image to be segmented into a scene segmentation network to obtain a segmentation result corresponding to the image to be segmented.

Further, displaying the processed video data further comprises: displaying the processed video data in real time;

the method further comprises the following steps: and uploading the processed video data to a cloud server.

Further, uploading the processed video data to a cloud server further comprises:

and uploading the processed video data to a cloud video platform server so that the cloud video platform server can display the video data on a cloud video platform.

and uploading the processed video data to a cloud live broadcast server so that the cloud live broadcast server can push the video data to a client of a watching user in real time.

and uploading the processed video data to a cloud official-account server so that the cloud official-account server pushes the video data to clients that follow the official account.

According to another aspect of the present invention, there is provided a video data real-time processing apparatus based on adaptive tracking frame segmentation, the apparatus being configured to process groups of frame images obtained by dividing every n frames in a video, the apparatus including:

the acquisition module is suitable for acquiring a t-th frame image containing a specific object in a group of frame images and a tracking frame corresponding to the (t-1)-th frame image, wherein t is greater than 1; the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

the segmentation module is suitable for adjusting the tracking frame corresponding to the t-1 frame image according to the t frame image to obtain the tracking frame corresponding to the t frame image; according to the tracking frame corresponding to the t frame image, carrying out scene segmentation processing on a partial area of the t frame image to obtain a segmentation result corresponding to the t frame image;

the determining module is suitable for determining a second foreground image of the t frame image according to the segmentation result corresponding to the t frame image;

the processing module is suitable for adding the personalized special effect according to the second foreground image to obtain a processed t frame image;

the covering module is suitable for covering the original t-th frame image with the processed t-th frame image to obtain processed video data;

and the display module is suitable for displaying the processed video data.

Further, the processing module is further adapted to:

drawing an effect map according to the key information;

Further, the key information is key point information; the processing module is further adapted to:

Further, the processing module is further adapted to:

Further, the segmentation module is further adapted to:

Further, the display module is further adapted to: displaying the processed video data in real time;

the device also includes: and the uploading module is suitable for uploading the processed video data to the cloud server.

Further, the upload module is further adapted to:

According to yet another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the video data real-time processing method based on the self-adaptive tracking frame segmentation.

According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for real-time processing of video data based on adaptive tracking frame segmentation as described above.

According to the technical scheme provided by the invention, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained based on the tracking frame corresponding to the t-1 th frame image, and the scene segmentation is carried out on the t-th frame image by using the tracking frame, so that the segmentation result corresponding to the t-th frame image can be quickly and accurately obtained, and the segmentation precision of the image scene segmentation is effectively improved. Compared with the prior art that the scene segmentation processing is carried out on all the contents of the frame image, the method only carries out the scene segmentation processing on partial areas of the frame image, effectively reduces the data processing amount of image scene segmentation, improves the processing efficiency and optimizes the image scene segmentation processing mode; and based on the obtained segmentation result, the personalized special effect can be more accurately and quickly added to the frame image, and the video data display effect is beautified.

The foregoing description is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are described below.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a schematic flowchart of a method for real-time processing of video data based on adaptive tracking frame segmentation according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a method for real-time processing of video data based on adaptive tracking frame segmentation according to another embodiment of the present invention;

FIG. 3 is a block diagram of an apparatus for real-time processing of video data based on adaptive tracking frame segmentation according to an embodiment of the present invention;

FIG. 4 shows a schematic structural diagram of a computing device according to an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The invention provides a video data real-time processing method based on adaptive tracking frame segmentation. The method takes into account that, during video shooting or recording, the number of specific objects captured may change because of movement or other reasons; taking a human body as the specific object, for example, the number of human bodies captured may increase or decrease. In the present invention, the foreground image may contain only the specific object, and the background image is the part of the frame image other than the foreground image. To distinguish the foreground image in a frame image before segmentation from the foreground image in the frame image after segmentation, the foreground image before segmentation is referred to as the first foreground image and the foreground image after segmentation is referred to as the second foreground image. Similarly, the background image before segmentation is referred to as the first background image and the background image after segmentation is referred to as the second background image. The tracking frame may be a rectangular frame that encloses the first foreground image in a frame image so as to track the specific object in the frame image. The method processes groups of frame images obtained by dividing the video every n frames. A person skilled in the art may set n according to actual needs, which is not limited here; n may be a fixed preset value. For example, when n is 20, the frame images in the video are divided every 20 frames to obtain the groups of frame images, and the method processes each group obtained in this way.
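The grouping described above can be illustrated with a short Python sketch. The function name `split_into_groups` is hypothetical, and letting the final group be shorter than n when the frame count is not a multiple of n is an assumption; the description does not say how a remainder is handled.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_groups(frames: Sequence[T], n: int) -> List[List[T]]:
    """Divide frames into consecutive groups of n frames each."""
    if n <= 0:
        raise ValueError("n must be positive")
    # The last group may hold fewer than n frames (assumed behaviour).
    return [list(frames[i:i + n]) for i in range(0, len(frames), n)]


groups = split_into_groups(list(range(45)), n=20)
print([len(g) for g in groups])  # [20, 20, 5]
```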

Fig. 1 is a schematic flowchart of a method for processing video data in real time based on adaptive tracking frame segmentation according to an embodiment of the present invention, the method is used for processing groups of frame images obtained by dividing every n frames in a video, as shown in fig. 1, for one group of frame images, the method includes the following steps:

Step S100, acquiring a t-th frame image containing a specific object in a group of frame images and a tracking frame corresponding to the (t-1)-th frame image.

The frame image contains a specific object, which may be a human body, a vehicle, or the like. The specific object can be set by those skilled in the art according to actual needs, and is not limited herein. When scene segmentation needs to be performed on the t-th frame image in a group of frame images, where t is greater than 1, the t-th frame image and the tracking frame corresponding to the (t-1)-th frame image are acquired in step S100. The tracking frame corresponding to the (t-1)-th frame image completely encloses the first foreground image in the (t-1)-th frame image. Specifically, the tracking frame corresponding to the 1st frame image is determined from the segmentation result corresponding to the 1st frame image.
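As an illustration of how a tracking frame for the 1st frame image might be derived from its segmentation result, the sketch below takes the bounding rectangle of the foreground pixels in a binary mask. Using the tight bounding rectangle, the optional `margin`, and the name `box_from_mask` are assumptions; the description only states that the tracking frame is determined from the segmentation result.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # 1 = foreground pixel, 0 = background pixel
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def box_from_mask(mask: Mask, margin: int = 0) -> Optional[Box]:
    """Return the rectangle enclosing all foreground pixels, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    height, width = len(mask), len(mask[0])
    # Pad by the margin, then clip to the image bounds.
    left = max(min(cols) - margin, 0)
    top = max(min(rows) - margin, 0)
    right = min(max(cols) + 1 + margin, width)
    bottom = min(max(rows) + 1 + margin, height)
    return (left, top, right, bottom)


mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(box_from_mask(mask))            # (2, 1, 4, 4)
print(box_from_mask(mask, margin=1))  # (1, 0, 5, 5)
```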

Step S101, adjusting a tracking frame corresponding to the t-1 frame image according to the t frame image to obtain a tracking frame corresponding to the t frame image; and according to the tracking frame corresponding to the t-th frame image, carrying out scene segmentation processing on a partial area of the t-th frame image to obtain a segmentation result corresponding to the t-th frame image.

In the process of tracking the first foreground image by using the tracking frame, the tracking frame needs to be adjusted according to each frame image, and then, for the t-th frame image, the size and the position of the tracking frame corresponding to the t-1-th frame image can be adjusted, so that the adjusted tracking frame can be suitable for the t-th frame image, and the tracking frame corresponding to the t-th frame image is obtained. The tracking frame corresponding to the t-th frame image can frame the first foreground image in the t-th frame image, so that the scene segmentation processing can be performed on the partial area of the t-th frame image according to the tracking frame corresponding to the t-th frame image, and the segmentation result corresponding to the t-th frame image is obtained. For example, the scene segmentation process may be performed on the area framed by the tracking frame corresponding to the t-th frame image in the t-th frame image. Compared with the prior art in which scene segmentation processing is carried out on all contents of the frame image, the method only carries out scene segmentation processing on partial regions of the frame image, effectively reduces the data processing amount of image scene segmentation, and improves the processing efficiency.

Step S102, determining a second foreground image of the t frame image according to the segmentation result corresponding to the t frame image.

According to the segmentation result corresponding to the t-th frame image, it can be clearly determined which pixel points in the t-th frame image belong to the second foreground image and which belong to the second background image, and the second foreground image of the t-th frame image is thereby determined.

And step S103, adding a personalized special effect according to the second foreground image to obtain a processed t frame image.

After the second foreground image is determined, adding a personalized special effect according to the second foreground image to obtain a processed t frame image. The person skilled in the art can set the personalized special effect according to the actual need, and the invention is not limited herein. For example, an effect map may be added at the edge of the specific object according to the second foreground image, where the effect map may be a static effect map or a dynamic effect map, and specifically, when the specific object is a human body, the effect map may be an effect map such as a flame, a bouncing note, or a wave; when the specific object is a human head, the effect map may be an effect map such as a hair crown, a wobbling ear, and the like, and is specifically set according to an implementation situation, which is not limited herein.

Step S104, covering the original t-th frame image with the processed t-th frame image to obtain processed video data.

The processed t-th frame image directly covers the original t-th frame image, so the processed video data is obtained directly. Meanwhile, the user doing the recording can also see the processed t-th frame image immediately.

Step S105, displaying the processed video data.

Once the processed t-th frame image is obtained, it directly covers the original t-th frame image. The covering is fast and is generally completed within 1/24 of a second. Because the covering takes so little time, the human eye does not perceive that the original t-th frame image in the video data has been covered. Therefore, when the processed video data is subsequently displayed, it is displayed in real time while the video data is being shot and/or recorded and/or played, and the user does not notice that frame images in the video data have been covered.

According to the video data real-time processing method based on the adaptive tracking frame segmentation provided by the embodiment, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained based on the tracking frame corresponding to the t-1 th frame image, and the scene segmentation is performed on the t-th frame image by using the tracking frame, so that the segmentation result corresponding to the t-th frame image can be quickly and accurately obtained, and the segmentation precision of the image scene segmentation is effectively improved. Compared with the prior art that the scene segmentation processing is carried out on all the contents of the frame image, the method only carries out the scene segmentation processing on partial areas of the frame image, effectively reduces the data processing amount of image scene segmentation, improves the processing efficiency and optimizes the image scene segmentation processing mode; and based on the obtained segmentation result, the personalized special effect can be more accurately and quickly added to the frame image, and the video data display effect is beautified.

Fig. 2 is a schematic flowchart of a video data real-time processing method based on adaptive tracking frame segmentation according to another embodiment of the present invention, the method is used for processing groups of frame images obtained by dividing every n frames in a video, as shown in fig. 2, for one group of frame images, the method includes the following steps:

Step S200, acquiring a t-th frame image containing a specific object in a group of frame images and a tracking frame corresponding to the (t-1)-th frame image.

Where t is greater than 1. For example, when t is 2, in step S200, a 2 nd frame image containing a specific object in a group of frame images and a tracking frame corresponding to the 1 st frame image are acquired, specifically, the tracking frame corresponding to the 1 st frame image is determined according to a segmentation result corresponding to the 1 st frame image; when t is 3, in step S200, a 3 rd frame image including the specific object in the group of frame images and a tracking frame corresponding to the 2 nd frame image are obtained, where the tracking frame corresponding to the 2 nd frame image is obtained by adjusting the tracking frame corresponding to the 1 st frame image during the scene segmentation processing on the 2 nd frame image.
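The way the tracking frame is carried from frame to frame within one group can be sketched as a simple loop. The callables `initial_box` (tracking frame from the 1st frame's segmentation result) and `adjust_box` (adjustment of the previous tracking frame against the current frame) are hypothetical placeholders for the operations described in the text, not names used by the invention.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def track_group(frames: Sequence[Frame],
                initial_box: Callable[[Frame], Box],
                adjust_box: Callable[[Frame, Box], Box]) -> List[Box]:
    """Return one tracking frame per frame image in the group."""
    if not frames:
        return []
    boxes = [initial_box(frames[0])]        # 1st frame: from its segmentation result
    for t in range(1, len(frames)):         # t-th frame: adjust the (t-1)-th tracking frame
        boxes.append(adjust_box(frames[t], boxes[t - 1]))
    return boxes


# Toy stand-ins: each "frame" is just a horizontal shift applied to the previous box.
boxes = track_group(
    frames=[0, 1, 2, 3],
    initial_box=lambda frame: (0, 0, 10, 10),
    adjust_box=lambda frame, box: (box[0] + frame, box[1], box[2] + frame, box[3]),
)
print(boxes[-1])  # (6, 0, 16, 10)
```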

Step S201, carrying out recognition processing on the t frame image, determining a first foreground image aiming at a specific object in the t frame image, applying a tracking frame corresponding to the t-1 frame image to the t frame image, and carrying out adjustment processing on the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t frame image.

Specifically, prior-art image processing tools such as AE (Adobe After Effects) and NUKE (The Foundry Nuke) may be used to perform recognition processing on the t-th frame image and identify which pixel points in the t-th frame image belong to the first foreground image, thereby determining the first foreground image for the specific object in the t-th frame image. After the first foreground image is determined, the tracking frame corresponding to the (t-1)-th frame image may be placed on the t-th frame image, so that the tracking frame can be adjusted according to the first foreground image in the t-th frame image to obtain the tracking frame corresponding to the t-th frame image.

Specifically, the proportion of pixel points belonging to the first foreground image in the t-th frame image among all pixel points in the tracking frame corresponding to the (t-1)-th frame image may be calculated and determined as the first foreground pixel proportion of the t-th frame image. The second foreground pixel proportion of the (t-1)-th frame image is then acquired, where the second foreground pixel proportion of the (t-1)-th frame image is the proportion of pixel points belonging to the first foreground image in the (t-1)-th frame image among all pixel points in the tracking frame corresponding to the (t-1)-th frame image. Next, the difference value between the first foreground pixel proportion of the t-th frame image and the second foreground pixel proportion of the (t-1)-th frame image is calculated, and it is judged whether the difference value exceeds a preset difference threshold. If it does, the tracking frame corresponding to the (t-1)-th frame image does not match the first foreground image in the t-th frame image, and the size of the tracking frame corresponding to the (t-1)-th frame image is adjusted according to the difference value. If it does not, the size of the tracking frame corresponding to the (t-1)-th frame image is left unchanged. The preset difference threshold can be set by a person skilled in the art according to actual needs, and is not limited herein.

Suppose that, after the tracking frame corresponding to the (t-1)-th frame image is applied to the t-th frame image, the tracking frame can completely enclose the first foreground image in the t-th frame image, but the difference value between the first foreground pixel proportion of the t-th frame image and the second foreground pixel proportion of the (t-1)-th frame image exceeds the preset difference threshold. This indicates that the tracking frame corresponding to the (t-1)-th frame image may be too large or too small for the first foreground image in the t-th frame image, so its size needs to be adjusted. For example, when the first foreground pixel proportion of the t-th frame image is 0.9, the second foreground pixel proportion of the (t-1)-th frame image is 0.7, and the difference value between the two proportions exceeds the preset difference threshold, the size of the tracking frame corresponding to the (t-1)-th frame image can be adaptively enlarged according to the difference value. For another example, when the first foreground pixel proportion of the t-th frame image is 0.5, the second foreground pixel proportion of the (t-1)-th frame image is 0.7, and the difference value between the two proportions exceeds the preset difference threshold, the size of the tracking frame corresponding to the (t-1)-th frame image can be adaptively reduced according to the difference value.
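The proportion comparison above can be sketched in Python as follows. The description says only that the size is adjusted "according to the difference value"; the linear rule `scale = 1 + gain * difference`, the lower clamp on the scale, comparing the absolute difference with the threshold, and the default values of `threshold` and `gain` are all assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]            # 1 = first foreground pixel, 0 = otherwise
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive


def foreground_ratio(mask: Mask, box: Box) -> float:
    """Share of pixels inside the tracking frame that belong to the foreground."""
    left, top, right, bottom = box
    area = (right - left) * (bottom - top)
    if area <= 0:
        raise ValueError("tracking frame must have positive area")
    inside = sum(mask[y][x] for y in range(top, bottom) for x in range(left, right))
    return inside / area


def resize_box(box: Box, ratio_t: float, ratio_prev: float,
               threshold: float = 0.1, gain: float = 1.0) -> Box:
    """Enlarge or shrink the frame about its centre when the ratios differ too much."""
    diff = ratio_t - ratio_prev
    if abs(diff) <= threshold:
        return box                              # within tolerance: leave size unchanged
    scale = max(0.1, 1.0 + gain * diff)         # assumed rule: diff > 0 enlarges, diff < 0 shrinks
    left, top, right, bottom = box
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    half_w = (right - left) / 2.0 * scale
    half_h = (bottom - top) / 2.0 * scale
    return (round(cx - half_w), round(cy - half_h),
            round(cx + half_w), round(cy + half_h))


print(resize_box((10, 10, 30, 30), ratio_t=0.9, ratio_prev=0.7))   # (8, 8, 32, 32)
print(resize_box((10, 10, 30, 30), ratio_t=0.5, ratio_prev=0.7))   # (12, 12, 28, 28)
print(resize_box((10, 10, 30, 30), ratio_t=0.72, ratio_prev=0.7))  # (10, 10, 30, 30)
```

The two adjusted results mirror the 0.9 versus 0.7 (enlarge) and 0.5 versus 0.7 (reduce) examples in the text; a production version would also clip the result to the image bounds.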

Optionally, the distance between the first foreground image in the t-th frame image and each border of the tracking frame corresponding to the (t-1)-th frame image is calculated, and the size of the tracking frame corresponding to the (t-1)-th frame image is adjusted according to the calculated distance and a preset distance threshold. The preset distance threshold can be set by a person skilled in the art according to actual needs, and is not limited herein. For example, if the calculated distance is smaller than the preset distance threshold, the size of the tracking frame corresponding to the (t-1)-th frame image may be adaptively enlarged, so that the distance from the first foreground image in the t-th frame image to each border of the tracking frame meets the preset distance threshold; as another example, if the calculated distance is greater than the preset distance threshold, the size of the tracking frame corresponding to the (t-1)-th frame image may be adaptively reduced, so that the distance from the first foreground image in the t-th frame image to each border of the tracking frame meets the preset distance threshold.
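A minimal sketch of this distance-based adjustment is given below, assuming a binary foreground mask and a box stored as `(x0, y0, x1, y1)` with exclusive right and bottom coordinates. The helper names are hypothetical, and interpreting "meets the preset distance threshold" as "equals the threshold on every side" is an assumption of the sketch; clamping to the image bounds is omitted.

```python
def foreground_bounds(mask):
    """Tight bounds (left, top, right, bottom) of the foreground pixels."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None  # no foreground pixel in the mask
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def border_distances(bounds, box):
    """Distance from the foreground to each of the four borders of `box`."""
    left, top, right, bottom = bounds
    x0, y0, x1, y1 = box
    return {"left": left - x0, "top": top - y0,
            "right": x1 - right, "bottom": y1 - bottom}


def adjust_to_margin(bounds, box, margin):
    """Move each border so that its distance to the foreground equals `margin`."""
    d = border_distances(bounds, box)
    x0, y0, x1, y1 = box
    # A distance below `margin` pushes the border outward (box is enlarged);
    # a distance above `margin` pulls it inward (box is reduced).
    return (x0 + (d["left"] - margin), y0 + (d["top"] - margin),
            x1 - (d["right"] - margin), y1 - (d["bottom"] - margin))
```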

In addition, the center point position of the first foreground image in the t-th frame image can be determined from the first foreground image in the t-th frame image, and the position of the tracking frame corresponding to the (t-1)-th frame image can be adjusted according to that center point position, so that the center point of the tracking frame corresponding to the (t-1)-th frame image coincides with the center point of the first foreground image in the t-th frame image and the first foreground image lies in the middle of the tracking frame.
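The position adjustment can be illustrated with the short sketch below. It assumes that the center point is taken as the centroid of the foreground pixels (the description does not specify how the center point is computed) and that the box is stored as `(x0, y0, x1, y1)`; both function names are hypothetical.

```python
def foreground_centre(mask):
    """Centroid of the foreground pixels, using pixel centres (x + 0.5, y + 0.5)."""
    points = [(x + 0.5, y + 0.5)
              for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not points:
        return None  # no foreground pixel in the mask
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def recentre(box, centre):
    """Translate `box` so that its centre coincides with `centre`; size is unchanged."""
    x0, y0, x1, y1 = box
    dx = centre[0] - (x0 + x1) / 2
    dy = centre[1] - (y0 + y1) / 2
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```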

Step S202, extracting an image to be segmented from a partial area of the t frame image according to the tracking frame corresponding to the t frame image.

Specifically, the image inside the tracking frame corresponding to the t-th frame image may be extracted from the t-th frame image, and the extracted image is determined as the image to be segmented. Because the tracking frame corresponding to the t-th frame image can completely enclose the first foreground image in the t-th frame image, the pixel points of the t-th frame image outside the tracking frame all belong to the second background image. Therefore, once the tracking frame corresponding to the t-th frame image is obtained, only the image to be segmented needs to undergo scene segmentation subsequently, which effectively reduces the data processing amount of image scene segmentation and improves the processing efficiency.
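As an illustration only, extracting the image to be segmented amounts to cropping the frame to the tracking frame. The sketch below assumes the frame is a list of pixel rows and the box is `(x0, y0, x1, y1)`; clamping the box to the frame bounds is an addition of this sketch, not something the description requires.

```python
def crop(frame, box):
    """Return the part of `frame` inside `box`, plus the box actually used."""
    height, width = len(frame), len(frame[0])
    x0, y0, x1, y1 = box
    # Clamp to the frame so that a box reaching past the edge still works.
    x0, y0 = max(0, int(x0)), max(0, int(y0))
    x1, y1 = min(width, int(x1)), min(height, int(y1))
    patch = [row[x0:x1] for row in frame[y0:y1]]
    return patch, (x0, y0, x1, y1)
```

The clamped box is returned alongside the patch because it is needed later to place the segmentation result of the patch back into the full frame.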

Step S203, the image to be segmented is subjected to scene segmentation processing, and a segmentation result corresponding to the image to be segmented is obtained.

The tracking frame corresponding to the t-th frame image can completely select the first foreground image in the t-th frame image, so that the pixel points outside the tracking frame in the t-th frame image can be determined to belong to the second background image without performing scene segmentation processing on the pixel points outside the tracking frame, and thus, the scene segmentation processing can be performed only on the extracted image to be segmented.

When scene segmentation processing is performed on the image to be segmented, a deep learning method can be used. Deep learning is a machine learning method based on learning representations of data. An observation (e.g., an image) may be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Some representations make it easier to learn a task from examples. A deep-learning segmentation method can be used to perform scene segmentation processing on the image to be segmented and obtain the corresponding segmentation result. For example, a scene segmentation network obtained by a deep learning method is used to perform scene segmentation processing on the image to be segmented, and the resulting segmentation result determines which pixel points in the image to be segmented belong to the second foreground image and which belong to the second background image.

In the prior art, so that the scene segmentation network can process an input image, the image must first be resized to a preset size, for example 320 × 240 pixels. Since images are in most cases 1280 × 720 pixels, the image has to be resized to 320 × 240 pixels first, and scene segmentation processing is then performed on the resized image.

According to the technical scheme provided by the invention, the image inside the tracking frame corresponding to the t-th frame image is extracted from the t-th frame image and determined as the image to be segmented, and scene segmentation processing is then performed on the image to be segmented. When the first foreground image occupies only a small proportion of the t-th frame image, the extracted image to be segmented is far smaller than the t-th frame image. The image to be segmented resized to the preset size therefore retains foreground image information more effectively than the whole frame image resized to the preset size, and the segmentation result obtained has higher segmentation precision.
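The benefit can be made concrete with a small worked calculation. The 1280 × 720 frame and the 320 × 240 network input come from the description; the 320 × 360 object size and the 400 × 400 tracking frame are hypothetical numbers chosen for this sketch, and `pixels_after_resize` is an illustrative helper.

```python
def pixels_after_resize(obj_w, obj_h, src_w, src_h, dst_w=320, dst_h=240):
    """Pixels an obj_w x obj_h object covers after resizing src to dst."""
    return (obj_w * dst_w / src_w) * (obj_h * dst_h / src_h)


# Whole 1280 x 720 frame resized to 320 x 240: the object shrinks to 80 x 120.
whole_frame = pixels_after_resize(320, 360, 1280, 720)

# Only the 400 x 400 tracking frame resized to 320 x 240: the object is 256 x 216.
cropped = pixels_after_resize(320, 360, 400, 400)

print(whole_frame, cropped, cropped / whole_frame)
```

Under these assumed sizes the object is represented by 9,600 pixels when the whole frame is resized and by 55,296 pixels when only the tracking frame is resized, i.e. 5.76 times as many pixels carry foreground information.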

Step S204, obtaining a segmentation result corresponding to the t frame image according to the segmentation result corresponding to the image to be segmented.

The image to be segmented is an image in a tracking frame corresponding to the t-th frame image, which pixel points in the image to be segmented belong to the second foreground image and which pixel points belong to the second background image can be clearly determined according to the segmentation result corresponding to the image to be segmented, and the pixel points in the t-th frame image, which belong to the outside of the tracking frame, all belong to the second background image, so that the segmentation result corresponding to the t-th frame image can be conveniently and quickly obtained according to the segmentation result corresponding to the image to be segmented, and which pixel points in the t-th frame image belong to the second foreground image and which pixel points belong to the second background image can be clearly determined. Compared with the prior art that the scene segmentation processing is carried out on all the contents of the frame image, the scene segmentation processing method only carries out the scene segmentation processing on the image to be segmented extracted from the frame image, effectively reduces the data processing amount of image scene segmentation, and improves the processing efficiency.
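As a non-limiting sketch, the segmentation result of the t-th frame image can be assembled by writing the segmentation result of the image to be segmented into an all-background mask at the position of the tracking frame. The sketch assumes a binary mask (1 for the second foreground image, 0 for the second background image), integer box coordinates `(x0, y0, x1, y1)`, and a crop mask that has already been resized back to the size of the tracking frame; the function name is hypothetical.

```python
def full_frame_mask(crop_mask, box, frame_w, frame_h):
    """Place `crop_mask` at `box` inside an all-background full-frame mask."""
    x0, y0, x1, y1 = box
    if len(crop_mask) != y1 - y0 or any(len(row) != x1 - x0 for row in crop_mask):
        raise ValueError("crop_mask must have the same size as the tracking frame")
    # Every pixel outside the tracking frame is background by construction.
    full = [[0] * frame_w for _ in range(frame_h)]
    for dy, row in enumerate(crop_mask):
        full[y0 + dy][x0:x1] = row
    return full
```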

Step S205, determining a second foreground image of the t-th frame image according to the segmentation result corresponding to the t-th frame image.

Step S206, adding a personalized special effect according to the second foreground image to obtain a processed t-th frame image.

In a specific embodiment, key information of the region to be processed may be extracted from the second foreground image, and the effect map may be drawn according to the key information, where the key information may specifically be key point information, key region information, and/or key line information. The embodiment of the present invention is described by taking the key information as the key point information as an example, but the key information of the present invention is not limited to the key point information. The processing speed and efficiency of drawing the effect map according to the key point information can be improved by using the key point information, the effect map can be directly drawn according to the key point information, and complex operations such as subsequent calculation, analysis and the like on the key information are not needed. Meanwhile, the key point information is convenient to extract and accurate in extraction, so that the effect of drawing the effect map is more accurate. Specifically, the key point information of the edge of the region to be processed may be extracted from the second foreground image. The skilled person can set the region to be processed according to the actual requirement, which is not limited here.

In order to draw the effect map conveniently and quickly, a plurality of basic effect maps can be drawn in advance, so that when the effect map is drawn, the corresponding basic effect map can be found firstly, and then the basic effect map is processed, so that the effect map can be obtained quickly. The basic effect maps may include different clothing effect maps, decoration effect maps, texture effect maps, and the like, for example, the decoration effect maps may be effect maps such as flames, beating notes, waves, crowns, and swaying ears. In addition, in order to facilitate management of the basic effect maps, an effect map library may be established, and the basic effect maps may be stored in the effect map library.

Specifically, taking the key information as the key point information as an example, after the key point information of the region to be processed is extracted from the second foreground image, the basic effect map corresponding to the key point information may be searched for; then, according to the key point information, the position information between at least two key points having a symmetric relationship is calculated, and the basic effect map is processed according to the position information to obtain the effect map. In this way, the effect map can be drawn accurately. The method can automatically search the effect map library for the basic effect map corresponding to the extracted key point information. In addition, in practical applications, for the convenience of the user and to better meet the user's personalized requirements, the basic effect maps contained in the effect map library can be displayed to the user, and the user can specify a basic effect map according to personal preference; in this case, the method obtains the basic effect map specified by the user.
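One possible reading of "position information between two symmetric key points" is the distance, angle and midpoint of the pair, from which a scale, a rotation and an anchor position for the basic effect map can be derived. The sketch below follows that reading; the function name, the returned fields and the `base_span` parameter (the length in the basic effect map, in pixels, that should span the two key points) are assumptions and not taken from the description.

```python
import math


def placement_from_keypoints(p_left, p_right, base_span):
    """Scale, rotation and anchor for a basic effect map from two symmetric key points."""
    dx = p_right[0] - p_left[0]
    dy = p_right[1] - p_left[1]
    distance = math.hypot(dx, dy)
    return {
        # Stretch the map so that `base_span` covers the key-point distance.
        "scale": distance / base_span,
        # Rotate the map to follow the line through the two key points.
        "angle_deg": math.degrees(math.atan2(dy, dx)),
        # Anchor the map at the midpoint of the two key points.
        "centre": ((p_left[0] + p_right[0]) / 2, (p_left[1] + p_right[1]) / 2),
    }
```

For instance, two key points at (100, 200) and (200, 200) with a `base_span` of 50 pixels give a scale of 2, no rotation and an anchor at (150, 200).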

After the effect map is obtained through drawing, the effect map, the second foreground image and the preset background image can be subjected to fusion processing, and a processed t-th frame image is obtained. The skilled person can set the preset background image according to the actual need, which is not limited herein. The preset background image may be a two-dimensional scene background image, or may be a three-dimensional scene background image, such as a three-dimensional submarine scene background image, a three-dimensional volcanic scene background image, or the like. In addition, the effect map, the second foreground image, and the second background image (i.e., the original background image of the t-th frame image) determined according to the segmentation result corresponding to the t-th frame image may be fused to obtain the processed t-th frame image.
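The fusion processing is not specified further in the description. One common way to realize it, shown here purely as an assumption, is per-pixel compositing: the background (preset or original) is taken as the base, the second foreground image replaces it wherever the segmentation mask marks foreground, and the effect map is alpha-blended on top. For brevity the sketch uses single-channel images stored as lists of rows, all of the same size.

```python
def fuse(foreground, fg_mask, background, effect, effect_alpha):
    """Composite background, masked foreground and alpha-blended effect map."""
    fused = []
    for y, bg_row in enumerate(background):
        out_row = []
        for x, bg in enumerate(bg_row):
            # Foreground pixels come from the frame, all others from the background.
            base = foreground[y][x] if fg_mask[y][x] else bg
            # Blend the effect map on top; alpha 0 leaves the base untouched.
            alpha = effect_alpha[y][x]
            out_row.append(round(alpha * effect[y][x] + (1 - alpha) * base))
        fused.append(out_row)
    return fused
```

Passing the preset background image or the second background image as `background` corresponds to the two fusion variants described above.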

Optionally, in another specific embodiment, key information of the region to be recognized may be extracted from the second foreground image, then the posture of the specific object is recognized according to the key information, so as to obtain a posture recognition result of the specific object, and then the corresponding effect processing command to be responded to the t-th frame image is determined according to the posture recognition result of the specific object, so as to obtain the processed t-th frame image.

When the gesture of the specific object is recognized, the key information may be matched against preset gesture key information to obtain the gesture recognition result. Alternatively, the gesture of the specific object may be recognized by a trained gesture recognition network; because the recognition network has already been trained, the gesture recognition result of the specific object can be obtained conveniently and quickly. After the gesture recognition result of the specific object is obtained, the corresponding effect processing command to be responded to for the t-th frame image is determined according to the gesture recognition result. Specifically, the gesture recognition results may include facial gestures of different shapes, leg movements, whole-body posture movements and the like. Depending on the gesture recognition result, and in combination with the application scene (the scene in which the video data is located, or the scene in which the video data is applied), one or more corresponding effect processing commands to be responded to may be determined for each gesture recognition result. The same gesture recognition result may yield different effect processing commands in different application scenes, and different gesture recognition results may yield the same effect processing command in the same application scene. The effect processing command determined for one gesture recognition result may include one or more processing commands. The specific settings depend on the implementation and are not limited herein. After the effect processing command to be responded to is determined, the command is responded to, and the t-th frame image is processed according to it to obtain the processed t-th frame image.

The effect processing command to be responded to may include, for example, various effect map processing commands, stylization processing commands, brightness processing commands, light processing commands, tone processing commands, and the like. The effect processing command to be responded to may include several of these processing commands at once, so that when the t-th frame image is processed according to it, the effect of the processed t-th frame image is more vivid and the image as a whole is more harmonious.

For example, when a user takes a selfie, live-streams or records a short video, if the gesture recognition result is a hand-heart gesture, the effect processing command to be responded to for the t-th frame image may be a command to add a heart-shaped effect map to the t-th frame image, where the heart-shaped effect map may be a static map or a dynamic map. If the gesture recognition result is that both hands are held under the head in a flower gesture, the effect processing commands to be responded to for the t-th frame image may include an effect map command that adds a sunflower to the head, a stylization processing command that changes the style of the t-th frame image to a garden style, an illumination processing command that processes the illumination of the t-th frame image (a sunny-day illumination effect), and the like.
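The mapping from gesture recognition results and application scenes to effect processing commands can be pictured as a lookup table. In the sketch below, the scene names, gesture labels and command strings are all hypothetical placeholders; only the two gesture examples themselves come from the description.

```python
# Hypothetical table: (application scene, gesture) -> effect processing commands.
EFFECT_COMMANDS = {
    ("live", "hand_heart"): ["effect_map:heart"],
    ("selfie", "hand_heart"): ["effect_map:heart", "tone:warm"],
    ("live", "flower_under_head"): [
        "effect_map:sunflower_on_head",
        "stylize:garden",
        "lighting:sunny_day",
    ],
}


def commands_for(scene, gesture):
    """Commands to respond to for a gesture in a scene; empty if none is defined."""
    return EFFECT_COMMANDS.get((scene, gesture), [])
```

Keying the table on the pair of scene and gesture allows the same gesture to trigger different commands in different scenes, and allows one gesture to trigger several commands at once.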

Optionally, the corresponding effect processing command to be responded to the t-th frame image may also be determined according to the gesture recognition result of the specific object and the interaction information with the interaction object included in the t-th frame image, so as to obtain the processed t-th frame image.

For example, when the user is live-streaming, the t-th frame image contains not only the user (i.e., the specific object) but also interaction information with an interactive object (e.g., a viewer watching the live stream); for instance, a viewer sends the user an ice cream, and an ice cream appears in the t-th frame image. In combination with this interaction information, when the gesture recognition result is that the user makes a gesture of eating the ice cream, the effect processing command to be responded to is determined to be removing the original ice cream effect map and adding an effect map of the ice cream with a bite taken out of it. The t-th frame image is then processed according to this command, which enhances the interaction with viewers of the live stream and attracts more viewers.
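The ice cream example can be sketched as a rule that looks at both the gesture recognition result and the effect maps currently present because of viewer interaction. The function name and the labels `"eat"`, `"ice_cream"` and `"ice_cream_bitten"` are hypothetical.

```python
def respond_to_interaction(gesture, active_effect_maps):
    """Return the effect maps to draw after combining gesture and interaction info."""
    if gesture == "eat" and "ice_cream" in active_effect_maps:
        # Remove the original ice cream map and add the bitten version.
        remaining = [m for m in active_effect_maps if m != "ice_cream"]
        return remaining + ["ice_cream_bitten"]
    # Any other combination leaves the current effect maps unchanged.
    return list(active_effect_maps)
```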

Step S207, overlaying the t-th frame image with the processed t-th frame image to obtain processed video data.

Step S208, the processed video data is displayed.

After the processed video data is obtained, the processed video data can be displayed in real time, and a user can directly see the display effect of the processed video data.

Step S209, uploading the processed video data to a cloud server.

The processed video data can be uploaded directly to a cloud server. Specifically, the processed video data can be uploaded to one or more cloud video platform servers, such as the cloud video platform servers of iQIYI, Youku, Kuai Video and the like, so that the cloud video platform servers display the video data on the cloud video platform. Alternatively, the processed video data can be uploaded to a cloud live-streaming server; when a user at the viewing end enters the cloud live-streaming server to watch, the cloud live-streaming server can push the video data to the viewing user's client in real time. Alternatively, the processed video data can be uploaded to a cloud official-account server; when a user follows the official account, the cloud official-account server pushes the video data to the client of the user following the official account. Further, the cloud official-account server can push video data matching the user's habits to that client according to the viewing habits of the users following the official account.

According to the video data real-time processing method based on the adaptive tracking frame segmentation provided by the embodiment, for each group of frame images, the tracking frame corresponding to the t-1 frame image is adjusted according to the first foreground image in the t-th frame image to obtain the tracking frame corresponding to the t-th frame image, the image to be segmented is extracted by using the tracking frame, the segmentation result corresponding to the t-th frame image can be quickly and accurately obtained according to the segmentation result corresponding to the image to be segmented, and the segmentation precision of the image scene segmentation is effectively improved. Compared with the prior art that the scene segmentation processing is carried out on all the contents of the frame image, the method only carries out the scene segmentation processing on the image to be segmented extracted from the frame image, effectively reduces the data processing amount of image scene segmentation, improves the processing efficiency and optimizes the image scene segmentation processing mode; based on the obtained segmentation result, the personalized special effect can be added to the frame image more accurately and rapidly, and the video data display effect is beautified; in addition, the gesture can be recognized more accurately based on the obtained segmentation result, and the effect processing command to be responded can be determined quickly and accurately so as to process the frame image and optimize the video data processing mode.
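The overall flow for one group of frame images can be summarized by the following sketch. The four callables are hypothetical stand-ins for the steps described above (full segmentation of the 1st frame image, tracking-frame adjustment, segmentation within the tracking frame, and special-effect processing). Applying the special effect to the 1st frame image as well is an assumption of the sketch.

```python
def process_group(frames, segment_full, adjust_box, segment_in_box, add_effect):
    """Process one group of frames, carrying the tracking frame from frame to frame."""
    video = list(frames)
    box = None
    for index, frame in enumerate(frames):
        if index == 0:
            # 1st frame image: segment the whole frame and derive its tracking frame.
            mask, box = segment_full(frame)
        else:
            # t-th frame image (t > 1): adjust the previous tracking frame,
            # then segment only the region inside it.
            box = adjust_box(frame, box)
            mask = segment_in_box(frame, box)
        # Overlay the original frame with the processed frame.
        video[index] = add_effect(frame, mask)
    return video
```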

Fig. 3 is a block diagram illustrating a video data real-time processing apparatus based on adaptive tracking frame segmentation according to an embodiment of the present invention, the apparatus is used for processing groups of frame images obtained by dividing every n frames in a video, as shown in fig. 3, and the apparatus includes: an acquisition module 310, a segmentation module 320, a determination module 330, a processing module 340, an overlay module 350, and a display module 360.

The acquisition module 310 is adapted to: and acquiring a t frame image containing a specific object in a group of frame images and a tracking frame corresponding to the t-1 frame image.

Wherein t is greater than 1; the tracking frame corresponding to the 1 st frame image is determined based on the segmentation result corresponding to the 1 st frame image.

The segmentation module 320 is adapted to: according to the t frame image, adjusting the tracking frame corresponding to the t-1 frame image to obtain a tracking frame corresponding to the t frame image; and according to the tracking frame corresponding to the t-th frame image, carrying out scene segmentation processing on a partial area of the t-th frame image to obtain a segmentation result corresponding to the t-th frame image.

Optionally, the segmentation module 320 is further adapted to: identifying the t frame image, and determining a first foreground image aiming at a specific object in the t frame image; applying a tracking frame corresponding to the t-1 th frame image to the t-th frame image; and adjusting the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t-frame image.

In particular, the segmentation module 320 is further adapted to: calculate the proportion of pixel points belonging to the first foreground image in the t-th frame image among all pixel points within the tracking frame corresponding to the (t-1)-th frame image, and determine this proportion as the first foreground pixel proportion of the t-th frame image; obtain the second foreground pixel proportion of the (t-1)-th frame image, where the second foreground pixel proportion of the (t-1)-th frame image is the proportion of pixel points belonging to the first foreground image in the (t-1)-th frame image among all pixel points within the tracking frame corresponding to the (t-1)-th frame image; calculate the difference value between the first foreground pixel proportion of the t-th frame image and the second foreground pixel proportion of the (t-1)-th frame image; judge whether the difference value exceeds a preset difference threshold; and if so, adjust the size of the tracking frame corresponding to the (t-1)-th frame image according to the difference value.

The segmentation module 320 is further adapted to: calculate the distance between the first foreground image in the t-th frame image and each border of the tracking frame corresponding to the (t-1)-th frame image; and adjust the size of the tracking frame corresponding to the (t-1)-th frame image according to the distance and a preset distance threshold.

The segmentation module 320 is further adapted to: determine the center point position of the first foreground image in the t-th frame image according to the first foreground image in the t-th frame image; and adjust the position of the tracking frame corresponding to the (t-1)-th frame image according to that center point position, so that the center point of the tracking frame corresponding to the (t-1)-th frame image coincides with the center point of the first foreground image in the t-th frame image.

Optionally, the segmentation module 320 is further adapted to: extracting an image to be segmented from a partial region of the t frame image according to a tracking frame corresponding to the t frame image; carrying out scene segmentation processing on an image to be segmented to obtain a segmentation result corresponding to the image to be segmented; and obtaining a segmentation result corresponding to the t frame image according to the segmentation result corresponding to the image to be segmented.

The segmentation module 320 is further adapted to: and extracting an image in a tracking frame corresponding to the t frame image from the t frame image, and determining the extracted image as an image to be segmented.

The segmentation module 320 is further adapted to: and inputting the image to be segmented into a scene segmentation network to obtain a segmentation result corresponding to the image to be segmented.

The determination module 330 is adapted to: and determining a second foreground image of the t frame image according to the segmentation result corresponding to the t frame image.

The processing module 340 is adapted to: and adding a personalized special effect according to the second foreground image to obtain a processed t frame image.

Optionally, the processing module 340 is further adapted to: extracting key information of a region to be processed from the second foreground image; drawing an effect map according to the key information; fusing the effect mapping, the second foreground image and a preset background image to obtain a processed t frame image; or, the effect map, the second foreground image and the second background image determined according to the segmentation result corresponding to the t-th frame image are fused to obtain the processed t-th frame image.

The key information may specifically be key point information, key area information, and/or key line information. The embodiment of the present invention is described by taking key information as key point information as an example. The processing module 340 is further adapted to: searching a basic effect map corresponding to the key point information; or acquiring a basic effect map specified by a user; calculating position information between at least two key points with a symmetrical relation according to the key point information; and processing the basic effect map according to the position information to obtain the effect map.

Optionally, the processing module 340 is further adapted to: extracting key information of a region to be identified from the second foreground image; recognizing the posture of the specific object according to the key information to obtain a posture recognition result of the specific object; and determining a corresponding effect processing command to be responded to the t frame image according to the gesture recognition result of the specific object to obtain the processed t frame image. Wherein, the effect processing command to be responded comprises an effect mapping processing command, a stylization processing command, a brightness processing command, a light processing command and/or a tone processing command.

Optionally, the processing module 340 is further adapted to: and determining a corresponding effect processing command to be responded to the t frame image according to the gesture recognition result of the specific object and the interaction information with the interaction object contained in the t frame image to obtain the processed t frame image.

The overlay module 350 is adapted to: overlay the t-th frame image with the processed t-th frame image to obtain processed video data.

The display module 360 is adapted to: and displaying the processed video data.

After the processed video data is obtained, the display module 360 can display the processed video data in real time, and a user can directly see the display effect of the processed video data.

The apparatus may further comprise: an uploading module 370 adapted to upload the processed video data to a cloud server.

The uploading module 370 may upload the processed video data directly to a cloud server. Specifically, the uploading module 370 may upload the processed video data to one or more cloud video platform servers, such as the cloud video platform servers of iQIYI, Youku, Kuai Video and the like, so that the cloud video platform servers display the video data on the cloud video platform. Alternatively, the uploading module 370 may upload the processed video data to a cloud live-streaming server; when a user at the viewing end enters the cloud live-streaming server to watch, the cloud live-streaming server may push the video data to the viewing user's client in real time. Alternatively, the uploading module 370 may upload the processed video data to a cloud official-account server; when a user follows the official account, the cloud official-account server pushes the video data to the client of the user following the official account. Further, the cloud official-account server can push video data matching the user's habits to that client according to the viewing habits of the users following the official account.

According to the video data real-time processing device based on the adaptive tracking frame segmentation provided by the embodiment, for each group of frame images, the tracking frame corresponding to the t-th frame image is obtained based on the tracking frame corresponding to the t-1 th frame image, and the scene segmentation is performed on the t-th frame image by using the tracking frame, so that the segmentation result corresponding to the t-th frame image can be quickly and accurately obtained, and the segmentation precision of the image scene segmentation is effectively improved. Compared with the prior art that the scene segmentation processing is carried out on all the contents of the frame image, the method only carries out the scene segmentation processing on partial areas of the frame image, effectively reduces the data processing amount of image scene segmentation, improves the processing efficiency and optimizes the image scene segmentation processing mode; and based on the obtained segmentation result, the personalized special effect can be more accurately and quickly added to the frame image, and the video data display effect is beautified.

The invention also provides a nonvolatile computer storage medium, and the computer storage medium stores at least one executable instruction which can execute the video data real-time processing method based on the self-adaptive tracking frame segmentation in any method embodiment.

Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.

As shown in fig. 4, the computing device may include: a processor 402, a communication interface 404, a memory 406, and a communication bus 408.

Wherein:

the processor 402, communication interface 404, and memory 406 communicate with each other via a communication bus 408.

A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.

The processor 402 is configured to execute the program 410, and may specifically execute the relevant steps in the above embodiments of the video data real-time processing method based on adaptive tracking frame segmentation.

In particular, program 410 may include program code comprising computer operating instructions.

The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 406 is configured to store the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The program 410 may be specifically configured to cause the processor 402 to execute the video data real-time processing method based on adaptive tracking frame segmentation in any of the method embodiments described above. For the specific implementation of each step in the program 410, reference may be made to the descriptions of the corresponding steps and units in the foregoing embodiments of video data real-time processing based on adaptive tracking frame segmentation, which are not repeated here. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not repeated here.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims

1. A real-time video data processing method based on adaptive tracking frame segmentation is used for processing groups of frame images obtained by dividing every n frames in a video, and for one group of frame images, the method comprises the following steps:

acquiring a t-th frame image containing a specific object in the group of frame images and a tracking frame corresponding to the (t-1)-th frame image, wherein t is greater than 1; the tracking frame corresponding to the 1st frame image is determined according to the segmentation result corresponding to the 1st frame image;

adjusting, according to the t-th frame image, the tracking frame corresponding to the (t-1)-th frame image to obtain a tracking frame corresponding to the t-th frame image; performing, according to the tracking frame corresponding to the t-th frame image, scene segmentation processing on a partial region of the t-th frame image to obtain a segmentation result corresponding to the t-th frame image; wherein the tracking frame corresponding to the (t-1)-th frame image can completely enclose the first foreground image in the (t-1)-th frame image; the tracking frame corresponding to the t-th frame image can enclose the first foreground image in the t-th frame image; and the partial region refers to the region inside the tracking frame corresponding to the t-th frame image;

adding a personalized special effect according to the second foreground image to obtain a processed t-th frame image;

and displaying the processed video data.
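The flow of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: frames are modelled as 2-D lists of pixel values, a tracking frame is a hypothetical (x0, y0, x1, y1) tuple with exclusive upper bounds, and `segment` and `adjust_box` are caller-supplied stand-ins for the scene segmentation network and the tracking frame adjustment, neither of which is specified here. The sketch also assumes that `segment` returns a mask of the same size as its input.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical layout: (x0, y0, x1, y1), x1/y1 exclusive
Image = List[List[int]]          # 2-D grid of pixel values (illustrative only)
Mask = List[List[int]]           # 1 = foreground, 0 = background


def split_into_groups(frames: Sequence, n: int) -> List[list]:
    """Divide a frame sequence into consecutive groups of at most n frames."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [list(frames[i:i + n]) for i in range(0, len(frames), n)]


def box_from_mask(mask: Mask) -> Box:
    """Smallest box enclosing every foreground pixel of a mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        # No foreground found: fall back to the full frame (an assumption).
        return (0, 0, len(mask[0]), len(mask))
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def segment_inside_box(frame: Image, box: Box,
                       segment: Callable[[Image], Mask]) -> Mask:
    """Segment only the region inside the tracking frame, then paste the
    partial result into an all-background mask of full frame size."""
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in frame[y0:y1]]
    crop_mask = segment(crop)
    full = [[0] * len(frame[0]) for _ in frame]
    for dy, row in enumerate(crop_mask):
        full[y0 + dy][x0:x1] = row
    return full


def process_group(group: List[Image],
                  segment: Callable[[Image], Mask],
                  adjust_box: Callable[[Image, Box], Box]) -> List[Mask]:
    """Frame 1 is segmented in full to initialise the tracking frame;
    each later frame t adjusts the frame-(t-1) box and segments inside it."""
    first_mask = segment(group[0])
    box = box_from_mask(first_mask)
    masks = [first_mask]
    for frame in group[1:]:
        box = adjust_box(frame, box)
        masks.append(segment_inside_box(frame, box, segment))
    return masks
```

Restricting segmentation to the cropped region is what reduces the per-frame workload in this sketch; only the first frame of each group is processed in full.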

2. The method of claim 1, wherein adding a personalized special effect according to the second foreground image to obtain a processed tth frame image further comprises:

drawing an effect map according to the key information;

fusing the effect map, the second foreground image and a preset background image to obtain a processed t frame image; or, the effect map, the second foreground image and a second background image determined according to the segmentation result corresponding to the t-th frame image are fused to obtain a processed t-th frame image.
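The fusion step of claim 2 can be illustrated with simple per-pixel compositing. The following is a sketch only: the claim does not state how the three layers are blended, so the per-pixel alpha blending used here, the single-channel pixel values, and every name (`fuse`, `effect_alpha`, `fg_mask`) are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def fuse(effect: Grid, effect_alpha: Grid,
         foreground: Grid, fg_mask: List[List[int]],
         background: Grid) -> Grid:
    """Composite three layers of equal size.

    The foreground replaces the background wherever fg_mask is 1, and the
    effect map is then alpha-blended on top (alpha in [0, 1]).
    """
    out: Grid = []
    for y in range(len(background)):
        row = []
        for x in range(len(background[0])):
            base = foreground[y][x] if fg_mask[y][x] else background[y][x]
            a = effect_alpha[y][x]
            row.append(a * effect[y][x] + (1.0 - a) * base)
        out.append(row)
    return out
```

The same function covers both alternatives in the claim: `background` may be a preset background image or the second background image derived from the segmentation result.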

3. The method of claim 2, wherein the key information is key point information; the drawing the effect map further comprises, according to the key information:

and processing the basic effect map according to the position information to obtain an effect map.

4. The method of claim 1, wherein adding a personalized special effect according to the second foreground image to obtain a processed t-th frame image further comprises:

determining, according to the gesture recognition result of the specific object, a corresponding effect processing command to be responded to for the t-th frame image, so as to obtain the processed t-th frame image.

5. The method according to claim 4, wherein the determining, according to the gesture recognition result of the specific object, a corresponding effect processing command to be responded to for the t-th frame image to obtain the processed t-th frame image further comprises:

determining, according to the gesture recognition result of the specific object and the interaction information with the interaction object contained in the t-th frame image, a corresponding effect processing command to be responded to for the t-th frame image, so as to obtain the processed t-th frame image.

6. The method of claim 4, wherein the effect processing command to respond comprises an effect map processing command, a stylization processing command, a brightness processing command, a light processing command, and/or a tint processing command.
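Claims 4 to 6 describe choosing an effect processing command from a gesture recognition result, optionally combined with interaction information. A minimal lookup-table sketch is shown below. The command kinds come from claim 6, but the gesture labels, the interaction label, and the table contents are hypothetical placeholders, since the claims do not enumerate them.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class EffectCommand(Enum):
    """Command kinds listed in claim 6."""
    EFFECT_MAP = "effect_map"
    STYLIZATION = "stylization"
    BRIGHTNESS = "brightness"
    LIGHT = "light"
    TINT = "tint"


# Hypothetical mapping: (gesture label, interaction label or None) -> command.
COMMAND_TABLE: Dict[Tuple[str, Optional[str]], EffectCommand] = {
    ("open_palm", None): EffectCommand.BRIGHTNESS,
    ("heart_hands", None): EffectCommand.EFFECT_MAP,
    ("heart_hands", "viewer_like"): EffectCommand.STYLIZATION,
}


def command_for(gesture: str,
                interaction: Optional[str] = None) -> Optional[EffectCommand]:
    """Prefer a gesture+interaction match, then fall back to gesture alone."""
    specific = COMMAND_TABLE.get((gesture, interaction))
    if specific is not None:
        return specific
    return COMMAND_TABLE.get((gesture, None))
```

Returning `None` for an unrecognised gesture leaves the frame unchanged in this sketch; a real system might instead apply a default effect.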

7. The method according to any one of claims 1-6, wherein the adjusting the tracking frame corresponding to the t-1 frame image according to the t-frame image further comprises:

8. The method according to claim 7, wherein the adjusting the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t-frame image further comprises:

judging whether the difference value exceeds a preset difference threshold value or not; and if so, adjusting the size of the tracking frame corresponding to the t-1 frame image according to the difference value.
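The threshold test in claim 8 can be sketched as below. How the difference value is computed is not stated in this section, so the function simply takes it as an input. The scaling rule (growing or shrinking each side by a factor of 1 + difference about the box centre, with a floor of 0.1) is an assumption chosen for illustration, as are the names `resize_box_if_needed`, `difference`, and `threshold`.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # hypothetical layout: (x0, y0, x1, y1)


def resize_box_if_needed(box: Box, difference: float, threshold: float) -> Box:
    """Resize the frame-(t-1) tracking frame only when the difference value
    exceeds the preset threshold; otherwise return it unchanged."""
    if abs(difference) <= threshold:
        return box
    # Assumed rule: scale both sides by (1 + difference), never below 0.1.
    scale = max(1.0 + difference, 0.1)
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) / 2.0 * scale
    half_h = (y1 - y0) / 2.0 * scale
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

Skipping the resize for small differences keeps the tracking frame stable between consecutive frames and avoids jitter from minor segmentation noise.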

9. The method according to claim 7, wherein the adjusting the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t-frame image further comprises:

10. The method according to claim 7, wherein the adjusting the tracking frame corresponding to the t-1 frame image according to the first foreground image in the t-frame image further comprises:

11. The method according to any one of claims 1 to 6, wherein the performing scene segmentation processing on the partial region of the t-th frame image according to the tracking frame corresponding to the t-th frame image to obtain the segmentation result corresponding to the t-th frame image further comprises:

performing scene segmentation processing on the image to be segmented to obtain a segmentation result corresponding to the image to be segmented;

12. The method according to claim 11, wherein the extracting the image to be segmented from the partial region of the tth frame image according to the tracking frame corresponding to the tth frame image further comprises:

13. The method according to claim 11, wherein the performing scene segmentation processing on the image to be segmented to obtain a segmentation result corresponding to the image to be segmented further comprises:

14. The method of any of claims 1-6, wherein the displaying the processed video data further comprises: displaying the processed video data in real time;

15. The method of claim 14, wherein the uploading the processed video data to a cloud server further comprises:

16. The method of claim 14, wherein the uploading the processed video data to a cloud server further comprises:

17. The method of claim 14, wherein the uploading the processed video data to a cloud server further comprises:

18. An apparatus for processing video data in real time based on adaptive tracking frame segmentation, the apparatus being used for processing groups of frame images obtained by dividing every n frames in a video, the apparatus comprising:

a segmentation module, adapted to adjust, according to the t-th frame image, the tracking frame corresponding to the (t-1)-th frame image to obtain the tracking frame corresponding to the t-th frame image, and to perform, according to the tracking frame corresponding to the t-th frame image, scene segmentation processing on a partial region of the t-th frame image to obtain a segmentation result corresponding to the t-th frame image; wherein the tracking frame corresponding to the (t-1)-th frame image can completely enclose the first foreground image in the (t-1)-th frame image; the tracking frame corresponding to the t-th frame image can enclose the first foreground image in the t-th frame image; and the partial region refers to the region inside the tracking frame corresponding to the t-th frame image;

a processing module, adapted to add a personalized special effect according to the second foreground image to obtain a processed t-th frame image;

and the display module is suitable for displaying the processed video data.

19. The apparatus of claim 18, wherein the processing module is further adapted to:

drawing an effect map according to the key information;

20. The apparatus of claim 19, wherein the key information is key point information; the processing module is further adapted to:

21. The apparatus of claim 18, wherein the processing module is further adapted to:

22. The apparatus of claim 21, wherein the processing module is further adapted to:

23. The apparatus of claim 21, wherein the effect processing command to respond comprises an effect map processing command, a stylization processing command, a brightness processing command, a light processing command, and/or a tint processing command.

24. The apparatus of any one of claims 18-23, wherein the segmentation module is further adapted to:

25. The apparatus of claim 24, wherein the segmentation module is further adapted to:

26. The apparatus of claim 24, wherein the segmentation module is further adapted to:

27. The apparatus of claim 24, wherein the segmentation module is further adapted to:

28. The apparatus of any one of claims 18-23, wherein the segmentation module is further adapted to:

29. The apparatus of claim 28, wherein the segmentation module is further adapted to:

30. The apparatus of claim 28, wherein the segmentation module is further adapted to:

31. The apparatus of any one of claims 18-23, wherein the display module is further adapted to: displaying the processed video data in real time;

the apparatus further comprises: an uploading module, adapted to upload the processed video data to a cloud server.

32. The apparatus of claim 31, wherein the upload module is further adapted to:

33. The apparatus of claim 31, wherein the upload module is further adapted to:

34. The apparatus of claim 31, wherein the upload module is further adapted to:

35. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video data real-time processing method based on the adaptive tracking frame segmentation in any one of claims 1-17.

36. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the video data real-time processing method based on adaptive tracking frame segmentation according to any one of claims 1 to 17.