CN113132638B - Video processing method, video processing system, mobile terminal and readable storage medium


Info

Publication number
CN113132638B
CN113132638B
Authority
CN
China
Prior art keywords
frame image
image
current frame
output
video
Prior art date
Legal status
Active
Application number
CN202110436061.2A
Other languages
Chinese (zh)
Other versions
CN113132638A (en)
Inventor
曾振
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110436061.2A
Publication of CN113132638A
Application granted
Publication of CN113132638B
Status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • G06T5/94
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/681Motion detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a video processing method, a video processing system, a mobile terminal and a non-volatile computer-readable storage medium. The video comprises consecutive multi-frame images, and the video processing method comprises the following steps: selecting a corresponding output frequency, according to the jitter parameter corresponding to the current frame image, at which the multi-frame images in the video are output; if the current frame image is an output image, performing portrait segmentation processing on the current frame image to obtain a portrait segmentation result; if the current frame image is not an output image, obtaining the portrait segmentation result of the current frame image from the portrait segmentation result of the previous frame image. Because the frames output for portrait segmentation are selected at a frequency determined by the jitter parameter of the current frame image, compared with a method of segmenting frames at a fixed interval, a better portrait segmentation result is obtained while the power consumption of the video processing system, and thereby of the mobile terminal, is reduced.

Description

Video processing method, video processing system, mobile terminal and readable storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video processing method, a video processing system, a mobile terminal, and a non-volatile computer readable storage medium.
Background
When background blurring is applied to a video, the portrait region and the background region in each frame of image need to be acquired. In general, an image in the video is output to a portrait segmentation processing model for portrait segmentation processing, so that the portrait region and the background region in the image can be obtained. However, if every frame of the video is output to the portrait segmentation processing model for portrait segmentation, video processing is slow and consumes a lot of power.
Disclosure of Invention
Embodiments of the present application provide a video processing method, a video processing system, a mobile terminal, and a non-volatile computer readable storage medium.
The embodiment of the application provides a video processing method. The video comprises continuous multi-frame images, and the video processing method comprises the following steps: selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image; if the current frame image is an output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result; if the current frame image is not the output image, acquiring the image segmentation result of the current frame image according to the image segmentation result of the previous frame image.
The embodiment of the application provides a video processing system. The video comprises a succession of multi-frame images, and the video processing system comprises a processor. The processor is configured to: selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image; if the current frame image is an output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result; and if the current frame image is not the output image, acquiring the image segmentation result of the current frame image according to the image segmentation result of the previous frame image.
The embodiment of the application provides a mobile terminal. The mobile terminal comprises a shooting module and a video processing system. The shooting module is used for acquiring a video, and the video comprises continuous multi-frame images. The video processing system includes a processor. The processor is configured to: selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image; if the current frame image is an output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result; and if the current frame image is not the output image, acquiring the image segmentation result of the current frame image according to the image segmentation result of the previous frame image.
Embodiments of the present application provide a non-transitory computer readable storage medium containing a computer program. The computer program, when executed by a processor, causes the processor to perform the video processing method described above.
According to the video processing method, the video processing system, the mobile terminal and the non-volatile computer-readable storage medium, the frames of the video are output for portrait segmentation processing at a frequency selected according to the jitter parameter of the current frame image. Compared with a method of segmenting frames at a fixed interval, a better portrait segmentation result is obtained while the power consumption of the video processing system, and thereby of the mobile terminal, is reduced.
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a video processing method according to some embodiments of the present application;
FIG. 2 is a schematic diagram of a video processing system according to some embodiments of the present application;
FIGS. 3 to 9 are flowcharts of a video processing method according to some embodiments of the present application;
FIG. 10 is a schematic diagram of a portrait segmentation process performed on an input image by a portrait segmentation process model according to some embodiments of the present application;
FIGS. 11-13 are flow diagrams of video processing methods according to certain embodiments of the present application;
FIG. 14 is a schematic diagram of acquiring a portrait local area of a current frame image according to the portrait segmentation result of a first image, according to some embodiments of the present application;
FIG. 15 is a schematic diagram of a portrait segmentation processing model processing a current frame image according to some embodiments of the present application;
FIG. 16 is a flow chart of a video processing method of some embodiments of the present application;
FIG. 17 is a schematic diagram of a mobile terminal according to some embodiments of the present application;
FIG. 18 is a schematic illustration of interactions of a non-transitory computer readable storage medium with a processor according to some embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application and are not to be construed as limiting the embodiments of the present application.
When background blurring is applied to a video, the portrait region and the background region in each frame of image need to be acquired. In general, an image in the video is output to a portrait segmentation processing model for portrait segmentation processing, so that the portrait region and the background region in the image can be obtained. However, if every frame of the video is output to the portrait segmentation processing model, video processing is slow and consumes a lot of power. In order to increase the processing speed and reduce the power consumption, a fixed-interval method is generally adopted, that is, one frame of image is output to the portrait segmentation processing model every fixed number of frames. The higher the output frequency to the portrait segmentation processing model, that is, the fewer the frames skipped between outputs, the better the portrait segmentation effect on consecutive frames, but the higher the power consumption; conversely, the lower the output frequency, that is, the more the frames skipped between outputs, the lower the power consumption, but the worse the portrait segmentation effect on consecutive frames. It is therefore difficult for the fixed-interval method to balance power consumption against the portrait segmentation effect.
Referring to fig. 1, the present application provides a video processing method. The video comprises continuous multi-frame images, and the video processing method comprises the following steps:
01: selecting a corresponding output frequency to output multi-frame images in the video according to the jitter parameters corresponding to the current frame images;
02: if the current frame image is an output image, performing portrait segmentation processing on the current frame image to obtain a portrait segmentation result;
03: if the current frame image is not an output image, obtaining the portrait segmentation result of the current frame image according to the portrait segmentation result of the previous frame image.
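The control flow of steps 01 to 03 can be sketched as follows. This is a minimal illustration, not the patented implementation: `select_interval` and `segment_portrait` are hypothetical stand-ins for the jitter-based frequency selection and the portrait segmentation processing model, and the threshold and interval values are invented.

```python
def select_interval(jitter, threshold=0.5):
    # Step 01 (sketch): a larger jitter parameter selects a shorter interval,
    # i.e. a higher output frequency. Values are illustrative only.
    return 1 if jitter > threshold else 3

def segment_portrait(frame):
    # Stand-in for the portrait segmentation processing model (step 02).
    return f"mask_of_{frame}"

def process_video(frames, jitters):
    results, prev_result, last_output = [], None, 0
    to_output = {0}  # the 1st frame is always segmented
    for i, frame in enumerate(frames):
        if i in to_output:
            prev_result = segment_portrait(frame)  # step 02
            last_output = i
        # step 03: frames that are not output reuse the previous segmentation result
        results.append(prev_result)
        # reschedule future output frames, counted from the last output frame
        step = select_interval(jitters[i]) + 1
        to_output = set(range(last_output, len(frames), step))
    return results
```

For a steady low-jitter clip this segments every 4th frame; for a high-jitter clip it segments every 2nd frame, matching the idea that shake demands more frequent segmentation.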
Referring to fig. 1 and 2, a video processing system 100 is also provided. The video includes a plurality of successive frames of images. The video processing system 100 includes a processor 10. Step 01, step 02 and step 03 can be implemented by the processor 10. That is, the processor 10 is configured to select a corresponding output frequency at which to output the multi-frame images in the video according to the jitter parameter corresponding to the current frame image; if the current frame image is an output image, perform portrait segmentation processing on the current frame image to obtain a portrait segmentation result; and if the current frame image is not an output image, obtain the portrait segmentation result of the current frame image according to the portrait segmentation result of the previous frame image.
According to the video processing method and the video processing system 100, the frames of the video are output for portrait segmentation processing at an output frequency selected according to the jitter parameter of the current frame image. Compared with a method of segmenting frames at a fixed interval, a better portrait segmentation result is obtained while the power consumption of the video processing system 100 is reduced.
If the current frame image is the 1st frame image in the video, it needs to be output to the portrait segmentation processing model for portrait segmentation processing. The output frequency for the subsequent frames is then updated in real time. For example, assume that a video includes 9 consecutive frames, a first preset frequency outputs an image once every 3 frames, a second preset frequency outputs an image once every 1 frame, and a third preset frequency outputs an image once every 2 frames; the first preset frequency is selected when the current frame image is the 1st frame, the second preset frequency when it is the 2nd frame, and the third preset frequency when it is the 3rd frame. The processor 10 first processes the 1st frame; since the current frame image is the 1st frame, the processor 10 outputs it, and according to the first preset frequency (one image every 3 frames) and the frame number of the last output image, determines the 5th and 9th frames as the images to be output. The processor 10 then processes the 2nd frame; since the 2nd frame is not an image that needs to be output, the processor 10 does not output it, and updates the images to be output to the 3rd, 5th, 7th and 9th frames according to the second preset frequency (one image every 1 frame) and the frame number of the last output image (the 1st frame).
Then, the processor 10 processes the 3rd frame; since the 3rd frame is an image to be output, the processor 10 outputs it, and updates the images to be output to the 5th, 7th and 9th frames according to the third preset frequency (one image every 2 frames) and the frame number of the last output image (the 3rd frame). The processor 10 then handles the 4th to 9th frames in the same way. Thus, compared with a method of segmenting the video at a fixed interval, a better portrait segmentation result can be obtained while the power consumption of the video processing system 100 is reduced.
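The schedule update in this example can be written as a small helper. This is an illustrative sketch whose function name is hypothetical; it reads "output an image once every N frames" as N frames skipped between consecutive outputs, which reproduces the first two updates of the example above.

```python
def scheduled_outputs(last_output, total_frames, skipped):
    # 1-indexed frames still to be output, spaced `skipped` frames apart,
    # counted from the most recently output frame.
    step = skipped + 1
    return list(range(last_output + step, total_frames + 1, step))

# After frame 1 is output, first preset frequency (3 frames skipped):
assert scheduled_outputs(1, 9, 3) == [5, 9]
# Frame 2 is not output; second preset frequency (1 frame skipped),
# the last output image is still frame 1:
assert scheduled_outputs(1, 9, 1) == [3, 5, 7, 9]
```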
Further, the output frequency represents the frequency at which images in the video are output to the portrait segmentation processing model for portrait segmentation processing. It can be understood that the more frames that lie between two images consecutively output to the portrait segmentation processing model, the lower the output frequency; the fewer frames that lie between them, the higher the output frequency. For example, assume the video includes 7 consecutive frames. A first output scheme outputs the 1st, 3rd and 5th frames to the portrait segmentation processing model, so one frame lies between two consecutively output images; a second output scheme outputs the 1st, 4th and 7th frames, so two frames lie between two consecutively output images. Because fewer frames lie between consecutively output images in the first scheme than in the second, the output frequency of the first output scheme is higher than that of the second output scheme.
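The inverse relation between output frequency and inter-output spacing can be stated numerically; the definition below is illustrative, not one given in the application.

```python
def output_frequency(skipped_frames):
    # Fraction of frames sent to the portrait segmentation model,
    # assuming a fixed number of skipped frames between outputs.
    return 1.0 / (skipped_frames + 1)

# First scheme (frames 1, 3, 5): one frame skipped between outputs.
# Second scheme (frames 1, 4, 7): two frames skipped between outputs.
assert output_frequency(1) > output_frequency(2)  # the first scheme outputs more often
```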
In some embodiments, the larger the jitter parameter corresponding to the current frame image, the larger the corresponding output frequency. For example, please refer to fig. 1 and 3, step 01: selecting a corresponding output frequency to output the multi-frame images in the video according to the jitter parameter corresponding to the current frame image comprises:
011: if the jitter parameter corresponding to the current frame image is greater than a first threshold, selecting a first preset frequency to output the multi-frame images in the video; and if the jitter parameter corresponding to the current frame image is less than the first threshold, selecting a second preset frequency to output the multi-frame images in the video, wherein the second preset frequency is smaller than the first preset frequency.
Referring to fig. 2 and 3, in some embodiments, step 011 may be implemented by the processor 10. That is, the processor 10 is further configured to select the first preset frequency to output the multi-frame images in the video if the jitter parameter corresponding to the current frame image is greater than the first threshold; and select the second preset frequency to output the multi-frame images in the video if the jitter parameter corresponding to the current frame image is less than the first threshold, wherein the second preset frequency is smaller than the first preset frequency.
It should be noted that, the first threshold, the first preset frequency, and the second preset frequency may be preset by the manufacturer before the video processing system 100 leaves the factory; or may be set by the user according to his own needs, and is not limited herein.
The processor 10 obtains the jitter parameter of the current frame image. If the jitter parameter is greater than the first threshold, the photographing device can be considered to have been shaking when the current frame image was captured, and the first preset frequency is selected to output the multi-frame images in the video. If the jitter parameter is less than the first threshold, the photographing device can be considered to have been relatively stable, and the second preset frequency, which is smaller than the first preset frequency, is selected. If the jitter parameter is exactly equal to the first threshold, the photographing device may be considered either shaking or relatively stable, and either the first or the second preset frequency may be selected, which is not limited herein. In some embodiments, the video processing system 100 further includes a gyroscope (not shown) for acquiring the jitter parameter of the photographing device when an image is captured. Of course, in some embodiments, the jitter parameter may be obtained by other methods, which are not enumerated herein.
A jitter parameter greater than the first threshold indicates that the photographing device was shaking when the current frame image was captured. In that case the scene may change in subsequent shots, and the portrait region may move between subsequent consecutive frames; if images were output to the portrait segmentation processing model at a low output frequency, that is, only after many skipped frames, the portrait segmentation effect on consecutive frames would deteriorate and affect subsequent processing. Therefore, in this embodiment, when the jitter parameter corresponding to the current frame image is greater than the first threshold, the multi-frame images in the video are output at the first preset frequency, so that fewer frames lie between two images consecutively output to the portrait segmentation processing model, and the portrait segmentation effect can be improved. Conversely, a jitter parameter smaller than the first threshold indicates that the photographing device was relatively stable when the current frame image was captured. In that case the scene is likely to remain unchanged for a short period, and the portrait regions of subsequent consecutive frames may be nearly identical; continuing to output images to the portrait segmentation processing model at the previous, higher output frequency would bring little improvement in the portrait segmentation effect while incurring larger power consumption.
Therefore, in this embodiment, when the jitter parameter corresponding to the current frame image is smaller than the first threshold, the multi-frame images in the video are output at the second preset frequency, which is smaller than the first preset frequency, so that more frames lie between two images consecutively output to the portrait segmentation processing model, and the power consumption of the video processing system 100 can be reduced.
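Step 011 amounts to a simple threshold test. In the sketch below the threshold value, the frequency values, and the tie-breaking choice at exactly the threshold (which the application leaves open) are all assumptions.

```python
def select_preset_frequency(jitter, first_threshold=0.5,
                            first_freq=0.5, second_freq=0.25):
    # The second preset frequency must be smaller than the first (step 011).
    assert second_freq < first_freq
    if jitter > first_threshold:
        return first_freq    # device shaking: segment more often
    if jitter < first_threshold:
        return second_freq   # device stable: segment less often
    return first_freq        # at the threshold, either choice is permitted

assert select_preset_frequency(0.8) == 0.5   # shaking -> first preset frequency
assert select_preset_frequency(0.1) == 0.25  # stable -> second preset frequency
```

A gyroscope reading (as mentioned above) would typically be reduced to a scalar jitter value before being passed to such a selector.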
It should be noted that, in some embodiments, the larger the jitter parameter corresponding to the current frame image, the larger the selected output frequency; the smaller the jitter parameter corresponding to the current frame image, the smaller the selected output frequency. Because different jitter parameters correspond to different output frequencies, the power consumption of the video processing system 100 can be further reduced while a better portrait segmentation result is obtained.
Referring to fig. 1 and 4, in some embodiments, step 01: selecting the corresponding output frequency to output the multi-frame image in the video according to the jitter parameter corresponding to the current frame image, and further comprising:
012: selecting a corresponding output frequency to output the multi-frame images in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image.
Referring to fig. 2 and 4, in some embodiments, step 012 may also be implemented by the processor 10. That is, the processor 10 is further configured to select a corresponding output frequency to output the multi-frame images in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image.
Even if the photographing device remains stable throughout the video capture, the picture in the video may still change, that is, the portrait region may move between consecutive frames. Therefore, in this embodiment, the output frequency is selected according to both the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image, so that the power consumption of the video processing system 100 can be further reduced while a better portrait segmentation result is obtained.
Specifically, referring to fig. 4 and 5, in some embodiments, step 012: selecting a corresponding output frequency to output the multi-frame images in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image comprises:
0121: if the similarity between the current frame image and the previous frame image is less than a first preset similarity, selecting a first output frequency to output the multi-frame images in the video;
0122: if the similarity between the current frame image and the previous frame image is greater than the first preset similarity, obtaining the jitter parameter corresponding to the current frame image, and selecting a corresponding second output frequency to output the multi-frame images in the video according to the jitter parameter, wherein the second output frequency is smaller than the first output frequency.
Referring to fig. 2 and 5, in some embodiments, steps 0121 and 0122 may also be implemented by the processor 10. That is, the processor 10 is further configured to select the first output frequency to output the multi-frame images in the video if the similarity between the current frame image and the previous frame image is less than the first preset similarity; and if the similarity is greater than the first preset similarity, obtain the jitter parameter corresponding to the current frame image and select a corresponding second output frequency to output the multi-frame images in the video according to the jitter parameter, wherein the second output frequency is smaller than the first output frequency.
It should be noted that, the first preset similarity, the first output frequency, and the second output frequency may be preset by the manufacturer before the video processing system 100 leaves the factory; or may be set by the user according to his own needs, and is not limited herein.
Illustratively, in some embodiments, the processor 10 obtains the similarity between the current frame image and the previous frame image. If the similarity is less than the first preset similarity, the two frames are considered to differ, and the multi-frame images in the video are output at the first output frequency. Because the current frame image differs from the previous frame image, the scene may change in subsequent shots and the portrait region may move between subsequent consecutive frames; if images were output to the portrait segmentation processing model at a low output frequency, that is, only after many skipped frames, the portrait segmentation effect on consecutive frames would deteriorate and affect subsequent processing. Therefore, in this embodiment, when the similarity between the current frame image and the previous frame image is less than the first preset similarity, the multi-frame images in the video are output at the larger first output frequency, so that fewer frames lie between two images consecutively output to the portrait segmentation processing model, and the portrait segmentation effect can be improved.
It should be noted that, in some embodiments, a third preset similarity smaller than the first preset similarity is defined, and if the similarity between the current frame image and the previous frame image is less than the third preset similarity, the current frame image is directly output. Specifically, when the similarity between the current frame image and the previous frame image is less than the third preset similarity, the two frames can be considered completely different, that is, the portrait region of the current frame image differs greatly from that of the previous frame image. If the portrait segmentation result of the current frame image were derived from that of the previous frame image, the resulting segmentation would be poor. Therefore, in this embodiment, when the similarity between the current frame image and the previous frame image is less than the third preset similarity, the current frame image is directly output to the portrait segmentation processing model for portrait segmentation processing, so that the final portrait segmentation effect can be improved and the experience of the user enhanced.
In addition, in some embodiments, the similarity between the current frame image and the previous frame image may be calculated by using a perceptual hash algorithm, with the hamming distance then used to evaluate how alike the two images are. Specifically, the perceptual hash algorithm comprises the following steps: (1) Reduce the size: the image is reduced to 8×8, 64 pixels in total. This step removes the details of the image and keeps only basic information such as structure and brightness, discarding picture differences caused by different sizes and proportions; (2) Simplify the color: the reduced image is converted into 64-level grayscale, that is, all pixels have only 64 possible values in total; (3) Calculate the average value: the mean grayscale of all 64 pixels of the color-processed image is computed; (4) Compare the grayscale of the pixels: the grayscale of each pixel is compared with the average value, recorded as 1 if greater than or equal to the average and as 0 if less than the average; (5) Calculate the hash value: the comparison results of the previous step (namely, step (4)) are combined to form a 64-bit integer, namely, the fingerprint of that frame image. The order of combination is unimportant as long as all images use the same order. After the fingerprints of the current frame image and the previous frame image are obtained, the 64 bits of the two fingerprints can be compared by the hamming distance to obtain the similarity between the current frame image and the previous frame image. Of course, in some embodiments, the similarity between the current frame image and the previous frame image may be obtained by other algorithms, which are not exemplified herein.
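Steps (3) through (5) above can be sketched as follows. This is an illustrative Python implementation (the function names are ours, not from the patent) that assumes the caller has already performed the resize and grayscale conversion of steps (1) and (2) and supplies an 8×8 grid of gray values:

```python
def phash_from_gray8x8(pixels):
    """Compute a 64-bit fingerprint from an 8x8 grid of grayscale
    values: compare each pixel with the mean (steps (3)-(4)) and pack
    the 64 resulting bits into one integer (step (5))."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / 64.0
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= avg else 0)  # >= average -> 1
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(h1 ^ h2).count("1")

def similarity(h1, h2):
    """Map the hamming distance (0..64) to a similarity in [0, 1]."""
    return 1.0 - hamming_distance(h1, h2) / 64.0
```

The bit order is irrelevant for the distance, exactly as the text notes, as long as both fingerprints were packed in the same order.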
If the similarity between the current frame image and the previous frame image is greater than the first preset similarity, the current frame image and the previous frame image are considered to be similar. At this time the jitter parameter corresponding to the current frame image is acquired, and a corresponding second output frequency is selected according to the jitter parameter to output the multi-frame images in the video, the second output frequency being smaller than the first output frequency. Since the similarity between the current frame image and the previous frame image is greater than the first preset similarity, that is, the current frame image is similar to the previous frame image, the scene photographed in a short period of time may remain unchanged and the portrait areas between subsequent continuous frame images may be completely consistent. If the images in the video were still output to the portrait segmentation processing model at the previous output frequency, the improvement of the portrait segmentation effect would not be obvious while larger power consumption would still be required. Therefore, in this embodiment, when the similarity between the current frame image and the previous frame image is greater than the first preset similarity, the multi-frame images in the video are output at the smaller second output frequency according to the jitter parameter of the current frame image, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is greater, so that the power consumption of the video processing system 100 can be reduced.
In some embodiments, the second output frequency includes a plurality of different frequencies, and when the similarity between the current frame image and the previous frame image is greater than the first preset similarity, whether the portrait area in the subsequent images will change is predicted according to the jitter parameter of the current frame image, so as to select the corresponding second output frequency to output the multi-frame images in the video. This can further reduce the power consumption of the video processing system 100 while achieving a better portrait segmentation result. Specifically, referring to fig. 5 and 6, in some embodiments, the second output frequency includes a first frequency and a second frequency, and the first frequency is greater than the second frequency. Step 0122: if the similarity is greater than the first preset similarity, obtaining the jitter parameter corresponding to the current frame image and selecting a corresponding second output frequency according to the jitter parameter to output the multi-frame images in the video, comprises:
01221: if the similarity is greater than the first preset similarity and the jitter parameter is greater than the first preset value, outputting the multi-frame images in the video at the first frequency;
01222: if the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value, outputting the multi-frame images in the video at the second frequency.
Referring to fig. 2 and 6, in some embodiments, both steps 01221 and 01222 may be implemented by the processor 10. That is, the processor 10 is further configured to output the multi-frame images in the video at the first frequency if the similarity is greater than the first preset similarity and the jitter parameter is greater than the first preset value; and to output the multi-frame images in the video at the second frequency if the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value.
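Taken together with step 0121, the choice among the first output frequency, the first frequency, and the second frequency can be sketched as below. This is a minimal illustration; the parameter names and the convention that a "frequency" counts frames per second sent to the segmentation model are our assumptions, not the patent's:

```python
def select_output_frequency(similarity, jitter,
                            first_preset_similarity, first_preset_value,
                            first_output_freq, first_freq, second_freq):
    """Sketch of steps 0121/0122/01221/01222.  A larger frequency means
    a smaller interval between frames sent to the segmentation model;
    first_freq and second_freq are both smaller than first_output_freq,
    and first_freq > second_freq."""
    if similarity < first_preset_similarity:
        # frames differ: segment often to track the changing portrait
        return first_output_freq
    if jitter > first_preset_value:
        # frames similar but the camera is shaking: the scene may
        # change soon, so keep the larger of the two lower frequencies
        return first_freq
    # frames similar and camera stable: segment rarely to save power
    return second_freq
```

With example values (30, 15, and 5 frames per second, chosen purely for illustration), a dissimilar frame keeps full-rate segmentation, while a similar frame from a stable camera drops to the lowest rate.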
It should be noted that the first preset similarity, the first preset value, the first frequency, and the second frequency may be preset by the manufacturer before the video processing system 100 leaves the factory; the first preset value may be the same as or different from the first threshold in the above embodiment, which is not limited herein, but it always holds that both the first frequency and the second frequency are smaller than the first output frequency.
For example, if the similarity between the current frame image and the previous frame image is greater than the first preset similarity and the jitter parameter of the current frame image is greater than the first preset value, it may be considered that the difference between the current frame image and the previous frame image is small but the photographing device was shaking when the current frame image was photographed, and at this time the multi-frame images in the video are output at the first frequency. If the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value, it may be considered that the difference between the current frame image and the previous frame image is small and the photographing device was relatively stable when the current frame image was photographed, and at this time the multi-frame images in the video are output at the second frequency. It should be noted that, although the difference between the current frame image and the previous frame image is small, that is, the portrait areas of two adjacent frame images differ little within a short time, if the shake parameter corresponding to the current frame image is large, that is, the photographing device was shaking when the current frame image was shot, it is highly likely that the picture in the video will change scene, that is, the portrait region between subsequent continuous frame images may change. If a smaller output frequency were used to output images in the video to the portrait segmentation processing model, that is, if an image were output for portrait segmentation processing only once every multiple frames, the portrait segmentation effect of the continuous frames would deteriorate, thereby affecting the subsequent processing.
Therefore, in this embodiment, when the similarity is greater than the first preset similarity and the jitter parameter is greater than the first preset value, the multi-frame images in the video are output at the larger first frequency, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is smaller, so as to improve the portrait segmentation effect. Likewise, if the difference between the current frame image and the previous frame image is not large and the jitter parameter corresponding to the current frame image is small, that is, the photographing device remained stable when the current frame image was shot, the scene is unlikely to change in the subsequent video: the scene photographed in the following short period may remain unchanged, and the portrait areas between subsequent continuous frame images may be completely consistent. If the images in the video were still output to the portrait segmentation processing model at the previous output frequency, the improvement of the portrait segmentation effect would not be obvious while larger power consumption would still be required. Therefore, in this embodiment, when the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value, the multi-frame images in the video are output at the smaller second frequency, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is greater, so that the power consumption of the video processing system 100 can be reduced.
Referring to fig. 4 and 7, in some embodiments, step 012: according to the dithering parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image, selecting the corresponding output frequency to output the multi-frame image in the video, comprising:
0123: if the jitter parameter corresponding to the current frame image is larger than the first preset jitter parameter, selecting a third output frequency to output a multi-frame image in the video;
0124: if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter, acquiring the similarity between the current frame image and the previous frame image, and selecting a corresponding fourth output frequency according to the similarity to output the multi-frame images in the video, wherein the fourth output frequency is smaller than the third output frequency.
Referring to fig. 2 and 7, in some embodiments, both steps 0123 and 0124 may be implemented by the processor 10. That is, the processor 10 is further configured to select the third output frequency to output the multi-frame images in the video if the jitter parameter corresponding to the current frame image is greater than the first preset jitter parameter; and, if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter, to obtain the similarity between the current frame image and the previous frame image and select a corresponding fourth output frequency according to the similarity to output the multi-frame images in the video, wherein the fourth output frequency is smaller than the third output frequency.
It should be noted that, the first preset jitter parameter, the third output frequency, and the fourth output frequency may be preset by the manufacturer before the video processing system 100 leaves the factory; the first preset jitter parameter may be the same as or different from the first threshold value and/or the first preset value in the above embodiment, the third output frequency may be the same as or different from the first output frequency in the above embodiment, and the fourth output frequency may be the same as or different from the second output frequency in the above embodiment, which are not limited herein.
In some embodiments, the processor 10 obtains the shake parameter corresponding to the current frame image. If the shake parameter corresponding to the current frame image is greater than the first preset shake parameter, it is considered that the photographing device was shaking severely when the current frame image was shot, the subsequently photographed scene may change, and the portrait region between subsequent continuous frame images may change; the similarity between the current frame image and the previous frame image therefore does not need to be compared, and the multi-frame images in the video are directly output at the larger third output frequency, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is smaller. On the one hand this can improve the portrait segmentation effect; on the other hand, not comparing the similarity between the current frame image and the previous frame image reduces the power consumption of the video processing system 100.
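This jitter-first variant (steps 0123/0124, refined by 01241/01242 below) can be sketched as follows. The names are illustrative; the similarity comparison is passed in as a callable so that, exactly as the paragraph above notes, it is skipped entirely on the severe-shake path, which is itself a power saving:

```python
def select_frequency_jitter_first(jitter, first_preset_jitter,
                                  similarity_fn,
                                  third_output_freq, third_freq,
                                  fourth_freq, second_threshold):
    """Sketch of steps 0123/0124 and 01241/01242 (illustrative names).
    third_freq and fourth_freq are both smaller than third_output_freq,
    and third_freq > fourth_freq."""
    if jitter > first_preset_jitter:
        # severe shake: output at the larger third output frequency
        # without spending power on a similarity comparison
        return third_output_freq
    sim = similarity_fn()  # compare current and previous frame lazily
    if sim < second_threshold:
        return third_freq   # stable camera, but the scene changed
    return fourth_freq      # stable camera, unchanged scene
```

A caller would wrap the perceptual-hash comparison in `similarity_fn`, so the hash of the previous frame is only computed when the shake parameter is below the threshold.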
In some embodiments, the fourth output frequency includes a plurality of different frequencies, and when the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter, the similarity between the current frame image and the previous frame image is obtained, and a corresponding fourth output frequency is selected according to the similarity to output the multi-frame images in the video. This can further reduce the power consumption of the video processing system 100 while achieving a better portrait segmentation result. Specifically, referring to fig. 7 and 8, in some embodiments, the fourth output frequency includes a third frequency and a fourth frequency, and the third frequency is greater than the fourth frequency. Step 0124: if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter, obtaining the similarity between the current frame image and the previous frame image and selecting a corresponding fourth output frequency according to the similarity to output the multi-frame images in the video, comprises:
01241: if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is smaller than the second threshold, outputting the multi-frame images in the video at the third frequency;
01242: if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is greater than the second threshold, outputting the multi-frame images in the video at the fourth frequency.
Referring to fig. 2 and 8, in some embodiments, both steps 01241 and 01242 may be implemented by the processor 10. That is, the processor 10 is further configured to output the multi-frame images in the video at the third frequency if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is smaller than the second threshold; and to output the multi-frame images in the video at the fourth frequency if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is greater than the second threshold.
It should be noted that, the second threshold value, the third frequency, and the fourth frequency may be preset by the manufacturer before the video processing system 100 leaves the factory; the second threshold may be set by the user according to the user's own needs, and the second threshold may be the same as or different from the first preset similarity in the above embodiment, which is not limited herein, but always satisfies that both the third frequency and the fourth frequency are smaller than the third output frequency.
For example, if the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is smaller than the second threshold, it may be considered that the photographing device was stable when the current frame image was shot but the scene in the picture has changed. At this time the multi-frame images in the video are output at the larger third frequency, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is smaller, so that the portrait segmentation effect can be improved. If the jitter parameter corresponding to the current frame image is smaller than the first preset jitter parameter and the similarity between the current frame image and the previous frame image is greater than the second threshold, it may be considered that the photographing device was stable when the current frame image was shot and the scene in the picture has not changed. At this time the multi-frame images in the video are output at the smaller fourth frequency, that is, the number of frames actually spaced between two images successively output to the portrait segmentation processing model for portrait segmentation processing is greater, so that the power consumption of the video processing system 100 can be reduced.
After the output is completed, the processor 10 sequentially further processes the images in the video to obtain the portrait segmentation result of each frame image. For example, if the current frame image is an output image, portrait segmentation processing is performed on the current frame image to obtain a portrait segmentation result; if the current frame image is not an output image, the portrait segmentation result of the current frame image is obtained according to the portrait segmentation result of the previous frame image. More specifically, referring to fig. 1 and 9, in some embodiments, if the current frame image is an output image, performing portrait segmentation processing on the current frame image to obtain a portrait segmentation result includes:
021: if the image subjected to the image segmentation processing does not exist, inputting the current frame image into the image segmentation processing model to acquire a human image segmentation result of the current frame image;
022: if the image subjected to the portrait segmentation processing exists, selecting a first portrait segmentation processing or a second portrait segmentation processing according to the dithering parameters corresponding to the current frame image and/or the similarity between the current frame image and the first image, wherein the first image is the image subjected to the portrait segmentation processing.
Referring to fig. 2 and 9, in some embodiments, both steps 021 and 022 may be implemented by the processor 10. That is, the processor 10 is further configured to input the current frame image into the image segmentation processing model to obtain the image segmentation result of the current frame image if there is no image that has undergone the image segmentation processing; and if the image subjected to the portrait segmentation processing exists, selecting the first portrait segmentation processing or the second portrait segmentation processing according to the dithering parameters corresponding to the current frame image and/or the similarity between the current frame image and the first image, wherein the first image is the image subjected to the portrait segmentation processing.
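The dispatch among step 021 (no segmented image exists yet), step 0221 (the first portrait segmentation processing, detailed below), and step 0222 (the second) can be sketched as below. The function and parameter names are illustrative; `segment_full` stands for running the whole frame through the model, and `segment_local` for the local-area variant described later:

```python
def choose_segmentation(frame, first_image_result, jitter, similarity,
                        second_preset_jitter, second_preset_similarity,
                        segment_full, segment_local):
    """Sketch of steps 021/022 and 0221/0222 (illustrative names)."""
    if first_image_result is None:
        # step 021: no image has undergone portrait segmentation yet
        return segment_full(frame)
    if jitter > second_preset_jitter or similarity < second_preset_similarity:
        # step 0221: the portrait areas differ significantly, so the
        # previous result has no reference value -> first processing
        return segment_full(frame)
    # step 0222: portrait areas are close -> second processing, reusing
    # the first image's result to crop a portrait local area
    return segment_local(frame, first_image_result)
```

The "and/or" in the patent's wording is rendered here as `or`: either a large shake parameter or a low similarity is enough to fall back to segmenting the full frame.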
Specifically, referring to fig. 10, if the current frame image is an output image and no image has yet undergone the portrait segmentation processing, the current frame image is directly input into the portrait segmentation processing model for portrait segmentation processing to obtain the portrait segmentation result of the current frame image. In some embodiments, the portrait segmentation processing model is trained in advance, wherein the model training comprises the following steps: (1) Acquire training data: in some embodiments, a large number of portrait pictures are gathered and the portrait areas are manually labeled to form mask maps corresponding to the portrait pictures; the mask maps are scaled to a preset size to form the training data. (2) Input the training data into a neural network and learn a network parameter model: in some embodiments, the neural network may be initialized after the training data is input, the neural network weights are updated by an adaptive moment estimation algorithm, and the learning rate is dynamically adjusted during training to obtain the learned network parameter model. (3) After the neural network converges, store the network parameter model to complete the model training. The portrait segmentation processing model performs portrait segmentation processing on an input image through the following steps: first, the input image is scaled to a preset size; then the scaled image is input into the network parameter model to obtain a portrait segmentation mask map, which serves as the preliminary portrait segmentation image.
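The inference path just described — scale the input to a preset size, then run the network parameter model on it to get a mask — can be sketched as below. The nearest-neighbour resize and the stand-in model callable are illustrative assumptions; a real deployment would use the learned neural network and a higher-quality resampling filter:

```python
def nearest_resize(img, out_h, out_w):
    """Nearest-neighbour scaling of a 2-D grayscale image (list of
    lists) to a preset size, as done before feeding the model."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]

def segment(img, model, preset=(8, 8)):
    """Sketch of the model's inference path: scale the input to the
    preset size and run the network parameter model on it; `model` is
    a stand-in callable returning a portrait segmentation mask map."""
    return model(nearest_resize(img, *preset))
```

Any callable mapping an image to a binary mask can be plugged in as `model` while prototyping the surrounding pipeline.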
After the preliminary portrait segmentation image is obtained, mistakenly segmented isolated areas are removed and the connected region is retained; the details of the connected region are then refined with an image matting algorithm and the edges highlighted to obtain the final portrait segmentation image, which is output as the portrait segmentation result of the input image. Of course, in some embodiments, the portrait segmentation processing model may be trained in other manners, and the image input into the portrait segmentation processing model may be subjected to portrait segmentation processing in other manners to obtain the portrait segmentation result, which are not exemplified here.
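The "remove isolated areas, keep the connected region" step can be illustrated with a plain flood fill over a binary mask. This is a sketch under the assumption that "the connected region" means the largest 4-connected component; production code would typically use a library routine such as OpenCV's connected-components analysis instead:

```python
def keep_largest_component(mask):
    """Remove mistakenly segmented isolated areas from a binary mask,
    keeping only the largest 4-connected region of 1-pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                comp, stack = set(), [(sy, sx)]
                seen[sy][sx] = True
                while stack:                      # iterative flood fill
                    y, x = stack.pop()
                    comp.add((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return [[1 if (y, x) in best else 0 for x in range(w)]
            for y in range(h)]
```

The subsequent matting and edge refinement are model- and library-specific and are not sketched here.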
If the current frame image is an output image and there is an image that has been subjected to the portrait segmentation processing, a different portrait segmentation processing method may be selected according to the image that has been subjected to the portrait segmentation processing in the previous frame. Specifically, referring to fig. 9 and 11, in some embodiments, step 022: according to the dithering parameter corresponding to the current frame image and/or the similarity between the current frame image and the first image, selecting a first portrait segmentation process or a second portrait segmentation process, wherein the first image is an image subjected to portrait segmentation process, and the method comprises the following steps:
0221: if the jitter parameter corresponding to the current frame image is greater than the second preset jitter parameter and/or the similarity between the current frame image and the first image is smaller than the second preset similarity, performing the first portrait segmentation processing on the current frame image to obtain the portrait segmentation result of the current frame image;
0222: and if the jitter parameter corresponding to the current frame image is smaller than a second preset jitter parameter and/or the similarity between the current frame image and the first image is larger than the second preset similarity, performing second portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image.
Referring to fig. 2 and 11, in some embodiments, both steps 0221 and 0222 may be implemented by the processor 10. That is, the processor 10 is further configured to perform the first portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image if the shake parameter corresponding to the current frame image is greater than the second preset shake parameter and/or the similarity between the current frame image and the first image is less than the second preset similarity; and if the jitter parameter corresponding to the current frame image is smaller than a second preset jitter parameter and/or the similarity between the current frame image and the first image is larger than a second preset similarity, performing second portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image.
It should be noted that, the second preset similarity and the second preset jitter parameter may be preset by the manufacturer before the video processing system 100 leaves the factory; or can be set by the user according to the own requirements. The second preset similarity may be the same as or different from the first preset similarity in the above embodiment, and the second preset jitter parameter may be the same as or different from the first preset jitter parameter in the above embodiment, which is not limited herein.
For example, in some embodiments, when the current frame image is an output image and there is an image that has undergone the portrait segmentation processing, the shake parameter corresponding to the current frame image and/or the similarity between the current frame image and the first image are obtained. If at least one of the conditions that the shake parameter is greater than the second preset shake parameter and that the similarity is smaller than the second preset similarity is satisfied, it may be considered that the portrait areas of the current frame image and the first image differ significantly, and at this time the first portrait segmentation processing is performed on the current frame image to obtain the portrait segmentation result of the current frame image. The first image is, of the frames before the current image, the most recent image that was output to the portrait segmentation processing model for portrait segmentation; this also holds for the first image mentioned below and will not be repeated later. For example, assuming that the 1st frame and the 3rd frame in the video have been output to the portrait segmentation processing model for portrait segmentation processing, and the 5th frame image is the current frame image and is an output image, the 3rd frame image is selected as the first image, the jitter parameter corresponding to the 5th frame image and/or the similarity between the 5th frame image and the 3rd frame image are acquired, and if at least one of the conditions that the jitter parameter is greater than the second preset jitter parameter and that the similarity is smaller than the second preset similarity is satisfied, the first portrait segmentation processing is performed on the 5th frame image to obtain its portrait segmentation result.
Because the portrait areas of the current frame image and the first image differ significantly, the portrait segmentation result of the first image has no reference value for the portrait segmentation result of the current frame image, so the first portrait segmentation processing is performed on the current frame image.
More specifically, in some embodiments, referring to fig. 12, performing a first image segmentation process on a current frame image includes:
02211: and inputting the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
Referring to fig. 2 and 12, in some embodiments, step 02211 may also be implemented by the processor 10. That is, the processor 10 is further configured to input the current frame image into the portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
When the current frame image is an output image and there is an image that has undergone the portrait segmentation processing, the shake parameter corresponding to the current frame image and/or the similarity between the current frame image and the first image are obtained. If at least one of the conditions that the shake parameter is greater than the second preset shake parameter and that the similarity is smaller than the second preset similarity is satisfied, that is, the portrait area of the current frame image and the portrait area of the first image differ significantly, the portrait segmentation result of the first image has no reference value for obtaining the portrait segmentation result of the current frame image. The current frame image is therefore directly input into the portrait segmentation processing model to obtain its portrait segmentation result; the specific implementation manner is the same as that in which the portrait segmentation processing model performs portrait segmentation processing on the input image in the embodiment described with reference to fig. 10, and is not repeated here.
In some embodiments, when the current frame image is an output image and there is an image that has undergone the portrait segmentation processing, the shake parameter corresponding to the current frame image and/or the similarity between the current frame image and the first image are obtained. If at least one of the conditions that the shake parameter is smaller than the second preset shake parameter and that the similarity is greater than the second preset similarity is satisfied, it may be considered that the portrait areas of the current frame image and the first image differ only slightly, and at this time the second portrait segmentation processing is performed on the current frame image to obtain the portrait segmentation result of the current frame image. More specifically, in some embodiments, referring to fig. 13, performing the second portrait segmentation processing on the current frame image includes:
02221: acquiring a portrait local area of the current frame image according to a portrait segmentation result of the first image;
02222: and inputting the portrait local area of the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
Referring to fig. 2 and 13, in some embodiments, steps 02221 and 02222 may also be implemented by the processor 10. That is, the processor 10 is further configured to obtain a portrait local area of the current frame image according to a portrait segmentation result of the first image; and inputting the portrait local area of the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
When the current frame image is an output image and there is an image that has undergone the portrait segmentation processing, the shake parameter corresponding to the current frame image and/or the similarity between the current frame image and the first image are obtained. If at least one of the conditions that the shake parameter is smaller than the second preset shake parameter and that the similarity is greater than the second preset similarity is satisfied, that is, the portrait area of the current frame image and the portrait area of the first image differ only slightly, the portrait segmentation result of the current frame image can be obtained according to the portrait segmentation result of the first image, so that the portrait segmentation effect can be further improved.
Specifically, the portrait local area of the current frame image is obtained according to the portrait segmentation result of the first image. For example, in one example, referring to fig. 14, a first pre-selection frame C1 is acquired according to the portrait segmentation result of the first image, wherein the portrait portion of the first image lies entirely within the first pre-selection frame C1. A second pre-selection frame C2 is then set in the current frame image, wherein the size and position of the second pre-selection frame C2 in the current frame image are identical to those of the first pre-selection frame C1 in the first image, and the image within the second pre-selection frame C2 is taken as the portrait local area of the current frame image. Because the portrait area of the current frame image differs only slightly from the portrait area of the first image, the portrait local area in the current frame image also differs only slightly from the portrait local area in the first image. Therefore, if the portrait portion of the first image lies entirely within the first pre-selection frame C1, and the size and position of the second pre-selection frame C2 in the current frame image are identical to those of the first pre-selection frame C1 in the first image, the portrait portion of the current frame image will also lie entirely within the second pre-selection frame C2. It should be noted that, in some embodiments, the first pre-selection frame C1 and the second pre-selection frame C2 may be rectangular, so that the portrait local area of the current frame image obtained in this way is rectangular, which is beneficial to subsequent operations. Of course, in some embodiments, the first pre-selection frame C1 and the second pre-selection frame C2 may have other shapes, which is not limited herein.
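As a rough sketch of the pre-selection-frame step, assuming the first image's segmentation result is a binary NumPy mask whose portrait pixels are non-zero, box C1 can be taken as the bounding rectangle of that mask and box C2 applied at the same position and size in the current frame:

```python
import numpy as np

def portrait_local_area(first_mask, current_frame, margin=0):
    """Derive rectangular pre-selection frame C1 from the first image's
    portrait mask, then crop the same-sized, same-positioned frame C2
    out of the current frame. Sketch only; `margin` is an optional
    illustrative padding, clipped to the image bounds."""
    ys, xs = np.nonzero(first_mask)                       # portrait pixels of the first image
    top = max(int(ys.min()) - margin, 0)
    bottom = min(int(ys.max()) + 1 + margin, current_frame.shape[0])
    left = max(int(xs.min()) - margin, 0)
    right = min(int(xs.max()) + 1 + margin, current_frame.shape[1])
    box = (top, bottom, left, right)                      # C1 == C2 coordinates
    return current_frame[top:bottom, left:right], box
```

The returned `box` is reused later to paste the local segmentation result back at the position of C2.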
Referring to fig. 15, after the portrait local area of the current frame image is obtained, the portrait local area of the current frame image is input into the portrait segmentation processing model to obtain the portrait segmentation result of the current frame image. That is, after the portrait local area is cut out from the current frame image, only the image containing the portrait local area is input into the portrait segmentation processing model; the specific manner in which the model performs portrait segmentation processing on an input image is the same as in the embodiment described with respect to fig. 10 and is not repeated here. Because the portrait area of the current frame image differs only slightly from the portrait area of the first image, inputting only the portrait local area of the current frame image (acquired according to the first image) into the portrait segmentation processing model, rather than the whole current frame image, is beneficial to improving the portrait segmentation accuracy.
In some embodiments, the cropped portrait local area is first enlarged to the same size as the original current frame image, and the enlarged portrait local area is input into the portrait segmentation processing model for portrait segmentation processing to obtain a segmented image. The segmented image is then reduced to the same size as the original portrait local area and placed into the portrait segmentation result image of the current frame image, wherein the position and size of the reduced segmented image in the portrait segmentation result image are completely the same as those of the second pre-selection frame C2 (i.e. the portrait local area) in the current frame image, thereby obtaining the portrait segmentation result of the current frame image. Enlarging the portrait local area in this way facilitates the recognition of portrait details by the portrait segmentation processing model, which can further improve the portrait segmentation accuracy.
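The enlarge, segment, shrink, and paste-back flow might look like the sketch below. Here `_resize_nearest` is a minimal nearest-neighbor stand-in for a real image resize, and `model` is a placeholder callable for the portrait segmentation processing model; both are assumptions for illustration.

```python
import numpy as np

def _resize_nearest(img, out_h, out_w):
    """Minimal nearest-neighbor resize (stand-in for a real resize routine)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]

def segment_local_area(crop, box, frame_shape, model):
    """Enlarge the cropped portrait local area to full-frame size, run the
    segmentation model, shrink the result back, and paste it at the
    position of pre-selection frame C2 in an otherwise-background mask."""
    h, w = frame_shape[:2]
    enlarged = _resize_nearest(crop, h, w)                # enlarge crop to frame size
    mask_big = model(enlarged)                            # segment the enlarged crop
    top, bottom, left, right = box
    mask_small = _resize_nearest(mask_big, bottom - top, right - left)
    result = np.zeros((h, w), dtype=mask_small.dtype)     # background everywhere else
    result[top:bottom, left:right] = mask_small           # paste at C2's position/size
    return result
```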
After the output images are determined as described above, the processor 10 processes the images in the video in sequence to obtain the portrait segmentation result of each frame of image. For example, if the current frame image is not an output image, the portrait segmentation result of the current frame image is obtained according to the portrait segmentation result of the previous frame image. In some embodiments, if the current frame image is not an output image, the portrait segmentation result of the current frame image is the same as the portrait segmentation result of the previous frame image, i.e. the portrait region of the current frame image is taken to be the same as the portrait region of the previous frame image. For example, assume that a video includes 5 consecutive frames of images, of which the 1st frame image and the 4th frame image are output images. When the processor 10 processes the 1st frame image, since the 1st frame image is an output image, the 1st frame image is input into the portrait segmentation processing model for portrait segmentation processing to obtain the portrait segmentation result of the 1st frame. The processor 10 then processes the 2nd frame image; since the 2nd frame image is not an output image, that is, the 2nd frame image is not input into the portrait segmentation processing model, the portrait segmentation result of the 2nd frame image is obtained according to the portrait segmentation result of the 1st frame image. The 3rd frame image is likewise not an output image, so its portrait segmentation result is obtained according to the portrait segmentation result of the 2nd frame image. The processor 10 then processes the 4th frame image; since the 4th frame image is an output image, it is input into the portrait segmentation processing model for portrait segmentation processing to obtain the portrait segmentation result of the 4th frame. Finally, since the 5th frame image is not an output image, its portrait segmentation result is obtained according to the portrait segmentation result of the 4th frame image. In this way, the portrait segmentation result of each frame of image in the video is obtained.
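The five-frame walk-through above reduces to a short loop. In this sketch `segment` stands in for the portrait segmentation processing model, and the first frame is assumed to be an output frame (as in the example):

```python
def propagate_segmentation(frames, output_flags, segment):
    """For each frame: run the model if it is an output frame, otherwise
    reuse the previous frame's segmentation result."""
    results = []
    for frame, is_output in zip(frames, output_flags):
        if is_output:
            results.append(segment(frame))   # output frame: run the model
        else:
            results.append(results[-1])      # non-output frame: reuse previous result
    return results
```

With frames 1 and 4 flagged as output frames, frames 2 and 3 inherit frame 1's result and frame 5 inherits frame 4's, exactly as in the example.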
Referring to fig. 16, in some embodiments, the video processing method further includes:
04: and acquiring a portrait region and a background region of the current frame image according to the portrait segmentation result of the current frame image, and carrying out blurring treatment on the background region of the current frame image.
Referring to fig. 2 and 16, in some embodiments, step 04 may also be performed by the processor 10. That is, the processor 10 is further configured to obtain a portrait area and a background area of the current frame image according to the portrait segmentation result of the current frame image, and perform blurring processing on the background area of the current frame image.
In some embodiments, after the processor 10 obtains the portrait segmentation result of the current frame image, the processor 10 can obtain the portrait area (i.e. the white portion of the portrait segmentation result image in fig. 10) and the background area (i.e. the black portion of the portrait segmentation result image in fig. 10) of the current frame image according to the portrait segmentation result, and the processor 10 performs blurring processing on the background area of the current frame image. When the processor 10 performs blurring processing on the background area of each frame of image, the function of blurring the video background is realized, which broadens the application scenarios of the video processing method and improves the user's experience with the video processing system 100.
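A minimal sketch of the mask-guided blurring, assuming a binary mask in which 1 marks the portrait area (white) and 0 the background (black); a simple box blur stands in for whatever blur the implementation actually uses:

```python
import numpy as np

def blur_background(frame, mask, kernel=5):
    """Blend a blurred copy of the frame in wherever the mask marks
    background (mask == 0); portrait pixels (mask == 1) stay sharp."""
    pad = kernel // 2
    padded = np.pad(frame.astype(float),
                    ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    blurred = np.zeros_like(frame, dtype=float)
    for dy in range(kernel):                  # accumulate a kernel x kernel box blur
        for dx in range(kernel):
            blurred += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    blurred /= kernel * kernel
    m = mask[..., None].astype(float)         # broadcast mask over color channels
    return (frame * m + blurred * (1 - m)).astype(frame.dtype)
```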
Referring to fig. 17, the present application further provides a mobile terminal 1000. The mobile terminal 1000 includes the video processing system 100 and the photographing module 200 according to any of the above embodiments. The shooting module 200 is used for acquiring video. The mobile terminal 1000 may be a mobile phone, a tablet computer, a notebook computer, an intelligent wearable device (e.g., a smart watch, a smart bracelet, smart glasses, a smart helmet), an unmanned aerial vehicle, a head-mounted display device, etc., which is not limited herein. In some embodiments, the shooting module 200 may also be disposed outside the mobile terminal 1000, which is not limited herein.
In the video processing system 100 of the mobile terminal 1000 in this embodiment of the present application, multiple frame images in the video are output at an output frequency selected according to the jitter parameter of the current frame image and are then subjected to portrait segmentation processing. Compared with segmenting at a fixed frame interval, this can reduce the power consumption of the video processing system 100 while still obtaining a better portrait segmentation result, thereby reducing the power consumption of the mobile terminal 1000.
Referring to fig. 18, the present application also provides a non-transitory computer readable storage medium 400 containing a computer program 410. The computer program 410, when executed by the processor 420, causes the processor 420 to perform the video processing method of any of the embodiments described above.
For example, referring to fig. 1 and 18, the computer program, when executed by the processor 420, causes the processor 420 to perform the steps of:
01: selecting a corresponding output frequency to output multi-frame images in the video according to the jitter parameter corresponding to the current frame image;
02: if the current frame image is the output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result;
03: if the current frame image is not the output image, the portrait segmentation result of the current frame image is obtained according to the portrait segmentation result of the previous frame image.
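A hypothetical sketch of the step-01 branching as recited in the claims: dissimilar consecutive frames use the first output frequency, while similar frames use the second output frequency, whose value is in turn picked by the jitter parameter. Every threshold and frequency value below is an illustrative assumption.

```python
def select_output_frequency(similarity: float, jitter: float,
                            first_preset_similarity: float = 0.8,
                            first_preset_value: float = 0.5,
                            first_output_freq: int = 30,
                            second_freq_high: int = 15,
                            second_freq_low: int = 5) -> int:
    """Pick how often frames are output for portrait segmentation.
    first_output_freq > second_freq_high > second_freq_low."""
    if similarity < first_preset_similarity:
        return first_output_freq      # dissimilar frames: first output frequency
    if jitter > first_preset_value:
        return second_freq_high       # similar but shaky: higher of the two
    return second_freq_low            # similar and steady: lower of the two
```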
It should be noted that the processor 420 may be the same as the processor 10 disposed in the video processing system 100, or the processor 420 may be disposed in the mobile terminal 1000, that is, the processor 420 may not be the same as the processor 10 disposed in the video processing system 100, which is not limited herein.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. Further implementations are included within the scope of the preferred embodiments of the present application, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art to which the embodiments of the present application pertain.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.

Claims (14)

1. A video processing method, wherein a video includes a plurality of consecutive frames of images, the video processing method comprising:
selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image;
If the current frame image is an output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result;
if the current frame image is not the output image, acquiring a portrait segmentation result of the current frame image according to the portrait segmentation result of the previous frame image;
selecting a corresponding output frequency to output a multi-frame image in the video according to the jitter parameter corresponding to the current frame image, including:
selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image and the similarity between the current frame image and the previous frame image;
the selecting the corresponding output frequency to output the multi-frame image in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image comprises the following steps:
if the similarity between the current frame image and the previous frame image is larger than a first preset similarity, obtaining a jitter parameter corresponding to the current frame image, and selecting a corresponding second output frequency according to the jitter parameter to output multi-frame images in the video;
the second output frequency includes a first frequency and a second frequency, the first frequency is greater than the second frequency, and if the similarity between the current frame image and the previous frame image is greater than a first preset similarity, obtaining a jitter parameter corresponding to the current frame image, and selecting a corresponding second output frequency according to the jitter parameter to output a multi-frame image in the video, including:
If the similarity is greater than the first preset similarity and the jitter parameter is greater than a first preset value, selecting the first frequency to output the multi-frame images in the video;
and if the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value, selecting the second frequency to output the multi-frame images in the video.
2. The method according to claim 1, wherein selecting the corresponding output frequency to output the multi-frame image in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image, further comprises:
and if the similarity between the current frame image and the previous frame image is smaller than a first preset similarity, selecting a first output frequency to output multi-frame images in the video, wherein the second output frequency is smaller than the first output frequency.
3. The video processing method according to claim 1, wherein if the current frame image is an output image, performing a portrait segmentation process on the current frame image to obtain a portrait segmentation result, comprising:
if the image subjected to the portrait segmentation processing does not exist, inputting the current frame image into a neural network of a portrait segmentation processing model so as to acquire a portrait segmentation result of the current frame image;
And if an image subjected to the portrait segmentation processing exists, selecting a first portrait segmentation processing or a second portrait segmentation processing according to the jitter parameters corresponding to the current frame image and the similarity between the current frame image and a first image, wherein the first image is an image subjected to the portrait segmentation processing.
4. The video processing method according to claim 3, wherein the selecting a first portrait segmentation process or a second portrait segmentation process according to a shake parameter corresponding to the current frame image and a similarity between the current frame image and a first image, the first image being an image on which the portrait segmentation process has been performed, includes:
if the jitter parameter corresponding to the current frame image is larger than a second preset jitter parameter and the similarity between the current frame image and the first image is smaller than a second preset similarity, performing first portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image;
and if the jitter parameter corresponding to the current frame image is smaller than a second preset jitter parameter and the similarity between the current frame image and the first image is larger than a second preset similarity, performing second portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image.
5. The video processing method according to claim 3 or 4, wherein said performing the first portrait segmentation processing on the current frame image includes:
inputting the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image;
the performing the second image segmentation processing on the current frame image includes:
acquiring a portrait local area of the current frame image according to a portrait segmentation result of the first image;
and inputting the portrait local area of the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
6. The video processing method according to claim 1, characterized in that the video processing method further comprises:
and acquiring a portrait region and a background region of the current frame image according to the portrait segmentation result of the current frame image, and carrying out blurring treatment on the background region of the current frame image.
7. A video processing system for processing a video, the video comprising a plurality of consecutive frames of images, the video processing system comprising:
a processor for:
Selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image;
if the current frame image is an output image, carrying out portrait segmentation processing on the current frame image to obtain a portrait segmentation result; and
If the current frame image is not the output image, acquiring a portrait segmentation result of the current frame image according to the portrait segmentation result of the previous frame image;
selecting a corresponding output frequency to output a multi-frame image in the video according to the jitter parameter corresponding to the current frame image, including:
selecting corresponding output frequency to output multi-frame images in the video according to jitter parameters corresponding to the current frame image and the similarity between the current frame image and the previous frame image;
the selecting the corresponding output frequency to output the multi-frame image in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image comprises the following steps:
if the similarity between the current frame image and the previous frame image is larger than a first preset similarity, obtaining a jitter parameter corresponding to the current frame image, and selecting a corresponding second output frequency according to the jitter parameter to output multi-frame images in the video;
The second output frequency includes a first frequency and a second frequency, the first frequency is greater than the second frequency, and if the similarity between the current frame image and the previous frame image is greater than a first preset similarity, obtaining a jitter parameter corresponding to the current frame image, and selecting a corresponding second output frequency according to the jitter parameter to output a multi-frame image in the video, including:
if the similarity is greater than the first preset similarity and the jitter parameter is greater than a first preset value, selecting the first frequency to output the multi-frame images in the video;
and if the similarity is greater than the first preset similarity and the jitter parameter is smaller than the first preset value, selecting the second frequency to output the multi-frame images in the video.
8. The video processing system according to claim 7, wherein selecting the corresponding output frequency to output the multi-frame image in the video according to the jitter parameter corresponding to the current frame image and the similarity between the current frame image and the previous frame image, further comprises:
and if the similarity between the current frame image and the previous frame image is smaller than a first preset similarity, selecting a first output frequency to output multi-frame images in the video, wherein the second output frequency is smaller than the first output frequency.
9. The video processing system according to claim 7, wherein if the current frame image is an output image, performing a portrait segmentation process on the current frame image to obtain a portrait segmentation result, comprises:
if the image subjected to the portrait segmentation processing does not exist, inputting the current frame image into a neural network of a portrait segmentation processing model so as to acquire a portrait segmentation result of the current frame image;
and if an image subjected to the portrait segmentation processing exists, selecting a first portrait segmentation processing or a second portrait segmentation processing according to the jitter parameters corresponding to the current frame image and the similarity between the current frame image and a first image, wherein the first image is an image subjected to the portrait segmentation processing.
10. The video processing system according to claim 9, wherein the selecting of the first portrait segmentation processing or the second portrait segmentation processing based on the shake parameter corresponding to the current frame image and the similarity of the current frame image to the first image, the first image being an image on which the portrait segmentation processing has been performed, comprises:
if the jitter parameter corresponding to the current frame image is larger than a second preset jitter parameter and the similarity between the current frame image and the first image is smaller than a second preset similarity, performing first portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image;
And if the jitter parameter corresponding to the current frame image is smaller than a second preset jitter parameter and the similarity between the current frame image and the first image is larger than a second preset similarity, performing second portrait segmentation processing on the current frame image to obtain a portrait segmentation result of the current frame image.
11. The video processing system according to claim 9 or 10, wherein the performing the first portrait segmentation processing on the current frame image includes:
inputting the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image;
the performing the second image segmentation processing on the current frame image includes:
acquiring a portrait local area of the current frame image according to a portrait segmentation result of the first image;
and inputting the portrait local area of the current frame image into a portrait segmentation processing model to obtain a portrait segmentation result of the current frame image.
12. The video processing system of claim 7, wherein the processor is further configured to:
and acquiring a portrait region and a background region of the current frame image according to the portrait segmentation result of the current frame image, and carrying out blurring treatment on the background region of the current frame image.
13. A mobile terminal, comprising:
the video processing system of any one of claims 7 to 12; and
And the shooting module is used for acquiring video, and the video comprises continuous multi-frame images.
14. A non-transitory computer readable storage medium containing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the video processing method of any one of claims 1 to 6.
CN202110436061.2A 2021-04-22 2021-04-22 Video processing method, video processing system, mobile terminal and readable storage medium Active CN113132638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110436061.2A CN113132638B (en) 2021-04-22 2021-04-22 Video processing method, video processing system, mobile terminal and readable storage medium


Publications (2)

Publication Number Publication Date
CN113132638A CN113132638A (en) 2021-07-16
CN113132638B true CN113132638B (en) 2023-06-09

Family

ID=76779115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110436061.2A Active CN113132638B (en) 2021-04-22 2021-04-22 Video processing method, video processing system, mobile terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN113132638B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035257B (en) * 2018-07-02 2021-08-31 百度在线网络技术(北京)有限公司 Portrait segmentation method, device and equipment
CN111292337B (en) * 2020-01-21 2024-03-01 广州虎牙科技有限公司 Image background replacement method, device, equipment and storage medium
CN111507997B (en) * 2020-04-22 2023-07-25 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and computer storage medium
CN112016469A (en) * 2020-08-28 2020-12-01 Oppo广东移动通信有限公司 Image processing method and device, terminal and readable storage medium

Also Published As

Publication number Publication date
CN113132638A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN111402135B (en) Image processing method, device, electronic equipment and computer readable storage medium
Rao et al. A Survey of Video Enhancement Techniques.
CN108885782B (en) Image processing method, apparatus and computer-readable storage medium
US20070201756A1 (en) Image processing apparatus, mobile terminal device and image processing computer readable program
CN112602088B (en) Method, system and computer readable medium for improving quality of low light images
CN113034384A (en) Video processing method, video processing device, electronic equipment and storage medium
EP4020371A1 (en) Photographing method, terminal, and storage medium
US20100245598A1 (en) Image composing apparatus and computer readable recording medium
CN112272832A (en) Method and system for DNN-based imaging
WO2023151511A1 (en) Model training method and apparatus, image moire removal method and apparatus, and electronic device
CN114331918A (en) Training method of image enhancement model, image enhancement method and electronic equipment
CN112788254B (en) Camera image matting method, device, equipment and storage medium
CN113132638B (en) Video processing method, video processing system, mobile terminal and readable storage medium
CN113538304A (en) Training method and device of image enhancement model, and image enhancement method and device
JP4496005B2 (en) Image processing method and image processing apparatus
CN107105167B (en) Method and device for shooting picture during scanning question and terminal equipment
CN115471413A (en) Image processing method and device, computer readable storage medium and electronic device
CN111416937B (en) Image processing method, image processing device, storage medium and mobile equipment
CN105991937A (en) Virtual exposure method and device based on Bayer format image
CN107004258B (en) Video smoothing method and device
KR20230086999A (en) A recording medium recording a virtual character content creation program
KR20230086996A (en) A computer program that performs the noise addition process
CN114494005A (en) Image processing method, image processing device, storage medium and electronic equipment
CN114565532A (en) Video beautifying processing method and device, storage medium and electronic equipment
CN113191376A (en) Image processing method, image processing device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant