WO2022105341A1 - Video data processing method and apparatus, computer storage medium, and electronic device

Info

Publication number: WO2022105341A1
Application number: PCT/CN2021/114602
Authority: WO
Inventors: 汤晓
Original assignee: 北京达佳互联信息技术有限公司
Priority date: 2020-11-18
Filing date: 2021-08-25
Publication date: 2022-05-27
Also published as: CN112511779B; CN112511779A

Abstract

Disclosed in the present application are a video data processing method and apparatus, a computer storage medium, and an electronic device. The method comprises: searching video data for a target video frame comprising a message popup, and in response to finding the target video frame, determining an area where the message popup in the target video frame is located; and processing the area where the message popup in the target video frame is located, to obtain a replacement video frame, the replacement video frame not comprising text in the message popup, and the replacement video frame being used for replacing the target video frame.

Description

Video data processing method, device, computer storage medium and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure claims priority to Chinese Patent Application No. 202011292633.6, filed in China on Nov. 18, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the field of video technology, and in particular, to a method and device for processing video data.

BACKGROUND

Screen recording is a common function on various terminal devices. After the screen recording function is enabled on a terminal device, the screen recording program records the screen of the terminal device in real time, thereby obtaining a screen recording video. The screen recording video can be played locally or provided to other terminal devices in the network for playback.

In order to protect the user's privacy, when the screen recording program starts to record the screen of the terminal device, it will disable the message push service of the terminal device, so as to avoid the message pop-up window with the user's private information appearing in the recorded video.

SUMMARY OF THE INVENTION

The present disclosure provides a video data processing method and device. The technical solutions of the present disclosure are as follows:

According to some embodiments of the present disclosure, a method for processing video data is provided, comprising: searching for a target video frame in video data, wherein the video data is obtained by recording a screen of a target device and the target video frame contains a message pop-up window; in response to finding the target video frame in the video data, determining the area where the message pop-up window is located in the target video frame; and processing the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein the replacement video frame is used to replace the target video frame.

In some embodiments, the method further includes: monitoring the message push service of the target device in real time, and obtaining the push time of a message to be pushed by the message push service, wherein the target device is an anchor device in a live webcast system; and the searching for the target video frame in the video data includes: searching for the target video frame among the video frames of the video data that fall within a preset time period after the push time of the message to be pushed.

In some embodiments, the method further includes: detecting, in the audio track of the video data, a message prompt sound corresponding to a message pop-up window; wherein the searching for a target video frame in the video data includes: extracting, from the video data, a plurality of video frames within a preset period before the time at which the message prompt sound appears and a plurality of video frames within a preset period after that time; and searching for the target video frame among the extracted video frames.
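The two ways of narrowing the search described above (a window after the push time, or a window around the prompt sound) can be sketched as follows. This is only an illustration: the `Frame` class, the function names, and the use of timestamps in seconds are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    # Hypothetical frame record: time in seconds from the start of the recording.
    timestamp: float


def frames_after_push(frames: List[Frame], push_time: float, period: float) -> List[Frame]:
    """Candidate frames within a preset period after the push time."""
    return [f for f in frames if push_time <= f.timestamp <= push_time + period]


def frames_around_sound(frames: List[Frame], sound_time: float,
                        before: float, after: float) -> List[Frame]:
    """Candidate frames within preset periods before and after the prompt sound."""
    return [f for f in frames
            if sound_time - before <= f.timestamp <= sound_time + after]
```

Only the frames returned by either function would then need to be checked for a message pop-up window, rather than every frame of the recording.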

In some embodiments, the processing the area where the message pop-up window is located in the target video frame to obtain the replacement video frame includes: cutting the message pop-up window from the target video frame to obtain the replacement video frame.

In some embodiments, the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame includes: blurring the pixels in the area where the message pop-up window is located to obtain the replacement video frame.

In some embodiments, the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame includes: determining the size of the area where the message pop-up window is located; generating an occlusion image whose size matches the size of the area where the message pop-up window is located; and adding the generated occlusion image to the area where the message pop-up window is located to obtain the replacement video frame.

In some embodiments, the method further includes: acquiring a user's selection instruction; and determining, as the target image template, the candidate image template selected by the selection instruction from among multiple preset candidate image templates, wherein the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images; and the generating an occlusion image whose size matches the size of the area where the message pop-up window is located includes: generating, according to the target image template, an occlusion image whose size matches the size of the area where the message pop-up window is located.

In some embodiments, the generating an occlusion image whose size matches the size of the area where the message pop-up window is located includes: reading, from the video data, a video frame that precedes the target video frame and does not contain the message pop-up window; and cropping, from that preceding video frame, the image located in the same area as the message pop-up window to obtain the occlusion image.
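As a rough illustration of taking the occlusion image from an earlier frame, the sketch below crops the pop-up area out of a preceding frame without a pop-up and pastes it over the same area of the target frame. The frame representation (rows of grayscale pixel values) and the `(top, left, height, width)` region layout are assumptions chosen to keep the example free of external libraries.

```python
from typing import List, Tuple

Frame = List[List[int]]             # rows of grayscale pixel values (assumed)
Region = Tuple[int, int, int, int]  # (top, left, height, width), assumed layout


def crop(frame: Frame, region: Region) -> Frame:
    """Cut the given region out of a frame as a new image."""
    top, left, height, width = region
    return [row[left:left + width] for row in frame[top:top + height]]


def paste(frame: Frame, patch: Frame, region: Region) -> Frame:
    """Return a copy of the frame with the patch laid over the region.

    The region is assumed to lie fully inside the frame.
    """
    top, left, _, width = region
    out = [row[:] for row in frame]
    for dy, patch_row in enumerate(patch):
        out[top + dy][left:left + width] = patch_row
    return out


def occlude_with_previous(target: Frame, previous_clean: Frame, region: Region) -> Frame:
    """Build a replacement frame using the same area of a pop-up-free frame."""
    return paste(target, crop(previous_clean, region), region)
```

Because the patch comes from the same screen position a moment earlier, the replacement frame tends to look like the screen content that the pop-up was covering, provided the background has not changed much in between.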

According to some embodiments of the present disclosure, there is provided an apparatus for processing video data, comprising: a searching unit configured to search for a target video frame in video data, wherein the video data is obtained by recording a screen of a target device and the target video frame contains a message pop-up window; a determining unit configured to determine, in response to the target video frame being found in the video data, the area where the message pop-up window is located in the target video frame; and a processing unit configured to process the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein the replacement video frame is used to replace the target video frame.

In some embodiments, the apparatus further includes: a monitoring unit configured to monitor the message push service of the target device in real time and obtain the push time of a message to be pushed by the message push service, wherein the target device is an anchor device in a live webcast system; and the searching unit is configured to search for the target video frame among the video frames of the video data that fall within a preset time period after the push time of the message to be pushed.

In some embodiments, the apparatus further includes: a detection unit configured to detect, in the audio track of the video data, a message prompt sound corresponding to a message pop-up window; wherein, when searching the video data for a target video frame, the searching unit is configured to: extract, from the video data, a plurality of video frames within a preset period before the time at which the message prompt sound appears and a plurality of video frames within a preset period after that time; and search for the target video frame among the extracted video frames.

In some embodiments, when processing the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, the processing unit is configured to cut the message pop-up window from the target video frame to obtain the replacement video frame.

In some embodiments, when processing the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, the processing unit is configured to blur the pixels in the area where the message pop-up window is located to obtain the replacement video frame.

In some embodiments, the processing unit includes: a size determination unit configured to determine the size of the area where the message pop-up window is located; a generating unit configured to generate an occlusion image whose size matches the size of the area where the message pop-up window is located; and an adding unit configured to add the generated occlusion image to the area where the message pop-up window is located to obtain a replacement video frame.

In some embodiments, the processing unit further includes: a template determination unit configured to acquire a user's selection instruction and determine, as the target image template, the candidate image template selected by the selection instruction from among multiple preset candidate image templates, wherein the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images; and, when generating an occlusion image whose size matches the size of the area where the message pop-up window is located, the generating unit is configured to generate the occlusion image according to the target image template.

In some embodiments, when generating an occlusion image whose size matches the size of the area where the message pop-up window is located, the generating unit is configured to: read, from the video data, a video frame that precedes the target video frame and does not contain the message pop-up window; and crop, from that preceding video frame, the image located in the same area as the message pop-up window to obtain the occlusion image.

According to some embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the following steps: searching for a target video frame in video data, wherein the video data is obtained by recording a screen of a target device and the target video frame contains a message pop-up window; in response to finding the target video frame in the video data, determining the area where the message pop-up window is located in the target video frame; and processing the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein the replacement video frame is used to replace the target video frame.

According to some embodiments of the present disclosure, a storage medium is provided; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the video data processing method provided by any one of the embodiments of the present disclosure.

According to some embodiments of the present disclosure, a computer program product is provided, which, when executed, is used to implement any one of the video data processing methods provided by the embodiments of the present disclosure.

In the present disclosure, processing the area where the message pop-up window is located in the target video frame prevents the text of a message pop-up window captured during screen recording from being seen by users watching the video. The user recording the video can therefore protect their privacy during recording while still browsing messages delivered by the message push service as usual.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description, serve to explain the principles of the present disclosure and do not unduly limit the present disclosure.

FIG. 1 is a schematic diagram of a recording and transmission process of video data according to an exemplary embodiment;

FIG. 2 is a flowchart of a method for processing video data according to an exemplary embodiment;

FIG. 3 is a schematic diagram of a target video frame and a replacement video frame according to an exemplary embodiment;

FIG. 4 is a schematic diagram of a method for finding a target video frame in video data according to an exemplary embodiment;

FIG. 5 is a flowchart of another method for processing video data according to an exemplary embodiment;

FIG. 6 is a flow chart of yet another method for processing video data according to an exemplary embodiment;

FIG. 7 is a schematic diagram of a mosaic image added in a region where a message pop-up window is located, according to an exemplary embodiment;

FIG. 8 is a schematic diagram of adding a captured image in a region where a message pop-up window is located, according to an exemplary embodiment;

FIG. 9 is a schematic diagram of cropping an occlusion image from a previous video frame that does not contain a message pop-up window, according to an exemplary embodiment;

FIG. 10 is a schematic diagram of cutting a message pop-up window from a target video frame according to an exemplary embodiment;

FIG. 11 is a schematic diagram of blurring the area where the message pop-up window of the target video frame is located according to an exemplary embodiment;

FIG. 12 is a schematic structural diagram of an apparatus for processing video data according to an exemplary embodiment;

FIG. 13 is a schematic structural diagram of an electronic device according to an exemplary embodiment.

DETAILED DESCRIPTION

In order to make those skilled in the art better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

Screen recording refers to using a screen recording program to record the display screen of the first terminal in real time during the operation of the first terminal, and a video obtained in this way is called a screen recording video.

In the present disclosure, the first terminal refers to the terminal device whose screen is recorded in the screen recording video, and the first terminal may be an electronic device such as a smart phone, a tablet computer, or a desktop computer. Generally, the screen recording program is a program running on the first terminal; it may be a program on the first terminal dedicated to screen recording, or a subprogram integrated in another program.

For the recording and transmission process of the screen recording video, reference may be made to FIG. 1. Each unit in FIG. 1 refers to a computer program installed on the corresponding electronic device and used to implement the corresponding function.

As shown in FIG. 1, in the first terminal, after the user who records the video (hereinafter referred to as the video author) enables the screen recording function of the screen recording unit, the screen recording unit starts to record the screen of the first terminal in real time, and the resulting screen recording video is sent to the server over the Internet through the communication unit of the first terminal. The server then sends the screen recording video over the Internet to one or more second terminals that need to watch it (in this disclosure, the second terminal refers to an electronic device that plays the screen recording video; the second terminal and the first terminal may be electronic devices of the same type or of different types), and the playback unit of the second terminal plays the screen recording video to users who watch it (hereinafter referred to as video viewers).

In some embodiments, the screen recording unit may record the sound played by the first terminal in real time while recording the screen of the first terminal, so as to obtain an audio track synchronized with the screen of the screen recording video.

In some embodiments, the first terminal, the server and the second terminal may all cache the screen recording video in a local storage unit.

It should be noted that the screen recording video can be played in either live broadcast or on-demand form, or in both forms at the same time.

In the live broadcast form, when the process shown in FIG. 1 is implemented, the first terminal sends the video data to the server in real time while recording the screen recording video, and at the same time the server forwards the video data it receives to the second terminal in real time, so that the second terminal can play the screen recording video in real time. Equivalently, every time the screen recording program of the first terminal obtains a video frame, it sends the video frame to the server; the server immediately forwards the newly received video frame to the second terminal, and the playback unit of the second terminal immediately plays the video frame on the display screen of the second terminal.

In the on-demand form, when the process shown in FIG. 1 is implemented, the first terminal can send the screen recording video in real time during the screen recording process, or can send the complete screen recording video to the server at one time after the screen recording ends, and the server stores the complete screen recording video provided by the first terminal in its own storage unit. When any second terminal requests the screen recording video from the server, the server sends the screen recording video to that second terminal for playback. Of course, after receiving the screen recording video, the second terminal may first cache it and play it after a period of time.

Screen recording video can be used in various fields such as games and online teaching. For example, a video author installs a screen recording program (equivalent to the screen recording unit in FIG. 1) on his mobile phone (i.e., the first terminal). After the video author opens the screen recording program and enables the screen recording function, the screen recording program switches to running in the background and starts recording the screen of the first terminal. The video author can then start playing a mobile game on the mobile phone, so the screen recording program records the game picture on the screen of the first terminal to obtain a game video, which the server then forwards, in real time or not in real time, to video viewers interested in this mobile game.

In addition, the video author can also display teaching materials, such as courseware and e-books, on the screen of the first terminal after the screen recording starts, so that the screen recording program records a teaching video, which can be forwarded through the server to video viewers interested in those teaching materials.

In order to protect the privacy of the video author without affecting the video author's use of the message push function of the first terminal during the screen recording process, an embodiment of the present disclosure provides a method for processing screen recording video. Referring to FIG. 2, the method may include the following steps:

S21. Search for the target video frame in the video data.

Here, the target video frame refers to a video frame containing a message pop-up window. That is to say, when step S21 is performed, the video frames in the screen recording video can be checked one by one, and whenever a video frame is detected to contain a message pop-up window, that video frame is determined to be a target video frame.

The above video data is video data obtained by recording the screen of the target device, and the target device is equivalent to the first terminal described above.

In the video data processing solution provided by the present disclosure, in a first aspect, after the screen recording is completed, the complete screen recording video can be processed to obtain the processed video. In a second aspect, the recorded video stream can be processed in real time during the screen recording process to obtain a processed video stream, which can be played to the video viewers in real time or stored in a corresponding computer storage medium. That is to say, the video data in step S21 may be a complete screen recording video, or may be a video stream generated during the recording process.

In the application scenario of the second aspect, a buffer area can be set in the storage space of the electronic device executing the video data processing solution of the present disclosure, and the video stream recorded in a recent period of time (e.g., within the last 10 seconds) is written into the buffer area in real time (equivalent to writing the video frames recorded in the last 10 seconds into the buffer area). The program executing the video data processing solution of the present disclosure can read the video frames of the video stream one by one from the buffer area and determine whether each read video frame is a target video frame; if it is, the solution provided by the present disclosure is applied to process it.
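A buffer of this kind could be sketched as follows. The class name, the eviction rule based on timestamps, and the `is_target` and `process` callbacks are assumptions for illustration only; the disclosure does not prescribe a particular data structure.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class RecentFrameBuffer:
    """Keeps only the frames recorded within the last `horizon` seconds."""

    def __init__(self, horizon: float = 10.0) -> None:
        self.horizon = horizon
        self._frames: Deque[Tuple[float, object]] = deque()

    def write(self, timestamp: float, frame: object) -> None:
        self._frames.append((timestamp, frame))
        # Drop frames that are older than the horizon relative to the newest one.
        while timestamp - self._frames[0][0] > self.horizon:
            self._frames.popleft()

    def read_all(self) -> List[Tuple[float, object]]:
        return list(self._frames)


def process_buffer(buffer: RecentFrameBuffer,
                   is_target: Callable[[object], bool],
                   process: Callable[[object], object]) -> List[Tuple[float, object]]:
    """Read frames one by one and replace those judged to be target frames."""
    return [(ts, process(frame) if is_target(frame) else frame)
            for ts, frame in buffer.read_all()]
```

A double-ended queue is used here because frames are appended at one end and expire at the other, which keeps both operations cheap.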

During the screen recording process, the message push service of the first terminal may push a message pop-up window carrying personal information of the video author onto the screen of the first terminal. If the message pop-up window is recorded in the video data and viewed by video viewers, the privacy of the video author will be leaked.

For example, in the display interface shown in FIG. 3, the message pop-up window contains the following message: "Your courier has been placed at the property of XX community, please go to pick it up in time". If the message pop-up window is recorded in the video data and seen by a video viewer, the residential address of the video author will be leaked.

The purpose of the processing method provided by the present disclosure is to find the video frames containing the message pop-up window in the video data and then "code" the area where the message pop-up window is located through the subsequent steps, so that when the video data is played, the message pop-up window on the screen is invisible to video viewers, thereby protecting the privacy of the video author.

The "coding" here is not limited to adding a mosaic to the area where the message pop-up window is located; it refers to any image processing method, including adding a mosaic, that prevents the text in the message pop-up window from being displayed in the replacement video frame.

At the same time, since the message pop-up window in the video data is coded, even if a message pop-up window containing private information is recorded from the screen of the first terminal during recording, the information contained in the message pop-up window will not be seen by video viewers when the video data is played. The video author can therefore use the message push service of the first terminal normally during the screen recording process.

As shown in FIG. 3, some time elapses between the moment the message push service of the first terminal starts to pop up the message pop-up window and the moment the message pop-up window is completely displayed on the screen of the first terminal. The video frames recorded during this period contain only a part of the message pop-up window. Obviously, these partially displayed message pop-up windows also need to be coded. Therefore, the target video frames found in step S21 include every video frame in the video data that contains a complete or partial message pop-up window.

S22. In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame.

In step S22, the position of the message pop-up window in the target video frame can be determined, so that an occlusion image can be added to the position in a subsequent step.

S23: Process the area where the message pop-up window is located in the target video frame to obtain a replacement video frame.

The replacement video frame does not display (or does not contain) the text in the message pop-up window, and the replacement video frame is used to replace the corresponding target video frame in the screen recording video.

The image processing method used in step S23 can be any image processing method that prevents the replacement video frame from displaying the text in the message pop-up window. In the present disclosure, the image processing method in step S23 can be any one of the following three image processing methods:

First, add an occlusion image to the area where the message pop-up window is located in the target video frame;

Second, cut (or delete) the area where the message pop-up window is located from the target video frame;

Third, the area where the message pop-up window is located is blurred.
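The second and third options can be illustrated with the sketch below. It is deliberately simple and rests on assumptions: frames are rows of grayscale values, the region is `(top, left, height, width)`, the blur is a plain box average restricted to the region, and "cutting" is read as dropping the rows that the pop-up occupies, which is only one possible interpretation.

```python
from typing import List, Tuple

Frame = List[List[int]]             # rows of grayscale pixel values (assumed)
Region = Tuple[int, int, int, int]  # (top, left, height, width), assumed layout


def blur_region(frame: Frame, region: Region, radius: int = 1) -> Frame:
    """Replace each pixel in the region by the mean of its neighbours in the region."""
    top, left, height, width = region
    out = [row[:] for row in frame]
    for y in range(top, top + height):
        for x in range(left, left + width):
            ys = range(max(top, y - radius), min(top + height, y + radius + 1))
            xs = range(max(left, x - radius), min(left + width, x + radius + 1))
            values = [frame[j][i] for j in ys for i in xs]
            out[y][x] = sum(values) // len(values)
    return out


def cut_region(frame: Frame, region: Region) -> Frame:
    """Drop the rows covered by the region (assumes a full-width banner pop-up)."""
    top, _, height, _ = region
    return [row[:] for i, row in enumerate(frame)
            if not (top <= i < top + height)]
```

A small blur radius may leave large text partly legible, so in practice the radius would need to be chosen relative to the text size; cutting changes the frame height, which a real pipeline would have to compensate for.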

FIG. 3 is a schematic diagram of adding an occlusion image. As shown in FIG. 3, in the original target video frame, the video viewer can see the message pop-up window and its specific message on the screen, whereas in the replacement video frame with the occlusion image added, the message pop-up window is covered by the occlusion image, so the video viewer cannot see the message pop-up window or its specific message, which avoids leaking the privacy of the video author.

The above-mentioned occlusion image may be a mosaic image obtained by filling the area where the message pop-up window is located with a repeated mosaic pattern of a specific mosaic style, or it may be a part of another complete image that does not contain the message pop-up window.
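Generating a mosaic occlusion image of the required size by repeating a small pattern might look like the following. The template names and tile contents are hypothetical placeholders standing in for the candidate mosaic styles; they are not defined by the disclosure.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of grayscale pixel values (assumed)

# Hypothetical candidate templates: each is a small tile that gets repeated.
CANDIDATE_TEMPLATES: Dict[str, Image] = {
    "checker": [[0, 255], [255, 0]],
    "stripes": [[0, 0], [255, 255]],
}


def make_occlusion_image(template_name: str, height: int, width: int) -> Image:
    """Tile the selected template until it fills a height x width image."""
    tile = CANDIDATE_TEMPLATES[template_name]
    tile_h, tile_w = len(tile), len(tile[0])
    return [[tile[y % tile_h][x % tile_w] for x in range(width)]
            for y in range(height)]
```

The resulting image has exactly the size of the pop-up area, so laying it over that area of the target video frame yields the replacement video frame.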

It should be noted that the processing method provided by the embodiments of the present disclosure may be executed by any one of the first terminal, the server, and the second terminal shown in FIG. 1. That is, the execution subject of the processing method provided by the present disclosure may be the first terminal that records the video, the server that forwards the video, or the second terminal that plays the video.

When the first terminal acts as the execution subject, the processing method provided by the present disclosure can be executed by the first terminal on the video stream in real time while the video data is being recorded, or can be executed on the entire screen recording video after the recording ends. After obtaining the replacement video frames, the first terminal sends them to the server in place of the original target video frames that were to be sent.

When the method provided by the present disclosure is applied to the first terminal, the method may be executed by a screen recording program of the first terminal.

When the server acts as the execution subject, the server can execute the processing method provided by the present disclosure before sending the video data to the second terminal. Specifically, the processing method can be executed in real time while the video stream is being received, or it can be performed on the entire screen recording video after the complete screen recording video has been received. After obtaining the replacement video frames, the server sends them to the second terminal in place of the original target video frames that were to be forwarded.

When the method provided by the present disclosure is applied to a server, the method can be executed by a video processing program in the server.

When the second terminal acts as the execution subject, the second terminal can execute the method provided by the present disclosure before playing the video data it receives. Specifically, it can execute the processing method in real time while receiving the video stream and play the replacement video frames (and the other video frames that do not contain the message pop-up window) in real time; alternatively, after receiving the complete screen recording video, it can perform the processing method on the entire screen recording video, replace each target video frame with the corresponding replacement video frame, and then play the processed screen recording video.

When the method provided by the present disclosure is applied to the second terminal, the method may be executed by a video playing program used for playing video data in the second terminal.

In some embodiments, regardless of which of the above devices executes the method provided by the present disclosure, the video author can set on the first terminal, before starting screen recording, whether to apply the processing method provided by the present disclosure to the video data. If the video author selects the option of not coding before the screen recording starts, the first terminal, the server and the second terminal may skip the processing method provided by the present disclosure and directly send or play the video data.

On the one hand, the processing method provided by the present disclosure can identify the target video frames containing the message pop-up window in the video data before the video data is played by the second terminal, and then process the area where the message pop-up window is located in each target video frame to obtain a replacement video frame that does not contain the text in the message pop-up window. The replacement video frame replaces the target video frame of the original video, so that when the video data is played, the video viewers do not see the message content displayed in the message pop-up window. Thus, information concerning the privacy of the video author that may appear in the message pop-up window is prevented from being leaked to the video viewers.

On the other hand, the processing method provided by the present disclosure directly processes the recorded video frames, without imposing any restriction on the screen of the first terminal or on the message push service. During recording, the video author can use the message push service of the first terminal normally and browse the message content in the message pop-up window on the screen of the first terminal, while the message pop-up window in the target video frames recorded during its display can be covered with a mosaic by the processing method provided by the present disclosure. Therefore, while the processing method provided by the present disclosure protects the privacy of the video author, it does not affect the video author's normal use of the message push service of the first terminal to browse messages during the screen recording process.

The searching for the target video frame in the video data described in step S21 can be specifically implemented by any existing image recognition technology.

In some embodiments, multiple video frames containing message pop-ups that were previously recorded on multiple first terminals may be collected as positive samples, and multiple video frames recorded on multiple first terminals that do not contain message pop-ups may be collected as negative samples. A pre-built image recognition model is then trained with these positive and negative samples, so as to obtain a message pop-up recognition model that can identify whether a video frame contains a message pop-up.

As shown in FIG. 4, when step S21 is performed, it is only necessary to input the video frame to be detected in the video data to the message pop-up window recognition model, and the model will output the detection result for that video frame. If the input video frame does not contain a message pop-up window, the model outputs a video frame identical to the input video frame. If the input video frame contains a message pop-up window, the model outputs the video frame with the boundary of the message pop-up window marked in it.

Therefore, if the boundary of the message pop-up window is marked in the video frame output by the message pop-up window recognition model, it can be determined that the video frame input this time is the target video frame. Further, in step S22, the area where the message pop-up window is located in the target video frame can be determined according to the boundary marked by the message pop-up window recognition model.

After training with a large number of positive and negative samples, the message pop-up window recognition model can distinguish the image features of a message pop-up window within a video frame. The model can therefore quickly determine whether the currently detected video frame contains image features of a message pop-up window. After detecting such features, it further identifies the pixels in the video frame that correspond to them, and then marks the boundary of the message pop-up window.

In some embodiments, the style of the message pop-up window generally differs between operating systems, and the image features of the message pop-up window differ accordingly. Therefore, instead of training only one message pop-up window recognition model, a separate model can be trained for each common operating system, using video frames recorded under that operating system that do not contain message pop-ups (negative samples) and video frames recorded under that operating system that contain message pop-ups (positive samples). That is to say, a plurality of message pop-up window recognition models, each corresponding to one common operating system, can finally be obtained by training.

When step S21 is performed, the operating system used by the first terminal may be determined first, and then the corresponding message pop-up window identification model is invoked to detect video frames in the video data. The type of the operating system used by the first terminal may be sent by the first terminal to the server, and then forwarded by the server to the second terminal.
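The per-operating-system dispatch described above can be sketched as follows. This is only an illustration under stated assumptions: the `MODELS` registry, the `detect_popup` function and the stub detectors are hypothetical names introduced here, the stubs stand in for trained recognition models, and a frame is assumed to be a row-major list of grayscale pixel rows.

```python
from typing import Callable, Dict, List, Optional, Tuple

Frame = List[List[int]]            # grayscale pixels, row-major (assumed format)
Box = Tuple[int, int, int, int]    # (top, left, height, width)
Detector = Callable[[Frame], Optional[Box]]


def make_stub_detector(banner_value: int) -> Detector:
    """Hypothetical stand-in for a trained model: rows made entirely of
    banner_value are treated as belonging to a message pop-up."""
    def detect(frame: Frame) -> Optional[Box]:
        rows = [r for r, row in enumerate(frame)
                if row and all(p == banner_value for p in row)]
        if not rows:
            return None            # no pop-up: not a target video frame
        return (rows[0], 0, rows[-1] - rows[0] + 1, len(frame[0]))
    return detect


# One recognition model per operating system (keys are illustrative).
MODELS: Dict[str, Detector] = {
    "android": make_stub_detector(200),
    "ios": make_stub_detector(240),
}


def detect_popup(frame: Frame, os_type: str) -> Optional[Box]:
    """Invoke the model that matches the OS reported by the first terminal."""
    model = MODELS.get(os_type.lower())
    if model is None:
        raise ValueError(f"no pop-up model registered for OS {os_type!r}")
    return model(frame)


frame = [
    [0, 0, 0],
    [200, 200, 200],
    [200, 200, 200],
    [0, 0, 0],
]
box = detect_popup(frame, "Android")
```

In this sketch a returned box plays the role of the boundary marked by the model, and `None` corresponds to a frame returned unchanged.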

Compared with the scheme of using a single message pop-up window recognition model to detect message pop-ups in video frames recorded under all operating systems, when a corresponding model is trained for each operating system, each model needs to learn fewer image features of the message pop-up window, so training can be completed faster than in the former scheme. In addition, because the types of image features that need to be detected are relatively uniform, the detection result of a model for a specific operating system is more accurate than that of the single model in the former scheme.

Using image recognition technology to determine whether each video frame of the video data contains a message pop-up window consumes considerable computing resources of the corresponding electronic device. Therefore, an embodiment of the present disclosure provides another video data processing method. Referring to FIG. 5, the method may include the following steps:

The processing method provided by the embodiment shown in FIG. 5 may be executed by the first terminal.

S51. Monitor the message push service of the target device in real time, and obtain the push time of the message to be pushed of the message push service.

The target device is equivalent to the first terminal described above. The message push service refers to the program running on the first terminal that is responsible for pushing messages. After receiving a message to be pushed that the message push server sends to the first terminal, the message push service can display that message on the screen in the form of a message pop-up window. In other words, the message push service can be regarded as the program in the first terminal that controls the message content in the message pop-up window and the display time of the message pop-up window.

After the push time of the message to be pushed is reached, the message push service starts to pop up a message pop-up window displaying the message to be pushed on the screen of the first terminal.

S52: Search for a target video frame from the video frames included in the video data and located within a preset time period after the push time of the message to be pushed.

As shown in Figure 3, after the message pop-up window starts to pop up, it takes a period of time to be fully displayed on the screen of the first terminal. In addition, after the message pop-up window is completely displayed on the screen, if the user does not perform any operation, the message pop-up window stays for a while and then disappears automatically.

Therefore, in step S52, the target video frame may be searched for in the video data within the preset time period after the push time of the message to be pushed. The preset duration here can be set as the sum of the estimated pop-up time of the message pop-up window and the stay time after the message pop-up window is completely displayed, or it can be increased on the basis of the sum of the two.

For example, assume that in a first terminal the pop-up time of the message pop-up window is 1s (second), that is, it takes 1s from the start of the pop-up to the complete display, and the stay time after the message pop-up window is completely displayed is 5s. The above preset duration can then be set to 6s (or to 7s, depending on the actual situation).

If a scheduled push time of 10:05:20 is obtained for a message to be pushed by monitoring the message push service, that is, the message pop-up window starts to pop up at the 20th second of 10:05, then in step S52 the video frames recorded during the period from 10:05:20 to 10:05:27 (taking the preset duration as 7s) can be examined, and the target video frames containing the message pop-up window are found among these video frames.

Correspondingly, in the video data, for the video frames located before the push time of the message to be pushed and beyond the preset duration after the push time of the message to be pushed, the above search is not required.

For example, in the above example, if the message push service pushes a message only at the push time of 10:05:20 during the screen recording process, the target video frame only needs to be searched for in the video frames recorded in the above time period. The video frames recorded outside the above time period (both before and after it) need not be searched.
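The window arithmetic of step S52 can be illustrated with a short sketch. The function name `frames_to_search`, the recording start time, the frame rate of 30 frames per second and the constant frame spacing are assumptions made for illustration only; the push time and the 7s preset duration are taken from the example above.

```python
import math
from datetime import datetime


def frames_to_search(recording_start: datetime, push_time: datetime,
                     preset_seconds: float, fps: float,
                     total_frames: int) -> range:
    """Indices of the frames recorded within preset_seconds after push_time."""
    start_offset = (push_time - recording_start).total_seconds()
    end_offset = start_offset + preset_seconds
    first = max(0, math.floor(start_offset * fps))
    last = min(total_frames - 1, math.floor(end_offset * fps))
    # An empty range means the whole window lies outside the recording.
    return range(first, last + 1) if first <= last else range(0)


# Recording assumed to start at 10:05:00 and to last 60 s at 30 frames/s.
window = frames_to_search(
    recording_start=datetime(2021, 8, 25, 10, 5, 0),
    push_time=datetime(2021, 8, 25, 10, 5, 20),
    preset_seconds=7,
    fps=30,
    total_frames=1800,
)
```

Only the frames whose index falls in `window` would be passed to the image recognition step; all other frames are skipped.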

S53, in response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame.

S54. Process the area where the message pop-up window of the target video frame is located to obtain a replacement video frame.

The execution process of step S53 and step S54 is the same as that of step S22 and step S23 in the embodiment corresponding to FIG. 2, and will not be described in detail here.

In the embodiment of the present disclosure, the message pop-up window is equivalent to a tool with which the message push service displays the messages to be pushed to the user. Therefore, the corresponding message pop-up window appears only when the message push service pushes a message. Before the push time determined by the message push service, and after the message push service has completed the push and the displayed message pop-up window has disappeared, it can be considered that no message pop-up window is displayed on the screen of the first terminal. Accordingly, only the video frames within the preset duration after the push time need to be examined to determine the target video frames and to mask them, and the video frames outside that period need not be examined using the aforementioned image recognition technology.

It can be seen that the above solution can reduce the number of video frames that need to be detected by the image recognition technology, thereby reducing the computing resources consumed by the device executing the corresponding processing method.

In some embodiments, the method provided by the embodiment corresponding to FIG. 5 can also be applied to the server and the second terminal after the following adjustments:

During the recording of the video data, the first terminal may monitor its own message push service in real time and record the monitored push times of the messages to be pushed in the video data. That is, the push times obtained during screen recording are sent to the server together with the video data, and the server may also forward this data to the second terminal. Both the server and the second terminal can then determine, from the recorded push times of the multiple messages to be pushed, in which time segments of the received video data a target video frame may appear, and search only the video frames within these time segments.
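As a rough sketch of how the server or the second terminal might turn several recorded push times into search segments, the following assumes that the push times are stored as offsets in seconds from the start of the video. The function name `search_intervals` and the merging of overlapping segments are illustrative choices, not something the disclosure specifies.

```python
from typing import List, Sequence, Tuple


def search_intervals(push_offsets: Sequence[float], preset_seconds: float,
                     video_length: float) -> List[Tuple[float, float]]:
    """Time segments (start, end) in which a target video frame may appear."""
    intervals: List[Tuple[float, float]] = []
    for start in sorted(push_offsets):
        if start >= video_length:
            continue                      # pushed after the recording ended
        end = min(start + preset_seconds, video_length)
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous segment: extend it instead of adding one.
            prev_start, prev_end = intervals[-1]
            intervals[-1] = (prev_start, max(prev_end, end))
        else:
            intervals.append((start, end))
    return intervals


# Three pushes at 20 s, 24 s and 100 s into a 105 s video, 7 s preset duration.
segments = search_intervals([20, 24, 100], 7, 105)
```

Merging keeps each frame from being examined twice when two pop-ups appear in quick succession.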

In some embodiments, on some devices the program executing the processing method provided by the present disclosure may not have the authority to monitor the message push service. An embodiment of the present disclosure therefore provides another video data processing method, which screens the video data before searching for the target video frame when the authority to monitor the message push service is not available. Referring to FIG. 6, the method may include the following steps:

S61. Detect the message prompt sound corresponding to the message pop-up window in the audio track of the video data.

As mentioned above, during the screen recording process, the sound output by the first terminal may be recorded together as a sound track synchronized with the video data. Then, on the premise that the video author sets the first terminal to emit a message prompt sound when a message pop-up window appears, when a message pop-up window appears in the video data, a corresponding message prompt sound will also appear on the audio track of the video data.

The time when the message pop-up window is displayed on the screen may not be exactly the same as the time when the first terminal emits the corresponding message prompt tone. For example, the message prompt tone may be emitted first and the message pop-up window may appear a few seconds later, or the message pop-up window may appear first and the first terminal may output the corresponding message prompt tone a few seconds later.

The detection of the message prompt tone can be realized by any existing audio feature recognition method. In some embodiments, the audio features of a variety of common message prompt tones can be recorded in advance, and the audio track of the video data is then checked against each of these audio features in turn. When the audio feature of any pre-recorded message prompt tone is detected in the audio track at a certain moment, that moment is determined to be the appearance time of the message prompt tone.
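The disclosure leaves the audio feature recognition method open. One simple possibility, shown here only as an assumed stand-in, is to slide each pre-recorded prompt tone over the audio track and compare by normalized correlation. The names `find_tone` and `first_prompt_time`, the raw-sample representation and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def find_tone(track: Sequence[float], tone: Sequence[float],
              threshold: float = 0.9) -> Optional[int]:
    """First sample index where the track matches the tone, else None."""
    n = len(tone)
    tone_norm = math.sqrt(sum(s * s for s in tone))
    if n == 0 or tone_norm == 0:
        return None
    for i in range(len(track) - n + 1):
        window = track[i:i + n]
        win_norm = math.sqrt(sum(s * s for s in window))
        if win_norm == 0:
            continue                       # silence cannot match a tone
        score = sum(a * b for a, b in zip(window, tone)) / (win_norm * tone_norm)
        if score >= threshold:
            return i
    return None


def first_prompt_time(track: Sequence[float],
                      tones: Dict[str, Sequence[float]],
                      sample_rate: int) -> Optional[float]:
    """Earliest appearance time (seconds) of any pre-recorded prompt tone."""
    hits = [find_tone(track, tone) for tone in tones.values()]
    hits = [h for h in hits if h is not None]
    return min(hits) / sample_rate if hits else None


tone = [0.0, 1.0, 0.0, -1.0]                # toy stand-in for a recorded tone
track = [0.0] * 5 + tone + [0.0] * 3        # the tone starts at sample 5
```

A production system would more likely compare spectral features than raw samples; the sketch only shows the "check each recorded tone against the track" loop.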

S62. Intercept, from the video data, a plurality of video frames within a preset time period before the appearance time of the message prompt sound, and multiple video frames within a preset time period after the appearance time of the message prompt sound.

Considering that the time when the first terminal displays the message pop-up window and the time when the corresponding message prompt sound is output may be inconsistent, in step S62 the video frames within the preset duration before the appearance time of the message prompt sound and those within the preset duration after it all need to be intercepted. The intercepted video frames can then be examined in the subsequent steps, and the target video frames containing the message pop-up window can be found among them.

The lengths of the preset duration before the occurrence time and the preset duration after the occurrence time may be determined by the first terminal according to the display time of the previous message pop-up window and the occurrence time of the corresponding message prompt sound. When the method provided by the embodiment of the present disclosure is executed by the server or the second terminal, the first terminal may determine the above-mentioned duration and send it to the server and the second terminal.

In a specific example, assume that a message prompt sound is detected in the audio track at 10:05:20, and that the preset durations before and after the occurrence time are both 10s. In step S62, every video frame recorded during the period from 10:05:10 to 10:05:30 then needs to be intercepted from the video data, and in step S63 the target video frame is searched for among the video frames within these 20 seconds.
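The interception window of step S62 in this example can be written out as a small sketch. The function names and the assumption that each video frame carries a wall-clock timestamp are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def interception_window(tone_time: datetime, before_s: float,
                        after_s: float) -> Tuple[datetime, datetime]:
    """Window covering the preset durations before and after the prompt tone."""
    return (tone_time - timedelta(seconds=before_s),
            tone_time + timedelta(seconds=after_s))


def intercept_frames(frame_times: Sequence[datetime],
                     window: Tuple[datetime, datetime]) -> List[int]:
    """Indices of the frames whose timestamp lies inside the window."""
    start, end = window
    return [i for i, t in enumerate(frame_times) if start <= t <= end]


tone_time = datetime(2021, 8, 25, 10, 5, 20)
window = interception_window(tone_time, before_s=10, after_s=10)

# Toy recording: one frame every 5 s from 10:05:00 to 10:05:55.
frame_times = [datetime(2021, 8, 25, 10, 5, 0) + timedelta(seconds=s)
               for s in range(0, 60, 5)]
selected = intercept_frames(frame_times, window)
```

Here `selected` holds the frames at 10:05:10, 10:05:15, 10:05:20, 10:05:25 and 10:05:30, which are the only ones handed to step S63.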

It can be understood that the processing method provided by this embodiment of the present disclosure is generally applicable to on-demand (VOD) playback. If the video data is played to the video viewers of the second terminal in real time in the form of a live broadcast, then, because the display time of the message pop-up window and the occurrence time of the message prompt sound may not be synchronized, it is possible that by the time the message prompt tone is detected in the audio track, the message pop-up window has already been displayed for several seconds in the video played by the second terminal. Therefore, the method provided by this embodiment is less effective when applied to live broadcast.

In contrast, in VOD the video data is stored on the server after the recording is completed and is not sent to the second terminal for playback in real time. Even if it is sent to the second terminal, the second terminal can first cache it locally and process the video data with the method provided by the embodiment of the present disclosure before playing it.

Therefore, after the message prompt sound is detected, the video frames before the time at which the message prompt sound appears can also be searched, and any target video frames among them can be masked. This ensures that when the second terminal plays the video data, the area where the message pop-up window is located in each target video frame is covered by the added occlusion image, that is, the message pop-up window in each target video frame is masked.

It is understandable that the method provided by the embodiment of the present disclosure is only applicable when the message prompt tone function of the first terminal is enabled. If the video author sets the first terminal to silent mode, or disables the message prompt tone function of the first terminal, the processing method provided by the embodiment of the present disclosure is not applicable.

S63. Search for the target video frame in the multiple video frames obtained through interception.

S64, in response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame.

S65: Process the area where the message pop-up window of the target video frame is located to obtain a replacement video frame.

In the case where the program executing the method provided by the present disclosure does not have the authority to monitor the message push service, the video frames that may contain a message pop-up window can be preliminarily screened out from the large number of video frames of the video data by detecting the message prompt sound in the audio track, and image recognition technology is then applied only to these screened video frames. For the same duration, the data volume of audio data is generally smaller than that of video image data. Correspondingly, detecting whether a message prompt sound occurs in the audio track at the current moment is less complex than using image recognition technology to detect whether the video frame at the current moment contains a message pop-up window. Therefore, the method provided by the embodiment of the present disclosure can appropriately reduce the computing resources consumed by the video processing method provided by the present disclosure when the authority to monitor the message push service is not available.

As mentioned above, in this embodiment of the present disclosure, the image processing method for the area where the message pop-up window of the target video frame is located may be to add an occlusion image to that area, wherein the occlusion image can be obtained by either of the following schemes:

In the first solution, a processing program (that is, a program for executing the video data processing method provided by the present disclosure) can generate in advance an occlusion image whose size matches that of a common message pop-up window, and store the generated occlusion image in the local storage medium of the device. Each time an occlusion image is to be added to a target video frame, the previously generated occlusion image is read directly from the storage medium and then added to the area where the message pop-up window is located.

The second solution is to first determine the size of the area where the message pop-up window to which the occlusion image is currently to be added is located before each occlusion image is added;

Then generate an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

Finally, the occlusion image generated in the previous step is added to the area where the message pop-up window is located to obtain a replacement video frame.

That is to say, in the second solution, each time before adding an occlusion image, a corresponding occlusion image needs to be generated based on the size of the area where the message pop-up window is located in the current target video frame, and then the generated occlusion image can be added to the message The area where the popup is located.
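The three steps of the second solution can be sketched as follows, assuming a frame is a row-major list of grayscale pixel rows and the pop-up area is given as `(top, left, height, width)`. The function names and the flat grey fill are placeholders chosen for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]    # (top, left, height, width)


def make_occlusion(height: int, width: int, fill: int = 128) -> Frame:
    """Generate an occlusion image of exactly the requested size."""
    return [[fill] * width for _ in range(height)]


def add_occlusion(frame: Frame, box: Box, patch: Frame) -> Frame:
    """Return a replacement frame with patch laid over the pop-up area."""
    top, left, h, w = box
    if len(patch) != h or any(len(row) != w for row in patch):
        raise ValueError("occlusion image must match the pop-up area size")
    replacement = [row[:] for row in frame]     # leave the original untouched
    for r in range(h):
        replacement[top + r][left:left + w] = patch[r]
    return replacement


target = [[0] * 4 for _ in range(4)]
box = (1, 1, 2, 2)                               # step 1: size of the pop-up area
patch = make_occlusion(box[2], box[3])           # step 2: matching occlusion image
replacement = add_occlusion(target, box, patch)  # step 3: replacement video frame
```

Copying the frame before writing keeps the target video frame available, which matters if the replacement is produced on a device that still displays the original locally.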

The first solution can directly use the existing occlusion image and does not need to generate a new occlusion image every time a message pop-up window is masked, which can shorten the time required to process each target video frame and improve the processing efficiency.

The second solution can ensure that the size of the occlusion image added each time is the same as the size of the area where the message pop-up window is located, which prevents an oversized occlusion image from interfering with the video viewer's normal viewing of the other areas of the video frame.

No matter in the first solution or the second solution, the style of the generated occlusion image can be defined by the user in the corresponding selection interface.

That is, the processing program may obtain the user's selection instruction before starting to process the video data, and then determine the candidate image template selected by the selection instruction among the preset multiple candidate image templates as the target image template.

Wherein, the above-mentioned multiple candidate image templates may include multiple candidate mosaic styles and multiple preset images. The preset image here may include an image downloaded from the network by the processing program, or may include a user-defined image (for example, a photo taken by the user).

Then, the corresponding occlusion image can be generated using the target image template. Specifically, in the first solution, the target image template can be used to generate an occlusion image of the size of a common message pop-up window. In the second solution, the target image template can be used to generate an occlusion image whose size is the same as that of the area where the message pop-up window is located.

In some embodiments, when the processing method provided by the present disclosure is executed by the first terminal, the user's selection instruction may be obtained as follows. Before video processing starts, a selection interface of candidate image templates is displayed on the screen of the first terminal. (In a live broadcast scenario, the processing and the recording of the video data are synchronized, so "before starting to process the video" is in effect "before starting the screen recording".) The selection interface can display a variety of candidate mosaic styles and multiple preset images, plus an option for custom images, so that video authors can use their own uploaded images as occlusion images.

Then, after the video author clicks one of the candidate image templates, the processing program may recognize the click of the video author as a selection instruction, and then determine the clicked candidate image template as the target image template.

When the processing method provided by the present disclosure is executed by the server, the first terminal can obtain the user's selection instruction in the above manner, and then send the selection instruction to the server, so that the server determines the target image template. In addition, the server can also display the above-mentioned selection interface to the administrator of the server on the local control terminal, and the administrator can input the selection instruction by clicking on the selection interface.

When the processing method provided by the present disclosure is executed by the second terminal, the above-mentioned user's selection instruction may be a click instruction of the video viewer currently using the second terminal. Similarly, the second terminal may display the above-mentioned selection interface on the screen, and the video viewer then selects one of the candidate image templates as the target image template.

Among them, a mosaic can be understood as an image obtained by repeatedly filling a certain area with a simple geometric figure. Correspondingly, the various alternative mosaic styles displayed in the interface can be understood as a variety of geometric figures that can be used for filling (or called a mosaic pattern). In addition, the user can also set the filling properties of the selected geometric figures in the selection interface, such as the filling color, the density of filling in a specific area, the size of each geometric figure, etc.

In the foregoing second solution, when the selected target image template is one of multiple alternative mosaic styles, the process of generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located can refer to FIG. 7 .

As shown in Figure 7, after the area where the message pop-up window is located is determined in the target video frame, the geometric figure selected as the target image template can be read, and a blank area is generated based on the size of the area where the message pop-up window is located. The selected geometric figure is then filled into this blank area repeatedly, according to the filling attributes set by the user in the selection interface, until an end condition is satisfied. The end condition can be that the ratio of the area covered by the filled geometric figures to the total area of the area where the message pop-up window is located is greater than a certain threshold. Finally, the filled image is added as an occlusion image to (that is, overlaid on) the area where the message pop-up window of the target video frame is located, to obtain the replacement video frame.
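The fill-until-threshold loop described for Figure 7 might look like the sketch below. The tile representation (a small grid of pixel values, with `None` for transparent cells), the `step` parameter standing in for the fill density, and the function name `fill_mosaic` are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Tile = List[List[Optional[int]]]    # None marks a transparent cell of the figure


def fill_mosaic(height: int, width: int, tile: Tile, step: int,
                threshold: float = 0.5,
                background: int = 255) -> Tuple[List[List[int]], float]:
    """Stamp tile onto a blank area until the covered ratio exceeds threshold."""
    if height <= 0 or width <= 0 or step <= 0 or not tile or not tile[0]:
        raise ValueError("area, step and tile must be non-empty")
    canvas = [[background] * width for _ in range(height)]
    covered = [[False] * width for _ in range(height)]
    covered_count, total = 0, height * width
    for top in range(0, height, step):
        for left in range(0, width, step):
            for r, tile_row in enumerate(tile):
                for c, value in enumerate(tile_row):
                    y, x = top + r, left + c
                    if value is None or y >= height or x >= width:
                        continue
                    canvas[y][x] = value
                    if not covered[y][x]:
                        covered[y][x] = True
                        covered_count += 1
            if covered_count / total > threshold:    # end condition reached
                return canvas, covered_count / total
    return canvas, covered_count / total


square = [[0, 0], [0, 0]]                 # a 2x2 solid square as the figure
occlusion, ratio = fill_mosaic(4, 4, square, step=2, threshold=0.5)
```

With these inputs the third stamp raises the coverage from 0.5 to 0.75, which is the first value above the threshold, so the bottom-right quarter stays blank.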

Generating the occlusion image by filling a mosaic pattern is relatively simple to implement: the processing program only needs to save the data of the simple mosaic patterns and copy these patterns during filling. Therefore, generating the occlusion image by filling a mosaic pattern can reduce the storage space that the processing program occupies in the electronic device.

When the selected target image template is a preset image, based on the target image template, the process of generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located may refer to FIG. 8 .

As shown in Figure 8, a screenshot box whose size is the same as the size of the area where the message pop-up window is located (the rectangular box in Figure 8) can first be generated, and this screenshot box is then used to capture a partial image from the selected preset image. The captured image is an occlusion image with the same size as the area where the message pop-up window is located. Finally, the captured image can be added to the area where the message pop-up window of the target video frame is located.

The location of the screenshot may be determined randomly, or designated by the user (video author or video viewer), and may also be consistent with the location of the message pop-up window in the target video frame.
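A minimal sketch of capturing the occlusion image from a preset image is given below, assuming the preset image is a row-major list of pixel rows that is at least as large as the pop-up area. The function name `crop_occlusion`, the clamping of a designated position and the seeded random generator are illustrative choices.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]


def crop_occlusion(image: Image, box_h: int, box_w: int,
                   position: Optional[Tuple[int, int]] = None,
                   rng: Optional[random.Random] = None) -> Image:
    """Capture a box_h x box_w patch; random position unless one is given."""
    img_h, img_w = len(image), len(image[0])
    if box_h > img_h or box_w > img_w:
        raise ValueError("preset image is smaller than the pop-up area")
    if position is None:
        rng = rng or random.Random()
        top = rng.randint(0, img_h - box_h)       # randint is inclusive
        left = rng.randint(0, img_w - box_w)
    else:
        # Clamp a designated position so the box stays inside the image.
        top = min(max(position[0], 0), img_h - box_h)
        left = min(max(position[1], 0), img_w - box_w)
    return [row[left:left + box_w] for row in image[top:top + box_h]]


preset = [[r * 10 + c for c in range(5)] for r in range(4)]   # toy 4x5 image
designated = crop_occlusion(preset, 2, 3, position=(1, 1))
randomly_placed = crop_occlusion(preset, 2, 3, rng=random.Random(0))
```

Passing the top-left corner of the pop-up area as `position` gives the third option mentioned above, where the screenshot location matches the location of the message pop-up window.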

There are generally few optional mosaic styles, which makes it difficult to meet the personalized needs of different users. Capturing occlusion images from preset images allows users to make more personalized settings. For example, the video author can choose a favorite photo as the image from which occlusion images are captured.

It can be seen from the schematic diagrams in Figures 7 and 8 that, whether the occlusion image is obtained by filling a preset candidate mosaic style or is captured from a preset image, the occlusion image finally added often differs greatly from the image content originally displayed in the target video frame. This results in a conspicuous area in the displayed replacement video frame, and the viewing experience of the video viewers is poor during actual playback.

Therefore, an embodiment of the present disclosure also provides a method for generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located. The method may include:

First, read the previous video frame of the target video frame that does not contain the message pop-up window from the video data.

The above steps can also be considered as reading a video frame that is located before the target video frame and is closest to the target video frame and does not contain a message pop-up window.

For example, assuming that the target video frame to which the occlusion image needs to be added is the Nth video frame in the video data, and the two video frames immediately before it also contain message pop-ups, then the third video frame before the target video frame, that is, the (N-3)th video frame of the video data, can be read.

Or, if the Nth video frame contains a message popup window, but the previous video frame (ie, the N-1th video frame) does not contain a message popup window, the previous video frame is read.

Then, an image located in the same area as the message pop-up window is intercepted in the read video frame to obtain an occlusion image.

For the process performed by the above method, reference may be made to FIG. 9 .

As shown in Figure 9, it is assumed that the Nth video frame is the target video frame containing the message pop-up window, and that the video frame read, namely the nearest preceding video frame that does not contain the message pop-up window, is the second video frame before it, that is, the (N-2)th video frame. Then, based on the size of the area where the message pop-up window is located in the target video frame, a screenshot box of the same size is generated. The screenshot box is placed on the (N-2)th video frame at the position of the area where the message pop-up window is located, the occlusion image is captured from the (N-2)th video frame, and the occlusion image is added to the area where the message pop-up window of the target video frame (that is, the Nth video frame) is located to obtain the replacement video frame.
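The procedure of Figure 9 can be sketched as below. The per-frame flags `has_popup` are assumed to come from the earlier detection step, frames are assumed to be row-major lists of pixel rows, and the pop-up area is assumed to lie fully inside every frame; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[List[int]]
Box = Tuple[int, int, int, int]    # (top, left, height, width)


def occlusion_from_previous(frames: Sequence[Frame], has_popup: Sequence[bool],
                            n: int, box: Box) -> Optional[Tuple[int, Frame]]:
    """Capture the pop-up area from the nearest earlier frame without a pop-up."""
    top, left, h, w = box
    for k in range(n - 1, -1, -1):             # walk backwards from frame n-1
        if not has_popup[k]:
            patch = [row[left:left + w] for row in frames[k][top:top + h]]
            return k, patch
    return None                                # every earlier frame has a pop-up


def replace_target(frames: Sequence[Frame], has_popup: Sequence[bool],
                   n: int, box: Box) -> Optional[Frame]:
    """Replacement for frame n, or None when no clean earlier frame exists."""
    found = occlusion_from_previous(frames, has_popup, n, box)
    if found is None:
        return None
    _, patch = found
    top, left, h, w = box
    replacement = [row[:] for row in frames[n]]
    for r in range(h):
        replacement[top + r][left:left + w] = patch[r]
    return replacement


# Four 3x3 frames; frame k is filled with the value k, pop-up (99) in row 0.
frames = [[[k] * 3 for _ in range(3)] for k in range(4)]
frames[2][0] = [99, 99, 99]
frames[3][0] = [99, 99, 99]
has_popup = [False, False, True, True]
box = (0, 0, 1, 3)
source = occlusion_from_previous(frames, has_popup, 3, box)
replacement = replace_target(frames, has_popup, 3, box)
```

Here the target is frame 3, frame 2 also contains the pop-up, and the patch is therefore taken from frame 1, which mirrors the N and N-2 relationship in the figure. A fallback to another occlusion scheme would be needed when the function returns `None`.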

As can be seen from FIG. 9, in the above-mentioned method for generating an occlusion image, the content displayed by the few video frames immediately before the target video frame differs only slightly from that of the target video frame currently being processed, so the added occlusion image is similar to the image content of the other areas of the replacement video frame (the areas other than where the message pop-up window is located). After the target video frame in the original video data is replaced with the replacement video frame, a viewer of the video data is unlikely to notice that the corresponding area has been replaced with another image, which improves the viewing experience of video viewers while protecting the privacy of video authors.

In addition to adding the occlusion image, the image processing method for the area where the message pop-up window is located in the embodiment of the present disclosure may also be to cut the message pop-up window out of the target video frame. In this processing method, the replacement video frame is the video frame obtained after the message pop-up window has been cut out.

Figure 10 is a schematic diagram of cutting a message pop-up window out of a target video frame. As shown in Figure 10, after the area where the message pop-up window is located is determined in the target video frame, image cutting technology can be used directly to cut the message pop-up window out of the target video frame and obtain a replacement video frame. With this processing method, the area where the message pop-up window was originally located becomes a blank area in the replacement video frame. Obviously, the replacement video frame obtained in this way does not contain the text in the message pop-up window. When the replacement video frame is displayed to the video viewer on the second terminal, the video viewer cannot see the message in the message pop-up window, which achieves the effect of protecting the privacy of the video author.

Finally, the image processing applied to the area where the message pop-up window is located in the embodiments of the present disclosure may also consist of blurring the pixels in that area to obtain a replacement video frame. With this processing method, the replacement video frame is the video frame containing the blurred message pop-up window.

FIG. 11 is a schematic diagram of blurring the area where the message pop-up window is located. As shown in FIG. 11, after the area where the message pop-up window is located is determined in the target video frame, an image blurring technique can be applied to the pixels in that area to blur the text that was clearly displayed in the message pop-up window of the target video frame. As can be seen from FIG. 11, the text in the message pop-up window cannot be recognized in the replacement video frame obtained after blurring, which is equivalent to the replacement video frame not containing the text in the message pop-up window. Even if the replacement video frame shown in FIG. 11 is displayed to a video viewer on a terminal device, the privacy of the video author is not revealed.
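The disclosure does not name a specific blurring technique. As one possible example, the sketch below applies a simple box blur (a local mean) restricted to the pop-up area; the frame representation, the region tuple, and the `radius` parameter are assumptions for illustration.

```python
def blur_region(frame, region, radius=1):
    """Box-blur only the pixels inside `region`. Each output pixel is the
    integer mean of the original pixels in a (2*radius+1)-wide window,
    clipped to the region so that content outside the pop-up is not mixed in."""
    top, left, height, width = region
    out = [list(row) for row in frame]
    for y in range(top, top + height):
        for x in range(left, left + width):
            y0, y1 = max(top, y - radius), min(top + height, y + radius + 1)
            x0, x1 = max(left, x - radius), min(left + width, x + radius + 1)
            window = [frame[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            out[y][x] = sum(window) // len(window)
    return out
```

A larger `radius`, or repeated passes, would make the text harder to recover; how strong the blur needs to be is left open here.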

Compared with adding an occlusion image to the area where the message pop-up window is located, the two processing methods of cutting the message pop-up window from the target video frame and blurring the area where the message pop-up window is located do not need to acquire any image resources other than the video being processed; they only need to cut or blur the target video frame itself. Therefore, compared with the processing method of adding an occlusion image, the latter two processing methods can complete the processing of the target video frame in a shorter time, have higher processing efficiency, and consume fewer resources of the electronic device.

In combination with the video data processing method provided by any embodiment of the present disclosure, an embodiment of the present disclosure also provides a video data processing apparatus. As shown in FIG. 12 , the apparatus may include the following units:

The searching unit 1201 is configured to perform searching for a target video frame in the video data.

The video data is obtained by recording the screen of the target device, and the target video frame contains a message pop-up window.

The determining unit 1202 is configured to perform, in response to finding the target video frame in the video data, determining the area where the message pop-up window is located in the target video frame.

The processing unit 1203 is configured to perform processing on the area where the message pop-up window of the target video frame is located to obtain the replacement video frame.

Wherein, the replacement video frame does not include the text in the message pop-up window of the target video frame, and the replacement video frame is used to replace the target video frame.
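The cooperation of the searching, determining, and processing units can be sketched as a small pipeline. The three callables `has_popup`, `locate_popup`, and `process_region` are hypothetical stand-ins for units 1201, 1202, and 1203; the disclosure does not prescribe these signatures.

```python
def replace_popup_frames(frames, has_popup, locate_popup, process_region):
    """Return a new frame list in which every frame containing a pop-up
    is replaced by its processed (replacement) frame."""
    output = list(frames)
    for i, frame in enumerate(frames):
        if has_popup(frame):                           # searching unit
            region = locate_popup(frame)               # determining unit
            output[i] = process_region(frame, region)  # processing unit
    return output

# Toy usage: strings stand in for frames, brackets mark the pop-up text.
frames = ["game", "game[secret]", "game"]
cleaned = replace_popup_frames(
    frames,
    has_popup=lambda f: "[" in f,
    locate_popup=lambda f: (f.index("["), f.index("]") + 1),
    process_region=lambda f, r: f[:r[0]] + "#" * (r[1] - r[0]) + f[r[1]:],
)
```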

In some embodiments, the above-mentioned processing device further includes:

The monitoring unit 1204 is configured to perform real-time monitoring of the message push service of the target device, and to obtain the push time of the message to be pushed of the message push service; wherein, the target device is the host device in the network live broadcast system, that is, the terminal device used by the host.

Wherein, the search unit 1201 specifically executes:

Find the target video frame from the video frames included in the video data and located within the preset time period after the push time of the message to be pushed.
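Restricting the search to frames shortly after the push time might look like the sketch below. The per-frame timestamps, the `window` length, and the `has_popup` detector are assumptions introduced for the example.

```python
def frames_in_push_window(frame_times, push_time, window):
    """Indices of frames whose timestamp lies in [push_time, push_time + window]."""
    return [i for i, t in enumerate(frame_times)
            if push_time <= t <= push_time + window]

def find_target_frames(frames, frame_times, push_time, window, has_popup):
    """Run the (assumed) pop-up detector only on frames inside the window."""
    candidates = frames_in_push_window(frame_times, push_time, window)
    return [i for i in candidates if has_popup(frames[i])]
```

Only the candidate frames are passed to the detector, so frames recorded before the push time or long after it are never examined.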

In some embodiments, the above-mentioned processing device further includes:

The detection unit 1205 is configured to detect the message prompt sound corresponding to the message pop-up window in the audio track of the video data;

Wherein, the searching unit 1201 specifically executes:

From the video data, intercept a plurality of video frames within a preset time length before the appearance moment of the message prompt sound and a plurality of video frames within a preset time length after the appearance time of the message prompt sound;

Find the target video frame among multiple captured video frames.
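Assuming a constant frame rate, the frames to intercept around the message prompt sound can be computed directly from the time at which the sound appears. The parameter names (`fps`, `before`, `after`) are illustrative and not taken from the disclosure.

```python
import math

def frame_range_around_sound(sound_time, fps, total_frames, before, after):
    """Range of frame indices covering `before` seconds ahead of the prompt
    sound and `after` seconds following it, clipped to the video length."""
    first = max(0, math.ceil((sound_time - before) * fps))
    last = min(total_frames - 1, math.floor((sound_time + after) * fps))
    return range(first, last + 1)
```

The window extends in both directions because the prompt sound and the pop-up are not guaranteed to start at exactly the same moment.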

In some embodiments, the processing unit 1203 specifically executes:

Cut the message popup from the target video frame to get the replacement video frame.

In some embodiments, the processing unit 1203 specifically executes:

The pixels in the area where the message pop-up window is located are blurred to obtain the replacement video frame.

In some embodiments, the processing unit 1203 may include:

a size determination unit, configured to determine the size of the area where the message pop-up window is located;

a generating unit, configured to generate an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

The adding unit is configured to add the generated occlusion image in the area where the message pop-up window is located to obtain the replacement video frame.

In some embodiments, the processing unit 1203 further includes:

Template determination unit, configured to execute:

Get the user's selection instruction;

Among the preset multiple candidate image templates, the candidate image template selected by the selection instruction is determined as the target image template; wherein, the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images;

Wherein, when the generating unit generates an occlusion image whose size is consistent with the size of the area where the message pop-up window is located, it specifically executes:

According to the target image template, an occlusion image whose size is consistent with the size of the area where the message pop-up window is located is generated.
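Template-based generation of the occlusion image could be sketched as a lookup from the user's selection to a generator that produces an image of the required size. The template names, the checkerboard mosaic, and the nearest-neighbour scaling of a preset image are all assumptions chosen for illustration.

```python
def checker_mosaic(height, width, block=2, dark=64, light=192):
    """A simple mosaic style: alternating dark and light blocks."""
    return [[dark if ((y // block) + (x // block)) % 2 == 0 else light
             for x in range(width)] for y in range(height)]

def scale_preset(image, height, width):
    """Nearest-neighbour scaling of a preset image to height x width."""
    src_h, src_w = len(image), len(image[0])
    return [[image[y * src_h // height][x * src_w // width]
             for x in range(width)] for y in range(height)]

# Hypothetical candidate templates: one mosaic style and one preset image.
TEMPLATES = {
    "mosaic_checker": lambda h, w: checker_mosaic(h, w),
    "preset_stripes": lambda h, w: scale_preset([[0, 255], [255, 0]], h, w),
}

def generate_occlusion(selection, region):
    """Build an occlusion image matching the size of the pop-up area."""
    _, _, height, width = region
    return TEMPLATES[selection](height, width)
```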

In some embodiments, the generating unit specifically executes:

Read the previous video frame of the target video frame that does not contain the message pop-up window from the video data;

The occlusion image is obtained by intercepting the image in the same area as the message popup in the previous video frame that does not contain the message popup.

For the specific working principle of the video data processing apparatus provided by any embodiment of the present disclosure, reference may be made to the corresponding steps in the video data processing method provided by any embodiment of the present disclosure, which will not be described in detail here.

The present disclosure relates to an apparatus for processing video data, in which the searching unit 1201 searches the video data for a target video frame containing a message pop-up window; when the target video frame is found, the determining unit 1202 determines the area where the message pop-up window is located in the target video frame; and the processing unit 1203 processes that area to obtain a replacement video frame that does not contain the text in the message pop-up window, the replacement video frame being used to replace the target video frame in the video data. For a message pop-up window captured during screen recording, the text in it is removed by the image processing that this solution applies to the area where the message pop-up window is located, and is therefore not leaked to users who watch the video data. While recording video, the user can thus both protect their privacy and continue to browse messages normally through the message push service.

As described above, the method for processing screen recording video provided by the embodiments of the present disclosure can be applied to the first terminal, the second terminal, and the server.

Embodiments of the present disclosure further provide a storage medium for storing computer instructions, and when the computer instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the following steps:

Find the target video frame in the video data, wherein the video data is the video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

Process the area where the message pop-up window is located in the target video frame, and obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.

In an exemplary embodiment, a storage medium including instructions, such as a memory including instructions, is also provided, and the above-mentioned instructions can be executed by the processor 1301 of the electronic device shown in FIG. 13 to complete the above-mentioned method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, for example, a read-only memory (ROM), a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.

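An embodiment of the present disclosure provides a computer program product, including a computer program/instruction; when the computer program/instruction is executed, the steps of the video data processing method provided by any embodiment of the present disclosure are implemented.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the same steps as those listed above for the storage medium: finding the target video frame in the video data, determining the area where the message pop-up window is located in the target video frame, and processing that area to obtain the replacement video frame.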

In some embodiments, the processor is further configured to implement the following steps:

Monitor the message push service of the target device in real time, and obtain the push time of the message to be pushed of the message push service; wherein, the target device is an anchor device in a live webcast system;

Wherein, the searching for the target video frame in the video data includes:

Find the target video frame from the video frames included in the video data and located within a preset time period after the push moment of the message to be pushed.

In some embodiments, the processor is further configured to implement the following steps:

Detecting the message prompt sound corresponding to the message pop-up window in the audio track of the video data;

Wherein, the searching for the target video frame in the video data includes:

From the video data, intercept a plurality of video frames within a preset time period before the appearance time of the message prompt sound and a plurality of video frames within a preset time period after the appearance time of the message prompt sound;

The target video frame is searched for among the plurality of video frames obtained through interception.

In some embodiments, the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame includes:

The message pop-up window is cropped from the target video frame to obtain a replacement video frame.

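In some embodiments, the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame includes:

The pixels in the area where the message pop-up window is located are blurred to obtain a replacement video frame.

In some embodiments, the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame includes:

determining the size of the area where the message pop-up window is located;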

generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

The generated occlusion image is added to the area where the message pop-up window is located to obtain a replacement video frame.

In some embodiments, the processor is further configured to implement the following steps:

Get the user's selection instruction;

Among the preset multiple candidate image templates, the candidate image template selected by the selection instruction is determined as the target image template; wherein, the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images;

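Wherein, the generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located includes:

According to the target image template, an occlusion image whose size is consistent with the size of the area where the message pop-up window is located is generated.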

In some embodiments, the generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located includes:

The previous video frame of the target video frame that does not contain the message pop-up window is read from the video data;

An image located in the same area as the message pop-up window in the previous video frame that does not contain the message pop-up window is intercepted to obtain an occlusion image.

Fig. 13 is a structural diagram of an electronic device according to an exemplary embodiment. Referring to FIG. 13 , for example, the electronic device 1300 may be a terminal device such as a mobile phone, a computer, and a tablet device, and may also be a server device.

Referring to FIG. 13, the electronic device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power supply component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.

The processing component 1302 is generally used to perform overall operations of the electronic device 1300, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The processing component 1302 can include one or more processors 1320 to execute instructions to perform all or some of the steps of the methods described above. Additionally, processing component 1302 may include one or more modules that facilitate interaction between processing component 1302 and other components. For example, processing component 1302 may include a multimedia module to facilitate interaction between multimedia component 1308 and processing component 1302.

The memory 1304 is configured to store various types of data to support operation at the electronic device 1300. Examples of such data include instructions for any application or method operating on the electronic device 1300, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1304 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

The power supply component 1306 provides power to various components of the electronic device 1300. The power supply component 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device 1300.

Multimedia component 1308 includes a screen that provides an output interface between electronic device 1300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. When the electronic device 1300 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 1300 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.

The I/O interface 1312 provides an interface between the processing component 1302 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.

The sensor component 1314 includes one or more sensors for providing status assessments of various aspects of the electronic device 1300. For example, the sensor component 1314 can detect the open/closed state of the electronic device 1300 and the relative positioning of components, such as the display and the keypad of the electronic device 1300. The sensor component 1314 can also detect a change in the position of the electronic device 1300 or of one of its components, the presence or absence of user contact with the electronic device 1300, the orientation or acceleration/deceleration of the electronic device 1300, and changes in the temperature of the electronic device 1300. The sensor component 1314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor component 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 1316 is configured to facilitate wired or wireless communication between the electronic device 1300 and other devices. The electronic device 1300 may access wireless networks based on communication standards, such as WiFi, carrier networks (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1316 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 1300 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the video data processing method provided by any embodiment of the present disclosure.

When the above-mentioned electronic device 1300 is a terminal device such as a mobile phone, a computer, or a tablet device, the electronic device may include each component shown in FIG. 13. When the above-mentioned electronic device is a server device, it may include only the memory 1304, the power supply component 1306, the processing component 1302, and the communication component 1316 shown in FIG. 13.

All the embodiments of the present disclosure can be implemented independently or in combination with other embodiments, which are all regarded as the protection scope required by the present disclosure.

Claims

A method for processing video data, comprising:

Find the target video frame in the video data, wherein the video data is the video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

Process the area where the message pop-up window is located in the target video frame, and obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.
The method according to claim 1, wherein the method further comprises:

Monitor the message push service of the target device in real time, and obtain the push time of the message to be pushed of the message push service; wherein, the target device is an anchor device in a live webcast system;

Wherein, the searching for the target video frame in the video data includes:

Find the target video frame from the video frames included in the video data and located within a preset time period after the push moment of the message to be pushed.
The method according to claim 1 or 2, wherein the method further comprises:

Detecting the message prompt sound corresponding to the message pop-up window in the audio track of the video data;

Wherein, the searching for the target video frame in the video data includes:

From the video data, intercept a plurality of video frames within a preset time period before the appearance time of the message prompt sound and a plurality of video frames within a preset time period after the appearance time of the message prompt sound;

The target video frame is searched for among the plurality of video frames obtained through interception.
The method according to any one of claims 1 to 3, wherein the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame comprises:

The message pop-up window is cropped from the target video frame to obtain a replacement video frame.
The method according to any one of claims 1 to 4, wherein the processing of the region where the message pop-up window is located in the target video frame to obtain a replacement video frame comprises:

The pixels in the area where the message pop-up window is located are blurred to obtain a replacement video frame.
The method according to any one of claims 1 to 5, wherein the processing of the area where the message pop-up window is located in the target video frame to obtain a replacement video frame comprises:

determining the size of the area where the message pop-up window is located;

generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

The generated occlusion image is added to the area where the message pop-up window is located to obtain a replacement video frame.
The method of claim 6, further comprising:

Get the user's selection instruction;

Among the preset multiple candidate image templates, the candidate image template selected by the selection instruction is determined as the target image template; wherein, the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images;

Wherein, the generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located includes:

According to the target image template, an occlusion image whose size is consistent with the size of the area where the message pop-up window is located is generated.
The method according to claim 6 or 7, wherein the generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located comprises:

The previous video frame of the target video frame that does not contain the message pop-up window is read from the video data;

An image located in the same area as the message pop-up window in the previous video frame that does not contain the message pop-up window is intercepted to obtain an occlusion image.
A device for processing video data, comprising:

a search unit, configured to search for a target video frame in video data, wherein the video data is video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

a determining unit, configured to, in response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

a processing unit, configured to process the area where the message pop-up window is located in the target video frame to obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.
The device of claim 9, further comprising:

a monitoring unit, configured to perform real-time monitoring of the message push service of the target device, and obtain the push time of the message to be pushed of the message push service; wherein, the target device is an anchor device in a live webcast system;

wherein the search unit is configured to perform:

Find the target video frame from the video frames included in the video data and located within a preset time period after the push moment of the message to be pushed.
The device according to claim 9 or 10, further comprising:

a detection unit, configured to detect the message prompt sound corresponding to the message pop-up window in the audio track of the video data;

wherein the search unit is configured to perform:

From the video data, intercept a plurality of video frames within a preset time period before the appearance time of the message prompt sound and a plurality of video frames within a preset time period after the appearance time of the message prompt sound;

The target video frame is searched for among the plurality of video frames obtained through interception.
The apparatus according to any one of claims 9 to 11, wherein the processing unit is configured to execute:

Cut the message pop-up window from the target video frame to obtain a replacement video frame.
The apparatus according to any one of claims 9 to 12, wherein the processing unit is configured to execute:

The pixels in the area where the message pop-up window is located are blurred to obtain a replacement video frame.
The device according to any one of claims 9 to 13, wherein the processing unit comprises:

a size determination unit, configured to perform determining the size of the area where the message pop-up window is located;

a generating unit, configured to generate an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

The adding unit is configured to execute adding the generated occlusion image in the area where the message pop-up window is located to obtain a replacement video frame.
The apparatus according to claim 14, wherein the processing unit further comprises:

Template determination unit, configured to execute:

Get the user's selection instruction;

Among the preset multiple candidate image templates, the candidate image template selected by the selection instruction is determined as the target image template; wherein, the multiple candidate image templates include multiple candidate mosaic styles and multiple preset images;

Wherein, when the generating unit generates an occlusion image whose size is consistent with the size of the area where the message pop-up window is located, it specifically executes:

According to the target image template, an occlusion image whose size is consistent with the size of the area where the message pop-up window is located is generated.
The apparatus according to claim 14 or 15, wherein the generating unit is configured to perform:

The previous video frame of the target video frame that does not contain the message pop-up window is read from the video data;

An image located in the same area as the message pop-up window in the previous video frame that does not contain the message pop-up window is intercepted to obtain an occlusion image.
An electronic device, comprising:

processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the following steps:

Find the target video frame in the video data, wherein the video data is the video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

Process the area where the message pop-up window is located in the target video frame, and obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.
The electronic device of claim 17, wherein the processor is further configured to implement the following steps:

Monitor the message push service of the target device in real time, and obtain the push time of the message to be pushed of the message push service; wherein, the target device is an anchor device in a live webcast system;

Wherein, the searching for the target video frame in the video data includes:

Find the target video frame from the video frames included in the video data and located within a preset time period after the push moment of the message to be pushed.
The electronic device according to claim 17 or 18, wherein the processor is further configured to implement the following steps:

Detecting the message prompt sound corresponding to the message pop-up window in the audio track of the video data;

Wherein, the searching for the target video frame in the video data includes:

From the video data, intercept a plurality of video frames within a preset time period before the appearance time of the message prompt sound and a plurality of video frames within a preset time period after the appearance time of the message prompt sound;

The target video frame is searched for among the plurality of video frames obtained through interception.
The electronic device according to any one of claims 17 to 19, wherein the processing of the region where the message pop-up window is located in the target video frame to obtain a replacement video frame, comprises:

The message pop-up window is cropped from the target video frame to obtain a replacement video frame.
The electronic device according to any one of claims 17 to 20, wherein the processing of the region where the message pop-up window is located in the target video frame to obtain a replacement video frame comprises:

The pixels in the area where the message pop-up window is located are blurred to obtain a replacement video frame.
The electronic device according to any one of claims 17 to 21, wherein the processing of the region where the message pop-up window is located in the target video frame to obtain a replacement video frame, comprises:

determining the size of the area where the message pop-up window is located;

generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located;

The generated occlusion image is added to the area where the message pop-up window is located to obtain a replacement video frame.
The electronic device of claim 22, wherein the processor is further configured to implement the following steps:

obtaining a selection instruction from the user;

determining, among a plurality of preset candidate image templates, the candidate image template selected by the selection instruction as the target image template, wherein the plurality of candidate image templates include a plurality of candidate mosaic styles and a plurality of preset images;

wherein the generating of an occlusion image whose size is consistent with the size of the area where the message pop-up window is located comprises:

generating, according to the target image template, an occlusion image whose size is consistent with the size of the area where the message pop-up window is located.
The electronic device according to claim 22 or 23, wherein the generating an occlusion image whose size is consistent with the size of the area where the message pop-up window is located comprises:

reading, from the video data, a preceding video frame of the target video frame that does not contain the message pop-up window;

cropping, from the preceding video frame that does not contain the message pop-up window, the image located in the same area as the message pop-up window to obtain the occlusion image.
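By way of illustration only, taking the occlusion image from a preceding frame that does not contain the pop-up might be sketched as follows. The function names and the list-of-rows frame representation are assumptions of this sketch, and both frames are assumed to have the same dimensions.

```python
def crop_region(frame, top, left, height, width):
    """Cut the rectangle (top, left, height, width) out of a frame."""
    return [row[left:left + width] for row in frame[top:top + height]]


def patch_from_previous(target, previous, top, left, height, width):
    """Return a copy of target in which the pop-up area is covered by the
    same area taken from a preceding frame that has no pop-up."""
    occlusion = crop_region(previous, top, left, height, width)
    out = [row[:] for row in target]
    for dy, occ_row in enumerate(occlusion):
        out[top + dy][left:left + len(occ_row)] = occ_row
    return out


previous = [[1, 2, 3],
            [4, 5, 6]]
target = [[9, 9, 3],   # the 9s stand in for pop-up pixels
          [9, 9, 6]]
replacement = patch_from_previous(target, previous, top=0, left=0, height=2, width=2)
```

Because the patch comes from the recording itself, the covered area tends to blend with the surrounding content when the screen changed little between the two frames.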
A storage medium, characterized in that, when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the following steps:

Find the target video frame in the video data, wherein the video data is the video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

Process the area where the message pop-up window is located in the target video frame, and obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.
A computer program product, comprising a computer program/instruction, characterized in that, when the computer program/instruction is executed by a processor, the following steps are implemented:

Find the target video frame in the video data, wherein the video data is the video data obtained by recording the screen of the target device; the target video frame contains a message pop-up window;

In response to finding the target video frame in the video data, determine the area where the message pop-up window is located in the target video frame;

Process the area where the message pop-up window is located in the target video frame, and obtain a replacement video frame, so that the replacement video frame does not contain the text in the message pop-up window; wherein, the replacement video frame is used to replace the target video frame.
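By way of illustration only, the overall sequence of steps (finding target video frames, determining the pop-up area, processing it, and replacing the frame) might be sketched as follows. The callables `find_popup` and `process_region` are hypothetical placeholders for whatever detection and processing techniques are used; they are not defined by the disclosure.

```python
def replace_popup_frames(frames, find_popup, process_region):
    """Return a new frame list in which every frame containing a message
    pop-up window is replaced by its processed version.

    find_popup(frame) is assumed to return the pop-up area, or None if
    the frame contains no pop-up; process_region(frame, region) is
    assumed to return the replacement video frame.
    """
    result = []
    for frame in frames:
        region = find_popup(frame)
        if region is None:
            result.append(frame)  # not a target video frame
        else:
            result.append(process_region(frame, region))
    return result


# Toy stand-ins: a "frame" is a string and a pop-up is marked by "[msg]".
frames = ["home", "home[msg]", "game"]
cleaned = replace_popup_frames(
    frames,
    find_popup=lambda f: "[msg]" if "[msg]" in f else None,
    process_region=lambda f, region: f.replace(region, ""),
)
```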