CN117014652A - Method and device for positioning a stuck video frame, storage medium and electronic device - Google Patents

Method and device for positioning a stuck video frame, storage medium and electronic device

Info

Publication number
CN117014652A
CN117014652A (Application No. CN202211028243.7A)
Authority
CN
China
Prior art keywords: target, video frame, interaction information, difference, video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211028243.7A
Other languages
Chinese (zh)
Inventor
温力
熊婷
谢丽娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211028243.7A
Publication of CN117014652A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Abstract

The invention discloses a method and device for positioning a stuck video frame, a storage medium and an electronic device. The method comprises the following steps: acquiring an interaction information stream fed back on a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, extracting from the live video a target video frame sequence associated with the target interaction information, wherein the target interaction information carries keywords describing that the live video is stuck; acquiring a target video frame pair, namely two adjacent video frames in the target video frame sequence whose degree of difference is less than or equal to a target threshold, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and determining, by means of the target video frame pair, the stuck video frame at which the live video stalls. The invention addresses the technical problem that existing stall-discovery approaches locate stuck video frames inaccurately.

Description

Method and device for positioning a stuck video frame, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for positioning a stuck video frame, a storage medium, and an electronic device.
Background
In live-streaming service scenarios, picture stalling is an important index for measuring the viewing experience and one of the service indicators that service developers care about most. In existing live-streaming stall discovery methods, data is mainly reported by the service side at key points of the live pipeline, for example data covering stalls, instant start, end-to-end delay, playback-start success rate, frame skipping at playback start, and the like. Alternatively, a series of abnormal events are preliminarily analyzed and defined from these data indicators to raise alarms, such as an excessively low live video bitrate or an excessively low audio bitrate.
However, existing live-streaming stall discovery methods have the limitation that they can only find problems that were anticipated: because only predefined data is reported, and abnormal events are reported at expected points in the pipeline, problems outside those expectations are hard to discover. That is, when an unexpected stall occurs, existing methods cannot accurately find the stuck video frame; in other words, existing stall-discovery approaches suffer from the technical problem of inaccurate positioning of stuck video frames.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a method and device for locating a stuck video frame, a storage medium and an electronic device, which at least solve the technical problem that stuck video frames are located inaccurately by existing stall-discovery approaches.
According to one aspect of the embodiments of the present invention, a method for positioning a stuck video frame is provided, comprising: acquiring an interaction information stream fed back on a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, extracting from the live video a target video frame sequence associated with the target interaction information, wherein the target interaction information carries keywords describing that the live video is stuck; acquiring a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and determining, by means of the target video frame pair, the stuck video frame at which the live video stalls.
According to another aspect of the embodiments of the present application, a positioning device for a stuck video frame is also provided, comprising: a first acquisition unit, configured to acquire an interaction information stream fed back on a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; an extraction unit, configured to extract, from the live video, a target video frame sequence associated with target interaction information in a case where the target interaction information is identified in the interaction information stream, wherein the target interaction information carries keywords describing that the live video is stuck; a second acquisition unit, configured to acquire a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and a determination unit, configured to determine, by means of the target video frame pair, the stuck video frame at which the live video stalls.
According to yet another aspect of the embodiments of the present application, a computer program product is provided, comprising a computer program/instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer program/instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above method for positioning a stuck video frame.
According to still another aspect of the embodiments of the present invention, an electronic device is also provided, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to execute, by means of the computer program, the above method for positioning a stuck video frame.
In the embodiments of the invention, an interaction information stream fed back on a live video is acquired, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, a target video frame sequence associated with the target interaction information is extracted from the live video, wherein the target interaction information carries keywords describing that the live video is stuck; a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold is acquired, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and the stuck video frame at which the live video stalls is determined by means of the target video frame pair. In this way, a video frame sequence suspected of stalling is first located through the interaction information, the reality of the stall is then verified according to the degree of difference determined from the area of the difference region between adjacent frames in that sequence, and the stuck video frame is finally determined from the comparison result of the video frames, thereby solving the technical problem that existing live-video stall discovery methods position stuck video frames inaccurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment of an alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 7 is a flow chart of another alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 8 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present application;
FIG. 9 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 16 is a schematic diagram of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 17 is a flow chart of yet another alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 18 is a schematic diagram of an alternative alarm method for a stuck video frame according to an embodiment of the present invention;
FIG. 19 is an effect diagram of an alternative method for positioning a stuck video frame according to an embodiment of the present invention;
FIG. 20 is a schematic diagram of an alternative positioning device for a stuck video frame according to an embodiment of the present invention;
FIG. 21 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a method for positioning a stuck video frame is provided. As an optional implementation, the method may be, but is not limited to being, applied to a stuck-video-frame positioning system in the hardware environment shown in fig. 1, where the positioning system may include, but is not limited to, a terminal device 102, a network 104, a server 106, a database 108 and a terminal device 110. A first target client and a second target client run in the terminal device 102 and the terminal device 110 respectively (as shown in fig. 1, the first target client runs in the terminal device 102 and the second target client runs in the terminal device 110, where the first target client may be a viewer client of a live-streaming application and the second target client is an anchor client of the live-streaming application). The terminal device 102 and the terminal device 110 each comprise a human-computer interaction screen, a processor and a memory. The human-computer interaction screen is used for displaying the live picture and the interaction information stream within the picture (such as the live interface shown in fig. 1), and also for providing a human-computer interaction interface to receive human-computer interaction operations for feeding back on the current live picture. The processor is used for generating an interaction instruction in response to the human-computer interaction operation and sending the interaction instruction to the server. The memory is used for caching the live video stream. A viewer client of the live-streaming application runs in the terminal device 102, so that the first object account can watch the live picture and send interaction information; correspondingly, an anchor client of the live-streaming application runs in the terminal device 110, so that the second object account starts a live task in the terminal device 110 and sends a live data stream. The first object account may be a viewer account, and the second object account may be an anchor account.
In addition, the server 106 includes a processing engine for performing storage or reading operations on the database 108. Specifically, the processing engine reads the interaction information stream fed back on the live video from the database 108 and performs stall analysis on the video frame sequence corresponding to the target interaction information.
Assuming that a first object account is logged in on the terminal device 102 in fig. 1 and a second object account is logged in on the terminal device 110, the specific procedure of this embodiment includes the following steps: in step S102, the terminal device 110 sends live data to the terminal device 102 and the server 106 through the network 104; next, in step S104, the current live picture is displayed in the display interface of the terminal device 102, and interaction information is fed back on the current live picture in response to an interaction operation; step S106 is then executed, in which the terminal device 102 sends the interaction information to the terminal device 110 and the server 106 through the network 104; next, in step S108, the interaction information stream is displayed in the terminal device 102; finally, in steps S110 to S116, upon receiving the live data stream and the interaction information stream, the server acquires the interaction information stream fed back on the live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, a target video frame sequence associated with the target interaction information is extracted from the live video, wherein the target interaction information carries keywords describing that the live video is stuck; a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold is acquired, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and the stuck video frame at which the live video stalls is determined by means of the target video frame pair.
As an alternative embodiment, when the terminal device 102 or the terminal device 110 has a relatively high computing capability, the steps S110 to S116 may be performed by the terminal device 102 or the terminal device 110. Here, this is an example, and is not limited in any way in the present embodiment.
Optionally, in this embodiment, the above terminal device may be a terminal device configured with the target client, and may include, but is not limited to, at least one of the following: a mobile phone (e.g., an Android phone or an iOS phone), a notebook computer, a tablet computer, a palmtop computer, an MID (Mobile Internet Device), a PAD, a desktop computer, a smart television, and the like. The target client may be a video client, an instant messaging client, a browser client, an education client, or the like that supports live video playback. The network may include, but is not limited to, a wired network and a wireless network, where the wired network includes a local area network, a metropolitan area network and a wide area network, and the wireless network includes Bluetooth, WIFI and other networks enabling wireless communication. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud server. The above is merely an example and does not limit the present embodiment in any way.
Through the above embodiment of the present application, an interaction information stream fed back on the live video is acquired, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, a target video frame sequence associated with the target interaction information is extracted from the live video, wherein the target interaction information carries keywords describing that the live video is stuck; a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold is acquired, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and the stuck video frame at which the live video stalls is determined by means of the target video frame pair. In this way, a video frame sequence suspected of stalling is located through the interaction information, the reality of the stall is verified according to the degree of difference determined from the area of the difference region between adjacent frames in that sequence, and the stuck video frame is finally determined from the comparison result of the video frames, thereby solving the technical problem that existing live-video stall discovery methods position stuck video frames inaccurately.
As an alternative embodiment, as shown in fig. 2, the method for positioning a stuck video frame includes the following steps:
S202, acquiring an interaction information stream fed back on a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video;
the interactive information flow is described below. It can be understood that the interactive information stream is an information stream formed by a plurality of pieces of interactive information fed back by the user account for the current live video. The form of the interaction information can be information which can be displayed in the current live broadcast picture, and can also be feedback information received by a background. For example, in the case that the interaction information is information that can be displayed in the current live broadcast picture, the interaction information may be bullet screen information displayed in the current live broadcast video picture, comment information displayed in the current live broadcast picture, or gift information displayed in the current live broadcast picture; under the condition that the interaction information is feedback information received by a background, the interaction information flow can also be praise information, collection information, coin-feed information, report information, complaint information and the like fed back to the current live video.
S204, extracting a target video frame sequence associated with target interaction information from the live video in a case where the target interaction information is identified in the interaction information stream, wherein the target interaction information carries keywords describing that the live video is stuck;
The above target interaction information is further described below. The target interaction information may be interaction information fed back by a user account to indicate that the current live picture is stuck. For example, when the interaction information stream is a bullet-screen (danmaku) stream, the target interaction information may be bullet-screen comments carrying the keyword "卡" (stuck/laggy), such as "So laggy!", "Why is it so laggy?" or "Is it frozen?";
when the interaction information stream includes comments in the live comment area, the target interaction information may likewise be comments carrying the keyword "卡" (stuck/laggy), for example: "So laggy!", "Why is it so laggy?", "Is it frozen?" and the like;
when the interaction information stream carries gift information for the current live picture, the target interaction information may also be a gift identifier sent by a user to indicate that the current live picture is stuck, for example: a gift identifier indicating a live gift that displays "卡" (stuck) characters full-screen over the live picture, or a gift identifier indicating a gift that displays "卡" (stuck) characters falling across the live picture in a snowflake animation;
when the interaction information stream carries touch feedback on the current live picture, the target interaction information may be trigger information of the user account on a target control in the live interface. For example, several controls may be displayed in the live interface of the user account to receive one-tap feedback from the user, such as a first control "Stuck" for reporting that the current picture is frozen, a second control "Like" for expressing approval of the current live content, and a third control "Dislike" for expressing disapproval of the current live content; the target interaction information may then be the trigger information of the first control fed back by the user account.
It should be understood that the types of interaction information and target interaction information described above are merely examples, and are not limiting on the specific embodiments of the method of the present application.
Further, when the target interaction information is identified from the interaction information stream, the associated video frame sequence can be determined according to the target interaction information. The video frame sequence associated with the target interaction information may be a video frame sequence associated with the send time of the target interaction information; for example, when the target interaction information is sent at the 13th minute of the live video, the video frame sequence within the 13th-minute video segment of the live video is acquired. The video frame sequence associated with the target interaction information may also be a video frame sequence associated with the content of the target interaction information; for example, when the target interaction information mentions the anchor's avatar, the video frame sequence containing the anchor-avatar image is acquired from the live video. The above associations are merely examples, and the actual association is not limited thereto.
In this embodiment, a target video segment corresponding to the send time of the target interaction information may be determined, and the target video frame sequence may be obtained by sampling the target video segment at a certain sampling frequency. For example, a target video frame sequence may be sampled from the target video segment at a rate of 1 frame per second. The sampling period may also be determined according to the observation ability of the human eye: the human eye is sensitive to a picture that remains unchanged for 0.3 seconds and readily judges such a picture as stuck, so the sampling period for acquiring the video frame sequence may be set according to this judgment period, i.e., the target video frame sequence may be sampled from the target video segment at a period of 0.3 s/frame. The reality of the target interaction information is then verified against the target video frame sequence acquired at this eye-sensitive sampling rate.
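The patent does not prescribe an implementation for this sampling step; as a minimal sketch, assuming the live video is available as a recorded file and OpenCV (Python) is used, the segment around the send time could be sampled at the 0.3 s period mentioned above:

```python
# Illustrative sketch only; OpenCV (cv2) and the file-based access are
# assumptions, and the 0.3 s period follows the example value above.
import cv2

def sample_frames(video_path, start_sec, end_sec, period_sec=0.3):
    """Return grayscale frames sampled every `period_sec` seconds
    from the interval [start_sec, end_sec) of the recorded live video."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    t = start_sec
    while t < end_sec:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000.0)  # seek to the timestamp
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        t += period_sec
    cap.release()
    return frames

# e.g. a window around the 13th minute mentioned above, at 0.3 s/frame:
# seq = sample_frames("live_record.mp4", 13 * 60, 13 * 60 + 60)
```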
S206, acquiring a target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence smaller than or equal to a target threshold value, wherein the difference degree is determined based on the area of a difference region between the two adjacent video frames;
It will be appreciated that in this embodiment, in the case where a video frame sequence associated with the target interaction information is acquired, the degree of difference between every two adjacent video frames in the video frame sequence may be further acquired. In this embodiment, the above-described degree of difference may be determined according to the area of the difference region between two adjacent video frames.
As an alternative embodiment, the above-mentioned difference region may be a region where there is an image difference between two adjacent video frames, for example, may be a region where there is a difference in image color, a region where there is a difference in image resolution, a region where there is a difference in image sharpness, a region where there is a difference in objects displayed in an image, or the like.
An example of such a difference region is described below with reference to fig. 3. As shown in diagrams (a) and (b) of fig. 3, both diagrams show a virtual scene; in the upper-right corner of diagram (b) a virtual building is additionally displayed, so the region occupied by the virtual building is the difference region 301 between diagrams (a) and (b) of fig. 3. It can be appreciated that the difference region determined in this embodiment is only an example, and the kinds of difference regions that can actually be selected are not limited.
Further, after the difference region is determined by comparison, the area of the difference region can be obtained, and the degree of difference indicating how much the adjacent video frames differ can then be determined based on that area. Optionally, the degree of difference determined from the area of the difference region may be proportional to that area; for example, the larger the area of the difference region, the greater the degree of difference between the two adjacent video frames.
In an alternative manner, in the case where there are a plurality of difference regions, the above-mentioned degree of difference may be determined according to the total area of all the difference regions; in another alternative manner, in the case that there are a plurality of difference regions, the difference degree may be determined according to the total area of a part of the difference regions, for example, a difference region with a difference area greater than a certain threshold may be determined as a target difference region from all the difference regions, and the total area of the target difference region is obtained to determine the difference degree of the adjacent frames; in yet another alternative manner, a difference region with a difference degree greater than a certain threshold may be determined from all the difference regions as the target difference region, for example, a difference region with a hue difference degree greater than a certain threshold may be determined as the target difference region, and the total area of the target difference regions may be acquired to determine the difference degree of the adjacent frames. The above manner of acquiring the degree of difference is merely an example, and the manner of actually acquiring the degree of difference between two adjacent video frames is not limited.
In this method, once the degree of difference between every two adjacent frames is obtained, the video frames whose degree of difference is less than or equal to the target threshold are taken as a target video frame pair. It can be appreciated that, in this embodiment, if the difference between two adjacent video frames is below a certain threshold, a suspected stall in the corresponding video segment can be inferred from those two adjacent video frames.
S208, determining, by means of the target video frame pair, the stuck video frame at which the live video stalls.
In this embodiment, once the target video frame pair is acquired, it can be further used to determine the stuck video frame at which the live video stalls.
For example, when the target video frame pair is detected, a verification video frame sequence matching the target video frame pair may be further acquired for stall verification. If the frame interval of the target video frame pair is 0.5 s, a video frame sequence with a finer frame interval can be acquired from that portion of the live video, for example a verification sequence sampled at 0.05 s intervals within the 0.5 s window containing the target video frame pair, and the stuck video frame is then accurately determined based on this verification sequence.
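Reusing the hypothetical sample_frames helper sketched above, the finer verification sequence described here could, for instance, be obtained as follows (the window position is an illustrative value, not one given by the patent):

```python
# Assumed example: re-sample the 0.5 s window that contains the target
# video frame pair at a finer 0.05 s period for stall verification.
pair_start_sec = 13 * 60 + 7.0   # illustrative position of the target pair
check_seq = sample_frames("live_record.mp4",
                          pair_start_sec, pair_start_sec + 0.5,
                          period_sec=0.05)
```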
For another example, when the target video frame pair is obtained, visual content analysis can be performed on the picture content of the target video frames to determine whether a real stall occurs in them; alternatively, the target video frames and the corresponding stall event information can be reported for manual processing and analysis, so as to further determine the stuck video frame at which the stall occurs.
Through the above embodiment of the present application, an interaction information stream fed back on the live video is acquired, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by viewer accounts watching the live video; in a case where target interaction information is identified in the interaction information stream, a target video frame sequence associated with the target interaction information is extracted from the live video, wherein the target interaction information carries keywords describing that the live video is stuck; a target video frame pair in which the degree of difference between two adjacent video frames in the target video frame sequence is less than or equal to a target threshold is acquired, wherein the degree of difference is determined based on the area of a difference region between the two adjacent video frames; and the stuck video frame at which the live video stalls is determined by means of the target video frame pair. In this way, a video frame sequence suspected of stalling is located through the interaction information, the reality of the stall is verified according to the degree of difference determined from the area of the difference region between adjacent frames in that sequence, and the stuck video frame is finally determined from the comparison result of the video frames, thereby solving the technical problem that existing live-video stall discovery methods position stuck video frames inaccurately.
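To tie steps S202 to S208 together, the following sketch strings them into one routine. The keyword list, window size and threshold are illustrative assumptions, and sample_frames / max_difference_area are hypothetical helpers sketched elsewhere in this description, not names used by the patent:

```python
# Hypothetical orchestration of S202-S208; all values are illustrative.
STUCK_KEYWORDS = ["卡", "卡顿", "freeze", "lag"]  # assumed keyword list

def locate_stuck_candidates(interactions, video_path, window_sec=15,
                            period_sec=0.3, target_threshold=500.0):
    """S202-S208: scan the interaction stream, sample the associated
    segment and collect adjacent frame pairs whose degree of difference
    is at or below the target threshold."""
    candidates = []
    for msg in interactions:                           # S202: interaction stream
        if not any(k in msg["text"] for k in STUCK_KEYWORDS):
            continue                                   # not target interaction info
        t = msg["send_time_sec"]                       # S204: associated segment
        frames = sample_frames(video_path, t - window_sec, t + window_sec,
                               period_sec)
        for prev, curr in zip(frames, frames[1:]):     # S206: adjacent pairs
            if max_difference_area(prev, curr) <= target_threshold:
                candidates.append((msg, prev, curr))   # S208: stall candidates
    return candidates
```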
As an optional implementation manner, the acquiring the target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence being less than or equal to the target threshold value includes: sequentially taking every two adjacent video frames in the target video frame sequence as a current video frame pair to be compared, and executing the following operations:
s1, determining at least one difference area between current video frame pairs;
s2, determining a target area corresponding to a target difference area from areas corresponding to each difference area in at least one difference area, wherein the target area is larger than the areas of other difference areas except the target difference area in at least one difference area;
s3, determining the current difference degree corresponding to the current video frame pair based on the target area;
and S4, determining the current video frame pair as a target video frame pair under the condition that the current difference degree is smaller than or equal to a target threshold value.
In this embodiment, when the target video frame sequence is acquired, two adjacent video frames in the target video frame sequence are sequentially acquired as the current video frame pair, and the difference between the two adjacent video frame pairs is determined by the method.
In the process of processing the current video frame pair, under the condition that at least one difference area of the current video frame pair is determined, a target difference area between the current video frame pair is further determined according to area comparison among a plurality of difference areas, and the difference degree of the current video frame pair is determined according to the target difference area.
As a preferred mode, the difference region with the largest area among the plurality of difference regions can be determined as the target difference region, and the degree of difference of the current video frame pair can be determined from the area of that region; for example, the area of the target difference region may be used directly as the degree of difference of the current video frame pair.
Further, once the degree of difference of the current video frame pair is acquired, it is compared with the target threshold, and when the current degree of difference is less than or equal to the target threshold, the current video frame pair is determined as a target video frame pair for determining the stuck video frame.
It should be noted that, in this embodiment, the interaction information is first used to determine the video frame sequence suspected of stalling. Because the interaction information is sent by user accounts, it may contain noise, for example untruthful interaction information sent by user accounts that are simply following the crowd; therefore, the frame images in the video frame sequence must be further analyzed by simulating how the human eye perceives stalling. The human eye generally has low perception of small image changes: for example, the second difference region 402 in diagram (b) of fig. 4 has only one more line than the corresponding position in diagram (a) of fig. 4 and is hard for the human eye to notice directly; the third difference region 403 in diagram (b) of fig. 4 shows a color change in only a small region, and the human eye is likewise insensitive to such a small-range color change compared with the corresponding position in diagram (a) of fig. 4; the first difference region 401 in diagram (b) of fig. 4 has the largest area compared with the second difference region 402 and the third difference region 403, and the human eye usually notices the region with the largest change between images, so whether a stall occurs is judged from that largest-change region. Therefore, to simulate the observation ability of the human eye, this embodiment obtains the difference between two adjacent video frames in the above manner, which makes it possible to accurately verify the accuracy of the interaction information fed back by user accounts.
Further, in some cases there may be many tiny difference regions between two adjacent video frames; if the total area of these tiny difference regions were taken as the degree of difference, the verification result could be the opposite of what the human eye observes. As shown in fig. 5, a number of stall stripes are displayed in diagram (b) of fig. 5 compared with diagram (a) of fig. 5, and a user viewing such a picture can easily judge that it is stuck; but if the total area of the many small difference regions were used as the degree of difference, the summed area could become large, and when the total difference area between diagrams (a) and (b) of fig. 5 exceeds the target threshold, the conclusion would be that no stall occurred between the two images, which contradicts what the human eye observes and may impair the stall verification of the video frames.
Further, in this embodiment, the target video frame pair is determined by performing the following operations on every two adjacent video frames: determining at least one difference region between the current video frame pair; determining, from the areas of the difference regions in the at least one difference region, a target area corresponding to a target difference region, wherein the target area is larger than the area of every other difference region in the at least one difference region; determining the current degree of difference corresponding to the current video frame pair based on the target area; and determining the current video frame pair as a target video frame pair when the current degree of difference is less than or equal to the target threshold. The degree of difference is thus determined from the area of the largest difference region, and the target video frame pair used for determining the stuck video frame is determined from this degree of difference, so that the observation of the human eye is simulated and the accuracy of the target interaction information contained in the interaction information stream is verified, thereby improving the accuracy of stuck-video-frame verification.
As an optional embodiment, the determining at least one difference region between the current video frame pair includes:
S1, acquiring first pixel data corresponding to the pixel points in a first video frame of the current video frame pair and second pixel data corresponding to the pixel points in a second video frame of the current video frame pair;
S2, comparing the first pixel data with the second pixel data to obtain a difference reference map, wherein the difference reference map is used for indicating the pixel differences between the pixel points in the first video frame and the second video frame;
S3, performing contour extraction on the difference reference map to obtain at least one difference region in the difference reference map.
It can be appreciated that, in this embodiment, when the current video frame pair is acquired, the pixel-value difference of each pixel of the two video frames can be obtained, so that the difference reference map is determined from the pixel-value differences of the individual pixel points. For example, if the pixel values of the first nine pixels of the first row of the first video frame in the current video frame pair are all 100 and the pixel values of the first nine pixels of the first row of the second video frame are all 110, then the pixel values of the first nine pixels of the first row of the difference reference map obtained from the first and second video frames are all 10.
When the above difference reference map is acquired, a point in it whose pixel value is not 0 may be a pixel point of a difference region in which the first video frame and the second video frame differ. The difference regions in the first and second video frames can thus be determined from the regions of the difference reference map whose pixels are not 0.
When the difference regions are acquired from the difference reference map, the shapes of the difference regions can be obtained by a contour extraction algorithm, and the area of each difference region can be further obtained by a contour area algorithm. The area of each difference region in the difference reference map can thereby be determined.
Through the above embodiment of the present application, the first pixel data corresponding to the pixel points in the first video frame and the second pixel data corresponding to the pixel points in the second video frame of the current video frame pair are acquired; the first pixel data and the second pixel data are compared to obtain a difference reference map, wherein the difference reference map indicates the pixel differences between the pixel points of the first and second video frames; and contour extraction is performed on the difference reference map to obtain at least one difference region in it, so that the difference regions between the first video frame and the second video frame can be determined quickly and accurately.
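A minimal sketch of this difference-region extraction (OpenCV in Python is assumed; the binarization threshold, contour mode and example target threshold are illustrative choices, not values specified by the patent; the OpenCV 4 return signature of findContours is assumed):

```python
import cv2

def difference_regions(frame_a, frame_b, bin_threshold=25):
    """Build the difference reference map of two grayscale frames and
    return the contours of its difference regions."""
    diff = cv2.absdiff(frame_a, frame_b)             # difference reference map
    # keep only pixels whose difference is clearly non-zero
    _, mask = cv2.threshold(diff, bin_threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours

def max_difference_area(frame_a, frame_b):
    """Degree of difference = area of the largest difference region."""
    contours = difference_regions(frame_a, frame_b)
    return max((cv2.contourArea(c) for c in contours), default=0.0)

# S4: the pair is kept as a target video frame pair when the degree of
# difference does not exceed the target threshold (value is illustrative).
def is_target_pair(frame_a, frame_b, target_threshold=500.0):
    return max_difference_area(frame_a, frame_b) <= target_threshold
```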
As an optional embodiment, after comparing the first pixel data and the second pixel data to obtain the difference reference map, the method further includes:
s1, sequentially taking each pixel point in the difference reference map as a current pixel point, and executing the following operations: determining a neighborhood pixel set corresponding to the current pixel point, wherein the distance between each pixel point in the neighborhood pixel set and the current pixel point is smaller than a distance threshold;
s2, determining a median value of pixel values matched with the neighborhood pixel set according to the pixel values of all the pixel points in the neighborhood pixel set;
and S3, taking the median value of the pixel values as the pixel value of the current pixel point.
In this embodiment, after the difference reference map is obtained, median filtering may be performed on it to smooth the noise points in the map, so as to remove their influence on the verification result.
Alternatively, the neighborhood pixel set may be a pixel point set composed of a plurality of pixel points surrounding the current pixel point, and the distance threshold may be a pixel distance.
The above method is described below with reference to fig. 6. As shown in diagram (a) of fig. 6, nine pixel points of a video frame are represented as a nine-square grid, where the number in each square is the pixel value of that pixel. It can be seen that in diagram (a) of fig. 6 the pixel value of the centre pixel is 200 while the pixel values in its neighborhood are all between 150 and 160, so the centre pixel is a noise point in this image region. Following the above manner, the centre point of diagram (a) of fig. 6 is taken as the current pixel point, the 8 pixels surrounding it are taken as the neighborhood pixel set with pixel values (150, 151, 152, 153, 155, 156, 157, 158), the median of this set of pixel values is obtained, and the value 154 is then used as the pixel value of the current pixel point. As shown in diagram (b) of fig. 6, after the pixel value of the centre pixel is updated to the median of the neighboring pixel values, the region is displayed more uniformly (i.e., the colors are more similar).
Through the above embodiment of the present application, each pixel point in the difference reference map is taken in turn as the current pixel point and the following operations are performed: determining the neighborhood pixel set corresponding to the current pixel point, wherein the distance between each pixel point in the neighborhood pixel set and the current pixel point is smaller than the distance threshold; determining the median pixel value matching the neighborhood pixel set from the pixel values of the pixel points in the set; and using this median as the pixel value of the current pixel point. Median filtering is thus performed on the difference reference map, reducing the influence of noise points in the map on the detection of difference regions.
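A sketch of this neighborhood-median smoothing, under the assumption of a 3x3 neighborhood with the centre pixel excluded as in the fig. 6 example (in practice cv2.medianBlur, which includes the centre pixel, is the usual built-in):

```python
import numpy as np

def neighborhood_median_filter(diff_map):
    """Replace each pixel of the difference reference map with the median
    of its surrounding pixels (3x3 neighborhood, centre excluded)."""
    padded = np.pad(diff_map, 1, mode="edge").astype(np.float32)
    out = np.empty(diff_map.shape, dtype=np.float32)
    h, w = diff_map.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + 3, x:x + 3].flatten()
            neighbours = np.delete(window, 4)   # drop the centre pixel
            out[y, x] = np.median(neighbours)
    return out.astype(diff_map.dtype)

# Check against the fig. 6 example: the median of the eight neighbour
# values (150, 151, 152, 153, 155, 156, 157, 158) is indeed 154.
```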
The method for obtaining the target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence being less than or equal to the target threshold is described below in connection with a specific embodiment.
After the corresponding video clip is acquired according to the target interaction information, the key task is to judge whether the video clip is stuck. This judgment is performed after the video clip has been decomposed into frames.
The specific steps can be as shown in fig. 7, including the following steps:
s702, processing the contrast of the current video frame pair;
it can be appreciated that, before the step S702, the gray scale processing may be performed on the current video frame pair, that is, the color video frame image is converted into a gray scale image, that is, the RGB values of each pixel point in the original color video frame image are converted into corresponding gray scale values. In this embodiment, the converted current video frame-to-image may be as shown in fig. 8.
In conventional methods, a quantized result is usually produced by computing the similarity of the two images, but this approach suffers from interference from averaging, poor agreement with human visual perception, difficulty in choosing a threshold, and the like. In this embodiment, the dissimilarity of the pictures is computed mainly from the largest difference area between the two pictures, so that the detection result is close to what the human eye observes.
In fig. 8, the difference between the two images is clearly visible to the human eye, yet such images are difficult for common industry algorithms to distinguish, which is why they are used here to illustrate the effect of this embodiment. As shown in diagrams (a) and (b) of fig. 8, the first region 801 and the second region 802 are difference regions, and the human eye can clearly observe that the two regions are displayed differently (with different brightness); in the conventional approach, however, the similarity of the two images in fig. 8 is often high, so they cannot be distinguished and it is easy to conclude, incorrectly, that a stall occurs between them.
In this step, contrast enhancement is applied to the two images. Because human attention is uneven and tends to be drawn to the parts that differ in brightness and structure, contrast-enhancement preprocessing of the two images helps preserve the parts with higher visual attention when the difference is computed in the subsequent steps.
In fig. 8, diagram (a) of fig. 8 is denoted by index 0 and diagram (b) of fig. 8 by index 1. For the contrast processing, Contrast Limited Adaptive Histogram Equalization (CLAHE) is used; it is a variant of adaptive histogram equalization that reduces noise amplification by limiting how much the contrast of each pixel can be amplified.
The processing in this step can be written as:
$C_{img\_n} = \mathrm{Clahe}(img\_n), \quad n \in \{0, 1\}$
where $C_{img\_n}$ denotes the image after contrast-limited adaptive histogram equalization and $\mathrm{Clahe}(img\_n)$ denotes the contrast-limited adaptive histogram equalization operation.
As shown in fig. 9, applying contrast-limited adaptive histogram equalization to diagram (a) of fig. 9 yields diagram (c) of fig. 9, and applying it to diagram (b) of fig. 9 yields diagram (d) of fig. 9. It can be seen that, after the processing, the main difference between the first region 901 in diagram (c) of fig. 9 and the second region 902 in diagram (d) of fig. 9 becomes more prominent.
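A minimal sketch of this step (OpenCV assumed; the clipLimit and tileGridSize values are illustrative defaults, not parameters given by the patent):

```python
import cv2

def enhance_contrast(gray_img_0, gray_img_1):
    """Contrast-limited adaptive histogram equalization of both frames,
    i.e. C_img_n = Clahe(img_n) for n in {0, 1}."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray_img_0), clahe.apply(gray_img_1)
```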
S704, determining a contrast difference map according to the pixel difference value of the current video frame pair;
After the saliency (contrast enhancement) processing in S702, a grayscale difference is computed at this stage, that is, the grayscale difference between the two pictures is obtained, which can be expressed as:
Sub_img = abs(C_img_0 − C_img_1)
wherein Sub_img denotes the grayscale difference image, and abs(C_img_0 − C_img_1) denotes the absolute value of the difference between the two contrast-enhanced images.
Fig. 10 shows the contrast difference map obtained by performing the pixel difference processing on fig. 9 (c) and fig. 9 (d); the difference region 1001 shown in fig. 10 is the region corresponding to the first region 901 in fig. 9 (c) and the second region 902 in fig. 9 (d).
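The pixel-difference step could be sketched as follows with OpenCV; the function name is illustrative:

```python
import cv2

def contrast_difference_map(c0, c1):
    """Sub_img = |C_img_0 - C_img_1|: per-pixel absolute difference
    between the two contrast-enhanced grayscale frames."""
    return cv2.absdiff(c0, c1)
```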
S706, median filtering processing is carried out on the contrast difference graph;
After the image differencing in the previous stage, some image noise points can be observed in the contrast difference map of fig. 10. These noise points are not the parts of primary concern when calculating the difference area, so their influence needs to be removed in this step.
In this embodiment, because the two images are relatively close, the grayscale difference pixel values are low and difficult to observe; the image shown in fig. 11 (a) is a grayscale-difference binarized image obtained by binarizing the grayscale image of fig. 10, and is used only to illustrate the noise phenomenon in the grayscale difference image. The parts outlined by the boxes in fig. 11 (a) are noise points; they are not difference parts that the human eye can actually observe and are therefore interference items in the difference comparison, so they need to be processed by the method in this step.
The basic principle of median filtering is to replace the value of a point in an image with the median of the values in its neighborhood, so that the surrounding pixel values approach the true value and isolated noise points are eliminated. The processing of each pixel point can be expressed by the formula:
g(x,y)=med{f(x-k,y-l),(k,l∈W)}
wherein f (x, y), g (x, y) are the original image and the processed image, respectively, and W is a two-dimensional template.
At this stage, the snowflake noise of the image is eliminated by median filtering while the larger difference regions are retained. The overall median processing can be expressed as:
M_img = med(Sub_img)
wherein M_img denotes the image after median filtering, and med(Sub_img) denotes the median filtering operation.
The result after median filtering may be as shown in fig. 11 (b), where the noise points at the positions of the white boxes in fig. 11 (a) have been eliminated by the above median filtering. (Note that, in this embodiment, because the two images are relatively close, the grayscale difference pixel values are low and difficult to observe; the image shown in fig. 11 (b) is a grayscale-difference binarized image obtained by median filtering the grayscale image of fig. 10 and then binarizing it, and is used only to illustrate the noise-elimination effect; in the actual processing, binarization is not yet performed at this step.)
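A sketch of the median filtering step with OpenCV; the 3 x 3 kernel size is an assumption, since this embodiment does not fix the window size:

```python
import cv2

def denoise_difference_map(sub_img, kernel_size=3):
    """M_img = med(Sub_img): median filtering with a kernel_size x kernel_size
    window removes isolated ("snowflake") noise points while keeping the
    larger difference regions."""
    return cv2.medianBlur(sub_img, kernel_size)
```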
S708, binarizing the contrast difference graph after the median filtering;
After the median filtering, if the grayscale distributions of the two images are similar, the resulting grayscale difference is low and the edges are not obvious enough from a visual point of view. Therefore, the whole image is binarized to make the difference region explicit.
This step can be expressed as:
T_img = Thresh(M_img, T_MIN, T_MAX)
wherein T_img denotes the binarized picture; Thresh(M_img, T_MIN, T_MAX) denotes the binarization method; T_MIN denotes the lower limit of the binary gray value, i.e. pixels below this gray value are set to 0, and in this embodiment T_MIN is taken as 15; T_MAX denotes the upper limit of the binary gray value, i.e. pixels with gray values above T_MIN and not exceeding T_MAX are set to 255, and in this embodiment T_MAX is taken as 255.
It can be understood that, through this step, a binarized image as shown in fig. 11 (b) can be obtained. As shown, the original noise points in the white boxes have been removed.
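The binarization step might look as follows with OpenCV, using the T_MIN = 15 and T_MAX = 255 values given above; the use of cv2.THRESH_BINARY is an assumption about how Thresh(...) is realized:

```python
import cv2

def binarize_difference_map(m_img, t_min=15, t_max=255):
    """T_img = Thresh(M_img, T_MIN, T_MAX): pixels at or below t_min become 0,
    pixels above t_min become t_max (255), making the difference region explicit."""
    _, t_img = cv2.threshold(m_img, t_min, t_max, cv2.THRESH_BINARY)
    return t_img
```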
S710, calculating the maximum difference area;
Finally, according to the binarized image obtained in step S708, the contours in the image are obtained by a contour extraction algorithm, and the area of each contoured region is calculated.
It should be noted that if the sum of the areas of all difference regions over the whole image were counted in this step, interference from insignificant regions would make the calculation result hard to interpret and the threshold criterion hard to divide. Therefore, the maximum difference area, which better matches the human-eye observation standard, is taken as the overall calculation result. This step can be expressed as:
Contours = FindContours(T_img)
Areas = ContourArea(Contours)
Max_Area = max(Areas)
Freeze = Max_Area < FREEZE_AREA_THRESHOLD
wherein Contours denotes the contours extracted from the image and FindContours(·) is the contour extraction algorithm; Areas denotes the list of contour areas and ContourArea(Contours) is the contour area calculation method; Max_Area denotes the maximum area, i.e. the largest of the area values of the difference regions obtained by max(Areas); Freeze denotes the final freeze determination result for the two images, and FREEZE_AREA_THRESHOLD denotes the freeze determination area threshold.
Based on the practical usage scenario and interpretability considerations, FREEZE_AREA_THRESHOLD is set to 100, which corresponds to an area of 10 px x 10 px in the image.
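Putting the contour extraction, maximum-area calculation and threshold judgment together, a sketch under the assumption that OpenCV's findContours/contourArea stand in for FindContours/ContourArea could be:

```python
import cv2

FREEZE_AREA_THRESHOLD = 100  # about 10 px x 10 px, as chosen in this embodiment

def is_frozen(t_img, area_threshold=FREEZE_AREA_THRESHOLD):
    """Extract contours from the binarized difference map, take the area of the
    largest difference region, and flag a freeze (stuck pair) when that maximum
    area stays below the threshold."""
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(t_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return True  # no visible difference region at all
    max_area = max(cv2.contourArea(c) for c in contours)
    return max_area < area_threshold
```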
With the above embodiment of the present application, the target video frame pair is determined by performing the following operations on every two adjacent video frames: determining at least one difference region between the current video frame pair; determining, from the areas of the difference regions, the target area corresponding to the target difference region, the target area being larger than the areas of the other difference regions; determining the current difference degree of the current video frame pair based on the target area; and, when the current difference degree is smaller than or equal to the target threshold, determining the current video frame pair as a target video frame pair. Determining the difference degree from the area of the largest difference region and using it to select the target video frames for locating stuck video frames simulates the observation result of the human eye and provides an accuracy check of the target interaction information included in the interaction information stream, thereby improving the accuracy of stuck-video-frame verification.
As an optional implementation manner, the determining, by using the target video frame pair, a stuck video frame in the live video includes:
S1, performing image scene recognition on the video frames in the target video frame pair;
S2, acquiring the next target video frame pair when the recognition result indicates that the video frames in the target video frame pair are video frames carrying the target image scene identifier;
S3, determining the target video frame pair as a stuck video frame when the recognition result indicates that the video frames in the target video frame pair are video frames that do not carry the target image scene identifier.
It should be noted that, when a target video frame pair is detected in the video frame sequence, it is further necessary to determine whether the "freeze" displayed in the target video frame pair is a non-dynamic image within the expected range. For example, when the current live broadcast is a game live broadcast, some game scenes remain static for a long time, such as the ranking (matchmaking) stage and the waiting stage of the game. As shown in fig. 12, in the matchmaking stage, when no teammate has been matched, the game interface remains unchanged at the current interface for a long time. Such a scene is a non-dynamic scene image within the expected range and should not be judged as a freeze. Therefore, when the target video frame pair is acquired, the image scene in the target video frame pair needs to be further recognized: when the recognized scene is the expected target scene, the current video frame pair is judged to be a non-stuck video frame; when the recognition result indicates that the video frames in the target video frame pair do not carry the target image scene identifier, the target video frame pair is determined as a stuck video frame.
As an optional implementation manner, the identifying the image scene of the video frame in the target video frame pair includes: and carrying out scene recognition on the video frames in the target video frame pair in an image recognition model, wherein the image recognition model is a convolutional neural network obtained after training by using an image sample carrying a label, and the label is used for indicating that the image sample is an image carrying a target image scene identifier.
The above method will be described with reference to a specific embodiment. After a suspected freeze is detected in the video, further image classification is needed to reject noise, because freezes that match expectations may occur. For example, in a live game service, the team-forming screen stays in a static display state by default, which matches the expected service scenario, whereas the service side is more concerned about whether the game battle picture is stuck, as shown in fig. 3 (a) and fig. 3 (b).
For the image scene recognition problem, in this embodiment, after the expected live scenes are labeled, a CNN convolutional neural network model is trained on several scenes such as the game lobby, team forming and battle. The model structure is shown in fig. 13, and each layer of the model is designed as follows (a code sketch follows the list):
A. The RGB image is scaled down to 128 x 128 resolution as an input of 128 x 128 x 3;
B. a 3 x 3 convolution kernel with 16 channels convolves the image to output a 128 x 128 x 16 matrix;
C. a 2 x 2 window performs max-pooling downsampling to output a 64 x 64 x 16 matrix;
D. a 3 x 3 convolution kernel with 32 channels convolves the image to output a 64 x 64 x 32 matrix;
E. a 2 x 2 window performs max-pooling downsampling to output a 32 x 32 x 32 matrix;
F. a 3 x 3 convolution kernel with 64 channels convolves the image to output a 32 x 32 x 64 matrix;
G. a 2 x 2 window performs max-pooling downsampling to output a 16 x 16 x 64 matrix;
H. the last sampling layer is flattened and connected to a fully connected layer, producing a one-dimensional vector of length 128;
I. finally, the one-dimensional vector is fully connected to an output of length n (where n differs according to the set of live scenes).
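A sketch of layers A-I using Keras is given below; the ReLU/softmax activations and the "same" padding are assumptions, since the embodiment only specifies kernel sizes, channel counts and output shapes:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_scene_classifier(num_scenes):
    """CNN sketch following layers A-I: 128x128x3 input, three conv + max-pool
    stages (16, 32, 64 channels), flatten, a 128-unit dense layer, and an
    n-way output over the live scene classes."""
    return keras.Sequential([
        layers.Input(shape=(128, 128, 3)),                         # A
        layers.Conv2D(16, 3, padding="same", activation="relu"),   # B -> 128x128x16
        layers.MaxPooling2D(2),                                    # C -> 64x64x16
        layers.Conv2D(32, 3, padding="same", activation="relu"),   # D -> 64x64x32
        layers.MaxPooling2D(2),                                    # E -> 32x32x32
        layers.Conv2D(64, 3, padding="same", activation="relu"),   # F -> 32x32x64
        layers.MaxPooling2D(2),                                    # G -> 16x16x64
        layers.Flatten(),                                          # H (flatten)
        layers.Dense(128, activation="relu"),                      # H (length-128 vector)
        layers.Dense(num_scenes, activation="softmax"),            # I (length-n output)
    ])
```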
Then, the CNN neural network model is used to recognize the image scene in the target video frame pair, and further processing and verification can be performed when the image scene in the target video frame pair is not an expected scene.
According to this embodiment of the application, image scene recognition is performed on the video frames in the target video frame pair; the next target video frame pair is acquired when the recognition result indicates that the video frames carry the target image scene identifier; and the target video frame pair is determined as a stuck video frame when the recognition result indicates that the video frames do not carry the target image scene identifier. This eliminates the stuck-detection noise caused by live pictures that are static within an expected scene and improves the recognition accuracy of stuck live video frames.
As an optional implementation manner, after determining the target video frame pair as the stuck video frame, the method further includes:
S1, extracting key information from the stuck video frame;
S2, searching a service anomaly database for service fault description information matching the key information;
S3, prompting alarm information of the stuck anomaly when no service fault description information matching the key information is found.
It should be noted that the key information in the stuck video frame may include, but is not limited to, the live room identifier corresponding to the stuck video, the start time of the stuck event, and the end time of the stuck event; the stuck event is uniquely identified by this key information. When such a stuck event is obtained, this embodiment further queries the service anomaly database based on the key information of the stuck event to determine whether the current stuck event is a service fault event that has already been reported; if no service fault description information matching the key information is found, alarm information of the stuck anomaly is prompted, which avoids repeated reporting of the stuck fault event.
Specifically, after the data reporting and filtering, the stuck samples obtained are essentially samples well worth manual analysis. However, under the service's existing data reporting system, the cause of the anomaly behind a stuck sample may already have been captured by that system. In this step, the key information of the stuck sample is used to retrieve the service's anomaly data and determine whether the stuck sample is already known.
At this stage, all remaining stuck samples are unknown anomalous samples that the data reporting system has not discovered, and the relevant parties can be notified by an active alarm to perform manual analysis.
Through this embodiment of the application, key information is extracted from the stuck video frame; the service anomaly database is searched for service fault description information matching the key information; and alarm information of the stuck anomaly is prompted when no matching service fault description information is found. In this way, stuck events not detected by the known detection system are obtained accurately, and manual analysis can be performed on the detected unknown stuck events, thereby improving the detection and analysis of stuck events.
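The query against the service anomaly database could be sketched as below; db.query(...) is a hypothetical interface standing in for whatever reporting system the service actually uses:

```python
def is_known_anomaly(db, room_id, start_time, end_time):
    """Look up the service anomaly database for fault records of this live room
    that overlap the stuck period; only alarm when no such record exists.
    db.query(...) is a hypothetical interface, not a real library call."""
    records = db.query(room_id=room_id, start=start_time, end=end_time)
    return len(records) > 0
```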
As an optional implementation manner, the extracting the target video frame sequence associated with the target interaction information from the live video includes:
S1, acquiring the release time of the target interaction information;
S2, intercepting a target video segment from the live video according to the release time;
S3, extracting the target video frame sequence from the target video segment according to a preset extraction frequency.
It can be appreciated that, in this embodiment, a video segment suspected of being stuck may be located according to the release time of the target interaction information, and the target video frame sequence may then be extracted from that target video segment according to a preset extraction frequency.
Specifically, in this embodiment, according to the sending time of the target interaction information, the target video segment corresponding to that sending time may be determined, and the target video frame sequence may be obtained by sampling the target video segment at a certain sampling frequency. For example, the target video frame sequence may be sampled from the target video segment at a rate of 1 frame per second. The sampling period may also be determined according to the observation capability of the human eye: the human eye is sensitive to a picture that stays unchanged for 0.3 seconds and readily judges it as stuck, so the sampling period for acquiring the video frame sequence may be set to match this judgment period, i.e. the target video frame sequence may be sampled from the target video segment at 0.3 s per frame. The target video frame sequence acquired at this human-eye-sensitive sampling rate is then used to verify the real situation described by the target interaction information.
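Frame sampling from the playback segment might be implemented as follows with OpenCV; the seek-by-timestamp approach and the function name are illustrative:

```python
import cv2

def sample_frames(video_path, start_s, end_s, period_s=1.0):
    """Extract frames from the target segment [start_s, end_s] of a playback
    stream at one frame every period_s seconds (e.g. 1.0 s or 0.3 s)."""
    cap = cv2.VideoCapture(video_path)
    frames, t = [], start_s
    while t <= end_s:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to the timestamp
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        t += period_s
    cap.release()
    return frames
```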
According to this embodiment of the application, the release time of the target interaction information is obtained, the target video segment is intercepted from the live video according to the release time, and the target video frame sequence is extracted from the target video segment according to a preset extraction frequency. The video frame sequence corresponding to the suspected stuck video segment can thus be located accurately according to the release time of the target interaction information, improving the positioning accuracy of the stuck video.
As an optional implementation manner, the capturing the target video segment from the live video according to the release time includes:
S1, taking a reference time before the release time of the target interaction information as the stuck start time, wherein the time interval between the reference time and the release time is smaller than or equal to a target interval threshold;
S2, determining a target time period by using the stuck start time and the release time;
S3, intercepting the target video segment within the target time period from the live video.
Optionally, taking a reference time before the release time of the target interaction information as the stuck start time includes: when the target interaction information comprises at least two items of target interaction information, determining the release time interval between every two adjacent items of target interaction information; aggregating the target interaction information whose release time interval is smaller than the target time interval until at least one interaction information sequence is obtained; and taking a reference time before the release time of the first item of target interaction information in each interaction information sequence as the stuck start time of that interaction information sequence.
Optionally, determining the target time period by using the stuck start time and the release time includes: taking the release time of the last item of target interaction information in each interaction information sequence as the release time of that interaction information sequence; and determining the target time period by using the stuck start time of the interaction information sequence and the release time of the interaction information sequence.
It can be understood that, in the above method, when only one item of target interaction information is obtained, the estimated stuck start time can be obtained by tracing back from the sending time of that interaction information, the sending time itself is taken as the estimated stuck end time, and the video segment between the estimated stuck start time and the estimated stuck end time is taken as the target video segment. The corresponding suspected stuck event can then be verified against this target video segment;
when multiple items of target interaction information are obtained, a clustering operation can be performed on them to obtain a target video segment covering the multiple items, and the corresponding suspected stuck event is verified against that segment. Optionally, when the time interval between two items of target interaction information is smaller than a first preset clustering threshold, the two items are clustered, the estimated start time of the stuck event is determined from the sending time of the first item, and the sending time of the second item is taken as the estimated end time of the stuck event. Alternatively, when the time span covered by multiple items of target interaction information is smaller than a second preset clustering threshold, those items are clustered, the estimated start time of the stuck event is determined from the sending time of the first item, and the sending time of the last item is taken as the estimated end time of the stuck event.
Specifically, when the interaction information is bullet-screen (barrage) information in the live picture, each piece of barrage feedback carries two items of identification information, the live room identifier and the feedback time, so two or more barrages that are close in time can be detected as one task, which improves detection efficiency and reduces the amount of calculation. As shown in fig. 15, when the sending time of the first barrage is "2022-07-07 15:49:19", the detection start time may be taken as 30 s before that message, i.e. "2022-07-07 15:48:49"; when the sending time of the last barrage is "2022-07-07 15:50:30", that sending time may be taken as the end time of the detected stream segment.
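The time-based aggregation of stuck barrages into detection tasks could be sketched as follows; the 30 s backtrack matches the example above, while the 60 s aggregation gap is an assumed value, and the sketch assumes the barrages of one live room are processed together:

```python
from datetime import timedelta

def aggregate_barrages(barrages, gap_s=60, backtrack_s=30):
    """Group stuck-feedback barrages of one live room whose send times are within
    gap_s of each other into a single detection task; each task's segment starts
    backtrack_s before the first barrage and ends at the last barrage's send time.
    Each barrage is a dict like {"room": str, "time": datetime}."""
    barrages = sorted(barrages, key=lambda b: b["time"])
    tasks, group = [], []
    for b in barrages:
        if group and (b["time"] - group[-1]["time"]).total_seconds() > gap_s:
            tasks.append(group)
            group = []
        group.append(b)
    if group:
        tasks.append(group)
    return [{
        "room": g[0]["room"],
        "start": g[0]["time"] - timedelta(seconds=backtrack_s),
        "end": g[-1]["time"],
    } for g in tasks]
```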
Further, a barrage analysis task is formed after barrage aggregation. Based on the three items of key information carried in the barrages, namely the live room identifier, the estimated stuck start time and the stuck feedback time, the playback stream of the live room for the period preceding the stuck feedback can be obtained, and the analysis of video frames is then performed on that playback stream.
With the method of the application, the release time of the target interaction information is obtained, the target video segment is intercepted from the live video according to the release time, and the target video frame sequence is extracted from the target video segment at a preset extraction frequency. Suspected stuck video segments matching the interaction information are thus obtained accurately, and interaction information with close sending times is clustered into one task, which improves the verification efficiency for suspected stuck video segments.
As an optional implementation manner, after obtaining the interactive information stream fed back to the live video, the method further includes:
S1, when a target character is determined from the interaction information stream, acquiring a target word associated with the target character in the interaction information stream;
S2, when the target word is not a reference word in the noise word bank, determining the interaction information containing the target word as the target interaction information, wherein each reference word included in the noise word bank carries the target character and its semantics are not used to indicate a picture freeze event.
Specifically, when the interaction information is live barrages, although there are very many barrages in a live room, the semantics of a stuck-related barrage are quite clear and in most cases contain the keyword character "card" (in the original language the same character appears in both "stuck" and "card"). Character recognition can therefore be performed on the live barrage stream, and when a barrage containing this character is recognized, the following processing can be performed on it.
However, filtering on the keyword "card" alone still lets through some barrages whose semantics are clearly unrelated, such as "bank card" and "membership card". It is therefore necessary to filter the detected barrages containing the character "card" through a lexicon. In this embodiment, the lexicon is accumulated through a period of manual analysis, the recognition accuracy is continuously improved, and the current degree of lexicon accumulation allows the scheme to run efficiently in the production environment.
For example, when the barrage content is "is it stuck?", "why is it so laggy" or "why is it stuck", the content is identified as a target word; when the barrage content is "can I get a membership card", the barrage is filtered out and no stuck-verification task is created for it.
According to this embodiment of the application, when a target character is determined from the interaction information stream, the target word associated with the target character in the interaction information stream is obtained; when the target word is not a reference word in the noise word bank, the interaction information containing the target word is determined as the target interaction information, wherein each reference word included in the noise word bank carries the target character and has semantics that do not indicate a picture freeze event. Noise in the interaction information used to indicate stuck events is thus filtered out, improving the accuracy of locating stuck video frames.
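A minimal sketch of the keyword-plus-noise-lexicon filter; the English noise words are placeholders for the actual lexicon entries accumulated through manual analysis:

```python
NOISE_WORDS = {"bank card", "membership card"}  # illustrative noise-lexicon entries

def is_stuck_feedback(barrage_text, keyword="card", noise_words=NOISE_WORDS):
    """Return True when the barrage contains the stuck keyword character and
    does not merely contain a noise-lexicon word such as "bank card"."""
    if keyword not in barrage_text:
        return False
    return not any(word in barrage_text for word in noise_words)
```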
A complete embodiment of the present application is described below in conjunction with the flowcharts shown in fig. 16 and 17.
First, the interaction scenario of this embodiment is described. As shown in fig. 16, there may be one anchor side, a plurality of viewer sides, a first server for processing live data, and a second server for performing stuck detection. In this embodiment, the anchor side may play the game live and send the game operation information to the first server. The first server runs a game application consistent with the one running on the anchor side, reproduces the anchor's operation behavior according to the received game operation information, and thereby generates the anchor's game operation picture in the first server. Meanwhile, the first server receives the barrage information sent by each viewer side, synthesizes the generated game operation picture with the barrage information to obtain live stream data, and sends the live stream data to each viewer side and to the second server.
That is, in this embodiment, the live stream data is not produced by recording the game screen currently displayed on the anchor side; it is generated in the server according to the anchor's operation behavior. There can therefore be a large difference between the data sent by the anchor side and the data received by the viewer side, so the actual stuck situation at the viewer side cannot be obtained merely by performing stuck detection on the data sent by the anchor side. For example, if an anchor's push stream is stuck but no related anomaly indicator exists, the data side cannot perceive whether the anchor is stuck, even though the actual viewer experience is a stuck stream, and in certain service scenarios the resulting effect is poor. In this embodiment, if an unexpected functional error in the instruction-replay program of the first server causes a freeze, the live recording and push-stream link cannot perceive it and will still record the frozen picture and push it to the viewers. Thus, because the viewers' live stream data is generated by the first server, stuck events in that data cannot be discovered by detecting only the operation information sent from the anchor side. Therefore, in this embodiment, stuck-event detection is performed on the live stream data sent to the viewers, which improves detection accuracy.
As shown in fig. 17, a complete stuck-discovery method in the above scenario is described. This embodiment can be used to detect upstream freezes at the anchor side of a live scenario, and the overall flow comprises the main links of stuck-text analysis and processing, stream query and stream stuck analysis, valid scene classification, anomaly data query and association, and valid stuck alarms.
The stuck-barrage processing link includes step S1702, filtering the stuck barrages across live rooms, and step S1704, stuck-barrage aggregation.
It will be appreciated that this step may be performed in the second server in fig. 16. Upon receiving the live stream data sent by the first server (which includes the barrage information), the second server can strip the barrage information of each live room from the live picture and aggregate and filter it. Among the large number of barrages across live rooms, the barrages containing the keyword "card" are kept, and further semantic exclusion is performed with the stuck lexicon to remove noise keywords such as "bank card" and "membership card".
After obtaining the barrages that contain the character "card" and genuinely mean "stuck", task aggregation can be performed on multiple barrages with close release times, which improves detection efficiency and reduces the amount of calculation.
The stream query link mainly includes step S1706, retrieving the playback stream according to the stuck time point;
after barrages are aggregated into one stream detection task, the elements of the stream segment to intercept are available: the live room identifier, the stream segment start time and the stream segment end time. According to these three elements, the stream segment to be analyzed can be obtained by splicing the playback address parameters. This step corresponds to moving from text analysis to stream analysis.
As shown in fig. 17, when the platform-wide barrages include "live room B - 20220707 15:50:00 - why is it stuck" and "live room F - 20220707 15:51:28 - why is it so laggy??", the corresponding live playback streams are obtained according to the live room identifiers and barrage times of the two barrages, and the subsequent stuck verification is performed.
The stream stuck-detection link mainly includes step S1708, performing stuck detection at frame level, and step S1710, judging whether the stream is stuck: if the playback stream is stuck, the subsequent steps are executed; if it is not stuck, the sample corresponding to the video frames is determined as an invalid analysis sample.
In this embodiment, the frame recognition granularity is currently chosen as 1 frame per second, a choice based on the computation cost after deployment in the actual service scenario and on the detection effect at different granularities. After the frame granularity is determined, the stream stuck-recognition method detects the maximum difference area between two frames to judge whether a freeze occurs between two adjacent frames; this both matches the visual characteristics by which the human eye compares two images and makes the threshold easy to divide. The maximum-difference-area algorithm mainly comprises: contrast enhancement, pixel difference calculation, median filtering, binarization, difference area calculation, and maximum-difference-area threshold judgment;
The image classification link includes step S1712, CNN model classification of valid stuck scenes, and step S1714, judging whether the current frame belongs to a target (expected) scene: if it is not a target scene, the subsequent steps are executed; if it is a target scene, the sample corresponding to the video frame is determined as an invalid analysis sample.
Specifically, even after a freeze is recognized, noise may still exist at this point, because some frozen scenes are consistent with service expectations. For example, in a live game scenario, some non-combat live scenes may not change over a period of time. Therefore, at this stage an image classification is performed to distinguish the labeled, expected scenes and to keep the scenes of concern for freezes, such as battle scenes in a live game.
The data reporting and filtering link mainly includes step S1716, judging whether the current freeze is a known anomaly: if not, the subsequent steps are executed; if it is a known anomaly, the sample corresponding to the video frame is determined as an invalid analysis sample.
As shown in fig. 14, after the preceding steps, the stuck samples filtered through to this stage basically have manual-analysis value. However, the service generally already has data reporting and analysis capabilities, and anomaly data may well have been reported for the stuck scene. Therefore, according to the stuck stream information, the service side's anomaly data is queried to check whether corresponding anomaly data was reported before or after the stuck period. If such data is found, the stuck sample is marked as a known anomaly; otherwise, an alarm is raised and the sample enters the manual analysis stage. The alarm interface in this embodiment may be as shown in fig. 18.
Finally, alarm and platform display are performed. A sample that has been filtered through to this last stage is confirmed to be an anomaly not discovered by data reporting and therefore has analysis value; an active alarm can notify the responsible persons to perform timely manual analysis, and a detailed information aggregation entry is provided for them to view the playback stream, inspect the stuck-detection pictures frame by frame, jump to the anomaly information, and so on.
The technical effect of the above embodiment of the application is described below with reference to fig. 19. As shown in fig. 19, the present technical solution starts from users' stuck-barrage feedback and, through a series of stuck detection, scene classification and data filtering steps, obtains genuinely valid unknown stuck samples. Compared with a purely data-reporting system, this solution can discover unexpected freezes and complements the existing problem-discovery path; compared with purely manual analysis of users' stuck feedback, it greatly reduces labor cost, which is spent only on genuinely valid stuck samples.
Fig. 19 also shows the problems discovered and the benefits obtained with this approach over one month: among 22339 items of stuck feedback, 19 genuinely valid unknown stuck scenes were found, reducing labor cost by nearly a thousand times compared with manually analyzing the full stuck-barrage stream.
According to this embodiment of the application, starting from users' barrage feedback and combining barrage text analysis, picture stuck detection, image classification, data filtering and other means, genuinely valid stuck problems are discovered, which greatly reduces manual review cost and accurately determines the stuck video frames, thereby solving the technical problem that existing methods for finding stuck video frames are inefficient and inaccurate.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In accordance with another aspect of the embodiments of the present application, there is also provided a device for positioning stuck video frames, for implementing the above method for positioning stuck video frames. As shown in fig. 20, the apparatus includes:
The first obtaining unit 2002 is configured to obtain an interaction information stream fed back to the live video, where the interaction information stream includes a plurality of pieces of interaction information sent by an account number of a viewer watching the live video;
an extracting unit 2004, configured to extract, when the target interaction information is determined in the interaction information stream, a target video frame sequence associated with the target interaction information from the live video, where the target interaction information carries a keyword for describing that the live video is stuck;
a second obtaining unit 2006, configured to obtain a target video frame pair in which a difference degree between every two adjacent video frames in the target video frame sequence is less than or equal to a target threshold, where the difference degree is determined based on an area of a difference region between the two adjacent video frames;
the determining unit 2008 is configured to determine a stuck video frame in the live video, where the stuck video frame occurs, by using the target video frame pair.
Optionally, in this embodiment, for the examples implemented by each unit module, reference may be made to the above method embodiments, and details are not repeated here.
According to still another aspect of the embodiment of the present invention, there is further provided an electronic device for implementing the above-mentioned method for positioning a katon video frame, where the electronic device may be a terminal device or a server as shown in fig. 21. The present embodiment is described taking the electronic device as a terminal device as an example. As shown in fig. 21, the electronic device comprises a memory 2102 and a processor 2104, the memory 2102 having stored therein a computer program, the processor 2104 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, acquiring an interaction information stream fed back to a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by an audience account for watching the live video;
s2, extracting a target video frame sequence associated with target interaction information from the live video under the condition that the target interaction information is determined in the interaction information stream, wherein the target interaction information carries keywords for describing the occurrence of blocking of the live video;
s3, acquiring a target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence being smaller than or equal to a target threshold value, wherein the difference degree is determined based on the area of a difference region between the two adjacent video frames;
s4, determining a stuck video frame with stuck in the live video by utilizing the target video frame pair.
Alternatively, as will be appreciated by those skilled in the art, the structure shown in fig. 21 is merely illustrative, and the electronic device may be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, or other terminal devices. Fig. 21 is not limited to the structure of the electronic device and the electronic apparatus described above. For example, the electronics can also include more or fewer components (e.g., network interfaces, etc.) than shown in FIG. 21, or have a different configuration than shown in FIG. 21.
The memory 2102 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for positioning stuck video frames in the embodiments of the present invention; the processor 2104 executes the software programs and modules stored in the memory 2102, thereby performing various functional applications and data processing, that is, implementing the above method for positioning stuck video frames. The memory 2102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 2102 may further include memory located remotely from the processor 2104, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 2102 may be used to store, but is not limited to, information such as elements in a scene picture and positioning information of stuck video frames. As an example, as shown in fig. 21, the memory 2102 may include, but is not limited to, the first acquisition unit 2002, the extraction unit 2004, the second acquisition unit 2006, and the determination unit 2008 of the above positioning apparatus for stuck video frames. In addition, other module units of the above positioning apparatus may also be included, which are not described in detail in this example.
Optionally, the transmission device 2106 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 2106 includes a network adapter (Network Interface Controller, NIC) that can be connected to other network equipment and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 2106 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
In addition, the electronic device further includes: a display 2108 for displaying virtual scenes in the interface; and a connection bus 2110 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting the plurality of nodes through a network communication. Among them, the nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server, a terminal, etc., may become a node in the blockchain system by joining the Peer-To-Peer network.
According to one aspect of the present application, there is provided a computer program product comprising a computer program/instructions containing program code for executing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. When executed by a central processing unit, the computer program performs the various functions provided by the embodiments of the present application.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
According to an aspect of the present application, there is provided a computer readable storage medium, from which a processor of a computer device reads the computer instructions, the processor executing the computer instructions, causing the computer device to perform the above-described method of locating a stuck video frame.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for performing the steps of:
s1, acquiring an interaction information stream fed back to a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by an audience account for watching the live video;
S2, extracting a target video frame sequence associated with target interaction information from the live video under the condition that the target interaction information is determined in the interaction information stream, wherein the target interaction information carries keywords for describing the occurrence of blocking of the live video;
s3, acquiring a target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence being smaller than or equal to a target threshold value, wherein the difference degree is determined based on the area of a difference region between the two adjacent video frames;
s4, determining a stuck video frame with stuck in the live video by utilizing the target video frame pair.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program for instructing a terminal device to execute the steps, where the program may be stored in a computer readable storage medium, and the storage medium may include: flash disk, read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), magnetic or optical disk, and the like.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the above-described method of the various embodiments of the present invention.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, such as the above, is merely a logical function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are intended to be included within the scope of the invention.

Claims (15)

1. A method for locating a stuck video frame, comprising:
acquiring an interaction information stream fed back to a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by an audience account for watching the live video;
extracting a target video frame sequence associated with target interaction information from the live video under the condition that the target interaction information is determined in the interaction information stream, wherein the target interaction information carries keywords for describing that the live video is stuck;
Acquiring a target video frame pair with the difference degree between every two adjacent video frames in the target video frame sequence being smaller than or equal to a target threshold value, wherein the difference degree is determined based on the area of a difference region between two adjacent video frames;
and determining a stuck video frame which is stuck in the live video by utilizing the target video frame pair.
2. The method of claim 1, wherein the obtaining the target video frame pair having a difference between every two adjacent video frames in the target video frame sequence less than or equal to a target threshold comprises:
sequentially taking every two adjacent video frames in the target video frame sequence as a current video frame pair to be compared, and executing the following operations:
determining at least one region of difference between the current pair of video frames;
determining a target area corresponding to a target difference area from areas corresponding to each difference area in the at least one difference area, wherein the target area is larger than the areas of other difference areas except the target difference area in the at least one difference area;
determining the current difference degree corresponding to the current video frame pair based on the target area;
And determining the current video frame pair as the target video frame pair under the condition that the current difference degree is smaller than or equal to the target threshold value.
3. The method of claim 2, wherein said determining at least one region of difference between the current pair of video frames comprises:
acquiring first pixel data corresponding to pixel points in a first video frame and second pixel data corresponding to pixel points in a second video frame in the current video frame pair;
comparing the first pixel data with the second pixel data to obtain a difference reference map, wherein the difference reference map is used for indicating pixel differences between pixel points in the first video frame and pixel points in the second video frame;
and carrying out contour extraction on the difference reference graph to obtain the at least one difference region in the difference reference graph.
4. A method according to claim 3, further comprising, after said comparing said first pixel data and said second pixel data to obtain a difference reference map:
sequentially taking each pixel point in the difference reference map as a current pixel point, and executing the following operations: determining a neighborhood pixel set corresponding to the current pixel point, wherein the distance between each pixel point in the neighborhood pixel set and the current pixel point is smaller than a distance threshold;
Determining a median value of pixel values matched with the neighborhood pixel set according to the pixel values of all the pixel points in the neighborhood pixel set;
and taking the median value of the pixel values as the pixel value of the current pixel point.
5. The method of claim 1, wherein determining, with the target video frame pair, a stuck video frame in the live video that is stuck comprises:
performing image scene recognition on the video frames in the target video frame pair;
acquiring the next target video frame pair under the condition that the identified result indicates that the video frame in the target video frame pair is the video frame carrying the target image scene identifier;
and determining the target video frame pair as the stuck video frame under the condition that the video frame in the target video frame pair is indicated by the identification result and is not carried with the target image scene identification.
6. The method of claim 5, further comprising, after said determining said target video frame pair as said stuck video frame:
extracting key information in the stuck video frame;
searching service fault description information matched with the key information in a service abnormal database;
and prompting alarm information of the stuck abnormality under the condition that the service fault description information matched with the key information is not found.
7. The method of claim 5, wherein said image scene recognition of video frames in said target video frame pair comprises:
and carrying out scene recognition on the video frames in the target video frame pair in an image recognition model, wherein the image recognition model is a convolutional neural network obtained after training by using an image sample carrying a label, and the label is used for indicating that the image sample is an image carrying a target image scene identifier.
8. The method of claim 5, wherein extracting a sequence of target video frames associated with the target interaction information from the live video comprises:
acquiring the release time of the target interaction information;
intercepting a target video segment from the live video according to the release time;
and extracting the target video frame sequence from the target video segment according to a preset extraction frequency.
9. The method of claim 8, wherein said capturing a target video clip from said live video according to said release time comprises:
taking a reference time before the release time of the target interaction information as a stuck start time, wherein the time interval between the reference time and the release time is smaller than or equal to a target interval threshold value;
determining a target time period by using the stuck start time and the release time;
and intercepting the target video segment in the target time period from the live video.
10. The method of claim 9, wherein
the step of taking the reference time before the release time of the target interaction information as the click-on start time comprises the following steps: determining the release time interval of two adjacent item target interaction information under the condition that the target interaction information is at least two item target interaction information; aggregating the target interaction information of which the release time interval is smaller than the target time interval until at least one interaction information sequence is obtained; taking the reference time before the release time of the first piece of target interaction information in each interaction information sequence as the cartoon start time of the interaction information sequence;
the determining the target time period by using the katon starting time and the release time comprises the following steps: taking the release time of the last piece of target interaction information in each interaction information sequence as the release time of the interaction information sequence; and determining the target time period by using the blocking start time of the interactive information sequence and the release time of the interactive information sequence.
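A sketch of the aggregation rule in claim 10: messages whose release times are close together are merged into one interaction information sequence, and each sequence later yields a single clip window. The 10-second grouping interval is an assumption.

```python
# Minimal sketch of claim 10: group interaction messages by release-time gap.
# The 10-second maximum gap is an assumed target time interval.
from datetime import datetime, timedelta
from typing import List

def group_messages(release_times: List[datetime],
                   max_gap_seconds: int = 10) -> List[List[datetime]]:
    sequences: List[List[datetime]] = []
    for t in sorted(release_times):
        if sequences and (t - sequences[-1][-1]) <= timedelta(seconds=max_gap_seconds):
            sequences[-1].append(t)   # close to the previous message: same sequence
        else:
            sequences.append([t])     # start a new interaction information sequence
    return sequences
```

Each resulting sequence would then span from its first release time (minus the reference offset) to the release time of its last message.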
11. The method according to any one of claims 1 to 10, further comprising, after said obtaining the interaction information stream fed back to the live video:
acquiring, under the condition that a target character is determined from the interaction information stream, a target word associated with the target character in the interaction information stream;
and determining the interaction information containing the target word as the target interaction information under the condition that the target word is not a reference word in a noise word library, wherein each reference word included in the noise word library carries the target character and the semantics of the reference word are not used for indicating a picture stutter event.
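A sketch of the noise-word filtering in claim 11; the target character "卡" and the example noise words are assumptions used only to show the rule, not values specified by the patent.

```python
# Minimal sketch of claim 11; the target character and noise words are assumed examples.
NOISE_WORD_LIBRARY = {"卡牌", "卡通"}  # contain the character but do not describe stutter

def is_stutter_report(message: str, target_char: str = "卡") -> bool:
    if target_char not in message:
        return False
    # Naive word extraction: take a two-character window starting at the target character.
    idx = message.index(target_char)
    word = message[idx:idx + 2]
    return word not in NOISE_WORD_LIBRARY
```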
12. A device for locating a stuck video frame, comprising:
a first acquisition unit, configured to acquire an interaction information stream fed back to a live video, wherein the interaction information stream comprises a plurality of pieces of interaction information sent by accounts of viewers watching the live video;
an extraction unit, configured to extract, under the condition that target interaction information is determined in the interaction information stream, a target video frame sequence associated with the target interaction information from the live video, wherein the target interaction information carries keywords describing the occurrence of stutter in the live video;
a second acquisition unit, configured to acquire a target video frame pair in which the difference degree between the two adjacent video frames in the target video frame sequence is smaller than or equal to a target threshold, wherein the difference degree is determined based on the area of the difference region between the two adjacent video frames;
and a determining unit, configured to determine, by using the target video frame pair, the stuck video frame at which the live video stutters.
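For orientation only, the four units of claim 12 could be mirrored by a class whose methods are placeholders; the class and method names are illustrative, not from the patent.

```python
# Minimal structural sketch of the claim 12 apparatus; names are illustrative
# and every method body is a placeholder.
class StuckFrameLocator:
    def acquire_interaction_stream(self, live_video_id: str):
        """First acquisition unit: fetch the interaction information stream for the live video."""
        raise NotImplementedError

    def extract_frame_sequence(self, live_video, target_interaction):
        """Extraction unit: pull the video frame sequence associated with the stutter report."""
        raise NotImplementedError

    def select_similar_pairs(self, frame_sequence, threshold: float):
        """Second acquisition unit: keep adjacent pairs whose difference degree is at or below the threshold."""
        raise NotImplementedError

    def locate_stuck_frames(self, frame_pairs):
        """Determining unit: decide which pairs correspond to stuck video frames."""
        raise NotImplementedError
```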
13. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any one of claims 1 to 11.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 11.
15. An electronic device comprising a memory and a processor, characterized in that the memory has a computer program stored therein, and the processor is arranged to execute the method according to any one of claims 1 to 11 by means of the computer program.
CN202211028243.7A 2022-08-25 2022-08-25 Method and device for positioning cartoon video frame, storage medium and electronic equipment Pending CN117014652A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211028243.7A CN117014652A (en) 2022-08-25 2022-08-25 Method and device for positioning cartoon video frame, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211028243.7A CN117014652A (en) 2022-08-25 2022-08-25 Method and device for positioning cartoon video frame, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117014652A true CN117014652A (en) 2023-11-07

Family

ID=88564196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211028243.7A Pending CN117014652A (en) 2022-08-25 2022-08-25 Method and device for positioning cartoon video frame, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117014652A (en)

Similar Documents

Publication Publication Date Title
CN110166827B (en) Video clip determination method and device, storage medium and electronic device
US10395120B2 (en) Method, apparatus, and system for identifying objects in video images and displaying information of same
JP6681342B2 (en) Behavioral event measurement system and related method
US20140139633A1 (en) Method and System for Counting People Using Depth Sensor
US8805123B2 (en) System and method for video recognition based on visual image matching
CN110688524B (en) Video retrieval method and device, electronic equipment and storage medium
CN108235004B (en) Video playing performance test method, device and system
CN106663196A (en) Computerized prominent person recognition in videos
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN111414948B (en) Target object detection method and related device
CN113407886A (en) Network crime platform identification method, system, device and computer storage medium
KR20090006397A (en) System and method for multi-stage filtering of malicious videos in video distribution environment
CN112153373A (en) Fault identification method and device for bright kitchen range equipment and storage medium
CN111881320A (en) Video query method, device, equipment and readable storage medium
KR101189609B1 (en) System and method for providing video related service based on image
JP2009110526A (en) Method and apparatus for analysing image
CN115294162B (en) Target identification method, device, equipment and storage medium
CN111488887A (en) Image processing method and device based on artificial intelligence
CN111354013A (en) Target detection method and device, equipment and storage medium
CN117014652A (en) Method and device for positioning cartoon video frame, storage medium and electronic equipment
KR102308303B1 (en) Apparatus and method for filtering harmful video file
CN114283349A (en) Data processing method and device, computer equipment and storage medium
CN113515670A (en) Method, device and storage medium for identifying state of movie and television resource
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN110019942B (en) Video identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination