CN113112480B - Video scene change detection method, storage medium and electronic device - Google Patents

Video scene change detection method, storage medium and electronic device

Info

Publication number
CN113112480B
CN113112480B
Authority
CN
China
Prior art keywords
frame image
standard
pixel
comparison area
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110409501.5A
Other languages
Chinese (zh)
Other versions
CN113112480A (en)
Inventor
付卫兴
宋君
任必为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Vion Intelligent Technology Co ltd
Original Assignee
Beijing Vion Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Vion Intelligent Technology Co ltd filed Critical Beijing Vion Intelligent Technology Co ltd
Priority to CN202110409501.5A priority Critical patent/CN113112480B/en
Publication of CN113112480A publication Critical patent/CN113112480A/en
Application granted granted Critical
Publication of CN113112480B publication Critical patent/CN113112480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The invention provides a video scene change detection method, a storage medium and an electronic device. The video scene change detection method comprises the following steps: after a standard frame image of a dynamic video is obtained, semantic segmentation is carried out to obtain a semantic segmentation result, at least one pixel segmentation area is selected as a standard comparison area, and a weight is assigned to each pixel point in the standard comparison area; after the current frame image of the dynamic video is obtained, semantic segmentation is likewise carried out; a target comparison area is defined in the semantic segmentation result diagram of the current frame image; the category attribute information of pixel points at the same positions in the standard comparison area is compared in a one-to-one correspondence manner; a matching similarity score Mp of the target comparison area and the standard comparison area is obtained; and whether the video scene corresponding to the dynamic video has been transformed is judged according to the matching similarity score. The method solves the problems of poor detection precision and weak robustness of video scene change detection methods in the prior art.

Description

Video scene change detection method, storage medium and electronic device
Technical Field
The present invention relates to the field of video image analysis technologies, and in particular, to a video scene change detection method, a storage medium, and an electronic device.
Background
Scene change is a phenomenon in which the video scene changes within a continuous video stream. It is usually caused by rotation of the camera due to human action or other external factors, so that the monitored area deviates from the designated monitoring area. Video scene change detection technology for identifying such changes is therefore being actively applied and popularized.
The existing means for detecting video scene change include, for example:
1. methods that judge a change according to a pre-defined threshold range; these have the defects that the definition of the threshold range is greatly influenced by different scenes and different weather conditions, so an unreasonable threshold range easily occurs, and the method therefore suffers from limited applicability and inaccurate judgment results;
2. calculating robust features of different scenes and converting them into an offset, then using the degree of offset of the scene image to judge whether the video scene has changed; because the calculation of the robust features is strongly influenced by weather conditions, the features may be computed inaccurately, which in turn affects the accuracy of the judgment of whether the video scene has changed;
3. comparing the image characteristics of a preset scene image with the image of the scene to be identified and quantifying the comparison result to judge whether the video scene has changed; because the selection of the salient target region is uncertain, the error probability of the final judgment result increases and robustness is poor;
4. calculating the similarity of the spatial histograms of the scene images and judging from a threshold whether two frames belong to the same scene; because the threshold is hard to determine accurately under varying illumination, sensitivity in use is poor and the accuracy of the final judgment result suffers.
In summary, the existing video scene change detection methods suffer from poor detection precision and weak robustness. How to provide a video scene change detection method with high detection precision and accuracy and good robustness has therefore become a problem to be solved in the prior art.
Disclosure of Invention
The invention mainly aims to provide a video scene change detection method, a storage medium and an electronic device, so as to solve the problems of poor detection precision and weak robustness of video scene change detection methods in the prior art.
To achieve the above object, according to one aspect of the present invention, there is provided a video scene change detection method, including: step S1, obtaining a standard frame image of a dynamic video, and performing semantic segmentation on the standard frame image to obtain a semantic segmentation result, wherein the semantic segmentation result comprises: a semantic segmentation result diagram displaying at least one pixel segmentation area corresponding to at least one segmented static object in the standard frame image, and the category attribute corresponding to each pixel segmentation area; step S2, selecting at least one pixel segmentation area as a standard comparison area, and assigning a weight to each pixel point in the standard comparison area; step S3, obtaining a current frame image of the dynamic video, and performing semantic segmentation on the current frame image to obtain a semantic segmentation result diagram of the current frame image, wherein a target comparison area corresponding to the position of the standard comparison area in the semantic segmentation result diagram of the standard frame image is defined in the semantic segmentation result diagram of the current frame image; step S4, traversing all pixel points in the target comparison area, and comparing their category attribute information with the pixel points at the same positions in the standard comparison area in a one-to-one correspondence manner; step S5, according to the category attribute information comparison result of each pixel point and the weight of each pixel point in the standard comparison area, obtaining a matching similarity score Mp of the target comparison area and the standard comparison area by using the matching similarity score calculation formula Mp = (R'_1 + R'_2 + ... + R'_n)/n, wherein n is the number of pixel points in the target comparison area; when the comparison result of the category attribute information of a pixel point is the same, R'_n = R_n, where R_n is the weight of the pixel point in the standard comparison area corresponding to that pixel point; when the category attribute information comparison results of a pixel point are different, R'_n = 0; and step S6, when the matching similarity score Mp is smaller than a scene transformation threshold Thr, judging that the video scene corresponding to the dynamic video has been transformed, and when the matching similarity score Mp is greater than or equal to the scene transformation threshold Thr, judging that the video scene is unchanged.
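As a purely hypothetical numerical illustration of this formula (the numbers are not taken from the patent): if the target comparison area contains n = 4 pixel points, each corresponding standard pixel point carries the normalized weight 0.5, and the category attribute comparison agrees for three of the four pixel points, then Mp = (0.5 + 0.5 + 0.5 + 0)/4 = 0.375; with a scene transformation threshold Thr = 0.5, Mp < Thr holds and the video scene would be judged to have been transformed.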
Further, the semantic segmentation result diagram includes a plurality of pixel segmentation areas, and in step S2, the static objects corresponding to the plurality of pixel segmentation areas are ranked according to their stability scores, and the pixel segmentation area whose static object has the highest stability score is used as the standard comparison area.
Further, the stability score of a static object is inversely proportional to the area size of its corresponding pixel segmentation area.
Further, in step S1, performing semantic segmentation on the standard frame image includes screening out all the segmented dynamic objects in the standard frame image.
Further, the weight assigned to each pixel point in the standard comparison area is the same normalized weight, and the normalized weight is greater than the scene transformation threshold Thr.
Further, the category attribute corresponding to each pixel segmentation area is labeled with a number.
Further, after each interval of time T or after every Q frames, a current frame image of the dynamic video is obtained, the operations of step S1 to step S6 are performed, and the weight and/or category attribute of the pixel points of the standard comparison area of the standard frame image is updated according to the judgment result of the video scene change.
Further, when it is judged that the video scene is unchanged, the weight of each pixel point in the standard comparison area of the standard frame image whose category attribute information comparison result with the target comparison area of the current frame image is the same is increased by r, namely: R_n = R_n + r, while the weight of each pixel point with a different comparison result is unchanged; or, when it is judged that the video scene has been transformed, the weights of all pixel points in the standard comparison area of the standard frame image are reduced by r, namely: R_n = R_n - r, and when R_n - r is equal to or less than Ti, wherein Ti is a critical threshold, the category attribute of the standard comparison area is updated to the category attribute in the target comparison area of the current frame image.
Further, in the initial state, the weights R_n of all the pixel points in the standard comparison area of the standard frame image are all 0.5, and 0 < Ti ≤ 0.1.
According to another aspect of the present invention there is provided a storage medium, the storage medium being a computer readable storage medium having stored thereon computer program instructions, wherein the program instructions when executed by a processor are adapted to carry out the steps of the video scene change detection method described above.
According to another aspect of the present invention, there is provided an electronic device, including: a processor, a memory, a communication element and a communication bus, wherein the processor, the memory and the communication element communicate with each other through the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the steps of the video scene change detection method described above.
By applying the technical scheme, the pixel segmentation areas and semantics of the static objects in the standard frame image of the dynamic video are acquired by means of semantic segmentation, the semantics including the category attributes corresponding to the pixel segmentation areas, and the standard comparison area is then preferentially selected from the pixel segmentation areas with reference to the category attribute stability of the static objects. By comparing pixel-level information between the standard comparison area and the target comparison area of the current frame image of the dynamic video, the matching similarity score Mp between the two, i.e. their degree of overlap, can be obtained; the similarity of the static objects corresponding to the target comparison area of the current frame image and the standard comparison area of the standard frame image is determined by combining the category attribute stability of the static objects with the matching similarity score Mp, and whether the video scene has changed is accurately determined by comparing the matching similarity score Mp with the scene transformation threshold Thr. The method is therefore very sensitive to video scene transformation, which greatly improves the accuracy of video scene transformation detection. Moreover, owing to the high robustness of semantic segmentation, once the model processing the standard frame image and the current frame image has been trained to maturity, its output is not influenced by factors such as scene, weather conditions or illumination; the stability and reliability of video scene change detection are thus greatly improved, and the method has high robustness.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 illustrates a flow chart of a video scene change detection method according to an alternative embodiment of the invention;
FIG. 2 illustrates a standard frame image of a dynamic video acquired in a video scene change detection method according to an alternative embodiment of the present invention;
FIG. 3 shows a semantic segmentation result diagram obtained after semantic segmentation of the standard frame image of FIG. 2 in a video scene change detection method according to an alternative embodiment of the present invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art will better understand the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," "includes," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
In order to solve the problems of poor detection precision and weak robustness of the video scene change detection method in the prior art, the invention provides a video scene change detection method, a storage medium and electronic equipment; wherein the storage medium is a computer readable storage medium having stored thereon computer program instructions, wherein the program instructions when executed by a processor are adapted to carry out the steps of the video scene change detection method described above and below; an electronic device includes: the device comprises a processor, a memory, a communication element and a communication bus, wherein the processor, the memory and the communication element are communicated with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the video scene change detection method described above and below.
Fig. 1 is a flow chart of an alternative embodiment video scene change detection method. As shown in fig. 1, the method includes:
step S1, obtaining a standard frame image of a dynamic video, and performing semantic segmentation on the standard frame image to obtain a semantic segmentation result, wherein the semantic segmentation result comprises: displaying a semantic segmentation result diagram of at least one pixel segmentation area corresponding to at least one segmented static object in the standard frame image and category attributes corresponding to each pixel segmentation area;
s2, selecting at least one pixel segmentation area as a standard comparison area, and giving a weight to each pixel point in the standard comparison area;
step S3, obtaining a current frame image of the dynamic video, and carrying out semantic segmentation on the current frame image to obtain a semantic segmentation result graph of the current frame image; a target comparison area corresponding to the position of the standard comparison area in the semantic segmentation result diagram of the standard frame image is defined in the semantic segmentation result diagram of the current frame image;
step S4, traversing all pixel points in the target comparison area, and comparing category attribute information with the pixel points at the same position in the standard comparison area in a one-to-one correspondence manner;
step S5, according to the category attribute information comparison result of each pixel point and the weight value of each pixel point in the standard comparison area, a matching similarity score Mp of the target comparison area and the standard comparison area is obtained by using a matching similarity score calculation formula; the matching similarity score calculation formula is:wherein n is the number of pixel points in the target comparison area; when the comparison result of the category attribute information of the pixel points is the same, R' n =R n ,R n The weight of the pixel point in the standard comparison area corresponding to the pixel point is obtained; when the category attribute information comparison results of the pixel points are different, R' n =0;
Step S6, when the matching similarity score Mp is smaller than a scene transformation threshold Thr, judging that the video scene corresponding to the dynamic video has been transformed, and when the matching similarity score Mp is greater than or equal to the scene transformation threshold Thr, judging that the video scene is unchanged.
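A minimal sketch of steps S4 to S6 follows, assuming a NumPy label-map representation of the segmentation results and the averaged-sum form of the matching similarity score given above; the function and variable names are illustrative, not part of the patent disclosure:

```python
import numpy as np

def match_similarity(std_labels, cur_labels, weights):
    """Steps S4 and S5: compare category attributes pixel by pixel and
    average the weights of the matching pixels.

    std_labels : 2-D int array, category attribute of each pixel point
                 in the standard comparison area
    cur_labels : 2-D int array, the same region cut from the current
                 frame's semantic segmentation result diagram
    weights    : 2-D float array, weight R_n of each standard pixel point
    """
    same = (std_labels == cur_labels)        # one-to-one comparison (step S4)
    r_prime = np.where(same, weights, 0.0)   # R'_n = R_n if same, else 0
    return float(r_prime.sum()) / r_prime.size   # Mp, averaged over n pixels

def scene_changed(mp, thr=0.5):
    """Step S6: the scene is judged transformed when Mp falls below Thr."""
    return mp < thr
```

For example, a 100 x 100 comparison area with all weights at 0.5 and 10% of pixels mismatching would yield Mp = 0.45; with Thr = 0.5 a scene transformation would be reported.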
Pixel segmentation areas and semantics of the static objects in the standard frame image of the dynamic video are acquired by means of semantic segmentation, the semantics including the category attributes corresponding to the pixel segmentation areas, and the standard comparison area is then preferentially selected from the pixel segmentation areas with reference to the category attribute stability of the static objects. By comparing pixel-level information between the standard comparison area and the target comparison area of the current frame image of the dynamic video, the matching similarity score Mp between the two, i.e. their degree of overlap, can be obtained; the similarity of the static objects corresponding to the target comparison area of the current frame image and the standard comparison area of the standard frame image is determined by combining the category attribute stability of the static objects with the matching similarity score Mp, and whether the video scene has changed is accurately determined by comparing the matching similarity score Mp with the scene transformation threshold Thr. The method is therefore very sensitive to video scene transformation, which greatly improves the accuracy of video scene transformation detection. Moreover, owing to the high robustness of semantic segmentation, once the model processing the standard frame image and the current frame image has been trained to maturity, its output is not influenced by factors such as scene, weather conditions or illumination; the stability and reliability of video scene change detection are thus greatly improved, and the method has high robustness.
It should be noted that, after the standard frame image is processed by the trained semantic segmentation model, each pixel segmentation area obtained is a connected pixel area with the same category attribute. For example, in the standard frame image of fig. 2, roads, lane marking lines, isolation belts, marking rods, marking plates, vehicles and the like are identified, corresponding to the category attributes of different objects, while the plurality of pixel segmentation areas in fig. 3 are distinguished by different gray values and correspond one-to-one to the objects with different category attributes in fig. 2; the pixel values of all pixel points within the same pixel segmentation area are the same. It should further be added that the vehicles covered in fig. 2 of the present embodiment are dynamic objects, and correspondingly there is a pixel segmentation area in fig. 3 whose category attribute is vehicle; the trained semantic segmentation model may also directly filter the dynamic objects out of fig. 2, in which case the pixel segmentation area with the vehicle category attribute in fig. 3 is likewise filtered out. A dynamic object, as referred to in the present invention, is an object that normally moves or tends to move under natural conditions in an image, for example: animals, humans, vehicles, boats, aircraft, or temporary debris, etc.; correspondingly, a static object is an object that is normally static or has no motion tendency under natural conditions in the image, such as the roads, lane marking lines, isolation belts, marking rods and marking plates in fig. 2; these examples are not meant to be exhaustive.
In addition, in an alternative embodiment of the present invention, in order to clearly and simply distinguish the pixel segmentation areas corresponding to different category attributes, the category attribute corresponding to each pixel segmentation area is labeled with a number, and the pixel points of each pixel segmentation area are marked with that number, for example, the number 0 for the road, the number 15 for the lane marking line, the number 5 for the isolation belt, the number 11 for the marking rod and the number 13 for the marking plate.
In the embodiment of the invention, any frame image can be selected from the dynamic video along the time axis to serve as the standard frame image; then, after each interval of time T or after every Q frames, one frame image is selected as the current frame image, and the current frame image selected each time is compared with the standard frame image. In order to ensure the timeliness of video scene change detection and avoid a video scene change caused by external rotation of the camera going undetected, the interval between the first current frame image and the standard frame image, or between two adjacent current frame images, is optionally selected to be 3 to 10 seconds, preferably 7 seconds; alternatively, an interval of 10 to 80 frames, preferably 50 frames, is selected between the first current frame image and the standard frame image or between two adjacent current frame images.
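Purely as an illustration of this sampling schedule (an OpenCV-based sketch; the default Q = 50 follows the stated frame-interval preference, while the function itself is an assumption and not part of the patent):

```python
import cv2

def sample_frames(video_path, q=50):
    """Yield frame 0 (usable as the standard frame image) and then
    every Q-th frame as the successive current frame images."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break                  # end of the dynamic video
        if idx % q == 0:
            yield idx, frame
        idx += 1
    cap.release()
```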
In the embodiment of the invention, the current frame image and the semantic segmentation result diagram obtained after its semantic segmentation are not illustrated; in words only: after each execution of step S6, if it is judged that the video scene corresponding to the dynamic video is unchanged, the positions and shapes of the static objects in the current frame image are the same as those in the standard frame image; that is, apart from the vehicles, the visual appearance of the static objects in the (not shown) current frame image is the same as in the standard frame image of fig. 2, with no obvious difference, and correspondingly the semantic segmentation result diagram of the current frame image is visually identical to fig. 3 in all pixel segmentation areas other than that of the vehicles. Conversely, if it is judged that the video scene has been transformed, the static objects other than the vehicles in the (not shown) current frame image differ visibly from those in the standard frame image of fig. 2, and the semantic segmentation result diagram of the current frame image differs visibly from fig. 3 in the pixel segmentation areas other than that of the vehicles.
In the illustrated embodiment of the present invention, as shown in fig. 2 and 3, the semantic segmentation result diagram includes a plurality of pixel segmentation areas; in step S2, the static objects corresponding to the plurality of pixel segmentation areas are ranked according to their stability scores, and the pixel segmentation area whose static object has the highest stability score is used as the standard comparison area. That is, among the various static objects presented in the standard frame image, the one with the smallest motion amplitude under the influence of the same external factors is selected as the reference comparison object, and its corresponding pixel segmentation area in the semantic segmentation result diagram is used as the standard comparison area. This makes the later pixel-level comparison between the current frame image and the standard frame image more accurate, and avoids, to the greatest extent, misjudging a video scene transformation because the static objects themselves changed under external influences; the accuracy of video scene change detection is thereby greatly improved.
In the illustrated embodiment of the present invention, as shown in fig. 3, the plurality of pixel segmentation areas of the semantic segmentation result diagram are distinguished by different gray values; of course, to make the visual distinction more intuitive, different pixel segmentation areas may also be distinguished using different colors.
It should be noted that the stability score of a static object depends on the magnitude of its motion amplitude when affected by external factors. For example, under the same wind force, the stability score of a tree is lower than that of a signboard, because the tree's leaves shake while the signboard is unaffected; and because the probability that an external force changes the position or shape of a signboard is greater than for a lane marking line printed on the road, the stability score of the signboard is in turn lower than that of the lane marking line.
Optionally, since vehicles are defined as dynamic objects in the present invention, after the semantic segmentation of fig. 2, the pixel segmentation area corresponding to the vehicle category attribute in fig. 3 is also filtered out (it is retained in the figure only for the sake of the detailed description in the present invention); that is, in step S1, performing semantic segmentation on the standard frame image includes screening out all the dynamic objects segmented in the standard frame image.
Further alternatively, the stability score of a static object is inversely proportional to the area of its corresponding pixel segmentation area. This is because a static object of relatively small size is less likely to move when affected by external factors, which ensures higher accuracy of video scene change detection.
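A hedged sketch of this selection rule, assuming that the stability score is modeled as the reciprocal of a region's pixel area in line with the inverse proportionality stated above; the set of static-object labels and all names are illustrative:

```python
import numpy as np

def pick_standard_region(label_map, static_labels):
    """Step S2 sketch: among the static-object regions of the semantic
    segmentation result diagram, pick the one with the highest stability
    score, modeled here as 1 / (pixel area of the region)."""
    best_label, best_score = None, -1.0
    for label in static_labels:                 # e.g. road, lane line, sign
        area = int(np.count_nonzero(label_map == label))
        if area == 0:
            continue                            # object absent from the frame
        score = 1.0 / area                      # smaller region -> more stable
        if score > best_score:
            best_label, best_score = label, score
    return best_label   # label of the standard comparison area
```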
In step S2, the weight assigned to each pixel point in the standard comparison area is the same normalized weight, and the normalized weight is greater than the scene transformation threshold Thr. In the present embodiment, the normalized weight given to each pixel point in the standard comparison area is 0.5. This is advantageous for balancing the subsequent updates to the weights.
Optionally, the scene change threshold Thr is greater than or equal to 0.2 and less than or equal to 0.5; preferably 0.5.
In another alternative embodiment of the present invention, in order to greatly improve the robustness of video scene transformation detection, after each interval of time T or after every Q frames, a current frame image of the dynamic video is obtained and the operations of step S1 to step S6 are performed, and the weight and/or category attribute of the pixel points of the standard comparison area of the standard frame image is updated according to the judgment result of the video scene transformation.
Specifically, when it is judged that the video scene is unchanged, the weight of each pixel point in the standard comparison area of the standard frame image whose category attribute information comparison result with the target comparison area of the current frame image is the same is increased by r, namely: R_n = R_n + r, while the weight of each pixel point with a different comparison result is unchanged; or
when it is judged that the video scene has been transformed, the weights of all pixel points in the standard comparison area of the standard frame image are reduced by r, namely: R_n = R_n - r, and when R_n - r is equal to or less than Ti, wherein Ti is a critical threshold, the category attribute of the standard comparison area is updated to the category attribute in the target comparison area of the current frame image.
Optionally, in the initial state, the weights R_n of all pixel points in the standard comparison area of the standard frame image are all 0.5, and Ti satisfies 0 < Ti ≤ 0.1; preferably, Ti is 0.1.
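The update rule described above can be sketched as follows; this is a non-authoritative reading in which matching weights grow by r while the scene is judged unchanged, all weights shrink by r on a judged transformation, and the standard comparison area is re-baselined from the current frame once weights decay to the critical threshold Ti. The step value r = 0.05 is an assumption (the patent's preferred value of r is not reproduced here):

```python
import numpy as np

def update_standard(weights, same_mask, std_labels, cur_labels,
                    changed, r=0.05, ti=0.1):
    """Maintain the standard comparison area after each comparison round.

    changed   : judgment result of step S6 for this round
    same_mask : boolean array, True where category attributes matched
    r, ti     : weight step and critical threshold (value of r assumed)
    """
    if not changed:
        # matching pixels: R_n = R_n + r; mismatching weights stay unchanged
        weights = np.where(same_mask, weights + r, weights)
    else:
        weights = weights - r          # R_n = R_n - r for every pixel point
        if np.any(weights <= ti):
            # weights have decayed to the critical threshold: adopt the
            # current frame's category attributes as the new standard
            std_labels = cur_labels.copy()
    return weights, std_labels
```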
The technical scheme of the invention concerning video scene change detection is preferably applied to traffic scenes, i.e. the monitoring of traffic roads by traffic electronic police, as shown in fig. 2 and 3. Roads, lane marking lines and traffic signs (marking rods, marking plates and the like) are fixed and static; the corresponding areas in the images are segmented by the semantic segmentation method, and the pixel coincidence degree between each element area of the current image and each static object area of the background image is counted. A coincidence degree threshold, namely the scene transformation threshold Thr, can be set according to the jitter amplitude of the on-site image capturing equipment and experience; falling below this threshold is judged to be a scene shift.
When the camera moves through a small angle, the positions of the lane marking lines and the traffic signs in the image are displaced; because these elements occupy a relatively small space in the image, even a small-angle movement of the image capturing device has a very obvious influence on their pixel coincidence degree, and the coincidence threshold is therefore simple to define. The semantic segmentation method can segment a specific scene with sufficient accuracy, so the judgment is sufficiently sensitive.
The invention correspondingly provides a video scene change detection system, which is used for executing the video scene change detection method, and comprises the following steps:
the video input device is used for acquiring a standard frame image and a current frame image of the dynamic video;
the image processing device comprises a semantic segmentation model, which is used for carrying out semantic segmentation on the standard frame image to obtain a semantic segmentation result, wherein the semantic segmentation result comprises: displaying a semantic segmentation result diagram of at least one pixel segmentation area corresponding to at least one segmented static object in the standard frame image and category attributes corresponding to each pixel segmentation area; the semantic segmentation is carried out on the current frame image so as to obtain a semantic segmentation result graph of the current frame image; a target comparison area corresponding to the position of the standard comparison area in the semantic segmentation result diagram of the standard frame image is defined in the semantic segmentation result diagram of the current frame image;
the image screening and assigning device is used for selecting at least one pixel segmentation area as a standard comparison area and assigning a weight to each pixel point in the standard comparison area;
the image information comparison device is used for traversing all the pixel points in the target comparison area and comparing the category attribute information with the pixel points at the same position in the standard comparison area in a one-to-one correspondence manner;
the matching degree score calculating device is used for obtaining a matching similarity score Mp of the target comparison area and the standard comparison area by utilizing a matching similarity score calculating formula according to the category attribute information comparison result of each pixel point and the weight of each pixel point in the standard comparison area; the matching similarity score calculation formula is:wherein n is the number of pixel points in the target comparison area; when the comparison result of the category attribute information of the pixel points is the same, R' n =R n ,R n The weight of the pixel point in the standard comparison area corresponding to the pixel point is obtained; when the category attribute information comparison results of the pixel points are different, R' n =0;
And the result output device is used for judging that the video scene corresponding to the dynamic video is transformed when the matching similarity score Mp is smaller than the scene transformation threshold value Thr, and judging that the video scene is unchanged when the matching similarity score Mp is larger than or equal to the scene transformation threshold value Thr.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present invention.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed between the components may be through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for detecting video scene changes, comprising:
step S1, obtaining a standard frame image of a dynamic video, and performing semantic segmentation on the standard frame image to obtain a semantic segmentation result of the standard frame image, wherein the semantic segmentation result comprises: displaying a semantic segmentation result diagram of at least one pixel segmentation area corresponding to at least one segmented static object in the standard frame image and category attributes corresponding to the pixel segmentation areas;
s2, selecting at least one pixel segmentation area as a standard comparison area, and assigning a weight to each pixel point in the standard comparison area;
step S3, obtaining a current frame image of the dynamic video, and carrying out semantic segmentation on the current frame image to obtain a semantic segmentation result image; a target comparison area corresponding to the position of the standard comparison area in the semantic segmentation result diagram of the standard frame image is defined in the semantic segmentation result diagram of the current frame image;
step S4, traversing all the pixel points in the target comparison area, and comparing the pixel points with the pixel points at the same position in the standard comparison area in a one-to-one correspondence manner;
step S5, according to the category attribute information comparison result of each pixel point and the weight of each pixel point in the standard comparison area, obtaining a matching similarity score Mp of the target comparison area and the standard comparison area by using a matching similarity score calculation formula;
the matching similarity score calculation formula is as follows:
wherein n isThe number of the pixel points in the target comparison area; when the comparison results of the category attribute information of the pixel points are the same, R' n =R n ,R n The weight of the pixel point in the standard comparison area corresponding to the pixel point is obtained; when the category attribute information comparison results of the pixel points are different, R' n =0;
and step S6, when the matching similarity score Mp is smaller than a scene transformation threshold Thr, judging that the video scene corresponding to the dynamic video has been transformed, and when the matching similarity score Mp is greater than or equal to the scene transformation threshold Thr, judging that the video scene is unchanged.
2. The method according to claim 1, wherein the semantic division result map includes a plurality of pixel division regions, and in the step S2, the static objects corresponding to the plurality of pixel division regions are ranked according to their stability scores, and the one having the highest stability score among the plurality of pixel division regions is used as the standard comparison region.
3. The video scene change detection method according to claim 2, wherein the stability score of the static object is inversely proportional to the area size of the pixel division area to which it corresponds.
4. The method according to claim 1, wherein in the step S1, the semantic segmentation of the standard frame image includes screening out all the segmented dynamic objects in the standard frame image.
5. The video scene change detection method according to claim 1, wherein the weight given to each of the pixel points in the standard comparison region is the same normalized weight, and the normalized weight is greater than the scene change threshold Thr.
6. The method of claim 1, wherein the class attribute associated with each of the pixel segments is digitally scaled.
7. The method according to any one of claims 1 to 6, wherein, after each interval of time T or after every Q frames, a frame of the dynamic video is obtained as the current frame image and the operations of step S1 to step S6 are performed, and the weights and/or category attributes of the pixel points in the standard comparison area of the standard frame image are updated according to the judgment result of the video scene change.
8. The method of claim 7, wherein,
when the video scene is not changed, increasing the weight of each pixel point with the same category attribute information comparison result in the standard comparison area of the standard frame image and the target comparison area of the current frame image by r, namely: r is R n +r, wherein the weight of each pixel point with different category attribute information comparison results is unchanged; or (b)
When the video scene is judged to be transformed, the weight of all pixel points in the standard comparison area of the standard frame image is reduced by r, namely: r is R n -R, and when R n -r is equal to or less than Ti, wherein Ti is a critical threshold; and updating the category attribute of the standard comparison area to be the category attribute in the target comparison area of the current frame image.
9. The method of claim 8, wherein,
in the initial state, the weights R of all pixel points in the standard comparison area of the standard frame image n The total number of the components is 0.5,and Ti is more than 0 and less than or equal to 0.1.
10. A storage medium, characterized in that the storage medium is a computer readable storage medium, on which computer program instructions are stored, wherein the program instructions, when executed by a processor, are adapted to carry out the steps of the video scene change detection method according to any of claims 1-9.
11. An electronic device, comprising: the device comprises a processor, a memory, a communication element and a communication bus, wherein the processor, the memory and the communication element complete communication with each other through the communication bus; the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the video scene change detection method according to any of claims 1-9.
CN202110409501.5A 2021-04-16 2021-04-16 Video scene change detection method, storage medium and electronic device Active CN113112480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110409501.5A CN113112480B (en) 2021-04-16 2021-04-16 Video scene change detection method, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110409501.5A CN113112480B (en) 2021-04-16 2021-04-16 Video scene change detection method, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN113112480A CN113112480A (en) 2021-07-13
CN113112480B (en) 2024-03-29

Family

76717912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110409501.5A Active CN113112480B (en) 2021-04-16 2021-04-16 Video scene change detection method, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113112480B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780067A (en) * 2021-07-30 2021-12-10 武汉中海庭数据技术有限公司 Lane linear marker detection method and system based on semantic segmentation
CN114359594B (en) * 2022-03-17 2022-08-19 杭州觅睿科技股份有限公司 Scene matching method and device, electronic equipment and storage medium
CN114882452B (en) * 2022-05-17 2022-12-30 张弛 Track line safety monitoring method, train operation control method and control system
CN115474084B (en) * 2022-08-10 2023-10-31 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for generating video cover image
CN116645530A (en) * 2023-04-23 2023-08-25 广东建瀚工程管理有限公司 Construction detection method, device, equipment and storage medium based on image comparison

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458155A (en) * 2013-08-01 2013-12-18 北京邮电大学 Video scene changing detection method and system and experience quality detection method and system
CN109657715A (en) * 2018-12-12 2019-04-19 广东工业大学 A kind of semantic segmentation method, apparatus, equipment and medium
CN110796073A (en) * 2019-10-28 2020-02-14 衢州学院 Method and device for detecting specific target area in non-texture scene video
CN112347933A (en) * 2020-11-06 2021-02-09 浙江大华技术股份有限公司 Traffic scene understanding method and device based on video stream
CN112637593A (en) * 2020-12-18 2021-04-09 郑州师范学院 Video coding optimization method based on artificial intelligence and video analysis

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839222B2 (en) * 2018-09-24 2020-11-17 Apical Limited Video data processing
US10929665B2 (en) * 2018-12-21 2021-02-23 Samsung Electronics Co., Ltd. System and method for providing dominant scene classification by semantic segmentation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103458155A (en) * 2013-08-01 2013-12-18 北京邮电大学 Video scene changing detection method and system and experience quality detection method and system
CN109657715A (en) * 2018-12-12 2019-04-19 广东工业大学 A kind of semantic segmentation method, apparatus, equipment and medium
CN110796073A (en) * 2019-10-28 2020-02-14 衢州学院 Method and device for detecting specific target area in non-texture scene video
CN112347933A (en) * 2020-11-06 2021-02-09 浙江大华技术股份有限公司 Traffic scene understanding method and device based on video stream
CN112637593A (en) * 2020-12-18 2021-04-09 郑州师范学院 Video coding optimization method based on artificial intelligence and video analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Automatic video object segmentation algorithm for multiple scenes; 余欣纬; 柯余洋; 熊焰; 黄文超; Computer Systems & Applications, No. 11, pp. 154-160 *

Also Published As

Publication number Publication date
CN113112480A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN113112480B (en) Video scene change detection method, storage medium and electronic device
CN109087510B (en) Traffic monitoring method and device
CN109389135B (en) Image screening method and device
CN107944450B (en) License plate recognition method and device
EP2851841A2 (en) System and method of alerting a driver that visual perception of pedestrian may be difficult
CN112798811B (en) Speed measurement method, device and equipment
CN110197185B (en) Method and system for monitoring space under bridge based on scale invariant feature transform algorithm
CN109478329B (en) Image processing method and device
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN111767878A (en) Deep learning-based traffic sign detection method and system in embedded device
CN110688883A (en) Vehicle and pedestrian detection method and device
CN112329881A (en) License plate recognition model training method, license plate recognition method and device
CN111753612A (en) Method and device for detecting sprinkled object and storage medium
CN110377670B (en) Method, device, medium and equipment for determining road element information
CN111950345A (en) Camera identification method and device, electronic equipment and storage medium
CN113449632B (en) Vision and radar perception algorithm optimization method and system based on fusion perception and automobile
CN111753610A (en) Weather identification method and device
CN112926426A (en) Ship identification method, system, equipment and storage medium based on monitoring video
CN112017213A (en) Target object position updating method and system
CN110738229B (en) Fine-grained image classification method and device and electronic equipment
JP7001150B2 (en) Identification system, model re-learning method and program
CN112784817B (en) Method, device and equipment for detecting lane where vehicle is located and storage medium
CN110969875B (en) Method and system for road intersection traffic management
CN104408437B (en) A kind of Approach for road detection based on synthetic aperture radar
CN114373081A (en) Image processing method and device, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 1602, 16th Floor, Building 4, Zone 4, No. 81 Beiqing Road, Haidian District, Beijing, 100094

Applicant after: BEIJING VION INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: Unit 801, Unit 5, No. 2 Building, 9 Yuan, Fenghao East Road, Haidian District, Beijing 100094

Applicant before: BEIJING VION INTELLIGENT TECHNOLOGY Co.,Ltd.

GR01 Patent grant