CN114038197A - Scene state determination method and device, storage medium and electronic device - Google Patents

Scene state determination method and device, storage medium and electronic device

Info

Publication number
CN114038197A
Authority
CN
China
Prior art keywords
image
determining
inter
images
reference value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111408964.6A
Other languages
Chinese (zh)
Other versions
CN114038197B (en)
Inventor
余言勋
杜治江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111408964.6A
Publication of CN114038197A
Application granted
Publication of CN114038197B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/01: Detecting movement of traffic to be counted or controlled
    • G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions

Abstract

Embodiments of the invention provide a method and an apparatus for determining a scene state, a storage medium, and an electronic device. The method includes: determining N images from an image sequence acquired for a target scene, where N is an integer not less than 3; determining at least two image groups based on the N images, where each image group contains two images and the images contained in different image groups are completely or partially different; determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups. The method and apparatus solve the problem of inaccurate scene state determination in the related art and improve the accuracy of scene state determination.

Description

Scene state determination method and device, storage medium and electronic device
Technical Field
Embodiments of the invention relate to the field of communications, and in particular to a method and an apparatus for determining a scene state, a storage medium, and an electronic device.
Background
At present, detecting traffic violations on roads requires that the necessary environment configuration be completed in advance. This configuration includes the lane lines, snapshot lines, stop lines, and detection areas of the monitored scene, all of which are drawn manually. Because these elements depend on manual drawing, when the scene changes (for example, due to construction or an accident), the manually drawn configuration cannot capture the change immediately; the resulting information lag causes great inconvenience for traffic managers. Fig. 1 is a line-configuration drawing of a tunnel scene: the thick solid lines indicate lane lines, the area enclosed by the contour line indicates a detection area, and the broken line indicates a flow trigger line. If the camera is rotated and the rotation goes unnoticed, a large number of false alarms will be generated.
In the related art, scene recognition (such as lane surface recognition and lane line recognition) is performed at fixed intervals, the two results obtained within an interval are compared, and the scene is considered to have changed if they differ substantially. However, because road surface scenes are variable, it is difficult to guarantee the accuracy of such algorithmic recognition when the road surface within the scene changes.
Therefore, the related art suffers from inaccurate determination of the scene state.
No effective solution to this problem has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide a method and an apparatus for determining a scene state, a storage medium, and an electronic device, so as to at least solve the problem of inaccurate scene state determination in the related art.
According to an embodiment of the present invention, a method for determining a scene state is provided, including: determining N images from an image sequence acquired for a target scene, where N is an integer not less than 3; determining at least two image groups based on the N images, where each image group contains two images and the images contained in different image groups are completely or partially different; determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups.
According to another embodiment of the present invention, an apparatus for determining a scene state is provided, including: a first determining module, configured to determine N images from an image sequence acquired for a target scene, where N is an integer not less than 3; a second determining module, configured to determine at least two image groups based on the N images, where each image group contains two images and the images contained in different image groups are completely or partially different, and to determine an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and a third determining module, configured to determine whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups.
According to yet another embodiment of the invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the steps of the method as set forth in any of the above.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the embodiments, N images are determined from an image sequence collected for a target scene, at least two image groups are determined from the N images, an inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in that group, and whether the state of the target scene in the image sequence has changed is determined from the degree of difference among the inter-class similarity reference values. Because the N images are divided into at least two image groups, an inter-class similarity reference value is determined for each pair of images, and the change decision rests on the degree of difference among those values, the problem of inaccurate scene state determination in the related art is solved and the accuracy of scene state determination is improved.
Drawings
Fig. 1 is a line-configuration drawing of a tunnel scene in the related art;
Fig. 2 is a block diagram of the hardware structure of a mobile terminal running the method for determining a scene state according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for determining a scene state according to an embodiment of the present invention;
Fig. 4 is a schematic view of a target area according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a Triplet Net model according to an exemplary embodiment of the present invention;
Fig. 6 is a schematic diagram of determining whether the state of a target scene has changed using a TripletNet model according to an exemplary embodiment of the present invention;
Fig. 7 is a block diagram of an apparatus for determining a scene state according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking execution on a mobile terminal as an example, fig. 2 is a block diagram of the hardware structure of a mobile terminal running the method for determining a scene state according to an embodiment of the present invention. As shown in fig. 2, the mobile terminal may include one or more processors 102 (only one is shown in fig. 2; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 2 is only an illustration and does not limit the structure of the mobile terminal; for example, the mobile terminal may include more or fewer components than shown in fig. 2, or have a different configuration.
The memory 104 may be used to store a computer program, for example, a software program and a module of an application software, such as a computer program corresponding to the method for determining a scene state in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a method for determining a scene state is provided, and fig. 3 is a flowchart of a method for determining a scene state according to an embodiment of the present invention, as shown in fig. 3, the flowchart includes the following steps:
step S302, determining N images from an image sequence collected for a target scene, wherein N is an integer not less than 3;
step S304, determining at least two image groups based on the N images, wherein each image group comprises two images, and the images contained in different image groups are completely or partially different; and determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups;
step S306, determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups.
In the above embodiment, the target scene may be a traffic scene or another kind of scene. When the target scene is a traffic scene, the image sequence may consist of images collected by a traffic checkpoint monitoring device. N images are determined from the image sequence and divided into at least two image groups, each containing two images; the images in different groups may be completely or only partially different, so the same image may appear in more than one group, but no two groups contain exactly the same pair of images.
In the above embodiment, an inter-class similarity reference value may be determined for each image group, where the value represents the degree of similarity between the two images in the same group. After the inter-class similarity reference values corresponding to the image groups have been determined, whether the state of the target scene has changed can be determined from the degree of difference among those values. The state of the target scene changes when the shooting angle or shooting parameters of the camera change. For example, for a camera installed at a fixed point, an operator may zoom the lens to see farther or nearer, or turn the camera to shoot the other side of the road; in such cases the scene monitored by the camera, that is, the state of the target scene, has changed.
Optionally, the steps above may be executed by a background processor or another device with similar processing capability, or by a machine integrating at least an image acquisition device and a data processing device, where the image acquisition device may include an image acquisition module such as a camera, and the data processing device may include a terminal such as a computer or a mobile phone, but is not limited thereto.
According to the embodiments, N images are determined from an image sequence collected for a target scene, at least two image groups are determined from the N images, an inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in that group, and whether the state of the target scene in the image sequence has changed is determined from the degree of difference among the inter-class similarity reference values. Because the N images are divided into at least two image groups, an inter-class similarity reference value is determined for each pair of images, and the change decision rests on the degree of difference among those values, the problem of inaccurate scene state determination in the related art is solved and the accuracy of scene state determination is improved.
In an exemplary embodiment, the target scene includes a traffic scene, and after the N images are determined from the image sequence acquired for the target scene and before the inter-class similarity reference value corresponding to each image group is determined, the method further includes: recognizing lane edges in each image contained in each of the at least two image groups and determining the lane edge area in each image; and performing interference reduction processing on the lane edge area in each image. In this embodiment, when the target scene is a traffic scene, lane edge lines in each image may be identified using algorithms such as semantic segmentation, edge detection, and lane surface detection; the lane edge area is determined from the lane edge lines, and interference reduction processing is applied to it. The lane edge area may be the area enclosed by the lane edge lines; a schematic diagram is shown in fig. 4, where the area inside the thick solid frame is the lane edge area and the thick solid lines are the lane edge lines.
In the above embodiment, when the target scene is a traffic scene, vehicles traveling on the road surface introduce changing information into the captured images. This changing information degrades the accuracy of recognizing whether the state of the target scene has changed, so interference reduction processing may be applied to the lane edge area, that is, the area where change is likely, to remove road surface interference and thereby improve the accuracy of the determination. The interference reduction processing may include adding a mask to the lane edge area.
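By way of a non-limiting illustration (not part of the original disclosure), the lane edge area might be extracted from a segmentation result as in the following sketch. OpenCV, NumPy, the function name, and the binary-mask input format are assumptions of this example; the disclosure does not prescribe a specific detector.

```python
from typing import Optional

import cv2
import numpy as np


def lane_edge_polygon(seg_mask: np.ndarray) -> Optional[np.ndarray]:
    """Return the largest road-surface contour in a binary segmentation mask
    (nonzero = road surface) as a (K, 2) array of pixel coordinates.

    The mask is assumed to come from a semantic-segmentation or lane-line
    recognition model, as mentioned in the description above.
    """
    contours, _ = cv2.findContours(
        (seg_mask > 0).astype(np.uint8),
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE,
    )
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return largest.reshape(-1, 2)
```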
In the above embodiment, the lane edge area may alternatively be the lane edge line itself. In that case, interference reduction processing is applied only to the lane edge line to determine the interference region, and when the degree of similarity of the two images in each image group is determined, only the regions other than the interference region are compared.
In an exemplary embodiment, performing interference reduction processing on the lane edge area in each image includes: setting the pixel value of every pixel contained in the lane edge area of each image to a target pixel value. The target pixel value may be 0 (this value is merely exemplary; other values may also be used, and the present invention is not limited in this respect).
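A minimal masking sketch under the same assumptions (OpenCV and NumPy; the `lane_polygon` argument corresponds to the illustrative extraction above and is not taken from the disclosure):

```python
import cv2
import numpy as np


def mask_lane_region(image: np.ndarray, lane_polygon: np.ndarray,
                     target_value: int = 0) -> np.ndarray:
    """Set every pixel inside the lane edge area to a fixed target value
    (0 in the example above), i.e. add a mask over the road surface."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [lane_polygon.astype(np.int32)], 255)
    masked = image.copy()
    masked[mask == 255] = target_value
    return masked
```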
In one exemplary embodiment, the at least two image groups include a first image group containing a historical image and a current image, and a second image group containing the current image and a subsequent image; the first acquisition time of the historical image is earlier than the second acquisition time of the current image, which in turn is earlier than the third acquisition time of the subsequent image. In this embodiment, when N is 3, the 3 images, which may be acquired from a video stream, are divided into the first image group and the second image group. From the video stream, a current image, a historical image, and a subsequent image are determined according to the ordering of acquisition times above; the current image and the historical image then form one image group, and the current image and the subsequent image form the other.
In an exemplary embodiment, the time difference between the first and second acquisition times is a first time interval, and the time difference between the second and third acquisition times is a second time interval. The historical image, the current image, and the subsequent image may be three consecutive frames or non-consecutive frames. When they are three consecutive frames, the first and second time intervals are equal and correspond to the shooting interval of the camera. When the three frames are not consecutive, the two intervals may be the same or different, and both may be user-defined.
In the above embodiment, the process of determining whether the state of the target scene has changed may be triggered automatically or manually. When the detection is triggered automatically, the first and second time intervals may be preset fixed intervals; when it is triggered manually, they may be determined by the trigger times.
In the above embodiment, for a real-time video stream, the algorithm automatically samples one frame per second until 3 frames have been accumulated, and uses them as the historical image p1, the current image p2, and the subsequent image p3 to determine from these three frames whether the state of the target scene has changed. As new data keeps arriving, frame p1 is discarded, p2 becomes the historical image, p3 becomes the current image, and p4 becomes the subsequent image, and the determination is repeated in sequence. The first and second time intervals may use the recommended 1-second sampling or be set by the user.
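The sliding one-frame-per-second window described above might be implemented as in the following sketch; the class and method names are illustrative assumptions, not part of the disclosure:

```python
from collections import deque


class FrameWindow:
    """Buffer of the three most recent sampled frames, interpreted as
    (historical p1, current p2, subsequent p3); pushing a new frame discards
    the oldest one, mirroring the p1/p2/p3 -> p2/p3/p4 shift described above."""

    def __init__(self) -> None:
        self._frames = deque(maxlen=3)

    def push(self, frame) -> None:
        self._frames.append(frame)

    def groups(self):
        """Return the first image group (historical, current) and the second
        image group (current, subsequent) once three frames are buffered."""
        if len(self._frames) < 3:
            return None
        p1, p2, p3 = self._frames
        return (p1, p2), (p2, p3)
```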
In an exemplary embodiment, determining the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups includes: determining a first inter-class similarity reference value corresponding to the first image group based on the degree of similarity between the historical image and the current image; and determining a second inter-class similarity reference value corresponding to the second image group based on the degree of similarity between the current image and the subsequent image. Determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values then includes: determining whether the state has changed based on the difference value between the first and second inter-class similarity reference values. In this embodiment, when two image groups are used, the two inter-class similarity reference values are determined, their difference value is computed, and whether the state of the target scene has changed is determined from that difference value.
In the above-described embodiment, when determining whether the state of the target scene has changed, the degree of similarity of the images within each of the two image groups is determined first, yielding the first and second inter-class similarity reference values. The decision is then not made from either reference value alone; instead, the difference value between the two reference values is computed, and whether the state of the target scene has changed is determined from that difference value. In other words, the inter-class distance is judged on the basis of the intra-class distance, which improves the robustness of the target network model. Here, images of the same class are images acquired while the scene state is unchanged, and images of different classes are images acquired across a scene state change.
In an exemplary embodiment, determining whether the state of the target scene in the image sequence has changed based on the difference value between the first and second inter-class similarity reference values includes: determining the difference value between the two reference values; and determining that the state of the target scene has changed in response to the difference value being greater than a predetermined threshold. The predetermined threshold may be preset, and the present invention does not limit its value.
In one exemplary embodiment, the method further includes: in response to the difference value being less than or equal to the predetermined threshold, determining that the state of the target scene has not changed.
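Taken together, the decision rule of the preceding paragraphs can be sketched as follows. Computing the difference value as d2 - d1 follows the example of fig. 6 discussed below and is an assumption of this sketch, not the only form the disclosure permits:

```python
def scene_changed(d1: float, d2: float, threshold: float) -> bool:
    """d1: first inter-class similarity reference value (historical vs. current).
    d2: second inter-class similarity reference value (current vs. subsequent).
    Reports a change only when the difference value exceeds the predetermined
    threshold; otherwise the scene state is treated as unchanged."""
    return (d2 - d1) > threshold
```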
In an exemplary embodiment, determining the at least two image groups based on the N images and determining the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each image group include: inputting the N images into a target network model, and determining the difference value between the first and second inter-class similarity reference values based on the target network model. In this embodiment, the target network model may be used both to determine the first and second inter-class similarity reference values and to determine their difference value. The target network model may be a Triplet Net model.
In one exemplary embodiment, the target network model includes a Triplet Net model; the first inter-class similarity reference value includes a first Euclidean distance between the feature vector of the historical image and that of the current image, and the second inter-class similarity reference value includes a second Euclidean distance between the feature vector of the current image and that of the subsequent image. A schematic structural diagram of the Triplet Net model is shown in fig. 5: the Triplet Net consists of 3 identical feedforward neural networks that share weights with one another, and three samples are input each time, x-, x, and x+, which are respectively a candidate sample, a homogeneous (same-class) sample, and a heterogeneous (different-class) sample. The distance between feature vectors at the embedding layer is the L2 distance: the three branches distance-encode the three inputs, and the network finally outputs the two Euclidean distances d1 and d2. If the difference between the two distances is too large, x- and x+ are considered to belong to two different classes (the inter-class distance is too large). Using TripletNet for scene recognition in this way can improve the accuracy of scene recognition.
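A minimal sketch of such a Triplet Net follows, assuming PyTorch and an externally supplied embedding backbone (the disclosure does not name a framework, so both are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TripletNet(nn.Module):
    """Three weight-sharing branches encode the three inputs; the forward
    pass returns the two Euclidean (L2) distances d1 and d2."""

    def __init__(self, embedding: nn.Module) -> None:
        super().__init__()
        self.embedding = embedding  # one network shared by all three branches

    def forward(self, historical: torch.Tensor, current: torch.Tensor,
                subsequent: torch.Tensor):
        e1 = self.embedding(historical)
        e2 = self.embedding(current)
        e3 = self.embedding(subsequent)
        d1 = F.pairwise_distance(e2, e1)  # historical vs. current
        d2 = F.pairwise_distance(e2, e3)  # current vs. subsequent
        return d1, d2
```

Training such a network with a triplet-style loss encourages small intra-class distances and large inter-class distances, which matches the intra-class/inter-class reasoning above.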
In the above embodiments, the similarity reference value may be expressed by a Euclidean distance. After the N images are obtained, they may be subjected to interference reduction processing, and the processed images are fed into the TripletNet model; that is, the historical image, the current image, and the subsequent image are output to the TripletNet model after interference reduction. Referring to fig. 6, the first two inputs are the historical image and the current image, and the third is the subsequent image. The target network model computes the Euclidean distances d1 (between the historical image and the current image) and d2 (between the current image and the subsequent image), and determines that the scene has changed if d2 is significantly greater than d1.
In the foregoing embodiment, a group of image frames (3 images: the historical frame p1, the current frame p2, and the subsequent frame p3) is acquired from the video, with a configurable interval between frames; the shorter the interval, the higher the algorithm precision. For the 3 images, lane edges are detected and road edge zones are identified using algorithms such as semantic segmentation and lane line recognition; the pixels of the identified road surface area are then set to 0 (a mask is added) to remove interference information such as the road surface. The masked images are fed into the TripletNet to recognize whether the scene has changed. The recognition principle is as follows: the similarity (that is, the Euclidean distance) of p2 to p1 and to p3 is determined, and whether a scene change occurs within the group of frames is judged from the degree of difference between the two similarities. Preprocessing the road surface area once shields scenes that change easily, such as the lane surface, and greatly reduces the influence of lanes on the scene. Using the processed pictures as input, scene change is recognized by the TripletNet, which considers not only the inter-class distance but also the intra-class distance; the judgment weighs the inter-class distance against the intra-class distance rather than comparing the inter-class distance alone with a fixed value, which improves the accuracy of determining the scene state.
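Assembling the sketches above, a hypothetical end-to-end loop might look like the following; `sampled_frames`, `segment_road`, the backbone shape, the 224x224 input size, and the threshold of 0.5 are all illustrative assumptions rather than details from the disclosure:

```python
import torch
import torch.nn as nn


def to_tensor(a):
    # HWC uint8 image -> 1x3xHxW float tensor in [0, 1]
    return torch.from_numpy(a).float().permute(2, 0, 1).unsqueeze(0) / 255.0


# Illustrative backbone: any feature extractor yielding a fixed-length
# embedding would do.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 128))
net = TripletNet(backbone).eval()
window = FrameWindow()

for frame in sampled_frames:  # hypothetical iterable, about one frame per second
    polygon = lane_edge_polygon(segment_road(frame))  # segment_road is assumed
    masked = mask_lane_region(frame, polygon) if polygon is not None else frame
    window.push(masked)
    groups = window.groups()
    if groups is None:
        continue  # fewer than three frames accumulated so far
    (p1, p2), (_, p3) = groups
    with torch.no_grad():
        d1, d2 = net(to_tensor(p1), to_tensor(p2), to_tensor(p3))
    if scene_changed(d1.item(), d2.item(), threshold=0.5):
        print("state of the target scene has changed")
```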
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a scene state is further provided, where the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram of a scene state determination apparatus according to an embodiment of the present invention, and as shown in fig. 7, the apparatus includes:
a first determining module 72, configured to determine N images from an image sequence acquired for a target scene, where N is an integer not less than 3;
a second determining module 74, configured to determine at least two image groups based on the N images, wherein each image group includes two images, and the images included in different image groups are completely or partially different; determining an inter-class similarity reference value corresponding to each image group according to the similarity degree of two images contained in each image group in the at least two image groups;
a third determining module 76, configured to determine whether the state of the target scene in the image sequence has changed based on the degree of difference between the similar reference values of the classes corresponding to each image group.
In an exemplary embodiment, the target scene includes a traffic scene, and the apparatus may be configured to, after the N images are determined from the image sequence acquired for the target scene and before the inter-class similarity reference value corresponding to each image group is determined: recognize lane edges in each image contained in each of the at least two image groups and determine the lane edge area in each image; and perform interference reduction processing on the lane edge area in each image.
In an exemplary embodiment, the apparatus may implement the interference reduction processing on the lane edge area in each image by setting the pixel value of each pixel contained in the lane edge area of each image to a target pixel value.
In one exemplary embodiment, the at least two image groups include a first image group including a history image and a current image, and a second image group including the current image and a subsequent image; the first acquisition time of the historical image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image.
In an exemplary embodiment, the time difference between the first acquisition instant and the second acquisition instant is a first time interval; the time difference between the second acquisition time and the third acquisition time is a second time interval.
In an exemplary embodiment, the second determining module 74 may determine the inter-class similarity reference value corresponding to each image group by: determining a first inter-class similarity reference value corresponding to the first image group based on the degree of similarity between the historical image and the current image; and determining a second inter-class similarity reference value corresponding to the second image group based on the degree of similarity between the current image and the subsequent image. The third determining module 76 may then determine whether the state of the target scene in the image sequence has changed based on the difference value between the first and second inter-class similarity reference values.
In an exemplary embodiment, the third determining module 76 may determine whether the state of the target scene in the image sequence has changed by: determining the difference value between the first and second inter-class similarity reference values; and determining that the state of the target scene has changed in response to the difference value being greater than a predetermined threshold.
In an exemplary embodiment, the third determining module 76 may further determine, in response to the difference value being less than or equal to the predetermined threshold, that the state of the target scene has not changed.
In an exemplary embodiment, the second determining module 74 may determine the at least two image groups based on the N images, and determine the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each image group, in the following manner: inputting the N images into a target network model, and determining the difference value between the first and second inter-class similarity reference values based on the target network model.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method as set forth in any of the above.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to: various media capable of storing a computer program, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method for determining a scene state, comprising:
determining N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3;
determining at least two image groups based on the N images, wherein each image group comprises two images, and the images contained in different image groups are completely or partially different;
determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and
determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups.
2. The method according to claim 1, wherein the target scene comprises a traffic scene, and after the N images are determined from the sequence of images acquired for the target scene, before the inter-class similarity reference value corresponding to each image group is determined according to the similarity degree of the two images included in each image group of the at least two image groups, the method further comprises:
recognizing lane edges in each image contained in each image group in the at least two image groups, and determining lane edge areas in each image;
and performing interference reduction processing on the lane edge area in each image.
3. The method of claim 2, wherein the performing interference reduction processing on the lane edge region in each image comprises:
and setting the pixel value of each pixel point contained in the lane edge area in each image as a target pixel value.
4. The method according to any one of claims 1 to 3, wherein the at least two image groups comprise a first image group comprising a history image and a current image and a second image group comprising the current image and a subsequent image;
the first acquisition time of the historical image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image.
5. The method according to claim 4, wherein
the time difference between the first acquisition time and the second acquisition time is a first time interval;
the time difference between the second acquisition time and the third acquisition time is a second time interval.
6. The method according to claim 4, wherein determining the inter-class similarity reference value corresponding to each image group according to the similarity degree of two images included in each image group of the at least two image groups comprises:
determining a first inter-class similarity reference value corresponding to the first image group based on the similarity degree of the historical image and the current image; determining a second inter-class similarity reference value corresponding to the second image group based on the similarity degree of the current image and the subsequent image;
the determining whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups comprises:
determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value.
7. The method of claim 6, wherein the determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value comprises:
determining a difference value between the first inter-class similar reference value and the second inter-class similar reference value;
determining that the state of the target scene has changed in response to the difference value being greater than a predetermined threshold.
8. The method of claim 7, further comprising:
in response to the difference value being less than or equal to the predetermined threshold, determining that the state of the target scene has not changed.
9. The method according to any one of claims 6 to 8, wherein determining at least two image groups based on the N images and determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups comprise:
inputting the N images into a target network model, and determining a difference value between the first inter-class similar reference value and the second inter-class similar reference value based on the target network model.
10. The method of claim 9, wherein the target network model comprises a Triplet Net model;
the first inter-class similarity reference value comprises a first Euclidean distance between a feature vector of the historical image and a feature vector of the current image;
the second inter-class similarity reference value comprises a second Euclidean distance between the feature vector of the current image and the feature vector of the subsequent image.
11. An apparatus for determining a scene state, comprising:
a first determining module, configured to determine N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3;
a second determining module, configured to determine at least two image groups based on the N images, wherein each image group comprises two images and the images contained in different image groups are completely or partially different, and to determine an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and
a third determining module, configured to determine whether the state of the target scene in the image sequence has changed based on the degree of difference among the inter-class similarity reference values corresponding to the image groups.
12. A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 10.
13. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 10.
CN202111408964.6A 2021-11-24 2021-11-24 Scene state determining method and device, storage medium and electronic device Active CN114038197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111408964.6A CN114038197B (en) 2021-11-24 2021-11-24 Scene state determining method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111408964.6A CN114038197B (en) 2021-11-24 2021-11-24 Scene state determining method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN114038197A (en) 2022-02-11
CN114038197B CN114038197B (en) 2023-06-13

Family

ID=80138706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111408964.6A Active CN114038197B (en) 2021-11-24 2021-11-24 Scene state determining method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN114038197B (en)

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5732146A (en) * 1994-04-18 1998-03-24 Matsushita Electric Industrial Co., Ltd. Scene change detecting method for video and movie
JP2002204392A (en) * 2000-12-28 2002-07-19 Canon Inc Apparatus for image processing, system therefor, method therefor and recording medium
CN103826121A (en) * 2013-12-20 2014-05-28 电子科技大学 Scene switching detection based code rate control method in low-delay video coding
CN104616269A (en) * 2015-02-27 2015-05-13 深圳市中兴移动通信有限公司 Image defogging method and shooting device
US20160127680A1 (en) * 2013-07-22 2016-05-05 Tencent Tecnology (Shenzhen) Company Limited Methods, Devices, and Systems for Controlling Audio and Video Transmission Channel
WO2018126486A1 (en) * 2017-01-09 2018-07-12 中国科学院自动化研究所 Parallel video image contrast enhancing method and apparatus
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109447023A (en) * 2018-11-08 2019-03-08 北京奇艺世纪科技有限公司 Determine method, video scene switching recognition methods and the device of image similarity
CN110336943A (en) * 2019-07-03 2019-10-15 北京迈格威科技有限公司 A kind of scene recognition method and device
CN111343356A (en) * 2020-03-11 2020-06-26 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
CN111383201A (en) * 2018-12-29 2020-07-07 深圳Tcl新技术有限公司 Scene-based image processing method and device, intelligent terminal and storage medium
CN111666921A (en) * 2020-06-30 2020-09-15 腾讯科技(深圳)有限公司 Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN111815959A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Vehicle violation detection method and device and computer readable storage medium
CN111814593A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Traffic scene analysis method and device, and storage medium
CN112153329A (en) * 2020-08-17 2020-12-29 浙江大华技术股份有限公司 Configuration method, system, computer equipment and storage medium for event monitoring
CN112183244A (en) * 2020-09-11 2021-01-05 浙江大华技术股份有限公司 Scene establishing method and device, storage medium and electronic device
CN112199582A (en) * 2020-09-21 2021-01-08 聚好看科技股份有限公司 Content recommendation method, device, equipment and medium
CN112287881A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Satellite remote sensing image smoke scene detection method and system and computer storage medium
CN112613344A (en) * 2020-12-01 2021-04-06 浙江大华汽车技术有限公司 Vehicle lane occupation detection method and device, computer equipment and readable storage medium
CN113052118A (en) * 2021-04-07 2021-06-29 上海浩方信息技术有限公司 Method, system, device, processor and storage medium for realizing scene change video analysis and detection based on high-speed dome camera
CN113505720A (en) * 2021-07-22 2021-10-15 浙江大华技术股份有限公司 Image processing method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN114038197B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN112528878B (en) Method and device for detecting lane line, terminal equipment and readable storage medium
CN110443210B (en) Pedestrian tracking method and device and terminal
CN110163109B (en) Lane line marking method and device
CN111310727B (en) Object detection method and device, storage medium and electronic device
CN111291646A (en) People flow statistical method, device, equipment and storage medium
CN111914762A (en) Gait information-based identity recognition method and device
CN111402301B (en) Water accumulation detection method and device, storage medium and electronic device
CN113723176B (en) Target object determination method and device, storage medium and electronic device
CN114973573A (en) Target intrusion determination method and device, storage medium and electronic device
CN110837760A (en) Target detection method, training method and device for target detection
CN113014876A (en) Video monitoring method and device, electronic equipment and readable storage medium
CN113160272A (en) Target tracking method and device, electronic equipment and storage medium
CN116597421A (en) Parking space monitoring method, device and equipment based on image recognition
CN114038197B (en) Scene state determining method and device, storage medium and electronic device
CN114417906B (en) Method, device, equipment and storage medium for identifying microscopic image identification
CN115830342A (en) Method and device for determining detection frame, storage medium and electronic device
CN114422776A (en) Detection method and device for camera equipment, storage medium and electronic device
CN113239738B (en) Image blurring detection method and blurring detection device
CN115082326A (en) Processing method for deblurring video, edge computing equipment and central processor
CN112819859B (en) Multi-target tracking method and device applied to intelligent security
CN111597979B (en) Target object clustering method and device
CN113469130A (en) Shielded target detection method and device, storage medium and electronic device
CN112861570A (en) Detection method and device and road side unit
CN112597924A (en) Electric bicycle track tracking method, camera device and server
CN112861711A (en) Regional intrusion detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant