CN114038197B - Scene state determining method and device, storage medium and electronic device

Scene state determining method and device, storage medium and electronic device

Info

Publication number
CN114038197B
CN114038197B (application CN202111408964.6A)
Authority
CN
China
Prior art keywords
image
determining
inter
images
reference value
Prior art date
Legal status
Active
Application number
CN202111408964.6A
Other languages
Chinese (zh)
Other versions
CN114038197A (en)
Inventor
余言勋
杜治江
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111408964.6A
Publication of CN114038197A
Application granted
Publication of CN114038197B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G1/00: Traffic control systems for road vehicles
    • G08G1/01: Detecting movement of traffic to be counted or controlled
    • G08G1/0104: Measuring and analyzing of parameters relative to traffic conditions

Landscapes

  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method and a device for determining a scene state, a storage medium and an electronic device. The method comprises the following steps: determining N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3; determining at least two image groups based on the N images, wherein each image group contains two images, and the images contained in different image groups are completely different or partially different; determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups. The invention solves the problem of inaccurate scene state determination in the related art, and achieves the effect of improving the accuracy of scene state determination.

Description

Scene state determining method and device, storage medium and electronic device
Technical Field
The embodiments of the invention relate to the field of communications, and in particular to a scene state determining method and device, a storage medium and an electronic device.
Background
At present, detecting violations in road traffic requires configuring the necessary environment in advance. This environment includes the lane lines (manually drawn), snap lines (manually drawn), stop lines (manually drawn), detection areas (manually drawn) and the like of the monitored scene. All of these depend on manual drawing; when the scene changes, for example due to construction or an accident, the manually drawn configuration cannot capture the change immediately, so the information lags behind, which greatly inconveniences traffic management personnel. Fig. 1 is a configuration diagram of a tunnel scene. As shown in Fig. 1, the thick solid lines indicate lane lines, the region enclosed by the contour line indicates the detection area, and the broken line indicates the flow trigger line. If the camera rotates and the rotation goes unnoticed, a large number of false alarms will occur.
In the related art, scene recognition (such as lane surface recognition and lane line recognition) is performed at fixed intervals when determining the state of a scene, and the two results obtained within a given time interval are compared; if they differ greatly, the scene is considered to have changed. However, because road scenes are variable, it is difficult to guarantee the accuracy of the recognition algorithm when the road surface in the scene changes.
As can be seen from the above, the related art suffers from inaccurate scene state determination.
In view of this problem in the related art, no effective solution has been proposed so far.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a scene state, a storage medium and an electronic device, so as to at least solve the problem of inaccurate scene state determination in the related art.
According to an embodiment of the present invention, there is provided a scene state determining method, including: determining N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3; determining at least two image groups based on the N images, wherein each image group contains two images, and the images contained in different image groups are completely different or partially different; determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups.
According to another embodiment of the present invention, there is provided a scene state determining apparatus, including: a first determining module, configured to determine N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3; a second determining module, configured to determine at least two image groups based on the N images, where each image group contains two images and the images contained in different image groups are completely different or partially different, and to determine an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and a third determining module, configured to determine whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups.
According to yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program when executed by a processor implements the steps of the method as described in any of the above.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, N images are determined from the image sequence acquired for the target scene, at least two image groups are determined from the N images, the inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in each of the at least two image groups, and whether the state of the target scene in the image sequence has changed is determined according to the degree of difference between the inter-class similarity reference values. Because the N images can be divided into at least two image groups, an inter-class similarity reference value can be determined for each pair of images, and whether the state of the target scene has changed can be judged from the degree of difference between these reference values. This solves the problem of inaccurate scene state determination in the related art and achieves the effect of improving the accuracy of scene state determination.
Drawings
FIG. 1 is a configuration diagram of a tunnel scene in the related art;
FIG. 2 is a block diagram of the hardware structure of a mobile terminal running a scene state determining method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a scene state determining method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target area according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of the structure of a Triplet Net model according to an exemplary embodiment of the present invention;
FIG. 6 is a schematic diagram of determining whether the state of a target scene has changed using a TripletNet model according to an exemplary embodiment of the present invention;
FIG. 7 is a block diagram of a scene state determining device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in this application may be executed on a mobile terminal, a computer terminal or a similar computing device. Taking a mobile terminal as an example, FIG. 2 is a block diagram of the hardware structure of a mobile terminal running a scene state determining method according to an embodiment of the present invention. As shown in FIG. 2, the mobile terminal may include one or more processors 102 (only one is shown in FIG. 2; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data, and may further include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in FIG. 2 is merely illustrative and does not limit the structure of the mobile terminal; for example, the mobile terminal may include more or fewer components than shown in FIG. 2, or have a different configuration.
The memory 104 may be used to store computer programs, such as software programs of application software and modules, such as computer programs corresponding to a method for determining a scene status in an embodiment of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-described method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a scene state determining method is provided. FIG. 3 is a flowchart of a scene state determining method according to an embodiment of the present invention; as shown in FIG. 3, the flow includes the following steps:
step S302, determining N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3;
step S304, determining at least two image groups based on the N images, wherein each image group contains two images, and the images contained in different image groups are completely different or partially different; and determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups;
step S306, determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups.
In the above embodiment, the target scene may be a traffic scene, but may also be another kind of scene. When the target scene is a traffic scene, the image sequence may consist of images acquired by the monitoring equipment of a traffic checkpoint. N images can be determined from the image sequence and divided into at least two image groups, where the images contained in different image groups may be completely different or partially different. The same image may appear in more than one image group, but no two image groups contain exactly the same pair of images.
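For illustration only (this sketch is not part of the patent text; the function name and the choice of all pairwise combinations are assumptions), the grouping constraint described above can be expressed as follows:

```python
from itertools import combinations

def build_image_groups(images):
    """Form two-image groups from N images (N an integer not less than 3).

    Each group contains exactly two images; two different groups may share
    one image (partially different) or none (completely different), but no
    two groups contain the same pair of images.
    """
    if len(images) < 3:
        raise ValueError("N must be an integer not less than 3")
    return list(combinations(images, 2))

# For N = 3 images [P1, P2, P3] this yields the pairs (P1, P2), (P1, P3)
# and (P2, P3); the embodiment described below uses the subset
# {(P1, P2), (P2, P3)}.
```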
In the above embodiment, an inter-class similarity reference value may be determined for each image group, where the inter-class similarity reference value represents the degree of similarity of the two images in the same group. After the inter-class similarity reference value corresponding to each image group is determined, whether the state of the target scene has changed can be determined from the degree of difference between these reference values. A change in the shooting angle or the shooting parameters of the image pickup apparatus can cause the state of the target scene to change. For example, for an image pickup apparatus installed at a fixed point, an operator may zoom the lens to see farther or closer, or rotate the apparatus to capture the other side of the road; in these cases the scene monitored by the camera changes, that is, the state of the target scene changes.
Alternatively, the above steps may be executed by a background processor or another device with similar processing capability, or by a machine integrating at least an image acquisition device and a data processing device, where the image acquisition device may include an image acquisition module such as a camera, and the data processing device may include a terminal such as a computer or a mobile phone, but is not limited thereto.
According to this embodiment, N images are determined from the image sequence acquired for the target scene, at least two image groups are determined from the N images, the inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in each of the at least two image groups, and whether the state of the target scene in the image sequence has changed is determined according to the degree of difference between the inter-class similarity reference values. Because the N images can be divided into at least two image groups, an inter-class similarity reference value can be determined for each pair of images, and whether the state of the target scene has changed can be judged from the degree of difference between these reference values. This solves the problem of inaccurate scene state determination in the related art and improves the accuracy of scene state determination.
In an exemplary embodiment, the target scene includes a traffic scene; after the N images are determined from the image sequence acquired for the target scene and before the inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in each of the at least two image groups, the method further includes: identifying the lane edges in each image contained in each of the at least two image groups and determining the lane edge area in each image; and performing interference reduction processing on the lane edge area in each image. In this embodiment, when the target scene is a traffic scene, the lane edge lines in each image may be identified by algorithms such as semantic segmentation, edge detection and lane surface detection, and the lane edge area is determined from the lane edge lines so that interference reduction processing can be applied to it. The lane edge area may be the region enclosed by the lane edge lines; a schematic diagram is shown in Fig. 4, where the region inside the thick solid frame is the lane edge area and the thick solid lines are the lane edge lines.
In the above embodiment, when the target scene is a traffic scene, vehicles traveling on the road surface introduce changing information into the acquired images. This changing information affects the accuracy of recognizing whether the state of the target scene has changed. Therefore, interference reduction processing can be applied to the lane edge area, that is, the region where changes are likely, removing the interference information on the road surface and improving the accuracy of determining whether the state of the target scene has changed. The interference reduction processing may include adding a mask to the lane edge area, or the like.
In the above embodiment, the lane edge area may alternatively be the lane edge lines themselves. In that case, the interference reduction processing is applied only to the lane edge lines to determine an interference region, and when the degree of similarity of the two images contained in each image group is determined, only the regions of each image outside the interference region are compared.
In an exemplary embodiment, performing the interference reduction processing on the lane edge area in each image includes: setting the pixel value of each pixel point contained in the lane edge area of the respective image to a target pixel value. In this embodiment, when the interference reduction processing is performed, the pixel value of each pixel point contained in the lane edge area may be set to the target pixel value. The target pixel value may be 0 (this value is merely exemplary; other values may be set, and the invention is not limited in this respect).
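A minimal sketch of this interference reduction step, assuming OpenCV and NumPy (the function name and the polygon representation of the lane edge area are assumptions; the patent does not prescribe an implementation):

```python
import cv2
import numpy as np

def suppress_lane_region(image: np.ndarray, lane_polygon: np.ndarray,
                         target_value: int = 0) -> np.ndarray:
    """Set every pixel inside the lane edge area to the target pixel value.

    `lane_polygon` is a (K, 2) array of vertices enclosing the lane edge
    area, e.g. obtained from a semantic segmentation or lane line
    detection step (not shown here).
    """
    masked = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [lane_polygon.astype(np.int32)], 255)
    masked[mask == 255] = target_value  # "add a mask": zero out road pixels
    return masked
```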
In an exemplary embodiment, the at least two image groups include a first image group and a second image group, the first image group containing a history image and a current image, and the second image group containing the current image and a subsequent image; the first acquisition time of the history image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image. In this embodiment, when N is 3, the 3 images may be divided into two image groups: the first image group and the second image group. The 3 images may be acquired from a video stream, in which a current image, a history image and a subsequent image are determined. After the current image, the history image and the subsequent image are determined, the current image and the history image may form one image group, and the current image and the subsequent image may form the other.
In an exemplary embodiment, the time difference between the first acquisition time and the second acquisition time is a first time interval, and the time difference between the second acquisition time and the third acquisition time is a second time interval. In this embodiment, the history image, the current image and the subsequent image may be three consecutive frames or non-consecutive frames. When they are three consecutive frames, the first time interval and the second time interval are the same, namely the shooting interval of the image pickup apparatus. When the three frames are captured non-consecutively, the first time interval and the second time interval may be the same or different, and both may be user-defined time intervals.
In the above embodiment, the process of determining whether the state of the target scene has changed may be triggered automatically or manually. For an automatically triggered detection process, the first time interval and the second time interval may be fixed time intervals set in advance. For a manually triggered detection process, the first time interval and the second time interval may be determined by the trigger times.
In the above embodiment, when video is streamed in real time, the algorithm automatically samples one frame per second until 3 frames have accumulated; these serve as the history image P1, the current image P2 and the subsequent image P3, and whether the state of the target scene has changed is judged from these three frames. As new data keeps arriving, frame P1 is discarded, P2 becomes the history image, P3 the current image and P4 the subsequent image, and the determination is then repeated in sequence. The first time interval and the second time interval may use the recommended 1-second sampling or be set by the user.
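This sampling scheme can be sketched as a sliding window (illustrative only; the 25 fps default and the function name are assumptions):

```python
import collections

def rolling_triples(frames, fps: float = 25.0, interval_s: float = 1.0):
    """Yield (history, current, subsequent) frame triples from a stream.

    One frame is sampled per `interval_s` seconds from a `fps` stream;
    once 3 frames have accumulated, every new sample slides the window:
    P1 is discarded, P2 becomes the history image, P3 the current image,
    and the new frame P4 the subsequent image.
    """
    step = max(1, round(fps * interval_s))
    window = collections.deque(maxlen=3)
    for i, frame in enumerate(frames):
        if i % step:
            continue  # keep only one frame per sampling interval
        window.append(frame)
        if len(window) == 3:
            yield tuple(window)
```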
In an exemplary embodiment, determining the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups includes: determining a first inter-class similarity reference value corresponding to the first image group based on the degree of similarity of the history image and the current image; and determining a second inter-class similarity reference value corresponding to the second image group based on the degree of similarity of the current image and the subsequent image. Determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups includes: determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value. In this embodiment, when determining whether the state of the target scene has changed, the inter-class similarity reference value corresponding to each image group may be determined. When two image groups are included, the first inter-class similarity reference value corresponding to the first image group and the second inter-class similarity reference value corresponding to the second image group may be determined; the difference value between the two is then determined, and whether the state of the target scene has changed is determined from that difference value.
In the above embodiment, when determining whether the state of the target scene has changed, the degree of similarity of the images within each of the two image groups is determined first, yielding the first inter-class similarity reference value and the second inter-class similarity reference value. After the two reference values are determined, whether the state of the target scene has changed is not decided from the first or second inter-class similarity reference value alone; instead, the difference value between the two is determined, and the decision is made from that difference value. Judging the change of the scene from both the inter-class and intra-class distances improves the robustness of the target network model. Here, images of the same class are images acquired while the state of the scene remains unchanged, and images of different classes are images acquired while the state of the scene changes.
In an exemplary embodiment, determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value includes: determining the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value; and, in response to the difference value being greater than a predetermined threshold, determining that the state of the target scene has changed. In this embodiment, after the difference value is determined, if it is greater than the predetermined threshold, the state of the target scene is considered to have changed. The predetermined threshold may be set in advance, and the invention does not limit its value.
In an exemplary embodiment, the method further includes: in response to the difference value being less than or equal to the predetermined threshold, determining that the state of the target scene has not changed. In this embodiment, when the difference value is less than or equal to the predetermined threshold, the state of the target scene can be considered unchanged.
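Taken together, the two preceding paragraphs reduce the decision to comparing a difference value against the predetermined threshold. A sketch (the signed difference d2 - d1 follows the reading of FIG. 6 described below, where a change is indicated by d2 being significantly greater than d1; the threshold value itself is left open by the patent):

```python
def scene_changed(d1: float, d2: float, threshold: float) -> bool:
    """Return True if the state of the target scene is judged to have changed.

    d1 -- first inter-class similarity reference value (history vs. current)
    d2 -- second inter-class similarity reference value (current vs. subsequent)
    A difference value greater than the predetermined threshold indicates a
    change; a difference value less than or equal to it indicates no change.
    """
    return (d2 - d1) > threshold
```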
In an exemplary embodiment, determining the at least two image groups based on the N images and determining the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups includes: inputting the N images into a target network model, and determining the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value based on the target network model. In this embodiment, the first inter-class similarity reference value and the second inter-class similarity reference value may be determined by the target network model, which also determines their difference value. The target network model may be a Triplet Net model.
In one exemplary embodiment, the target network model comprises a Triplet Net model; the first inter-class similarity reference value comprises a first Euclidean distance between the feature vector of the history image and the feature vector of the current image; and the second inter-class similarity reference value comprises a second Euclidean distance between the feature vector of the current image and the feature vector of the subsequent image. In this embodiment, the structure of the Triplet Net model is shown in FIG. 5. As shown in FIG. 5, the Triplet Net consists of 3 identical feedforward neural networks that share weights with one another, and three samples are input each time: x-, x and x+, which are the candidate sample, the same-class sample and the different-class sample, respectively. The feature vectors of the embedding layer are compared by L2 distance: the three branches encode the three inputs, and the network finally outputs the two Euclidean distances d1 and d2. If the two distances differ too much, x- and x+ are considered to belong to two different classes (the inter-class distance is too large). In the above embodiment, performing scene recognition with the TripletNet improves the accuracy of scene recognition.
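A minimal PyTorch sketch of the Triplet Net structure described above: three weight-shared branches whose embedding-layer feature vectors are compared by L2 (Euclidean) distance. The backbone layers and the embedding size are assumptions; the patent only fixes the three-branch, shared-weight structure and the two distance outputs d1 and d2:

```python
import torch
import torch.nn as nn

class TripletNet(nn.Module):
    """Three weight-shared feedforward branches; outputs the Euclidean
    distances d1 = ||f(x-) - f(x)||2 and d2 = ||f(x) - f(x+)||2."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # The backbone below is an assumption; any feedforward encoder works.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, x_minus, x, x_plus):
        # Weight sharing: the same encoder embeds all three inputs.
        e_minus, e, e_plus = map(self.encoder, (x_minus, x, x_plus))
        d1 = torch.norm(e_minus - e, p=2, dim=1)
        d2 = torch.norm(e - e_plus, p=2, dim=1)
        return d1, d2
```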
In the above embodiment, the similarity reference value may be expressed as a Euclidean distance. After the N images are obtained, interference reduction processing may be applied to them, and the processed images are fed into the TripletNet model; that is, the history image, the current image and the subsequent image are input to the TripletNet model after interference reduction processing. A schematic diagram of determining whether the state of the target scene has changed using the TripletNet model is shown in FIG. 6. As shown in FIG. 6, the first two inputs are the history image and the current image, and the third is the subsequent image; the target network model computes the Euclidean distances d1 (between the history image and the current image) and d2 (between the current image and the subsequent image), and if d2 is significantly greater than d1, it determines that the scene has changed.
In the foregoing embodiment, a set of image frames (3 images: the history frame P1, the current frame P2 and the subsequent frame P3) is acquired from the video, with a configurable time gap between frames; the shorter the gap, the higher the accuracy of the algorithm. For the 3 images, the lane edges are detected and the road edge zone is recognized using algorithms such as semantic segmentation and lane line recognition, and the interference information of the road surface is removed by setting the pixels of the identified road surface region to 0 (adding a mask). The processed images are then sent to the TripletNet to recognize whether the scene has changed. The recognition principle is as follows: the similarities (Euclidean distances) between P2 and P1 and between P2 and P3 are determined, and whether a scene change occurs within this set of image frames is judged from the degree of difference between the two similarities. Preprocessing the road surface region once shields the parts of the scene that change easily, such as the road surface, greatly reducing the influence of the lanes on the scene. Using the processed pictures as input, scene change recognition is performed by the TripletNet, which considers not only the inter-class distance but also the intra-class distance; the judgment is made on the inter-class distance relative to the intra-class distance, rather than deciding whether a change has occurred from the inter-class distance against a fixed value alone, which improves the accuracy of determining the state of the scene.
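The full flow of this embodiment can be sketched end to end, reusing the illustrative helpers from the earlier sketches (suppress_lane_region, scene_changed and TripletNet); the threshold value and the image preprocessing details are assumptions:

```python
import numpy as np
import torch

PREDETERMINED_THRESHOLD = 0.5  # assumption; the patent leaves the value open

def detect_scene_change(p1: np.ndarray, p2: np.ndarray, p3: np.ndarray,
                        lane_polygon: np.ndarray, model) -> bool:
    """Mask the road surface in the history/current/subsequent frames,
    run the TripletNet, and compare the difference of the two inter-class
    similarity reference values against the predetermined threshold."""
    frames = [suppress_lane_region(p, lane_polygon) for p in (p1, p2, p3)]
    # HWC uint8 frames -> NCHW float tensors in [0, 1]
    batch = [torch.from_numpy(f).permute(2, 0, 1).float().unsqueeze(0) / 255.0
             for f in frames]
    with torch.no_grad():
        d1, d2 = model(*batch)
    return scene_changed(d1.item(), d2.item(), PREDETERMINED_THRESHOLD)
```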
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiment also provides a device for determining a scene state, which is used for implementing the above embodiment and the preferred implementation, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
FIG. 7 is a block diagram of a scene state determining apparatus according to an embodiment of the present invention; as shown in FIG. 7, the apparatus includes:
a first determining module 72, configured to determine N images from a sequence of images acquired for a target scene, where N is an integer not less than 3;
a second determining module 74, configured to determine at least two image groups based on N images, where each image group includes two images, and the images included in different image groups are completely different or partially different; determining an inter-class similarity reference value corresponding to each image group according to the similarity degree of the two images contained in each image group in the at least two image groups;
a third determining module 76, configured to determine whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups.
In an exemplary embodiment, the target scene includes a traffic scene, and the apparatus may be further configured to, after the N images are determined from the image sequence acquired for the target scene and before the inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in each of the at least two image groups: identify the lane edges in each image contained in each of the at least two image groups and determine the lane edge area in each image; and perform interference reduction processing on the lane edge area in each image.
In an exemplary embodiment, the apparatus may implement the interference reduction processing on the lane edge area in each image by: setting a pixel value of each pixel point contained in the lane edge area in the respective images as a target pixel value.
In an exemplary embodiment, the at least two image groups include a first image group including a history image and a current image and a second image group including the current image and a subsequent image; the first acquisition time of the historical image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image.
In an exemplary embodiment, the time difference between the first acquisition instant and the second acquisition instant is a first time interval; and the time difference between the second acquisition time and the third acquisition time is a second time interval.
In an exemplary embodiment, the second determining module 74 may determine the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups as follows: determining a first inter-class similarity reference value corresponding to the first image group based on the degree of similarity of the history image and the current image; and determining a second inter-class similarity reference value corresponding to the second image group based on the degree of similarity of the current image and the subsequent image. The third determining module 76 may determine whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values as follows: determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value.
In an exemplary embodiment, the third determining module 76 may determine whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value as follows: determining the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value; and, in response to the difference value being greater than a predetermined threshold, determining that the state of the target scene has changed.
In an exemplary embodiment, the third determining module 76 may further determine, in response to the difference value being less than or equal to the predetermined threshold, that the state of the target scene has not changed.
In an exemplary embodiment, the second determining module 74 may determine the at least two image groups based on the N images and determine the inter-class similarity reference value corresponding to each image group as follows: inputting the N images into a target network model, and determining the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value based on the target network model.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program when executed by a processor implements the steps of the method described in any of the above.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic apparatus may further include a transmission device connected to the processor, and an input/output device connected to the processor.
Specific examples in this embodiment may refer to the examples described in the foregoing embodiments and the exemplary implementation, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices; and they may be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices. In some cases, the steps shown or described may be performed in a different order than described herein, or the modules or steps may be separately fabricated as individual integrated circuit modules, or multiple of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A scene state determining method, comprising:
determining N images from an image sequence acquired for a target scene, wherein N is an integer not less than 3, and the target scene comprises a traffic scene;
determining at least two image groups based on the N images, wherein each image group contains two images, and the images contained in different image groups are completely different or partially different;
determining an inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups; and
determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups;
the at least two image groups comprise a first image group and a second image group, wherein the first image group comprises a historical image and a current image, and the second image group comprises the current image and a subsequent image; the first acquisition time of the historical image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image.
2. The method according to claim 1, wherein after the N images are determined from the image sequence acquired for the target scene and before the inter-class similarity reference value corresponding to each image group is determined according to the degree of similarity of the two images contained in each of the at least two image groups, the method further comprises:
identifying lane edges in each image contained in each image group in the at least two image groups, and determining lane edge areas in each image;
and carrying out interference reduction processing on the lane edge area in each image.
3. The method of claim 2, wherein performing the interference reduction processing on the lane edge area in each image comprises:
setting a pixel value of each pixel point contained in the lane edge area in the respective images as a target pixel value.
4. The method of claim 1, wherein:
the time difference between the first acquisition time and the second acquisition time is a first time interval;
and the time difference between the second acquisition time and the third acquisition time is a second time interval.
5. The method according to claim 1, wherein determining the inter-class similarity reference value corresponding to each image group according to the similarity of the two images included in each of the at least two image groups comprises:
determining a first inter-class similarity reference value corresponding to the first image group based on the similarity degree of the historical image and the current image; determining a second inter-class similarity reference value corresponding to the second image group based on the similarity degree of the current image and the subsequent image;
and wherein determining whether the state of the target scene in the image sequence has changed based on the degree of difference between the inter-class similarity reference values corresponding to the image groups comprises:
based on the difference value of the first inter-class similarity reference value and the second inter-class similarity reference value, it is determined whether a state of the target scene in the image sequence has changed.
6. The method of claim 5, wherein the determining whether the state of the target scene in the image sequence has changed based on the difference value between the first inter-class similarity reference value and the second inter-class similarity reference value comprises:
determining a difference value between the first inter-class similarity reference value and the second inter-class similarity reference value;
in response to the variance value being greater than a predetermined threshold, a change in state of the target scene is determined.
7. The method of claim 6, wherein the method further comprises:
and in response to the difference value being less than or equal to the predetermined threshold, determining that the state of the target scene has not changed.
8. The method according to any one of claims 5 to 7, wherein determining the at least two image groups based on the N images and determining the inter-class similarity reference value corresponding to each image group according to the degree of similarity of the two images contained in each of the at least two image groups comprises:
and inputting the N images into a target network model, and determining a difference value of the first inter-class similarity reference value and the second inter-class similarity reference value based on the target network model.
9. The method of claim 8, wherein the target network model comprises a Triplet Net model;
the first inter-class similarity reference value comprises a first Euclidean distance between a feature vector of the historical image and a feature vector of the current image;
the second inter-class similarity reference value includes a second Euclidean distance of the feature vector of the current image and the feature vector of the subsequent image.
10. A scene state determining device, comprising:
the first determining module is used for determining N images from an image sequence acquired aiming at a target scene, wherein N is an integer not smaller than 3, and the target scene comprises a traffic scene;
a second determining module, configured to determine at least two image groups based on N images, where each image group includes two images, and the images included in different image groups are completely different or partially different; determining an inter-class similarity reference value corresponding to each image group according to the similarity degree of the two images contained in each image group in the at least two image groups;
a third determining module, configured to determine whether a state of the target scene in the image sequence changes based on a degree of difference of the inter-class similarity reference values corresponding to each image group;
the at least two image groups comprise a first image group and a second image group, wherein the first image group comprises a historical image and a current image, and the second image group comprises the current image and a subsequent image; the first acquisition time of the historical image is earlier than the second acquisition time of the current image, and the second acquisition time is earlier than the third acquisition time of the subsequent image.
11. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 9.
12. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the method of any of the claims 1 to 9.
CN202111408964.6A 2021-11-24 2021-11-24 Scene state determining method and device, storage medium and electronic device Active CN114038197B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111408964.6A | 2021-11-24 | 2021-11-24 | Scene state determining method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111408964.6A | 2021-11-24 | 2021-11-24 | Scene state determining method and device, storage medium and electronic device

Publications (2)

Publication Number | Publication Date
CN114038197A | 2022-02-11
CN114038197B | 2023-06-13

Family

ID=80138706

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202111408964.6A | Scene state determining method and device, storage medium and electronic device | 2021-11-24 | 2021-11-24 | Active

Country Status (1)

Country | Link
CN | CN114038197B

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2914170B2 (en) * 1994-04-18 1999-06-28 松下電器産業株式会社 Image change point detection method
JP2002204392A (en) * 2000-12-28 2002-07-19 Canon Inc Apparatus for image processing, system therefor, method therefor and recording medium
CN104333727B (en) * 2013-07-22 2019-04-12 腾讯科技(深圳)有限公司 Audio video transmission channel regulates and controls methods, devices and systems
CN103826121B (en) * 2013-12-20 2017-05-10 电子科技大学 Scene switching detection based code rate control method in low-delay video coding
CN104616269A (en) * 2015-02-27 2015-05-13 深圳市中兴移动通信有限公司 Image defogging method and shooting device
WO2018126486A1 (en) * 2017-01-09 2018-07-12 中国科学院自动化研究所 Parallel video image contrast enhancing method and apparatus
CN109101865A (en) * 2018-05-31 2018-12-28 湖北工业大学 A kind of recognition methods again of the pedestrian based on deep learning
CN109447023B (en) * 2018-11-08 2020-07-03 北京奇艺世纪科技有限公司 Method for determining image similarity, and method and device for identifying video scene switching
CN111383201B (en) * 2018-12-29 2024-03-12 深圳Tcl新技术有限公司 Scene-based image processing method and device, intelligent terminal and storage medium
CN110336943B (en) * 2019-07-03 2021-05-25 北京迈格威科技有限公司 Scene recognition method and device
CN111343356A (en) * 2020-03-11 2020-06-26 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
CN111815959B (en) * 2020-06-19 2021-11-16 浙江大华技术股份有限公司 Vehicle violation detection method and device and computer readable storage medium
CN111814593A (en) * 2020-06-19 2020-10-23 浙江大华技术股份有限公司 Traffic scene analysis method and device, and storage medium
CN111666921B (en) * 2020-06-30 2022-05-20 腾讯科技(深圳)有限公司 Vehicle control method, apparatus, computer device, and computer-readable storage medium
CN112153329A (en) * 2020-08-17 2020-12-29 浙江大华技术股份有限公司 Configuration method, system, computer equipment and storage medium for event monitoring
CN112183244A (en) * 2020-09-11 2021-01-05 浙江大华技术股份有限公司 Scene establishing method and device, storage medium and electronic device
CN112199582B (en) * 2020-09-21 2023-07-18 聚好看科技股份有限公司 Content recommendation method, device, equipment and medium
CN112287881A (en) * 2020-11-19 2021-01-29 国网湖南省电力有限公司 Satellite remote sensing image smoke scene detection method and system and computer storage medium
CN112613344B (en) * 2020-12-01 2024-04-16 浙江华锐捷技术有限公司 Vehicle track occupation detection method, device, computer equipment and readable storage medium
CN113052118A (en) * 2021-04-07 2021-06-29 上海浩方信息技术有限公司 Method, system, device, processor and storage medium for realizing scene change video analysis and detection based on high-speed dome camera
CN113505720A (en) * 2021-07-22 2021-10-15 浙江大华技术股份有限公司 Image processing method and device, storage medium and electronic device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446585A (en) * 2018-01-31 2018-08-24 深圳市阿西莫夫科技有限公司 Method for tracking target, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114038197A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN110443210B (en) Pedestrian tracking method and device and terminal
CN111126235A (en) Method and device for detecting and processing illegal berthing of ship
CN108108711B (en) Face control method, electronic device and storage medium
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN111091106B (en) Image clustering method and device, storage medium and electronic device
CN114611634B (en) Method and device for determining behavior type, storage medium and electronic device
CN113657195A (en) Face image recognition method, face image recognition equipment, electronic device and storage medium
CN111783718A (en) Target object state identification method and device, storage medium and electronic device
CN112668410A (en) Sorting behavior detection method, system, electronic device and storage medium
CN113723176B (en) Target object determination method and device, storage medium and electronic device
CN111507268A (en) Alarm method and device, storage medium and electronic device
CN111402301A (en) Accumulated water detection method and device, storage medium and electronic device
CN113627334A (en) Object behavior identification method and device
CN111783677B (en) Face recognition method, device, server and computer readable medium
CN114038197B (en) Scene state determining method and device, storage medium and electronic device
CN110837760A (en) Target detection method, training method and device for target detection
CN112329621A (en) Processing method, system, terminal and medium for abnormal behavior early warning data
CN116778673A (en) Water area safety monitoring method, system, terminal and storage medium
CN111950507A (en) Data processing and model training method, device, equipment and medium
CN114724011B (en) Behavior determination method and device, storage medium and electronic device
KR20160078154A (en) Customer information provision method, device and computer program
CN113435248A (en) Mask face recognition base enhancement method, device, equipment and readable storage medium
CN112861711A (en) Regional intrusion detection method and device, electronic equipment and storage medium
CN114299472A (en) Method and device for determining abnormal behavior, storage medium and electronic device
CN111489350A (en) Image detection method, image detection device, storage medium and electronic device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant