CN115393792A - Target abnormal state detection method and device and electronic equipment - Google Patents

Target abnormal state detection method and device and electronic equipment

Info

Publication number
CN115393792A
CN115393792A (application CN202210974023.7A)
Authority
CN
China
Prior art keywords
target
image
processed
targets
position reference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210974023.7A
Other languages
Chinese (zh)
Inventor
李元豪
章合群
周祥明
白家男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202210974023.7A
Publication of CN115393792A
Legal status: Pending

Classifications

    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54: Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/66: Analysis of geometric attributes of image moments or centre of gravity
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; context analysis; selection of dictionaries
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/20081: Training; learning
    • G06T 2207/30232: Surveillance
    • G06T 2207/30236: Traffic on road, railway or crossing


Abstract

The embodiment of the application provides a method and a device for detecting an abnormal state of a target, and an electronic device, relating to the technical field of smart city management and used for detecting in time whether an abnormal state occurs in an isolation pier. The method comprises the following steps: acquiring position reference information of a first target and of a second target in a preset area in an image to be processed; determining the distance between the first target and the second target according to the two pieces of position reference information; and if the distance is greater than or equal to a first threshold, determining that target loss has occurred between the first target and the second target, wherein the first threshold is determined according to the size information of the first target and the second target.

Description

Target abnormal state detection method and device and electronic equipment
Technical Field
The application relates to the technical field of smart city management, in particular to a method and a device for detecting an abnormal state of a target and electronic equipment.
Background
In recent years, to make driving and travel easier for citizens, local governments have been continuously building urban expressways that connect different key areas within a city. During construction, to prevent citizens from driving into branch road sections that are not yet complete, rows of continuous, movable isolation piers are usually placed temporarily at road junctions to remind vehicles to detour. The piers can then be removed easily once the road is complete. In addition, vehicles travel at high speed on urban expressways; when a vehicle loses control or an accident suddenly occurs, the flexible isolation piers provide cushioning that reduces, to some extent, the injury to people and damage to vehicles.
However, in daily life, some citizens occasionally move the isolation piers when no one is watching in order to cut through the road, leaving a gap in the row of continuous piers. This poses great hidden dangers to the safety and quality of road construction and to traffic safety. How to detect an abnormal state of the isolation piers in time is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting an abnormal state of a target, and an electronic device, which are used for detecting in time whether an abnormal state occurs in an isolation pier.
In a first aspect, an embodiment of the present application provides a method for detecting an abnormal state of a target, including: acquiring position reference information of a first target and position reference information of a second target in a preset area in an image to be processed; determining the distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target; and if the distance is greater than or equal to a first threshold value, determining that target loss occurs between the first target and the second target, wherein the first threshold value is determined according to the size information of the first target and the second target.
Based on this scheme, whether a target in the image to be processed has been lost or displaced is determined by analyzing the distance between every two adjacent targets. This avoids the hidden dangers that the loss or displacement of an isolation pier poses to the safety and quality of road construction and to traffic safety.
In a possible implementation manner, the first target and the second target are adjacent candidate targets among a plurality of targets in a preset area in the image to be processed, or the first target and the second target are candidate targets separated by a preset number of targets among the plurality of targets in the preset area in the image to be processed.
In a possible embodiment, the image to be processed is a current video frame in a video of a captured target, and the method further includes: in response to the distance being greater than or equal to the first threshold, increasing a value of a timing flag by a reference value; the timing mark is used for indicating the number of video frames with target loss in the shot video of the target; judging whether the value of the timing mark is larger than a second threshold value or not, and sending first alarm information in response to the fact that the value of the timing mark is larger than the second threshold value; the first warning information is used for indicating that the target is lost in the shot video of the target.
Based on this scheme, the first warning information is sent only when the timing flag exceeds the second threshold, which reduces false alarms of target loss caused by short-term occlusion by vehicles or pedestrians.
In one possible embodiment, the value of the timing flag is set to a base value in response to the distance being less than the first threshold.
Based on this scheme, when the distance between the first target and the second target is smaller than the first threshold, it can be determined that no target has been lost in the image to be processed, so the timing flag is reset to the base value, avoiding false alarms of target loss caused by short-term occlusion by vehicles or pedestrians.
In a possible implementation manner, in response to that the value of the timing flag is less than or equal to the second threshold, a video frame next to the current video frame in the video of the shot target is acquired as the image to be processed.
Based on this scheme, the next video frame is fetched when the timing flag is less than or equal to the second threshold; that is, the first warning information is not sent while target loss cannot yet be determined with certainty. This reduces false alarms of target loss caused by short-term occlusion by vehicles or pedestrians.
In a possible implementation manner, the position reference information includes key points of corresponding targets, and the key points are determined based on detection frames of the corresponding targets in the image to be processed.
Based on this scheme, because the isolation piers are not arranged along a single horizontal line, the key points of the targets can be chosen according to the actual situation when determining the distances between targets, which improves accuracy when determining whether a target has been lost in the image to be processed.
In a possible implementation manner, the acquiring the position reference information of the first target and the position reference information of the second target in the preset region in the image to be processed includes: determining a first detection frame of the first target and a second detection frame of the second target, wherein the first detection frame and the second detection frame are obtained by performing target detection on the image to be processed through a target detection network; determining a central point of the first detection frame as position reference information of the first target; and determining the central point of the second detection frame as the position reference information of the second target.
Based on this scheme, the detection frames of all targets in the image to be processed can be obtained through the target detection network, and the targets within the preset area can be identified by their central point coordinates. Interference from targets outside the preset area is thereby avoided, improving the accuracy of target detection.
In a possible implementation manner, according to a detection frame of each target in a preset area in the image to be processed, determining an area image of each target in the preset area from the image to be processed; and respectively inputting the area image of each target in the preset area into a target classification network, and determining whether the state of each target is abnormal.
Based on this scheme, the abnormal state of a single isolation pier can be recognized through the target classification network, avoiding the hidden dangers that a damaged or broken pier poses to the safety and quality of road construction and to traffic safety.
In a possible implementation manner, the inputting of the area images of the targets in the preset area into the target classification network to determine whether the state of each target is abnormal includes: in response to any of the targets being in a preset state, sending second alarm information for the targets in the preset state.
Based on this scheme, an alarm can be raised for the abnormal state of a single isolation pier through the target classification network, avoiding the hidden dangers that a damaged or broken pier poses to the safety and quality of road construction and to traffic safety.
In a possible implementation manner, for any one target in a preset region in the image to be processed, respectively determining a common area ratio of the any one target to each target in a previous frame image according to a detection frame of the any one target and a detection frame of each target in the previous frame image; the previous frame image is a video frame before the image to be processed in the video where the image to be processed is located, and the common area ratio is used for representing the ratio of the area of the intersection part of the detection frames of the two targets to the area of the union part of the detection frames of the two targets; if the previous frame image does not comprise a target with the common area ratio of the any target larger than or equal to a third threshold value, determining that the previous frame image does not comprise the any target, newly establishing a target identifier for the any target, and storing position reference information of the any target in the image to be processed; and if the previous frame image comprises a target with the common area ratio of which to the any one target is greater than or equal to a third threshold value, determining that the previous frame image comprises the any one target, and updating the position reference information of the any one target in the image to be processed.
Based on the scheme, the position reference information of the target existing in the previous frame image can be updated and the target not existing in the previous frame image can be added by matching the target in the image to be processed with each target in the previous frame image according to the common area ratio. Therefore, the position reference information of each target can be more accurately determined to judge whether the target is lost in the image to be processed.
In a second aspect, an embodiment of the present application provides an apparatus for detecting an abnormal state of a target, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring position reference information of a first target and position reference information of a second target in a preset area in an image to be processed;
the processing unit is used for determining the distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target; and if the distance is greater than or equal to a first threshold value, determining that target loss occurs between the first target and the second target, wherein the first threshold value is determined according to the size information of the first target and the second target.
In a possible implementation manner, the first target and the second target are adjacent candidate targets among a plurality of targets in a preset area in the image to be processed, or the first target and the second target are candidate targets separated by a preset number of targets among the plurality of targets in the preset area in the image to be processed.
In a possible implementation manner, the image to be processed is a current video frame in a video of a captured target, and the processing unit is further configured to: in response to the distance being greater than or equal to the first threshold, increasing a value of a timing flag by a reference value; the timing mark is used for indicating the number of video frames with target loss in the shot video of the target; judging whether the value of the timing mark is larger than a second threshold value or not, and sending first alarm information in response to the fact that the value of the timing mark is larger than the second threshold value; the first warning information is used for indicating that the target is lost in the shot video of the target.
In a possible implementation, the processing unit is further configured to: in response to the distance being less than the first threshold, setting a value of the timing flag to a base value.
In a possible implementation, the obtaining unit is further configured to: and in response to the value of the timing mark being smaller than or equal to the second threshold value, acquiring a video frame next to the current video frame in the video of the shot target as the image to be processed.
In a possible implementation manner, the position reference information includes key points of corresponding targets, and the key points are determined based on detection frames of the corresponding targets in the image to be processed.
In a possible embodiment, the key point comprises a center point of the corresponding target. Before the obtaining unit obtains the position reference information of the first target and the position reference information of the second target in the preset area in the image to be processed, the processing unit is further configured to: determining a first detection frame of the first target and a second detection frame of the second target, wherein the first detection frame and the second detection frame are obtained by performing target detection on the image to be processed through a target detection network; determining a central point of the first detection frame as position reference information of the first target; and determining the central point of the second detection frame as the position reference information of the second target.
In a possible implementation, the obtaining unit is further configured to: determining the area image of each target in a preset area from the image to be processed according to the detection frame of each target in the preset area in the image to be processed; and respectively inputting the area image of each target in the preset area into a target classification network, and determining whether the state of each target is abnormal.
In a possible implementation manner, the processing unit is further configured to input the area image of each target in the preset area into a target classification network, and determine whether the state of each target is abnormal, further: and responding to the targets with the preset states in the targets, and sending second alarm information aiming at the targets with the preset states.
In a possible implementation, the processing unit is further configured to: respectively determining the common area occupation ratio of any target in the image to be processed and each target in the previous frame image according to the detection frame of the any target and the detection frame of each target in the previous frame image aiming at any target in a preset area in the image to be processed; the previous frame image is a video frame before the image to be processed in the video where the image to be processed is located, and the common area ratio is used for representing the ratio of the area of the intersection part of the detection frames of the two targets to the area of the union part of the detection frames of the two targets; if the previous frame image does not include a target with the common area ratio of which to the any target is larger than or equal to a third threshold value, determining that the previous frame image does not include the any target, newly creating a target identifier for the any target, and storing position reference information of the any target in the image to be processed; and if the previous frame image comprises a target of which the common area occupation ratio with the any one target is larger than or equal to a third threshold value, determining that the previous frame image comprises the any one target, and updating the position reference information of the any one target in the image to be processed.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory for storing computer instructions;
a processor coupled to the memory for executing the computer instructions in the memory and when executing the computer instructions implementing the method of any of the first aspects.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including:
the computer readable storage medium stores computer instructions which, when executed on a computer, cause the computer to perform the method of any of the first aspects.
For each of the second aspect to the fourth aspect and possible technical effects achieved by each aspect, please refer to the above description of the technical effects that can be achieved by the first aspect or various possible schemes in the first aspect, and details are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application.
Fig. 1 is a schematic view of an application scenario of a target abnormal state detection method according to an embodiment of the present application;
fig. 2 is an exemplary flowchart of a method for detecting an abnormal state of a target according to an embodiment of the present application;
fig. 3 is a schematic diagram of a preset area provided in the embodiment of the present application;
fig. 4 is a schematic structural diagram of a target abnormal state detection system according to an embodiment of the present disclosure;
FIG. 5 is a flow diagram of a target detection module provided by an embodiment of the present application;
fig. 6 is a flowchart of an alarm logic determination module provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of an apparatus for detecting an abnormal state of a target according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.
The terms "first" and "second" in the embodiments of the present application are used to distinguish different objects, and are not used to describe a specific order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive protection. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The "plurality" in the present application may mean at least two, for example, two, three or more, and the embodiments of the present application are not limited.
In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" in this document generally indicates that the preceding and following related objects are in an "or" relationship unless otherwise specified.
In recent years, urban expressways connecting different key areas within a city have been continuously built in various places to make driving and travel easier for citizens. During construction, to prevent citizens from driving into branch road sections that are not yet complete, continuous, movable isolation piers are usually placed temporarily at road junctions to warn vehicles to detour. In addition, vehicles travel at high speed on urban expressways; when a vehicle loses control or an accident suddenly occurs, the flexible isolation piers provide cushioning that reduces, to some extent, the injury to people and damage to vehicles. However, in daily life, some citizens occasionally move the continuous isolation piers when no one is watching, for the sake of convenience, creating a gap to pass through, which brings huge hidden dangers to the safety and quality of road construction and to traffic safety. Therefore, how to identify in time abnormal states of an isolation pier, such as displacement, loss, damage, or toppling, urgently needs to be solved.
In view of this, the present application provides a method for detecting an abnormal state of a target. In the method, the position reference information of the first target and the second target can be respectively determined by detecting the first target and the second target which are included in the preset area in the image to be processed. And determining whether the target is lost in the image to be processed according to the distance between the first target and the second target. When the method is applied to detecting the displacement loss of the isolation pier, the hidden danger brought to road construction and traffic safety due to the loss of the isolation pier can be avoided.
Fig. 1 is a schematic view of an application scenario of the target abnormal state detection method provided in an embodiment of the present application, comprising an acquisition device and a computer device. The acquisition device may be any of various cameras installed near a road containing one or more isolation piers, used for capturing video of the one or more isolation piers and sending the captured video to the computer device. The computer device may store the video sent by the acquisition device and analyze it, for example to determine the position reference information of one or more isolation piers within a preset area of an image to be processed in the video, and whether an isolation pier has been lost in the image to be processed.
In specific implementation, the acquisition device and the computer device may be communicatively connected through one or more networks. The network may be a wired network or a wireless network; for example, the wireless network may be a mobile cellular network or a Wireless Fidelity (Wi-Fi) network, and of course other possible networks may also be used, which is not limited in this application.
After introducing an exemplary application scenario of the embodiment of the present application, in order to further explain a technical solution provided by the embodiment of the present application, the following detailed description is made with reference to the accompanying drawings and the detailed description. Although the embodiments of the present application provide method steps as shown in the following embodiments or figures, more or fewer steps may be included in the method based on conventional or non-inventive efforts. In steps where no necessary causal relationship exists logically, the order of execution of the steps is not limited to that provided by the embodiments of the present application. In the actual process or the control device, the processes may be executed in sequence or in parallel according to the method shown in the embodiment or the drawings.
Referring to fig. 2, an exemplary flowchart of a method for detecting an abnormal state of a target according to an embodiment of the present application, which may be applied to a computer device shown in fig. 1, may include the following steps:
s201, position reference information of a first target and position reference information of a second target in a preset area in the image to be processed are obtained.
The image to be processed may be a current video frame in a video captured to a target, which is captured in real time by a capturing device as shown in fig. 1. Optionally, when the target state in the offline video is detected by the target abnormal state detection method provided in the embodiment of the present application, the image to be processed may be any video frame in the video where the target is captured.
In some embodiments, the preset area may be a closed area set in advance according to where the targets may appear in each video frame of the captured video. For example, since the position of the acquisition device is fixed and the positions of the isolation piers are also fixed over a certain period, the position of each isolation pier in each frame is also relatively fixed. Fig. 3 is a schematic diagram of a preset area provided in an embodiment of the present application. Assuming the image to be processed is as shown in fig. 3, the area inside the dashed frame is the preset area, and a single isolation pier lies inside each solid frame within the preset area.
In one possible case, the image to be processed may include a plurality of targets within the preset area, and the first target and the second target may be adjacent candidate targets among them. For example, assume the preset area in the image to be processed contains isolation pier A, isolation pier B, and isolation pier C, where pier A is adjacent to pier B, pier B is adjacent to both pier A and pier C, and pier C is adjacent to pier B. Then when pier A is the first target, pier B can be the second target; when pier B is the first target, both pier A and pier C can be the second target; and when pier C is the first target, pier B can be the second target.
In another possible case, the first target and the second target may be candidate targets separated by a preset number of targets among the plurality of targets within the preset area. For example, assume the preset area contains isolation piers A, B, C, D, and E, arranged in that order. With a preset number of 1, if pier A is the first target, pier C can be the second target; if pier B is the first target, pier D can be the second target; if pier C is the first target, piers A and E can be the second target; and so on.
In some embodiments, the position reference information includes key points of the corresponding target, wherein the key points may be determined based on a detection frame of the corresponding target in the image to be processed.
In a possible implementation manner, the key point of the corresponding target included in the position reference information may be a certain vertex of the detection box of the corresponding target. For example, the key point of each target in the preset area in the image to be processed may be the top left vertex of the detection frame of each target. The key point of the corresponding target included in the position reference information may also be a middle point of a certain side of the detection frame of the corresponding target. For example, the key point of each target in the preset region in the image to be processed may be a middle point of a short side where a top left vertex of a detection frame of each target is located.
The detection frame of each target in the preset area of the image to be processed can be obtained as follows: the computer device inputs the image to be processed, acquired by the acquisition device, into a target detection network. The network detects each target in the image, yielding each target's detection frame together with the coordinates $(x_0, y_0)$ of its upper-left vertex and $(x_1, y_1)$ of its lower-right vertex. That is, the first detection frame of the first target and the second detection frame of the second target in the preset area, along with the upper-left and lower-right vertex coordinates of each, can be determined by this method.
It should be appreciated that the target detection network may be any image- or video-based target detection or keypoint detection network, or any improved network derived from one. For example, the target detection network may be the YOLOv3 detection network, which is not limited in this application.
In a possible implementation manner, the key point of the corresponding target included in the position reference information may also be the central point of the target's detection frame. The coordinates of the central point of each detection frame in the image to be processed can be determined from the coordinates of its upper-left and lower-right vertices according to formula (1). The targets whose central point coordinates fall within the preset area can then be determined.

$$x_{center} = \frac{x_0 + x_1}{2}, \qquad y_{center} = \frac{y_0 + y_1}{2} \tag{1}$$

where $x_{center}$ is the abscissa and $y_{center}$ is the ordinate of the central point.
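As a minimal illustration of formula (1), the following Python sketch computes a detection frame's central point from its vertex coordinates; the function name and box format are assumptions for illustration, not from the patent.

```python
# Minimal sketch of formula (1): the central point of a detection frame
# given its upper-left (x0, y0) and lower-right (x1, y1) vertices.
# Names and box format are illustrative assumptions.

def center_point(x0, y0, x1, y1):
    """Return (x_center, y_center) of the detection frame."""
    return (x0 + x1) / 2, (y0 + y1) / 2

x_c, y_c = center_point(100, 240, 160, 300)  # -> (130.0, 270.0)
```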
In a possible implementation manner, the training sample set used to train the target detection network may be obtained by collecting video frame images from videos that contain targets in various scenes. Targets in a normal state and targets in an abnormal state in the video frame images can both be used as positive samples for training, so that the target detection network can detect all targets in a video frame image. The abnormal state may include a toppled state, a damaged state, and the like. The scenes may include different weather conditions such as cloudy, sunny, and rainy days, as well as different environments such as expressways and city trunk roads. The present application does not limit the scenario.
Based on this scheme, the detection frames of all targets in the image to be processed can be obtained through the target detection network, and the targets within the preset area can be identified by their central point coordinates. Interference from targets outside the preset area is thereby avoided, improving the accuracy of target detection.
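The patent does not specify how membership in the preset area is tested. If the closed area is modeled as a polygon, a standard ray-casting test over the central point is one plausible implementation, sketched below with illustrative names:

```python
# Hedged sketch: testing whether a target's central point lies inside the
# preset area, modeled here as a polygon. The patent only requires a closed
# area; ray casting is one common way to implement the test.

def point_in_polygon(px, py, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge crosses the ray's horizontal level
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside
```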
S202, determining the distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target.
The computer device may determine a distance between the first target and the second target according to a distance between the key point of the first detection box and the key point of the second detection box.
In one example, when the first target and the second target are adjacent candidate targets among a plurality of targets within the preset area and the key point is the central point, determining the distance between the first target and the second target amounts to determining the distance between every two adjacent candidate targets in the preset area. The targets can be sorted in increasing order of the abscissa of their central points, and the distance between each pair of adjacent candidates determined in turn. For example, suppose the preset area contains isolation piers A, B, and C, whose central point abscissas increase in that order. Starting from pier A, whose abscissa is smallest, the distance between pier A and pier B is determined first, and then the distance between pier B and pier C.
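A minimal sketch of this adjacent-pair distance computation, assuming each target is represented by its central point (the data structure is an illustrative assumption):

```python
import math

# Hedged sketch of S202: sort targets by the abscissa of their central
# points and measure the distance between each pair of adjacent candidates.

def adjacent_distances(centers):
    """centers: list of (x, y) central points; returns [(i, j, distance)]."""
    order = sorted(range(len(centers)), key=lambda i: centers[i][0])
    result = []
    for a, b in zip(order, order[1:]):
        (x1, y1), (x2, y2) = centers[a], centers[b]
        result.append((a, b, math.hypot(x2 - x1, y2 - y1)))
    return result
```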
S203, if the distance is larger than or equal to the first threshold value, the target loss between the first target and the second target is determined.
Wherein the first threshold is determined based on size information of the first object and the second object. For example, it may be determined based on the width of the first and second objects.
In one example, the first threshold may be half of the sum of the widths of the first target and the second target, satisfying equation (2).
$$W = \frac{|x_0 - x_1| + |x'_0 - x'_1|}{2} \tag{2}$$

where $W$ is the first threshold; $x_0$ and $x_1$ are the abscissas of the upper-left and lower-right vertices of the first target's detection frame, so $|x_0 - x_1|$ is the width of the first target; and $x'_0$ and $x'_1$ are the corresponding abscissas for the second target's detection frame, so $|x'_0 - x'_1|$ is the width of the second target.
In another example, errors may also be allowed in calculating the first threshold based on the widths of the first and second targets. Where the maximum allowable error may be δ, the first threshold may satisfy equation (3). It should be understood that the maximum allowable error may be set according to practical situations or experience, such as 10% or 20%, and the like, and the application is not limited thereto.
$$W = (1 + \delta)\,\frac{|x_0 - x_1| + |x'_0 - x'_1|}{2} \tag{3}$$
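Formulas (2) and (3) reduce to the following sketch, where the box format (x0, y0, x1, y1) and the default delta value are illustrative assumptions:

```python
# Hedged sketch of formulas (2) and (3): the first threshold is half the
# sum of the two targets' widths, optionally inflated by the maximum
# allowable error delta (e.g. 0.1 or 0.2 as suggested in the text).

def first_threshold(box_a, box_b, delta=0.0):
    width_a = abs(box_a[0] - box_a[2])  # |x0 - x1| of the first target
    width_b = abs(box_b[0] - box_b[2])  # |x'0 - x'1| of the second target
    return (1 + delta) * (width_a + width_b) / 2
```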
In a possible implementation manner, if the distance between the first target and the second target is greater than or equal to the first threshold, the computer device may send the first warning message to a terminal device of a related worker or a computer device for managing a target state. The first warning information is used for indicating that the target is lost in the video of the shot target.
In another possible implementation manner, if the distance between the first target and the second target is smaller than the first threshold, a frame next to a current video frame in the video where the target is captured may be acquired as an image to be processed, and the processes of S201 to S203 are repeatedly performed.
Based on this scheme, whether a target has been lost or displaced is determined by analyzing the distance between every two adjacent targets in the image to be processed. The hidden dangers that the loss or displacement of an isolation pier poses to the safety and quality of road construction and to traffic safety can thereby be avoided.
In some embodiments, to reduce false positives of target loss due to short-time vehicle or pedestrian occlusion, the computer device may also set a timing flag to determine the number of frames the target is lost. When the distance between the first target and the second target is detected to be greater than or equal to the first threshold value in the image to be processed, it is determined that a target loss situation may exist between the first target and the second target, the value of the timing mark may be increased by a reference value, and it is determined whether the value of the timing mark is greater than the second threshold value.
If the value of the timing flag is greater than the second threshold, the computer device may send the first warning information to a terminal device of a related worker or a computer device for managing the target state. And if the value of the timing mark is less than or equal to the second threshold value, acquiring the next frame of the current video frame in the video of the shot target as the image to be processed.
If the distance between the first target and the second target is detected to be smaller than the first threshold value in the image to be processed, the computer device may set the value of the timing flag to the base value. It should be understood that the reference value, the base value and the second threshold value may be set according to experience or practical situations, for example, the reference value may be 1, the base value may be 0, and the second threshold value may be 12, which is not specifically limited in this application.
For example, assume isolation pier A and isolation pier B are detected in the image to be processed, the value of the timing flag is 12, the second threshold is 12, and the base value is 0. If the distance between pier A and pier B is detected to be greater than or equal to the first threshold, the timing flag is incremented by 1 to 13. Since it now exceeds the second threshold, the computer device may send the first warning information to the relevant staff. If instead the distance between pier A and pier B is detected to be smaller than the first threshold, the timing flag is set to 0. Since it is now less than or equal to the second threshold, the computer device may acquire the next frame of the current video frame in the captured video as the image to be processed.
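The timing-flag logic above can be summarized in a small state holder; the sketch below uses the example values from the text (reference value 1, base value 0, second threshold 12), with the class and method names as assumptions:

```python
# Hedged sketch of the timing flag used to suppress false alarms caused by
# short-term occlusion. Example values follow the text above.

class LossTimer:
    def __init__(self, reference=1, base=0, second_threshold=12):
        self.reference = reference
        self.base = base
        self.second_threshold = second_threshold
        self.value = base  # the timing flag

    def step(self, distance, first_threshold):
        """Update the flag for one frame. Return True if the first warning
        information should be sent; False means fetch the next video frame."""
        if distance >= first_threshold:
            self.value += self.reference
            return self.value > self.second_threshold
        self.value = self.base  # distance below threshold: reset
        return False
```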
In some embodiments, after determining the detection frame of each target in the preset area through the target detection network, the computer device may further extract each target's area image from the image to be processed according to its detection frame, and input the area images into a target classification network to determine whether each target's state is abnormal. The abnormal state may include toppling or damage. The area image of a target may be the image within the region of the target's detection frame.
In some embodiments, if the targets in the preset area include a target in a preset state, the computer device may send second warning information for that target to the terminal device of the relevant staff or to a computer device that manages target states. The preset state may be toppled and/or damaged. For example, when the preset state is toppled, the computer device sends the second warning information when a toppled target is present; when the preset state is damaged, it does so when a damaged target is present; and when the preset state covers both, it does so when a toppled or damaged target is present.
Optionally, for any target in the preset area, when the target's state is determined through the target classification network, the network may output a confidence that the target is in each state, and the state with the highest confidence is selected as the target's state. The confidence is a floating-point number between 0 and 1. For example, when recognizing the state of isolation pier A, the network might output a confidence of 0.8 that pier A is normal, 0.1 that it is damaged, and 0.2 that it is toppled. Since the normal state has the largest confidence, pier A's state is determined to be normal.
In one example, the isolation pier has the following two characteristics: 1) the isolation body is typically red and yellow, and yellow is a superposition of red and green in the RGB color system; 2) as can be seen from fig. 3, several hole regions are normally visible in the upper half of an upright pier's isolation body, whereas they cannot be identified on a toppled pier. Based on these characteristics, the target classification network can use an attention mechanism to focus on the parts with larger differences, and may therefore employ a classification model that combines RGB channel attention and image region attention.
Based on this scheme, after investigating and analyzing the isolation piers, the embodiment of the application provides a classification network combining color channel attention and image spatial attention, tailored to the color and shape characteristics of the pier's isolation body, so that all isolation piers in the current frame can be recognized more comprehensively and each pier target can be classified accurately. The accuracy of both pier detection and pier state recognition can thereby be improved.
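The patent does not disclose the architecture of this attention classifier; the CBAM-style PyTorch module below is only one plausible sketch of combining channel attention with image region (spatial) attention:

```python
import torch
import torch.nn as nn

# Hedged sketch: one plausible form of "RGB channel attention plus image
# region attention". The architecture is an assumption, not from the patent.

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.channel_gate = nn.Sequential(      # squeeze-and-excite style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(      # 7x7 conv over avg/max maps
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)            # reweight color channels
        avg = x.mean(dim=1, keepdim=True)       # per-pixel channel average
        mx = x.amax(dim=1, keepdim=True)        # per-pixel channel max
        return x * self.spatial_gate(torch.cat([avg, mx], dim=1))

att = ChannelSpatialAttention(channels=3)       # e.g. directly on RGB input
out = att(torch.rand(1, 3, 64, 64))             # -> torch.Size([1, 3, 64, 64])
```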
It should be appreciated that the target classification network described above may also employ any network or any derivative improvement network based on image recognition or classification. This is not a limitation of the present application.
In some embodiments, the training sample set used to train the target classification network may likewise be obtained by collecting video frame images from videos containing targets in various scenes. Targets in the normal state serve as positive samples, while targets in the toppled state and the damaged state serve as samples of separate classes, so that multi-class training enables the target classification network to accurately recognize the different states of a target.
In one possible implementation, the number of samples of targets in the normal state collected is greater than the number of samples of targets in an abnormal state. The abnormal-state samples can therefore be multiplied by superimposing noise, applying rotations, and similar transformations. Keeping the numbers of normal-state and abnormal-state samples at the same order of magnitude during the training of the target detection network and the target classification network improves the recognition accuracy of both networks.
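A minimal sketch of such balancing, using noise superposition and 90-degree rotations as stand-ins for the "noise and rotational conversion" mentioned above (parameter values and helper names are assumptions):

```python
import random
import numpy as np

# Hedged sketch: enlarge the abnormal-state sample set until its size
# matches the normal-state set. Noise sigma and rotation choices are
# illustrative assumptions.

def augment(image):
    """image: HxWxC uint8 array; returns a noisy, rotated copy."""
    noisy = image.astype(np.float32) + np.random.normal(0, 5, image.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    k = random.choice([1, 2, 3])          # rotate by 90/180/270 degrees
    return np.rot90(noisy, k).copy()

def balance(normal, abnormal):
    augmented = list(abnormal)
    while len(augmented) < len(normal):
        augmented.append(augment(random.choice(abnormal)))
    return augmented
```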
Based on the same concept of the above method, refer to fig. 4 as a schematic structural diagram of an abnormal state detection system of an object provided in an embodiment of the present application. The system 400 may include a video acquisition module 401, a target detection module 402, a multi-target tracking module 403, and an alarm logic determination module 404.
The video acquisition module 401 is configured to acquire a video of a shot target, and send the acquired video to the target detection module 402. The acquisition method can refer to the related description in the method embodiment shown in fig. 2, and is not described herein again.
The object detection module 402 may consist of a target detection network and a target classification network. It detects and classifies the targets in the image to be processed from the video sent by the video acquisition module 401, and sends the integrated target information to the multi-target tracking module 403. The target information may include the target's position reference information and state.
Fig. 5 is a flowchart of an object detection module according to an embodiment of the present application. The process may include:
s501, inputting an image to be processed.
And S502, detecting the network by the target.
The target detection module 402 may detect a target in the image to be processed through a target detection network, and the detection method may refer to the related description in the method embodiment shown in fig. 2, which is not described herein again.
And S503, judging whether the target is a target in the preset area.
If a target detected by the target detection network is within the preset area, execute S504: crop the area image of each target from the image to be processed according to its detection frame, and send the area images to the target classification network; if the target is not within the preset area, no operation is performed. After the detection frames are determined by the target detection network, whether a target is within the preset area may be determined by its central point coordinates; for the specific method, refer to the related description in the method embodiment shown in fig. 2, which is not repeated here.
S504, the target classification network.
The target detection module 402 may identify the area image of each target through the target classification network, so as to determine the state of each target, and the method for determining the state of the target may refer to the related description in the foregoing method embodiments, which is not described herein again.
And S505, integrating the target information.
For any one target, target information may be integrated according to the position reference information of the target determined by the target detection network and the state of the target determined by the target classification network. Wherein the position reference information includes information of the detection frame.
A multi-target tracking module 403, configured to match each target of the image to be processed and each target in the previous frame of image according to the target information of each target of the image to be processed determined by the target detecting module 402. And the previous frame image is a video frame before the image to be processed in the video where the image to be processed is located.
In one example, for any target in the image to be processed, the common area ratio (Intersection over Union, IoU) between that target and each target in the previous frame image may be determined from the target's detection frame and the detection frames in the previous frame image. IoU represents the ratio of the area of the intersection of two targets' detection frames to the area of their union.
For example, assume the image to be processed contains isolation piers A and B, and the previous frame image contains isolation piers C and D. For pier A, the IoU of piers A and C and the IoU of piers A and D can be determined from the detection frames of piers A, C, and D. Likewise, for pier B, the IoU of piers B and C and the IoU of piers B and D can be determined from the detection frames of piers B, C, and D.
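A minimal sketch of the IoU computation between two detection frames, with the (x0, y0, x1, y1) box format as an illustrative assumption:

```python
# Hedged sketch: common area ratio (IoU) of two detection frames, each
# given by its upper-left and lower-right vertices (x0, y0, x1, y1).

def iou(box_a, box_b):
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```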
In another example, for any target in the image to be processed, the IoU may be computed only against the detection frames in the previous frame image that lie in a similar region, determined according to the target's detection frame. A similar region here means a region whose distance from the target's detection frame in the image to be processed is within a fourth threshold. The fourth threshold may be preset according to the actual situation or experience, which is not limited in this application.
In some embodiments, when the multi-target tracking module 403 matches each target of the image to be processed with each target in the previous frame image according to the IoU, the following three cases may occur:
Case one: if the previous frame image does not include a target whose IoU with the target is greater than or equal to the third threshold, it is determined that the previous frame image does not include the target, that is, the target is a new target; a target identifier is created for the target, and the position reference information of the target is stored.
Case two: if the previous frame image includes a target whose IoU with the target is greater than or equal to the third threshold, it is determined that the previous frame image includes the target, and the position reference information of the target in the image to be processed is updated.
Case three: if there is a target in the previous frame image whose IoU with every target in the image to be processed is smaller than the third threshold, it can be determined that the target is lost in the image to be processed.
In one example, the multi-target tracking module may set a status flag for each target. The status flags may include a Create flag, an Update flag, a Lost flag, and a Delete flag. When a target in the image to be processed falls under case one, a Create flag may be set for the corresponding target identifier; when a target falls under case two, an Update flag may be set for the corresponding target identifier; and when a target in the previous frame image falls under case three, a Lost flag may be set for the corresponding target identifier. When the number of frames for which a target identifier has kept the Lost flag is greater than the second threshold, a Delete flag may be set for that target identifier, and the target information corresponding to it may be deleted.
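The following sketch ties the three cases and the four status flags together. The greedy matching order, the threshold values, and all names are assumptions, since this embodiment prescribes only the IoU criterion and the flag semantics; iou() is the helper from the sketch above.

```python
IOU_MATCH = 0.3   # illustrative stand-in for the third threshold
MAX_LOST = 5      # illustrative stand-in for the second threshold (frames)

class MultiTargetTracker:
    def __init__(self):
        self.tracks = {}   # target identifier -> {"box", "flag", "lost"}
        self.next_id = 0

    def step(self, boxes):
        unmatched = set(self.tracks)
        for box in boxes:
            # Case two: the previous-frame target with the best IoU >= threshold is updated.
            best_id, best = None, IOU_MATCH
            for tid in unmatched:
                score = iou(box, self.tracks[tid]["box"])
                if score >= best:
                    best_id, best = tid, score
            if best_id is not None:
                self.tracks[best_id].update(box=box, flag="Update", lost=0)
                unmatched.discard(best_id)
            else:
                # Case one: no match -> new target identifier with a Create flag.
                self.tracks[self.next_id] = {"box": box, "flag": "Create", "lost": 0}
                self.next_id += 1
        # Case three: previous-frame targets without a match are marked Lost;
        # once Lost has persisted past the threshold, the track is deleted.
        for tid in unmatched:
            track = self.tracks[tid]
            track["flag"], track["lost"] = "Lost", track["lost"] + 1
            if track["lost"] > MAX_LOST:
                track["flag"] = "Delete"
                del self.tracks[tid]
```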
And an alarm logic determining module 404, configured to send an alarm message when the target is in an abnormal state or is lost. Referring to fig. 6, a flowchart of an alarm logic determining module provided in the embodiment of the present application may include:
S601, circularly acquiring information of each target.
The matching result of each target in the image to be processed is cyclically acquired from the multi-target tracking module 403, and the target information and status flag of each target are determined according to the matching result.
S602, determining whether the target is in a preset area.
If the target is in the preset area, S603 is executed; if the target is not in the preset area, S605 is executed. The method for determining whether the target is in the preset area may refer to the related description in the method embodiment shown in fig. 2, which is not repeated here.
S603, it is determined whether the state of the target is abnormal.
If the state of the target is normal, S604 is executed; if the state of the target is abnormal, S606 is executed.
S604, adding the target into the target set.
The target set is used for storing each target in the image to be processed.
S605, determining whether all targets have been traversed.
It is determined whether all the targets in the image to be processed have been traversed; if so, S609 is executed; if not, S601 is executed.
S606, determining whether the state of the target is lodging.
If the state of the target is determined to be lodging, S607 is executed; if the state of the target is determined to be damaged, S608 is executed.
And S607, setting the value of the alarm flag to be 2.
When the value of the alarm flag is set to 2, it may be used to indicate that the state of the target is lodging, that is, the target has toppled over.
And S608, setting the value of the alarm flag to be 3.
When the value of the alarm flag is set to 3, it can be used to indicate that the state of the target is damaged.
And S609, determining whether a target is lost.
If a target has shifted or is lost in the image to be processed, S610 is executed; if no target is lost in the image to be processed, S612 is executed. The method for determining whether a target is lost in the image to be processed may refer to the related description in the method embodiment shown in fig. 2, which is not repeated here.
S610, setting the value of the alarm flag to be 1.
When the value of the alarm flag is set to 1, the alarm flag can be used for indicating that the target is lost in the image to be processed.
S611, t = t + 1.
Where t is used to indicate a timing flag.
And S612, setting the value of the alarm flag to be 0.
When the value of the alarm flag is set to 0, it can be used to indicate that the state of the target is normal.
It should be understood that the values of the alarm flag may be set according to actual conditions; for example, a value of 7 may indicate that the state of the target is normal, 6 that the state of the target is lodging, 5 that the state of the target is damaged, and 4 that a target is lost in the image to be processed. This is not specifically limited in the present application.
S613, t = 0.
And S614, judging whether t is greater than the second threshold.
If t is greater than the second threshold, S615 is performed, and if t is less than or equal to the second threshold, S616 is performed.
And S615, reporting the alarm information.
The values of the alarm flag may correspond one-to-one to alarm information, and the alarm logic determining module 404 may send the alarm information corresponding to the value of the alarm flag to the terminal devices of the relevant staff or to the computer devices for managing the target state.
And S616, acquiring the next video frame.
The video frame following the image to be processed in the captured video of the target is acquired and taken as the new image to be processed, and the process of S601 to S616 continues.
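The S601 to S616 loop can be compressed into the following sketch. The flag values follow the convention above (0 normal, 1 lost, 2 lodging, 3 damaged); target_lost() and report() are placeholders for the spacing-based loss check and the reporting step, the threshold value is an assumption, and in_preset_area() is the helper from the earlier sketch.

```python
SECOND_THRESHOLD = 10  # illustrative stand-in for the second threshold

def target_lost(boxes):
    """Placeholder for the spacing-based loss check described in this application."""
    return False

def report(flag):
    """Placeholder for reporting alarm information (S615)."""
    print(f"alarm flag {flag} reported")

def process_frame(targets, area, t):
    """targets: list of (box, state) pairs; returns the alarm flag and timing flag."""
    alarm, normal_set = 0, []
    for box, state in targets:                # S601: loop over targets
        if not in_preset_area(box, area):     # S602: skip targets outside the area
            continue
        if state == "lodging":
            alarm = 2                         # S606/S607
        elif state == "damaged":
            alarm = 3                         # S606/S608
        else:
            normal_set.append(box)            # S604: add to the target set
    if target_lost(normal_set):               # S609: spacing-based loss check
        alarm, t = 1, t + 1                   # S610/S611
    elif alarm == 0:
        t = 0                                 # S612/S613
    if t > SECOND_THRESHOLD:                  # S614
        report(alarm)                         # S615
    return alarm, t                           # S616: caller fetches the next frame
```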
Based on the same concept as the above methods, referring to fig. 7, an embodiment of the present application provides a target abnormal state detection apparatus 700. The apparatus 700 can perform the steps of the above methods, and details are not repeated here to avoid repetition. The apparatus 700 comprises an acquisition unit 701 and a processing unit 702. In one scenario:
an obtaining unit 701, configured to obtain position reference information of a first target and position reference information of a second target in a preset region in an image to be processed;
a processing unit 702, configured to determine a distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target; and if the distance is greater than or equal to a first threshold value, determining that target loss occurs between the first target and the second target, wherein the first threshold value is determined according to the size information of the first target and the second target.
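A minimal sketch of this distance test follows. Deriving the first threshold from the mean detection-frame width and a scale factor is an assumption; this application states only that the threshold is determined from the two targets' size information. box_center() is the helper from the earlier sketch.

```python
import math

def center_distance(box_a, box_b):
    """Euclidean distance between two targets' position reference points."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return math.hypot(ax - bx, ay - by)

def loss_between(box_a, box_b, scale=2.5):
    """True if the spacing suggests a target is missing between the two."""
    mean_width = ((box_a[2] - box_a[0]) + (box_b[2] - box_b[0])) / 2.0
    first_threshold = scale * mean_width   # size-dependent first threshold
    return center_distance(box_a, box_b) >= first_threshold

# Two adjacent isolation piers ~40 px wide but 400 px apart -> loss reported.
print(loss_between((100, 200, 140, 260), (500, 200, 540, 260)))  # True
```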
In a possible implementation manner, the first target and the second target are adjacent candidate targets among a plurality of targets in a preset area in the image to be processed, or the first target and the second target are candidate targets separated by a preset number of targets among the plurality of targets in the preset area in the image to be processed.
In a possible implementation manner, the image to be processed is a current video frame in a video of the captured target, and the processing unit 702 is further configured to: in response to the distance being greater than or equal to the first threshold, increasing a value of a timing flag by a reference value; the timing mark is used for indicating the number of video frames with target loss in the shot video of the target; judging whether the value of the timing mark is larger than a second threshold value or not, and sending first alarm information in response to the fact that the value of the timing mark is larger than the second threshold value; the first warning information is used for indicating that the target is lost in the video of the shot target.
In a possible implementation, the processing unit 702 is further configured to: in response to the distance being less than the first threshold, setting a value of the timing flag to a base value.
In a possible implementation manner, the obtaining unit 701 is further configured to: and acquiring a next video frame of the current video frame in the video of the shot target as the image to be processed in response to the value of the timing mark being less than or equal to the second threshold value.
In a possible implementation manner, the position reference information includes key points of corresponding targets, and the key points are determined based on detection frames of the corresponding targets in the image to be processed.
In a possible embodiment, the key point comprises a center point of the corresponding target. Before the obtaining unit 701 obtains the position reference information of the first target and the position reference information of the second target in the preset region in the image to be processed, the processing unit 702 is further configured to: determining a first detection frame of the first target and a second detection frame of the second target, wherein the first detection frame and the second detection frame are obtained by performing target detection on the image to be processed through a target detection network; determining a central point of the first detection frame as position reference information of the first target; and determining the central point of the second detection frame as the position reference information of the second target.
In a possible implementation manner, the obtaining unit 701 is further configured to: determining the area image of each target in a preset area from the image to be processed according to the detection frame of each target in the preset area in the image to be processed; and respectively inputting the area image of each target in the preset area into a target classification network, and determining whether the state of each target is abnormal.
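A sketch of this crop-and-classify step is given below; the torchvision preprocessing, the model argument, and the three-state label set are stand-ins, since this embodiment does not fix a classification network architecture.

```python
import torch
from torchvision import transforms

STATES = ["normal", "lodging", "damaged"]  # illustrative label set

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_regions(frame, boxes, model):
    """frame: HxWx3 uint8 array; boxes: (x1, y1, x2, y2) detection frames."""
    states = []
    model.eval()
    with torch.no_grad():
        for x1, y1, x2, y2 in boxes:
            region = frame[y1:y2, x1:x2]                     # crop the area image
            logits = model(preprocess(region).unsqueeze(0))  # 1xC class scores
            states.append(STATES[int(logits.argmax(dim=1))])
    return states
```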
In a possible implementation manner, the processing unit 702 respectively inputs the area images of the targets in the preset area into a target classification network, and when determining whether the state of each target is abnormal, is further configured to: and responding to the targets with the preset states in the targets, and sending second alarm information aiming at the targets with the preset states.
In a possible implementation, the processing unit 702 is further configured to: respectively determining the common area occupation ratio of any target in the image to be processed and each target in the previous frame image according to the detection frame of the any target and the detection frame of each target in the previous frame image aiming at any target in a preset area in the image to be processed; the previous frame image is a video frame before the image to be processed in the video where the image to be processed is located, and the common area ratio is used for representing the ratio of the area of the intersection part of the detection frames of the two targets to the area of the union part of the detection frames of the two targets; if the previous frame image does not include a target with the common area ratio of which to the any target is larger than or equal to a third threshold value, determining that the previous frame image does not include the any target, newly creating a target identifier for the any target, and storing position reference information of the any target in the image to be processed; and if the previous frame image comprises a target with the common area ratio of which to the any one target is greater than or equal to a third threshold value, determining that the previous frame image comprises the any one target, and updating the position reference information of the any one target in the image to be processed.
Based on the same concept as the above methods, referring to fig. 8, a schematic structural diagram of an electronic device provided in an embodiment of the present application is shown. The electronic device includes at least one processor 802 and a memory 801 connected or coupled to the at least one processor 802, and may further include a communication interface 803, through which the electronic device may interact with other devices. Illustratively, the communication interface 803 may be a transceiver, circuit, bus, module, pin, or other type of communication interface. When the electronic device is a chip-type device or circuit, the communication interface 803 may also be an input/output circuit that can input (receive) and output (send) information; in that case the processor may be an integrated processor, a microprocessor, an integrated circuit, or a logic circuit, and may determine the output information according to the input information.
The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, and may be an electrical, mechanical or other form for information interaction between the devices, units or modules. The processor 802 may cooperate with the memory 801 and the communication interface 803. The present application does not limit the specific connection medium among the processor 802, the memory 801, and the communication interface 803.
Optionally, referring to fig. 8, the processor 802, the memory 801, and the communication interface 803 are connected to each other through a bus 800. The bus 800 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or one type of bus.
In the present embodiment, the memory 801, as a non-volatile computer-readable storage medium, is used for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 801 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 801 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 801 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing instructions, computer programs, and/or data.
In an embodiment of the present application, the processor 802 may be a general-purpose processor, such as a Central Processing Unit (CPU), a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, which may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or any conventional processor, or the like. The steps of the target abnormal state detection method disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
By programming the processor 802, the code corresponding to the target abnormal state detection method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the target abnormal state detection method when running.
Specifically, in the embodiment of the present application, the memory 801 stores instructions executable by the at least one processor 802, and the at least one processor 802 may execute the steps included in the foregoing target abnormal state detection method by calling the instructions or the computer program stored in the memory 801. Illustratively, the processor 802 is configured to acquire, by using the communication interface 803, coordinates of a central point of a plurality of targets in a preset area in the image to be processed; a processor 802, further configured to: determining the distance between a first target and a second target according to the center point coordinate of the first target and the center point coordinate of the second target; the first target and the second target are adjacent targets of the plurality of targets.
Further, the processor 802 is further configured to determine that a target loss occurs between the first target and the second target if the distance is greater than or equal to a first threshold, where the first threshold is determined according to widths of the first target and the second target.
Embodiments of the present application also provide a computer-readable storage medium having stored thereon computer instructions, which, when executed on a computer, cause the computer to perform the steps of any of the above-mentioned methods.
In some possible embodiments, the aspects of the method for detecting an abnormal state of the object provided by the present application may also be implemented in the form of a computer program product comprising program code means for causing an electronic device to carry out the steps of any of the methods described above in the present description, when the computer program product is run on the electronic device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
While specific embodiments of the present application have been described above, it will be understood by those skilled in the art that these are by way of example only, and that the scope of the present application is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and principles of this application, and these changes and modifications are intended to be included within the scope of this application. While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (13)

1. A method of detecting an abnormal state of a target, comprising:
acquiring position reference information of a first target and position reference information of a second target in a preset area in an image to be processed;
determining the distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target;
and if the distance is greater than or equal to a first threshold value, determining that target loss occurs between the first target and the second target, wherein the first threshold value is determined according to the size information of the first target and the second target.
2. The method according to claim 1, wherein the first target and the second target are adjacent candidate targets among a plurality of targets in a preset area in the image to be processed, or the first target and the second target are candidate targets separated by a preset number of targets among the plurality of targets in the preset area in the image to be processed.
3. The method of claim 1, wherein the image to be processed is a current video frame in a video in which the target is captured, the method further comprising:
in response to the distance being greater than or equal to the first threshold, increasing a value of a timing flag by a reference value; the timing mark is used for indicating the number of video frames with target loss in the shot video of the target; and
judging whether the value of the timing mark is larger than a second threshold value or not, and sending first alarm information in response to the fact that the value of the timing mark is larger than the second threshold value; the first warning information is used for indicating that the target is lost in the video of the shot target.
4. The method of claim 3, further comprising:
in response to the distance being less than the first threshold, setting a value of the timing flag to a base value.
5. The method of claim 3, further comprising:
and in response to the value of the timing mark being smaller than or equal to the second threshold value, acquiring a video frame next to the current video frame in the video of the shot target as the image to be processed.
6. The method according to claim 1, wherein the position reference information includes key points of corresponding targets, and the key points are determined based on detection frames of corresponding targets in the image to be processed.
7. The method according to claim 6, wherein the key point includes a central point of a corresponding target, and the acquiring the position reference information of the first target and the position reference information of the second target in the preset region in the image to be processed includes:
determining a first detection frame of the first target and a second detection frame of the second target, wherein the first detection frame and the second detection frame are obtained by performing target detection on the image to be processed through a target detection network;
determining a central point of the first detection frame as position reference information of the first target; and
and determining the central point of the second detection frame as the position reference information of the second target.
8. The method of claim 6, further comprising:
determining the area image of each target in a preset area from the image to be processed according to the detection frame of each target in the preset area in the image to be processed;
and respectively inputting the area image of each target in the preset area into a target classification network, and determining whether the state of each target is abnormal.
9. The method according to claim 8, wherein the inputting the area image of each target in the preset area into the target classification network respectively, and determining whether the state of each target is abnormal comprises:
and responding to the targets with the preset states in the targets, and sending second alarm information aiming at the targets with the preset states.
10. The method according to any one of claims 6-9, further comprising:
respectively determining the common area occupation ratio of any target in the image to be processed and each target in the previous frame of image according to the detection frame of any target and the detection frame of each target in the previous frame of image; the previous frame image is a video frame before the image to be processed in the video where the image to be processed is located, and the common area ratio is used for representing the ratio of the area of the intersection part of the detection frames of the two targets to the area of the union part of the detection frames of the two targets;
if the previous frame image does not include a target with the common area ratio of which to the any target is larger than or equal to a third threshold value, determining that the previous frame image does not include the any target, newly creating a target identifier for the any target, and storing position reference information of the any target in the image to be processed;
and if the previous frame image comprises a target with the common area ratio of which to the any one target is greater than or equal to a third threshold value, determining that the previous frame image comprises the any one target, and updating the position reference information of the any one target in the image to be processed.
11. An abnormal state detection device of a target, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring position reference information of a first target and position reference information of a second target in a preset area in an image to be processed;
the processing unit is used for determining the distance between the first target and the second target according to the position reference information of the first target and the position reference information of the second target; and if the distance is greater than or equal to a first threshold value, determining that target loss occurs between the first target and the second target, wherein the first threshold value is determined according to the size information of the first target and the second target.
12. An electronic device, comprising:
a memory for storing computer instructions;
a processor coupled to the memory for executing the computer instructions in the memory and when executing the computer instructions implementing the method of any of claims 1 to 10.
13. A computer-readable storage medium, comprising:
the computer readable storage medium stores computer instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 10.
CN202210974023.7A 2022-08-15 2022-08-15 Target abnormal state detection method and device and electronic equipment Pending CN115393792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210974023.7A CN115393792A (en) 2022-08-15 2022-08-15 Target abnormal state detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN115393792A true CN115393792A (en) 2022-11-25

Family

ID=84118145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210974023.7A Pending CN115393792A (en) 2022-08-15 2022-08-15 Target abnormal state detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN115393792A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228698A (en) * 2023-02-20 2023-06-06 北京鹰之眼智能健康科技有限公司 Filler state detection method based on image processing
CN116228698B (en) * 2023-02-20 2023-10-27 北京鹰之眼智能健康科技有限公司 Filler state detection method based on image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination