WO2022195837A1 - Image analysis device, image analysis system, and image analysis method - Google Patents

Image analysis device, image analysis system, and image analysis method Download PDF

Info

Publication number
WO2022195837A1
WO2022195837A1 PCT/JP2021/011313 JP2021011313W WO2022195837A1 WO 2022195837 A1 WO2022195837 A1 WO 2022195837A1 JP 2021011313 W JP2021011313 W JP 2021011313W WO 2022195837 A1 WO2022195837 A1 WO 2022195837A1
Authority
WO
WIPO (PCT)
Prior art keywords
condition
image
state
detection
event
Prior art date
Application number
PCT/JP2021/011313
Other languages
French (fr)
Japanese (ja)
Inventor
友輔 生内
圭吾 長谷川
Original Assignee
株式会社日立国際電気
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立国際電気 filed Critical 株式会社日立国際電気
Priority to PCT/JP2021/011313 priority Critical patent/WO2022195837A1/en
Priority to JP2023506655A priority patent/JPWO2022195837A1/ja
Publication of WO2022195837A1 publication Critical patent/WO2022195837A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion

Definitions

  • the present invention relates to an image analysis device, an image analysis system, and an image analysis method, and more particularly to an image analysis device, an image analysis system, and an image analysis method that analyze captured images and detect objects.
  • AI Artificial Intelligence
  • DL Deep Learning
  • AI such as DL often uses supervised learning.
  • the pairs of input images and expected outputs are called training data.
  • generalization performance is required to output (infer) appropriate results even for unknown input images.
  • a neural network exists as a representative method.
  • In Patent Literature 1, a surveillance camera system is disclosed in which a display terminal displays specific object information notified from a surveillance camera and, according to an operation instruction from an administrator, sends a control signal to a nearby surveillance camera based on position information/time information from the surveillance camera so that tracking is performed based on the specific object information.
  • Confidence: a statistical measure of how certain the AI's inference results are.
  • In the operational environment, there are various events that cause a decrease in inference accuracy due to a decrease in the confidence of the detection target object.
  • The events here include, for example, differences in the shooting environment, background, angle of view, and time of day relative to the time of learning data collection, and the presence of objects not included in the learning data.
  • inference accuracy may be degraded if a detection target object is partially hidden by other objects, and the detection target object may be overlooked.
  • Patent Document 1 discloses a technique for tracking based on specific object information, but does not describe a detection method when the detection target object is shielded as described above.
  • an object of the present invention is to provide an image analysis device, an image analysis system, and an image analysis method that enable more accurate detection of a hidden object.
  • One representative image analysis apparatus of the present invention includes an image acquisition unit that acquires a captured image, an image processing unit that outputs an analysis result for the input image acquired by the image acquisition unit, and a correction data acquisition unit that acquires data used for a correction condition.
  • The image processing unit discriminates among an event occurrence state in which the detection target object is determined to exist in the image based on a detection condition for the detection target object, a no-event state in which the detection target object is determined not to exist in the image based on a non-detection condition for the detection target object, and a neutral state in which the detection target object is determined to be shielded using the correction condition.
  • The condition for transition from the neutral state to the event occurrence state is set looser than the condition for transition from the no-event state to the event occurrence state.
  • One image analysis method of the present invention is an image analysis method for performing image analysis using a processing device, and includes a step of acquiring a captured image, a step of acquiring data used for a correction condition, and a step of discriminating among an event occurrence state in which the detection target object is determined to exist in the image based on a detection condition for the detection target object, a no-event state in which the detection target object is determined not to exist in the image based on a non-detection condition for the detection target object, and a neutral state in which the detection target object is determined to be shielded using the correction condition.
  • The condition for transition from the neutral state to the event occurrence state is set looser than the condition for transition from the no-event state to the event occurrence state.
  • an image analysis apparatus can more accurately detect a shielded object. Problems, configurations, and effects other than those described above will be clarified by the following embodiments.
  • FIG. 1 is a system configuration diagram showing an example of the image analysis system of the present invention.
  • FIG. 2 is a hardware configuration diagram showing an example of the image analysis apparatus of the present invention.
  • FIG. 3 is a functional block diagram showing an example of the image analysis device of the present invention.
  • FIG. 4 is a functional block diagram showing an example of the image processing unit in FIG. 3.
  • FIG. 5 is a diagram of a first example for explaining an example of the flow of processing when correction conditions are not used.
  • FIG. 6 is a diagram of a second example for explaining an example of the flow of processing when correction conditions are not used.
  • FIG. 7 is a diagram of a first example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention.
  • FIG. 8 is a diagram of a second example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention.
  • FIG. 9 is a diagram of a third example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention.
  • FIG. 10 is a diagram of a first example for explaining the flow of processing according to the second correction condition of the image analysis apparatus of the present invention.
  • FIG. 11 is a diagram of a second example for explaining the flow of processing according to the second correction condition of the image analysis apparatus of the present invention.
  • FIG. 12 is a diagram explaining an example of state transition of the image analysis apparatus of the present invention.
  • FIG. 13 is an example of comparing the results of the image analysis apparatus of the present invention with the case where there is no correction condition.
  • FIG. 14 shows an example of a processing flowchart of the image analysis apparatus according to the present invention.
  • FIG. 1 is a system configuration diagram showing an example of the image analysis system of the present invention.
  • the analysis server 101, camera 102, and database server 103 are connected via a network 104.
  • The network 104 is a line capable of data communication that connects the servers. Any type of line may be used, such as a dedicated line, an intranet, or an IP network such as the Internet.
  • Video data acquired by the camera 102 is analyzed by the analysis server 101 and the output result is stored in the database server 103 .
  • Note that the configuration in FIG. 1 is an example, and various modifications are possible, such as performing AI inference and the image analysis system processing on the camera 102.
  • the analysis server 101 can be applied as an image analysis device that analyzes images, and its configuration and processing details will be described later.
  • The camera 102 can have a configuration in which information is obtained by forming an image of incident light on an imaging device via a lens and an aperture.
  • Examples of the imaging device here include a CCD (Charge-Coupled Device) image sensor and a CMOS (Complementary Metal Oxide Semiconductor) image sensor.
  • the camera 102 shoots an image at, for example, 3 frames per second (3 fps) or more, and the information is sent to the analysis server 101 and the database server 103 .
  • a plurality of cameras 102 can be installed according to the situation, and can be arranged in various places. For example, it may be installed at a monitoring location as a monitoring camera.
  • the database server 103 is a device that records images captured by the camera 102, information necessary for processing by the analysis server 101, processing results of the analysis server 101, and the like.
  • As a device for recording, for example, an HDD (Hard Disk Drive), SSD (Solid State Drive), or DDS (Digital Data Storage) can be applied as needed.
  • FIG. 2 is a hardware configuration diagram showing an example of the image analysis apparatus of the present invention.
  • a hardware configuration example of the analysis server 101 will be described with reference to FIG.
  • The hardware is a computer system equipped with a processing device such as a CPU (Central Processing Unit), on which each function is executed.
  • As the processing device, in addition to a CPU, a Digital Signal Processor (DSP), Field-Programmable Gate Array (FPGA), Graphics Processing Unit (GPU), or the like may be applied.
  • the analysis server 101 includes a processor section 201 , a main storage section 202 , an auxiliary storage section 203 , an input/output interface section 204 , a display interface section 205 and a network interface section 206 , which are connected by a bus 207 .
  • An input/output interface unit 204 is connected to an input device 208 such as a keyboard and mouse to provide a user interface.
  • the display interface unit 205 is connected to the display output device 209 .
  • a network interface unit 206 is an interface for connecting the analysis server 101 and the network 104 .
  • the auxiliary storage unit 203 is usually composed of a non-volatile memory such as an HDD or flash memory, and stores programs executed by the analysis server 101 and data to be processed by the programs.
  • the main storage unit 202 is composed of a RAM, and temporarily stores programs, data necessary for executing the programs, and the like according to instructions from the processor unit 201 .
  • the processor unit 201 executes programs loaded from the auxiliary storage unit 203 to the main storage unit 202 . Note that the configuration in FIG. 2 is an example, and various modifications are possible.
  • FIG. 3 is a functional block diagram showing an example of the image analysis device of the present invention. Functional blocks of the analysis server 101 will be described with reference to FIG.
  • the analysis server 101 is composed of an auxiliary storage unit 203 , an image acquisition unit 301 , a correction data acquisition unit 302 , an image processing unit 303 , a storage control unit 304 and a display control unit 305 .
  • the image acquisition unit 301 acquires the signal obtained from the auxiliary storage unit 203 as an image.
  • a correction data acquisition unit 302 acquires the signal obtained from the auxiliary storage unit 203 as an image or time-series data.
  • An image processing unit 303 receives the image obtained by the image obtaining unit 301 and the data obtained by the correction data obtaining unit 302, performs AI inference processing and shielding detection processing, and determines the state of the system from the results.
  • a storage control unit 304 performs storage control of the output result using the result of the image processing unit 303 and stores it in the auxiliary storage unit 203 .
  • a display control unit 305 controls the display of the result of the image processing unit 303 and the information stored in the auxiliary storage unit 203 and outputs the information to the display output device 209 .
  • the image acquisition unit 301 acquires image data from a video signal input from a video storage device or the like in which image data is stored.
  • In order to reduce the influence of noise, flicker, and the like, the image data may be subjected to preprocessing such as a smoothing filter, an edge enhancement filter, and density conversion.
  • a data format such as RGB color, YUV, or monochrome may be selected according to the application.
  • Furthermore, in order to reduce processing cost, the image data may be reduced to a predetermined size.
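  • As a rough illustration of the preprocessing options mentioned above, the sketch below shows one way such a step could be written with OpenCV; the kernel sizes, the target size, and the function name preprocess_frame are illustrative assumptions, not values taken from this publication.

```python
import cv2

def preprocess_frame(frame_bgr, target_size=(640, 360)):
    """Example preprocessing: smoothing, edge enhancement, and resizing.

    The kernel sizes and target size are illustrative assumptions.
    """
    # Smoothing filter to suppress noise and flicker
    smoothed = cv2.GaussianBlur(frame_bgr, (3, 3), 0)

    # Simple edge (contour) enhancement via unsharp masking
    blurred = cv2.GaussianBlur(smoothed, (5, 5), 0)
    enhanced = cv2.addWeighted(smoothed, 1.5, blurred, -0.5, 0)

    # Reduce to a predetermined size to lower processing cost
    resized = cv2.resize(enhanced, target_size, interpolation=cv2.INTER_AREA)
    return resized
```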
  • FIG. 4 is a functional block diagram showing an example of the image processing unit 303 in FIG. Next, functional blocks of the image processing unit 303 will be described with reference to FIG.
  • the image processing unit 303 is composed of an AI inference processing unit 401 , a shielding detection processing unit 402 , and an inference result correction unit 403 .
  • the AI inference processing unit 401 performs inference processing (eg, object detection) using AI on the input image acquired by the image acquisition unit 301 .
  • the shielding detection processing unit 402 uses the input image acquired by the image acquisition unit 301 and the correction data acquired by the correction data acquisition unit 302 (background image, previous frame image, inference result, etc.) to determine the correction conditions. Determine the presence or absence of the object to be detected.
  • the inference result correction unit 403 uses the results obtained by the AI inference processing unit 401 and the shielding detection processing unit 402 to determine the final determination of the image analysis system, and outputs the result to the storage control unit 304 .
  • Inference processing by AI is, for example, the process of extracting feature values using neural networks, deep learning, etc., and the degree of certainty can be estimated from the feature values related to the detection target in the image.
  • CNN Convolution Neural Networks
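  • This publication does not tie the inference processing to a specific detector, so the sketch below only illustrates how per-frame confidences for the detection target class might be obtained; run_inference, the detector object, and its detect method are hypothetical placeholders for whatever CNN-based model is actually used.

```python
from typing import List, Tuple

# A detection is assumed to be (x, y, w, h, confidence) for the target class.
Detection = Tuple[int, int, int, int, float]

def run_inference(detector, frame) -> List[Detection]:
    """Run the CNN-based detector on one frame and return detections.

    `detector` is assumed to expose a `detect(frame)` method returning boxes
    and confidences for the detection target class (e.g., a person).
    """
    return detector.detect(frame)

def max_confidence(detections: List[Detection]) -> float:
    """Highest confidence for the target class in this frame (0.0 if none)."""
    return max((conf for *_box, conf in detections), default=0.0)
```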
  • FIGS. 5 and 6 are diagrams for explaining an example of the flow of processing when correction conditions are not used.
  • FIG. 5 shows an example of transition from "no event” to "event occurrence”.
  • FIG. 6 shows an example of transition from "event occurrence” to "no event”.
  • Processing when no correction condition is used can determine two states: "no event” (first state) and “event occurrence” (second state).
  • the “no event” state here is a state in which the existence of a person, who is a detection target object, is not confirmed in the image based on the non-detection condition of the detection target object.
  • the “event occurrence” state is a state in which it is determined that a person, who is a detection target object, exists in the image based on the conditions for detecting the detection target object.
  • During operation with moving images or live video, the detection result as a system is determined by transitioning the system state based on transition conditions that combine the "detection event occurrence condition" (first condition) and the "detection event continuation condition" (second condition) using the AI inference results.
  • the "detection event occurrence condition” (first condition) is a condition for transitioning from the "no event” state to the "event occurrence” state.
  • the 'detection event continuation condition' (second condition) is a condition for 'event occurrence' to continue from the 'event occurrence' state.
  • the “detection event occurrence condition” (first condition) is the case where there are 3 out of 3 consecutive frames with a certainty of 60% or higher when an object is detected.
  • the “detection event continuation condition” (second condition) is a case where one out of three consecutive frames has a confidence of 60% or more at the time of object detection.
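  • As a sketch of how such "M frames out of N consecutive frames with confidence of X% or more" conditions could be evaluated, the following uses the example values from the text (N = 3, threshold 60%); the class and function names are illustrative assumptions.

```python
from collections import deque

class FrameConditionChecker:
    """Keeps the confidences of the most recent N frames and counts how many
    of them are at or above a threshold (window size and thresholds are
    configurable assumptions)."""

    def __init__(self, window: int = 3):
        self.history = deque(maxlen=window)

    def push(self, confidence: float) -> None:
        self.history.append(confidence)

    def count_at_or_above(self, threshold: float) -> int:
        return sum(1 for c in self.history if c >= threshold)

# Example values from the text: occurrence condition = 3 of 3 consecutive
# frames with confidence >= 60%; continuation condition = 1 of 3 frames.
def occurrence_condition(checker: FrameConditionChecker) -> bool:
    return (len(checker.history) == checker.history.maxlen
            and checker.count_at_or_above(0.60) >= 3)

def continuation_condition(checker: FrameConditionChecker) -> bool:
    return checker.count_at_or_above(0.60) >= 1
```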
  • In the example of FIG. 6, the state is "event occurrence" at first.
  • In frames 22, 23, and 24, the confidence of the person to be detected is not obtained.
  • In this case, none of the three consecutive frames has a confidence of 60% or higher at the time of object detection, so the second condition, the "detection event continuation condition", is not satisfied. Therefore, in frame 24, the state transitions from "event occurrence" to "no event". After that, the state remains "no event" until the first condition, the "detection event occurrence condition", is satisfied.
  • FIGS. 7 to 9 are diagrams for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention.
  • FIG. 7 shows an example of transition from “no event” to "event occurrence”.
  • FIG. 8 shows an example of transitions in the order of "event occurrence", “neutral”, and “event occurrence”.
  • FIG. 9 shows an example of transition from "event occurrence” to "no event”.
  • the processing is performed by the analysis server 101, which is an image analysis device.
  • the "correction condition" (third condition) is satisfied. do. In this case, it is determined that "the object to be detected is shielded" and the inference result is corrected.
  • the setting of the correction range it is also possible to set the range in advance. For example, if it is a fixed camera, the range can be assumed in advance. Further, the correction range may be calculated from the moving state of the detection target object. It is possible to determine the correction range by estimating that the object to be detected is not out of the screen with the amount of movement up to that frame. For example, if the moving speed of the object to be detected in the previous frames is slow, there is a high possibility that the object will still exist in the screen in the previous frame.
  • the "detection event occurrence condition” (first condition) is the case where there are 3 out of 3 consecutive frames with a degree of confidence of 60% or more at the time of object detection. Further, the “detection event continuation condition” (second condition) is a case where one out of three consecutive frames has a confidence of 60% or more at the time of object detection. Also, the "correction condition” (third condition) is satisfied if the position of the object is within a preset correction range in the last frame in which the object to be detected is detected during tracking. Further, when the "correction condition” (third condition) is satisfied, the transition is made to "neutral" (third state).
  • the first frame 41 is in the "event occurrence" state.
  • In frames 42, 43, and 44, the confidence of the person to be detected is not obtained, so the confidence is less than 60%. Since frames 42 and 43 still satisfy the "detection event continuation condition" (second condition), the state remains "event occurrence". In frame 44, the second condition, i.e., the "detection event continuation condition", is no longer satisfied.
  • the last detected object during tracking is the object 41a in the frame 41 with a certainty of 60% or higher.
  • the object 41a here exists within the set correction range 41b. Therefore, the frame 44 satisfies the "correction condition” (third condition) and transitions to the "neutral" state (third state).
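  • A minimal sketch of the first correction condition described above (the last detected position during tracking lying within a preset correction range) might look like the following; the rectangle representation and the function names are assumptions for illustration.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def box_center(box: Rect) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0

def is_within_correction_range(last_detected_box: Rect, correction_range: Rect) -> bool:
    """First correction condition (sketch): the last detected position of the
    target object during tracking lies inside the preset correction range."""
    cx, cy = box_center(last_detected_box)
    rx, ry, rw, rh = correction_range
    return rx <= cx <= rx + rw and ry <= cy <= ry + rh

# Usage sketch: if the continuation condition fails but the last detection
# (e.g., object 41a in frame 41) lies inside the range (e.g., range 41b),
# the state transitions to "neutral" instead of "no event".
```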
  • FIGS. 10 and 11 are diagrams for explaining the flow of processing according to the second correction condition of the image analysis apparatus of the present invention.
  • FIG. 10 shows an example of transition in order of "event occurrence”, “neutral”, and “event occurrence”.
  • FIG. 11 shows an example of transition in order of "event occurrence", “neutral”, and "no event”.
  • the processing here is performed by the analysis server 101, which is an image analysis device.
  • auxiliary information (results of analysis by background subtraction technology or the like) is used to correct AI inference results.
  • the processing under the second correction condition in FIGS. 10 and 11 is an example of correction of AI inference results using background subtraction technology as a correction method.
  • In the frame that no longer satisfies the second condition, the difference from a background image captured in advance is binarized with a certain threshold value. If the difference area is larger than the area of the detection frame of the frame in which the detection target object was last detected, it is determined that the target object is shielded. In this case, the frame satisfies the "correction condition" (third condition) and the state transitions to "neutral".
  • the "detection event occurrence condition” (first condition) is the case where there are 3 out of 3 consecutive frames with a degree of confidence of 60% or more at the time of object detection. Further, the “detection event continuation condition” (second condition) is a case where one out of three consecutive frames has a confidence of 60% or more at the time of object detection.
  • the "correction condition” (third condition) is satisfied when the difference area between the frame that no longer satisfies the "detection event continuation condition" and the background image is larger than the detection frame of the last detected object. Further, when the "correction condition” (third condition) is satisfied, the state shifts to "neutral" (third state).
  • the first frame 61 is in the "event occurrence" state.
  • In frames 62, 63, and 64, the confidence of the person to be detected is not obtained, so the confidence is less than 60%. Since frames 62 and 63 still satisfy the "detection event continuation condition" (second condition), the state remains "event occurrence".
  • In frame 64, the second condition, i.e., the "detection event continuation condition", is no longer satisfied.
  • Therefore, the difference between the pre-detection image 68, which is a background image captured in advance, and the frame 64 to be determined is obtained.
  • Frame 64 now shows additional cars and people compared to pre-detection image 68 where only trees were shown. Therefore, the car and person portions correspond to the difference and are displayed in white in the difference image 69 .
  • Because both the car and the person are shown, this white area is larger than the area of the detection frame surrounding the person (the detection target object) in frame 61 where the event occurred. Therefore, it is presumed that a large shielding object appeared in front of the person, and it is determined that the "correction condition" is satisfied.
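  • A minimal sketch of the second correction condition (background subtraction) might look like the following; the binarization threshold of 30 and the function name are illustrative assumptions, and OpenCV is used only as one possible implementation.

```python
import cv2
import numpy as np

def occlusion_by_background_difference(frame_bgr, background_bgr,
                                       last_detection_box,
                                       diff_threshold: int = 30) -> bool:
    """Second correction condition (sketch): binarize the difference from the
    background image and check whether the difference area is larger than the
    area of the detection frame in which the target object was last detected."""
    frame_gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    background_gray = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2GRAY)

    diff = cv2.absdiff(frame_gray, background_gray)
    _, binary = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)

    difference_area = int(np.count_nonzero(binary))
    _, _, w, h = last_detection_box
    last_box_area = w * h

    # If a larger region changed than the last detection frame occupied,
    # presume a large shielding object appeared in front of the target.
    return difference_area > last_box_area
```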
  • this "neutral” state if the "detection event continuation condition” (second condition) is satisfied within the correction frame (three frames in FIGS. 10 and 11), the state shifts to the "event occurrence" state.
  • the detection target object is detected with a certainty of 80%, so the "detection event continuation condition" is satisfied, and the state transitions to the "event occurrence” state.
  • When there is a shielding object, the detection frame will gradually become smaller. When the detection frame becomes smaller than a predetermined amount, it can be determined that the detection target object is occluded and the "correction condition" is satisfied, and the same processing as in FIGS. 7 to 11 can be performed.
  • FIG. 12 is a diagram explaining an example of state transition of the image analysis apparatus of the present invention.
  • Arrow 2 indicates transition from "event occurrence” to "neutral".
  • the condition here corresponds to the case where the “detection event continuation condition” (second condition) is not satisfied but the “correction condition” (third condition) is satisfied.
  • Arrow 3 indicates transition from “neutral” to "event occurrence”.
  • the condition here corresponds to the case where the “detection event continuation condition” (second condition) is satisfied within the correction frame.
  • Arrow 4 indicates a transition from "neutral” to "no event".
  • the condition here corresponds to the case where the "detection event continuation condition” (second condition) is not satisfied within the correction frame.
  • Arrow 5 indicates transition from “event occurrence” to "no event".
  • the condition here corresponds to the case where the “detection event continuation condition” (second condition) and the “correction condition” (third condition) are not satisfied. In this case, the transition is made from “event occurrence” to "no event” without going through "neutral".
  • Arrow 6 indicates transition from “no event” to "event occurrence”.
  • the condition here corresponds to the case where the “detection event occurrence condition” (first condition) is satisfied.
  • the "detection event continuation condition” (first condition) is a loose condition that is easier to achieve than the “detection event occurrence condition” (second condition).
  • the "detection event generation condition” includes M frames out of N consecutive frames with a certainty of X% or more of the detection target object at the time of object detection (N and M are integers and N ⁇ M)
  • the "detection The “event continuation condition” is, for example, the case where there are P out of N consecutive frames (P is an integer and M>P) where the certainty of the detection target object at the time of object detection is Y% or higher.
  • the certainty factors of both may be the same, or may be changed. In this case X ⁇ Y is preferred. In this way, the condition can be loosened by lowering the appearance probability of the frame at a certainty or more.
  • the condition for transitioning from "event occurrence” to “neutral” indicated by arrow 3 is not limited to the above. If it is Therefore, it is possible to set a looser condition than the "detection event generation condition" regardless of the “detection event continuation condition”. For example, if the “detection event occurrence condition” is that there are M frames out of N consecutive frames with a certainty of X% or more at the time of object detection (N and M are integers, N ⁇ M), “neutral” to “event The condition for transitioning to "occurrence” is, for example, that there are Q out of N consecutive frames with a certainty Z% or more at the time of object detection (Q is an integer and M>Q).
  • The condition of arrow 4 corresponds to the case where this condition is not satisfied.
  • The confidence thresholds of the two conditions may be the same or different; in this case, X ≥ Z is preferred. In this way, the condition can be loosened by lowering the required rate of appearance of frames at or above the confidence threshold.
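  • Putting the transitions of arrows 2 to 6 together, the following is a simplified sketch of the three-state logic; the condition flags are assumed to come from checks like those sketched earlier, and the handling of the correction frames is reduced to a simple counter.

```python
from enum import Enum, auto

class State(Enum):
    NO_EVENT = auto()          # first state
    EVENT_OCCURRENCE = auto()  # second state
    NEUTRAL = auto()           # third state

def next_state(state: State,
               occurrence_ok: bool,    # "detection event occurrence condition"
               continuation_ok: bool,  # "detection event continuation condition"
               correction_ok: bool,    # "correction condition"
               neutral_frames: int,
               correction_frame_limit: int = 3) -> State:
    """One step of the transitions shown by arrows 2 to 6 (simplified sketch)."""
    if state is State.NO_EVENT:
        # Arrow 6: occurrence condition satisfied -> event occurrence
        return State.EVENT_OCCURRENCE if occurrence_ok else State.NO_EVENT

    if state is State.EVENT_OCCURRENCE:
        if continuation_ok:
            return State.EVENT_OCCURRENCE
        if correction_ok:
            return State.NEUTRAL       # arrow 2
        return State.NO_EVENT          # arrow 5

    # state is State.NEUTRAL
    if continuation_ok:
        return State.EVENT_OCCURRENCE  # arrow 3 (looser than the occurrence condition)
    if neutral_frames >= correction_frame_limit:
        return State.NO_EVENT          # arrow 4: not satisfied within the correction frames
    return State.NEUTRAL
```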
  • FIG. 13 is an example of comparing the results of the image analysis apparatus of the present invention with the case where there is no correction condition.
  • a comparison between the results of the image analysis apparatus of the present invention (method with correction conditions) and the method without correction conditions will be described using the table of FIG.
  • "O" indicates the occurrence of an event
  • "X" indicates no event
  • " ⁇ " indicates a neutral state.
  • Frames No. 1 and No. 2 are in the "no event" state.
  • Up to frame No. 7, the "detection event continuation condition" is satisfied and the state is "event occurrence".
  • From frame No. 8 to frame No. 13, the second condition, the "detection event continuation condition", is not satisfied. Therefore, in the method without correction conditions, the state transitions to "no event" from frame No. 8 to frame No. 13, and a missed alarm occurs.
  • FIG. 14 shows an example of a processing flowchart of the image analysis apparatus of the present invention.
  • the processor unit 201 of the analysis server 101 executes the program loaded from the auxiliary storage unit 203 to the main storage unit 202 to activate the image analysis system.
  • the image analysis system may allow the user to check the result on a GUI (Graphical User Interface), or may allow the user to check only the states of "event occurrence", “no event”, and "neutral".
  • the first condition "detection event occurrence condition”, the second condition “detection event continuation condition”, the third condition "correction condition”, the threshold for object detection, the correction range, etc. are prepared in advance. You can read the configuration file and set it. Alternatively, the user may be allowed to select using a GUI. It should be noted that it is desirable that the “detection event continuation condition” (second condition) is easier to achieve than the “detection event occurrence condition” (first condition). Also, one “correction condition” (third condition) may be used, or a combination of a plurality of conditions may be used for determination.
  • In step 1001, the user selects an object to be detected and the camera 102 to be used for detection.
  • the type of object to be detected and the number of cameras may be one or more.
  • an example in which the number of objects and cameras to be detected is one will be described.
  • In step 1002, the image acquired from the camera 102 is read.
  • the frame rate, image size, and the like of the camera 102 may be set in advance on the camera 102 side, or may be arbitrarily selected by the user through the GUI.
  • In step 1003, it is determined whether an end command has been executed.
  • the end command may be a keyboard operation or a GUI operation.
  • In step 1004, AI inference processing is performed using the read image, and the result is output. The processing already explained can be used for the AI inference processing here.
  • In step 1005, the state of the system is determined, and the subsequent processing is decided. If the state of the system is "no event", the process proceeds to step 1006; on the other hand, in the case of the "event occurrence" state or the "neutral" state, the process proceeds to step 1007.
  • In step 1006, it is determined whether the AI inference result satisfies the "detection event occurrence condition" (first condition). If it is satisfied, the process proceeds to step 1008; if it is not satisfied, the process returns to step 1002 to read the image of the next frame.
  • In step 1007, it is determined whether the AI inference result satisfies the "detection event continuation condition" (second condition). If it is satisfied, the process proceeds to step 1008; if it is not satisfied, the process proceeds to step 1010.
  • In step 1008, the state of the image analysis system is changed to "event occurrence".
  • In step 1009, the user is notified of the occurrence of the event.
  • the occurrence notification may be made on the GUI, or the event occurrence notification may be delivered to the small terminal via the network 104 or the like.
  • the process advances to step 1002 to read the image of the next frame.
  • In step 1010, the state of the image analysis system is determined, and the subsequent processing is decided. If the state of the image analysis system is "event occurrence", the process proceeds to step 1011; on the other hand, in the case of the "neutral" state, the process proceeds to step 1012.
  • In step 1011, it is determined whether or not the "correction condition" (third condition) is satisfied, using the background subtraction technique, the tracking technique, or the like. If the "correction condition" is satisfied, the process proceeds to step 1013; if it is not satisfied, the process proceeds to step 1015.
  • In step 1012, it is determined whether or not the "correction condition" is satisfied. If the "correction condition" is satisfied, the process proceeds to step 1013; if it is not satisfied, the process proceeds to step 1015.
  • In step 1013, the state of the image analysis system is changed to "neutral".
  • a warning notification is issued to notify the user of the "neutral" state.
  • the warning notification may be made on the GUI, or the event occurrence notification may be delivered to the small terminal via the network 104 or the like.
  • the process proceeds to step 1002 to read the image of the next frame.
  • In step 1015, the state of the image analysis system is changed to "no event", and the process proceeds to step 1002 to read the image of the next frame.
  • The correction conditions at this time can be made more suitable by setting a correction range using a tracking technique or by using a background subtraction technique.
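  • The following is a simplified sketch of a main loop corresponding to steps 1002 to 1015, reusing the hypothetical helpers sketched earlier (run_inference, max_confidence, the condition checkers, the correction checks, and the State enum); the camera API, the notifications via print, and the correction-frame limit of 3 are illustrative assumptions.

```python
def analysis_loop(camera, detector, background, correction_range):
    """Simplified sketch of the flow of FIG. 14 (steps 1002 to 1015)."""
    state = State.NO_EVENT
    checker = FrameConditionChecker(window=3)
    last_box = None
    neutral_frames = 0

    while True:
        frame = camera.read()                        # step 1002: read the next frame
        if frame is None:                            # step 1003: end of processing
            break

        detections = run_inference(detector, frame)  # step 1004: AI inference
        checker.push(max_confidence(detections))
        if detections:
            # remember the box of the highest-confidence detection for correction
            last_box = max(detections, key=lambda d: d[-1])[:4]

        if state is State.NO_EVENT:                  # steps 1005-1006
            if occurrence_condition(checker):
                state = State.EVENT_OCCURRENCE       # step 1008
                print("event occurrence")            # step 1009: notify the user
        else:                                        # steps 1007, 1010-1015
            if continuation_condition(checker):
                state = State.EVENT_OCCURRENCE
                neutral_frames = 0
            else:
                correction_ok = last_box is not None and (
                    is_within_correction_range(last_box, correction_range)
                    or occlusion_by_background_difference(frame, background, last_box))
                if correction_ok and neutral_frames < 3:  # steps 1011-1013
                    state = State.NEUTRAL
                    neutral_frames += 1
                    print("warning: neutral state")       # warn the user
                else:                                      # step 1015
                    state = State.NO_EVENT
                    neutral_frames = 0
```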
  • Situations in which the present invention is expected to be applied include cases where an obstacle partially or completely hides the detection target object during operation with moving images or live video.
  • it can be applied to environments such as outdoors where brightness and color tend to change.
  • the present invention is not limited to the above-described embodiments, and includes various modifications.
  • the above-described embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and are not necessarily limited to those having all the configurations described.
  • Although the analysis server 101 and the camera 102 have been described as separate members, they may be integrated. That is, an image analysis apparatus having the function of the analysis server 101 can be configured on the camera 102 side without going through the network 104.

Abstract

Provided are an image analysis device, an image analysis system, and an image analysis method capable of more accurate detection of a shielded object. The present invention comprises an image acquisition unit (301) which acquires a captured image; an image processing unit (303) which outputs an analysis result for an input image acquired by the image acquisition unit (301); and a correction data acquisition unit (302) which acquires data used for a correction condition, wherein the image processing unit (303) discriminates an event occurrence state in which a detection target object is determined to be present in the image on the basis of a detection condition of the detection target object, an event-free state in which the detection target object is determined not to be present in the image on the basis of a non-detection condition of the detection target object, and a neutral state in which the detection target object is determined to be shielded by using a correction condition, and a condition of transition from the neutral state to the event occurrence state is set looser than a condition of transition from the event-free state to the event occurrence state.

Description

画像解析装置、画像解析システム及び画像解析方法Image analysis device, image analysis system and image analysis method
 本発明は、画像解析装置、画像解析システム及び画像解析方法に関し、特に、撮影された画像を解析して、物体を検知する画像解析装置、画像解析システム及び画像解析方法に関する。 The present invention relates to an image analysis device, an image analysis system, and an image analysis method, and more particularly to an image analysis device, an image analysis system, and an image analysis method that analyze captured images and detect objects.
 撮影した画像を解析するための手法は多く存在する。例えば、画像を入力とした物体認識・属性識別・解析・予測などのタスクに対して、Deep Learning(DL)などのAI(Artificial Intelligence)の適用が進んでいる。このような機能を製品やシステムに組み込むことで人の仕事を代替して人件費の削減や商品の付加価値向上を図ることができる。これらは、製品やシステムの販売促進に寄与するため、多くの企業がAIに対し高い関心を抱いている。 There are many methods for analyzing captured images. For example, the application of AI (Artificial Intelligence) such as Deep Learning (DL) is progressing for tasks such as object recognition, attribute identification, analysis, and prediction using images as input. By incorporating such functions into products and systems, it is possible to reduce labor costs and improve the added value of products by substituting human work. Many companies have high interest in AI because it contributes to sales promotion of products and systems.
 DLなどのAIでは教師有り学習が多く用いられる。ある問題に対する入力画像と画像を入力した結果として期待する出力信号のペアの集合を用いて、入力画像から正しい出力信号を出力するという問題を解くモデルを学習する。入力画像と期待する出力のペアを学習データと呼ぶ。実用上では未知の入力画像に対しても適切な結果を出力(推論)する汎化性能が求められる。代表的な手法としてはニューラルネットワークが存在する。 AI such as DL often uses supervised learning. Using a set of pairs of input images and expected output signal pairs for a given problem, we train a model that solves the problem of outputting the correct output signal from the input images. The pairs of input images and expected outputs are called training data. In practice, generalization performance is required to output (infer) appropriate results even for unknown input images. A neural network exists as a representative method.
 一方、特許文献1には、表示端末が、監視カメラから通知された特定の物体情報を表示すると共に、管理者からの操作指示により監視カメラからの位置情報/時刻情報を元に近接する監視カメラに制御信号を送信して特定の物体情報に基づく追跡を行う監視カメラシステムが開示されている。 On the other hand, Patent Literature 1 discloses a surveillance camera system in which a display terminal displays specific object information notified from a surveillance camera and, according to an operation instruction from an administrator, sends a control signal to a nearby surveillance camera based on position information/time information from the surveillance camera so that tracking based on the specific object information is performed.
特許第6403784号公報Japanese Patent No. 6403784
 物体認識の処理では、検知/非検知の判定に確信度(AIの推論結果がどのくらい確実であるかの統計的な尺度)を用いることができる。例えば、確信度が60%以上であれば検知、60%未満を非検知とする、というように確信度に閾値を設け、検知/非検知の判定をする。  In the object recognition process, it is possible to use confidence (a statistical measure of how certain AI's inference results are) to determine detection/non-detection. For example, if the certainty is 60% or higher, it is detected, and if it is less than 60%, it is judged as non-detected.
 しかし、上述した従来の画像を解析する方法では、運用環境においては、検知対象物体の確信度の低下にともなう推論精度低下の原因となる様々な事象が存在する。ここでの事象は、例えば、学習データ収集時の撮影環境、背景、画角、時間帯などの違いや、学習データに含まれない物体の存在などである。具体的には、物体認識のタスクにおいて、検知対象物体が他の物体に一部隠れることで推論精度が低下し、検知対象物体の見逃しが発生する恐れがある。 However, in the conventional image analysis method described above, there are various events in the operational environment that cause a decrease in inference accuracy due to a decrease in the confidence of the object to be detected. The events here include, for example, differences in shooting environment, background, angle of view, and time zone at the time of learning data collection, and existence of objects not included in learning data. Specifically, in the task of object recognition, inference accuracy may be degraded if a detection target object is partially hidden by other objects, and the detection target object may be overlooked.
 また、動画やライブ映像での運用時に、単一フレームではなく複数フレームのAIの推論結果を利用して物体をトラッキングして検知/非検知を判定し、検知対象の見逃しを抑えることもできる。しかしこれらの手法でも、判定対象となる複数フレーム間にて検知条件やトラッキング条件を満たす必要がある。このため、一度検知したりトラッキングしたりした物体であっても、検知対象物体が他の物体で遮蔽されるなどして、学習したモデルでは検知できないフレームが続く場合は、見逃してしまうことが想定される。 In addition, when operating with video or live video, it is possible to use the AI inference results of multiple frames instead of a single frame to track objects and determine whether they are detected or not, thereby reducing the possibility of overlooking detection targets. However, even with these methods, it is necessary to satisfy detection conditions and tracking conditions between multiple frames to be determined. For this reason, even if an object has once been detected or tracked, it is assumed that the object will be missed if the detection target object is shielded by another object and frames that cannot be detected by the learned model continue.
 遮蔽された検知対象物体を見逃さないためには、学習したモデルでは検知できないフレームに対して検知対象物体が遮蔽されていることを検知し、推論結果を補正して検知できる仕組みが必要となる。しかし推論結果を補正して検知と判定するだけでは、検知対象物体が既に検知範囲外に出ているにもかかわらず検知状態と判定し、誤報が発生することも想定される。そのため、検知対象物体の遮蔽を検知し推論結果の補正をするとともにその補正による誤報を防ぐ仕組みが必要となる。  In order not to miss the occluded detection target object, it is necessary to detect that the detection target object is occluded for frames that cannot be detected by the learned model, and to correct the inference results for detection. However, it is conceivable that if the inference result is only corrected and the detection is determined, the detection state is determined even though the object to be detected is already out of the detection range, and an erroneous alarm occurs. Therefore, it is necessary to detect the shielding of the object to be detected, correct the inference result, and prevent false alarms due to the correction.
 また、特許文献1には、特定の物体情報に基づく追跡を行う技術が開示されているが、上記のように検知対象物体が遮蔽されている場合についての検知の方法については記載されていない。 In addition, Patent Document 1 discloses a technique for tracking based on specific object information, but does not describe a detection method when the detection target object is shielded as described above.
 本発明は、上記課題に鑑みて、遮蔽されている物体についてのより正確な検知が可能な画像解析装置、画像解析システム及び画像解析方法を提供することを目的とする。 In view of the above problems, an object of the present invention is to provide an image analysis device, an image analysis system, and an image analysis method that enable more accurate detection of a hidden object.
 上記目的を達成するため、代表的な本発明の画像解析装置の一つは、撮影した画像を取得する画像取得部と、前記画像取得部により取得した入力画像に対して解析結果を出力する画像処理部と、補正条件に用いるデータを取得する補正用データ取得部とを備え、前記画像処理部は、検知対象物体の検知の条件に基づき前記検知対象物体が画像内に存在していると判定するイベント発生の状態と、検知対象物体の非検知の条件に基づき前記検知対象物体が画像内に存在しないと判定するイベントなしの状態と、前記補正条件を用いて前記検知対象物体が遮蔽されていると判定するニュートラルの状態とを、判別し、前記ニュートラルの状態から前記イベント発生の状態への遷移の条件は、前記イベントなしの状態から前記イベント発生の状態への遷移の条件よりも緩く設定されていることを特徴とする。 In order to achieve the above object, one representative image analysis apparatus of the present invention includes an image acquisition unit that acquires a captured image, an image processing unit that outputs an analysis result for the input image acquired by the image acquisition unit, and a correction data acquisition unit that acquires data used for a correction condition. The image processing unit discriminates among an event occurrence state in which the detection target object is determined to exist in the image based on a detection condition for the detection target object, a no-event state in which the detection target object is determined not to exist in the image based on a non-detection condition for the detection target object, and a neutral state in which the detection target object is determined to be shielded using the correction condition, and the condition for transition from the neutral state to the event occurrence state is set looser than the condition for transition from the no-event state to the event occurrence state.
 さらに本発明の画像解析方法の一つは、処理装置を用いて画像解析を行う画像解析方法であって、撮影した画像を取得するステップと、補正条件に用いるデータを取得するステップと、検知対象物体の検知の条件に基づき前記検知対象物体が画像内に存在していると判定するイベント発生の状態と、検知対象物体の非検知の条件に基づき前記検知対象物体が画像内に存在しないと判定するイベントなしの状態と、前記補正条件を用いて前記検知対象物体が遮蔽されていると判定するニュートラルの状態とを、判別するステップとを備え、前記ニュートラルの状態から前記イベント発生の状態への遷移の条件は、前記イベントなしの状態から前記イベント発生の状態への遷移の条件よりも緩く設定されていることを特徴とする。 Further, one of the image analysis methods of the present invention is an image analysis method for performing image analysis using a processing device, and includes steps of acquiring a photographed image, acquiring data used for correction conditions, An event occurrence state for determining that the detection target object exists in the image based on the object detection condition, and determining that the detection target object does not exist in the image based on the detection target object non-detection condition. and a neutral state in which it is determined that the object to be detected is shielded using the correction condition, and a transition from the neutral state to the event occurrence state is performed. A condition for transition is set looser than a condition for transition from the no-event state to the event-occurred state.
 本発明によれば、画像解析装置、画像解析システム及び画像解析方法において、遮蔽されている物体についてのより正確な検知をすることができる。
 上記以外の課題、構成及び効果は、以下の実施形態により明らかにされる。
According to the present invention, an image analysis apparatus, an image analysis system, and an image analysis method can more accurately detect a shielded object.
Problems, configurations, and effects other than those described above will be clarified by the following embodiments.
図1は、本発明の画像解析システムの一例を示すシステム構成図である。FIG. 1 is a system configuration diagram showing an example of the image analysis system of the present invention. 図2は、本発明の画像解析装置の一例を示すハードウェア構成図である。FIG. 2 is a hardware configuration diagram showing an example of the image analysis apparatus of the present invention. 図3は、本発明の画像解析装置の一例を示す機能ブロック図である。FIG. 3 is a functional block diagram showing an example of the image analysis device of the present invention. 図4は、図3における画像処理部の一例を示す機能ブロック図である。4 is a functional block diagram showing an example of an image processing unit in FIG. 3. FIG. 図5は、補正条件を用いない場合の処理の流れの一例を説明する第1の例の図である。FIG. 5 is a diagram of a first example for explaining an example of the flow of processing when correction conditions are not used. 図5は、補正条件を用いない場合の処理の流れの一例を説明する第2の例の図である。FIG. 5 is a diagram of a second example for explaining an example of the flow of processing when correction conditions are not used. 図7は、本発明の画像解析装置の第1の補正条件による処理の流れを説明する第1の例の図である。FIG. 7 is a diagram of a first example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention. 図8は、本発明の画像解析装置の第1の補正条件による処理の流れを説明する第2の例の図である。FIG. 8 is a diagram of a second example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention. 図9は、本発明の画像解析装置の第1の補正条件による処理の流れを説明する第3の例の図である。FIG. 9 is a diagram of a third example for explaining the flow of processing according to the first correction condition of the image analysis apparatus of the present invention. 図10は、本発明の画像解析装置の第2の補正条件による処理の流れを説明する第1の例の図である。FIG. 10 is a diagram of a first example for explaining the flow of processing according to the second correction condition of the image analysis apparatus of the present invention. 図11は、本発明の画像解析装置の第2の補正条件による処理の流れを説明する第2の例の図である。FIG. 11 is a diagram of a second example for explaining the flow of processing according to the second correction condition of the image analysis apparatus of the present invention. 図12は、本発明の画像解析装置の状態遷移の例を説明する図である。FIG. 12 is a diagram explaining an example of state transition of the image analysis apparatus of the present invention. 図13は、補正条件がない場合と本発明の画像解析装置の結果を比較した一例である。FIG. 13 is an example of comparing the results of the image analysis apparatus of the present invention with the case where there is no correction condition. 図14は、本発明に係る画像解析装置の処理フローチャートの一例を示す。FIG. 14 shows an example of a processing flowchart of the image analysis apparatus according to the present invention.
 本発明を実施するための形態を説明する。 A form for carrying out the present invention will be described.
(システム構成)
 図1は、本発明の画像解析システムの一例を示すシステム構成図である。
(System configuration)
FIG. 1 is a system configuration diagram showing an example of the image analysis system of the present invention.
 解析サーバ101とカメラ102、データベースサーバ103が、ネットワーク104にて接続されている。ネットワーク104は各サーバを結ぶデータ通信可能な回線である。専用線、イントラネット、インターネット等のIPネットワーク等、回線の種類は問わない。カメラ102で取得した映像データは解析サーバ101にて解析され、出力結果はデータベースサーバ103に記憶される。なお図1の構成は一例であり、カメラ102上でAIの推論と画像解析システムの処理を行う等様々な変形が可能である。 The analysis server 101, camera 102, and database server 103 are connected via a network 104. A network 104 is a line capable of data communication that connects each server. Any type of line, such as a dedicated line, an intranet, an IP network such as the Internet, etc., does not matter. Video data acquired by the camera 102 is analyzed by the analysis server 101 and the output result is stored in the database server 103 . Note that the configuration in FIG. 1 is an example, and various modifications are possible, such as performing AI inference and image analysis system processing on the camera 102 .
 解析サーバ101は、画像の解析を行う画像解析装置として適用でき、その構成や処理の内容は後述する。 The analysis server 101 can be applied as an image analysis device that analyzes images, and its configuration and processing details will be described later.
 カメラ102は、レンズや絞りを介して撮像素子に入射光を結像して情報を得るカメラの構成を適用できる。ここでの撮像素子の例としては、CCD(Charge-Coupled Device)イメージセンサやCMOS(Complementary Metal Oxide Semiconductor)イメージセンサ等があげられる。カメラ102は、映像として、例えば、1秒間に3フレーム(3fps)以上等で撮影して、その情報は、解析サーバ101やデータベースサーバ103へ送られる。カメラ102は、状況に応じて複数設置可能であり、様々な場所に配置可能である。例えば、監視カメラとして監視箇所に配置する等である。 For the camera 102, a configuration of a camera in which information is obtained by forming an image of incident light on an imaging device via a lens and an aperture can be applied. Examples of the imaging device here include a CCD (Charge-Coupled Device) image sensor and a CMOS (Complementary Metal Oxide Semiconductor) image sensor. The camera 102 shoots an image at, for example, 3 frames per second (3 fps) or more, and the information is sent to the analysis server 101 and the database server 103 . A plurality of cameras 102 can be installed according to the situation, and can be arranged in various places. For example, it may be installed at a monitoring location as a monitoring camera.
 データベースサーバ103は、カメラ102で撮影した画像や解析サーバ101の処理のために必要な情報、解析サーバ101の処理結果等を記録する装置である。記録するための装置としては、例えば、HDD(Hard Disk Drive)、SSD(Solid State Drive)、DDS(Digital Data Storage)等、必要に応じて適した方式を適用できる。 The database server 103 is a device that records images captured by the camera 102, information necessary for processing by the analysis server 101, processing results of the analysis server 101, and the like. As a device for recording, for example, HDD (Hard Disk Drive), SSD (Solid State Drive), DDS (Digital Data Storage), etc., can be applied according to need.
(画像解析装置のハードウェア構成)
 図2は、本発明の画像解析装置の一例を示すハードウェア構成図である。図2を用いて解析サーバ101のハードウェア構成例を説明する。
(Hardware configuration of image analysis device)
FIG. 2 is a hardware configuration diagram showing an example of the image analysis apparatus of the present invention. A hardware configuration example of the analysis server 101 will be described with reference to FIG.
 ハードウェアとしては、CPU(Central Processing Unit)等の処理装置を備えた電子計算機システムにより構成され、それぞれの機能が実行されるようになっている。処理装置としては、CPUの他に、Digital Signal Processor(DSP)やField-Programmable Gate Array(FPGA)、Graphics Processing Unit(GPU)などを適用してもよい。解析サーバ101は、プロセッサ部201、主記憶部202、補助記憶部203、入出力インターフェース部204、表示インターフェース部205、ネットワークインターフェース部206を含み、これらはバス207により結合されている。入出力インターフェース部204は、キーボードやマウス等の入力装置208に接続されてユーザインタフェースを提供する。表示インターフェース部205は、表示出力装置209に接続される。ネットワークインターフェース部206は解析サーバ101とネットワーク104を接続するためのインターフェースである。 The hardware consists of a computer system equipped with a processing unit such as a CPU (Central Processing Unit), and each function is executed. As the processing device, in addition to the CPU, a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), a Graphics Processing Unit (GPU), etc. may be applied. The analysis server 101 includes a processor section 201 , a main storage section 202 , an auxiliary storage section 203 , an input/output interface section 204 , a display interface section 205 and a network interface section 206 , which are connected by a bus 207 . An input/output interface unit 204 is connected to an input device 208 such as a keyboard and mouse to provide a user interface. The display interface unit 205 is connected to the display output device 209 . A network interface unit 206 is an interface for connecting the analysis server 101 and the network 104 .
 補助記憶部203は通常、HDDやフラッシュメモリなどの不揮発性メモリで構成され、解析サーバ101が実行するプログラムやプログラムが処理対象とするデータ等を記憶する。主記憶部202はRAMで構成され、プロセッサ部201の命令により、プログラムやプログラムの実行に必要なデータ等を一時的に記憶する。プロセッサ部201は、補助記憶部203から主記憶部202にロードしたプログラムを実行する。なお、図2の構成は一例であり、様々な変形が可能である。 The auxiliary storage unit 203 is usually composed of a non-volatile memory such as an HDD or flash memory, and stores programs executed by the analysis server 101 and data to be processed by the programs. The main storage unit 202 is composed of a RAM, and temporarily stores programs, data necessary for executing the programs, and the like according to instructions from the processor unit 201 . The processor unit 201 executes programs loaded from the auxiliary storage unit 203 to the main storage unit 202 . Note that the configuration in FIG. 2 is an example, and various modifications are possible.
(画像解析装置の機能ブロック)
 図3は、本発明の画像解析装置の一例を示す機能ブロック図である。図3を用いて解析サーバ101の機能ブロックを説明する。
(Functional block of image analysis device)
FIG. 3 is a functional block diagram showing an example of the image analysis device of the present invention. Functional blocks of the analysis server 101 will be described with reference to FIG.
 解析サーバ101は補助記憶部203、画像取得部301、補正用データ取得部302、画像処理部303、記憶制御部304、表示制御部305で構成される。画像取得部301は補助記憶部203から得られる信号を画像として取得する。補正用データ取得部302は補助記憶部203から得られる信号を画像、または時系列データとして取得する。画像処理部303は画像取得部301により得られた画像と補正用データ取得部302により得られたデータを入力として、AI推論処理、遮蔽検知処理を行いその結果からシステムの状態を判定する。記憶制御部304は画像処理部303の結果を用いて出力結果の記憶制御を行い補助記憶部203に保存する。表示制御部305は画像処理部303の結果や補助記憶部203に保存された情報の表示を制御し、表示出力装置209に出力する。 The analysis server 101 is composed of an auxiliary storage unit 203 , an image acquisition unit 301 , a correction data acquisition unit 302 , an image processing unit 303 , a storage control unit 304 and a display control unit 305 . The image acquisition unit 301 acquires the signal obtained from the auxiliary storage unit 203 as an image. A correction data acquisition unit 302 acquires the signal obtained from the auxiliary storage unit 203 as an image or time-series data. An image processing unit 303 receives the image obtained by the image obtaining unit 301 and the data obtained by the correction data obtaining unit 302, performs AI inference processing and shielding detection processing, and determines the state of the system from the results. A storage control unit 304 performs storage control of the output result using the result of the image processing unit 303 and stores it in the auxiliary storage unit 203 . A display control unit 305 controls the display of the result of the image processing unit 303 and the information stored in the auxiliary storage unit 203 and outputs the information to the display output device 209 .
 画像取得部301においては画像データが記憶されている映像記憶装置などから入力された映像信号から画像データとして取得する。この画像データにおいてはノイズやフリッカなどの影響を低減するために前処理として平滑化フィルタや輪郭強調フィルタや濃度変換などの処理を施してもよい。また用途に応じてRGBカラーやYUV、モノクロなどのデータ形式を選択してもよい。さらには処理コスト低減のために所定の大きさで画像データに縮小処理を施してもよい。 The image acquisition unit 301 acquires image data from a video signal input from a video storage device or the like in which image data is stored. In order to reduce the influence of noise, flicker, etc., the image data may be subjected to preprocessing such as a smoothing filter, an edge enhancement filter, and density conversion. A data format such as RGB color, YUV, or monochrome may be selected according to the application. Furthermore, in order to reduce the processing cost, the image data may be reduced to a predetermined size.
 図4は、図3における画像処理部303の一例を示す機能ブロック図である。次に図4を用いて画像処理部303の機能ブロックを説明する。 FIG. 4 is a functional block diagram showing an example of the image processing unit 303 in FIG. Next, functional blocks of the image processing unit 303 will be described with reference to FIG.
 画像処理部303はAI推論処理部401と遮蔽検知処理部402、推論結果補正部403で構成される。AI推論処理部401は画像取得部301で取得した入力画像に対してAIを用いた推論処理(例:物体検知)を行う。また遮蔽検知処理部402は画像取得部301で取得した入力画像および補正用データ取得部302で取得した補正用データ(背景画像や直前のフレームの画像、推論結果等)を用いて、補正条件を元に検知対象物体の有無を決定する。推論結果補正部403では、AI推論処理部401および遮蔽検知処理部402で得られた結果を用いて最終的な画像解析システムの判定を決定し、その結果を記憶制御部304に出力する。 The image processing unit 303 is composed of an AI inference processing unit 401 , a shielding detection processing unit 402 , and an inference result correction unit 403 . The AI inference processing unit 401 performs inference processing (eg, object detection) using AI on the input image acquired by the image acquisition unit 301 . The shielding detection processing unit 402 uses the input image acquired by the image acquisition unit 301 and the correction data acquired by the correction data acquisition unit 302 (background image, previous frame image, inference result, etc.) to determine the correction conditions. Determine the presence or absence of the object to be detected. The inference result correction unit 403 uses the results obtained by the AI inference processing unit 401 and the shielding detection processing unit 402 to determine the final determination of the image analysis system, and outputs the result to the storage control unit 304 .
 AIによる推論処理は、例えば、ニューラルネットワークやディープラーニング等を用いて特徴量を抽出する処理等であり、画像内の検知対象に関する特徴量などから、確信度を推定できる。具体例としては、CNN(Convolution Neural Networks)等を用いることができる。 Inference processing by AI is, for example, the process of extracting feature values using neural networks, deep learning, etc., and the degree of certainty can be estimated from the feature values related to the detection target in the image. As a specific example, CNN (Convolution Neural Networks) or the like can be used.
(Processing when the correction condition is not used)
FIGS. 5 and 6 are diagrams explaining an example of the flow of processing when the correction condition is not used. FIG. 5 shows an example of a transition from "no event" to "event occurrence". FIG. 6 shows an example of a transition from "event occurrence" to "no event".
The processing without the correction condition can determine two states: "no event" (first state) and "event occurrence" (second state). The "no event" state here is a state in which, based on the non-detection condition for the detection target object, the presence of a person (the detection target object) is not confirmed in the image. The "event occurrence" state is a state in which, based on the detection condition for the detection target object, it is determined that a person (the detection target object) is present in the image. During operation on video or live footage, the detection result of the system is determined by causing the system to transition between states according to transition conditions that combine the first condition, the "detection event occurrence condition", and the second condition, the "detection event continuation condition", using the AI inference results.

Here, the "detection event occurrence condition" (first condition) is the condition for transitioning from the "no event" state to the "event occurrence" state. The "detection event continuation condition" (second condition) is the condition for "event occurrence" to continue from the "event occurrence" state. In the specific condition examples of FIGS. 5 and 6, the "detection event occurrence condition" (first condition) is satisfied when 3 out of 3 consecutive frames have an object detection confidence of 60% or higher. The "detection event continuation condition" (second condition) is satisfied when 1 out of 3 consecutive frames has an object detection confidence of 60% or higher.
In FIGS. 5 and 6, object detection processing is applied to frames that are consecutive in time series. In the example of FIG. 5, the initial state is "no event". In frames 12, 13, and 14 of the image, the confidence of the person enclosed by the detection frame (the detection target) is 80%, 85%, and 85%, respectively. In this case, 3 out of 3 consecutive frames have an object detection confidence of 60% or higher, satisfying the first condition, the "detection event occurrence condition". Therefore, at frame 14 the state transitions from "no event" to "event occurrence". Thereafter, the state remains "event occurrence" as long as the second condition, the "detection event continuation condition", is satisfied.

In the example of FIG. 6, the initial state is "event occurrence". In frames 22, 23, and 24, no confidence is obtained for the person to be detected. In this case, none of the 3 consecutive frames has an object detection confidence of 60% or higher, so the second condition, the "detection event continuation condition", is not satisfied. Therefore, at frame 24 the state transitions from "event occurrence" to "no event". Thereafter, the state remains "no event" until the first condition, the "detection event occurrence condition", is satisfied.
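The two-state decision described above can be summarized by the following sketch; it assumes per-frame confidences between 0 and 1, and the 60% threshold and the 3-frame window follow the examples of FIGS. 5 and 6.

from collections import deque

THRESHOLD = 0.60   # object detection confidence threshold
WINDOW = 3         # number of consecutive frames examined

def update_state(state, recent, confidence):
    # state: "no_event" or "event"; recent: deque holding the last WINDOW confidences.
    recent.append(confidence)
    hits = sum(c >= THRESHOLD for c in recent)
    if state == "no_event":
        # Detection event occurrence condition: 3 of 3 consecutive frames >= 60%.
        return "event" if (len(recent) == WINDOW and hits == WINDOW) else "no_event"
    # Detection event continuation condition: at least 1 of 3 consecutive frames >= 60%.
    return "event" if hits >= 1 else "no_event"

recent = deque(maxlen=WINDOW)
state = "no_event"
for confidence in [0.80, 0.85, 0.85, 0.10, 0.10, 0.10]:
    # Transitions to "event" at the third value and back to "no_event" at the last.
    state = update_state(state, recent, confidence)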
(Processing under the first correction condition)
FIGS. 7 to 9 are diagrams explaining the flow of processing under the first correction condition of the image analysis apparatus of the present invention. FIG. 7 shows an example of a transition from "no event" to "event occurrence". FIG. 8 shows an example of transitions in the order "event occurrence", "neutral", "event occurrence". FIG. 9 shows an example of a transition from "event occurrence" to "no event". The processing here is performed by the analysis server 101, which is the image analysis apparatus.
The processing under the first correction condition in FIGS. 7 to 9 differs from FIGS. 5 and 6, where no correction condition is used, in that a "neutral" state (third state) is provided in addition to "no event" (first state) and "event occurrence" (second state). "Neutral" is the state in which it is determined that the person who is the detection target object is occluded. In this processing, auxiliary information (analysis results obtained by a tracking technique or the like) is used to correct the AI inference results. Furthermore, in addition to the first and second conditions, a third condition, the correction condition, is combined into the processing, and the system transitions between states based on these transition conditions.

In FIGS. 7 to 9, the "correction condition" (third condition) is satisfied if, in the frame in which the detection target object was last detected during tracking, the position of that object is within a preset correction range. In this case, it is determined that the detection target object is occluded, and the inference result is corrected. The correction range may be set in advance; with a fixed camera, for example, the range can be assumed beforehand. Alternatively, the correction range may be calculated from the movement state of the detection target object. The correction range can be determined by estimating, from the amount of movement up to that frame, that the detection target object has not left the screen. For example, if the movement speed of the detection target object in the preceding frames is slow, it is highly likely that the object is still within the screen in subsequent frames, so the correction range is set wider according to that movement speed.
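One way to derive the correction range from the movement state, sketched below, is to exclude a screen border whose width depends on the recent movement speed, so that a slowly moving object (unlikely to have left the screen) leaves a wide correction range; the formula and parameters are assumptions made for illustration.

def correction_range(frame_w, frame_h, speed_px_per_frame, frames_ahead=3):
    # Exclude a border that the object could plausibly cross before the next
    # confirmation; slow objects yield a small border and thus a wide range.
    border = int(speed_px_per_frame * frames_ahead)
    return (border, border, frame_w - border, frame_h - border)

def satisfies_correction_condition(last_center, corr_range):
    # True if the last detected position lies inside the correction range.
    x, y = last_center
    x0, y0, x1, y1 = corr_range
    return x0 <= x <= x1 and y0 <= y <= y1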
Specific condition examples for FIGS. 7 to 9 are as follows. The "detection event occurrence condition" (first condition) is satisfied when 3 out of 3 consecutive frames have an object detection confidence of 60% or higher. The "detection event continuation condition" (second condition) is satisfied when 1 out of 3 consecutive frames has an object detection confidence of 60% or higher. The "correction condition" (third condition) is satisfied if, in the frame in which the detection target object was last detected during tracking, the position of that object is within the preset correction range. When the "correction condition" (third condition) is satisfied, the state transitions to "neutral" (third state). In this case, if the "detection event continuation condition" (second condition) is satisfied within the correction frames (3 frames), the state becomes "event occurrence"; if it is not satisfied within 3 frames, the state transitions to "no event".
In FIGS. 7 to 9, object detection processing is applied to frames that are consecutive in time series. In the example of FIG. 7, the initial state is "no event". In image frames 32, 33, and 34, the confidence of the person enclosed by the detection frame (the detection target) is 80%, 85%, and 85%, respectively. In this case, 3 out of 3 consecutive frames have an object detection confidence of 60% or higher, satisfying the first condition, the "detection event occurrence condition". Therefore, at frame 34 the state transitions from "no event" to "event occurrence".

In the example of FIG. 8, the first frame 41 is in the "event occurrence" state. In frames 42, 43, and 44, no confidence is obtained for the person to be detected, so the confidence is below 60%. Frames 42 and 43 satisfy the "detection event continuation condition" (second condition), so the state remains "event occurrence". At frame 44, the second condition, the "detection event continuation condition", is no longer satisfied. The object last detected during tracking with a confidence of 60% or higher is the object 41a in frame 41, and this object 41a lies within the set correction range 41b. Therefore, frame 44 satisfies the "correction condition" (third condition) and the state transitions to the "neutral" state (third state). If, in this "neutral" state, the "detection event continuation condition" (second condition) is satisfied within the correction frames (3 frames in FIGS. 7 to 9), the state transitions to "event occurrence". In FIG. 8, the detection target object is detected with a confidence of 80% at frame 45, so the "detection event continuation condition" is satisfied.

In FIG. 9, the object 51a in frame 51 does not lie within the set correction range 51b. Therefore, when the "detection event continuation condition" (second condition) is no longer satisfied at frame 54, the "correction condition" (third condition) is not satisfied either. That is, frame 54 satisfies neither the second condition nor the third condition, so the state transitions to "no event" without passing through the "neutral" state.
(Processing under the second correction condition)
FIGS. 10 and 11 are diagrams explaining the flow of processing under the second correction condition of the image analysis apparatus of the present invention. FIG. 10 shows an example of transitions in the order "event occurrence", "neutral", "event occurrence". FIG. 11 shows an example of transitions in the order "event occurrence", "neutral", "no event". The processing here is performed by the analysis server 101, which is the image analysis apparatus.
The processing under the second correction condition in FIGS. 10 and 11, like the first correction condition in FIGS. 7 to 9, uses the "no event" (first state), "event occurrence" (second state), and "neutral" (third state) states. In this processing, auxiliary information (analysis results obtained by a background subtraction technique or the like) is used to correct the AI inference results.

The processing under the second correction condition in FIGS. 10 and 11 is an example of correcting the AI inference results using a background subtraction technique as the correction method. Here, for the frame that no longer satisfies the second condition, the "detection event continuation condition", the difference from a background image captured in advance is taken and binarized with a certain threshold. If the difference region is larger than the region of the detection frame in the frame in which the detection target object was last detected, it is determined that the target object is occluded. In this case, that frame satisfies the "correction condition" (third condition) and the state transitions to "neutral".

Specific condition examples for FIGS. 10 and 11 are as follows. The "detection event occurrence condition" (first condition) is satisfied when 3 out of 3 consecutive frames have an object detection confidence of 60% or higher. The "detection event continuation condition" (second condition) is satisfied when 1 out of 3 consecutive frames has an object detection confidence of 60% or higher. The "correction condition" (third condition) is satisfied when the difference region between the background image and the frame that no longer satisfies the "detection event continuation condition" is larger than the detection frame of the last detected object. When the "correction condition" (third condition) is satisfied, the state transitions to "neutral" (third state); in this case, it is determined that the detection target object is occluded, and the inference result is corrected. Then, if the "detection event continuation condition" (second condition) is satisfied within the correction frames (3 frames), the state transitions to "event occurrence"; if it is not satisfied within 3 frames, the state transitions to "no event".
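The background-difference judgment described above can be sketched as follows, assuming grayscale images and a fixed binarization threshold (both are assumptions for illustration); OpenCV is used only as an example implementation.

import cv2
import numpy as np

def occluded_by_background_diff(current_gray, background_gray, last_box, bin_thresh=30):
    # Difference from the background image captured in advance, binarized.
    diff = cv2.absdiff(current_gray, background_gray)
    _, mask = cv2.threshold(diff, bin_thresh, 255, cv2.THRESH_BINARY)
    changed_area = int(np.count_nonzero(mask))
    # last_box: detection frame (x, y, w, h) of the frame in which the
    # detection target object was last detected.
    _, _, w, h = last_box
    # Correction condition: the difference region is larger than that detection frame.
    return changed_area > w * h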
In the example of FIG. 10, the first frame 61 is in the "event occurrence" state. In frames 62, 63, and 64, no confidence is obtained for the person to be detected, so the confidence is below 60%. Frames 62 and 63 satisfy the "detection event continuation condition" (second condition), so the state remains "event occurrence". At frame 64, the second condition, the "detection event continuation condition", is no longer satisfied, and it is determined whether the "correction condition" (third condition) is satisfied. First, the difference is taken between the pre-detection image 68, a background image captured in advance, and frame 64, the frame now being judged. Compared with the pre-detection image 68, in which only a tree appears, frame 64 additionally shows a car and a person. The car and person portions therefore correspond to the difference and are displayed in white in the difference image 69. Because both the car and the person are included, this white area is larger than the area of the detection target person enclosed by the detection frame in frame 61, where the event had occurred. It is therefore inferred that a large obstruction appeared in front of the person, the "correction condition" is judged to be satisfied, and the state transitions to "neutral". If, in this "neutral" state, the "detection event continuation condition" (second condition) is satisfied within the correction frames (3 frames in FIGS. 10 and 11), the state transitions to "event occurrence". In FIG. 10, the detection target object is detected with a confidence of 80% at frame 65, so the "detection event continuation condition" is satisfied and the state transitions to "event occurrence".

In FIG. 11, the processing of frames 71 to 74 is the same as in FIG. 10, and at frame 74 the "correction condition" (third condition) is satisfied and the state transitions to "neutral". However, in the two frames after frame 74, the detection target object is not detected and the confidence remains below 60%, so the "detection event continuation condition" is not satisfied within the correction frames (3 frames in FIGS. 10 and 11) in the "neutral" state. Therefore, at frame 75 the state transitions to "no event".
Examples of detecting occlusion of the detection target object other than those in FIGS. 7 to 11 are also possible. The frame image that no longer satisfies the "detection event continuation condition" (second condition) and the frame image in which the detection target object was last detected may be input together, and deep learning may be used to determine whether the detection target object is occluded; this applies, for example, when a car passes in front of the person who is the detection target object. If it is determined that the detection target object is occluded, the "correction condition" is regarded as satisfied and the same processing as in FIGS. 7 to 11 can be performed. In addition, it may be determined that the target object is occluded when the detection frame for the target object gradually becomes smaller. When an obstruction is present, the detection frame is expected to shrink gradually; if the detection frame then becomes smaller than a predetermined amount, it is determined that the detection target object is occluded, the "correction condition" is regarded as satisfied, and the same processing as in FIGS. 7 to 11 can be performed.
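The detection-frame shrinkage judgment mentioned above can be sketched as follows; the shrink ratio and the number of frames examined are parameters assumed for illustration.

def occluded_by_shrinkage(box_areas, shrink_ratio=0.5, run=3):
    # box_areas: areas of the detection frame over recent frames, oldest first.
    if len(box_areas) < run + 1:
        return False
    recent = box_areas[-(run + 1):]
    shrinking = all(a > b for a, b in zip(recent, recent[1:]))
    # Treat the target as occluded if the frame has kept shrinking and has
    # become smaller than the assumed ratio of its earlier size.
    return shrinking and recent[-1] <= shrink_ratio * recent[0]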
(Description of state transitions)
FIG. 12 is a diagram explaining an example of the state transitions of the image analysis apparatus of the present invention.
Arrow 1 indicates that "event occurrence" continues. The applicable condition here is the "detection event continuation condition" (second condition).

Arrow 2 indicates a transition from "event occurrence" to "neutral". This applies when the "detection event continuation condition" (second condition) is not satisfied but the "correction condition" (third condition) is satisfied.

Arrow 3 indicates a transition from "neutral" to "event occurrence". This applies when the "detection event continuation condition" (second condition) is satisfied within the correction frames.

Arrow 4 indicates a transition from "neutral" to "no event". This applies when the "detection event continuation condition" (second condition) is not satisfied within the correction frames.

Arrow 5 indicates a transition from "event occurrence" to "no event". This applies when neither the "detection event continuation condition" (second condition) nor the "correction condition" (third condition) is satisfied. In this case, the state transitions from "event occurrence" to "no event" without passing through "neutral".

Arrow 6 indicates a transition from "no event" to "event occurrence". This applies when the "detection event occurrence condition" (first condition) is satisfied.

Arrow 7 indicates that "no event" continues. This applies when the "detection event occurrence condition" (first condition) is not satisfied.
Here, the "detection event continuation condition" (second condition) is a looser condition that is easier to satisfy than the "detection event occurrence condition" (first condition). For example, if the "detection event occurrence condition" is that M out of N consecutive frames have a detection target object confidence of X% or higher at the time of object detection (N and M are integers, N ≥ M), the "detection event continuation condition" may be that P out of N consecutive frames have a detection target object confidence of Y% or higher (P is an integer, M > P). The two confidence thresholds may be the same or may differ; in the latter case, X ≥ Y is preferable. Lowering the required rate of frames at or above a given confidence in this way relaxes the condition.

The condition for the transition from "neutral" to "event occurrence" indicated by arrow 3 is not limited to the above; any condition that is easier to satisfy than the "detection event occurrence condition" (first condition) may be used. A condition looser than the "detection event occurrence condition" can therefore be set independently of the "detection event continuation condition". For example, if the "detection event occurrence condition" is that M out of N consecutive frames have an object detection confidence of X% or higher (N and M are integers, N ≥ M), the condition for the transition from "neutral" to "event occurrence" may be that Q out of N consecutive frames have an object detection confidence of Z% or higher (Q is an integer, M > Q). In this case, the condition of arrow 4 applies when this condition is not satisfied. The two confidence thresholds may be the same or may differ; in the latter case, X ≥ Z is preferable. Lowering the required rate of frames at or above a given confidence in this way relaxes the condition.
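The generalized conditions above, parameterized by N, M, P, Q and the thresholds X, Y, Z, can be written as the following sketch; the default values correspond to the examples in this description.

def frames_meeting(confidences, n, threshold):
    # Count how many of the last n confidences reach the threshold.
    window = confidences[-n:]
    return sum(c >= threshold for c in window), len(window)

def occurrence_condition(confidences, n=3, m=3, x=0.60):
    # First condition: M of N consecutive frames at confidence X or higher.
    hits, seen = frames_meeting(confidences, n, x)
    return seen == n and hits >= m

def continuation_condition(confidences, n=3, p=1, y=0.60):
    # Second condition (looser): P of N consecutive frames at confidence Y or higher.
    hits, _ = frames_meeting(confidences, n, y)
    return hits >= p

def neutral_to_event_condition(confidences, n=3, q=1, z=0.60):
    # Arrow 3: any condition easier to satisfy than the occurrence condition.
    hits, _ = frames_meeting(confidences, n, z)
    return hits >= q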
(Comparative example)
FIG. 13 is an example comparing the results of the image analysis apparatus of the present invention with the case where no correction condition is used. A comparison between the results of the image analysis apparatus of the present invention (the method with the correction condition) and the method without the correction condition is described using the table in FIG. 13. As a premise, the detection target object is present within the detection range on the screen from frame no. 1 to frame no. 15, and the detection target object at frame no. 5 satisfies the correction condition. In the figure, "○" indicates the "event occurrence" state, "×" indicates the "no event" state, and "△" indicates the "neutral" state.
Without the correction condition, frames no. 1 and no. 2 remain in the "no event" state until the first condition, the "detection event occurrence condition", is satisfied. Thereafter, up to no. 7, the "detection event continuation condition" is satisfied and the state is "event occurrence". From frame no. 8 to frame no. 13, the second condition, the "detection event continuation condition", is no longer satisfied. Therefore, from frame no. 8 to frame no. 13 the state transitions to "no event" and a missed detection occurs. At no. 14 and no. 15, the "detection event occurrence condition" is satisfied and the state becomes "event occurrence".

In contrast, with the image analysis apparatus of the present invention (the method with the correction condition), the processing up to no. 7 is the same as the method without the correction condition. At frame no. 8, which satisfies the third condition, the "correction condition", the state transitions to "neutral". In this state, at frame no. 9, where an object with an object detection confidence of 60% or higher is present, the "detection event continuation condition" (second condition) is satisfied and the state transitions to "event occurrence". This state continues up to no. 15. The missed detections at no. 9 to no. 13 are thereby prevented.
(Flowchart)
FIG. 14 shows an example of a processing flowchart of the image analysis apparatus of the present invention.
First, the processor unit 201 of the analysis server 101 executes the program loaded from the auxiliary storage unit 203 into the main storage unit 202 to start the image analysis system. The image analysis system may allow the user to check the results through a GUI (Graphical User Interface), or may allow only the "event occurrence", "no event", and "neutral" states to be checked.

The first condition ("detection event occurrence condition"), the second condition ("detection event continuation condition"), the third condition ("correction condition"), the object detection threshold, the correction range, and so on may be set by reading a prepared settings file, or the user may be allowed to select them through the GUI. It is desirable that the "detection event continuation condition" (second condition) be easier to satisfy than the "detection event occurrence condition" (first condition). The "correction condition" (third condition) may be a single condition, or a plurality of conditions may be combined for the determination.
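As one example, the settings read at start-up might take a form such as the following; the keys and values are illustrative assumptions corresponding to the condition examples used in this description.

SETTINGS = {
    "occurrence":   {"window": 3, "required": 3, "confidence": 0.60},    # first condition
    "continuation": {"window": 3, "required": 1, "confidence": 0.60},    # second condition
    "correction":   {"method": "tracking", "max_correction_frames": 3},  # third condition
    "detection_threshold": 0.60,
}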
After the system is started, in step 1001 the user selects the object to be detected and the camera 102 to be used for detection. The types of objects to be detected and the number of cameras may each be one or more. In the following, an example with one detection target object and one camera is described.

In step 1002, the image acquired from the camera 102 is read. The frame rate, image size, and the like of the camera 102 may be set in advance on the camera 102 side, or the user may be allowed to select them arbitrarily through the GUI.

In step 1003, it is determined whether an end command has been executed. If the end command has been executed, the image analysis system is terminated; otherwise, the process proceeds to step 1004. The end command may be a keyboard operation or an operation on the GUI.

In step 1004, AI inference processing is performed using the read image and its result is output. The AI inference processing described above can be used here.
In step 1005, the current state of the system is determined and the subsequent processing is decided. If the system state is "no event", the process proceeds to step 1006. If the state is "event occurrence" or "neutral", the process proceeds to step 1007.

In step 1006, it is determined whether the AI inference result satisfies the "detection event occurrence condition" (first condition). If it does, the process proceeds to step 1007; if not, the process returns to step 1002 and the image of the next frame is read.

In step 1007, it is determined whether the AI inference result satisfies the "detection event continuation condition" (second condition). If it does, the process proceeds to step 1008; if not, the process proceeds to step 1010.

In step 1008, the state of the image analysis system transitions to "event occurrence".

In step 1009, the user is notified of the occurrence of the event. The notification may be made on the GUI, or the event occurrence notification may be delivered to a small terminal, for example via the network 104. When the event occurrence notification is complete, the process returns to step 1002 and the image of the next frame is read.
In step 1010, the current state of the image analysis system is determined and the subsequent processing is decided. If the state is "event occurrence", the process proceeds to step 1011; if the state is "neutral", the process proceeds to step 1012.

In step 1011, it is determined whether the "correction condition" (third condition) is satisfied, using a background subtraction technique, a tracking technique, or the like. If the "correction condition" is satisfied, the process proceeds to step 1013; if not, the process proceeds to step 1015.

In step 1012, it is determined whether the "correction condition" is satisfied. If it is, the process proceeds to step 1013; if not, the process proceeds to step 1015.

In step 1013, the state of the image analysis system transitions to "neutral".

In step 1014, a warning notification is issued to inform the user of the "neutral" state. The warning notification may be made on the GUI, or the notification may be delivered to a small terminal, for example via the network 104. When the warning notification is complete, the process returns to step 1002 and the image of the next frame is read.

In step 1015, the state of the image analysis system transitions to "no event", and the process returns to step 1002 to read the image of the next frame.

The above description of the overall processing is an example of a specific operation, and the present invention is not limited to it.
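The overall flow of FIG. 14 can be condensed into the following sketch, with the AI inference and the three condition checks passed in as functions; the bookkeeping of the correction frames while in the "neutral" state is an assumption made for illustration.

def run(frames, infer, occurrence_ok, continuation_ok, correction_ok,
        max_correction_frames=3):
    state, corr_left, history = "no_event", 0, []
    for frame in frames:                                   # steps 1002-1003
        history.append(infer(frame))                       # step 1004: AI inference
        if state == "no_event" and not occurrence_ok(history):
            yield state                                    # steps 1005-1006: stay in "no event"
            continue
        if continuation_ok(history):                       # step 1007
            state, corr_left = "event", 0                  # steps 1008-1009: notify occurrence
        elif state == "event":                             # steps 1010-1011
            if correction_ok(history):
                state, corr_left = "neutral", max_correction_frames   # steps 1013-1014
            else:
                state = "no_event"                         # step 1015
        elif state == "neutral":                           # step 1012
            corr_left -= 1
            if not correction_ok(history) or corr_left <= 0:
                state = "no_event"                         # step 1015
        else:
            state = "no_event"
        yield state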
(Effects)
As described above, the above embodiment provides a "neutral" state that transitions immediately to the "event occurrence" state when a condition looser than the "detection event occurrence condition" (for example, the "detection event continuation condition") is satisfied. By correcting the AI inference results in this way, object detection processing can be performed more accurately for a detection target object that is occluded. At the same time, it is possible to prevent false alarms in which the system remains in the "event occurrence" state even though the detection target object has left the detection range. Both missed detections and false alarms can thus be prevented, and image analysis processing with higher accuracy becomes possible.
Furthermore, by providing the "correction condition" as the condition for transitioning to "neutral", it becomes possible to estimate more accurately that an occluded detection target object is present. In other words, the detection target object can be prevented from being mistakenly recognized as having left the screen. The correction condition can be made more suitable by setting a correction range using a tracking technique or by using a background subtraction technique.

Situations to which the present invention is expected to apply include operation on video or live footage in which an obstruction partially or completely hides the detection target object. Furthermore, by treating changes in the brightness or hue of the detection target object as occlusion caused by the shooting environment, the invention can also be applied to environments, such as outdoors, where brightness and hue tend to change.
Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments and includes various modifications. For example, the above embodiments have been described in detail in order to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to configurations having all of the described elements. It is also possible to add, delete, or replace part of the configuration of each embodiment with another configuration.

For example, the above embodiments describe the use of inference processing by AI. However, tracking-based image analysis methods other than AI inference processing can also be applied to the above embodiments; in that case, the detection target object detected by such image analysis may be handled in the same way.

Although the analysis server 101 and the camera 102 have been described as separate components, they may be configured as a single unit. That is, the functions of the analysis server 101 may be provided on the camera 102 side without going through the network 104, forming one image analysis apparatus.
101…analysis server, 102…camera, 103…database server, 104…network, 201…processor unit, 202…main storage unit, 203…auxiliary storage unit, 204…input/output interface unit, 205…display interface unit, 206…network interface unit, 207…bus, 208…input device, 209…display output device, 301…image acquisition unit, 302…correction data acquisition unit, 303…image processing unit, 304…storage control unit, 305…display control unit, 401…AI inference processing unit, 402…shielding detection processing unit, 403…inference result correction unit

Claims (7)

1. An image analysis apparatus comprising: an image acquisition unit that acquires a captured image; an image processing unit that outputs an analysis result for the input image acquired by the image acquisition unit; and a correction data acquisition unit that acquires data used for a correction condition,
     wherein the image processing unit discriminates among an event occurrence state in which it is determined, based on a detection condition for a detection target object, that the detection target object is present in the image, a no-event state in which it is determined, based on a non-detection condition for the detection target object, that the detection target object is not present in the image, and a neutral state in which it is determined, using the correction condition, that the detection target object is occluded, and
     wherein a condition for a transition from the neutral state to the event occurrence state is set looser than a condition for a transition from the no-event state to the event occurrence state.
2. The image analysis apparatus according to claim 1,
     wherein a condition for continuing the event occurrence state is set looser than the condition for transitioning from the no-event state to the event occurrence state, and
     wherein the state transitions from the event occurrence state to the neutral state when the event occurrence state can no longer be continued and the correction condition is satisfied.
3. The image analysis apparatus according to claim 2,
     wherein the correction condition is a condition that is satisfied when, in the frame of the image in which the detection target object was last detected, the position of that object is within a preset correction range.
4. The image analysis apparatus according to claim 2,
     wherein the correction condition is a condition that is satisfied when a difference region between a background image and a frame of the image that no longer satisfies the event occurrence state is larger than the detection frame of the detection target object in the frame in which the detection target object was last detected.
5. An image analysis system comprising a camera and the image analysis apparatus according to claim 1, which acquires images captured by the camera,
     wherein the camera and the image analysis apparatus are capable of communicating via a network.
6. An image analysis method for performing image analysis using a processing device, the method comprising:
     a step of acquiring a captured image;
     a step of acquiring data used for a correction condition; and
     a step of discriminating among an event occurrence state in which it is determined, based on a detection condition for a detection target object, that the detection target object is present in the image, a no-event state in which it is determined, based on a non-detection condition for the detection target object, that the detection target object is not present in the image, and a neutral state in which it is determined, using the correction condition, that the detection target object is occluded,
     wherein a condition for a transition from the neutral state to the event occurrence state is set looser than a condition for a transition from the no-event state to the event occurrence state.
7. The image analysis method according to claim 6,
     wherein a condition for continuing the event occurrence state is set looser than the condition for transitioning from the no-event state to the event occurrence state, and
     wherein the state transitions from the event occurrence state to the neutral state when the event occurrence state can no longer be continued and the correction condition is satisfied.
PCT/JP2021/011313 2021-03-19 2021-03-19 Image analysis device, image analysis system, and image analysis method WO2022195837A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2021/011313 WO2022195837A1 (en) 2021-03-19 2021-03-19 Image analysis device, image analysis system, and image analysis method
JP2023506655A JPWO2022195837A1 (en) 2021-03-19 2021-03-19

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/011313 WO2022195837A1 (en) 2021-03-19 2021-03-19 Image analysis device, image analysis system, and image analysis method

Publications (1)

Publication Number Publication Date
WO2022195837A1 true WO2022195837A1 (en) 2022-09-22

Family

ID=83320237

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/011313 WO2022195837A1 (en) 2021-03-19 2021-03-19 Image analysis device, image analysis system, and image analysis method

Country Status (2)

Country Link
JP (1) JPWO2022195837A1 (en)
WO (1) WO2022195837A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011243155A (en) * 2010-05-21 2011-12-01 Panasonic Corp Flow line creation device and flow line creation method
JP2016081095A (en) * 2014-10-10 2016-05-16 キヤノン株式会社 Subject tracking device, control method thereof, image-capturing device, display device, and program

Also Published As

Publication number Publication date
JPWO2022195837A1 (en) 2022-09-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21931584

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023506655

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21931584

Country of ref document: EP

Kind code of ref document: A1