WO2022059223A1 - Video analyzing system and video analyzing method - Google Patents

Video analyzing system and video analyzing method

Info

Publication number
WO2022059223A1
WO2022059223A1 (PCT/JP2021/005097)
Authority
WO
WIPO (PCT)
Prior art keywords
interaction
person
monitoring
video
area
Prior art date
Application number
PCT/JP2021/005097
Other languages
French (fr)
Japanese (ja)
Inventor
良起 伊藤
健一 森田
Original Assignee
株式会社日立製作所 (Hitachi, Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 (Hitachi, Ltd.)
Publication of WO2022059223A1 publication Critical patent/WO2022059223A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present invention relates to a video analysis system and a video analysis method that detect a person's state or an object from an image of a surveillance area and detect a monitoring target based on the detection result.
  • In the image monitoring apparatus described in Patent Document 1, a delivery act in the monitoring area is detected by calculating information on the posture of the human body from images of the monitoring area. The monitoring importance is then calculated using the occurrence position and the type of the delivered article, and the detection result of the delivery act is output according to the monitoring importance. As the output of the detection result, a method of notifying the monitor of the monitoring center by a screen display, an alarm lamp, an alarm sound, or the like is described.
  • In the image processing apparatus of Patent Document 2, for the purpose of reducing the burden of monitoring video and making it easier to capture the moment something occurs, a plurality of videos distributed from a plurality of video distribution devices are handled by an apparatus having: acquisition means for acquiring the videos, object position detection means for detecting the positions of the objects included in each video, direction detection means for detecting the orientation of each object at those positions, setting means for setting a priority for each video based on the orientations, and display means for displaying the videos based on the priority.
  • The present invention therefore sets a monitoring importance for each of the plurality of persons who have interacted in the monitoring area and accurately narrows down the persons of high monitoring importance; its object is to provide a video analysis system capable of reducing both the work load of the responders in monitoring and the processing load of the system.
  • A video analysis system as one aspect of the present invention detects an event in a surveillance area using video captured of the surveillance area. It comprises: an interaction detection unit that detects, based on the video, an interaction, that is, an event caused by the involvement of a plurality of persons, and outputs the type of the interaction and the direction of the interaction indicating how each of the plurality of persons was involved with the other persons; a monitoring importance determination unit that compares the type and direction of the interaction with preset monitoring reference information to determine a monitoring importance for each of the persons involved in the interaction; and an output control unit that outputs the detection result of the event based on the monitoring importance.
  • Similarly, the video analysis method of the present invention includes: an interaction detection step of detecting, based on the video, an interaction caused by the involvement of a plurality of persons and outputting the type of the interaction and the direction of the interaction indicating how each person was involved with the other persons; a monitoring importance determination step of comparing the type and direction of the interaction with preset monitoring reference information to determine a monitoring importance for each of the persons involved; and an output control step of outputting the detection result of the event based on the monitoring importance.
  • According to the present invention, it is possible to set a monitoring importance for each of the plurality of persons who have interacted in the monitoring area, and to reduce both the work load of the responders in monitoring and the processing load of the system.
  • the "event” in the present embodiment is a situation preset as a detection target in a certain monitoring area.
  • In particular, an interaction, which is an event involving a plurality of persons, is targeted for detection.
  • Actions such as shaking hands, handing over luggage, scuffling, and assault are included in the interactions.
  • FIG. 1 is an explanatory diagram of the video surveillance system according to the present embodiment.
  • the video monitoring system 1 is roughly classified into a shooting system 2, a video analysis system 3, and a monitoring center system 4.
  • the photographing system 2 is composed of a camera unit installed in a monitored area.
  • In the video analysis system 3, the interactions between persons to be detected and the attributes of each person are determined by analyzing the input video from the imaging device; the monitoring importance of each person is then determined by comparing information on the occurrence position and the direction of the interaction with preset monitoring reference information.
  • the monitoring center system 4 receives the analysis result from the video analysis system 3, effectively displays it to the observer and the on-site staff, and performs a search after an event related to an interaction or a person occurs.
  • the direction of interaction indicates from which person to which person a predetermined action related to the interaction is performed.
  • For a delivery, the direction of interaction is set from the person who handed over the article (the executor of the delivery) to the person who received it (the executed person of the delivery).
  • Similarly, for an assault, the direction of interaction is set from the perpetrator (the executor of the assault) to the victim (the executed person of the assault).
  • the direction of interaction is determined for each type of interaction.
  • There are also interactions that take place in both directions, such as shaking hands and scuffling.
  • the attributes of a person include whether they are ordinary people or security guards, age, and gender.
  • The video monitoring system 1 sets the monitoring importance for each person by using the direction of the interaction and the attributes of the persons involved in it, thereby reducing the work load of the responders in monitoring and the processing load of the system.
  • In the conventional techniques, however, the relative weighting of the monitoring importance between the persons who delivered and received an article is not taken into consideration.
  • If the article is one requiring attention in the monitoring area, it is desirable that the monitoring importance of the person who received the article be set higher than that of the person who delivered it.
  • Similarly, the monitoring importance should be judged differently for a transfer from a low-security area into a high-security area than for a transfer in the opposite direction.
  • The video monitoring system 1 realizes accurate determination of monitoring importance by using not only the type of interaction and its occurrence position but also the direction of the action and the attributes of the persons.
  • FIG. 2 is a diagram showing the overall configuration of the video surveillance system according to the present embodiment.
  • the photographing system 2 is composed of one or a plurality of camera units 21 installed in the monitoring target area, and the captured images are sequentially input to the image input unit 31 of the image analysis system 3.
  • the camera unit 21 is a surveillance camera arranged so that the entire area to be monitored can be imaged. If the area setting for detecting the interaction is not required, the surveillance camera may be a mobile camera that is not fixed, and the format does not matter as long as the area to be monitored can be imaged.
  • the camera unit 21 and the video input unit 31 are connected by a wired communication means or a wireless communication means, and a frame image is continuously transmitted from the camera unit 21 to the video input unit 31.
  • Since the interaction recognition uses a time-series analysis model that assumes a plurality of frame images as input, it is desirable that the frame rate of the continuous transmission be equal to or higher than the value required for the interaction recognition.
  • When the frame rate is lower than the required value, processing to suppress the loss of accuracy, such as interpolation or extrapolation of the time-series data, may be performed.
  • The camera units 21 and the video analysis system 3 need not correspond one to one; a single video analysis system may process the video of a plurality of camera units. Even when such multiple processes are executed, the frame rate required by each process is subject to the restrictions described above.
  • the camera unit 21 may be equipped with a part or all of the functions of the video analysis system described later.
  • the video analysis system 3 is composed of a video input unit 31, a video processing unit 32, and a storage unit 33.
  • the video input unit 31 receives video input from the camera unit 21 and transmits video data to the video processing unit 32.
  • the video to be analyzed may not be a video directly input from the camera unit 21, but may be a video in a separately stored recorder.
  • the storage location of the video does not matter.
  • The video processing unit 32 reads the monitoring reference information stored in the storage unit 33, which will be described later, and analyzes the video input from the video input unit 31 to determine the monitoring importance of each individual who has performed an interaction.
  • the storage unit 33 stores the monitoring reference information set in the management control unit 43, which will be described later.
  • the monitoring reference information is used for determining the monitoring importance, which is the output of the video processing unit 32.
  • The video analysis system 3 is not limited to an on-premises system constructed on a server in the operating facility; it may also be constructed on a server outside the facility, for example by using a cloud service.
  • the monitoring center system 4 is composed of a recording unit 41, a video display unit 42, a management control unit 43, and a search unit 44.
  • the recording unit 41 has a function of holding information such as the generated interaction, the direction of the interaction, the person attribute, the generated area, and the generated time obtained by the video analysis by the video analysis system 3 as a database.
  • the video display unit 42 displays information about the behavior of the person who has performed the interaction at the current time and a part or all of the frame at the time of detecting the interaction according to the monitoring importance.
  • the management control unit 43 has a function of inputting setting information to the storage unit 33 by a watchman, a field staff, or the like in order to store the monitoring reference information used by the video processing unit 32.
  • The search unit 44 has a function of searching the information stored in the recording unit 41 for the corresponding person, using the person's attribute and interaction type as a query, and a function of checking information such as the person's position at the current time and the movement trajectory within the facility up to that point.
  • FIG. 12 is a hardware configuration diagram of the video surveillance system according to the present embodiment.
  • the camera unit 1102 is connected to the computer 1103 via a network. Further, the computer 1103 can communicate with the computer 1104 via the network.
  • the computer 1103 includes a CPU (Central Processing Unit) as an arithmetic control device, a RAM (Random access memory) as a main storage device, and an HDD (hard disk drive) as an auxiliary storage device.
  • the computer 1103 realizes the function as the video analysis system 3 by reading various programs from the HDD, developing them in the RAM, and executing them by the CPU. Further, the computer 1103 communicates with the camera unit 1102 and the computer 1104 via a predetermined communication interface (IF).
  • input / output devices such as a keyboard and a display are also connected to the computer 1103 via a predetermined IF.
  • The computer 1104 includes a CPU as an arithmetic control device, a RAM as a main storage device, and an HDD as an auxiliary storage device; it realizes the functions of the monitoring center system 4 by reading various programs from the HDD, loading them into the RAM, and executing them on the CPU. Further, the computer 1104 is connected to the computer 1103 and to input/output devices such as a keyboard and a display via a predetermined interface (IF).
  • FIG. 3 is a diagram showing a block diagram of the video analysis system according to the present embodiment.
  • the video input unit 31, the video processing unit 32, and the storage unit 33 that constitute the video analysis system 3 will be described.
  • The video input unit 31 sequentially receives video from one or more camera units 21 and outputs it to the subsequent video processing unit 32. The input may also be still images.
  • the video processing unit 32 includes a calculation unit 321, an interaction detection unit 322, a monitoring importance determination unit 323, and an output control unit 324.
  • the calculation unit 321 is further composed of a person detection unit 3211 and an attribute determination unit 3212.
  • the person detection unit 3211 detects a person from the still image of the current frame using the image or video received from the video input unit.
  • For the detection, a means using Haar-like features, a means using R-CNN (Regions with CNN features), or a means that estimates a skeleton coordinate group for each person using a skeleton estimation technique may be used.
  • The person detection unit 3211 performs person tracking after the persons are detected. In person tracking, the rectangular image of a person and the person ID assigned to that person are associated between preceding and following frames, and a general person tracking method such as template matching or optical flow may be used. A minimal sketch of this detection-and-tracking step is given below.
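  • The following Python sketch is only an illustration of the detection-and-tracking flow described above, not the patent's actual implementation: it assumes a generic external detector (e.g. an R-CNN style model, here the hypothetical detect_persons) that returns person rectangles per frame, and associates rectangles between frames by IoU overlap as a stand-in for template matching or optical flow.

    # Hypothetical per-frame person detection and simple ID tracking.
    # `detect_persons(frame)` is an assumed external detector returning a
    # list of (x, y, w, h) person rectangles for one frame.

    def iou(a, b):
        """Intersection-over-union of two (x, y, w, h) rectangles."""
        ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
        bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
        iw = max(0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    class SimpleTracker:
        """Assigns a persistent person ID by matching boxes between frames."""
        def __init__(self, iou_threshold=0.3):
            self.iou_threshold = iou_threshold
            self.tracks = {}          # person_id -> last known box
            self.next_id = 0

        def update(self, boxes):
            assigned = {}             # box (as tuple) -> person ID
            for box in boxes:
                best_id, best_iou = None, self.iou_threshold
                for pid, prev in self.tracks.items():
                    if pid in assigned.values():
                        continue      # each track matched at most once
                    score = iou(box, prev)
                    if score > best_iou:
                        best_id, best_iou = pid, score
                if best_id is None:   # no match: a new person enters the scene
                    best_id = self.next_id
                    self.next_id += 1
                assigned[tuple(box)] = best_id
                self.tracks[best_id] = box
            return assigned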
  • the rectangular image of the person obtained by the person detection unit is input to the attribute determination unit 3212, and the attribute of the person is determined.
  • The attributes of a person can be used as information contributing to the determination of each individual's monitoring importance. The attributes can also be used as the query in the search unit 44 described above.
  • Attributes include security guards and staff of the facility, general facility users, age, gender, and the like. If security guards and staff can be distinguished from general facility users, interactions caused by the guards and staff can be regarded as actions performed within the scope of their duties, so no monitoring importance is set for them and unnecessary alarms can be avoided.
  • For the attribute determination, the rectangular image of a person is converted into a feature such as HOG (Histograms of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), or a vector output from an intermediate layer of a trained deep learning network.
  • The determination means may be constructed in two stages, with a first-stage classifier that separates security guards and field staff from general visitors and a second-stage classifier that determines age and gender among general visitors, or the determination may be learned in a single classifier (a simplified two-stage sketch follows below). Furthermore, by separately using a means for determining the goods possessed by a person, an attribute expressing possession of prohibited or dangerous goods can be given to a person determined to possess them, which is effective for monitoring.
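  • A minimal sketch of the two-stage attribute determination just described, under the assumption that a feature vector (HOG, SIFT, or a deep-network embedding) has already been extracted from the person rectangle; both classifier objects are assumed, pre-trained models with a scikit-learn style predict method, not components defined by the patent.

    # Hypothetical two-stage attribute determination.
    def determine_attribute(feature_vector, staff_classifier, demographic_classifier):
        # Stage 1: separate security guards / facility staff from general visitors.
        role = staff_classifier.predict([feature_vector])[0]
        if role in ("security_guard", "staff"):
            return {"role": role}        # duties-related; no further scoring needed
        # Stage 2: estimate age group and gender for general visitors only.
        age_group, gender = demographic_classifier.predict([feature_vector])[0]
        return {"role": "general", "age_group": age_group, "gender": gender}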
  • the interaction detection unit 322 determines the presence / absence, type, and direction of the interaction using the information obtained from the person detection unit 3211.
  • As the determination method, there are means using image feature amounts and means using the skeleton information of an arbitrary pair of persons, as described above.
  • As features, a feature amount representing a person's posture calculated from the skeleton estimation result, a feature amount calculated from the relative distance between arbitrary skeleton points of a person pair, or a time-series feature amount expressing the amount of skeleton movement per unit time or the change in relative distance between adjacent image frames may be used.
  • The attribute information obtained from the attribute determination unit 3212 may also be used as a feature; for example, a feature amount expressing age or gender may be used.
  • Alternatively, a feature amount expressing an article possessed by a person may be used. For example, if an article judged to be owned by one person is judged to be owned by another person after a certain time, it can be interpreted that a delivery act occurred at the moment the ownership judgment switched.
  • These features may be used not only alone but also in combination. For example, if the interaction is determined using only posture features, interactions may be erroneously detected even between people located far apart; by also using the relative-distance features, the number of erroneous detections can be reduced. In this way, effective interaction detection can be performed by using the feature amounts alone or in combination.
  • As the determination means, a method based on a CNN can be used. Alternatively, a discriminator such as an SVM, a decision tree, or an LSTM (Long Short-Term Memory) may be learned on features representing posture, relative distance, attributes, and the like; a feature-construction sketch under these assumptions follows.
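  • The sketch below illustrates, purely as an assumption-labelled example, how posture, distance, and motion features for one person pair might be assembled before being fed to such a discriminator; it assumes an external pose estimator supplies 2D skeleton keypoints per person per frame as {joint_name: (x, y)} dictionaries, and the joint names and the delivery_classifier object are hypothetical.

    # Hypothetical feature construction for one person pair.
    import math

    def joint_distance(skel_a, skel_b, joint="right_wrist"):
        """Relative distance between the same joint of two people (pixels)."""
        ax, ay = skel_a[joint]
        bx, by = skel_b[joint]
        return math.hypot(ax - bx, ay - by)

    def movement(skel_prev, skel_curr, joint="right_wrist"):
        """Per-frame movement of one joint: a simple motion feature."""
        px, py = skel_prev[joint]
        cx, cy = skel_curr[joint]
        return math.hypot(cx - px, cy - py)

    def pair_features(prev_a, curr_a, prev_b, curr_b):
        """Concatenate distance and motion features for one pair of people."""
        return [
            joint_distance(curr_a, curr_b, "right_wrist"),
            joint_distance(curr_a, curr_b, "left_wrist"),
            movement(prev_a, curr_a, "right_wrist"),
            movement(prev_b, curr_b, "right_wrist"),
        ]

    # A pre-trained two-class classifier per interaction type (e.g. an SVM)
    # would consume these features; `delivery_classifier` below is an assumed
    # model object, not one provided by the patent.
    # label = delivery_classifier.predict([pair_features(pa0, pa1, pb0, pb1)])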
  • In the present embodiment, the monitoring importance determination unit 323 receives the attributes of the persons obtained from the attribute determination unit 3212, the type and direction of the interaction obtained from the interaction detection unit 322, and the position information of the persons obtained from the person detection unit 3211, and sets the monitoring importance for each individual who performed the interaction by collating this information with the monitoring reference information 331.
  • The area information can be determined by collating preset area definitions with the person rectangle. For example, if the areas are defined on the image coordinates of a camera with a fixed PTZ setting, the area in which a person is located can be determined by which area contains the estimated position of the person's feet, as in the sketch below.
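  • A minimal sketch of that area lookup, assuming each monitored area is a preset polygon on the image coordinates of a fixed camera; the area names and polygons are illustrative, and the foot position is approximated as the bottom centre of the person rectangle.

    # Hypothetical area lookup from a person rectangle.
    def foot_point(box):
        x, y, w, h = box
        return (x + w / 2.0, y + h)           # bottom centre of the rectangle

    def point_in_polygon(pt, polygon):
        """Ray-casting point-in-polygon test on image coordinates."""
        x, y = pt
        inside = False
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside

    def area_of_person(box, areas):
        """`areas` maps an area name to its preset polygon for this camera."""
        pt = foot_point(box)
        for name, polygon in areas.items():
            if point_in_polygon(pt, polygon):
                return name
        return "unknown"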
  • the output control unit 324 transmits the monitoring importance of each individual determined by the monitoring importance determination unit 323 to the monitoring center system 4. All events for which monitoring importance has been calculated may be transmitted, or thresholds may be preset so that only those with high monitoring importance are transmitted.
  • the storage unit 33 stores the monitoring reference information 331 for use in the monitoring importance determination unit 323.
  • The monitoring reference information 331 has three types of security level setting information: an interaction security level set for each interaction type, an attribute security level set for each attribute type, and an area security level set for each area type. It further has weight information for each of those security level settings and weight information for the performer or the executed person, set for each interaction type.
  • the monitoring reference information 331 can be set from the management control unit 43.
  • When video is input from the photographing system to the video analysis system in step S1, person detection is performed in step S2. Next, the number of persons is counted in step S3: if two or more persons are detected in the screen, the process proceeds to step S4; if one person or fewer is detected, the processing from step S4 onward is not performed, and the process waits for the input of the next frame and returns to step S1.
  • A mask process may be partially applied to the monitored area immediately before step S2 in order to reduce the amount of calculation.
  • In step S4, the interaction is determined.
  • The determination is performed on every pair of persons on the screen, but to reduce the amount of calculation it is preferable not to perform the determination for pairs separated by more than a certain distance, set per interaction type.
  • To evaluate the relative distance between persons before determining the action, the relative distance must be calculated in the world coordinate system. The position of each person in the world coordinate system is therefore estimated either from preset area information or, without preset calibration, by a depth estimation technique using a stereo camera or a monocular camera, and the relative distance between the persons is then calculated.
  • For the distance threshold, a separate setting table is prepared and the judgment threshold is set collectively. For example, when the threshold is set to 3 m, the action determination is not performed for person pairs whose relative distance in the world coordinate system exceeds 3 m. If the discriminator is not a multi-class classifier covering various interaction types but a two-class classifier learned per interaction type, the threshold can be set for each interaction type (see the pruning sketch below). In addition, when detecting an interaction that crosses an area boundary, the interaction determination can be limited to persons located in different areas and skipped for persons in the same area, which both reduces the amount of calculation and is suitable for reducing false detections.
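  • The pruning step can be sketched as follows, with the per-interaction thresholds and the assumption that world-coordinate positions (from preset calibration or depth estimation) are already available both being illustrative.

    # Hypothetical pruning of person pairs before interaction classification.
    from itertools import combinations
    import math

    DISTANCE_THRESHOLDS_M = {"delivery": 3.0, "assault": 3.0, "handshake": 1.5}

    def candidate_pairs(world_positions, interaction_type):
        """world_positions: {person_id: (x_m, y_m)} on the floor plane.
        Returns only the pairs close enough to be worth classifying."""
        limit = DISTANCE_THRESHOLDS_M.get(interaction_type, 3.0)
        pairs = []
        for (id_a, pa), (id_b, pb) in combinations(world_positions.items(), 2):
            if math.dist(pa, pb) <= limit:
                pairs.append((id_a, id_b))
        return pairs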
  • If no interaction is detected (step S5), the processing from step S6 onward is not performed; the process waits for the input of the next frame and returns to step S1.
  • In steps S6 to S8, the attributes are calculated for each person who caused the event.
  • In steps S9 to S11, the monitoring importance is determined for each detected event. After the output of the determined monitoring importance is controlled in step S12, the process waits for the input of the next frame and returns to step S1.
  • the flow processing shown in this figure does not necessarily have to be processed by a single process, and may be processed asynchronously using a plurality of processes in order to improve calculation efficiency.
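  • As a compact illustration of this per-frame flow (steps S1 to S12), the following sketch is a hypothetical condensation rather than the patent's implementation: the detector, tracker, interaction model, attribute model, and importance function are all assumed placeholder objects passed in by the caller.

    # Hypothetical one-frame pass through steps S1-S12.
    def process_frame(frame, detector, tracker, interaction_model,
                      attribute_model, importance_fn, threshold):
        boxes = detector(frame)                      # S2: person detection
        ids = tracker.update(boxes)                  # person tracking
        if len(ids) < 2:                             # S3: fewer than two people
            return []                                # wait for the next frame
        events = interaction_model(frame, ids)       # S4-S5: interaction judgment
        results = []
        for event in events:
            for person_id, is_performer in event["participants"]:
                attr = attribute_model(frame, person_id)           # S6-S8
                score = importance_fn(event, attr, is_performer)   # S9-S11
                if score >= threshold:                             # S12: output control
                    results.append({"person": person_id,
                                    "event": event,
                                    "importance": score})
        return results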
  • The monitoring reference information in this embodiment is composed of the security level setting information in Table 51, the weights for the security level setting targets in Table 52, and the performer weights for each interaction type in Table 53. From Tables 51 and 52, weighted points are calculated for each individual in consideration of the interaction type, the person attribute, and the occurrence area; the monitoring importance is then calculated using the weighted points and the performer weight in Table 53. These pieces of information are set by the management control unit 43, stored in the monitoring reference information 331, and read by the monitoring importance determination unit 323. The settings and effects of each table are described in detail below.
  • Table 51 is composed of three types of security level setting tables: Table 511 for setting the security level for each interaction type, Table 512 for setting the security level for each attribute type, and Table 513 for setting the security level for each area type.
  • The security level is set in four stages from 3 points to 0 points, labeled "high level", "medium level", "low level", and "no level" in descending order of points. High level indicates an object requiring the greatest attention, and no level indicates an object requiring no attention.
  • In Table 511, the importance of each interaction type is set, for example 1 point for "delivery" and 3 points for "assault".
  • In each security level table, items that are not subject to setting are explicitly set as level 0. For example, "handshake" and "hug" are set as level 0.
  • Each item in Tables 511, 512, and 513 is scored in four stages from 3 points to 0 points, but the number of classes is not limited to this embodiment, and it is desirable that the setter can set it freely.
  • Table 52 is a setting table that stores weights for each of the three security level setting targets.
  • the sum of the weights of the three types is set to 100%, the interaction is set to 30%, the attribute is set to 20%, and the area is set to 50%.
  • the interaction can be set to 70%, the attribute can be set to 30%, and the area can be set to 0%.
  • Depending on the attribute, the weighted points of an individual may be set to 0 points. For example, if a person whose attribute is determined to be "security guard" engages in any of the interactions, the interaction is considered to have been performed within the scope of the job, and it is generally considered appropriate not to monitor the security guard.
  • The above formula is only an example of a formula for calculating the weighted points, and different formulas may be used; one possible formulation is sketched below.
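  • The sketch below shows one plausible, assumption-labelled way to compute the weighted points from Tables 51 and 52 (the patent states its own formula is only an example, so the arithmetic and all level values here are illustrative, not taken from the patent): each of the three security levels is blended by its Table 52 weight, and attributes such as "security guard" are scored as zero by policy.

    # Assumed weighted-point calculation; values are illustrative only.
    SECURITY_LEVELS = {                                   # Table 51 (illustrative)
        "interaction": {"delivery": 1, "assault": 3, "handshake": 0, "hug": 0},
        "attribute":   {"security_guard": 0, "staff": 0, "general_youth": 2},
        "area":        {"inside_entrance_gate": 3, "lobby": 1},
    }
    TARGET_WEIGHTS = {"interaction": 0.3, "attribute": 0.2, "area": 0.5}  # Table 52
    ZERO_SCORE_ATTRIBUTES = {"security_guard", "staff"}   # scored as 0 by policy

    def weighted_points(interaction, attribute, area):
        """Blend the three security levels by the Table 52 weights."""
        if attribute in ZERO_SCORE_ATTRIBUTES:
            return 0.0    # e.g. interactions by guards are within their duties
        return sum(
            SECURITY_LEVELS[target].get(key, 0) * TARGET_WEIGHTS[target]
            for target, key in (("interaction", interaction),
                                ("attribute", attribute),
                                ("area", area))
        )

    # Example under these assumed values:
    # weighted_points("delivery", "general_youth", "inside_entrance_gate")
    # = 1*0.3 + 2*0.2 + 3*0.5 = 2.2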
  • Table 53 is a setting table for calculating the monitoring importance in consideration of the direction of interaction with respect to the weighted points calculated from Tables 51 and 52.
  • The "delivery" act is set to have a weight of 20% for the performer, that is, the person who handed over the goods.
  • The weight for the executed person, that is, the person to whom the article is delivered, is set to 80%.
  • A setting in which the executor has less weight, as in this example, means that the executed person is regarded as more important than the executor. In the act of "delivery", it is assumed that the person who received the item is more important than the person who handed it over.
  • The "scuffling" act, which is a two-way interaction, is set to 50% in Table 53 so that the weights of the performer and the executed person are equivalent.
  • The weight of the performer is set to 90% for the "assault" act, in which the direction from the perpetrator (the performer) to the victim (the executed person) is clear.
  • As an example, the weighted points given to a "general/youth" person who performs the "delivery" act "inside the entrance gate" are calculated in this way.
  • When multiple events occur for the same person, it is desirable that the monitoring importance values calculated for those events be added together, or that the largest value continue to be adopted, until the monitoring importance for that person is reset.
  • a threshold value can be set for the monitoring importance for the purpose of suppressing the number of events transmitted to the monitoring center system 4. For example, if the threshold value is set to 2.0, the monitored person calculated to have a monitoring importance of 1.5 is not transmitted, and the individual calculated to be 3.0 is transmitted.
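  • Continuing the assumed scoring sketch above, the direction of the interaction can be reflected through the performer and executed-person weights of Table 53, per-person scores from multiple events kept until reset, and only persons at or above the preset threshold transmitted to the monitoring center system; again, the exact combination rule is an assumption, not the patent's formula.

    # Assumed direction weighting, aggregation, and output-control threshold.
    PERFORMER_WEIGHTS = {"delivery": 0.2, "scuffling": 0.5, "assault": 0.9}  # Table 53

    def monitoring_importance(points, interaction, is_performer):
        """Split the weighted points between performer and executed person."""
        w = PERFORMER_WEIGHTS.get(interaction, 0.5)
        return points * (w if is_performer else 1.0 - w)

    def aggregate(scores, mode="max"):
        """Combine one person's scores from multiple events, until reset."""
        if not scores:
            return 0.0
        return sum(scores) if mode == "sum" else max(scores)

    def to_transmit(person_scores, threshold=2.0):
        """person_scores: {person_id: [score, ...]}.
        Only persons whose aggregated importance reaches the threshold are
        sent to the monitoring center system (e.g. 3.0 is sent, 1.5 is not)."""
        return {pid: aggregate(s) for pid, s in person_scores.items()
                if aggregate(s) >= threshold}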
  • A person to be notified, such as field staff or a security guard, may be designated according to the monitoring importance score.
  • FIGS. 6 and 7 are views showing an example of a setting screen for monitoring reference information according to the present embodiment. Further, FIG. 6 is a setting screen for creating the table of Table 51, and FIG. 7 is a setting screen for creating the tables of Table 52 and Table 53.
  • FIG. 6 is a GUI for setting the interaction type, the attribute of the person, and the security level regarding the area, and in particular, the setting of the interaction security level shown in the area 611 will be described below.
  • The interaction security level in region 611 has three stages, from 1 point to 3 points, but as described above the number of stages is not limited to this embodiment and it is desirable that the setter can set it freely. It is also desirable that the magnitude of each security level can be set freely by the setter. For an interaction for which no security level is set, the importance may be treated as 0 points.
  • the setter can press the pull-down column shown in the area 6111 and select the interaction registered from the list for each security level column.
  • the area 62 in FIG. 7 is a GUI for setting the weight of the security level setting target, and Table 52 is set in the setting of this area.
  • Region 621 sets weights for interactions, attributes, and areas as a percentage.
  • Area 63 is a GUI for setting the performer weight for each interaction type, and Table 53 is set in the setting of this area.
  • the weight on the performer side is set as a percentage for each interaction type.
  • the weight for the person to be executed may be automatically calculated and displayed by inputting the "executor weight".
  • the input target may be the person to be executed.
  • The interactions required to be set in area 631 are all actions that were registered in area 611 and given a score larger than 0 points; the corresponding line is added automatically at the time of registration by pressing area 6115. By pressing the "Save settings" button in area 632, the information set in this area is reflected in the monitoring reference information 331.
  • the area 633 may be pressed.
  • If registered items are missing when the entire setting screen is finished, it is desirable to output a display calling attention to the deficiency.
  • FIG. 8 is a diagram showing a display example in the video display unit 42 in the present embodiment, and the area 7 shows an output screen.
  • the area 7 may be displayed on the entire notification screen or may be displayed on a part of the notification screen.
  • Each person displayed on screen A (area 71), screen B (area 72), screen C (area 73), and screen D (area 74) has a monitoring importance set and has been transmitted to the monitoring center system; an example of displaying the image at the current time for all of these persons is shown.
  • the video used in this display is a real-time video based on the tracking of the person who caused the event.
  • the person tracking is performed by the person detection unit 3211 as described above, and the video is transmitted to the video display unit 42 together with information on the monitoring importance.
  • the person displayed on each screen is shown in the order of monitoring importance, and the detected event, the place of occurrence, the time of occurrence, and the current position are shown.
  • the column displayed in the order of monitoring importance may display the actual value output by the monitoring importance determination unit 323.
  • The area sizes of screens A to D are dynamically changed according to the determined monitoring importance. For example, when the monitoring importance of all persons has been reset and no person is displayed on the screen, person a, who performs the "delivery" act at time "09:20:00", is displayed on the largest screen. Next, when it is determined that person b, who performed the "assault" act at time "09:30:50", has a higher monitoring importance than the person who performed the "delivery" act, the display area of person b is made larger than the display area of person a.
  • It is desirable that the person displayed on each screen be shown with image processing such as superimposing the person's detection frame or trimming the image, so that the person can easily be distinguished from other people captured on the same screen.
  • the screen may be scrolled when a plurality of events occur.
  • FIG. 9 is a diagram showing a display example of the image display unit in the present embodiment, and shows a state in which one event of the region 75 in FIG. 8 is selected.
  • In FIG. 9, screen B (region 76) is selected and displays the frame at the time of detection for the interaction performed by the tracked person. Since the interaction takes place over a certain period of time, it is desirable to display the frame with the highest determination accuracy between the start and end of the interaction. Alternatively, a short clip from the start to the end of the interaction may be played.
  • By this display, the observer can grasp under what circumstances the interaction was performed when the event occurred. Furthermore, when the staff has completed the response to the confirmed event, when it is determined that no response is necessary, or when a false detection is obvious, the corresponding line on the display screen or in area 75 can be selected and deleted.
  • this screen can be checked not only by the person to be notified, such as the monitor of the monitoring center, who is expected to use a large display, but also by the staff and security guards who respond at the site. Also, by using a smartphone terminal, a tablet terminal, AR goggles, or the like, a part or all of the area 7 can be confirmed at the site.
  • FIG. 10 is a diagram showing a display example of the search unit according to the present embodiment, and the area 9 shows the entire search screen.
  • the area 91 shows the search input screen
  • the area 92 shows the output screen.
  • the events are narrowed down by using the interaction type, the attribute, and the occurrence time as a query.
  • the pull-down column is pressed for each search item column in the area 911, and the registered item is selected from the list.
  • the selected item is added to the "Registered Items" list at the bottom by pressing the "Add” button.
  • the check box of the item desired to be deleted is pressed, and the area 912 is pressed.
  • Each column is not necessarily required and may be left blank. For example, if all items are left blank and the search is executed, all the information stored in the recording unit 41 is output as the search result (see the sketch below).
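  • The search behaviour can be sketched as below: each recorded event is treated as a dictionary held by the recording unit, the query fields are optional, and blank fields match everything. The field names are illustrative assumptions, not those defined by the patent.

    # Hypothetical search over recorded interaction events.
    def search_events(records, interaction=None, attribute=None,
                      time_from=None, time_to=None):
        results = []
        for rec in records:
            if interaction and rec["interaction"] != interaction:
                continue
            if attribute and attribute not in rec["attributes"]:
                continue
            if time_from and rec["time"] < time_from:
                continue
            if time_to and rec["time"] > time_to:
                continue
            results.append(rec)
        return results        # an empty query returns every stored record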
  • information such as the current location and the time of occurrence is displayed in the area 921 regarding the search result.
  • information about the current position may be displayed as long as the stay in the monitoring area is confirmed and traceable even at the current time.
  • a frame image corresponding to the information in the area 921, a short-time clip image from the start to the end of the interaction, or the like is displayed.
  • Area 921 and area 922 are collectively regarded as one search result, and in this embodiment the entire set of search results can be confirmed by scrolling.
  • the search results may be switched in a grid pattern consisting only of the images or videos shown in the area 922.
  • By using the search unit 44, even if an event has been deleted from the display output of the video display unit 42, the event can be efficiently searched for in the recording unit 41. In addition, since similar cases can be searched for and their number checked, this is useful for taking countermeasures and preventive measures against events expected to occur in the future.
  • the video surveillance system 1 detects an interaction from the surveillance video, and determines the monitoring importance for each individual using the type and direction of the interaction, as well as the attributes and area information of the person.
  • Since the monitoring importance is set for each individual instead of giving equivalent monitoring importance to all persons who interacted in the monitoring area, the observer can set priorities for the response, which makes it easier to deal with the actors efficiently.
  • FIG. 11 is a diagram showing a display example by the video display unit 42 in the present embodiment; it shows detailed information about each of two persons for whom a monitoring importance has been set with respect to one interaction. Region 81 shows the frame image in which the interaction was detected; the frame image shows a delivery between person 811 and person 812. Area 84 shows the attributes, the place of occurrence, and the current position of the two persons, as well as the direction of the delivery. As described in area 84, the information about the two persons is displayed separately on screen X (area 82) and screen Y (area 83).
  • the image of each person captured in the area 81 is displayed with the magnification adjusted so that the observer can easily see it.
  • On a floor map, the place where the event occurred, the current position of the person, and the movement trajectory are shown: the circle indicates the event occurrence site, the pentagon indicates the current position and traveling direction of the person, and the dotted line indicates the movement trajectory connecting the event occurrence site to the current position.
  • In area 823 and area 833, the state of each person at the current time can be confirmed. From the screens shown in area 82 and area 83, the movement trajectories of the performer and the executed person from the occurrence of the event to the current time can be grasped.
  • the following processing is required to display the movement trajectory on the floor map.
  • The position of each person in the world coordinate system is estimated either using the area information set in advance or, without preset calibration, using a depth estimation technique with a stereo camera or a monocular camera. By connecting the acquired position information and time information in chronological order, the movement trajectory of the person can be displayed on the floor map, as in the sketch below.
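  • A minimal sketch of building that floor-map trajectory, under the assumption that a precomputed 3x3 homography maps image-coordinate foot positions onto the floor map (such a matrix could come from the preset area calibration); a depth-estimation based positioning would simply replace to_floor_map without changing the rest.

    # Hypothetical projection of tracked foot positions onto a floor map.
    import numpy as np

    def to_floor_map(foot_xy, homography):
        """Project an image-coordinate foot point onto floor-map coordinates."""
        x, y = foot_xy
        p = homography @ np.array([x, y, 1.0])
        return (p[0] / p[2], p[1] / p[2])

    def build_trajectory(tracked_foot_points, homography):
        """tracked_foot_points: list of (timestamp, (x, y)) in image coordinates.
        Returns time-ordered floor-map points that can be drawn as the dotted
        movement trajectory in the display example."""
        ordered = sorted(tracked_foot_points, key=lambda t: t[0])
        return [(ts, to_floor_map(pt, homography)) for ts, pt in ordered]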
  • Although this drawing shows, for both the performer and the executed person, the movement trajectory from the occurrence of the event onward, the trajectory of each person up to the event occurrence site may also be displayed if the persons were already being tracked before the event, or can be tracked retroactively using video saved on a separately prepared storage medium. For example, for a delivery act, the executor who handed over the article may be shown with the trajectory up to the event occurrence and the person who received the article with the trajectory after the event occurrence, so that video monitoring focused on the movement trajectory of the article, rather than of a person, can be performed.
  • In this way, the video display unit can also display a single event in detail. Specifically, it can display the state at the time of event occurrence, the state at the current time, and the movement trajectory of each person who performed the interaction, and further the movement trajectories of the performers up to the event occurrence. As a result, the observer can check at a glance not only the presence or absence of the event but also the movements of the performers before and after the event occurrence, so that detailed information on the event can be grasped easily and accurately.
  • As described above, the disclosed video analysis system 3 detects events in the monitoring area using video captured of the monitoring area, and includes an interaction detection unit 322 that detects, based on the video, an interaction caused by the involvement of a plurality of persons and outputs the type of the interaction and the direction of the interaction indicating how each of the persons was involved with the others.
  • A calculation unit 321 that detects person images in the video and calculates attribute features representing the attributes of the detected persons is further provided, and since the monitoring importance determination unit 323 determines the monitoring importance also using these attribute features, a highly accurate determination taking the persons' attributes into account is possible.
  • The interaction detection unit 322 detects the skeletons of persons from the video and calculates at least one of: a posture feature representing a person's posture calculated from the skeleton estimation result, a distance feature calculated from one or more distances between arbitrary body parts of different persons, a movement feature representing the amount of skeleton movement per unit time calculated from the difference between adjacent image frames, and an article feature expressing the ownership relationship of an article to a person; the type and direction of the interaction are then detected based on the calculated features. With such a configuration, a highly accurate determination can be made in consideration of the persons' postures, the distance between persons, the type of article, and the like.
  • Since the monitoring importance determination unit 323 determines the monitoring importance also using information on the positions of the plurality of persons at the time of the interaction, implausible events can be excluded based on the positional relationship and the determination accuracy can be improved.
  • The video analysis system 3 also has a storage unit 33 that holds, as monitoring reference information, the type of interaction, the direction of the interaction, and the security level of each occurrence area used to determine the monitoring importance of each person who performed a detected interaction. By retaining this information in advance and using it as appropriate, a simple and highly accurate determination can be realized.
  • By providing a search unit 44 capable of searching the interaction detection records for the person who performed an interaction, using the interaction type and/or information about the person as a search query, the detected and accumulated interactions can be used effectively.
  • the output control unit 324 changes the size of the display on the display terminal according to the monitoring importance for each person who has generated the interaction. Therefore, it is possible to easily recognize the importance and control the amount of information according to the importance.
  • the behavior before and after the interaction can be confirmed by displaying the movement locus before and after the interaction on the screen. Whether or not the movement locus of a plurality of persons needs to be generated may be determined by the importance. In this case, it is possible to selectively output the movement locus for an important person.
  • the direction of the interaction may be clearly indicated by displaying on the screen information indicating that the person is the performer or the person to be executed of the predetermined action related to the interaction.
  • The content of the interaction can also be displayed in association with the current state of the person.
  • the present invention is not limited to the above-described embodiment, but includes various modifications.
  • the above-described embodiment has been described in detail in order to explain the present invention in an easy-to-understand manner, and is not necessarily limited to the one including all the described configurations.
  • it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment and it is also possible to add the configuration of another embodiment to the configuration of one embodiment.
  • each of the above configurations, functions, processing units, processing means and the like may be realized by hardware by designing a part or all of them by, for example, an integrated circuit.
  • each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function.
  • Information such as programs, tables, and files that realize each function can be placed in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
  • 1 ... Video surveillance system, 2 ... Shooting system, 21 ... Camera unit, 3 ... Video analysis system, 31 ... Video input unit, 32 ... Video processing unit, 321 ... Calculation unit, 3211 ... Person detection unit, 3212 ... Attribute determination unit, 322 ... Interaction detection unit, 323 ... Monitoring importance determination unit, 324 ... Output control unit, 33 ... Storage unit, 331 ... Monitoring reference information, 4 ... Monitoring center system, 41 ... Recording unit, 42 ... Video display unit, 43 ... Management control unit, 44 ... Search unit

Abstract

This invention sets a monitoring significance for each of a plurality of persons having interacted with each other in a monitoring area and achieves reduction of the workload of a monitoring and reacting person as well as reduction of system's processing load. For this purpose, this video analyzing system for detecting events in a monitoring area by use of video obtained by imaging the monitoring area comprises: an interaction detecting unit that, on the basis of the video, detects interaction being an event caused by participation of a plurality of persons and that outputs the type of the interaction and the directions of the interaction each indicating how a respective one of the plurality of persons has participated in the interaction together with the other or the others; a monitoring significance determining unit that compares the type and directions of the interaction with predetermined monitoring reference information to determine a monitoring significance for each of the plurality of persons having participated in the interaction; and an output control unit that outputs a detection result of the event on the basis of the monitoring significance.

Description

Video analysis system and video analysis method
The present invention relates to a video analysis system and a video analysis method that detect a person's state or an object from an image of a surveillance area and detect a monitoring target based on the detection result.
In recent years, the need for video surveillance at event venues such as concert halls and amusement facilities and at public facilities such as railway stations and airports has been increasing. For example, to prevent terrorist acts using dangerous materials such as explosives and toxic liquids, security requires that acts such as handing over luggage or leaving objects behind inside or outside a security area be addressed by monitoring persons, detecting the acts, and calling out to persons who have performed, or show signs of performing, such acts. In addition, early detection of people scuffling with each other, or of an imaged person falling or crouching, allows the facility manager to promptly protect persons requiring rescue within the facility and contributes to ensuring safety.
For example, in the image monitoring apparatus described in Patent Document 1, a delivery act in the monitoring area is detected by calculating information on the posture of the human body from images of the monitoring area. The monitoring importance is then calculated using the occurrence position and the type of the delivered article, and the detection result of the delivery act is output according to the monitoring importance. As the output of the detection result, a method of notifying the monitor of the monitoring center by a screen display, an alarm lamp, an alarm sound, or the like is described.
Further, Patent Document 2 discloses an image processing apparatus that, for the purpose of reducing the burden of monitoring video and making it easier to capture the moment something occurs, has: acquisition means for acquiring a plurality of videos distributed from a plurality of video distribution devices, object position detection means for detecting the positions of the objects included in each video, direction detection means for detecting the orientation of each object at those positions, setting means for setting a priority for each video based on the orientations, and display means for displaying the videos based on the priority.
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-028561 (特開2017-028561号公報); Patent Document 2: Japanese Unexamined Patent Application Publication No. 2017-017441 (特開2017-017441号公報)
When the above-mentioned conventional techniques are used, excessive alerts may be issued even for persons whose monitoring importance is not actually high, so workers involved in the monitoring work must respond to the reported events and their work load increases. In addition, when outputting results after event detection, if there are many actors with high monitoring importance, the video analysis system must perform person tracking, action recognition, and the like for all of them, so the processing load increases with the number of persons. The present invention therefore aims to provide a video analysis system that sets a monitoring importance for each of the plurality of persons who have interacted in the monitoring area and accurately narrows down the persons of high monitoring importance, thereby reducing the work load of the responders in monitoring and the processing load of the system.
A video analysis system as one aspect of the present invention is a video analysis system that detects an event in a surveillance area using video captured of the surveillance area, and comprises: an interaction detection unit that detects, based on the video, an interaction, which is an event caused by the involvement of a plurality of persons, and outputs the type of the interaction and the direction of the interaction indicating how each of the plurality of persons was involved with the other persons in the interaction; a monitoring importance determination unit that compares the type and direction of the interaction with preset monitoring reference information to determine a monitoring importance for each of the plurality of persons involved in the interaction; and an output control unit that outputs the detection result of the event based on the monitoring importance.
The present invention also provides a video analysis method for detecting an event in a surveillance area using video captured of the surveillance area, including: an interaction detection step of detecting, based on the video, an interaction caused by the involvement of a plurality of persons and outputting the type of the interaction and the direction of the interaction indicating how each person was involved with the other persons; a monitoring importance determination step of comparing the type and direction of the interaction with preset monitoring reference information to determine a monitoring importance for each of the plurality of persons involved in the interaction; and an output control step of outputting the detection result of the event based on the monitoring importance.
According to the present invention, it is possible to set a monitoring importance for each of a plurality of persons who have interacted in the monitoring area, and to reduce both the work load of the responders in monitoring and the processing load of the system.
FIG. 1 is an explanatory diagram of the video surveillance system according to the present embodiment.
FIG. 2 is a diagram showing the overall configuration of the video surveillance system according to the present embodiment.
FIG. 3 is a block diagram of the video analysis system according to the present embodiment.
FIG. 4 is a flowchart of the video analysis system according to the present embodiment.
FIG. 5 is a diagram showing the data structure of the monitoring reference information according to the present embodiment.
FIG. 6 is a diagram showing an example of a setting screen for the monitoring reference information according to the present embodiment.
FIG. 7 is a diagram showing an example of a setting screen for the monitoring reference information according to the present embodiment.
FIG. 8 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 9 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 10 is a diagram showing a display example of the search unit according to the present embodiment.
FIG. 11 is a diagram showing a display example of the video display unit according to the present embodiment.
FIG. 12 is a diagram showing the hardware configuration of the video surveillance system according to the present embodiment.
Embodiments of the video surveillance system according to the present invention are described below. The present embodiment aims at early detection of terrorist or dangerous acts in event venues and public facilities such as stations and airports by detecting coordinated actions performed by multiple persons in the surveillance area, that is, interactions such as handing over an article or scuffling. According to the present invention, a monitoring importance is set for each person who took part in an interaction, so that the events to be handled can be prioritized; this helps observers and on-site staff respond to events efficiently and quickly. When the available staff are fewer than the persons requiring a response, the risk of missing a highly important person is reduced. Furthermore, post-event processing such as person tracking and action recognition can be performed starting from the persons with the highest monitoring importance, so limited computing resources can be allocated appropriately.
An "event" in the present embodiment is a situation preset as a detection target in a given monitoring area. In particular, the present embodiment targets interactions, that is, events that arise from the involvement of a plurality of persons. Interactions include, for example, handshakes, handing over luggage, scuffles, and assaults. Examples are described below with reference to the drawings.
FIG. 1 is an explanatory diagram of the video surveillance system according to the present embodiment. As shown in FIG. 1, the video surveillance system 1 is broadly divided into an imaging system 2, a video analysis system 3, and a monitoring center system 4. The imaging system 2 consists of camera units installed in the monitored area. The video analysis system 3 analyzes the video input from the imaging devices to determine the interactions between persons, which are the detection targets, and the attributes of those persons; it then determines the monitoring importance of each person by comparing information on the occurrence position and the direction of the interaction against preset monitoring reference information. The monitoring center system 4 receives the analysis results from the video analysis system 3, presents them effectively to observers and on-site staff, and supports post-event searches for interactions and persons.
Here, the direction of an interaction indicates which person performed the predetermined action associated with the interaction on which other person. For a handover, for example, the direction is set from the person who handed over the article (the performer of the handover) to the person who received it (the recipient). Likewise, for an assault, the direction is set from the assailant (the performer) to the victim (the person acted upon). The direction of an interaction is thus defined per interaction type. Some interactions, such as handshakes and scuffles, are bidirectional.
Person attributes include, for example, whether the person is an ordinary visitor or a security guard, as well as age and gender.
By using the direction of the interaction and the attributes of the persons involved, the video surveillance system 1 sets a monitoring importance for each person, reducing the workload of the responders who handle monitoring and the processing load of the system.
This point is explained with a concrete example. A calculation method that derives a person's monitoring importance from the occurrence position and the type of the delivered article does not distinguish between the monitoring importance of the person who handed the article over and the person who received it. If, however, the article is one requiring attention in the monitoring area, the person who received the article should be assigned a higher monitoring importance than the person who handed it over. Similarly, when the monitoring area contains zones with different security levels, a handover from a low-security zone to a high-security zone should be judged differently from a handover in the opposite direction.
Also, if the person who performed the handover is a security guard or another worker engaged in security duties in the monitoring area (security personnel), the monitoring importance should not be set high. If such attributes are not taken into account, the person whose handover truly requires monitoring and the security personnel are judged to be equally important. The observers must then deal with multiple persons whose system-assigned importance shows no difference, which increases their burden. Moreover, if the on-site staff and security guards available to respond are fewer than the persons requiring a response, an important target may be missed.
For these reasons, the video surveillance system 1 achieves an accurate determination of monitoring importance by using not only the type of the interaction and its occurrence position but also information on the direction of the action and the attributes of the persons.
The imaging system 2, the video analysis system 3, and the monitoring center system 4 are described in detail below.
FIG. 2 shows the overall configuration of the video surveillance system according to the present embodiment. The imaging system 2 consists of one or more camera units 21 installed in the monitored area, and the captured video is sequentially input to the video input unit 31 of the video analysis system 3. A camera unit 21 is a surveillance camera arranged so that the entire area to be monitored can be imaged. When no area setting is needed for interaction detection, the surveillance camera may be a non-fixed, mobile camera; any type will do as long as it can image the area to be monitored. When area settings are required, it is desirable to use a surveillance camera fixed to a wall or pillar and calibrated in advance. In such a case a fixed camera without pan-tilt-zoom (PTZ) capability is typically assumed, but a PTZ-capable camera may be used if its PTZ settings and calibration have been adjusted together beforehand, and the same camera may then monitor various areas.
The camera unit 21 and the video input unit 31 are connected by wired or wireless communication means, and the camera unit 21 continuously transmits frame images to the video input unit 31. When interaction recognition is performed by a time-series analysis model that assumes a sequence of frame images as input, the frame rate of this continuous transmission should be at least the rate required by the interaction recognition. If the accuracy loss caused by a frame rate below the required value is acceptable, the frame rate may fall below it; in that case, the interaction recognition may apply processing that suppresses the accuracy loss, such as interpolation or extrapolation of the time-series data. The camera units 21 and the video analysis system 3 need not be in one-to-one correspondence; multiple camera units may share a single video analysis system. Even when such multiple processes run concurrently, the frame rate required by each process is subject to the constraints above. The camera unit 21 may also incorporate some or all of the functions of the video analysis system described later.
The video analysis system 3 consists of a video input unit 31, a video processing unit 32, and a storage unit 33. The video input unit 31 receives video from the camera unit 21 and sends the video data to the video processing unit 32. The video to be analyzed need not be input directly from the camera unit 21; it may be video stored separately in a recorder, and the storage location does not matter. The video processing unit 32 reads the monitoring reference information stored in the storage unit 33, described later, and analyzes the video received from the video input unit 31 to determine the monitoring importance of each individual who took part in an interaction. The storage unit 33 stores the monitoring reference information set via the management control unit 43, described later; this information is used to determine the monitoring importance output by the video processing unit 32. The video analysis system 3 is not limited to an on-premises system built on servers inside the operating facility; it may also be built on servers outside the facility, for example using a cloud service.
The monitoring center system 4 consists of a recording unit 41, a video display unit 42, a management control unit 43, and a search unit 44. The recording unit 41 holds, as a database, the information obtained by the video analysis of the video analysis system 3, such as the detected interaction, its direction, the person attributes, the area of occurrence, and the time of occurrence. The video display unit 42 displays, according to the monitoring importance, the current behavior of the persons who took part in an interaction and some or all of the frames at the time the interaction was detected. The management control unit 43 provides a function by which observers, on-site staff, and others enter setting information into the storage unit 33 so that the monitoring reference information used by the video processing unit 32 can be stored. The search unit 44 searches the information stored in the recording unit 41 for matching persons, using person attributes and interaction types as a query, and can look up a matching person's current position and the trajectory of their movement through the facility up to that point.
FIG. 12 is a hardware configuration diagram of the video surveillance system according to the present embodiment. In FIG. 12, a camera unit 1102 is connected to a computer 1103 via a network, and the computer 1103 can in turn communicate with a computer 1104 via a network.
One or more camera units 1102 are installed in the monitoring area and transmit video data to the computer 1103 as appropriate. The computer 1103 includes a CPU (Central Processing Unit) as an arithmetic control device, RAM (Random Access Memory) as main storage, and an HDD (Hard Disk Drive) as auxiliary storage. The computer 1103 implements the functions of the video analysis system 3 by reading programs from the HDD, loading them into the RAM, and executing them on the CPU. The computer 1103 also communicates with the camera unit 1102 and the computer 1104 via predetermined communication interfaces (IFs). Although not shown, input/output devices such as a keyboard and a display are likewise connected to the computer 1103 via predetermined IFs.
The computer 1104 includes a CPU as an arithmetic control device, RAM as main storage, and an HDD as auxiliary storage; it implements the functions of the monitoring center system 4 by reading programs from the HDD, loading them into the RAM, and executing them on the CPU. The computer 1104 is connected via predetermined interfaces (IFs) to the computer 1103 and to input/output devices such as a keyboard and a display.
Next, the video analysis system 3 is described in detail with reference to FIG. 3. FIG. 3 is a block diagram of the video analysis system according to the present embodiment. The video input unit 31, the video processing unit 32, and the storage unit 33 that constitute the video analysis system 3 are described below.
The video input unit 31 sequentially receives video from one or more camera units 21 and outputs it to the downstream video processing unit 32. If the video processing unit 32 does not handle time-series information, the input may be a single image.
The video processing unit 32 consists of a calculation unit 321, an interaction detection unit 322, a monitoring importance determination unit 323, and an output control unit 324.
The calculation unit 321 in turn consists of a person detection unit 3211 and an attribute determination unit 3212.
The person detection unit 3211 detects persons in the still image of the current frame using the image or video received from the video input unit. Persons may be detected, for example, with Haar-like features, with R-CNN (Regions with CNN) and related methods, or by deriving a region from the skeleton coordinates estimated per person by a skeleton estimation method; the present embodiment does not depend on the particular means. After detection, the person detection unit 3211 also performs person tracking. For tracking, it suffices that a person's rectangular image and the person ID assigned to that person are associated across consecutive frames; any common person tracking method such as template matching or optical flow may be used.
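For illustration, the following is a minimal sketch of frame-to-frame person association, assuming detections are axis-aligned boxes (x1, y1, x2, y2). It uses greedy matching on bounding-box overlap (IoU) rather than the template matching or optical flow mentioned above, so it is only a simplified stand-in for a real tracker.

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

class SimpleTracker:
    """Greedy IoU association of detections across consecutive frames."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}      # person_id -> last known box
        self.next_id = 0

    def update(self, detections):
        assigned = {}
        unmatched = list(self.tracks.items())
        for box in detections:
            # Pick the existing track with the largest overlap with this detection.
            best = max(unmatched, key=lambda t: iou(t[1], box), default=None)
            if best is not None and iou(best[1], box) >= self.iou_threshold:
                pid = best[0]
                unmatched.remove(best)
            else:
                pid, self.next_id = self.next_id, self.next_id + 1
            assigned[pid] = box
        # Tracks without a matching detection are simply dropped in this sketch.
        self.tracks = assigned
        return assigned       # person_id -> box for the current frame
```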
Next, the rectangular person images obtained by the person detection unit are passed to the attribute determination unit 3212, which determines each person's attributes. Person attributes contribute to the determination of each individual's monitoring importance and can also be used as a query in the search unit 44 described above. Examples of attributes include security guards or staff of the facility, ordinary facility users, and age and gender. If guards and staff can be distinguished from ordinary users, interactions caused by guards or staff can be assumed to be actions taken in the course of their duties, so no monitoring importance needs to be set for them and unnecessary alarms can be avoided. Estimating the age and gender of ordinary users is also useful: if, for instance, prior statistics indicate that persons in a particular age group are more likely to engage in behavior requiring attention, effective video monitoring becomes possible by setting a higher monitoring importance for that age group. In amusement or event facilities it is also effective to match users against pre-registered persons requiring attention or persons banned from the premises. Attributes can be estimated, for example, by converting the person's rectangular image into image features such as HOG (Histograms of Oriented Gradients), SIFT (Scale-Invariant Feature Transform), or vectors output from an intermediate layer of a trained deep learning model, and training a classifier such as an SVM (Support Vector Machine) or a decision tree on them, or by end-to-end classification with a CNN (Convolutional Neural Network). These classifiers may be built in two stages, a first stage that separates guards and on-site staff from general visitors and a second stage that determines the age and gender of general visitors, or they may be trained as a single classifier. Furthermore, by separately determining the articles a person carries, a person judged to be carrying prohibited or dangerous items can be given an attribute expressing that fact, which helps represent attributes appropriately.
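As one possible realization of the feature-plus-classifier approach described above, the following sketch assumes scikit-image for HOG extraction and scikit-learn for the SVM; the crop size, kernel, and attribute labels are illustrative assumptions rather than part of the embodiment.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import SVC

def person_features(person_crop):
    # Normalize the person rectangle to a fixed size and extract HOG features.
    gray = resize(rgb2gray(person_crop), (128, 64))
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_attribute_classifier(crops, labels):
    # labels are attribute strings, e.g. "staff", "general/youth", "general/adult".
    X = np.stack([person_features(c) for c in crops])
    clf = SVC(kernel="rbf")
    clf.fit(X, labels)
    return clf

def estimate_attribute(clf, person_crop):
    return clf.predict([person_features(person_crop)])[0]
```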
The interaction detection unit 322 uses the information obtained from the person detection unit 3211 to determine whether an interaction has occurred, its type, and its direction. For any pair of persons, the determination may be based on image features, as described above, or on skeleton information. When skeleton information is used, possible features include quantities representing a person's posture computed from the skeleton estimated by the person skeleton detection means, quantities computed from the relative distances between arbitrary skeleton points of a person pair, and time-series quantities expressing the movement of the skeleton per unit time or the change in relative distance across consecutive frames. Attribute information obtained from the attribute determination unit 3212, such as features expressing age or gender, may also be used. Features may also describe not just the persons but the articles they carry; for example, if an article judged to be owned by one person is judged to be owned by another person some time later, a handover can be interpreted as having occurred at the moment the ownership judgment switched. These features may be used alone or in combination. For instance, judging interactions only from posture features risks false detections even between persons located far apart, but combining them with relative-distance features reduces such false detections. Using features alone or in combination in this way enables effective interaction detection.
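A minimal sketch of combining posture and relative-distance features for a person pair is shown below, assuming each skeleton is an (N, 2) array of keypoint coordinates; the keypoint indices used for the wrists and the exact normalization are illustrative assumptions.

```python
import numpy as np

def posture_feature(keypoints):
    # Joint offsets relative to the keypoint centroid, scale-normalized,
    # as a simple posture descriptor for one person.
    kp = np.asarray(keypoints, dtype=float)
    offsets = kp - kp.mean(axis=0)
    scale = np.linalg.norm(offsets, axis=1).max() or 1.0
    return (offsets / scale).ravel()

def pair_feature(kp_a, kp_b, wrist_idx=(9, 10)):
    # Concatenate both postures with wrist-to-wrist distances, which are
    # informative for handover-like interactions. The wrist indices depend on
    # the keypoint convention used and are assumed here.
    kp_a, kp_b = np.asarray(kp_a, dtype=float), np.asarray(kp_b, dtype=float)
    wrist_dists = [np.linalg.norm(kp_a[i] - kp_b[j])
                   for i in wrist_idx for j in wrist_idx]
    return np.concatenate([posture_feature(kp_a),
                           posture_feature(kp_b),
                           np.asarray(wrist_dists)])

# Vectors of this kind could be fed to an SVM or decision tree, or stacked
# over a window of frames for an LSTM, to classify interaction type and direction.
```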
When image information is used, methods based on CNNs are one option. When skeleton information is used, classifiers such as an SVM, a decision tree, or an LSTM (Long Short-Term Memory) network can be trained on features representing posture, relative distance, attributes, and so on.
The monitoring importance determination unit 323 takes as input the person attributes obtained from the attribute determination unit 3212, the interaction type and direction obtained from the interaction detection unit 322, and, in the present embodiment, the area information on where each person is located obtained from the person detection unit 3211, and sets a monitoring importance for each individual who took part in the interaction by matching these against the information set in the monitoring reference information 331. The area information can be determined by matching the preset area definitions against the person rectangle; for example, if areas are defined in image coordinates for a camera with a given PTZ setting, the area in which a person is located can be determined from which area the estimated position of the person's feet falls within.
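One way the foot-position-to-area lookup could be implemented is sketched below, assuming each area has been preset as a polygon in image coordinates for a fixed camera view; the area names and polygon coordinates are hypothetical.

```python
def point_in_polygon(pt, polygon):
    # Ray-casting test: counts crossings of a horizontal ray starting at pt.
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical area definitions in image coordinates for one camera view.
AREAS = {
    "inside entrance gate": [(0, 400), (640, 400), (640, 720), (0, 720)],
    "shop":                 [(640, 400), (1280, 400), (1280, 720), (640, 720)],
}

def area_of_person(box):
    # Use the bottom center of the person rectangle as the estimated foot position.
    x1, y1, x2, y2 = box
    foot = ((x1 + x2) / 2, y2)
    for name, poly in AREAS.items():
        if point_in_polygon(foot, poly):
            return name
    return None
```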
The output control unit 324 transmits the per-person monitoring importance determined by the monitoring importance determination unit 323 to the monitoring center system 4. All events for which a monitoring importance was calculated may be transmitted, or a threshold may be preset so that only events with a high monitoring importance are transmitted.
The storage unit 33 stores the monitoring reference information 331 used by the monitoring importance determination unit 323. The monitoring reference information 331 holds three kinds of security level settings: an interaction security level set per interaction type, an attribute security level set per attribute type, and an area security level set per area type. It further holds a weight for each of these security level settings and, per interaction type, a weight for the performer or the person acted upon. The monitoring reference information 331 can be set from the management control unit 43.
Next, the processing flow of the video analysis system in the present embodiment is described with reference to the flowchart in FIG. 4.
When video is input from the imaging system to the video analysis system in step S1, person detection is performed in step S2.
Next, the number of persons is counted in step S3. If two or more persons are detected in the frame, the process proceeds to step S4; if one or no persons are detected, the processing from step S4 onward is skipped, the system waits for the next frame, and the process returns to step S1. When areas where interaction detection is desired or permitted and areas where it is not are mixed within the frame, a mask may be applied to part of the monitored area immediately before step S2 to reduce the amount of computation.
In step S4, interaction determination is performed. The determination is made for every pair of persons in the frame, but to reduce computation it is preferable not to run the determination for pairs separated by more than a certain distance, defined per interaction type. For example, when detecting a handover, there is no need to judge whether a handover occurred between two persons who are clearly out of each other's reach. When the relative distance between persons is checked before the action determination, it must be computed in the world coordinate system. This can be done either with preset area information, or by estimating the persons' positions in world coordinates without prior setup using depth estimation techniques with a stereo or monocular camera, and then computing the relative distance. A separate setting table can hold the determination thresholds; for example, if the threshold is set to 3 m, the action determination is not run for person pairs whose relative distance in the world coordinate system exceeds 3 m. If the classifier is not a multi-class classifier covering many interaction types but a set of two-class classifiers trained per interaction type, the threshold can also be set per interaction type. In addition, when detecting interactions that cross area boundaries, it is useful for both computation reduction and false-detection reduction to skip the interaction determination for persons within the same area and to judge only interactions between persons located in different areas.
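The pair pruning described above might look like the following sketch, assuming world-coordinate positions per person are already available and using a hypothetical per-interaction-type threshold table in meters.

```python
from itertools import combinations
from math import dist

# Hypothetical per-interaction-type distance thresholds in meters.
PAIR_DISTANCE_THRESHOLDS = {"handover": 3.0, "scuffle": 1.5, "assault": 2.0}

def candidate_pairs(world_positions, interaction_type):
    """Yield person-ID pairs close enough to be worth classifying.

    world_positions: dict person_id -> (x, y) in world coordinates (meters).
    """
    limit = PAIR_DISTANCE_THRESHOLDS[interaction_type]
    for (pid_a, pos_a), (pid_b, pos_b) in combinations(world_positions.items(), 2):
        if dist(pos_a, pos_b) <= limit:
            yield pid_a, pid_b
```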
If the interaction determination concludes that an interaction took place, that is, that an event has occurred, the branch at step S5 leads to the processing from step S6 onward. If it is determined that no event occurred, the processing from step S6 onward is skipped, the system waits for the next frame, and the process returns to step S1.
In steps S6 to S8, attributes are calculated for each person who caused the event. Then, in steps S9 to S11, the monitoring importance is determined for each detected event. After output control is applied to the determined monitoring importance in step S12, the system waits for the next frame and returns to step S1.
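The flow from step S1 to step S12 can be summarized roughly as follows; the helper functions (detect_persons, classify_interactions, estimate_attribute, compute_importance, send_to_monitoring_center) are hypothetical placeholders for the components described in this embodiment, not actual API names.

```python
def process_stream(frames, monitoring_reference):
    for frame in frames:                                 # S1: video input
        persons = detect_persons(frame)                  # S2: person detection
        if len(persons) < 2:                             # S3: need at least two persons
            continue
        events = classify_interactions(frame, persons)   # S4: interaction determination
        if not events:                                   # S5: no event -> next frame
            continue
        for event in events:                             # S6-S8: per-person attributes
            for person in event.participants:
                person.attribute = estimate_attribute(frame, person)
        for event in events:                             # S9-S11: per-event importance
            event.importance = compute_importance(event, monitoring_reference)
        send_to_monitoring_center(events)                # S12: output control
```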
The processing of the flow shown in this figure does not necessarily have to run as a single process; it may be processed asynchronously by multiple processes to improve computational efficiency.
Next, an example of setting the monitoring reference information in the present embodiment is described with reference to FIG. 5. The monitoring reference information in this embodiment consists of the security level settings in Table 51, the weights for the security level targets in Table 52, and the per-interaction-type performer weights in Table 53. From Tables 51 and 52, a weighted score is calculated for each individual, taking into account the interaction type, the person attributes, and the area of occurrence; the monitoring importance is then calculated from that weighted score and the performer weight in Table 53. This information is set from the management control unit 43, stored in the monitoring reference information 331, and read by the monitoring importance determination unit 323. The settings and effects of each table are described in detail below.
Table 51 consists of three security level setting tables: Table 511, which sets the security level per interaction type; Table 512, which sets the security level per attribute type; and Table 513, which sets the security level per area type. In this embodiment the security level takes four grades from 3 points to 0 points, labeled in descending order "high", "medium", "low", and "none". "High" denotes the targets requiring the most attention, and "none" denotes targets requiring no attention.
In the interaction security level settings of Table 511, an importance is set per interaction type, for example 1 point for "handover" and 3 points for "assault". Likewise, in the attribute security level settings of Table 512, "staff" is set to 0 points and "banned person" to 3 points, and in the area security level settings of Table 513, "inside entrance gate" is set to 3 points and "shop" to 2 points.
For each security level, items not subject to setting are explicitly set to level 0; for example, in the interaction security level settings, "handshake" and "hug" are set to level 0. In this embodiment each item in Tables 511, 512, and 513 is scored on four grades from 3 points to 0 points, but the number of grades is not limited to this embodiment and should be freely configurable by the setter.
Table 52 is a setting table that stores the weight of each of the three security level targets. In the example of Table 52 the three weights sum to 100%, with 30% for the interaction, 20% for the attribute, and 50% for the area. In this embodiment these weights are used to compute a weighted score from the points in parentheses in Table 51, which is then used to calculate the monitoring importance for each individual. For example, if a "general/youth" person performs a "handover" "inside the entrance gate", the weighted score given to that person is, from Table 52, 1 × 0.3 + 2 × 0.2 + 3 × 0.5 = 2.2 points. Making the weight of each of the three security level targets configurable allows flexible adaptation to the differing needs of each facility; for example, if only the interaction type and the person attributes matter and the area does not, the weights can be set to 70% for the interaction, 30% for the attribute, and 0% for the area. When one or more of the targets scores 0 points, the individual's weighted score may also be set to 0 points; for example, if a person whose attribute is judged to be "security guard" performs any interaction, the interaction was presumably performed in the course of their duties, and it is generally inappropriate to make the "security guard" a monitoring target. The formula above is one example of how the weighted score may be calculated; a different formula may be used.
Table 53 is a setting table for calculating the monitoring importance from the weighted score computed with Tables 51 and 52, taking the direction of the interaction into account. Referring to Table 53, the weight for the performer of a "handover", that is, the person who handed over the article, is set to 20%, while the weight for the person acted upon, that is, the person who received the article, is set to 80%. A setting in which the performer has the smaller weight, as in this example, means that the person acted upon is more important than the performer: in a handover, the person who received the article is assumed to be more important than the person who gave it up. The "scuffle", a bidirectional interaction, is set to 50% in Table 53 so that the performer and the person acted upon carry equal weight. For "assault", where the roles of the performing assailant and the victim are clear, the performer weight is set to 90%. As an example of calculating the monitoring importance from the weighted score: when a "general/youth" person performs a "handover" "inside the entrance gate", the weighted score is 2.2 points as computed above from Tables 51 and 52, and with the performer weight for "handover" set to 20%, the monitoring importance of the performer is 2.2 × 0.2 = 0.44.
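A minimal sketch of this calculation is shown below; the table contents mirror the example values quoted above from Tables 51 to 53, and everything else (the dictionary names and the zero-level rule) is an illustrative assumption.

```python
# Security levels per target (Table 51, excerpt).
INTERACTION_LEVEL = {"handover": 1, "scuffle": 2, "assault": 3, "handshake": 0}
ATTRIBUTE_LEVEL   = {"staff": 0, "general/youth": 2, "banned person": 3}
AREA_LEVEL        = {"inside entrance gate": 3, "shop": 2}

# Weights of the three targets (Table 52) and performer weights (Table 53).
TARGET_WEIGHTS   = {"interaction": 0.3, "attribute": 0.2, "area": 0.5}
PERFORMER_WEIGHT = {"handover": 0.2, "scuffle": 0.5, "assault": 0.9}

def weighted_score(interaction, attribute, area):
    levels = (INTERACTION_LEVEL[interaction], ATTRIBUTE_LEVEL[attribute], AREA_LEVEL[area])
    if 0 in levels:
        return 0.0      # e.g. staff are never treated as monitoring targets
    return (levels[0] * TARGET_WEIGHTS["interaction"]
            + levels[1] * TARGET_WEIGHTS["attribute"]
            + levels[2] * TARGET_WEIGHTS["area"])

def monitoring_importance(interaction, attribute, area, is_performer):
    w = PERFORMER_WEIGHT[interaction]
    role_weight = w if is_performer else 1.0 - w
    return weighted_score(interaction, attribute, area) * role_weight

# Worked example: a "general/youth" performer handing over an article inside the
# entrance gate -> weighted score 2.2, monitoring importance 2.2 * 0.2 = 0.44.
print(round(monitoring_importance("handover", "general/youth", "inside entrance gate", True), 2))
```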
When a person who caused an event with a high monitoring importance subsequently causes an event with a low monitoring importance, the importance value must not be overwritten and thereby underestimated. Therefore, for a person who has caused multiple interactions, it is desirable either to accumulate the monitoring importances calculated for those events or to keep using the largest value, until the importance value for that person is reset.
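As a sketch of this retention rule, one possible approach (assumed here) is to keep, per person ID, the maximum importance observed since the last reset:

```python
person_importance = {}   # person_id -> highest monitoring importance since last reset

def update_importance(person_id, new_value):
    # Keep the larger value so a later, minor event cannot mask an earlier, serious one.
    person_importance[person_id] = max(person_importance.get(person_id, 0.0), new_value)

def reset_importance(person_id):
    # Called, for example, after staff have finished handling the person's case.
    person_importance.pop(person_id, None)
```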
It is also possible to use the types and directions of multiple interactions involving the same person together, in order to determine that person's monitoring importance or to vary the response. For example, consider a case where pedestrians bumping into each other is defined as an interaction of type "pedestrian collision", with the direction set from the person who bumped (the performer) to the person who was bumped (the person acted upon). If the record of interactions involving a certain person shows frequent involvement in "pedestrian collision" events, always on the performer side, it is likely that this person is bumping into others deliberately; in that case the monitoring importance should be set high and security personnel dispatched immediately the next time a similar "pedestrian collision" occurs. If, on the other hand, the person is frequently involved in "pedestrian collision" events but the direction is not consistent (performer and person acted upon about equally often), poor physical condition or a similar cause is possible; in that case it is preferable to set the monitoring importance high and dispatch rescue personnel when behavior such as crouching is observed.
The monitoring importance calculated as above is transmitted to the output control unit 324. To limit the number of events sent to the monitoring center system 4, the output control unit can apply a threshold to the monitoring importance. For example, if the threshold is set to 2.0, a monitored person with a calculated importance of 1.5 is not transmitted, while an individual with a calculated importance of 3.0 is. Although not shown in this embodiment, the persons to be notified, such as on-site staff or security guards, may also be designated according to the monitoring importance score.
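This output control could be sketched as a simple threshold filter, with the threshold value taken from the example above:

```python
IMPORTANCE_THRESHOLD = 2.0

def select_events_to_send(scored_persons):
    """Forward only persons whose monitoring importance reaches the threshold.

    scored_persons: iterable of (person_id, importance) pairs for the current frame.
    """
    return [(pid, score) for pid, score in scored_persons if score >= IMPORTANCE_THRESHOLD]
```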
Next, the GUI (Graphical User Interface) for setting the monitoring reference information of FIG. 5 is described with reference to FIGS. 6 and 7. FIGS. 6 and 7 show example setting screens for the monitoring reference information in the present embodiment; FIG. 6 is the setting screen for creating Table 51, and FIG. 7 is the setting screen for creating Tables 52 and 53.
FIG. 6 is a GUI for setting the security levels for interaction types, person attributes, and areas; the interaction security level settings shown in region 611 are described here in particular. The interaction security level in region 611 has three grades from 1 point to 3 points, but as noted above the number of grades is not limited to this embodiment and should be freely configurable by the setter, as should the magnitude of the security levels. Interactions for which no security level is set may be explicitly marked with an importance of 0 points. The setter presses the pull-down field shown in region 6111 and, for each security level column, selects a registered interaction from the list. Pressing the "Add" button in region 6112 adds the selected interaction type to the "registered interactions" list below. To remove a registered interaction after it has been added, the setter presses the checkbox in region 6113 corresponding to the interaction to be deleted and then presses region 6114. Pressing the "Save settings" button in region 6115 reflects the information set in this region into the monitoring reference information 331; pressing region 6116 displays the information set in this region for confirmation. The attribute security level settings in region 612 and the area security level settings in region 613 are configured in the same way. However, unlike the interaction and attribute types, whose options are fixed by the system, area types themselves can be added separately so that areas the system does not anticipate can also be handled.
Region 62 in FIG. 7 is a GUI for setting the weights of the security level targets, and Table 52 is configured through this region. In region 621 the weights for the interaction, the attributes, and the area are set as percentages. Pressing the "Save settings" button in region 622 reflects the information set in this region into the monitoring reference information 331; pressing region 623 displays the information set in this region for confirmation.
Region 63 is a GUI for setting the performer weight per interaction type, and Table 53 is configured through this region. In region 631 the performer-side weight is set as a percentage for each interaction type. The weight for the person acted upon may be calculated automatically from the entered "performer weight" and displayed, or the input may instead be entered for the person acted upon. The interactions for which a setting is required in region 631 are all the actions registered in region 611 that have been given a score greater than 0 points; the corresponding rows are added automatically at the moment of registration by pressing region 6115. Pressing the "Save settings" button in region 632 reflects the information set in this region into the monitoring reference information 331; pressing region 633 displays the information set in this region for confirmation. However, to avoid the inconsistency of an interaction registered in region 611 having no performer-side weight registered in region 63, it is desirable to implement, for example, a warning about incomplete entries displayed when the whole setting screen is closed.
Next, an example of the screen notifying the observer of detected events is described with reference to FIG. 8. FIG. 8 shows a display example of the video display unit 42 in the present embodiment; region 7 shows the output screen. Region 7 may occupy the entire notification screen or only part of it.
The persons shown in screen A (region 71), screen B (region 72), screen C (region 73), and screen D (region 74) of region 7 are persons for whom a monitoring importance has been set and transmitted to the monitoring center system 4. This embodiment shows an example in which video at the current time is displayed for all of these persons; the video used here is real-time video based on tracking of the persons who caused the events. Person tracking is performed by the person detection unit 3211 as described above, and the video is sent to the video display unit 42 together with the information on the monitoring importance. Region 75 lists the persons shown on each screen in order of monitoring importance, together with the detected event, the place of occurrence, the time of occurrence, and the current position. The column sorted by monitoring importance may display the actual values output by the monitoring importance determination unit 323.
In this embodiment, the sizes of the regions for screens A to D change dynamically according to the determined monitoring importance. For example, when the monitoring importances of all persons have been reset and no person is shown on the screen, person a, who performed a "handover" at time 09:20:00, is displayed on the largest screen. If person b, who performed an "assault" at time 09:30:50, is then judged to have a higher monitoring importance than the person who performed the "handover", the display region for person b becomes larger than that for person a. To make a displayed person easy to distinguish from other persons captured on the same screen, it is desirable to apply image processing such as superimposing the person's detection frame or cropping the image. When screen size is limited and multiple events have occurred, the screen may be made scrollable.
In addition to the real-time tracking video described above, it is desirable to be able to switch to the scene at the time the event occurred by selecting a display screen or a row of region 75. FIG. 9 shows a display example of the video display unit in the present embodiment, illustrating the selection of one event in region 75 of FIG. 8. In FIG. 9, screen B (region 76) is selected, and the frame at the time of detection is displayed for the interaction caused by the tracked person. Since an interaction takes place over a span of time, it is desirable to display the frame with the highest determination confidence between the start and end of the interaction; alternatively, a short clip from the start to the end of the interaction may be made playable. This lets the observer grasp the circumstances in which the interaction took place. Furthermore, when staff have finished responding to an event confirmed in this way, when a response is judged unnecessary, or when a false detection is obvious, the display screen or the row of region 75 can be selected and deleted.
This screen can be viewed not only by notification recipients such as monitoring center observers, for whom a large display is assumed, but also by on-site staff and security guards, who can view part or all of region 7 on site using a smartphone, a tablet, AR goggles, or the like.
Next, search means using interaction types, attributes, and the like are described with reference to FIG. 10. Events for which observers or on-site staff have completed their response in the video display unit 42, and events once judged not to require a response, are deleted from the display output; to review such events later, a means of searching for events in the recording unit 41, the database of information on occurred events, is required.
FIG. 10 shows a display example of the search unit in the present embodiment; region 9 shows the entire search screen. Within region 9, region 91 shows the search input screen and region 92 shows the output screen.
In region 91, region 911 narrows down events using the interaction type, attributes, and time of occurrence as a query. Specifically, the user presses the pull-down field of each search item column in region 911 and selects a registered item from the list. Pressing the "Add" button adds the selected item to the "registered items" list below. To remove a registered item after it has been added, the user presses the checkbox of the item to be deleted and then presses region 912. Although this embodiment shows an example of searching with the interaction type, attributes, and time of occurrence as a query, searches may also use the interaction direction, area information, and so on. The columns do not all require input and may be left blank; for example, executing a search with all items blank outputs all information stored in the recording unit 41 as the search result.
 In area 92, area 921 displays information about each search result, such as the current location and the time of occurrence. As shown in area 921, if the person is confirmed to still be in the monitoring area at the current time and can be tracked, information about the current position may also be displayed. Area 922 displays the frame image corresponding to the information in area 921, or a short clip from the start to the end of the interaction. Area 921 and area 922 together form one search result, and in this embodiment the whole set of search results can be reviewed by scrolling. The search results may also be switchable to a grid view consisting only of the images or videos shown in area 922.
 By using the search unit 44, an event can be retrieved efficiently from the recording unit 41 even if it has already been deleted from the display output of the video display unit 42. In addition, similar cases can be searched for and their number of occurrences checked, which is useful for devising countermeasures and preventive measures against events expected to occur in the future.
 As described above, the video surveillance system 1 detects an interaction from the surveillance video and determines the monitoring importance for each individual using the type and direction of the interaction together with the person's attributes and area information. According to the present invention, instead of giving an identical monitoring importance to every person who took part in an interaction in the monitoring area, the monitoring importance is set per individual, so the observer can easily prioritize responses and deal with the actors efficiently.
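 The per-individual determination described above could take a form like the following sketch. The interaction types, roles, and numeric weights are illustrative assumptions, not values defined by the system; the actual criteria would come from the monitoring reference information configured per site.

```python
# Hypothetical weights: (interaction type, role of the person in the interaction)
# -> base importance. The real criteria are held as monitoring reference
# information and would be configured per site.
BASE_IMPORTANCE = {
    ("handover", "giver"): 3,
    ("handover", "receiver"): 2,
}
AREA_SECURITY_LEVEL = {"backyard": 2, "sales_floor": 1}   # security level per area
ATTRIBUTE_ADJUSTMENT = {"staff": -1, "visitor": 0}        # adjustment per attribute

def monitoring_importance(interaction_type: str, role: str,
                          attribute: str, area: str) -> int:
    """Score one person involved in an interaction, combining the interaction
    type, that person's role (the direction of the interaction as it concerns
    this person), the person's attribute, and the security level of the area."""
    score = BASE_IMPORTANCE.get((interaction_type, role), 0)
    score += AREA_SECURITY_LEVEL.get(area, 0)
    score += ATTRIBUTE_ADJUSTMENT.get(attribute, 0)
    return max(score, 0)
```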
 Another embodiment of the video surveillance system 1 of the present invention will now be described. Descriptions of features shared with the embodiment described above are omitted, and only the processing specific to this embodiment is explained.
 In the display example of the video display unit 42 in the embodiment described above, all events for which a monitoring importance has been set at the current time are displayed in order of monitoring importance. Alternatively, a display method that focuses on only a single event is also conceivable.
 FIG. 11 shows a display example by the video display unit 42 in the present embodiment. For an interaction between two persons for which monitoring importances have been set, it shows the information on each person in detail. Area 81 shows the frame image in which the interaction was detected. The frame image shows a handover between person 811 and person 812. Area 84 shows the attributes, the place of occurrence, and the current position of the two persons, as well as the direction of the handover and its place of occurrence. As described in area 84, the information on the two persons is displayed separately on screen X (area 82) and screen Y (area 83). Inside areas 82 and 83, areas 821 and 831 display the image of each person captured in area 81, with the magnification adjusted so that the observer can easily recognize the person. Areas 822 and 832 use a floor map to show the place where the event occurred, the current position of the person, and the movement trajectory. In this figure, the circle indicates the site of the event, the pentagon indicates the person's current position and direction of travel, and the dotted line indicates the movement trajectory connecting the site of the event to the current position. Areas 823 and 833 show the person's appearance at the current time. From the screens shown in areas 82 and 83, the movement trajectories of the performer and the person acted upon, from the occurrence of the event up to the current time, can be grasped.
 The following processing is required to display the movement trajectory on the floor map. First, after images of person 811 and person 812 have been acquired, person identification is performed on the frame images obtained from the camera at fixed time intervals or at fixed frame intervals. Next, to determine the person's position on the floor map, either preset area information is used, or the person's position in the world coordinate system is estimated without any advance setup by using depth estimation techniques based on a stereo camera or a monocular camera. By connecting the acquired position information and time information in chronological order, the person's movement trajectory can be displayed on the floor map.
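 A minimal sketch of the last step, connecting position and time information in chronological order, is shown below. It assumes that person identification and floor-map position estimation (from preset area information or depth estimation) have already produced per-frame observations; the data shape is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    timestamp: float                  # capture time of the frame
    person_id: str                    # result of person identification across frames
    position: tuple[float, float]     # estimated floor-map (x, y) coordinates

def build_trajectory(observations: list[Observation],
                     person_id: str) -> list[tuple[float, float]]:
    """Connect one person's estimated positions in chronological order,
    producing the polyline drawn on the floor map (the dotted line in FIG. 11)."""
    own = sorted((o for o in observations if o.person_id == person_id),
                 key=lambda o: o.timestamp)
    return [o.position for o in own]
```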
 Although this figure shows, for both the performer and the person acted upon, the movement trajectory from the occurrence of the event onwards, if a person who has not yet caused an event is already being tracked, or can be tracked after the event by using video saved on a separately prepared storage medium, the movement trajectory of each person up to the site of the event may also be displayed. For example, in a handover, the performer who handed over the article may be shown with the trajectory up to the occurrence of the event, and the person who received the article with the trajectory after the event; video surveillance can then focus on the movement trajectory of the article rather than of the persons.
 As described above, according to the present embodiment, the video display unit can present a display that focuses on a single event. Specifically, it can display, for each person who took part in the interaction, the situation at the time the event occurred, the situation at the current time, and the movement trajectory, and it can further display the performers' movement trajectories up to the occurrence of the event. As a result, the observer can confirm at a glance not only whether the event occurred but also how the performers moved before and after it, and can therefore grasp the details of the event easily and accurately.
 As shown in each of the embodiments described above, the disclosed video analysis system 3 is a video analysis system that detects events in a monitoring area using video captured of the monitoring area. It comprises an interaction detection unit 322 that detects, based on the video, an interaction, that is, an event caused by the involvement of a plurality of persons, and outputs the type of the interaction and the direction of the interaction indicating how each of the plurality of persons was involved with the other persons in the interaction; a monitoring importance determination unit 323 that compares the type and direction of the interaction with preset monitoring reference information and determines a monitoring importance for each of the plurality of persons involved in the interaction; and an output control unit 324 that outputs the detection result of the event based on the monitoring importance. With this configuration and operation, a monitoring importance is set for each of the plurality of persons who took part in an interaction in the monitoring area, which reduces both the workload of the responders in monitoring and the processing load of the system.
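 Read as a processing flow, the three units above chain together roughly as in the following sketch. The class and function names are hypothetical and only illustrate how detection results, per-person importances, and output control relate to one another; they are not the implementation of units 322 to 324.

```python
from dataclasses import dataclass, field

@dataclass
class PersonResult:
    person_id: str
    attribute: str
    importance: int = 0              # filled in by the importance determination step

@dataclass
class Interaction:
    interaction_type: str            # e.g. a handover
    direction: dict                  # person_id -> role within the interaction
    area: str                        # area of occurrence
    persons: list[PersonResult] = field(default_factory=list)

def analyze(frames, detect, judge_importance, emit):
    """One analysis cycle: the interaction detection step (detect) finds
    interactions in the frames, the monitoring importance determination step
    (judge_importance) scores each involved person against the monitoring
    reference information, and the output control step (emit) decides the
    output (alerts, screen layout, trajectory generation) from those scores."""
    interactions = detect(frames)
    for interaction in interactions:
        for person in interaction.persons:
            person.importance = judge_importance(interaction, person)
    emit(interactions)
```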
 The system further comprises a calculation unit 321 that detects images of persons included in the video and calculates attribute features representing the attributes of the detected persons. Since the monitoring importance determination unit 323 also uses these attribute features to determine the monitoring importance, a highly accurate determination that takes the persons' attributes into account is possible.
 The interaction detection unit 322 detects the skeleton of a person based on the video and calculates at least one of: a posture feature representing the person's posture, calculated from the estimation result of the detected skeleton; a distance feature calculated from one or more distances between arbitrary body parts of different persons; a movement feature representing the amount of movement of the skeleton per unit time, calculated from the difference between successive image frames of the video; and an article feature expressing the ownership relation of an article to the person. Based on the calculated features, it detects the type and the direction of the interaction. With this configuration, a highly accurate determination is possible that takes into account the persons' postures, the distances between persons, the type of article, and so on.
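 As an illustration of two of the listed features, the sketch below computes a distance feature between arbitrary body parts of two persons and a movement feature from the skeleton difference between consecutive frames. Keypoints are assumed to be given as 2-D image coordinates keyed by joint name; the posture and article features would be built analogously, and nothing here is tied to a particular pose-estimation library.

```python
import math

Keypoints = dict[str, tuple[float, float]]   # joint name -> (x, y) in the image

def distance_feature(person_a: Keypoints, person_b: Keypoints,
                     part_pairs: list[tuple[str, str]]) -> list[float]:
    """Distances between arbitrary body parts of two persons
    (e.g. right wrist of A to right wrist of B for a handover)."""
    feats = []
    for part_a, part_b in part_pairs:
        (xa, ya), (xb, yb) = person_a[part_a], person_b[part_b]
        feats.append(math.hypot(xa - xb, ya - yb))
    return feats

def movement_feature(prev: Keypoints, curr: Keypoints, dt: float) -> float:
    """Average skeleton displacement per unit time between consecutive frames."""
    common = prev.keys() & curr.keys()
    if not common or dt <= 0:
        return 0.0
    total = sum(math.hypot(curr[j][0] - prev[j][0], curr[j][1] - prev[j][1])
                for j in common)
    return total / (len(common) * dt)
```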
 Furthermore, because the monitoring importance determination unit 323 uses information on the positions of the plurality of persons at the time the interaction occurred to determine the monitoring importance, implausible events can be excluded on the basis of the positional relationship and the determination accuracy can be improved.
 The video analysis system 3 also has a storage unit 33 that holds, as monitoring reference information, the interaction type, the interaction direction, and the security level of each area of occurrence, used to determine the monitoring importance of each person who caused an interaction to be detected. Holding this information in advance and using it as appropriate enables a simple and highly accurate determination.
 In addition, by providing a search unit 44 that can search the detection history of interactions using the interaction type and/or information about the person as a search query, and thereby find the person who caused an interaction, the detected and accumulated interactions can be put to effective use.
 The output control unit 324 also changes the size of the display on the display terminal according to the monitoring importance of each person who caused the interaction. This makes the importance easy to recognize and allows the amount of information to be controlled according to the importance.
 In addition, for each person who caused an interaction, displaying the movement trajectory before and after the interaction on the screen makes it possible to confirm that person's behavior around the interaction.
 Whether movement trajectories need to be generated for the plurality of persons may be determined based on the importance. In that case, movement trajectories can be output selectively for important persons.
 Furthermore, for the plurality of persons in a detected interaction, the direction of the interaction may be made explicit by displaying on the screen information indicating whether each person is the performer of, or the person subjected to, the predetermined action related to the interaction.
 In addition, by displaying on the screen the frame image in which the interaction was detected together with the current position and/or current video of the persons involved, the content of the interaction can be shown in association with the persons' current state.
 The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to explain the present invention in an easy-to-understand manner, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted. Each of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing part or all of them as integrated circuits. Each of the configurations, functions, and the like may also be realized in software, by a processor interpreting and executing a program that implements each function. Information such as programs, tables, and files for realizing each function can be stored in a memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD.
1 ... Video surveillance system
2 ... Imaging system, 21 ... Camera unit
3 ... Video analysis system, 31 ... Video input unit, 32 ... Video processing unit, 321 ... Calculation unit, 3211 ... Person detection unit, 3212 ... Attribute determination unit, 322 ... Interaction detection unit, 323 ... Monitoring importance determination unit, 324 ... Output control unit, 33 ... Storage unit, 331 ... Monitoring reference information
4 ... Monitoring center system, 41 ... Recording unit, 42 ... Video display unit, 43 ... Management control unit, 44 ... Search unit

Claims (13)

  1.  A video analysis system that detects an event in a monitoring area using video captured of the monitoring area, comprising:
     an interaction detection unit that detects, based on the video, an interaction, which is an event caused by the involvement of a plurality of persons, and outputs the type of the interaction and the direction of the interaction indicating how each of the plurality of persons was involved with the other persons in the interaction;
     a monitoring importance determination unit that compares the type and direction of the interaction with preset monitoring reference information and determines a monitoring importance for each of the plurality of persons involved in the interaction; and
     an output control unit that outputs a detection result of the event based on the monitoring importance.
  2.  The video analysis system according to claim 1, further comprising a calculation unit that detects an image of a person included in the video and calculates an attribute feature representing an attribute of the detected person,
     wherein the monitoring importance determination unit further uses the attribute feature to determine the monitoring importance.
  3.  The video analysis system according to claim 1, wherein the interaction detection unit detects the skeleton of the person based on the video and calculates at least one of:
     a posture feature representing the posture of the person, calculated from an estimation result of the detected skeleton;
     a distance feature calculated from one or more distances between arbitrary body parts of the persons;
     a movement feature representing an amount of movement of the skeleton per unit time, calculated from a difference between preceding and succeeding image frames of the video; and
     an article feature expressing an ownership relation of an article to the person,
     and detects the type of the interaction and the direction of the interaction based on the calculated feature.
  4.  The video analysis system according to claim 1, wherein the monitoring importance determination unit determines the monitoring importance using information on the positions of the plurality of persons at the time the interaction occurred.
  5.  The video analysis system according to claim 1, further comprising a storage unit that holds, as the monitoring reference information, information on the interaction type, the interaction direction, and the security level of each area of occurrence, used to determine the monitoring importance of each person who caused an interaction to be detected.
  6.  The video analysis system according to claim 1, further comprising a search unit capable of searching for the person who caused an interaction by searching a detection history of interactions using the interaction type and/or information about the person as a search query.
  7.  The video analysis system according to claim 1, wherein the output control unit changes the size of a display on a display terminal according to the monitoring importance of each person who caused the interaction.
  8.  The video analysis system according to claim 1, wherein the output control unit displays on a screen, for each person who caused the interaction, a movement trajectory before and after the interaction in time.
  9.  The video analysis system according to claim 1, wherein the output control unit determines, based on the monitoring importance, whether movement trajectories of the plurality of persons need to be generated.
  10.  The video analysis system according to claim 1, wherein the output control unit displays on a screen, for the plurality of persons in the detected interaction, information indicating whether each person is the performer of, or the person subjected to, a predetermined action related to the interaction.
  11.  The video analysis system according to claim 1, wherein the output control unit displays on a screen the frame image in which the interaction was detected, and further displays on the screen the current position and/or current video of a person involved in the interaction.
  12.  The video analysis system according to claim 3, wherein the interaction detection unit calculates the posture feature, the distance feature, the movement feature, and the article feature, and detects the type of the interaction between the detected plurality of persons and the direction of the interaction based on the posture feature, the distance feature, the movement feature, and the article feature.
  13.  A video analysis method, executed by a computer, for detecting an event in a monitoring area using video captured of the monitoring area, comprising:
     an interaction detection step of detecting, based on the video, an interaction, which is an event caused by the involvement of a plurality of persons, and outputting the type of the interaction and the direction of the interaction indicating how each of the plurality of persons was involved with the other persons in the interaction;
     a monitoring importance determination step of comparing the type and direction of the interaction with preset monitoring reference information and determining a monitoring importance for each of the plurality of persons involved in the interaction; and
     an output control step of outputting a detection result of the event based on the monitoring importance.
PCT/JP2021/005097 2020-09-15 2021-02-10 Video analyzing system and video analyzing method WO2022059223A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020154309A JP2022048475A (en) 2020-09-15 2020-09-15 Video analyzing system and video analyzing method
JP2020-154309 2020-09-15

Publications (1)

Publication Number Publication Date
WO2022059223A1 true WO2022059223A1 (en) 2022-03-24

Family

ID=80777422

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/005097 WO2022059223A1 (en) 2020-09-15 2021-02-10 Video analyzing system and video analyzing method

Country Status (2)

Country Link
JP (1) JP2022048475A (en)
WO (1) WO2022059223A1 (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017225122A (en) * 2013-06-28 2017-12-21 日本電気株式会社 Video surveillance system, video processing apparatus, video processing method, and video processing program
JP2017028561A (en) * 2015-07-24 2017-02-02 セコム株式会社 Image monitoring system
JP2017046196A (en) * 2015-08-27 2017-03-02 キヤノン株式会社 Image information generating apparatus, image information generating method, image processing system, and program
JP2017135549A (en) * 2016-01-27 2017-08-03 セコム株式会社 Flying object monitoring system
JP2019029747A (en) * 2017-07-27 2019-02-21 セコム株式会社 Image monitoring system
JP2019193089A (en) * 2018-04-24 2019-10-31 東芝テック株式会社 Video analysis device
JP2019213116A (en) * 2018-06-07 2019-12-12 キヤノン株式会社 Image processing device, image processing method, and program

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095363A (en) * 2023-02-09 2023-05-09 西安电子科技大学 Mobile terminal short video highlight moment editing method based on key behavior recognition

Also Published As

Publication number Publication date
JP2022048475A (en) 2022-03-28

Similar Documents

Publication Publication Date Title
JP2018173914A (en) Image processing system, imaging apparatus, learning model creation method, and information processing device
JP5674406B2 (en) Surveillance system, monitoring device, autonomous mobile body, monitoring method, and monitoring program using autonomous mobile body
WO2018116488A1 (en) Analysis server, monitoring system, monitoring method, and program
WO2014050518A1 (en) Information processing device, information processing method, and information processing program
JP7355674B2 (en) Video monitoring system and video monitoring method
JP2018160219A (en) Moving route prediction device and method for predicting moving route
KR102260123B1 (en) Apparatus for Sensing Event on Region of Interest and Driving Method Thereof
WO2018179202A1 (en) Information processing device, control method, and program
JP7145622B2 (en) Information processing device, information processing device control method, subject detection system, and program
JPWO2019220589A1 (en) Video analysis device, video analysis method, and program
JP2020014194A (en) Computer system, resource allocation method, and image identification method thereof
WO2022059223A1 (en) Video analyzing system and video analyzing method
US20210334758A1 (en) System and Method of Reporting Based on Analysis of Location and Interaction Between Employees and Visitors
JP2022053126A (en) Congestion status estimation device, method, and program
JP2021196741A (en) Image processing device, image processing method and program
JP7138547B2 (en) store equipment
KR102464196B1 (en) Big data-based video surveillance system
JP2020166590A (en) Monitoring system, monitoring device, monitoring method, and monitoring program
WO2021186610A1 (en) Digital/autofile/security system, method, and program
JP7246166B2 (en) image surveillance system
JP2004187116A (en) Action monitoring system and program
US20240135716A1 (en) Congestion degree determination apparatus, control method, and non-transitory computer-readable medium
JP6905553B2 (en) Information processing device, registration method, judgment method, and program
JP6997140B2 (en) Information processing equipment, judgment method, and program
JP7256082B2 (en) Surveillance Systems, Programs and Listing Methods

Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21868911; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21868911; Country of ref document: EP; Kind code of ref document: A1)