WO2021235355A1 - Image data processing device and image data processing system - Google Patents
Image data processing device and image data processing system
- Publication number
- WO2021235355A1 (application PCT/JP2021/018436)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- person
- image data
- data processing
- map data
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
- G06T2207/10144—Varying exposure
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
Definitions
- The present invention relates to an image data processing device and an image data processing system, and in particular to an image data processing device and an image data processing system that process image data obtained from a plurality of photographing devices.
- Patent Document 1 describes a technique of photographing an area containing a plurality of spectators, acquiring information such as the facial expression of each spectator by image recognition, and recording the acquired information in association with the position of each spectator.
- Patent Document 2 describes a technique of visualizing information obtained by analyzing image data by color coding or the like and superimposing it on image data expressed in three dimensions.
- One embodiment according to the technique of the present disclosure provides an image data processing device and an image data processing system capable of acquiring information on the person attributes of a person in a specific area with high accuracy.
- (1) An image data processing device that processes image data obtained from a plurality of photographing devices whose shooting ranges at least partly overlap, the device including a processor, wherein the processor executes: for each image data, a process of detecting the faces of persons in the image represented by the image data and recognizing the person attributes of each person based on the detected face; for each image data, a process of generating map data in which the recognized person attributes are recorded in association with the positions of the persons in the image represented by the image data; a process of interpolating the person attributes of persons who are duplicated between a plurality of map data; and a process of generating composite map data by synthesizing the plurality of map data after interpolation.
- (2) The image data processing device of (1), wherein the processor further executes a process of generating a heat map from the composite map data.
- (3) The image data processing device of (2), wherein the processor further executes a process of displaying the generated heat map on a display.
- (4) The image data processing device of (2) or (3), wherein the processor further executes a process of outputting the generated heat map to the outside.
- (5) The image data processing device according to any one of (1) to (4), wherein the processor collates the person attributes of duplicated persons between a plurality of map data and interpolates the person attributes of a person missing from one map data with the person attributes of that person in another map data.
- (6) The image data processing device according to any one of (1) to (5), wherein the processor also calculates a recognition accuracy when recognizing the person attributes of a person.
- (7) The image data processing device of (6), wherein the processor interpolates the person attributes of a duplicated person by replacing person attributes recognized with relatively low accuracy with person attributes recognized with relatively high accuracy.
- (8) The image data processing device of (6), wherein the processor interpolates the person attributes of a duplicated person by calculating an average of the person attributes weighted according to the recognition accuracy and replacing the person attributes with the calculated average.
- (9) The image data processing device according to any one of (6) to (8), wherein, for a duplicated person who has a plurality of recognition accuracies, the processor adopts the person attribute information whose recognition accuracy is equal to or higher than a first threshold value when interpolating the person attributes of the duplicated person.
- (10) The image data processing device according to any one of (6) to (9), wherein the processor further executes a process of excluding, from the map data after interpolation, person attribute information whose recognition accuracy is equal to or lower than a second threshold value.
- (12) The image data processing device according to any one of (1) to (11), wherein the processor further executes a process of identifying the persons who are duplicated between a plurality of map data.
- (13) The image data processing device of (12), wherein the processor identifies the persons duplicated between a plurality of map data based on the arrangement relationship of the persons in the map data.
- (14) The image data processing device of (12), wherein the processor identifies the persons duplicated between a plurality of map data based on the person attributes of the person at each position in the map data.
- (15) The image data processing device according to any one of (1) to (14), wherein the processor recognizes at least one of gender, age, and emotion as a person attribute based on the person's face.
- (16) The image data processing device according to any one of (1) to (15), wherein the processor instructs the plurality of photographing devices to shoot the areas where their shooting ranges overlap under mutually different conditions.
- (17) An image data processing system including a plurality of photographing devices whose shooting ranges at least partly overlap and an image data processing device that processes image data obtained from the plurality of photographing devices, wherein the image data processing device includes a processor, and the processor executes: for each image data, a process of detecting the faces of persons in the image represented by the image data and recognizing the person attributes of each person based on the detected face; for each image data, a process of generating map data in which the recognized person attributes are recorded in association with the positions of the persons in the image represented by the image data; a process of interpolating the person attributes of persons who are duplicated between a plurality of map data; and a process of generating composite map data by synthesizing the plurality of map data after interpolation.
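- The accuracy-based interpolation variants listed above (adopting the more accurate value, accuracy-weighted averaging, and the first and second threshold values) can be pictured with a minimal Python sketch. The record layout (a value paired with an accuracy between 0 and 1), the function names, and the spectator labels are assumptions made for illustration and do not come from the publication.

```python
from typing import Optional

# Hypothetical observation of one attribute of one spectator from one camera:
# (value, recognition accuracy in the range 0..1).
Observation = tuple[float, float]


def most_accurate(observations: list[Observation]) -> Optional[float]:
    """Adopt the value that was recognized with the highest accuracy."""
    if not observations:
        return None
    return max(observations, key=lambda obs: obs[1])[0]


def weighted_average(observations: list[Observation]) -> Optional[float]:
    """Average the values, weighting each by its recognition accuracy."""
    total_weight = sum(accuracy for _, accuracy in observations)
    if total_weight == 0:
        return None
    return sum(value * accuracy for value, accuracy in observations) / total_weight


def at_or_above(observations: list[Observation],
                first_threshold: float) -> list[Observation]:
    """Keep only observations whose accuracy reaches the first threshold."""
    return [obs for obs in observations if obs[1] >= first_threshold]


def drop_unreliable(interpolated: dict[str, Observation],
                    second_threshold: float) -> dict[str, Observation]:
    """After interpolation, exclude entries at or below the second threshold."""
    return {person: obs for person, obs in interpolated.items()
            if obs[1] > second_threshold}
```

- In this sketch the two thresholds are independent parameters; how they would be chosen in practice is not stated in this part of the text.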
- A diagram showing the schematic configuration of an image data processing system
- A diagram showing an example of division of the viewing area
- A conceptual diagram of area photography
- A block diagram showing an example of the hardware configuration of an information processing device
- A block diagram of the functions realized by the image data processing device
- A block diagram of the functions of the map data processing unit
- A conceptual diagram of face detection
- A block diagram of the functions of the map data processing unit
- A conceptual diagram of map data generation
- A diagram showing an example of map data
- A diagram showing an example of a database
- A diagram showing an example of a heat map of the degree of excitement
- A diagram showing an example of the display form of the degree of excitement
- Various analyses become possible by measuring and collecting the emotional information of all spectators in the venue throughout the entire duration of an event. For example, at a concert, the degree of excitement of the audience for each song can be analyzed from the collected emotional information of the entire audience. In addition, by recording the emotional information of each spectator in association with that spectator's position, the distribution of excitement within the venue can be analyzed. Furthermore, by locating the center of that distribution, the spectators at the core of the excitement can be identified.
- image recognition technology can be used to measure emotional information of each audience. That is, the emotions of each spectator are estimated by image recognition from the images taken of each spectator.
- Emotion estimation by image recognition relies mainly on analyzing the facial expressions detected from images.
- However, a face cannot always be detected in an image. The face may be hidden by an obstacle (e.g., a cheering flag, another spectator crossing in front, the hands of the spectator or of nearby spectators, food and drink, or a camera), the spectator may turn away, or flare and/or ghosting (caused by sunlight, reflection, flash, etc.) may occur in the image.
- The present embodiment provides a system capable of measuring the emotional information of all spectators in the venue with high accuracy throughout the entire duration of the event.
- FIG. 1 is a diagram showing a schematic configuration of the image data processing system of the present embodiment.
- The image data processing system 1 of the present embodiment includes a spectator photographing device 10 that photographs all spectators in an event venue, and an image data processing device 100 that processes the image data captured by the spectator photographing device 10.
- the event venue 2 has a stage 4 where the performer 3 shows the show, and a viewing area V where the audience P watches the show.
- Seats 5 are regularly arranged in the viewing area V. Audience P sits in seat 5 and watches the show. The position of the seat 5 is fixed.
- the spectator photographing device 10 is composed of a plurality of cameras C.
- the camera C is a digital camera having a moving image shooting function.
- the camera C is an example of a photographing device.
- Audience P is an example of a person photographed by a photographing device.
- the spectator photographing device 10 divides the viewing area V into a plurality of areas, and photographs each area with a plurality of cameras C from multiple directions.
- FIG. 2 is a diagram showing an example of division of the viewing area. As shown in the figure, in this example, the viewing area V is divided into six areas V1 to V6. Each area V1 to V6 is individually photographed by a plurality of cameras from multiple directions.
- FIG. 3 is a conceptual diagram of area photography. The figure shows an example in the case of photographing the area V1.
- one area V1 is photographed by six cameras C1 to C6.
- Each camera C1 to C6 captures the area V1 from a predetermined position. That is, the region V1 is photographed at a fixed point position.
- Each camera C1 to C6 is installed on a remote control pan head (electric pan head), for example, and is configured so that the shooting direction can be adjusted.
- the first camera C1 photographs the area V1 from the front.
- the second camera C2 captures the area V1 from diagonally above the front.
- the third camera C3 captures the area V1 from the right side.
- the fourth camera C4 captures the area V1 from the diagonally upper right side.
- the fifth camera C5 captures the area V1 from the left side.
- the sixth camera C6 captures the area V1 from the diagonally upper left side.
- Each camera C1 to C6 shoots at the same frame rate and shoots in synchronization.
- the shooting ranges R1 to R6 of each camera C1 to C6 are set to cover the area V1. Therefore, the shooting ranges R1 to R6 of the cameras C1 to C6 overlap each other. Further, the cameras C1 to C6 are set so that each spectator is photographed in substantially the same size in the captured image.
- each camera C is required to be able to recognize at least the facial expressions of all the spectators in the area to be photographed. That is, it is required to have a resolution capable of facial expression analysis by image recognition. Therefore, it is preferable to use a camera C having a high resolution as the camera C constituting the audience photographing device 10.
- the image data taken by each camera C is transmitted to the image data processing device 100.
- the image data transmitted from each camera C includes identification information of each camera C, information on shooting conditions of each camera, and the like.
- the information on the shooting conditions of each camera C includes information on the installation position of the camera, information on the shooting direction, information on the shooting date and time, and the like.
- the image data processing device 100 processes image data transmitted from each camera C of the spectator photographing device 10, and measures emotional information of each spectator in the image for each image data. Further, the image data processing device 100 generates map data for each image data, which is recorded by associating the measured emotion information of each spectator with the information of the position of each spectator in the image. Further, the image data processing device 100 mutually interpolates the map data generated from each image data. Further, the image data processing device 100 synthesizes the interpolated map data and generates synthetic map data representing the map data of the entire venue. Image data processing is performed frame by frame.
- the image data processing device 100 performs a process of visualizing the composite map data as needed. Specifically, a heat map is generated from the synthetic map data.
- FIG. 4 is a block diagram showing an example of the hardware configuration of the information processing device.
- The image data processing device 100 is a computer equipped with a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an HDD (Hard Disk Drive) 104, an operation unit 105, a display unit 106, an input/output interface (I/F) 107, and the like.
- The CPU 101 is an example of a processor.
- The operation unit 105 is composed of, for example, a keyboard, a mouse, a touch panel, and the like.
- The display unit 106 is composed of, for example, a liquid crystal display (LCD), an organic EL display (organic electroluminescence display, also called an organic light-emitting diode display), or the like.
- the image data taken by each camera C of the spectator photographing apparatus 10 is input to the image data processing apparatus 100 via the input / output interface 107.
- FIG. 5 is a block diagram of the functions realized by the image data processing device.
- The image data processing device 100 mainly has the functions of a shooting control unit 110, a map data processing unit 120, a map data interpolation unit 130, a map data synthesis unit 140, a data processing unit 150, a heat map generation unit 160, a display control unit 170, an output control unit 180, and the like. The function of each unit is realized by the CPU 101 executing a predetermined program.
- The program executed by the CPU 101 is stored in the ROM 102 or the HDD 104.
- The program may instead be stored in a flash memory, an SSD (Solid State Drive), or the like.
- the shooting control unit 110 controls the operation of the spectator shooting device 10 in response to an operation input from the operation unit 105.
- Each camera C constituting the spectator photographing device 10 shoots according to instructions from the shooting control unit 110.
- the control performed by the shooting control unit 110 includes control of the exposure of each camera C, control of the shooting direction, and the like.
- the map data processing unit 120 generates map data from the image data taken by each camera C of the spectator photographing device 10. Map data is generated for each image data.
- FIG. 6 is a block diagram of the functions of the map data processing unit.
- the map data processing unit 120 mainly has functions such as a shooting information acquisition unit 120A, a face detection unit 120B, a person attribute recognition unit 120C, and a map generation unit 120D.
- the shooting information acquisition unit 120A acquires shooting information from the image data. Specifically, the camera identification information and the camera shooting condition information included in the image data are acquired. By acquiring this information, it is possible to identify the camera that captured the image data, and it is possible to identify which area was captured from which position and in which direction. In addition, the date and time of shooting can be specified. The specified information is output to the map generation unit 120D.
- the face detection unit 120B analyzes the image data and detects the face of a person (audience) existing in the image represented by the image data.
- FIG. 7 is a conceptual diagram of face detection.
- The face detection unit 120B detects each face and specifies its position.
- the position of the face is specified by the coordinate position (x, y) in the image Im.
- the face detection unit 120B surrounds the detected face with a rectangular frame F, obtains the center coordinates of the frame F, and specifies the position of the face.
- Face detection is performed, for example, by scanning the image Im from the upper left to the lower right in order.
- the detected faces are numbered in the order of detection.
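- The face position and numbering rules above can be sketched in Python as follows. The `Frame` type, its field names, and the use of a sort on (y, x) to approximate a scan from the upper left to the lower right are assumptions for illustration; the text does not specify how the scan is implemented.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Rectangular frame F around a detected face, in image coordinates."""
    left: float
    top: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        # The face position (x, y) is taken as the center of the frame.
        return (self.left + self.width / 2, self.top + self.height / 2)


def number_faces(frames: list[Frame]) -> dict[int, tuple[float, float]]:
    """Number faces from 1 in scan order: top to bottom, then left to right."""
    ordered = sorted(frames, key=lambda f: (f.center()[1], f.center()[0]))
    return {number: f.center() for number, f in enumerate(ordered, start=1)}
```

- A real detector would return frames with slightly different heights within one seat row, so a practical version would probably group faces into rows before sorting.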
- the person attribute recognition unit 120C recognizes the person attribute of the person based on the image of the face of the person (audience) detected by the face detection unit 120B.
- FIG. 8 is a conceptual diagram of the person attribute recognition process by the person attribute recognition unit.
- Age, gender, and emotion are recognized as person attributes.
- A known technique can be adopted for recognizing each attribute.
- For example, a recognition method using an image recognition model generated by machine learning, deep learning, or the like can be adopted.
- Emotions are recognized from facial expressions: the expression is classified into seven types, “true face”, “joy”, “anger”, “disgust”, “surprise”, “fear”, and “sadness”, and the degree of each is calculated.
- The expressions of “joy”, “anger”, “disgust”, “surprise”, “fear”, and “sadness” correspond to the emotions of joy, anger, disgust, surprise, fear, and sadness, respectively.
- A “true face” (a neutral, straight face) is expressionless and corresponds to a state in which there is no specific emotion.
- As the recognition result, a score that quantifies the degree of each emotion (emotion-likeness) is output.
- The emotion score is output, for example, with the maximum value set to 100.
- The scores are output so that the scores of all emotions total 100.
- For age, an age group may be recognized instead of a specific age, for example, under 10, teens, 20s, and so on.
- The age is recognized from the image of the face.
- The gender is likewise recognized from the image of the face.
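- As a minimal sketch of the score and age conventions described above, the following Python scales raw emotion degrees so that the seven scores total 100 and maps an estimated age to an age group. The function names, the idea of starting from unnormalized "raw" degrees, and the label "under 10" are assumptions for illustration.

```python
EMOTIONS = ("true face", "joy", "anger", "disgust", "surprise", "fear", "sadness")


def normalize_scores(raw: dict[str, float]) -> dict[str, float]:
    """Scale non-negative raw degrees so that the seven scores total 100."""
    total = sum(raw.get(emotion, 0.0) for emotion in EMOTIONS)
    if total <= 0:
        raise ValueError("at least one emotion degree must be positive")
    return {emotion: 100.0 * raw.get(emotion, 0.0) / total for emotion in EMOTIONS}


def age_group(age: int) -> str:
    """Map an estimated age to a decade label instead of a specific age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 10:
        return "under 10"
    if age < 20:
        return "teens"
    return f"{age // 10 * 10}s"
```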
- the map generation unit 120D generates map data based on the shooting information acquired by the shooting information acquisition unit 120A and the person attribute recognized by the person attribute recognition unit 120C.
- The map data records the person attribute information of each spectator in association with the position information of each spectator in the image.
- the position of the spectator is specified, for example, by the coordinate position of the face of the spectator.
- Map data is generated for each image data. Further, the shooting information of the image data that is the source of the map data is added to the map data. That is, the identification information of the camera that captured the image data, the information on the imaging conditions of the camera, and the like are added. This makes it possible to specify in which area the map data is. In addition, it is possible to specify the time point of the map data.
- FIG. 9 is a conceptual diagram of map data generation. The figure shows an example in which map data is generated from image data obtained by photographing a region V1 with a first camera C1.
- the map data is associated with the position of each spectator in the image, and the information of the person attribute of the spectator is recorded.
- FIG. 10 is a diagram showing an example of map data.
- map data is generated for each image data.
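- One possible in-memory shape for a single map data record is sketched below in Python. The class and field names, the string timestamp, and the use of `None` to mark a spectator whose attributes could not be obtained are assumptions for illustration; the publication's own layout is the one shown in its FIG. 10.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersonAttributes:
    gender: str
    age_group: str
    emotion_scores: dict[str, float]


@dataclass
class MapData:
    camera_id: str   # identification information of the source camera, e.g. "C1"
    area_id: str     # area covered by the camera, e.g. "V1"
    shot_at: str     # shooting date and time of the frame
    # Face position (x, y) in the image -> attributes, or None when the face
    # was not detected or its attributes could not be recognized.
    people: dict[tuple[int, int], Optional[PersonAttributes]] = field(default_factory=dict)

    def missing_positions(self) -> list[tuple[int, int]]:
        """Positions whose person attributes still need to be interpolated."""
        return [pos for pos, attrs in self.people.items() if attrs is None]
```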
- The map data generated by the map generation unit 120D is recorded in the database 200.
- FIG. 11 is a diagram showing an example of a database.
- the map data is associated with the information of the cameras C1 to C6 that are the generation sources, and is recorded in the database in chronological order. Further, the information of each camera C1 to C6 is associated with the information of the target areas V1 to V6 and recorded in the map data.
- the database 200 manages map data generated from all cameras on an event-by-event basis.
- The database 200 also records the map data interpolated by the map data interpolation unit 130, the composite map data generated from the interpolated map data, the data obtained by processing the composite map data, the heat maps generated from that processed data, and the like.
- the database 200 is stored in, for example, the HDD 104.
- The map data interpolation unit 130 interpolates the person attribute information of each spectator between map data that contain person attribute information for the same spectators.
- Map data whose source image data have overlapping shooting ranges contain person attribute information for the same spectators within the overlapping range.
- Each map data does not necessarily contain the person attributes of all spectators, because in the source image data a face may go undetected, or the person attributes may not be recognizable from the detected face.
- Overlapping areas are photographed by a plurality of cameras from multiple directions. Therefore, even if one camera cannot capture a spectator's face, another camera may be able to.
- For this reason, the person attribute information of each spectator is interpolated between map data that contain person attribute information for the same spectators. As a result, highly accurate map data is generated.
- The map data interpolation process performed by the map data interpolation unit 130 is described below.
- FIGS. 12 and 13 are diagrams showing an example of a face detection result.
- FIG. 12 shows an example of a case where a face is detected from an image obtained when the region V1 is photographed by the first camera C1.
- FIG. 13 shows an example of a case where a face is detected from an image obtained when the region V1 is photographed by the second camera C2.
- the white circles indicate the positions of the spectators whose faces could be detected in the image.
- the black circles indicate the positions of the spectators whose faces could not be detected in the image.
- the map data generated from the image taken by the first camera C1 is referred to as the first map data
- the map data generated from the image taken by the second camera C2 is referred to as the second map data.
- the faces of the spectators P34, P55, P84 and P89 could not be detected from the image taken by the first camera C1. Therefore, in this case, in the first map data, the information on the person attributes of the spectators P34, P55, P84 and P89 is missing.
- the faces of the spectators P34, P55, P84 and P89 can be detected from the image taken by the second camera C2. Therefore, the second map data contains information on the personal attributes of these spectators P34, P55, P84 and P89.
- the information of the spectators missing in the first map data can be interpolated by the second map data. That is, the information on the person attributes of the spectators P34, P55, P84 and P89 missing in the first map data can be interpolated by the information in the second map data.
- the faces of the spectators P29, P47, P62 and P86 could not be detected from the image taken by the second camera C2. Therefore, in this case, in the second map data, the information on the person attributes of the spectators P29, P47, P62 and P86 is missing.
- the faces of the spectators P29, P47, P62 and P86 can be detected from the image taken by the first camera C1. Therefore, the first map data contains information on the personal attributes of these spectators P29, P47, P62 and P86. In this case, the information of the spectator missing in the second map data can be interpolated by the first map data. That is, the information on the person attributes of the spectators P29, P47, P62, and P86 that are missing in the second map data can be interpolated by the information in the first map data.
- As described above, map data generated from images with overlapping areas contain person attribute information for the same spectators in the overlapping area. Therefore, where information is missing from one map data, the map data can interpolate each other.
- That is, the person attribute information that is missing from either of the two map data is interpolated from the other.
- the data is collated between the map data having the same spectator's personal attribute information in duplicate, and the missing spectator's personal attribute information is specified in each map data.
- Map data with missing spectator person attribute information is interpolated with the corresponding spectator person attribute information in other map data.
- data matching is performed based on the placement relationship of each spectator. That is, overlapping spectators are identified from the arrangement pattern of each spectator in the image. In addition, data matching can be performed based on the information of the person attributes of the spectators at each position.
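- The mutual interpolation described above can be sketched in Python as follows. The sketch assumes that the collation step has already matched spectators across the two map data and given them shared labels such as "P34"; the type names and the attribute values in the usage note are hypothetical.

```python
from typing import Optional

Attributes = dict[str, float]
# Spectator label -> attributes, or None when they are missing from this map.
MatchedMap = dict[str, Optional[Attributes]]


def interpolate_pair(first: MatchedMap,
                     second: MatchedMap) -> tuple[MatchedMap, MatchedMap]:
    """Fill attributes missing in one map with those recorded in the other."""

    def fill(target: MatchedMap, source: MatchedMap) -> MatchedMap:
        filled = dict(target)  # leave the input maps unchanged
        for person, attrs in target.items():
            if attrs is None and source.get(person) is not None:
                filled[person] = source[person]
        return filled

    return fill(first, second), fill(second, first)
```

- For instance, with `first = {"P29": {"joy": 60.0}, "P34": None}` and `second = {"P29": None, "P34": {"joy": 75.0}}`, both returned maps contain attributes for P29 and P34. A spectator missing from both maps stays missing.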
- the map data subjected to the interpolation processing by the map data interpolation unit 130 is recorded in the database 200 (see FIG. 11).
- the map data synthesis unit 140 synthesizes the interpolated map data and generates one composite map data.
- This composite map data is map data in which information on the personal attributes of all spectators in the venue is recorded in association with the positions of the faces of each spectator.
- the composite map data is generated from the map data at the same shooting timing. Therefore, the composite map data is sequentially generated in chronological order.
- Camera information is used during synthesis. That is, the map data is generated from the images taken by the cameras, and each camera shoots a predetermined area under predetermined conditions (position and direction), so the map data can easily be synthesized by using that information.
- Synthesis can also be performed using the image data that is the source of the map data. That is, since the image data and the map data correspond to each other, the map data can also be combined by synthesizing the image data.
- For synthesizing the image data, a method such as panoramic compositing can be adopted, for example.
- a plurality of map data are generated, and one composite map data is generated from the generated plurality of map data.
- This makes it possible to easily generate one map data that records information on the personal attributes of all spectators, even in a large event venue. Further, even in a small event venue, the map data of the entire venue can be generated more efficiently than in the case where the entire venue is photographed with one camera and the map data is generated. That is, since the processing is divided into a plurality of areas, distributed processing becomes possible, and map data of the entire venue can be efficiently generated.
- the generated synthetic map data is associated with the map data of the generation source and recorded in the database 200 (see FIG. 11).
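- A minimal Python sketch of synthesis from camera information follows. It assumes, purely for illustration, that the interpolated map data of each area has already been reduced to one map in an area-local coordinate system and that the known camera setup yields a fixed (dx, dy) offset of each area within the venue; neither the function name nor this offset scheme comes from the publication.

```python
Position = tuple[int, int]


def synthesize(area_maps: dict[str, dict[Position, dict]],
               area_offsets: dict[str, Position]) -> dict[Position, dict]:
    """Merge per-area maps into one venue-wide map using per-area offsets."""
    composite: dict[Position, dict] = {}
    for area_id, people in area_maps.items():
        dx, dy = area_offsets[area_id]
        for (x, y), attrs in people.items():
            # Shift the area-local position into venue coordinates.
            composite[(x + dx, y + dy)] = attrs
    return composite
```

- Because each area is handled independently before this merge, the per-area work can be distributed, which is the efficiency argument made above.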
- the data processing unit 150 processes the synthetic map data to generate data regarding each spectator in the venue. What kind of data is generated depends on the user's settings. For example, it may generate data showing the emotional state of each audience, data showing the emotional amount of a specific emotion, or data showing the degree of excitement.
- The emotional state data is obtained by extracting the emotion with the highest score from the emotion recognition result. For example, if the emotion recognition result (score) of a certain spectator is true face: 12, joy: 75, anger: 0, disgust: 0, surprise: 10, fear: 3, sadness: 0, the emotional state is joy.
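- Extracting the emotional state amounts to taking the highest-scoring emotion, as in this short Python sketch. The function name is hypothetical; the scores are the ones given in the example above.

```python
def emotional_state(scores: dict[str, float]) -> str:
    """Return the emotion with the highest score."""
    return max(scores, key=scores.get)


scores = {"true face": 12, "joy": 75, "anger": 0, "disgust": 0,
          "surprise": 10, "fear": 3, "sadness": 0}
print(emotional_state(scores))  # joy
```

- When two emotions share the highest score, `max` returns the one that appears first in the dictionary; the text does not say how ties should be resolved.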
- The data showing the amount of a specific emotion is obtained by quantifying the level of that emotion, the magnitude of its amplitude, and the like.
- Emotion level data is obtained from the emotion score.
- For example, joy emotion level data is obtained from the joy score.
- Joy and surprise emotion level data is obtained by summing the joy and surprise scores.
- The emotion level data may also be calculated by giving a weight to each emotion. That is, the score of each emotion may be multiplied by a predetermined coefficient before the sum is calculated.
- Emotion amplitude data is obtained, for example, by calculating the difference in emotion scores at predetermined time intervals.
- For example, the amplitude of joy is obtained by calculating the difference in joy scores at predetermined time intervals.
- The amplitude of joy and sadness is obtained by calculating the difference between the joy score and the sadness score at predetermined time intervals (for example, the difference between the joy score at time t and the sadness score at time t + Δt).
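
The emotion level and emotion amplitude calculations can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the data layout (one score dictionary per sampling time), and the use of the absolute difference as the amplitude are choices made here, since the text does not fix the sign convention.

```python
def emotion_level(scores, weights):
    """Weighted sum of the emotion scores named in `weights`."""
    return sum(weight * scores[name] for name, weight in weights.items())


def emotion_amplitude(series, first, second=None):
    """Absolute score differences between samples one interval apart.

    `series` holds one score dict per sampling time. Each result compares
    `first` at time t with `second` at time t + dt (same emotion by default).
    Taking the absolute value is an assumption, not stated in the text.
    """
    second = first if second is None else second
    return [abs(earlier[first] - later[second])
            for earlier, later in zip(series, series[1:])]


# Hypothetical scores of one spectator at three consecutive sampling times.
samples = [
    {"joy": 40, "surprise": 20, "sadness": 10},
    {"joy": 75, "surprise": 10, "sadness": 5},
    {"joy": 60, "surprise": 5, "sadness": 30},
]
print(emotion_level(samples[1], {"joy": 1, "surprise": 1}))  # 85
print(emotion_amplitude(samples, "joy"))                     # [35, 15]
print(emotion_amplitude(samples, "joy", "sadness"))          # [35, 45]
```

Passing unequal weights to `emotion_level` gives the weighted variant mentioned in the text.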
- Which emotion amount is to be detected depends on the type of event. For example, at a concert, the magnitude of the emotional level of joy is thought to lead to the satisfaction of the audience. Therefore, in the case of a concert, the emotional level of joy is detected.
- For other types of events, the magnitude of the emotional amplitude (for example, the magnitude of the emotional amplitude of joy and sadness) may be the detection target.
- The degree of excitement is a numerical expression of how excited each spectator is.
- The degree of excitement is calculated from the emotion scores using a predetermined calculation formula.
- The formula Fn is, for example, a weighted sum of the emotion scores, Fn = (a × S1) + (b × S2) + (c × S3) + (d × S4) + (e × S5) + (f × S6) + (g × S7), where S1 is the score for the neutral emotion, S2 for joy, S3 for anger, S4 for disgust, S5 for surprise, S6 for fear, and S7 for sadness.
- a is the coefficient for the neutral emotion.
- b is the coefficient for joy.
- c is the coefficient for anger.
- d is the coefficient for disgust.
- e is the coefficient for surprise.
- f is the coefficient for fear.
- g is the coefficient for sadness. For example, in the case of a concert, a high weight is given to the coefficient b for joy.
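
A minimal sketch of the degree-of-excitement calculation follows, assuming that the formula is a weighted sum of the seven emotion scores with one coefficient per emotion. The function name, the emotion labels, and the concert coefficients are hypothetical values chosen for illustration.

```python
EMOTIONS = ("neutral", "joy", "anger", "disgust", "surprise", "fear", "sadness")


def excitement(scores, coefficients):
    """Weighted sum of the seven emotion scores (assumed form of the formula)."""
    return sum(coefficients[name] * scores[name] for name in EMOTIONS)


# Hypothetical coefficients for a concert: joy is weighted most heavily.
concert_coefficients = {
    "neutral": 0.0, "joy": 1.0, "anger": 0.0, "disgust": 0.0,
    "surprise": 0.5, "fear": 0.0, "sadness": 0.0,
}
spectator_scores = {
    "neutral": 12, "joy": 75, "anger": 0, "disgust": 0,
    "surprise": 10, "fear": 3, "sadness": 0,
}
print(excitement(spectator_scores, concert_coefficients))  # 80.0
```

Keeping the coefficients in a separate dictionary makes it easy to swap in a different weighting for a different type of event.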
- Each of the above data is an example of the data generated by the data processing unit 150.
- The data processing unit 150 generates data based on an instruction from the user input via the operation unit 105. For example, the user selects from items prepared in advance to specify the data to be generated.
- The data processed by the data processing unit 150 (processed data) is associated with the composite map data from which it was generated and recorded in the database 200 (see FIG. 11).
- the heat map generation unit 160 generates a heat map from the data processed by the data processing unit 150.
- the heat map generated by the image data processing device 100 of the present embodiment displays the data of the spectators at each position in the venue in colors or shades of color.
- the emotion amount heat map is generated by displaying the emotion amount value of the spectator at each position in color or shade of color.
- the heat map of the degree of excitement is generated by displaying the value of the degree of excitement of the audience at each position in color or shade of color.
- FIG. 14 is a diagram showing an example of a heat map of the degree of excitement.
- In the figure, the heat map is generated using the seating chart of the event venue.
- The seating chart is a plan showing the arrangement of seats at the event venue.
- The position of each seat corresponds to the position of each spectator.
- The position of each seat in the seating chart can correspond one-to-one with the coordinate position of each spectator in the composite map data. Therefore, a heat map of the degree of excitement can be generated by displaying the value of the degree of excitement of each spectator at the position of each seat in color or shade of color.
- FIG. 15 is a diagram showing an example of a display form of the degree of excitement.
- The figure shows an example of expressing the degree of excitement in shades. The range of values that the degree of excitement can take is divided into multiple categories, and the display density is determined for each category.
- The figure shows an example in which the degree of excitement is calculated as a numerical value from 1 to 100 and is divided into 10 categories for display. In this example, the displayed density increases as the degree of excitement increases.
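
The categorization used for the display can be sketched as follows. This is a sketch under assumptions: the equal-width categories, the function names, the seat identifiers, and the mapping of a category to a density between 0 and 1 are choices made here, since the text only states that 10 categories are used and that density increases with excitement.

```python
def excitement_category(value, low=1, high=100, categories=10):
    """Map a degree of excitement in [low, high] to a category 1..categories.

    Equal-width categories are an assumption for illustration.
    """
    if not low <= value <= high:
        raise ValueError("value is outside the expected range")
    width = (high - low + 1) / categories
    index = int((value - low) // width)
    return min(index, categories - 1) + 1


def display_density(category, categories=10):
    """Density in (0, 1]; a higher category is drawn darker."""
    return category / categories


# Hypothetical degrees of excitement keyed by seat identifier.
seats = {"A-1": 8, "A-2": 55, "A-3": 100}
heat_map = {seat: display_density(excitement_category(value))
            for seat, value in seats.items()}
print(heat_map)  # {'A-1': 0.1, 'A-2': 0.6, 'A-3': 1.0}
```

Because each seat corresponds to one spectator position, the resulting dictionary can be drawn directly onto a seating chart.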
- the heat map data generated by the heat map generation unit 160 is associated with the data of the generation source and recorded in the database 200 (see FIG. 11).
- the display control unit 170 displays the data generated by the data processing unit 150 on the display unit 106 in response to a display instruction from the user input via the operation unit 105. Further, the heat map generated by the heat map generation unit 160 is displayed on the display unit 106.
- the output control unit 180 outputs the data generated by the data processing unit 150 to the external device 300 in response to an output instruction from the user input via the operation unit 105. Further, the heat map data generated by the heat map generation unit 160 is output to the external device 300.
- FIG. 16 is a flowchart showing a procedure for processing image data in the image data processing system of the present embodiment.
- Each camera C of the audience photographing device 10 photographs the areas V1 to V6 in the venue (step S1).
- Each of the areas V1 to V6 is photographed from multiple directions by a plurality of cameras.
- The image data processing device 100 inputs the image data taken by each camera C (step S2).
- The image data of each camera C is input collectively after the event is completed. Alternatively, the device can be configured to input the image data in real time.
- the image data processing device 100 individually processes the input image data of each camera C, and detects the face of each spectator in the image from the image represented by each image data (step S3).
- the image data processing device 100 recognizes the person attributes of each spectator from the detected face (step S4).
- the image data processing device 100 generates map data for each image data based on the recognition result of the person attribute of each spectator in each image data (step S5).
- The map data is generated by recording the person attribute information of each spectator in association with the position of that spectator in the image.
- The map data generated here does not necessarily record the person attributes of all spectators.
- Because a face may be hidden by an obstacle, it is not always possible to recognize the person attributes of every spectator.
- The image data processing device 100 generates map data from each image data, and then interpolates data between the map data having overlapping areas (step S6). That is, the person attribute information of a spectator that is missing in one map data is interpolated using the information recorded in the other map data. As a result, missing data in the map data can be suppressed.
- the image data processing device 100 synthesizes the interpolated map data and generates synthetic map data representing the map data of the entire venue (step S7).
- the image data processing device 100 processes the composite map data and generates the data instructed by the user (step S8). For example, data on the amount of emotions of each audience, data on the degree of excitement, and the like are generated.
- the image data processing device 100 generates a heat map from the generated data in response to an instruction from the user (step S9).
- the image data processing device 100 displays the generated heat map on the display unit 106 or outputs it to the external device 300 in response to an instruction from the user (step S10).
- In this way, map data including information on the person attributes of all the spectators in the venue is generated by synthesis.
- As a result, accurate map data can be generated efficiently even for a large venue.
- The processing load can also be reduced compared with generating map data for all spectators at once.
- Because each map data shares the person attribute information of at least some spectators with another map data, information missing in one map data can be interpolated from the other. As a result, the person attribute information of each spectator can be collected in each map data without omission.
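
The interpolation and synthesis steps can be sketched as follows. This is a simplified illustration, not the embodiment's implementation: it assumes that every map data is a dictionary keyed by a spectator position expressed in a shared venue coordinate system, and that a missing recognition result is stored as `None`. The function names are hypothetical.

```python
def interpolate(map_a, map_b):
    """Fill entries missing in one map from the other, for shared positions."""
    filled_a, filled_b = dict(map_a), dict(map_b)
    for position in map_a.keys() & map_b.keys():
        if map_a[position] is None and map_b[position] is not None:
            filled_a[position] = map_b[position]
        elif map_b[position] is None and map_a[position] is not None:
            filled_b[position] = map_a[position]
    return filled_a, filled_b


def synthesize(maps):
    """Merge interpolated maps into one composite map of the whole venue."""
    composite = {}
    for map_data in maps:
        for position, attributes in map_data.items():
            if composite.get(position) is None:
                composite[position] = attributes
    return composite


# Two hypothetical map data whose shooting ranges overlap at position (0, 1).
map_a = {(0, 0): {"emotion": "joy"}, (0, 1): None}
map_b = {(0, 1): {"emotion": "surprise"}, (0, 2): {"emotion": "neutral"}}

filled_a, filled_b = interpolate(map_a, map_b)
print(filled_a[(0, 1)])                        # {'emotion': 'surprise'}
print(len(synthesize([filled_a, filled_b])))   # 3
```

In practice, the positions in each image would first have to be converted to a common coordinate system; that step is outside the scope of this sketch.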
- In the above embodiment, the viewing area of the venue is divided into a plurality of areas, and each area is shot from multiple directions with a plurality of cameras.
- However, the method of shooting the audience in the venue is not limited to this. It suffices if each spectator is photographed by at least two cameras. This allows interpolation.
- FIG. 17 is a diagram showing another example of a method of photographing an audience.
- the frames W1 to W3 indicate the shooting range by the camera.
- The shooting range of each camera is set so that it overlaps that of another camera in at least part of the area Vc. Further, in the area Vc, the shooting ranges are set so that each spectator is photographed by at least two cameras.
- Each camera shoots the areas where the shooting ranges overlap under different conditions.
- For example, the regions where the shooting ranges overlap are shot from different directions. Then, even if a spectator's face is hidden by an obstacle or the like in the image taken with one camera, it can be captured in the image taken with another camera.
- Alternatively, the overlapping areas can be shot with different exposures. Then, even if a face cannot be detected in the image taken with one camera because of flare and/or ghosts (caused by sunlight, reflection, flash, etc.), it can be detected from the image taken with another camera.
- The exposure can be adjusted, for example, by changing the aperture value, shutter speed, or sensitivity, or by using an optical filter such as an ND filter (Neutral Density Filter).
- The moving image includes the case where still images are taken continuously at predetermined time intervals and processed.
- It also includes processing of interval shooting, time-lapse video, and the like.
- Person attributes: In the above embodiment, the case of recognizing the age, gender, and emotion of each spectator has been described as an example of the person attributes recognized from the face, but the person attributes recognized from the face are not limited to these.
- For example, personal identification information that identifies an individual spectator can be included. That is, the result of recognizing the individual can be included.
- the recognition of the personal identification information is performed, for example, by using a face recognition database in which the face image and the personal identification information are stored in association with each other. Specifically, the detected face image is collated with the face image stored in the face recognition database, and the personal identification information corresponding to the matched face image is acquired from the face recognition database. Information such as the age and gender of the audience can be associated with the personal identification information. Therefore, when recognizing personal identification information, it is not necessary to recognize age, gender, and the like.
- Interpolation is performed between map data having information on the person attributes of the same person, that is, map data generated from image data having overlapping shooting ranges.
- Interpolation of map data is based on interpolating the person attribute information of a person who is missing in one map data with other map data. Moreover, even if the information is not missing, the person attribute information of each person can be interpolated as follows.
- The person attribute recognition unit 120C calculates the recognition accuracy together with the recognition of the person attribute.
- As the algorithm for calculating the recognition accuracy (also referred to as reliability, evaluation value, etc.), an algorithm generally known in image recognition can be adopted.
- (B) Taking the average of the person attributes of the same person between the map data
- The person attributes of a person are obtained by calculating the average of the person attributes of that person among the map data having information on the person attributes of the same person. In this case, the information in each map data is replaced with the obtained average.
- Each of the above methods can also be adopted when interpolating the person attribute information of a person who is missing in one map data with another map data. That is, they can be adopted even when there are a plurality of map data having information on the person attributes of the person who is missing in one map data.
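
The two ways of combining the person attributes of the same person can be sketched as follows. The averaging function follows method (B) above. The accuracy-based function assumes that the other method adopts the observation with the highest recognition accuracy, which the text implies but does not state explicitly. All names and the data layout (a list of score and accuracy pairs) are hypothetical.

```python
from statistics import mean


def merge_by_accuracy(observations):
    """Adopt the scores of the observation with the highest accuracy.

    `observations` is a list of (scores, accuracy) pairs for one person,
    one pair per map data (an assumed layout).
    """
    return max(observations, key=lambda item: item[1])[0]


def merge_by_average(observations):
    """Average each emotion score of one person over all map data."""
    names = observations[0][0].keys()
    return {name: mean(scores[name] for scores, _ in observations)
            for name in names}


# The same spectator as recorded in two hypothetical map data.
observations = [
    ({"joy": 70, "sadness": 10}, 0.6),
    ({"joy": 80, "sadness": 20}, 0.9),
]
print(merge_by_accuracy(observations))  # {'joy': 80, 'sadness': 20}
print(merge_by_average(observations))   # {'joy': 75, 'sadness': 15}
```

Either result would then replace the information recorded for that person in each map data.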
- the threshold value can be set by the user. This threshold is an example of the first threshold.
- The user may arbitrarily select the method for interpolating the map data.
- For example, a method of displaying the executable interpolation methods on the display unit 106 and letting the user select one via the operation unit 105 can be adopted.
- Alternatively, the image data processing device 100 may automatically determine and select the optimum interpolation method.
- The following methods can be considered for automatically determining the interpolation method.
- The user specifies the time, the area, and the person via the operation unit 105.
- This method is effective when recognizing emotions as a person attribute. That is, the missing emotional information is interpolated with the information of a person having a similar emotional change. This is because people with similar emotional changes are thought to react similarly.
- the interpolated map data can be further modified and used. For example, people with low recognition accuracy of person attributes are excluded from the map data throughout the entire holding time of the event. As a result, the reliability of the map data after completion can be improved. This process is performed, for example, as follows.
- the duplicated person can be identified by using the position information of each person recorded in the map data. That is, since the arrangement relationship (arrangement pattern) of each person can be specified from the position information of each person recorded in each map data, overlapping persons can be specified from the arrangement relationship. Similarly, the duplicated person can be identified from the information of the person attribute at each position. That is, overlapping persons can be identified from the pattern of person attributes.
- In the above embodiment, the heat map is generated using the seating chart of the event venue, but the form of the heat map is not limited to this.
- It suffices that the data of the spectators at each position generated from the composite map data is displayed in colors or shades of color.
- As for the display form of the heat map, it is not always necessary to display the entire heat map; it may be displayed for each area. Further, the heat map may be superimposed on the actual image and displayed.
- the hardware structure of the processing unit that executes various processes is realized by various processors.
- The various processors include CPUs and/or GPUs (Graphics Processing Units), which are general-purpose processors that execute programs and function as various processing units; programmable logic devices (PLDs) such as FPGAs (Field Programmable Gate Arrays), which are processors whose circuit configuration can be changed after manufacturing; and dedicated electric circuits such as ASICs (Application Specific Integrated Circuits), which are processors having a circuit configuration specially designed to execute specific processing.
- Program is synonymous with software.
- One processing unit may be composed of one of these various processors, or may be composed of two or more processors of the same type or different types.
- For example, one processing unit may be configured by a plurality of FPGAs or by a combination of a CPU and an FPGA.
- A plurality of processing units may also be configured by one processor.
- As one example of configuring a plurality of processing units with one processor, there is a form in which one processor is configured by a combination of one or more CPUs and software, as represented by a computer such as a client or a server, and this processor functions as a plurality of processing units.
- Another example is a form that uses a system on chip (SoC).
- In this way, the various processing units are configured by using one or more of the above-mentioned various processors as a hardware structure.
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022524441A JP7377971B2 (ja) | 2020-05-22 | 2021-05-14 | 画像データ処理装置及び画像データ処理システム |
| CN202180032854.6A CN115552460B (zh) | 2020-05-22 | 2021-05-14 | 图像数据处理装置及图像数据处理系统 |
| US18/048,433 US12073652B2 (en) | 2020-05-22 | 2022-10-20 | Image data processing device and image data processing system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020089729 | 2020-05-22 | ||
| JP2020-089729 | 2020-05-22 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/048,433 Continuation US12073652B2 (en) | 2020-05-22 | 2022-10-20 | Image data processing device and image data processing system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021235355A1 (ja) | 2021-11-25 |
Family
ID=78709031
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/018436 Ceased WO2021235355A1 (ja) | 2020-05-22 | 2021-05-14 | 画像データ処理装置及び画像データ処理システム |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12073652B2 |
| JP (1) | JP7377971B2 |
| CN (1) | CN115552460B |
| WO (1) | WO2021235355A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2024041500A (ja) * | 2022-09-14 | 2024-03-27 | 株式会社竹中工務店 | 情報処理装置及び情報処理プログラム |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2015125731A (ja) * | 2013-12-27 | 2015-07-06 | 沖電気工業株式会社 | 人物属性推定装置、人物属性推定方法及びプログラム |
| WO2016035632A1 (ja) * | 2014-09-02 | 2016-03-10 | Necソリューションイノベータ株式会社 | データ処理装置、データ処理システム、データ処理方法及びプログラム |
| JP2016057701A (ja) * | 2014-09-05 | 2016-04-21 | オムロン株式会社 | 識別装置および識別装置の制御方法 |
| JP2016100033A (ja) * | 2014-11-19 | 2016-05-30 | シャープ株式会社 | 再生制御装置 |
| JP2019197353A (ja) * | 2018-05-09 | 2019-11-14 | コニカミノルタ株式会社 | 属性決定装置、属性決定システムおよび属性決定方法 |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060056667A1 (en) * | 2004-09-16 | 2006-03-16 | Waters Richard C | Identifying faces from multiple images acquired from widely separated viewpoints |
| JP5441676B2 (ja) * | 2009-12-25 | 2014-03-12 | キヤノン株式会社 | 画像処理装置及びその処理方法 |
| JP2012034069A (ja) * | 2010-07-29 | 2012-02-16 | Nikon Corp | 画像処理装置、および画像処理プログラム |
| US8692666B2 (en) * | 2010-08-09 | 2014-04-08 | Olympus Imaging Corp. | Communication system and communication terminal |
| JP6494253B2 (ja) * | 2014-11-17 | 2019-04-03 | キヤノン株式会社 | 物体検出装置、物体検出方法、画像認識装置及びコンピュータプログラム |
| JP6283620B2 (ja) | 2015-02-13 | 2018-02-21 | 日本電信電話株式会社 | 所定の空間における生体情報取得装置、生体情報取得方法及び生体情報取得プログラム |
| JP2017182681A (ja) | 2016-03-31 | 2017-10-05 | 株式会社リコー | 画像処理システム、情報処理装置、プログラム |
| JP6778006B2 (ja) * | 2016-03-31 | 2020-10-28 | 株式会社 資生堂 | 情報処理装置、プログラム及び情報処理システム |
| US10497014B2 (en) * | 2016-04-22 | 2019-12-03 | Inreality Limited | Retail store digital shelf for recommending products utilizing facial recognition in a peer to peer network |
| BR102016009093A2 (pt) * | 2016-04-22 | 2017-10-31 | Sequoia Capital Ltda. | Equipment for acquisition of 3d image data of a face and automatic method for personalized modeling and manufacture of glass frames |
| US10467458B2 (en) * | 2017-07-21 | 2019-11-05 | Altumview Systems Inc. | Joint face-detection and head-pose-angle-estimation using small-scale convolutional neural network (CNN) modules for embedded systems |
| US10489690B2 (en) * | 2017-10-24 | 2019-11-26 | International Business Machines Corporation | Emotion classification based on expression variations associated with same or similar emotions |
| US11341774B2 (en) * | 2018-03-27 | 2022-05-24 | Nec Corporation | Information processing apparatus, data generation method, and non-transitory computer readable medium storing program |
| US10706499B2 (en) * | 2018-06-21 | 2020-07-07 | Canon Kabushiki Kaisha | Image processing using an artificial neural network |
| SG10201807678WA (en) * | 2018-09-06 | 2020-04-29 | Nec Asia Pacific Pte Ltd | A method for identifying potential associates of at least one target person, and an identification device |
| CN110163957A (zh) * | 2019-04-26 | 2019-08-23 | 李辉 | 一种基于唯美人脸程序的表情生成系统 |
| JP2023552105A (ja) * | 2020-12-11 | 2023-12-14 | ヒューマニシング オートノミー リミテッド | 人間行動のオクルージョン対応予測 |
2021
- 2021-05-14 CN CN202180032854.6A patent/CN115552460B/zh active Active
- 2021-05-14 WO PCT/JP2021/018436 patent/WO2021235355A1/ja not_active Ceased
- 2021-05-14 JP JP2022524441A patent/JP7377971B2/ja active Active
2022
- 2022-10-20 US US18/048,433 patent/US12073652B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| US20230054531A1 (en) | 2023-02-23 |
| CN115552460B (zh) | 2026-04-21 |
| JPWO2021235355A1 | 2021-11-25 |
| CN115552460A (zh) | 2022-12-30 |
| JP7377971B2 (ja) | 2023-11-10 |
| US12073652B2 (en) | 2024-08-27 |
Legal Events
| Code | Title | Description |
|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21808898; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2022524441; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21808898; Country of ref document: EP; Kind code of ref document: A1 |