WO2022064632A1 - Image processing device, image processing method and program - Google Patents

Image processing device, image processing method and program

Info

Publication number
WO2022064632A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
fisheye
person
panoramic
fish
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2020/036225
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
カレン ステファン
健全 劉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP2022551516A (patent JP7589741B2, ja)
Priority to PCT/JP2020/036225 (publication WO2022064632A1, ja)
Priority to US18/026,407 (publication US20230368576A1, en)
Publication of WO2022064632A1 (ja)
Anticipated expiration
Legal status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/12: Panospheric to cylindrical image transformations
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/60: Rotation of whole images or parts thereof
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/60: Analysis of geometric attributes
    • G06T7/66: Analysis of geometric attributes of image moments or centre of gravity
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20112: Image segmentation details
    • G06T2207/20132: Image cropping
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • the present invention relates to an image processing device, an image processing method and a program.
  • Patent Document 1 discloses a technique for performing machine learning using a training image and information for identifying the location of a business store. Further, Patent Document 1 discloses that a panoramic image, an image having a field of view larger than 180 °, and the like can be used as a training image.
  • Non-Patent Document 1 discloses a technique for estimating a person's behavior indicated by a moving image based on a 3D-CNN (convolutional neural network).
  • a technique for estimating human behavior based on an image generated using a fisheye lens (hereinafter sometimes referred to as a "fisheye image").
  • the reference line L s is a line connecting the reference point (x c , y c ) and an arbitrary point on the outer circumference of the circular image, and is a position where the fisheye image is cut open when the panoramic image is developed.
  • the image near the reference line L s is located at the end of the panoramic image.
  • the reference point (x c , y c ) is a point of the image in the circular image circle of the fisheye image, for example, the center of the circle.
  • the width w is the width of the panoramic image
  • the height h is the height of the panoramic image. These values may be default values or may be set arbitrarily by the user.
  • any target point (x f , y f ) in the fisheye image can be converted to a point (x p , y p ) in the panoramic image based on the "panoramic expansion" formula shown in the figure.
  • the distance r f between the reference point (x c , y c ) and the target point (x f , y f ) can be calculated.
  • the angle θ formed by the reference line L s and the line connecting the reference point (x c , y c ) and the target point (x f , y f ) can be calculated.
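The conversion described above can be sketched as a polar unwrapping. The "panoramic expansion" formula itself appears only in the figure and is not reproduced in this text, so the mapping below (angle θ scaled to the width w, distance r f scaled to the height h) is an assumption and may differ from the patent's exact formula. The function name, the `radius` parameter and the `ref_angle` parameter (the angle of the reference line L s) are hypothetical.

```python
import math

def fisheye_to_panorama(xf, yf, xc, yc, radius, w, h, ref_angle=0.0):
    """Map a fisheye point (xf, yf) to a panoramic point (xp, yp).

    Assumed mapping (not taken from the patent figure):
      xp grows linearly with the angle theta measured from the reference line,
      yp grows linearly with the distance r_f from the reference point.
    """
    # Distance r_f between the reference point and the target point.
    rf = math.hypot(xf - xc, yf - yc)
    # Angle theta between the reference line and the line to the target point,
    # wrapped into [0, 2*pi).
    theta = (math.atan2(yf - yc, xf - xc) - ref_angle) % (2.0 * math.pi)
    xp = theta / (2.0 * math.pi) * w
    yp = rf / radius * h
    return xp, yp

# A point halfway out along the reference line lands at the left edge,
# halfway up the panorama.
print(fisheye_to_panorama(150.0, 100.0, 100.0, 100.0, radius=100.0, w=360, h=100))
```

Points near the reference line map to xp close to 0 or close to w, which matches the statement that the image near the reference line L s ends up at the edge of the panoramic image.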
  • An object of the present invention is to estimate the behavior of a person included in a fisheye image with high accuracy.
  • the computer analyzes a panoramic image obtained by panoramically expanding a fisheye image generated by a fisheye lens camera, and estimates the human behavior indicated by the panoramic image.
  • the computer analyzes a fisheye partial image, which is a partial region of the fisheye image, without panoramic expansion, and estimates the human behavior indicated by the fisheye partial image.
  • an image processing method is provided in which the computer estimates the human behavior indicated by the fisheye image based on the estimation result based on the panoramic image and the estimation result based on the fisheye partial image.
  • a program is provided that functions as.
  • the image processing device 10 executes a panorama process (Panorama processing), a fisheye process (Fisheye processing), and an integration process.
  • in the panorama process, the image processing device 10 analyzes the panoramic image obtained by panoramically expanding the fisheye image, and estimates the human behavior indicated by the panoramic image.
  • in the fisheye process, the image processing device 10 analyzes the fisheye partial image, which is a partial region of the fisheye image, without panoramic expansion, and estimates the human behavior indicated by the fisheye partial image. Then, in the integration process, the image processing device 10 estimates the human behavior indicated by the fisheye image based on the estimation result obtained from the panoramic image in the panorama process and the estimation result obtained from the fisheye partial image in the fisheye process.
  • Each functional unit included in the image processing device 10 is realized by any combination of hardware and software, centered on a CPU (Central Processing Unit) of an arbitrary computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (it can store not only programs stored before the device is shipped but also programs downloaded from storage media such as CDs (Compact Discs) or from servers on the Internet), and a network connection interface. Those skilled in the art will understand that there are various variations in the realization method and the device.
  • FIG. 3 is a block diagram illustrating a hardware configuration of the image processing device 10.
  • the image processing device 10 includes a processor 1A, a memory 2A, an input / output interface 3A, a peripheral circuit 4A, and a bus 5A.
  • the peripheral circuit 4A includes various modules.
  • the image processing device 10 does not have to have the peripheral circuit 4A.
  • the image processing device 10 may be composed of a plurality of physically and / or logically separated devices, or may be composed of one physically and / or logically integrated device. When the image processing device 10 is composed of a plurality of physically and / or logically separated devices, each of the plurality of devices can be provided with the above hardware configuration.
  • the bus 5A is a data transmission path for the processor 1A, the memory 2A, the peripheral circuit 4A, and the input / output interface 3A to transmit and receive data to each other.
  • the processor 1A is, for example, an arithmetic processing unit such as a CPU or a GPU (Graphics Processing Unit).
  • the memory 2A is, for example, a memory such as a RAM (Random Access Memory) or a ROM (Read Only Memory).
  • the input / output interface 3A includes an interface for acquiring information from an input device, an external device, an external server, an external sensor, a camera, etc., and an interface for outputting information to an output device, an external device, an external server, etc.
  • the input device is, for example, a keyboard, a mouse, a microphone, a physical button, a touch panel, or the like.
  • the output device is, for example, a display, a speaker, a printer, a mailer, or the like.
  • the processor 1A can issue a command to each module and perform a calculation based on the calculation result thereof.
  • FIG. 4 shows an example of a functional block diagram of the image processing device 10.
  • the image processing device 10 has a first estimation unit 11, a second estimation unit 12, and a third estimation unit 13. These functional parts perform the panoramic process, fisheye process and integration process described above.
  • the configuration of each functional unit will be described separately for each process.
  • FIG. 5 shows the flow of the panoramic process in more detail.
  • the first estimation unit 11 acquires a plurality of time-series fisheye images (fisheye image acquisition process).
  • the first estimation unit 11 panoramically expands each of them to generate a plurality of time-series panoramic images (panorama development process).
  • the first estimation unit 11 estimates the person behavior indicated by the time-series plurality of panoramic images based on the time-series plurality of panoramic images and the first estimation model (first estimation process).
  • the panorama process includes a fisheye image acquisition process, a panorama development process and a first estimation process.
  • the first estimation unit 11 acquires a plurality of time-series fisheye images.
  • a fisheye image is an image generated using a fisheye lens.
  • the plurality of time-series fisheye images may be, for example, a moving image, or may be a plurality of continuous still images generated by continuously photographing at predetermined time intervals.
  • "acquisition" may include active acquisition, in which the own device fetches data stored in another device or a storage medium based on user input or a program instruction, for example by requesting or querying another device and receiving the data, or by accessing and reading another device or storage medium. "Acquisition" may also include passive acquisition, in which data output from another device is input to the own device based on user input or a program instruction, for example by receiving data that is distributed (or transmitted, sent by push notification, etc.). In addition, "acquisition" may include selecting and acquiring from received data or information, and generating new data by editing data (converting to text, sorting data, extracting partial data, changing the file format, etc.) and acquiring the new data.
  • the first estimation unit 11 panoramically expands each of the plurality of time-series fisheye images to generate a plurality of time-series panoramic images.
  • an example of the panoramic development method is described below, but other methods may be adopted.
  • the first estimation unit 11 determines the reference line L s , the reference point (x c , y c ), the width w, and the height h (see FIG. 1).
  • the first estimation unit 11 detects predetermined plurality of points on the body of each of the plurality of persons from the images in the circular image circle of the fisheye image. Then, the first estimation unit 11 specifies the direction of gravity (vertical direction) at each position of the plurality of persons based on the detected predetermined plurality of points.
  • the first estimation unit 11 may detect two points on the body whose connecting line is parallel to the direction of gravity in an image generated by photographing a standing person from the front. Examples of such a pair of points include (middle of both shoulders, middle of waist), (tip of head, middle of waist), and (tip of head, middle of both shoulders), but are not limited to these. In this example, the first estimation unit 11 specifies the direction from a predetermined one of the two points detected for each person toward the other point as the direction of gravity.
  • alternatively, the first estimation unit 11 may detect two points on the body whose connecting line is perpendicular to the direction of gravity in an image generated by photographing a standing person from the front. Examples of such a pair of points include (right shoulder, left shoulder) and (right waist, left waist), but are not limited to these. In this example, the first estimation unit 11 specifies, as the direction of gravity, the direction in which a line extends that passes through the midpoint of the two points detected for each person and is perpendicular to the line connecting the two points.
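The two ways of deriving the direction of gravity from a pair of body points can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function names are hypothetical, and points are assumed to be (x, y) pixel coordinates. Note that in the perpendicular case the pair of points fixes only the orientation of the line, so the sign of the returned direction is arbitrary here.

```python
import math

def gravity_from_parallel_pair(p_from, p_to):
    """Unit vector from one point to the other, e.g. from the middle of
    both shoulders (P1) to the middle of the waist (P2)."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    norm = math.hypot(dx, dy)
    return dx / norm, dy / norm

def gravity_line_from_perpendicular_pair(p_a, p_b):
    """Midpoint and unit direction of the line perpendicular to the segment
    p_a-p_b, e.g. for (right shoulder, left shoulder).
    The sign of the direction is not determined by the two points alone."""
    mid = ((p_a[0] + p_b[0]) / 2.0, (p_a[1] + p_b[1]) / 2.0)
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    norm = math.hypot(dx, dy)
    # Rotating (dx, dy) by 90 degrees gives a perpendicular direction.
    return mid, (-dy / norm, dx / norm)

print(gravity_from_parallel_pair((10.0, 20.0), (10.0, 60.0)))
print(gravity_line_from_perpendicular_pair((0.0, 0.0), (2.0, 0.0)))
```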
  • the first estimation unit 11 can detect a plurality of points on the body described above by using any image analysis technique.
  • the first estimation unit 11 may use "an algorithm for detecting a predetermined plurality of points on the body of each person present in an image generated by using a standard lens (for example, an angle of view of about 40 ° to about 60 °)".
  • in that case, the first estimation unit 11 may analyze the image while rotating the fisheye image. That is, the first estimation unit 11 may rotate the image in the image circle of the fisheye image and analyze the image in the image circle after the rotation to detect the predetermined plurality of points on each person's body.
  • the first estimation unit 11 first analyzes the image in the rotational state shown in FIG. 6, and performs a process of detecting the middle P1 of both shoulders and the middle P2 of the waist of each person. In this case, the first estimation unit 11 could detect the points P1 and P2 of the persons M1 and M2, whose body extension direction is close to the vertical direction in the figure, but could not detect the points P1 and P2 of the other persons.
  • the first estimation unit 11 rotates the fisheye image F by 90 °. Then, the state shown in FIG. 7 is obtained.
  • the first estimation unit 11 analyzes the image in this rotational state, and performs a process of detecting the middle P1 of both shoulders and the middle P2 of the waist of each person. In this case, the first estimation unit 11 could detect the points P1 and P2 of the person M5, whose body extension direction is close to the vertical direction in the figure, but could not detect the points P1 and P2 of the other persons.
  • the first estimation unit 11 further rotates the fisheye image F by 90 °. Then, the state shown in FIG. 8 is obtained.
  • the first estimation unit 11 analyzes the image in this rotational state, and performs a process of detecting the middle P1 of both shoulders and the middle P2 of the waist of each person. In this case, the first estimation unit 11 could detect the points P1 and P2 of the person M4, whose body extension direction is close to the vertical direction in the figure, but could not detect the points P1 and P2 of the other persons.
  • the first estimation unit 11 further rotates the fisheye image F by 90 °. Then, the state shown in FIG. 9 is obtained.
  • the first estimation unit 11 analyzes the image in this rotational state, and performs a process of detecting the middle P1 of both shoulders and the middle P2 of the waist of each person. In this case, the first estimation unit 11 could detect the points P1 and P2 of the person M3, whose body extension direction is close to the vertical direction in the figure, but could not detect the points P1 and P2 of the other persons.
  • in this way, by analyzing the image while rotating the fisheye image, the first estimation unit 11 can detect the predetermined plurality of points on the bodies of a plurality of persons whose bodies extend in different directions.
  • the rotation is performed by 90 °, but this is just an example and is not limited to this.
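The rotate-and-detect procedure above can be sketched as a loop over quarter turns. The sketch assumes 90 ° steps (the text notes other angles are possible), uses `numpy.rot90` for the rotation, and takes the keypoint detector as a caller-supplied function `detect_points` that returns (row, column) points; that detector and both function names are hypothetical stand-ins for the standard-lens algorithm mentioned in the text. Detected points are mapped back to the coordinates of the unrotated image so that results from different rotations can be combined.

```python
import numpy as np

def unrotate_point(point, k, rotated_shape):
    """Map a (row, col) point found in np.rot90(image, k) back to the
    coordinates of the original image."""
    i, j = point
    rows, cols = rotated_shape
    for _ in range(k % 4):
        # Undo one counterclockwise quarter turn.
        i, j = j, rows - 1 - i
        rows, cols = cols, rows
    return i, j

def detect_with_rotation(image, detect_points):
    """Run the detector on the image rotated by 0, 90, 180 and 270 degrees
    and collect all detections in original-image coordinates."""
    found = []
    for k in range(4):
        rotated = np.rot90(image, k)
        for pt in detect_points(rotated):
            found.append(unrotate_point(pt, k, rotated.shape[:2]))
    return found

# Toy detector (illustration only): reports the brightest pixel.
def brightest_pixel(img):
    idx = np.unravel_index(np.argmax(img), img.shape)
    return [(int(idx[0]), int(idx[1]))]

image = np.zeros((4, 6))
image[1, 4] = 1.0
print(detect_with_rotation(image, brightest_pixel))
```

In practice each rotation would find only the persons whose bodies are close to upright in that rotation, and duplicate detections of the same person would need to be merged.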
  • the first estimation unit 11 determines a reference point (x c , y c ) based on the direction of gravity at each position of the plurality of persons in the fisheye image. Then, the first estimation unit 11 stores the determined reference points (x c , y c ) in the storage unit of the image processing device 10.
  • when the straight lines that pass through the positions of the plurality of persons and extend in the direction of gravity at those positions intersect at one point, the first estimation unit 11 sets the intersection as the reference point (x c , y c ).
  • when the straight lines do not intersect at one point, the first estimation unit 11 sets a point whose distance from each of the plurality of straight lines satisfies a predetermined condition as the reference point (x c , y c ).
  • the “straight line that passes through each position and extends in the direction of gravity at each position of the plurality of persons” may be a line connecting the two points detected by the first estimation unit 11.
  • when the first estimation unit 11 detects two points on the body whose connecting line is perpendicular to the direction of gravity in the image generated by photographing the standing person from the front, the "straight line that passes through each position and extends in the direction of gravity at each position of the plurality of persons" may be a line that passes through the midpoint of the two points detected by the first estimation unit 11 and is perpendicular to the line connecting the two points.
  • FIG. 10 shows the concept of the reference point determination process by the first estimation unit 11.
  • the first estimation unit 11 detects the middle P1 of both shoulders and the middle P2 of the waist of each person.
  • the line connecting the points P1 and P2 is "a straight line L1 to L5 that passes through the positions of each of the plurality of persons and extends in the direction of gravity at each of the positions of the plurality of persons".
  • the plurality of straight lines L1 to L5 do not intersect at one point. Therefore, the first estimation unit 11 sets a point at which the distance from each of the plurality of straight lines L1 to L5 satisfies a predetermined condition as a reference point (x c , y c ).
  • the predetermined condition is, for example, "the sum of the distances to each of the plurality of straight lines is the minimum", but the predetermined condition is not limited to this.
  • the first estimation unit 11 can calculate a point satisfying a predetermined condition based on the following equations (1) to (3).
  • each of the straight lines L1 to L5 is shown by the equation (1).
  • k i is the slope of each straight line
  • c i is the intercept of each straight line. From the equations (2) and (3), the point where the sum of the distances to each of the straight lines L1 to L5 is the minimum can be calculated as a reference point (x c , y c ).
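Equations (1) to (3) are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes each line has the slope-intercept form y = k_i x + c_i (consistent with the description of k i and c i), and it swaps in a different criterion from the example in the text: it minimizes the sum of squared perpendicular distances rather than the sum of distances, because the squared version reduces to a linear least-squares problem. The text allows other predetermined conditions, but the result can differ from the sum-of-distances minimizer. The function name is hypothetical.

```python
import numpy as np

def reference_point_least_squares(slopes, intercepts):
    """Point minimizing the sum of SQUARED perpendicular distances to the
    lines y = k_i * x + c_i (an assumed substitute for equations (2), (3)).

    The signed distance from (x, y) to line i is
        (k_i * x - y + c_i) / sqrt(k_i**2 + 1),
    which is linear in (x, y), so the minimizer is a least-squares solution.
    Vertical lines cannot be expressed in this slope-intercept form.
    """
    k = np.asarray(slopes, dtype=float)
    c = np.asarray(intercepts, dtype=float)
    norm = np.sqrt(k ** 2 + 1.0)
    A = np.column_stack([k / norm, -1.0 / norm])
    b = -c / norm
    (x, y), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(x), float(y)

# Three lines that all pass through (1, 1): y = x, y = -x + 2, y = 1.
print(reference_point_least_squares([1.0, -1.0, 0.0], [0.0, 2.0, 1.0]))
```

When the lines do intersect at a single point, this returns that intersection, so the same routine covers both branches of the reference point determination.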
  • the reference points (x c , y c ) set in the plurality of fisheye images generated by the same camera are at the same position. Therefore, when the first estimation unit 11 calculates the reference point (x c , y c ) of one fisheye image by the above processing, it may register the calculated reference point (x c , y c ) in association with the camera that generated the fisheye image. After that, for fisheye images generated by that camera, the reference point (x c , y c ) need not be calculated again, and the registered reference point (x c , y c ) may be read out and used.
  • when the reference point (x c , y c ) differs from the center of the image in the image circle of the fisheye image, the first estimation unit 11 complements the image in the image circle of the fisheye image and generates a complementary circular image.
  • when the reference point (x c , y c ) coincides with the center of the image in the image circle of the fisheye image, the first estimation unit 11 does not complement the image.
  • the complementary circular image is an image obtained by adding a complementary image to the image in the image circle, and is a circular image centered on a reference point (x c , y c ).
  • the radius of the complementary circular image may be the maximum value of the distance from the reference point (x c , y c ) to a point on the outer circumference of the image in the image circle, so that the image in the image circle is inscribed in the complementary circular image.
  • the complementary image added to the image in the image circle may be a monochromatic (eg, black) image, an arbitrary pattern image, or any other.
  • FIG. 11 shows an example of the complementary circular image C2 generated by the first estimation unit 11.
  • a complementary circular image C2 is generated by adding a black single color complementary image to the image C1 in the image circle of the fisheye image F.
  • the complementary circular image C2 is circular as shown, and the reference point (x c , y c ) is the center thereof.
  • the radius r of the complementary circular image C2 is the maximum value of the distance from the reference point (x c , y c ) to the point on the outer circumference of the image C1 in the image circle.
  • the image C1 in the image circle is inscribed in the complementary circular image C2.
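The generation of the complementary circular image C2 can be sketched with NumPy as follows. This is a minimal sketch under assumptions: the image circle is described by a center and a radius in pixel coordinates, the complementary image is black, and the result is returned as a square canvas whose center is the reference point and whose inscribed circle has radius r. The function and parameter names are hypothetical.

```python
import math
import numpy as np

def complementary_circular_image(image, circle_center, circle_radius, ref_point):
    """Pad `image` with black so that `ref_point` becomes the canvas center.

    Returns (canvas, r), where r is the maximum distance from the reference
    point to a point on the outer circumference of the image circle.
    Coordinates are (x, y); arrays are indexed [y, x].
    """
    cx, cy = circle_center
    xc, yc = ref_point
    # Farthest point of the image circle from the reference point.
    r = math.hypot(cx - xc, cy - yc) + circle_radius
    half = math.ceil(r)
    side = 2 * half + 1
    canvas = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    # Offset that moves the reference point to the canvas center (half, half).
    ox, oy = half - int(round(xc)), half - int(round(yc))
    h, w = image.shape[:2]
    # Copy only the part of the image that falls inside the canvas.
    x0, y0 = max(ox, 0), max(oy, 0)
    x1, y1 = min(ox + w, side), min(oy + h, side)
    canvas[y0:y1, x0:x1] = image[y0 - oy:y1 - oy, x0 - ox:x1 - ox]
    return canvas, r

fisheye = np.ones((100, 100), dtype=np.uint8)
canvas, r = complementary_circular_image(fisheye, (50, 50), 50, (60, 50))
print(canvas.shape, r)
```

The sketch does not mask pixels outside the image circle of the input; it assumes the fisheye frame is already black there.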
  • the reference line L s is a line connecting a reference point (x c , y c ) and an arbitrary point on the outer periphery of a circular image (image C1 in an image circle, complementary circular image C2, etc.).
  • the position of the reference line L s is the position to be cut open when the circular image is panoramicly developed.
  • the first estimation unit 11 can set, for example, a reference line L s that does not overlap with a person. By setting the reference line L s in this way, it is possible to suppress the inconvenience that the person is separated into two parts in the panoramic image.
  • for example, the first estimation unit 11 may avoid setting the reference line L s within a predetermined distance from the plurality of points on each person's body detected in the above process, and may set the reference line L s at a place separated from the detected points by the predetermined distance or more.
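One way to place the reference line L s away from every detected person is to pick the direction in the middle of the largest angular gap between the detected body points, as seen from the reference point. The text specifies only a predetermined distance from the detected points; using angular separation here is an assumption made for illustration, and the function name is hypothetical.

```python
import math

def choose_reference_angle(ref_point, keypoints):
    """Angle (radians, in [0, 2*pi)) of a reference line from `ref_point`
    that bisects the largest angular gap between the keypoints."""
    xc, yc = ref_point
    angles = sorted(math.atan2(y - yc, x - xc) % (2.0 * math.pi)
                    for x, y in keypoints)
    if not angles:
        return 0.0  # no persons detected: any direction works
    best_gap, best_angle = -1.0, 0.0
    for i, a in enumerate(angles):
        nxt = angles[(i + 1) % len(angles)]
        # Gap to the next keypoint, going counterclockwise and wrapping around.
        gap = (nxt - a) % (2.0 * math.pi)
        if len(angles) == 1:
            gap = 2.0 * math.pi
        if gap > best_gap:
            best_gap = gap
            best_angle = (a + gap / 2.0) % (2.0 * math.pi)
    return best_angle

# Keypoints at 0 and 90 degrees: the line is placed at 225 degrees.
print(math.degrees(choose_reference_angle((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)])))
```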
  • width w is the width of the panoramic image
  • height h is the height of the panoramic image.
  • the first estimation unit 11 panoramically expands the fisheye image and generates a panoramic image.
  • when the reference point (x c , y c ) differs from the center of the image in the image circle of the fisheye image, the first estimation unit 11 panoramically expands the complementary circular image to generate a panoramic image.
  • when the reference point (x c , y c ) coincides with the center of the image in the image circle of the fisheye image, the first estimation unit 11 panoramically expands the image in the image circle of the fisheye image to generate a panoramic image.
  • the first estimation unit 11 can perform the panoramic development by using the panoramic development method described above.
  • the first estimation unit 11 detects predetermined plurality of points on the body of each of the plurality of persons from the images in the image circle (S10). For example, the first estimation unit 11 detects the middle P1 of both shoulders and the middle P2 of the waist of each person.
  • the first estimation unit 11 analyzes the image in the image circle and detects a predetermined plurality of points on the body of each of the plurality of persons (S20). After that, the first estimation unit 11 rotates the image in the image circle by a predetermined angle (S21).
  • the predetermined angle is, for example, 90 °, but is not limited thereto.
  • the first estimation unit 11 analyzes the image in the image circle after rotation, and detects a predetermined plurality of points on the body of each of the plurality of persons (S22). Then, when the total rotation angle has not reached 360 ° (No in S23), the first estimation unit 11 returns to S21 and repeats the same process. On the other hand, when the total rotation angle reaches 360 ° (Yes in S23), the first estimation unit 11 ends the process.
  • the first estimation unit 11 identifies the direction of gravity at each position of the plurality of persons based on the predetermined plurality of points detected in S10 (S11). For example, the first estimation unit 11 specifies the direction from the center P1 of both shoulders of each person to the center P2 of the waist as the direction of gravity at the position of each person.
  • the first estimation unit 11 calculates, for each of the plurality of persons, a straight line that passes through the person's position and extends in the direction of gravity at that position (S12).
  • when the plurality of straight lines intersect at one point, the first estimation unit 11 sets the intersection as the reference point (x c , y c ) (S14).
  • when the plurality of straight lines do not intersect at one point, the first estimation unit 11 finds a point whose distance from each of the plurality of straight lines satisfies a predetermined condition (example: shortest), and sets that point as the reference point (x c , y c ) (S15).
  • when the reference point (x c , y c ) coincides with the center of the image in the image circle of the fisheye image, the first estimation unit 11 panoramically expands the image in the image circle of the fisheye image by using the panoramic development method described above, and generates a panoramic image (S33). That is, in this case, the generation of the complementary circular image and the panoramic expansion of the complementary circular image are not performed.
  • when the reference point (x c , y c ) does not coincide with the center of the image in the image circle of the fisheye image, the first estimation unit 11 generates a complementary circular image (S31).
  • the complementary circular image is a circular image in which the complementary image is added to the image in the image circle, and the reference point (x c , y c ) is the center of the circle.
  • the radius of the complementary circular image may be the maximum value of the distance from the reference point (x c , y c ) to a point on the outer circumference of the image in the image circle, so that the image in the image circle is inscribed in the complementary circular image.
  • the complementary image added to the image in the image circle may be a monochromatic (eg, black) image, an arbitrary pattern image, or any other.
  • the first estimation unit 11 panoramicly develops the complementary circular image by using the method described with reference to FIG. 1 and generates a panoramic image (S32).
  • the first estimation unit 11 estimates the person behavior indicated by the time-series plurality of panoramic images based on the generated time-series plurality of panoramic images and the first estimation model.
  • the first estimation unit 11 generates three-dimensional feature information indicating the time change of the feature of each position in the image from a plurality of time-series panoramic images.
  • the first estimation unit 11 can generate three-dimensional feature information based on 3D CNN (for example, a convolutional deep learning network such as 3D Resnet, but is not limited to this).
  • the first estimation unit 11 generates person position information indicating a position where a person exists in each of a plurality of time-series panoramic images.
  • the first estimation unit 11 can generate person position information indicating the position where each of the plurality of people is present.
  • the first estimation unit 11 extracts a silhouette (whole body) of a person in an image, and generates person position information indicating an area in the image including the extracted silhouette.
  • the first estimation unit 11 can generate the person position information based on a "deep learning network for object recognition", which uses deep learning technology to recognize arbitrary objects (for example, persons) in a planar image or video at high speed and with high accuracy.
  • examples of the deep learning network for object recognition include, but are not limited to, Mask-RCNN, RCNN, Fast RCNN, and Faster RCNN.
  • the first estimation unit 11 may perform the same person detection process on each of the plurality of time-series panoramic images, or may use a person tracking technique to track a person once detected in the images and specify that person's position.
  • the first estimation unit 11 estimates the person behavior shown by the plurality of panoramic images based on the time change of the feature indicated by the three-dimensional feature information at the position where the person indicated by the person position information exists. For example, the first estimation unit 11 corrects the three-dimensional feature information by changing the values at positions other than the position where the person indicated by the person position information exists to a predetermined value (example: 0), and then estimates the person behavior shown by the plurality of images based on the corrected three-dimensional feature information. The first estimation unit 11 can estimate the person behavior based on the first estimation model generated in advance by machine learning and the corrected three-dimensional feature information.
  • the first estimation model may be a model that estimates human behavior and is generated by machine learning based on images (learning data) generated using a standard lens (for example, an angle of view of about 40 ° to about 60 °).
  • the first estimation model may be a model that estimates human behavior generated by machine learning based on a panoramic image (learning data) generated by panoramic expansion of a fisheye image.
  • the first estimation unit 11 acquires a plurality of time-series panoramic images by executing the panoramic expansion process (S40).
  • the first estimation unit 11 After that, the first estimation unit 11 generates three-dimensional feature information indicating the time change of the feature of each position in the image from a plurality of panoramic images in time series (S41). In addition, the first estimation unit 11 generates person position information indicating a position where a person exists in each of the plurality of panoramic images (S42).
  • the first estimation unit 11 estimates the person behavior shown by the plurality of images based on the time change of the feature indicated by the three-dimensional feature information at the position where the person indicated by the person position information exists (S43).
  • For example, suppose the first estimation unit 11 has acquired time-series panoramic images for 16 frames (16 × 2451 × 800). The first estimation unit 11 then generates three-dimensional feature information (512 × 77 × 25), convolved into 512 channels, from the 16 frames of panoramic images based on a 3D CNN (for example, a convolutional deep learning network such as 3D ResNet, but not limited to this). Further, the first estimation unit 11 generates person position information (binary Mask in the figure) indicating the position where a person exists in each of the 16 frames based on a deep learning network for object recognition such as Mask-RCNN. In the illustrated example, the person position information indicates the position of each of a plurality of rectangular areas, each including a person.
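As a rough check on the sizes quoted above, the following sketch reproduces the 77 × 25 spatial size from the 2451 × 800 panoramic image. It assumes the 3D CNN reduces each spatial dimension by a total stride of 32 with ceiling rounding, which is typical of ResNet-style backbones but is not stated in the source; `feature_map_size` and `total_stride` are hypothetical names.

```python
import math

def feature_map_size(width, height, total_stride=32):
    """Spatial size after downsampling by `total_stride`.

    The stride of 32 is an assumption for illustration; the source only
    gives the input size (2451 x 800) and the output size (77 x 25).
    """
    return math.ceil(width / total_stride), math.ceil(height / total_stride)

feat_w, feat_h = feature_map_size(2451, 800)
print(feat_w, feat_h)
```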
  • the first estimation unit 11 makes a correction for the three-dimensional feature information to change the value at the position other than the position where the person indicated by the person position information exists to a predetermined value (example: 0).
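The correction described above can be sketched with NumPy as follows. This is a minimal illustration, assuming the person position information has already been resized to the spatial resolution of the feature map; `mask_features` and its parameters are hypothetical names, not taken from the source.

```python
import numpy as np

def mask_features(features, person_mask, fill_value=0.0):
    """Keep feature values where a person exists; set all others to fill_value.

    features:    array of shape (channels, width, height)
    person_mask: boolean array of shape (width, height), True where a person is
                 (assumed to be already downsampled to the feature resolution)
    """
    # Broadcast the 2-D mask over the channel axis.
    return np.where(person_mask[None, :, :], features, fill_value)

features = np.ones((512, 77, 25))
person_mask = np.zeros((77, 25), dtype=bool)
person_mask[10:14, 5:20] = True   # one rectangular person region
corrected = mask_features(features, person_mask)
```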
  • The first estimation unit 11 divides the three-dimensional feature information into N blocks (each having a width of k) and passes each block through Average Pooling, flatten, fully-connected layers, and the like to obtain, for each block, the probability (output value) that each of a plurality of predefined categories (person behaviors) is included.
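The block-wise scoring can be sketched as below. This is only an illustration under stated assumptions: a single fully-connected layer with hypothetical parameters `weights` and `bias`, and a sigmoid to map the output to a probability (the source says only that a probability is obtained as the output value).

```python
import numpy as np

def block_scores(features, weights, bias, k):
    """Split features (channels, width, height) into blocks of width k and score each.

    Returns an array of shape (N, num_categories) with one row per block.
    """
    _, width, _ = features.shape
    scores = []
    for start in range(0, width, k):
        block = features[:, start:start + k, :]
        pooled = block.mean(axis=(1, 2))            # average pooling + flatten -> (channels,)
        logits = weights @ pooled + bias            # fully-connected layer -> (num_categories,)
        scores.append(1.0 / (1.0 + np.exp(-logits)))  # sigmoid (assumed) -> values in (0, 1)
    return np.stack(scores)

features = np.ones((512, 77, 25))
weights = np.zeros((19, 512))   # 19 categories, untrained placeholder weights
bias = np.zeros(19)
instance_scores = block_scores(features, weights, bias, k=11)  # 77 / 11 = 7 blocks
```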
  • 19 categories are defined and learned.
  • The 19 categories are "walking", "running", "waving", "picking up", "throwing away", "taking off the jacket", "wearing the jacket", "calling", "using a smartphone", "eating snacks", "going up the stairs", "going down the stairs", "drinking water", "shaking hands", "taking things from someone else's pocket", "giving things to others", "pushing another person", "holding a card to enter the station yard", and "holding a card to exit the station ticket gate", but are not limited to these.
  • the processing device 20 estimates that the person behavior corresponding to the category whose probability is equal to or higher than the threshold value is shown in the image.
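The thresholding step can be illustrated as follows. The threshold value of 0.5 and the function name `detected_behaviors` are assumptions chosen for the example; the source does not specify a value.

```python
def detected_behaviors(probabilities, labels, threshold=0.5):
    """Return the labels whose probability is equal to or higher than the threshold."""
    return [label for label, p in zip(labels, probabilities) if p >= threshold]

labels = ["walking", "running", "waving"]
print(detected_behaviors([0.91, 0.12, 0.50], labels))
```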
  • N instance scores indicate the probability that each of the N blocks included in a plurality of time-series panoramic images includes each of the above 19 categories.
  • Final scores of the panorama branch for clip 1 indicates the probability that a plurality of time-series panoramic images include each of the above 19 categories.
  • the details of the process of calculating Final scores of the panorama branch for clip 1 from N instance scores are not particularly limited, but an example will be described below.
  • The fisheye process is performed by the second estimation unit 12. As shown in FIG. 5, when the second estimation unit 12 acquires a plurality of time-series fisheye images (fisheye image acquisition process), it cuts out a partial region from each of the fisheye images to generate a plurality of time-series fisheye partial images (first cutting process). After that, the second estimation unit 12 edits the generated fisheye partial images and generates a plurality of time-series edited fisheye partial images for each person included in the fisheye partial images (editing process).
  • The second estimation unit 12 then estimates the person behavior indicated by the time-series edited fisheye partial images based on those images and the second estimation model (second estimation process).
  • the fisheye process includes a fisheye image acquisition process, a first cutting process, an editing process and a second estimation process.
  • the second estimation unit 12 acquires a plurality of time-series fisheye images.
  • the fisheye image acquisition process executed by the second estimation unit 12 is the same as the fisheye image acquisition process executed by the first estimation unit 11 described in the panoramic process, and thus the description thereof is omitted here.
  • the second estimation unit 12 cuts out a part of each of the time-series plurality of fish-eye images to generate a time-series plurality of fish-eye partial images.
  • The second estimation unit 12 cuts out an image in a circular region having a radius R centered on the reference point (x_c, y_c) described in the panorama process as a fisheye partial image.
  • the radius R may be a preset fixed value. In addition, it may be a variable value determined based on the analysis result of the fisheye image.
  • the radius R (the size of the fisheye partial image) may be determined based on the detection result (number of detected persons) of a person existing in a preset central region in the fisheye image, for example. The larger the number of detected people, the larger the radius R.
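The first cutting process with a variable radius can be sketched as below. The linear rule in `choose_radius` and all of its constants are hypothetical; the source states only that a larger number of detected persons gives a larger R. `crop_circle` assumes integer pixel coordinates for the reference point.

```python
import numpy as np

def choose_radius(num_people, base_radius=200, step=20, max_radius=400):
    """Radius R grows with the number of persons detected in the central region
    (illustrative rule; the constants are assumptions)."""
    return min(base_radius + step * num_people, max_radius)

def crop_circle(image, center, radius):
    """Cut out the bounding square of the circle and zero the pixels outside it."""
    xc, yc = center
    h, w = image.shape[:2]
    x0, x1 = max(xc - radius, 0), min(xc + radius + 1, w)
    y0, y1 = max(yc - radius, 0), min(yc + radius + 1, h)
    crop = image[y0:y1, x0:x1].copy()
    ys, xs = np.mgrid[y0:y1, x0:x1]
    outside = (xs - xc) ** 2 + (ys - yc) ** 2 > radius ** 2
    crop[outside] = 0
    return crop

fisheye = np.ones((1000, 1000, 3), dtype=np.uint8)
partial = crop_circle(fisheye, center=(500, 500), radius=choose_radius(num_people=2))
```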
  • The second estimation unit 12 edits the generated plurality of time-series fisheye partial images and generates a plurality of time-series edited fisheye partial images for each person included in the fisheye partial images. This is described in detail below.
  • The second estimation unit 12 analyzes the fisheye partial image and detects a person included in the fisheye partial image. For the detection of a person, as in the process described in the panorama process (process of FIG. 13), a method of detecting a person by analyzing the fisheye partial image at each rotation position while rotating the fisheye partial image may be adopted.
  • a person included in the fisheye partial image may be detected based on a person detection model generated by machine learning using the fisheye image as training data.
  • The second estimation unit 12 may perform the same person detection process on each of the plurality of time-series fisheye partial images, or may use a person tracking technique to track a person, once detected, within the moving image and identify that person's position.
  • The second estimation unit 12 executes, for each detected person, a rotation process that rotates the fisheye partial image and a second cutting process that cuts out a partial area of a predetermined size, thereby generating an edited fisheye partial image.
  • the fisheye partial image is rotated so that the direction of gravity at each person's position is the vertical direction on the image.
  • the means for specifying the direction of gravity at the position of each person is as described in the panoramic process, but other methods may be used.
  • an image of a predetermined size including each person is cut out from the fisheye partial image after the rotation process.
  • the shape and size of the image to be cut out are predefined.
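The geometry of the rotation process can be sketched as follows. This example assumes that the direction of gravity at a person's position points from that position toward the reference point (x_c, y_c), and that image y coordinates increase downward; both are assumptions for illustration, since the source defers to the panorama process for how the gravity direction is specified. The function names are hypothetical, and rotating the actual pixels would additionally require an image library.

```python
import math

def gravity_alignment_angle(person_xy, reference_xy):
    """Angle (radians) that rotates the assumed gravity vector onto image 'down' (0, +1)."""
    gx = reference_xy[0] - person_xy[0]
    gy = reference_xy[1] - person_xy[1]
    return math.pi / 2 - math.atan2(gy, gx)

def rotate_about(point_xy, center_xy, angle):
    """Rotate a point about a center by `angle` in image coordinates."""
    dx, dy = point_xy[0] - center_xy[0], point_xy[1] - center_xy[1]
    c, s = math.cos(angle), math.sin(angle)
    return (center_xy[0] + c * dx - s * dy, center_xy[1] + s * dx + c * dy)

reference = (50.0, 50.0)
person = (40.0, 50.0)                      # left of the reference point
angle = gravity_alignment_angle(person, reference)
rotated_person = rotate_about(person, reference, angle)
```

After the rotation, the person lies directly above the reference point, so the assumed gravity direction is vertical on the image and a window of predefined size around `rotated_person` can be cut out.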
  • The second estimation unit 12 cuts out a partial area of the image C1 in the image circle of the fisheye image F as the fisheye partial image C3 (first cutting process). This process is executed for each fisheye image F.
  • the second estimation unit 12 detects a person in the fish-eye partial image C3. In the illustrated example, two people have been detected.
  • the second estimation unit 12 executes a rotation process for the fish-eye partial image C3 for each detected person.
  • As a result of the rotation, the direction of gravity at the position of each person becomes the vertical direction on the image. This process is executed for each fisheye partial image C3.
  • The second estimation unit 12 cuts out, for each detected person, an image of a predetermined size including that person from the rotated fisheye partial image C3 to generate an edited fisheye partial image C4. This process is executed for each detected person and for each fisheye partial image C3.
  • The second estimation unit 12 estimates the person behavior indicated by the plurality of time-series edited fisheye partial images based on the generated edited fisheye partial images and the second estimation model.
  • the estimation process of the person behavior by the second estimation unit 12 is basically the same as the estimation process of the person behavior by the first estimation unit 11.
  • The second estimation unit 12 generates, from the plurality of time-series edited fisheye partial images corresponding to a first person, three-dimensional feature information indicating the time change of the feature at each position in the image.
  • the second estimation unit 12 can generate three-dimensional feature information based on 3D CNN (for example, a convolutional deep learning network such as 3D Resnet, but is not limited to this). After that, the second estimation unit 12 performs a process of emphasizing the value of the position where the person is detected with respect to the generated three-dimensional feature information.
  • The second estimation unit 12 integrates the probabilities, obtained for each person, that the plurality of edited fisheye partial images corresponding to that person include each of the plurality of categories (person behaviors), and thereby calculates the probability that the fisheye partial images include each of the plurality of categories (person behaviors).
  • In this way, the second estimation unit 12 analyzes the fisheye partial image, which is a part of the fisheye image, without panoramic expansion, and estimates the person behavior indicated by the fisheye partial image.
  • The integration process is performed by the third estimation unit 13. As shown in FIG. 5, the third estimation unit 13 estimates the person behavior shown by the fisheye image based on the estimation result based on the panoramic image obtained by the panoramic process and the estimation result based on the fisheye partial image obtained by the fisheye process.
  • both the estimation result based on the panoramic image and the estimation result based on the fish-eye partial image indicate the probability of including each of a plurality of predefined human behaviors.
  • The third estimation unit 13 calculates the probability that the fisheye image includes each of the plurality of predefined person behaviors by a predetermined arithmetic process based on the estimation result based on the panoramic image and the estimation result based on the fisheye partial image.
  • FIG. 19 is an example of a block diagram of the image processing device 10 of this embodiment.
  • the basic configuration of the image processing device 10 is composed of a panoramic process, a fisheye process, and an integrated process.
  • the basic configuration of each process is also as described above.
  • FIG. 20 is a flowchart showing a processing flow of the image processing apparatus 10 of this embodiment.
  • the image processing device 10 divides a plurality of input time-series fisheye images into a plurality of clips for each predetermined number.
  • FIG. 21 shows a specific example. In the illustrated example, 120 time-series fisheye images are input and divided into 8 clips. Each clip contains 16 fisheye images, except the last clip, which contains 8 fisheye images. After that, the fisheye process (S102 to S108), the panorama process (S109 to S115), and the integration process (S116) are executed for each clip.
  • Details of the fisheye process (S102 to S108) are shown in FIGS. 17 and 18.
  • The image processing apparatus 10 cuts out a partial region from each of the plurality of time-series fisheye images F to generate a plurality of time-series fisheye partial images C3 (S102, (A) → (B) in FIG. 17).
  • The image processing device 10 detects persons in the plurality of time-series fisheye partial images C3 and tracks them in the moving image (S103, (B) → (C) in FIG. 17).
  • The image processing device 10 executes, for each detected person, a rotation process on the fisheye partial image C3 ((C) → (D) in FIG. 17) and a process of cutting out an image of a predetermined size including that person from the rotated fisheye partial image C3 ((D) → (E) in FIG. 17) (S104).
  • By S104, a plurality of time-series edited fisheye partial images C4 are obtained for each detected person.
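The clip division in this example can be sketched in a few lines. `split_into_clips` is a hypothetical helper; the clip length of 16 comes from the example above.

```python
def split_into_clips(frames, clip_length=16):
    """Divide a time-series list of frames into consecutive clips of clip_length.

    The last clip holds whatever frames remain and may be shorter.
    """
    return [frames[i:i + clip_length] for i in range(0, len(frames), clip_length)]

frames = list(range(120))            # stand-ins for 120 fisheye images
clips = split_into_clips(frames)
print(len(clips), [len(c) for c in clips])
```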
  • The image processing apparatus 10 inputs the plurality of time-series edited fisheye partial images for each detected person into a 3D CNN (for example, a convolutional deep learning network such as 3D ResNet, but not limited to this) to generate three-dimensional feature information. Further, the image processing device 10 performs a process of emphasizing the value at the position where the person is detected in the generated three-dimensional feature information.
  • The image processing device 10 concatenates the three-dimensional feature information obtained for each person (S106). After that, the image processing device 10 passes the result through Average Pooling, flatten, fully-connected layers, and the like to obtain the probability (output value) that each of the plurality of predefined categories (person behaviors) is included in the plurality of time-series edited fisheye partial images corresponding to each person (S107).
  • The image processing device 10 integrates the probabilities that the plurality of edited fisheye partial images corresponding to each person include each of the plurality of categories (person behaviors), and calculates the probability that each of the plurality of categories (person behaviors) is included in the plurality of time-series fisheye partial images (S108).
  • For this calculation, it is conceivable to use a function that returns a statistical value of a plurality of values, for example, the average function that returns the average value (see equation (4) above), the max function that returns the maximum value (see equation (5) above), or the log-sum-exp function that smoothly approximates the max function (see equation (6) above).
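The three candidate functions can be sketched as follows. Equations (4) to (6) are not reproduced in this part of the text, so the log-sum-exp form used here, (1/r) · log(mean(exp(r · s))) with a hypothetical sharpness parameter `r`, is one common formulation and only an assumption about the exact definition.

```python
import numpy as np

def average_pool(scores):
    """Average over instances; scores has shape (num_instances, num_categories)."""
    return scores.mean(axis=0)

def max_pool(scores):
    """Maximum over instances."""
    return scores.max(axis=0)

def log_sum_exp_pool(scores, r=5.0):
    """Smooth approximation of max: (1/r) * log(mean(exp(r * s))).

    The per-category maximum is factored out for numerical stability.
    """
    m = scores.max(axis=0)
    return m + np.log(np.mean(np.exp(r * (scores - m)), axis=0)) / r

scores = np.array([[0.1, 0.9],
                   [0.3, 0.5]])
pooled = log_sum_exp_pool(scores)
```

For r > 0 the log-sum-exp value lies between the average and the maximum, and it approaches the maximum as r grows.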
  • The image processing device 10 panoramically expands each of the plurality of time-series fisheye images (S109), and then generates, from the resulting plurality of time-series panoramic images, three-dimensional feature information (512 × 77 × 25) convolved into 512 channels based on a 3D CNN (for example, a convolutional deep learning network such as 3D ResNet, but not limited to this) (S110). Further, the image processing device 10 generates person position information indicating the position where a person exists in each of the plurality of time-series panoramic images based on a deep learning network for object recognition such as Mask-RCNN (S112).
  • The image processing apparatus 10 corrects the three-dimensional feature information generated in S110 by changing the values at positions other than the position where the person indicated by the person position information generated in S112 exists to a predetermined value (example: 0) (S111).
  • The image processing apparatus 10 divides the three-dimensional feature information into N blocks (each having a width of k) (S113) and passes each block through Average Pooling, flatten, fully-connected layers, and the like to obtain, for each block, the probability (output value) that each of the plurality of predefined categories (person behaviors) is included (S114).
  • The image processing device 10 integrates the probabilities, obtained for each block, that each of the plurality of categories (person behaviors) is included, and calculates the probability that each of the plurality of categories (person behaviors) is included in the plurality of time-series panoramic images (S115).
  • For this calculation, it is conceivable to use a function that returns a statistical value of a plurality of values, for example, the average function that returns the average value (see equation (4) above), the max function that returns the maximum value (see equation (5) above), or the log-sum-exp function that smoothly approximates the max function (see equation (6) above).
  • The image processing apparatus 10 integrates the "probability that each of a plurality of categories (person behaviors) is included in the plurality of time-series fisheye partial images" obtained by the fisheye process and the "probability that each of a plurality of categories (person behaviors) is included in the plurality of time-series panoramic images" obtained by the panorama process, and calculates the probability that each of the plurality of categories (person behaviors) is included in the plurality of time-series fisheye images included in each clip (see S116, FIG. 22). For this calculation, it is conceivable to use a function that returns a statistical value of a plurality of values.
  • the "probability that each of the multiple categories (personal behavior) is included in the plurality of time-series fisheye images included in each clip” can be obtained.
  • The plurality of "probabilities that each of a plurality of categories (person behaviors) is included in the plurality of time-series fisheye images included in each clip", obtained for each clip, are then integrated to calculate the "probability that each of a plurality of categories (person behaviors) is included in the input 120 time-series fisheye images" (see FIG. 22).
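The two integration stages (branch fusion per clip, then fusion across clips) can be sketched as below. The choice of the mean for the branch fusion and the max for the clip fusion is an assumption made for the example; the source says only that a function returning a statistical value may be used. The function names are hypothetical.

```python
import numpy as np

def fuse_branches(fisheye_scores, panorama_scores):
    """Per-clip fusion of the two branches (element-wise mean, assumed).

    Both inputs have shape (num_clips, num_categories).
    """
    return (np.asarray(fisheye_scores) + np.asarray(panorama_scores)) / 2.0

def fuse_clips(clip_scores):
    """Fusion across clips into one score per category (max, assumed)."""
    return np.max(np.asarray(clip_scores), axis=0)

fisheye_scores = [[0.2, 0.8], [0.4, 0.0]]    # 2 clips, 2 categories
panorama_scores = [[0.6, 0.4], [0.0, 0.2]]
video_scores = fuse_clips(fuse_branches(fisheye_scores, panorama_scores))
```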
  • the image processing device 10 outputs the calculation result (S118) and specifies the position of the person behavior predicted to be included (S119).
  • The image processing device 10 applies the sigmoid function to convert the "probability that each of a plurality of categories (person behaviors) is included in the input 120 time-series fisheye images" into a value of 0 to 1. Then, learning is performed so as to optimize the value of the illustrated Total loss1 function.
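The sigmoid conversion can be sketched as follows. The definition of the Total loss1 function appears only in a figure that is not reproduced here, so the binary cross-entropy below is a stand-in chosen for illustration (a common loss for multi-label classification), not the loss used in the source.

```python
import numpy as np

def sigmoid(x):
    """Map real-valued scores to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def binary_cross_entropy(probabilities, targets, eps=1e-7):
    """Mean binary cross-entropy over categories (stand-in loss, assumed)."""
    p = np.clip(probabilities, eps, 1.0 - eps)
    t = np.asarray(targets, dtype=float)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

scores = np.array([2.0, -1.0, 0.0])          # one raw score per category
targets = np.array([1, 0, 0])                # ground-truth labels
loss = binary_cross_entropy(sigmoid(scores), targets)
```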
  • FIG. 23 shows the flow of the modified example.
  • the configuration of the panoramic process is different from the above-described embodiment.
  • the panoramic process of the modified example will be described in detail.
  • the first estimation unit 11 analyzes the image and calculates the first estimation result of the person behavior shown by the plurality of panoramic images in time series.
  • the process is the same as the process of the panorama process described in the above embodiment.
  • the first estimation unit 11 analyzes the optical flow image generated from the panoramic image and calculates the second estimation result of the person behavior indicated by the panoramic image.
  • An optical flow image is an image of a vector representing the movement of an object in a plurality of panoramic images in a time series.
  • the first estimation unit 11 estimates the person behavior shown by the plurality of time-series panoramic images based on the first estimation result and the second estimation result. This estimation result is integrated with the estimation result obtained by the fisheye process.
  • In the above description, the image processing device 10 generates the panoramic image, the fisheye partial image, and the edited fisheye partial image, but another device different from the image processing device 10 may perform at least one of these processes. Then, an image generated by the other device (at least one of a panoramic image, a fisheye partial image, and an edited fisheye partial image) may be input to the image processing device 10. In this case, the image processing device 10 performs the above-mentioned processing using the input image.
  • The generated panoramic image may be processed to eliminate the information of the part corresponding to the partial area extracted by the fisheye process (hereinafter, "the part") (example: filling the part with a single color or with a predetermined pattern). Then, the person behavior may be estimated based on the processed panoramic image and the first estimation model. Since the person behavior included in the part is estimated by the fisheye process, the information of the part can be removed from the panoramic image. However, if a person straddles the part and another part, the estimation accuracy of that person's behavior may deteriorate. Therefore, it is preferable to execute the process without removing the information of the part from the panoramic image, as in the above embodiment.
  • the second estimation unit 12 analyzes the fish-eye partial image and detects a person included in the fish-eye partial image.
  • The second estimation unit 12 may instead perform the following processing. First, the second estimation unit 12 analyzes the fisheye image and detects the persons included in the fisheye image. After that, the second estimation unit 12 selects, from among the persons detected in the fisheye image, those whose detection position (coordinates) in the fisheye image satisfies a predetermined condition (being within the region cut out as the fisheye partial image).
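The selection step in this modification can be sketched as below, assuming each detection is represented by a single (x, y) coordinate and the cut-out region is the circle of radius R around the reference point. `persons_in_partial_region` is a hypothetical helper name.

```python
def persons_in_partial_region(detections, center, radius):
    """Keep detections whose coordinates fall inside the circular cut-out region."""
    xc, yc = center
    return [(x, y) for x, y in detections
            if (x - xc) ** 2 + (y - yc) ** 2 <= radius ** 2]

detections = [(500, 520), (900, 100), (430, 470)]
print(persons_in_partial_region(detections, center=(500, 500), radius=200))
```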
  • the process of detecting a person from a fisheye image is realized by an algorithm similar to the algorithm of the process of detecting a person from the fisheye partial image described above. According to the modification, the detection accuracy of the person included in the fish-eye partial image is improved.
  • a process of estimating the human behavior of a person included in the fisheye image can be considered by executing only the panorama process without executing the fisheye process and the integration process.
  • Alternatively, a process is conceivable in which the panoramic process and the integration process are not executed, and the entire fisheye image is processed without panoramic expansion in the same manner as the fisheye process described above to estimate the person behavior of the persons included in the fisheye image.
  • In that case, however, the number of images to be generated and processed becomes enormous, and the processing load on the computer increases.
  • As in the fisheye process described above, this process detects the persons included in the fisheye image and, for each person, generates and processes a plurality of images (corresponding to the edited fisheye partial images) in which the orientation of the person has been adjusted, in order to estimate the person behavior of each of the plurality of persons. Consequently, as the number of detected persons increases, the number of images to be generated and processed becomes enormous.
  • the image processing device 10 of the present embodiment can solve these problems.
  • The image processing device 10 of the present embodiment estimates the person behavior of a person included in the fisheye image by integrating the person behavior estimated by analyzing the panoramic image with the person behavior estimated by analyzing, without panoramic expansion, a partial image near the reference point (x_c, y_c) of the fisheye image.
  • the second estimation means is The image in the circular region centered on the reference point in the fisheye image determined based on the gravity direction at the position of each of the plurality of persons existing in the fisheye image is defined as the fisheye partial image 1.
  • the second estimation means is The image processing apparatus according to any one of 1 to 3, which determines the size of the fisheye partial image based on the detection result of a person existing in the fisheye image. 5.
  • the second estimation means is A process of rotating the fish-eye partial image and a process of cutting out a partial area of a predetermined size are executed to generate an edited fish-eye partial image for each person detected in the fish-eye partial image.
  • the image processing apparatus according to any one of 1 to 4, wherein the edited fish-eye partial image is analyzed and the person behavior indicated by the fish-eye partial image is estimated. 6.
  • Both the estimation result based on the panoramic image and the estimation result based on the fisheye partial image indicate the probability of including each of a plurality of predefined human behaviors.
  • the third estimation means calculates, by a predetermined arithmetic process based on the estimation result based on the panoramic image and the estimation result based on the fisheye partial image, the probability that the fisheye image includes each of the plurality of predefined human actions.
  • the first estimation means is The panoramic image is image-analyzed to calculate the first estimation result of the person behavior indicated by the panoramic image.
  • the optical flow image generated from the panoramic image is image-analyzed to calculate the second estimation result of the person behavior indicated by the panoramic image.
  • the image processing apparatus according to any one of 1 to 6, which estimates a person's behavior indicated by the panoramic image based on the first estimation result and the second estimation result.
  • the computer The panoramic image obtained by panoramicly expanding the fisheye image generated by the fisheye lens camera is analyzed, and the human behavior indicated by the panoramic image is estimated.
  • the fisheye partial image which is a partial region of the fisheye image, is image-analyzed without panoramic expansion, and the human behavior indicated by the fisheye partial image is estimated.
  • An image processing method for estimating a person's behavior indicated by a fisheye image based on an estimation result based on the panoramic image and an estimation result based on the fisheye partial image 9.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)
PCT/JP2020/036225 2020-09-25 2020-09-25 画像処理装置、画像処理方法及びプログラム Ceased WO2022064632A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022551516A JP7589741B2 (ja) 2020-09-25 2020-09-25 画像処理装置、画像処理方法及びプログラム
PCT/JP2020/036225 WO2022064632A1 (ja) 2020-09-25 2020-09-25 画像処理装置、画像処理方法及びプログラム
US18/026,407 US20230368576A1 (en) 2020-09-25 2020-09-25 Image processing apparatus, image processing method, and non-transitory storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/036225 WO2022064632A1 (ja) 2020-09-25 2020-09-25 画像処理装置、画像処理方法及びプログラム

Publications (1)

Publication Number Publication Date
WO2022064632A1 (ja) 2022-03-31

Family

ID=80846326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/036225 Ceased WO2022064632A1 (ja) 2020-09-25 2020-09-25 画像処理装置、画像処理方法及びプログラム

Country Status (3)

Country Link
US (1) US20230368576A1
JP (1) JP7589741B2
WO (1) WO2022064632A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023248968A1 (ja) * 2022-06-21 2023-12-28 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 画像加工方法、画像加工装置、及び画像加工プログラム

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7485200B2 (ja) * 2020-08-13 2024-05-16 日本電気株式会社 画像拡張装置、制御方法、及びプログラム

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016178406A (ja) * 2015-03-19 2016-10-06 パナソニックIpマネジメント株式会社 撮像装置、録画装置および映像出力制御装置
JP2017162432A (ja) * 2016-03-07 2017-09-14 株式会社リコー 画像処理システム、情報処理装置、情報端末、プログラム
JP2020053019A (ja) * 2018-07-16 2020-04-02 アクセル ロボティクス コーポレーションAccel Robotics Corp. 自律店舗追跡システム

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MOKUJI, YASUO: "Video cameras becoming sensors", NIKKEI COMPUTER, 31 October 2005 (2005-10-31), pages 66 - 70 *

Also Published As

Publication number Publication date
US20230368576A1 (en) 2023-11-16
JP7589741B2 (ja) 2024-11-26
JPWO2022064632A1 2022-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20955226

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022551516

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20955226

Country of ref document: EP

Kind code of ref document: A1