WO2023053249A1 - Learning device, estimation device, learning method, estimation method, and program - Google Patents
- Publication number
- WO2023053249A1 (PCT/JP2021/035782, JP2021035782W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- person
- image
- visible
- processed image
- keypoint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- the present invention relates to a learning device, an estimation device, a learning method, an estimation method and a program.
- Patent Document 1 and Non-Patent Document 1 disclose techniques for extracting key points of a person's body from an image using a trained model.
- In Non-Patent Document 1, an image is divided into a grid, and a neural network is configured to output maps over that grid: a map indicating the position of a person (the central position of the person) as a likelihood, and maps indicating the joint positions and the correction amounts of the joint positions.
- In Non-Patent Document 1, the input is an image, and the neural network that outputs each of the above maps is used to estimate the joint positions of a person from the image.
- the learning data is data that associates a teacher image that includes a person with a correct label that indicates the position in the teacher image of each of a plurality of key points of the person's body.
- circles indicate the positions of each of a plurality of key points within the teacher image. It should be noted that the types and number of key points illustrated are merely examples, and are not limited to these.
- When a teacher image in which some keypoints are not visible (FIG. 2) is used as learning data, the prior art performs learning by preparing a correct label that also indicates the positions in the teacher image of the keypoints that are not visible, as shown in FIG.
- the feet of the person positioned in the foreground are hidden by the shield and cannot be seen.
- the keypoint at the person's feet is specified on a cover that obscures the person's feet. For example, the operator predicts the positions of hidden key points in the teacher image based on the visible parts of the human body, and creates correct labels as shown in FIG.
- the positions of the keypoints are learned with image patterns that do not show the appearance features of the keypoints.
- Because the operator creates correct labels by predicting the positions of keypoints that are not actually visible in the teacher image, the labeled positions may deviate from the actual keypoint positions. For these reasons, the prior art has the problem that estimation accuracy decreases when the learning data includes an image in which some of the keypoints are not visible.
- An object of the present invention is to reduce the problem that, in a technique for extracting keypoints of a person's body from images using a trained model, estimation accuracy decreases when the learning data includes images in which some of the keypoints are not visible.
- A learning device is provided having:
- acquisition means for acquiring learning data in which a teacher image including a person is associated with a correct label indicating the position of each person, a correct label indicating whether or not each of a plurality of keypoints on the body of each person is visible in the teacher image, and a correct label indicating the position in the teacher image of each keypoint, among the plurality of keypoints, that is visible in the teacher image; and
- learning means for learning, based on the learning data, an estimation model that estimates information indicating the position of each person, information indicating whether or not each of the plurality of keypoints of each person included in a processed image is visible in the processed image, and information relating to the position of each keypoint for calculating the position of the keypoint in the processed image.
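- A learning method is provided in which a computer performs: an acquisition step of acquiring learning data in which a teacher image including a person is associated with a correct label indicating the position of each person, a correct label indicating whether or not each of a plurality of keypoints on the body of each person is visible in the teacher image, and a correct label indicating the position in the teacher image of each keypoint that is visible in the teacher image; and a learning step of learning the estimation model based on the learning data.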
- A program is provided that causes a computer to function as: acquisition means for acquiring learning data in which a teacher image including a person is associated with a correct label indicating the position of each person, a correct label indicating whether or not each of a plurality of keypoints on the body of each person is visible in the teacher image, and a correct label indicating the position in the teacher image of each keypoint that is visible in the teacher image; and learning means for learning, based on the learning data, an estimation model that estimates information indicating the position of each person, information indicating whether or not each of the plurality of keypoints of each person included in a processed image is visible in the processed image, and information relating to the position of each keypoint for calculating the position of the keypoint within the processed image.
- An estimating device is provided comprising estimating means for estimating the position within a processed image of each of a plurality of keypoints of each person contained in the processed image, using the estimation model trained by the learning device.
- An estimating method is provided in which a computer performs an estimating step of estimating the position within a processed image of each of a plurality of keypoints of each person contained in the processed image, using the estimation model trained by the learning device.
- A program is provided that causes a computer to function as estimation means for estimating the position within a processed image of each of a plurality of keypoints of each person included in the processed image, using the estimation model trained by the learning device.
- According to the present invention, in a technique for extracting keypoints of a person's body from images using a trained model, the problem that estimation accuracy decreases when the learning data includes images in which some of the keypoints are not visible can be reduced.
- FIG. 1 is an example of a functional block diagram of a learning device according to an embodiment.
- FIG. 4 is a flowchart showing an example of the flow of processing of the learning device of the present embodiment. The other drawings show an example of the hardware configuration of the learning device and the estimation device of this embodiment, examples of the functional block diagram of the estimation device, diagrams for explaining the processing of the estimation device, a flowchart showing an example of the flow of processing of the estimation device, and diagrams for explaining the technique of this embodiment.
- The learning device 10 of the present embodiment learns while excluding information on keypoints that are not visible in the image, and thereby reduces the problem that estimation accuracy decreases when the learning data includes an image in which some of the keypoints are not visible.
- First, the technology described in Non-Patent Document 1 will be described. As shown in FIG. 3, in the technique described in Non-Patent Document 1, when an image is input to the neural network, a plurality of data as shown is output. In other words, the neural network described in Non-Patent Document 1 is composed of a plurality of layers that output a plurality of data as illustrated.
- FIG. 5 shows a diagram in which explanations indicating the concept of each data in FIG. 4 are added to the original image of the data in FIG.
- "Likelihood of human position" is data indicating the likelihood of the position in the image of the central position of the human body. For example, the human body is detected in the image based on feature amounts of its appearance, and data indicating the likelihood of the central position of the human body is output based on the detection result. As shown, the data indicates the likelihood that the central position of a human body is located in each of a plurality of grids obtained by dividing the image. Note that the method of dividing an image into grids is a matter of design, and the number and size of grids shown in the figure are merely examples. According to the data shown in FIG., "the third grid from the left and the third grid from the bottom" and "the second grid from the right and the third grid from the top" are identified as the grids where the center position of a human body is located.
- The "correction amount of human position" data indicates the amount of movement in the x direction and the amount of movement in the y direction from the center of the grid identified as containing the center position of the person's body to the center position of the person's body. As shown in FIG. 5, the center position of the human body lies at some position within one grid.
- Size data is data that indicates the vertical and horizontal lengths of a rectangular area that contains a person's body.
- the "relative position of keypoint” data is data that indicates the position in the image of each of a plurality of keypoints. Specifically, it shows the relative positional relationship between each of a plurality of key points and the center of the grid where the center position of the body is located. Although two keypoint positions are shown for each person in FIGS. 4 and 5, the number of keypoints can be three or more.
- FIG. 6 shows an example of "likelihood of position of keypoint a", "likelihood of position of keypoint b", and "correction amount of keypoint position" among the plurality of data shown in FIG.
- FIG. 7 shows a diagram in which explanations showing the concept of each data in FIG. 6 are added to the original image of the data in FIG.
- "Likelihood of keypoint position" is data indicating the likelihood of the position of each of a plurality of keypoints in the image. For example, each keypoint is detected in the image based on feature amounts of its appearance, and data indicating the likelihood of the position of each keypoint is output based on the detection result. As shown, the data is output for each keypoint. The data indicates the likelihood that each keypoint is located in each of a plurality of grids obtained by dividing the image. Note that the number of grids shown is merely an example. When an image including a plurality of persons is input as shown in FIG. 7, the data indicates the likelihood of the keypoint positions of each of the plurality of persons. According to the data shown in FIG., "the fourth grid from the left and the first grid from the bottom" and "the second grid from the right and the fourth grid from the top" are identified as the grids where keypoint a is located. In addition, "the fourth grid from the left and the fourth grid from the bottom" and "the second grid from the right and the second grid from the top" are identified as the grids where keypoint b is located. Although the figure shows data for two keypoints, the number of keypoints can be three or more, and data as described above is output for each keypoint.
- The "correction amount of keypoint position" data indicates the amount of movement in the x direction and the amount of movement in the y direction from the center of the grid where each of the plurality of keypoints is located to the position of that keypoint. As shown in FIG. 7, each keypoint lies at some position within one grid. The position of each keypoint in the image can be identified by using the likelihood of the position of each keypoint and the correction amount of the position of each keypoint.
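The decoding described above, from a likelihood map plus a correction amount to an image position, can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `decode_keypoints`, the square cell size in pixels, the likelihood threshold, and the convention that row 0 is the top of the image are all assumptions made for the example.

```python
from typing import List, Tuple

Offset = Tuple[float, float]


def decode_keypoints(
    likelihood: List[List[float]],
    offsets: List[List[Offset]],
    cell_size: float,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return image positions for grid cells whose likelihood reaches the threshold.

    likelihood[row][col] is the likelihood that a keypoint lies in that cell.
    offsets[row][col] is the (dx, dy) correction from the cell center, in pixels.
    Both the threshold and the pixel units are assumptions of this sketch.
    """
    positions = []
    for row, likelihood_row in enumerate(likelihood):
        for col, score in enumerate(likelihood_row):
            if score < threshold:
                continue  # no keypoint detected in this cell
            center_x = (col + 0.5) * cell_size
            center_y = (row + 0.5) * cell_size
            dx, dy = offsets[row][col]
            positions.append((center_x + dx, center_y + dy))
    return positions


likelihood = [[0.1, 0.0], [0.9, 0.2]]
offsets = [[(0.0, 0.0), (0.0, 0.0)], [(3.0, -2.0), (0.0, 0.0)]]
decoded = decode_keypoints(likelihood, offsets, cell_size=10.0)
```

Here the only cell above the threshold is row 1, column 0, whose center is (5, 15); adding the correction (3, -2) gives the keypoint position.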
- In Non-Patent Document 1, after a plurality of data as described above is output from an input image, the value of a predetermined loss function is calculated based on the plurality of data and correct labels given in advance, and the parameters of the estimation model are calculated (learned) by minimizing that value. At the time of estimation, the position of each keypoint in the image is specified by two methods (the relative position from the center position of the grid shown in FIG. 4, and the likelihood and correction amount shown in FIG. 6), and, for example, the result of integrating the positions calculated by the two methods is used as the position of each of the plurality of keypoints. Methods of integration include averaging, weighted averaging, selection of one of them, and the like.
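The integration step mentioned above can be illustrated with a single weighted average, which covers all three listed options. This is only a sketch: the function name `integrate_positions` and the `weight` parameter are hypothetical and do not come from the source.

```python
from typing import Tuple

Point = Tuple[float, float]


def integrate_positions(from_center: Point, from_likelihood: Point, weight: float = 0.5) -> Point:
    """Combine two estimates of the same keypoint position.

    weight is the share given to the estimate derived from the person's center
    (hypothetical parameter): 0.5 is a plain average, other values in (0, 1)
    give a weighted average, and 1.0 or 0.0 selects one estimate outright.
    """
    x1, y1 = from_center
    x2, y2 = from_likelihood
    return (weight * x1 + (1.0 - weight) * x2, weight * y1 + (1.0 - weight) * y2)


averaged = integrate_positions((10.0, 20.0), (14.0, 24.0))
selected = integrate_positions((10.0, 20.0), (14.0, 24.0), weight=1.0)
```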
- the technology of this embodiment will be described in comparison with the technology described in Non-Patent Document 1.
- As shown in FIG. 8, in the technique of the present embodiment as well, when an image is input to the neural network, a plurality of data as shown is output.
- The neural network of this embodiment is composed of multiple layers that output multiple data as shown.
- The technique of the present embodiment differs from the technique described in Non-Patent Document 1 in that the output data includes "hidden information" data corresponding to each of a plurality of keypoints.
- FIG. 10 shows a diagram in which explanations indicating the concept of each data in FIG. 9 are added to the original image of the data in FIG.
- Keypoint hidden information data is data that indicates whether each keypoint is hidden in the image, that is, whether each keypoint is visible in the image.
- The state in which a keypoint is not visible in the image includes the state in which the keypoint is located outside the image and the state in which the keypoint is located in the image but is obscured by another object (such as another person or another thing).
- the data is output for each keypoint.
- visible keypoints are assigned a value of "0”
- invisible keypoints are assigned a value of "1".
- the key point a of the person 1 positioned in the foreground is hidden behind other objects and cannot be seen. Therefore, if the trained neural network of the present embodiment is used, data to which "1" is added as hidden information for the key point a of person 1 is output as shown in FIG.
- the number of keypoints can be three or more. Then, data as described above is output for each key point.
- the "relative position of keypoint” data is data that indicates the position in the image of each of a plurality of keypoints.
- The "relative position of keypoint" data in this embodiment differs from the technique described in Non-Patent Document 1 in that it includes data for keypoints that the hidden information indicates are visible and does not include data for keypoints that the hidden information indicates are not visible. In other respects the concept is the same as in the technique described in Non-Patent Document 1.
- The keypoint a (the keypoint at the foot) of person 1 located in the foreground is hidden behind another object and cannot be seen. Therefore, if the trained neural network of this embodiment is used, relative position data for keypoint a that does not include the relative position of keypoint a of person 1 is output, as shown in FIG.
- the data of the relative position of the keypoint a shown in FIG. 9 includes only the data of the relative position of the keypoint a of the person 2 shown in FIG.
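The way the hidden information gates the relative-position output can be sketched as below. This is an illustrative assumption about how the per-person outputs might be held in memory, not the patent's data layout: the function name and the dictionary representation are hypothetical, while the flag values follow the text ("0" for visible, "1" for not visible).

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def visible_keypoint_positions(
    grid_center: Point,
    hidden: Dict[str, int],
    relative: Dict[str, Point],
) -> Dict[str, Point]:
    """Return image positions only for keypoints whose hidden flag is 0 (visible).

    grid_center is the center of the grid containing the person's center.
    relative holds (dx, dy) from that grid center and has entries only for
    visible keypoints, mirroring the output described in the text.
    """
    cx, cy = grid_center
    return {
        name: (cx + relative[name][0], cy + relative[name][1])
        for name, flag in hidden.items()
        if flag == 0
    }


# Person 1: keypoint a is hidden (flag 1), keypoint b is visible (flag 0).
person1 = visible_keypoint_positions(
    grid_center=(25.0, 35.0),
    hidden={"a": 1, "b": 0},
    relative={"b": (-3.0, -12.0)},
)
```

No position is produced for keypoint a of person 1, so downstream code never sees a guessed location for a hidden keypoint.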
- FIG. 11 shows an example of "likelihood of position of keypoint a", "likelihood of position of keypoint b", and "correction amount of keypoint position" among the plurality of data shown in FIG.
- FIG. 12 shows a diagram in which an explanation showing the concept of each data in FIG. 11 is added to the original image of the data in FIG.
- the data of "Likelihood of Keypoint Position” has the same concept as the technology described in Non-Patent Document 1.
- The keypoint a of person 1 positioned in the foreground is hidden behind another object and cannot be seen. Therefore, if the trained neural network of this embodiment is used, likelihood data for the position of keypoint a that does not include the likelihood for keypoint a of person 1 is output, as shown in FIG. 11.
- the data of the likelihood of the position of the keypoint a shown in FIG. 11 includes only the data of the likelihood of the position of the keypoint a of the person 2 shown in FIG.
- the data of "keypoint position correction amount” has the same concept as the technology described in Non-Patent Document 1.
- The keypoint a (the keypoint at the foot) of person 1 positioned in the foreground is hidden behind another object and cannot be seen. For this reason, if the trained neural network of this embodiment is used, keypoint position correction amount data that does not include the correction amount for keypoint a of person 1 is output, as shown in FIG.
- The technique of the present embodiment differs from the technique described in Non-Patent Document 1 at least in that it outputs hidden information data for each of a plurality of keypoints and does not output position data for keypoints that the hidden information indicates are not visible.
- the technology of the present embodiment has these features that the technology described in Non-Patent Document 1 does not have, thereby realizing learning that excludes information on key points that are not visible in the image.
- FIG. 13 illustrates an example of a functional block diagram of the learning device 10.
- the learning device 10 has an acquisition unit 11, a learning unit 12, and a storage unit 13.
- the learning device 10 need not have the storage unit 13, as shown in the functional block diagram of FIG. 14.
- In that case, an external device configured to be able to communicate with the learning device 10 includes the storage unit 13.
- the acquisition unit 11 acquires learning data that links the teacher image and the correct label.
- a teacher image includes a person.
- a teacher image may include only one person or may include a plurality of persons.
- the correct label indicates at least whether each of a plurality of keypoints of the person's body is visible in the teacher image, and the position within the teacher image of the keypoint that is visible in the teacher image. Correct labels do not indicate positions in the training image of keypoints that are not visible in the training image.
- the correct label may include other information such as the position of the person and the size of the person.
- the correct label may be a new correct label obtained by processing the original correct label.
- the correct label may be a plurality of data shown in FIG. 8 processed from the position of the keypoint in the teacher image and the hidden information of the keypoint.
- the operator who creates the correct label may do the work of specifying only the visible keypoints in the image. Then, the operator does not have to perform troublesome work such as predicting the position in the image of the key point that is hidden behind other objects and specifying it in the image.
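One possible in-memory shape for such a correct label is sketched below. The class names `KeypointLabel` and `PersonLabel` and their fields are hypothetical; the only rule taken from the text is that a label carries a position for a visible keypoint and no position for a keypoint that is not visible.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KeypointLabel:
    visible: bool
    position: Optional[Point] = None  # set only when the keypoint is visible

    def __post_init__(self) -> None:
        if self.visible and self.position is None:
            raise ValueError("a visible keypoint needs a position")
        if not self.visible and self.position is not None:
            raise ValueError("an invisible keypoint must not carry a position")


@dataclass
class PersonLabel:
    center: Point  # position of the person in the teacher image
    keypoints: Dict[str, KeypointLabel]


# The operator marks only what can be seen: keypoint a (foot) is hidden.
person1_label = PersonLabel(
    center=(25.0, 35.0),
    keypoints={
        "a": KeypointLabel(visible=False),
        "b": KeypointLabel(visible=True, position=(22.0, 23.0)),
    },
)
```

Rejecting a position on an invisible keypoint at construction time keeps guessed coordinates out of the learning data.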
- the key point may be at least a part of joints, predetermined parts (eyes, nose, mouth, navel, etc.), and terminal parts of the body (head, feet, hands, etc.). Also, the key points may be other parts. There are various ways to define the number of keypoints and positions, and there are no particular restrictions.
- the acquisition unit 11 can acquire the learning data from the storage unit 13 .
- the learning unit 12 learns the estimation model based on the learning data.
- the storage unit 13 stores the estimation model.
- the estimation model is configured including the neural network described using FIG.
- the estimation model outputs multiple data shown in FIG.
- the plurality of data shown in FIG. 8 includes information indicating the position of each person, information indicating whether each of the plurality of key points of each person included in the processed image is visible in the processed image, and Information related to the position of each keypoint, etc., for calculating the position in the processed image of the visible keypoint.
- the information related to the position of each keypoint indicates the relative position of each keypoint, the likelihood of the position of each keypoint, the amount of correction of the position of each keypoint, and the like.
- An estimating unit (for example, the estimating unit 21 described in the following embodiments) performs predetermined arithmetic processing based on part of the plurality of data described with reference to FIGS. 8 to 12, and can thereby estimate the positions in the processed image of the keypoints that are visible in the processed image.
- For example, the estimating unit calculates, as the position of each keypoint in the processed image, the result of integrating two positions: the position of the keypoint identified based on the likelihood of the position of the person (center position of the person) shown in FIG.; and the position of the keypoint identified based on the likelihood of the position of each keypoint and the correction amount shown in FIG. Methods of integration include, but are not limited to, averaging, weighted averaging, and selection of one of the positions.
- The learning unit 12 performs learning using only the information of keypoints that the hidden information of the learning data indicates are visible. That is, among the hidden information and the keypoint position information of the learning data, it does not use the position information of keypoints that are indicated to be invisible. For example, when learning keypoint positions, the learning unit 12 adjusts the parameters of the estimation model so as to minimize, only for the positions on the grid where the learning data indicates that the keypoint is visible, the error between the keypoint position information output from the estimation model during learning and the keypoint position information of the learning data (correct labels).
- For the likelihood data of the human position (central position), the learning unit 12 learns so as to minimize the error between the map indicating the likelihood of the human position output from the estimation model during learning and the map indicating the likelihood of the human position in the learning data.
- For the correction amount of the person's position, the size of the person, and the hidden information of each keypoint, the learning unit 12 learns so as to minimize, only for the positions on the grid that indicate the position of a person in the learning data, the error between the values output from the estimation model during learning and the corresponding values in the learning data.
- For the relative position data of each keypoint, the learning unit 12 learns so as to minimize the error between the relative position of each keypoint output from the estimation model during learning and the relative position of each keypoint in the learning data, only for those positions on the grid that indicate the position of a person in the learning data and for which the hidden information of the learning data indicates that the keypoint is not hidden.
- For the likelihood data of the position of each keypoint, the learning unit 12 learns so as to minimize the error between the map indicating the likelihood of the position of each keypoint output from the estimation model during learning and the map indicating the likelihood of the position of each keypoint in the learning data.
- For the correction amount of the position of each keypoint, the learning unit 12 learns so as to minimize, only for the positions on the grid that indicate the position of each keypoint in the learning data, the error between the correction amount output from the estimation model during learning and the correction amount in the learning data. Because the likelihood of the position of each keypoint and the correction amount of the keypoint position in the learning data are given only for visible keypoints, learning naturally uses only the visible keypoints.
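The idea of computing the position error only where the learning data marks a keypoint as visible can be sketched as a masked loss. This is a simplified illustration, not the patent's loss function: the source only says "error", so the squared error used here, the function name `masked_position_loss`, and the flat per-keypoint lists are all assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def masked_position_loss(
    predicted: List[Point],
    target: List[Optional[Point]],
    visible: List[bool],
) -> float:
    """Mean squared position error over visible keypoints only.

    target[i] is None when keypoint i is not visible, because the correct
    label carries no position for it. Such keypoints contribute nothing to
    the loss, whatever the model predicted for them.
    """
    total = 0.0
    count = 0
    for pred, tgt, is_visible in zip(predicted, target, visible):
        if not is_visible or tgt is None:
            continue  # excluded from learning
        total += (pred[0] - tgt[0]) ** 2 + (pred[1] - tgt[1]) ** 2
        count += 1
    return total / count if count else 0.0


loss = masked_position_loss(
    predicted=[(1.0, 2.0), (100.0, 100.0)],
    target=[(1.0, 4.0), None],
    visible=[True, False],
)
```

The second prediction is arbitrary, yet it does not affect the loss, which is what keeps image patterns without the keypoint's appearance from being learned as that keypoint.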
- the learning device 10 acquires learning data in which the teacher image and the correct label are linked.
- the processing is implemented by the acquisition unit 11 .
- the details of the processing executed by the acquisition unit 11 are as described above.
- the learning device 10 learns the estimation model using the learning data acquired in S10.
- the processing is implemented by the learning unit 12 .
- the details of the processing executed by the learning unit 12 are as described above.
- the learning device 10 repeats the loop of S10 and S11 until the end condition is met.
- a termination condition is defined using, for example, the value of a loss function.
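The repetition of S10 and S11 until a loss-based end condition is met might look like the loop below. It is a sketch under stated assumptions: `step` is a hypothetical callable that performs one round of acquisition (S10) and learning (S11) and returns the current loss, and the threshold and iteration cap are illustrative values that the source does not specify.

```python
from typing import Callable, Tuple


def train(
    step: Callable[[], float],
    loss_threshold: float = 1e-3,
    max_iterations: int = 1000,
) -> Tuple[int, float]:
    """Repeat one acquire-and-learn step until the loss falls below the threshold.

    Returns the number of iterations executed and the last loss value.
    max_iterations is a safety cap added for this sketch.
    """
    iteration = 0
    loss = float("inf")
    while iteration < max_iterations:
        iteration += 1
        loss = step()  # S10 (acquire learning data) + S11 (update the model)
        if loss < loss_threshold:
            break  # end condition met
    return iteration, loss


# Stand-in for a real training step: replay a fixed sequence of loss values.
fake_losses = iter([1.0, 0.5, 0.0005, 0.0001])
iterations_run, final_loss = train(lambda: next(fake_losses))
```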
- Each functional unit of the learning device 10 is realized by any combination of hardware and software, centered on a CPU (Central Processing Unit) of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (which can store not only programs stored in advance from the stage of shipping the device but also programs downloaded from storage media such as CDs (Compact Discs) or from servers on the Internet), and an interface for network connection. It should be understood by those skilled in the art that there are various modifications to the implementation method and apparatus.
- FIG. 16 is a block diagram illustrating the hardware configuration of the learning device 10.
- learning device 10 has processor 1A, memory 2A, input/output interface 3A, peripheral circuit 4A, and bus 5A.
- the peripheral circuit 4A includes various modules.
- the learning device 10 may not have the peripheral circuit 4A.
- the learning device 10 may be composed of a plurality of physically and/or logically separated devices. In this case, each of the plurality of devices can have the above hardware configuration.
- the bus 5A is a data transmission path for mutually transmitting and receiving data between the processor 1A, the memory 2A, the peripheral circuit 4A and the input/output interface 3A.
- the processor 1A is, for example, an arithmetic processing device such as a CPU or a GPU (Graphics Processing Unit).
- the memory 2A is, for example, RAM (Random Access Memory) or ROM (Read Only Memory).
- The input/output interface 3A includes an interface for acquiring information from an input device, an external device, an external server, an external sensor, a camera, and the like, and an interface for outputting information to an output device, an external device, an external server, and the like.
- Input devices are, for example, keyboards, mice, microphones, physical buttons, touch panels, and the like.
- the output device is, for example, a display, speaker, printer, mailer, or the like.
- the processor 1A can issue commands to each module and perform calculations based on the calculation results thereof.
- the estimation model learned by the learning device 10 of the present embodiment is characterized by outputting hidden information data indicating whether or not each of a plurality of key points is visible in the image.
- the estimation model further has a feature of not outputting position information of key points that are shown to be invisible in the hidden information data.
- The learning device 10 has the feature that, when learning the estimation model, the learning data for keypoint positions only needs to include the positions of keypoints that are visible in the image.
- the learning device 10 optimizes the parameters of the estimation model based on the result output from the estimation model and the correct label (learning data). According to such a learning device 10, it is possible to correctly learn by excluding information on key points that are not visible in the image. As a result, it is possible to reduce the problem of degraded estimation accuracy when training data includes images in which some keypoints are not visible.
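One way to exclude invisible keypoints from the position term of the loss is sketched below. It assumes that the correct label stores `None` for a keypoint that is not visible and that the position error is a mean L1 distance; the specification fixes neither of these choices, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def masked_position_loss(
    predicted: List[Point],
    labelled: List[Optional[Point]],
) -> float:
    """Mean L1 position error over keypoints that have a labelled position.

    A label of None marks a keypoint that is not visible in the teacher
    image (an assumed encoding); such keypoints contribute nothing.
    """
    total = 0.0
    count = 0
    for pred, label in zip(predicted, labelled):
        if label is None:          # invisible keypoint: no position in the label
            continue
        total += abs(pred[0] - label[0]) + abs(pred[1] - label[1])
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # The second keypoint is invisible, so its (arbitrary) prediction is ignored.
    print(masked_position_loss([(1.0, 1.0), (5.0, 5.0)], [(1.0, 2.0), None]))
```

Because the invisible keypoint is skipped rather than compared against a placeholder position, its prediction cannot pull the parameters in an arbitrary direction.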
- the estimation device of the present embodiment uses the estimation model learned by the learning device of the first embodiment to estimate the position in the image of each of a plurality of key points of each person included in the image. A detailed description will be given below.
- the estimation device 20 has an estimation unit 21 and a storage unit 22 .
- The estimation device 20 need not have the storage unit 22. In that case, an external device configured to communicate with the estimation device 20 includes the storage unit 22.
- the estimation unit 21 acquires an arbitrary image as a processed image.
- the estimation unit 21 may acquire an image captured by a surveillance camera as the processed image.
- the estimating unit 21 uses the estimation model learned by the learning device 10 to estimate and output the position in the processed image of each of the plurality of key points of each person included in the processed image.
- the estimation model outputs the data described with reference to FIGS. 8 to 11 when an image is input.
- The estimation unit 21 further performs estimation processing using the data output by the estimation model, thereby estimating the position in the processed image of each of the plurality of keypoints of each person included in the processed image, and outputs the result as the estimation result.
- The learned estimation model is stored in the storage unit 22.
- The estimation result can be output using any means, such as a display, a projection device, a printer, or e-mail.
- the estimation unit 21 may output the data output by the estimation model as it is as the estimation result.
- The estimation unit 21 is characterized by using the estimation model to estimate whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image, and by using the result of that estimation to estimate the position within the processed image of each of the plurality of keypoints of each person. An example of the processing performed by the estimation unit 21 is described below with reference to FIGS. 19 and 20.
- (Step 1) Process the processed image with the estimation model to obtain a plurality of data as shown in FIGS.
- (Step 2) Based on the likelihood data of the person position, identify the grid cell (P1 in FIG. 19) in which the center position of each person (P11 in FIG. 19) is located. Specifically, identify grid cells whose likelihood is greater than or equal to a threshold.
- (Step 3) From the data of the correction amount of the person position, acquire the correction amount (P10 in FIG. 19) corresponding to the position of the grid cell identified in (Step 2).
- (Step 4) Based on the position of the grid cell identified in (Step 2) (including the center position of the grid cell) and the correction amount acquired in (Step 3), identify the center position (P11 in FIG. 19) of each person included in the processed image. This identifies the center position of each person's body.
- (Step 5) From the size data, acquire the size of the person corresponding to the position of the grid cell identified in (Step 2). This identifies the size of each person.
- (Step 6) From the hidden information data of each keypoint, acquire the data corresponding to the position of the grid cell identified in (Step 2). This identifies, for each keypoint of each person, whether the keypoint is visible or invisible.
- (Step 7) From the relative position data of each keypoint, acquire only the data (P12 in FIG. 19) corresponding to the position of the grid cell identified in (Step 2) for keypoints identified as visible in (Step 6). This yields the relative positions of only the visible keypoints of each person.
- (Step 8) Using the center position of the grid cell identified in (Step 2) and the data acquired in (Step 7), identify the position (P2 in FIG. 19) in the processed image of each visible keypoint. This identifies the position in the processed image of each visible keypoint of each person.
- (Step 9) Based on the likelihood data of the keypoint positions, identify the grid cell (P4 in FIG. 20) in which each keypoint (P5 in FIG. 20) is located. Specifically, identify grid cells whose likelihood is greater than or equal to a threshold.
- (Step 10) From the data of the correction amount of the keypoint position, acquire the correction amount (P6 in FIG. 20) corresponding to the position of the grid cell identified in (Step 9).
- (Step 11) Based on the position of the grid cell identified in (Step 9) (including the center position of the grid cell) and the correction amount acquired in (Step 10), identify the position (P5 in FIG. 20) of each keypoint in the processed image.
- (Step 12) Among the keypoint positions of each person obtained in (Step 8) and the keypoint positions obtained in (Step 11), associate keypoints of the same type whose positions are close to each other (e.g., whose distance is less than a threshold), and integrate the associated positions to correct the keypoint positions of each person obtained in (Step 8). This computes the position in the processed image of each of the plurality of visible keypoints of each person. Methods of integration include averaging, weighted averaging, and selecting one of the two positions.
- Because the keypoint positions computed in (Step 12) are associated in (Step 8) with the position of the grid cell indicating the position of a person, it is known which person each computed keypoint position corresponds to. In (Step 7), only the data corresponding to the positions of grid cells identified as visible in (Step 6) is acquired, but data including the positions of grid cells identified as not visible may also be acquired.
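The decoding in (Step 2) to (Step 8) and the integration in (Step 12) can be sketched as follows. This is a simplified illustration under several assumptions that are not in the specification: the output maps are nested Python lists indexed as `[row][column]`, the grid cell size is `stride` pixels, hidden information of 0.5 or more means "invisible", and plain averaging is used for integration. All function and parameter names are hypothetical. (Step 5) is omitted, and the `detected` positions are assumed to come from (Step 9) to (Step 11).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def decode_people(
    person_likelihood: List[List[float]],
    person_offset: List[List[Point]],
    hidden: List[List[List[float]]],        # hidden[k][row][col], 0 = visible
    relative_pos: List[List[List[Point]]],  # relative_pos[k][row][col]
    stride: float = 4.0,
    threshold: float = 0.5,
) -> List[dict]:
    people = []
    for gy, row in enumerate(person_likelihood):
        for gx, score in enumerate(row):
            if score < threshold:                       # Step 2: likelihood test
                continue
            cx, cy = (gx + 0.5) * stride, (gy + 0.5) * stride  # grid cell center
            ox, oy = person_offset[gy][gx]              # Step 3: correction amount
            center = (cx + ox, cy + oy)                 # Step 4: person center
            keypoints: Dict[int, Point] = {}
            for k in range(len(hidden)):
                if hidden[k][gy][gx] >= 0.5:            # Step 6: invisible, skip
                    continue
                rx, ry = relative_pos[k][gy][gx]        # Step 7: visible only
                keypoints[k] = (cx + rx, cy + ry)       # Step 8: image position
            people.append({"center": center, "keypoints": keypoints})
    return people


def integrate(
    person_keypoints: Dict[int, Point],
    detected: Dict[int, List[Point]],
    max_distance: float = 3.0,
) -> Dict[int, Point]:
    """Step 12: refine each keypoint with the nearest same-type detection."""
    refined: Dict[int, Point] = {}
    for k, pos in person_keypoints.items():
        nearest = min(detected.get(k, []),
                      key=lambda p: math.dist(p, pos), default=None)
        if nearest is not None and math.dist(nearest, pos) < max_distance:
            # Integration by plain averaging (one of the listed options).
            refined[k] = ((pos[0] + nearest[0]) / 2, (pos[1] + nearest[1]) / 2)
        else:
            refined[k] = pos                            # nothing close enough
    return refined
```

Keeping the per-person dictionary keyed by keypoint type means the refined positions stay attached to the person whose grid cell produced them, which is the association the last paragraph above relies on.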
- the estimation unit 21 may or may not estimate the positions of each of the plurality of invisible key points of each person in the processed image. In the case of not estimating, since the types of invisible keypoints are known for each person, it is possible to output the information (types of invisible keypoints) for each person. Furthermore, as shown in P40 in FIG. 24, it is also possible to express the types of invisible key points for each person as an object modeled on a person and display them for each person.
- When estimating, for example, the estimation unit 21 identifies visible keypoints that are directly connected to invisible keypoints based on a predefined connection relation among the plurality of keypoints of a person. The estimation unit 21 then estimates the position of each invisible keypoint in the processed image based on the position in the processed image of the visible keypoint directly connected to it.
- The details of this estimation can vary, and it can be implemented using any technology.
- The estimated position of an invisible keypoint in the processed image can be displayed as a circular range centered on that position.
- Because the estimated position of an invisible keypoint is only approximate, this display method can express that uncertainty.
- The range of the circle may be calculated based on the spread of the positions of the keypoints of the person to whom the keypoint belongs, or may be fixed.
- Because the estimated position of a visible keypoint in the processed image is accurate, it can be displayed with an object (a point, a figure, etc.) that indicates the position as a single point.
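Since the specification leaves the estimation details open, the following is only one possible sketch. It assumes that an invisible keypoint is placed at the mean position of its directly connected visible keypoints, and that the display radius is a fixed fraction of the spread of the person's visible keypoints. The keypoint names, the `radius_scale` parameter, and the function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def estimate_invisible_keypoints(
    visible: Dict[str, Point],
    connections: List[Tuple[str, str]],
    keypoint_names: List[str],
    radius_scale: float = 0.5,
) -> Dict[str, Tuple[float, float, float]]:
    """Return {name: (x, y, radius)} for invisible keypoints that have at
    least one directly connected visible keypoint."""
    if not visible:
        return {}
    # Spread of the person's visible keypoints, used to size the circle.
    mean_x = sum(p[0] for p in visible.values()) / len(visible)
    mean_y = sum(p[1] for p in visible.values()) / len(visible)
    spread = max(math.dist(p, (mean_x, mean_y)) for p in visible.values())

    estimates: Dict[str, Tuple[float, float, float]] = {}
    for name in keypoint_names:
        if name in visible:
            continue
        # Visible keypoints directly connected to this invisible keypoint.
        neighbours = [visible[b] for a, b in connections
                      if a == name and b in visible]
        neighbours += [visible[a] for a, b in connections
                       if b == name and a in visible]
        if not neighbours:
            continue                      # no visible neighbour: no estimate
        x = sum(p[0] for p in neighbours) / len(neighbours)
        y = sum(p[1] for p in neighbours) / len(neighbours)
        estimates[name] = (x, y, radius_scale * spread)
    return estimates


if __name__ == "__main__":
    print(estimate_invisible_keypoints(
        {"shoulder": (10.0, 10.0), "elbow": (10.0, 20.0)},
        [("shoulder", "elbow"), ("elbow", "wrist")],
        ["shoulder", "elbow", "wrist", "ankle"],
    ))
```

Returning a radius together with the position matches the circular display described above. A fixed radius could be substituted by ignoring the computed spread.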
- The estimation device 20 acquires the processed image. For example, an operator inputs the processed image to the estimation device 20, and the estimation device 20 acquires the input processed image.
- The estimation device 20 uses the estimation model learned by the learning device 10 to estimate the position within the processed image of each of a plurality of keypoints of each person included in the processed image.
- This processing is implemented by the estimation unit 21.
- The details of the processing executed by the estimation unit 21 are as described above.
- The estimation device 20 outputs the estimation result of S21.
- For the output, the estimation device 20 can use any means, such as a display, a projection device, a printer, or e-mail.
- Each functional unit of the estimation device 20 is realized by any combination of hardware and software, centered on a CPU of any computer, a memory, a program loaded into the memory, a storage unit such as a hard disk that stores the program (which can store not only programs stored in advance from the stage of shipping the device, but also programs downloaded from a storage medium such as a CD or from a server on the Internet), and an interface for network connection. Those skilled in the art will understand that there are various modifications to the implementation method and apparatus.
- FIG. 16 is a block diagram illustrating the hardware configuration of the estimation device 20.
- the estimation device 20 has a processor 1A, a memory 2A, an input/output interface 3A, a peripheral circuit 4A, and a bus 5A.
- the peripheral circuit 4A includes various modules.
- the estimating device 20 may not have the peripheral circuit 4A.
- the estimating device 20 may be composed of a plurality of physically and/or logically separated devices. In this case, each of the plurality of devices can have the above hardware configuration.
- According to the estimation device 20 of the present embodiment, the position in the processed image of each of the plurality of keypoints of each person included in the processed image can be estimated using an estimation model that was correctly learned by excluding information on keypoints that are not visible in the image.
- Such an estimation device 20 improves the accuracy of the estimation.
- The estimation unit 21 may calculate and output, for each estimated person, information indicating at least one of the extent to which the person's body is visible in the processed image and the extent to which the person's body is hidden in the processed image, based on at least one of the number of keypoints estimated to be visible in the processed image and the number of keypoints estimated not to be visible in the processed image.
- For example, the estimation unit 21 may calculate, for each estimated person, the ratio of (the number of keypoints estimated to be visible in the processed image) to (the total number of keypoints) as information indicating the extent to which the person's body is visible.
- Similarly, the estimation unit 21 may calculate, for each estimated person, the ratio of (the number of keypoints estimated to be invisible in the processed image) to (the total number of keypoints) as information indicating the extent to which the person's body is hidden.
- The information (or ratio) indicating the extent to which each person's body is visible or hidden may be displayed for each person based on the center position of each person or the position of a designated keypoint (P30 in FIG.).
- The information (or ratio) may also be converted, based on a specified threshold, into information indicating not hidden/hidden for each person, and the converted information may be displayed in the same manner (P31 in FIG. 23).
- Furthermore, a color/pattern may be assigned to the not hidden/hidden information for each person, and the keypoints of each person may be displayed in that color/pattern (P32 in FIG.).
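The two ratios and the threshold conversion can be sketched as follows, assuming the per-keypoint hidden information is a list of integers where 0 means visible and 1 or more means invisible. The function name, the returned dictionary layout, and the default threshold of 0.5 are illustrative assumptions.

```python
from typing import Dict, List, Union


def visibility_summary(
    hidden_flags: List[int],
    hidden_threshold: float = 0.5,
) -> Dict[str, Union[float, str]]:
    """Summarise how much of one person's body is visible or hidden."""
    total = len(hidden_flags)
    if total == 0:
        raise ValueError("at least one keypoint is required")
    invisible = sum(1 for flag in hidden_flags if flag >= 1)
    visible_ratio = (total - invisible) / total   # visible / total keypoints
    hidden_ratio = invisible / total              # invisible / total keypoints
    # Conversion to a two-valued label using the specified threshold.
    label = "hidden" if hidden_ratio >= hidden_threshold else "not hidden"
    return {"visible_ratio": visible_ratio,
            "hidden_ratio": hidden_ratio,
            "label": label}


if __name__ == "__main__":
    print(visibility_summary([0, 0, 1, 0]))
    print(visibility_summary([1, 1, 1, 0]))
```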
- Second modification: The estimation model of the embodiment described above learned and estimated whether each of a plurality of keypoints of each person is visible in the processed image.
- In this modification, the estimation model may further learn and estimate the hidden state of each keypoint that is not visible in the processed image.
- In this case, the correct label of the learning data further indicates the hidden state of each keypoint that is not visible in the teacher image.
- The hidden state of a keypoint can include, for example, a state of being located outside the image, a state of being located in the image but hidden by another object, and a state of being located in the image but hidden by the person's own body part.
- In the embodiment described above, a value of "0" is assigned to visible keypoints and a value of "1" is assigned to invisible keypoints.
- In this modification, for example, visible keypoints are assigned a value of "0", invisible keypoints located outside the image are assigned a value of "1", keypoints that are located in the image but are not visible because they are hidden by other objects are assigned a value of "2", and keypoints that are located in the image but are not visible because they are hidden by the person's own body parts are assigned a value of "3".
- In the hidden information, a value of 1 or more indicates a keypoint that is not visible.
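The four-valued hidden information can be represented as an integer enumeration, as in the sketch below. The member names are hypothetical; only the numeric values 0 to 3 and the rule that a value of 1 or more means "not visible" come from the text.

```python
from enum import IntEnum


class HiddenState(IntEnum):
    VISIBLE = 0                  # keypoint is visible in the image
    OUTSIDE_IMAGE = 1            # located outside the image
    HIDDEN_BY_OTHER_OBJECT = 2   # in the image, hidden by another object
    HIDDEN_BY_OWN_BODY = 3       # in the image, hidden by the person's own part


def is_visible(value: int) -> bool:
    """Any value of 1 or more indicates an invisible keypoint."""
    return value == HiddenState.VISIBLE


if __name__ == "__main__":
    for raw in (0, 1, 2, 3):
        print(raw, HiddenState(raw).name, is_visible(raw))
```

Because visible keypoints keep the value 0, code written for the two-valued encoding of the base embodiment continues to work unchanged on the extended labels.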
- the estimation model of the embodiment described above learned and estimated whether each of a plurality of keypoints of each person is visible in the processed image.
- The estimation model may further learn and estimate the overlapping state of each keypoint that is not visible in the processed image, expressed as the number of objects hiding the keypoint.
- In this case, the correct label of the learning data further indicates the overlapping state of each keypoint that is not visible in the teacher image, as the number of objects hiding the keypoint.
- In the embodiment described above, a value of "0" is assigned to visible keypoints and a value of "1" is assigned to invisible keypoints.
- In this modification, for example, a visible keypoint is given a value of "0", and an invisible keypoint is given a value corresponding to the number M of objects hiding the keypoint, for example, the value "M".
- In the hidden information, a value of 1 or more indicates a keypoint that is not visible.
- The estimation unit 21 may calculate, for each person, the maximum value of the number of objects hiding each keypoint of the person, and use the calculated maximum value as the overlapping state of the person.
- The calculated overlapping state (or maximum value) of each person may be displayed for each person based on the center position of each person or the position of a designated keypoint, as shown in P35 of FIG.
- A color/pattern may be assigned to the overlapping state of each person, and the keypoints of each person may be displayed in that color/pattern, as indicated by P36 in FIG.
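The per-person overlapping state is simply the maximum occluder count over that person's keypoints, as the sketch below shows. The color palette and both function names are hypothetical, and the specification does not say which colors or patterns are used.

```python
from typing import List

# Hypothetical palette: index = overlapping state, last entry for "3 or more".
PALETTE = ["green", "yellow", "orange", "red"]


def overlap_state(occluder_counts: List[int]) -> int:
    """Maximum number of objects hiding any one keypoint of a person.

    Visible keypoints carry a count of 0; a person with no keypoints is
    treated as not overlapped (an assumption).
    """
    return max(occluder_counts, default=0)


def overlap_colour(state: int) -> str:
    """Pick a display color for the overlapping state, clamping to the palette."""
    return PALETTE[min(state, len(PALETTE) - 1)]


if __name__ == "__main__":
    counts = [0, 2, 1, 0]          # per-keypoint occluder counts for one person
    state = overlap_state(counts)
    print(state, overlap_colour(state))
```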
- the depth information shown here indicates the order of distance from the camera.
- the third modified example can also be combined with the second modified example.
- "Acquisition" includes at least one of the following, based on user input or program instructions: "the own device retrieving data stored in another device or a storage medium (active acquisition)", for example, receiving data by requesting or querying another device, or accessing and reading another device or a storage medium; and "inputting data output from another device into the own device (passive acquisition)", for example, receiving data that is distributed (or transmitted, sent by push notification, etc.), or selecting and acquiring from among received data or information. It also includes "generating new data by editing data (text conversion, rearranging data, extracting some data, changing the file format, etc.) and acquiring the new data".
- 1. A learning device having: acquisition means for acquiring learning data in which a teacher image including a person, a correct label indicating the position of each person, a correct label indicating whether or not each of a plurality of keypoints of the body of each person is visible in the teacher image, and a correct label indicating the position in the teacher image of those keypoints, among the plurality of keypoints, that are visible in the teacher image are associated with one another; and learning means for learning, based on the learning data, an estimation model that estimates information indicating the position of each person, information indicating whether or not each of the plurality of keypoints of each person included in a processed image is visible in the processed image, and information relating to the position of each keypoint for calculating the position of the keypoint in the processed image.
- 2. The learning device according to claim 1, wherein the correct label does not indicate the position in the teacher image of keypoints that are not visible in the teacher image.
- 3. The learning device, wherein the learning means: estimates information indicating the position of each person, information indicating whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image, and information relating to the position of each keypoint for calculating the position of each of the plurality of keypoints in the teacher image; adjusts the parameters of the estimation model so as to minimize the difference between the estimation result of the information indicating the position of each person and the information indicating the position of each person indicated by the correct label; adjusts the parameters of the estimation model so as to minimize the difference between the estimation result of the information indicating whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image and the information, indicated by the correct label, indicating whether or not each of the plurality of keypoints of the body of each person is visible in the teacher image; and adjusts the parameters of the estimation model so as to minimize the difference between the estimation result of the information relating to the position of each keypoint for calculating the position of each of the plurality of keypoints in the teacher image and the information relating to the position of each keypoint obtained from the position in the teacher image indicated by the correct label.
- 4. The learning device according to any one of claims 1 to 3, wherein the correct label further indicates the state of each invisible keypoint for each person in the teacher image, and the estimation model further estimates the state of each invisible keypoint for each person in the processed image.
- 5. The state includes a state of being located outside the image, a state of being located within the image but hidden by another object, and a state of being located within the image but hidden by the person's own body part.
- 6. The state indicates the number of objects hiding the keypoint that is not visible in the teacher image or the processed image.
- 7. An estimation device having estimation means for estimating the position within a processed image of each of a plurality of keypoints of each person included in the processed image, using the estimation model learned by the learning device according to any one of 1 to 6.
- 8. The estimation means uses the estimation model to estimate whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image, and uses the result of the estimation to estimate the position within the processed image of each of the plurality of keypoints of each person included in the processed image.
- 9. The estimation means uses the estimated information as to whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image, and determines the types of invisible keypoints for each person, or represents the types of the invisible keypoints on an object modeled on a person and displays them for each person.
- 10. The estimation means uses the estimated information as to whether or not each of the plurality of keypoints of each person included in the processed image is visible in the processed image to identify invisible keypoints, identifies a visible keypoint directly connected to the identified invisible keypoint based on a predefined connection relationship of a plurality of keypoints of a person, and estimates the position within the processed image of the identified invisible keypoint based on the position within the processed image of the identified visible keypoint.
- 11. The estimation device according to any one of 7 to 10, wherein the estimation means calculates, for each person, information indicating at least one of the extent to which the person's body is visible in the processed image and the extent to which the person's body is hidden in the processed image, based on at least one of the number of keypoints estimated to be visible in the processed image and the number of keypoints estimated to be invisible in the processed image for each estimated person.
- 12. The estimation device according to 11, wherein the estimation means displays, for each person, the information indicating at least one of the calculated extent to which the person's body is visible and the calculated extent to which the person's body is hidden, based on the center position of each person or the position of a designated keypoint.
- 13. The estimation device according to 11, wherein the estimation means converts, based on a specified threshold, the information indicating at least one of the calculated extent to which the person's body is visible and the calculated extent to which the person's body is hidden into information indicating not hidden/hidden for each person, and displays the converted information for each person based on the center position of each person or the position of a designated keypoint.
- 14. The estimation means calculates, for each person, the maximum value of the number of objects hiding each keypoint of the person, takes the calculated maximum value as the overlapping state of the person, and displays the overlapping state of each person based on the center position of each person or the position of a designated keypoint, or assigns a color/pattern to the overlapping state of each person and displays the keypoints of each person in that color/pattern.
- A program causing a computer to function as: acquisition means for acquiring learning data in which a teacher image including a person, a correct label indicating the position of each person, a correct label indicating whether or not each of a plurality of keypoints of the body of each person is visible in the teacher image, and a correct label indicating the position in the teacher image of those keypoints, among the plurality of keypoints, that are visible in the teacher image are associated with one another; and learning means for learning, based on the learning data, an estimation model that estimates information indicating the position of each person, information indicating whether or not each of the plurality of keypoints of each person included in a processed image is visible in the processed image, and information relating to the position of each keypoint for calculating the position of the keypoint within the processed image.
- Reference signs: 10 learning device; 11 acquisition unit; 12 learning unit; 13 storage unit; 20 estimation device; 21 estimation unit; 22 storage unit; 1A processor; 2A memory; 3A input/output I/F; 4A peripheral circuit; 5A bus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/275,791 US20240119711A1 (en) | 2021-09-29 | 2021-09-29 | Learning apparatus, estimation apparatus, learning method, estimation method, and program and non-transitory storage medium |
| PCT/JP2021/035782 WO2023053249A1 (ja) | 2021-09-29 | 2021-09-29 | 学習装置、推定装置、学習方法、推定方法及びプログラム |
| JP2023550828A JP7480920B2 (ja) | 2021-09-29 | 2021-09-29 | 学習装置、推定装置、学習方法、推定方法及びプログラム |
| EP21959297.9A EP4276742A4 (en) | 2021-09-29 | 2021-09-29 | Learning device, estimation device, learning method, estimation method, and program |
| JP2024066640A JP7683784B2 (ja) | 2021-09-29 | 2024-04-17 | 情報処理装置、情報処理方法及びプログラム |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/035782 WO2023053249A1 (ja) | 2021-09-29 | 2021-09-29 | 学習装置、推定装置、学習方法、推定方法及びプログラム |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023053249A1 (ja) | 2023-04-06 |
Family
ID=85781547
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/035782 Ceased WO2023053249A1 (ja) | 2021-09-29 | 2021-09-29 | 学習装置、推定装置、学習方法、推定方法及びプログラム |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240119711A1 |
| EP (1) | EP4276742A4 |
| JP (2) | JP7480920B2 |
| WO (1) | WO2023053249A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2024165569A (ja) * | 2023-05-17 | 2024-11-28 | 株式会社クボタ | 学習モデル生成方法、作業分析装置および作業分析プログラム |
| JP2024165568A (ja) * | 2023-05-17 | 2024-11-28 | 株式会社クボタ | 学習モデル生成方法、作業分析装置および作業分析プログラム |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP7840606B1 (ja) * | 2025-12-19 | 2026-04-06 | 株式会社アークス | 精子を追跡するための情報処理装置、方法、及びプログラム |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004295436A (ja) | 2003-03-26 | 2004-10-21 | Fujitsu Ltd | ドキュメント管理装置およびドキュメント管理プログラム |
| JP2020098603A (ja) * | 2018-12-18 | 2020-06-25 | 富士通株式会社 | 画像処理方法及び情報処理装置 |
| JP2020123105A (ja) * | 2019-01-30 | 2020-08-13 | セコム株式会社 | 学習装置、学習方法、学習プログラム、及び対象物認識装置 |
| JP2021033395A (ja) * | 2019-08-16 | 2021-03-01 | セコム株式会社 | 学習済みモデル、学習装置、学習方法、及び学習プログラム |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018058419A1 (zh) * | 2016-09-29 | 2018-04-05 | 中国科学院自动化研究所 | 二维图像人体关节点定位模型的构建方法及定位方法 |
| JP6831769B2 (ja) | 2017-11-13 | 2021-02-17 | 株式会社日立製作所 | 画像検索装置、画像検索方法、及び、それに用いる設定画面 |
| CN108229305B (zh) | 2017-11-21 | 2021-06-04 | 北京市商汤科技开发有限公司 | 用于确定目标对象的外接框的方法、装置和电子设备 |
| JP7263094B2 (ja) | 2019-04-22 | 2023-04-24 | キヤノン株式会社 | 情報処理装置、情報処理方法及びプログラム |
2021
- 2021-09-29 JP: application JP2023550828A, patent JP7480920B2 (ja), status: Active
- 2021-09-29 EP: application EP21959297.9A, patent EP4276742A4 (en), status: Pending
- 2021-09-29 US: application US18/275,791, patent US20240119711A1 (en), status: Pending
- 2021-09-29 WO: application PCT/JP2021/035782, patent WO2023053249A1 (ja), status: Ceased
2024
- 2024-04-17 JP: application JP2024066640A, patent JP7683784B2 (ja), status: Active
Non-Patent Citations (2)
| Title |
|---|
| See also references of EP4276742A4 |
| XINGYI ZHOU ET AL., OBJECTS AS POINTS, 16 April 2019 (2019-04-16) |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4276742A4 (en) | 2024-04-24 |
| JPWO2023053249A1 | 2023-04-06 |
| EP4276742A1 (en) | 2023-11-15 |
| US20240119711A1 (en) | 2024-04-11 |
| JP7480920B2 (ja) | 2024-05-10 |
| JP2024083602A (ja) | 2024-06-21 |
| JP7683784B2 (ja) | 2025-05-27 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP7683784B2 (ja) | 情報処理装置、情報処理方法及びプログラム | |
| US9684827B2 (en) | Eye gaze tracking based upon adaptive homography mapping | |
| US10713471B2 (en) | System and method for simulating facial expression of virtual facial model | |
| CN107615310A (zh) | 信息处理设备 | |
| JP7586172B2 (ja) | 情報処理装置およびプログラム | |
| KR102274581B1 (ko) | 개인화된 hrtf 생성 방법 | |
| US20230054973A1 (en) | Information processing apparatus, information processing method, and information processing program | |
| US20220111869A1 (en) | Indoor scene understanding from single-perspective images | |
| US20230237777A1 (en) | Information processing apparatus, learning apparatus, image recognition apparatus, information processing method, learning method, image recognition method, and non-transitory-computer-readable storage medium | |
| CN114051632A (zh) | 人体与人手的关联方法、装置、设备及存储介质 | |
| US11580431B2 (en) | Methods for predicting likelihood of successful experimental synthesis of computer-generated materials by combining network analysis and machine learning | |
| US9792835B2 (en) | Proxemic interfaces for exploring imagery | |
| US11188787B1 (en) | End-to-end room layout estimation | |
| CN116528759A (zh) | 信息处理装置、信息处理方法和程序 | |
| KR102511762B1 (ko) | 증강 현실 매핑 시스템들 및 관련 방법들 | |
| KR20130067856A (ko) | 손가락 동작을 기반으로 하는 가상 악기 연주 장치 및 방법 | |
| JP2022131397A (ja) | 姿勢推定装置、学習装置、姿勢推定方法及びプログラム | |
| JP7687382B2 (ja) | 関節点検出装置、関節点検出方法、及びプログラム | |
| CN113191462A (zh) | 信息获取方法、图像处理方法、装置及电子设备 | |
| JP7852744B2 (ja) | 学習装置、推定装置、学習方法、推定方法ならびにプログラム | |
| WO2023188160A1 (ja) | 入力支援装置、入力支援方法、及び非一時的なコンピュータ可読媒体 | |
| CN109661639A (zh) | 输出控制设备、输出控制方法和程序 | |
| WO2024128124A1 (ja) | 学習装置、推定装置、学習方法、推定方法ならびに記録媒体 | |
| US12014008B2 (en) | Information processing apparatus, information processing method, and program | |
| RU2788482C2 (ru) | Тренировка модели нейронной сети |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21959297; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023550828; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 18275791; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 2021959297; Country of ref document: EP; Effective date: 20230810 |
| | NENP | Non-entry into the national phase | Ref country code: DE |