WO2023112198A1 - Information processing device, information processing method, and recording medium - Google Patents

Information processing device, information processing method, and recording medium

Info

Publication number
WO2023112198A1
Authority
WO
WIPO (PCT)
Prior art keywords
faces
persons
class
information processing
face images
Prior art date
Legal status
Ceased
Application number
PCT/JP2021/046251
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
博志 橋本
Current Assignee
NEC Corp
Original Assignee
NEC Corp
Priority date
Filing date
Publication date
Application filed by NEC Corp
Priority to US18/718,834 (US20250054336A1)
Priority to PCT/JP2021/046251 (WO2023112198A1)
Priority to JP2023567383A (JP7718506B2)
Publication of WO2023112198A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/50 Maintenance of biometric data or enrolment thereof

Definitions

  • This disclosure relates to the technical field of information processing devices, information processing methods, and recording media.
  • Japanese Patent Application Laid-Open No. 2002-200001 discloses a technique for maintaining a certain level of matching performance and security level by determining whether or not a face matching dictionary contains similar face patterns.
  • Japanese Patent Application Laid-Open No. 2002-200001 discloses a technique for performing stable authentication even when similar persons are registered.
  • In that technique, the correspondence between each registered person and a complementary person is stored, and input face information representing the features of a face included in a face area extracted from an input image is compared with the registered face information of each of a plurality of registered persons. Among the registered persons, a registered person whose registered face information is similar to the input face information is specified as an authentication candidate. Then, across a plurality of input images taken at different times, whether the face appearing in the input images is the face of a registered person is determined based on the number of times that the registered person, or any complementary person associated with that registered person, was specified as an authentication candidate.
  • Japanese Patent Application Laid-Open No. 2002-300000 discloses a technique for highly accurately authenticating a person even if the person is registered.
  • the object of this disclosure is to provide an information processing device, an information processing method, and a recording medium aimed at improving the technology described in the prior art document.
  • A first aspect of the information processing apparatus includes: acquisition means for acquiring a determination target image in which a plurality of faces are captured; determination means for determining whether two faces among the plurality of faces captured in the determination target image are similar to each other at or above a predetermined level; and storage means for storing, when the two faces are similar at or above the predetermined level, two face images each including one of the two faces in association with an other person label indicating that the two persons corresponding to the two face images are not the same person.
  • A second aspect of the information processing apparatus includes: acquisition means for acquiring a data set containing a plurality of face images in which the faces of a plurality of persons whose faces are similar to each other at or above a predetermined level are captured, and label information on a correct class to which the plurality of persons commonly belong among a plurality of classes; extraction means for extracting a facial feature amount of each of the plurality of persons based on the plurality of face images; class identification means for generating, based on the feature amounts, class identification information on an estimated class to which the plurality of persons commonly belong among the plurality of classes; and learning means for performing machine learning to set operating characteristics of the extraction means based on the label information and the class identification information.
  • A first aspect of the information processing method acquires a determination target image in which a plurality of faces are captured; determines whether two faces among the plurality of faces captured in the determination target image are similar to each other at or above a predetermined level; and, when the two faces are similar at or above the predetermined level, stores two face images each including one of the two faces in association with an other person label indicating that the two persons corresponding to the two face images are not the same person.
  • A second aspect of the information processing method acquires a data set containing a plurality of face images in which the faces of a plurality of persons whose faces are similar to each other at or above a predetermined level are captured, and label information on a correct class to which the plurality of persons commonly belong among a plurality of classes; extracts a facial feature amount of each of the plurality of persons based on the plurality of face images; generates, based on the feature amounts, class identification information on an estimated class to which the plurality of persons commonly belong among the plurality of classes; and performs machine learning to set operating characteristics of the extraction based on the label information and the class identification information.
  • On a first aspect of the recording medium, a computer program is recorded that causes a computer to execute an information processing method of: acquiring a determination target image in which a plurality of faces are captured; determining whether two faces among the plurality of faces captured in the determination target image are similar to each other at or above a predetermined level; and, when the two faces are similar at or above the predetermined level, storing two face images each including one of the two faces in association with an other person label indicating that the two persons corresponding to the two face images are not the same person.
  • On a second aspect of the recording medium, a computer program is recorded that causes a computer to execute an information processing method of: acquiring a data set containing a plurality of face images in which the faces of a plurality of persons whose faces are similar to each other at or above a predetermined level are captured, and label information on a correct class to which the plurality of persons commonly belong among a plurality of classes; extracting a facial feature amount of each of the plurality of persons based on the plurality of face images; generating, based on the feature amounts, class identification information on an estimated class to which the plurality of persons commonly belong among the plurality of classes; and performing machine learning to set operating characteristics of the extraction based on the label information and the class identification information.
  • FIG. 1 is a block diagram showing the configuration of an information processing apparatus according to the first embodiment.
  • FIG. 2 is a schematic diagram of annotation operation in the first embodiment.
  • FIG. 3 is a flowchart showing the flow of annotation operations performed by the information processing apparatus according to the first embodiment.
  • FIG. 4 is a block diagram showing the configuration of an information processing apparatus according to the second embodiment.
  • FIG. 5 is a conceptual diagram of learning data used in the second embodiment.
  • FIG. 6 is a conceptual diagram of a loss function in the second embodiment.
  • FIG. 7 is a flow chart showing the flow of the learning operation performed by the information processing apparatus according to the second embodiment.
  • FIG. 1 is a block diagram showing the configuration of an information processing device 1 according to the first embodiment.
  • The information processing device 1 includes an arithmetic device 11 and a storage device 12. Furthermore, the information processing device 1 may include a communication device 13, an input device 14, and an output device 15. However, the information processing device 1 does not have to include at least one of the communication device 13, the input device 14, and the output device 15. The arithmetic device 11, the storage device 12, the communication device 13, the input device 14, and the output device 15 may be connected via a data bus 16.
  • The arithmetic device 11 includes, for example, at least one of a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), and an FPGA (Field Programmable Gate Array). The arithmetic device 11 reads a computer program. For example, the arithmetic device 11 may read a computer program stored in the storage device 12. For example, the arithmetic device 11 may read a computer program stored in a computer-readable, non-transitory recording medium by using a recording medium reading device (not shown) provided in the information processing device 1 (for example, the input device 14 described later). The arithmetic device 11 may also acquire (that is, download or read) a computer program from a device (not shown) located outside the information processing device 1 via the communication device 13 (or another communication device).
  • Arithmetic device 11 executes the read computer program. As a result, logical functional blocks for executing the operations to be performed by the information processing device 1 are realized in the arithmetic device 11 .
  • the arithmetic device 11 can function as a controller for realizing logical functional blocks for executing the operations (in other words, processing) that the information processing device 1 should perform.
  • FIG. 1 shows an example of logical functional blocks implemented within the arithmetic device 11 for executing information processing operations.
  • The arithmetic device 11 implements an acquisition unit 111, which is a specific example of "acquisition means"; a face extraction unit 112; a feature amount extraction unit 113; a determination unit 114, which is a specific example of "determination means"; and a storage control unit 115, which is a specific example of "storage means". At least one of the face extraction unit 112 and the feature amount extraction unit 113 need not be implemented in the arithmetic device 11.
  • the computing device 11 does not have to include the face extraction unit 112 .
  • the storage device 12 can store desired data.
  • the storage device 12 may temporarily store computer programs executed by the arithmetic device 11 .
  • the storage device 12 may temporarily store data temporarily used by the arithmetic device 11 while the arithmetic device 11 is executing a computer program.
  • the storage device 12 may store data that the information processing device 1 saves for a long period of time.
  • The storage device 12 may include at least one of a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk device, a magneto-optical disk device, an SSD (Solid State Drive), and a disk array device. That is, the storage device 12 may include a non-transitory recording medium.
  • The storage device 12 may store sample data SD used by the information processing device 1 for information processing operations. However, the storage device 12 does not have to store the sample data SD. If the storage device 12 does not store the sample data SD, the communication device 13 may acquire the sample data SD from a device external to the information processing device 1, or the input device 14 may accept input of the sample data SD from outside the information processing device 1. Further, the storage device 12 may store face image pairs IP generated by the information processing operation of the information processing device 1.
  • the information processing apparatus 1 may use an image including a face as the sample data SD.
  • the information processing device 1 may use the sample data SD to generate a data set used for machine learning of the face recognition engine.
  • a data set used for machine learning preferably contains a large amount of data, such as 10,000 or more, so it is preferable that a large amount of sample data SD can be collected.
  • the communication device 13 can communicate with devices external to the information processing device 1 via a communication network (not shown).
  • the input device 14 is a device that receives input of information to the information processing device 1 from the outside of the information processing device 1 .
  • the input device 14 may include an operating device (for example, at least one of a keyboard, a mouse, and a touch panel) that can be operated by an operator of the information processing device 1 .
  • the input device 14 may include a reading device capable of reading information recorded as data on a recording medium that can be externally attached to the information processing device 1 .
  • the output device 15 is a device that outputs information to the outside of the information processing device 1 .
  • the output device 15 may output information as an image. That is, the output device 15 may include a display device (so-called display) capable of displaying an image showing information to be output.
  • the output device 15 may output information as voice.
  • the output device 15 may include an audio device capable of outputting audio (so-called speaker).
  • the output device 15 may output information on paper.
  • The output device 15 may include a printing device (so-called printer) capable of printing desired information on paper.

[1-2: Information processing operation performed by information processing apparatus 1]
  • The information processing operation performed by the information processing apparatus 1 according to the first embodiment may be an annotation operation of labeling face images. More specifically, it may be an annotation operation that associates two face images, each including one of two faces that are similar to each other at or above a predetermined level, with an other person label indicating that the two persons corresponding to the two face images are not the same person.
  • The two face images may consist of a first face image including the first face and a second face image including the second face.
  • FIG. 2 is a diagram showing an outline of annotation operation in the first embodiment.
  • When a plurality of persons (in FIG. 2(a), three persons: person A, person B, and person C) appear in one image, those persons can be regarded as not being the same person.
  • Even when person A, person B, and person C look very much alike, the three similar faces appear side by side in the same image, so the three faces can be judged to be the faces of non-identical persons.
  • The information processing apparatus 1 determines that faces appearing together in one image do not belong to the same person, even if they look the same. The information processing apparatus 1 then attaches to each person an "other person label" indicating a different person with a similar face. Very similar faces, such as those of twins, are difficult to tell apart. This is also difficult for machines such as neural networks, so a machine that can distinguish very similar faces is needed.
  • the annotation operation in the first embodiment can generate learning data that can be used for machine learning of similar face identification.
  • It is often difficult for someone who is not close to person A, person B, and person C to tell the three of them apart.
  • The information processing apparatus 1 may generate face image pairs IP as learning data useful for machine learning that identifies faces accurately even when the faces are difficult to distinguish.
  • The information processing device 1 may process every pair of persons appearing in the image. As shown in FIG. 2(a), when three persons are included in the image, the information processing apparatus 1 performs the processing shown in FIG. 2(b-1) for the pair of person A and person B, the processing shown in FIG. 2(b-2) for the pair of person B and person C, and the processing shown in FIG. 2(b-3) for the pair of person C and person A.
  • In the processing shown in FIG. 2(b-1), the information processing apparatus 1 attaches "other person label 1" to the image of person A and the image of person B, indicating that person A and person B are not the same person. In the processing shown in FIG. 2(b-2), it attaches "other person label 2" to the image of person B and the image of person C, indicating that person B and person C are not the same person. In the processing shown in FIG. 2(b-3), it attaches "other person label 3" to the image of person C and the image of person A, indicating that person C and person A are not the same person.
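The pairwise processing described for FIG. 2 can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `label_pairs` and the label strings are assumptions, and `itertools.combinations` enumerates pairs in the order (A, B), (A, C), (B, C), so the label numbering differs from the (A, B), (B, C), (C, A) order used in the figure.

```python
from itertools import combinations


def label_pairs(persons):
    """Return one (first, second, label) record per unordered pair of persons.

    Hypothetical helper: every pair of persons seen in the same image is
    assumed to be a pair of non-identical persons.
    """
    records = []
    for index, (first, second) in enumerate(combinations(persons, 2), start=1):
        records.append((first, second, f"other person label {index}"))
    return records


# Three persons in one image yield three labeled pairs.
pairs = label_pairs(["A", "B", "C"])
for first, second, label in pairs:
    print(first, second, label)
```

For n persons in one image this produces n(n-1)/2 pairs, which is why three persons lead to exactly three labeled pairs.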
  • FIG. 3 is a flowchart showing the flow of annotation operations performed by the information processing apparatus 1 according to the first embodiment.
  • the acquisition unit 111 acquires one sample data SD as a determination target image (step S11).
  • The acquisition unit 111 determines whether the sample data SD is a composite image obtained by combining a plurality of images (step S12). If the sample data SD is not a composite image (step S12: No), the face extraction unit 112 extracts face areas from the sample data SD (step S13).
  • the face extraction unit 112 determines whether two or more faces are reflected in the sample data SD (step S14).
  • the face extraction unit 112 may determine whether the number of face regions extracted in step S13 is two or more. If two or more faces appear in the sample data SD (step S14: Yes), the face extraction unit 112 selects a pair of two faces from the two or more faces (step S15).
  • the feature quantity extraction unit 113 extracts the feature quantity of each face included in the selected pair (step S16).
  • The determination unit 114 calculates the similarity between the feature amounts of the two faces in the selected pair (step S17). Based on the calculated similarity, the determination unit 114 determines whether the two faces in the selected pair are similar to each other at or above a predetermined level (step S18).
  • The determination unit 114 may calculate, for example, the cosine similarity as the similarity. In this case, the determination unit 114 may determine that the two faces are similar at or above the predetermined level when the similarity is greater than or equal to a predetermined threshold.
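As an illustration of the similarity check in steps S17 and S18, the following sketch computes the cosine similarity of two feature vectors and compares it with a threshold. The function names and the threshold value 0.8 are assumptions chosen for the example; the disclosure does not specify a threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def is_similar(u, v, threshold=0.8):
    """True when the similarity is at or above the (assumed) threshold."""
    return cosine_similarity(u, v) >= threshold


print(is_similar([1.0, 2.0], [2.0, 4.0]))   # parallel vectors
print(is_similar([1.0, 0.0], [0.0, 1.0]))   # orthogonal vectors
```

Cosine similarity ignores vector length, so it compares only the direction of the two feature vectors.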
  • If the two faces are similar at or above the predetermined level (step S18: Yes), the storage control unit 115 stores, in the storage device 12, the two face images each including one of the two faces in association with an other person label indicating that the two persons corresponding to the two face images are not the same person (step S19).
  • The storage control unit 115 may store the first face image including the first face and the second face image including the second face in the storage device 12 in association with the other person label.
  • (1) For the pair of person A and person B, the storage control unit 115 may store in the storage device 12, as shown in FIG. 2(b-1), a face image pair IP0 with an other person label, in which the image of person A and the image of person B (face image pair 1) are associated with "other person label 1" indicating that they are not the same person.
  • (2) For the pair of person B and person C, the storage control unit 115 may store in the storage device 12, as shown in FIG. 2(b-2), a face image pair IP0 with an other person label, in which the image of person B and the image of person C (face image pair 2) are associated with "other person label 2" indicating that they are not the same person.
  • (3) For the pair of person C and person A, the storage control unit 115 may store in the storage device 12, as shown in FIG. 2(b-3), a face image pair IP0 with an other person label, in which the image of person C and the image of person A (face image pair 3) are associated with "other person label 3" indicating that they are not the same person.
  • The face extraction unit 112 determines whether there is a pair that has not yet been selected as a pair of two faces (step S20). If there is such a pair (step S20: Yes), the process returns to step S15. If there is no such pair (step S20: No), the operation for the one sample data SD ends. If the sample data SD is a composite image obtained by combining a plurality of images (step S12: Yes), the operation for the one sample data SD also ends.
  • the arithmetic unit 11 performs the processes of steps S15 to S19 on one sample data SD.
  • the arithmetic device 11 may perform the processing of steps S11 to S19 for each of the plurality of sample data SD.
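The flow of steps S11 to S20 can be sketched as a single function. This is a hedged outline under stated assumptions: `annotate`, `is_composite`, `detect_faces`, `extract_features`, the dictionary record format, and the threshold 0.8 are all hypothetical stand-ins, and returning an empty list when fewer than two faces are found is an assumption about the step S14: No branch.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def annotate(sample, is_composite, detect_faces, extract_features, threshold=0.8):
    """Return other-person-labeled face image pairs for one sample image."""
    if is_composite(sample):                            # step S12: Yes -> end
        return []
    faces = detect_faces(sample)                        # step S13
    if len(faces) < 2:                                  # step S14: No (assumed: end)
        return []
    stored = []
    for face_a, face_b in combinations(faces, 2):       # steps S15 and S20
        feature_a = extract_features(face_a)            # step S16
        feature_b = extract_features(face_b)
        score = cosine_similarity(feature_a, feature_b)  # step S17
        if score >= threshold:                          # step S18
            stored.append({"images": (face_a, face_b),  # step S19
                           "label": "other person"})
    return stored


# Toy usage: each "face" is already a feature vector, so the stubs are trivial.
sample = {"composite": False, "faces": [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]}
result = annotate(sample, lambda s: s["composite"], lambda s: s["faces"], lambda f: f)
print(len(result))
```

Passing the detector and extractor as callables keeps the sketch independent of any particular face recognition library.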
  • the face image pair IP0 with the other person's label generated by the information processing apparatus 1 in the first embodiment may be used to build the feature quantity extraction model EM1.
  • the feature quantity extraction model EM1 may be a model for identifying faces of non-identical persons as faces of non-identical persons, and identifying faces of the same person as faces of the same person.
  • a person-labeled face image pair IP1 may also be prepared in which two different images (face image pair) of the same person are associated with a "personal label" indicating the same person. Then, a face image pair IP including both the face image pair IP0 and the face image pair IP1 may be prepared, and the face image pair IP may be used as learning data TD for constructing the feature quantity extraction model EM1.
  • The feature quantity extraction model EM1 may be a model that determines that a face image pair associated with the other person label shows non-identical persons, and that a face image pair associated with the personal label shows the same person. More specifically, when a face image pair IP is input, the feature quantity extraction model EM1 may extract the feature quantity of each image with a weight-sharing network and determine, from the distance or similarity between the two feature quantities, whether the face images are of different persons or of the same person. In this case, the feature quantity extraction model EM1 learns to minimize the distance (or maximize the similarity) when the face image pair IP1 is input, and to maximize the distance (or minimize the similarity) when the face image pair IP0 is input.
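One common way to express this training objective is a contrastive loss over the feature distance. The sketch below uses that standard formulation as an assumption; the disclosure does not name a specific loss function here, and `contrastive_loss` and the `margin` value are illustrative.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(feature_a, feature_b, same_person, margin=1.0):
    """Loss for one face image pair.

    same_person=True  (pair IP1): penalize any distance between the features.
    same_person=False (pair IP0): penalize distances smaller than the margin.
    """
    distance = euclidean_distance(feature_a, feature_b)
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# A same-person pair is penalized for being far apart.
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), same_person=True))
# An other-person pair already farther apart than the margin costs nothing.
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), same_person=False))
```

In a weight-sharing (Siamese) setup, both feature vectors would come from the same network, and this loss would be averaged over a batch of pairs IP0 and IP1.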
  • the feature quantity extraction model EM1 may be a model that compares the distance or similarity between sample data SD.
  • Although the wording "other person label / personal label" is used here, other wording may be used to express the same technical content.
  • The information processing apparatus 1 can attach an "other person label", indicating that the faces are not those of the same person, to a pair of face images of similar faces that are known not to be the faces of the same person.
  • The determination unit 114 determines that the two faces are similar at or above the predetermined level when the matching score between the feature amounts of the two faces is equal to or greater than a predetermined value.
  • The determination unit 114 can thus determine that two faces that are similar enough to be erroneously judged to be the same person are similar at or above the predetermined level.
  • the information processing apparatus 2 in the second embodiment prepares sample data SD in which the same label is assigned to the face images of persons belonging to the same group. Using this sample data SD, a feature quantity extraction model EM2 is constructed for extracting feature quantities so as to accurately determine to which group a face image belongs.
  • In the first embodiment, an "other person label" is assigned to face image pairs that are known to show non-identical persons.
  • In the second embodiment, a "twin ID label" is given to face images for which it is difficult to accurately identify the owner of the face, that is, for which it is difficult to tell whether two faces belong to the same person or to different persons.
  • The "twin ID label" may be the name of a label assigned to the face images of persons belonging to a group of persons whose faces are similar to each other at or above a predetermined level.
  • The "twin ID label" may be a label shared by a plurality of persons whose faces are similar to each other at or above a predetermined level.
  • information of "someone belonging to a group (not an individual)" may be used instead of "someone (individual)".
  • The information processing operation performed by the information processing apparatus 2 in the second embodiment may be a learning operation for identifying the faces of a plurality of persons whose faces are similar to each other at or above a predetermined level, such as siblings from a multiple birth, as belonging to the same class. More specifically, it may be a learning operation that sets the operating characteristics of feature extraction so that the faces of such persons are identified as belonging to the same class.
  • The information processing apparatus 2 in the second embodiment may construct a feature quantity extraction model EM2 for face authentication of a plurality of non-identical persons whose faces are difficult for others to distinguish, such as twins and other siblings from a multiple birth.
  • FIG. 4 is a block diagram showing the configuration of the information processing device 2 according to the second embodiment.
  • the information processing device 2 includes an arithmetic device 21 and a storage device 22 . Furthermore, the information processing device 2 may include a communication device 23 , an input device 24 and an output device 25 . However, the information processing device 2 does not have to include at least one of the communication device 23 , the input device 24 and the output device 25 . Arithmetic device 21 , storage device 22 , communication device 23 , input device 24 and output device 25 may be connected via data bus 26 .
  • the arithmetic device 21 performs the operations that the information processing device 2 should perform so that the arithmetic device 11 can function as a controller for realizing logical functional blocks for executing the operations that the information processing device 1 should perform. It can act as a controller to implement logical functional blocks for execution.
  • FIG. 4 shows an example of logical functional blocks implemented within the arithmetic unit 21 for executing information processing operations.
  • the arithmetic unit 21 includes an acquisition unit 211 as a specific example of the "acquisition means”, a feature quantity extraction unit 212 as a specific example of the “extraction means”, and a “class identification means”. and a learning unit 214, which is a specific example of the "learning means”. Details of the operations of the acquisition unit 211, the feature amount extraction unit 212, the class identification unit 213, and the learning unit 214 will be described with reference to FIGS. 5 to 7.
  • the storage device 22 can store desired data like the storage device 12 described above.
  • the storage device 22 may store learning data TD. However, the storage device 22 does not have to store the learning data TD.
  • The communication device 23 may acquire the learning data TD from a device external to the information processing device 2.
  • The input device 24 may receive input of the learning data TD from outside the information processing device 2. Details of the learning data TD will be described with reference to FIG. 5.
  • The following describes the twin ID classes to be processed by the information processing apparatus 2 in the second embodiment.
  • The learning data TD used in the second embodiment includes data in which face images belonging to the same twin ID class are given the same "twin ID label".
  • Face images included in the same twin ID class may be, for example, face images of siblings from a multiple birth, such as twins, triplets, or quadruplets, or face images of unrelated persons who closely resemble each other.
  • The number of persons whose faces appear in the face images included in the same twin ID class may be known.
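The shape of such learning data can be sketched as follows. This is only an illustration under stated assumptions, not the storage format used in the source: the class name `TwinIdClass`, its field names, the label strings, and the image file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TwinIdClass:
    """One twin ID class: a shared label, a known head count, unattributed images."""
    twin_id_label: str   # the same label is given to every face image in the class
    num_persons: int     # known number of persons whose faces appear in the class
    face_images: list = field(default_factory=list)  # images carry no per-person tag


def build_example_learning_data():
    """Return a small, hypothetical data set with two twin ID classes."""
    return [
        TwinIdClass("twin-0001", 2, ["img_001.png", "img_002.png", "img_003.png"]),
        TwinIdClass("twin-0002", 3, ["img_101.png", "img_102.png"]),
    ]


learning_data = build_example_learning_data()
labels = [twin_class.twin_id_label for twin_class in learning_data]
```

Note that the record keeps only the number of persons per class; which person each image shows is deliberately absent, matching the description above.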
  • FIG. 5 is a conceptual diagram of learning data TD used in the second embodiment.
  • The learning data TD includes data belonging to four twin ID classes: twin ID class CA, twin ID class CB, twin ID class CC, and twin ID class CD.
  • The twin ID class CA contains face images of multiple persons.
  • For example, the twin ID class CA includes L face images 1a, 2a, 3a, 4a, ..., each of which is given "LA" as its twin ID label.
  • Information is also attached indicating that the number of persons whose faces appear in the face images belonging to the twin ID class CA is KA (for example, two). It is unknown which of the KA persons each of the L face images corresponds to.
  • The twin ID class CB includes M face images 1b, 2b, 3b, 4b, ..., each of which is given "LB" as its twin ID label. Information is also attached indicating that the number of persons whose faces appear in the face images belonging to the twin ID class CB is KB (for example, three). It is unknown which of the KB persons each of the M face images corresponds to.
  • The twin ID class CC includes N face images 1c, 2c, 3c, 4c, ..., each of which is given "LC" as its twin ID label. Information is also attached indicating that the number of persons whose faces appear in the face images belonging to the twin ID class CC is KC (for example, two). It is unknown which of the KC persons each of the N face images corresponds to.
  • The twin ID class CD includes O face images 1d, 2d, 3d, 4d, ..., each of which is given "LD" as its twin ID label. Information is also attached indicating that the number of persons whose faces appear in the face images belonging to the twin ID class CD is KD (for example, four). It is unknown which of the KD persons each of the O face images corresponds to.
[2-3: Outline of information processing operation performed by information processing device 2]
  • The information processing apparatus 2 performs machine learning for setting the operation characteristics of the operation that extracts facial feature amounts, based on the acquired label information and the generated class identification information.
  • The label information is information about the correct class, among a plurality of classes, to which a plurality of persons whose faces are similar to a predetermined degree or more commonly belong.
  • The label information may indicate the correct class using the correct value of the probability that the plurality of persons commonly belong to each of the plurality of classes.
  • The class identification information is information about the estimated class, among the plurality of classes, to which the plurality of persons whose faces are similar to a predetermined degree or more commonly belong.
  • The class identification information may indicate the estimated class using the probability that the plurality of persons commonly belong to each of the plurality of classes.
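One common way to express these two kinds of information as probabilities is a one-hot vector for the label and a softmax output for the estimate. The sketch below assumes that representation; the source does not name it, and the function names and the example scores are hypothetical.

```python
import math


def one_hot_label(correct_class, num_classes):
    """Label information: probability 1 for the correct class, 0 elsewhere."""
    return [1.0 if j == correct_class else 0.0 for j in range(num_classes)]


def softmax(scores):
    """Class identification information: scores turned into probabilities."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimated_class(probabilities):
    """Index of the class with the highest estimated probability."""
    return max(range(len(probabilities)), key=lambda j: probabilities[j])


# Four classes, as in the CA/CB/CC/CD example; the scores are made up.
label_information = one_hot_label(correct_class=2, num_classes=4)
class_identification = softmax([0.1, 2.0, -1.0, 0.3])
```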
  • The feature quantity extraction model EM2 may extract facial features common to persons belonging to the same class, and the information processing apparatus 2 may accurately identify that persons belong to the same class.
[2-3-1: Introduction of cross-entropy error]
  • The information processing device 2 in the second embodiment may perform machine learning based on the cross-entropy error calculated from the label information and the class identification information to construct the feature quantity extraction model EM2.
  • For example, the information processing apparatus 2 in the second embodiment may perform machine learning based on the cross-entropy error calculated using the cross-entropy-type loss function shown in [Formula 1] below.
  • [Formula 1]
  • The function shown in [Formula 1] above is a loss function based on the label information and the class identification information. yi indicates the correct class and, in the case shown in FIG. 5, corresponds to the twin ID label (any of LA, LB, LC, and LD).
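The body of [Formula 1] does not appear in this text. A margin-based cross-entropy loss consistent with the surrounding description (correct class y_i, angle θ_{i,j} between the feature of sample i and class j, margin m added to the correct class) could take the following form. The scale factor s, the number of samples N, and the number of classes C are assumptions introduced here for illustration, and the exact form in the source may differ.

```latex
L = -\frac{1}{N}\sum_{i=1}^{N}
    \log\frac{e^{\,s\cos(\theta_{i,y_i}+m)}}
             {e^{\,s\cos(\theta_{i,y_i}+m)}
              +\sum_{j=1,\;j\neq y_i}^{C} e^{\,s\cos\theta_{i,j}}}
```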
  • Compared with a general cross-entropy loss function, the cross-entropy loss function shown in [Formula 1] above adds a margin m to the correct class. That is, in the second embodiment, the cross-entropy loss function of [Formula 1], which adds a margin m to the correct class, may be employed in order to reduce the intra-class variance.
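The effect of the margin can be sketched numerically as below. This is a minimal illustration that assumes an additive angular margin and a scale factor; the function name `margin_cross_entropy`, the default values of `margin` and `scale`, and the example cosines are hypothetical and are not taken from the source.

```python
import math


def margin_cross_entropy(cosines, correct, margin=0.5, scale=30.0):
    """Cross-entropy for one sample, with an angular margin on the correct class.

    cosines[j] is cos(theta_j) between the feature and the representative of class j.
    No special handling is done when theta + margin exceeds pi.
    """
    logits = []
    for j, c in enumerate(cosines):
        theta = math.acos(max(-1.0, min(1.0, c)))  # clip to the valid acos range
        if j == correct:
            theta += margin  # the margin makes the correct class harder to satisfy
        logits.append(scale * math.cos(theta))
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[correct]  # equals -log(softmax(logits)[correct])


example_cosines = [0.9, 0.1, -0.2]
loss_plain = margin_cross_entropy(example_cosines, correct=0, margin=0.0)
loss_margin = margin_cross_entropy(example_cosines, correct=0, margin=0.5)
```

For the same feature, the loss with a margin is larger than the loss without one, so training has to pull the feature closer to its class representative to reach the same loss value, which is how the margin reduces intra-class variance.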
  • the feature quantity extraction model EM2 constructed through machine learning in this manner can extract feature quantities such that the feature quantities of face images belonging to the same twin ID class are close to each other.
  • the feature quantity extraction model EM2 may be a model that compares the distance or similarity between the sample data SD and the center.
  • a center may be a feature quantity representing a class.
  • Face images belonging to a twin ID class are placed in one class because it is difficult to distinguish which person's face each image shows.
  • In other words, each face image belonging to a twin ID class is a face image of one of a plurality of persons, so the class can be regarded as a data set containing sample data that is difficult to tell apart. Therefore, in the information processing apparatus 2 according to the second embodiment, each of the plurality of classes may include a plurality of subclasses.
  • Since a twin ID class includes face images of a known number of persons, the probability distribution of the feature quantities is expected to have as many centers as there are persons. Therefore, in the second embodiment, the number of subclasses included in each class may be the same as the number of persons belonging to that class. That is, there is no need to prepare many subcenters, and because the number of subcenters is kept small, the amount of calculation can be reduced.
  • The feature quantity extraction model EM2 may be a model that compares the distance or similarity between the sample data SD and a subcenter.
  • A subcenter may be a feature quantity representing a subclass.
[2-3-2-2: Concept of learning process when subcenter is introduced]
  • FIG. 6 is a conceptual diagram of the learning process by the information processing device 2 in the second embodiment.
  • the same number of subcenters as the number of owners of face images belonging to twin ID class j may be prepared.
  • FIG. 6 shows a case where the number of owners of face images belonging to twin ID class j is two, and two subcenters W j1 and W j2 are prepared.
  • For example, when the feature quantity extraction model EM2 is learned using the data set shown in FIG. 5, KA subcenters corresponding to the twin ID class CA, KB subcenters corresponding to the twin ID class CB, KC subcenters corresponding to the twin ID class CC, and KD subcenters corresponding to the twin ID class CD may be prepared.
  • The information processing device 2 in the second embodiment may construct the feature quantity extraction model EM2 so that each feature quantity is close to one of the subcenters.
  • Face images belonging to a twin ID class are placed in one class because the persons are difficult to tell apart.
  • For example, when a twin ID class contains the two members of a pair of twins, each feature amount is extracted from the face of one of the two persons, so the feature amounts can be expected to be distributed around two centers. For this reason, when two subcenters are prepared, the extraction operation can be learned so that the feature amounts of one person's face are close to one subcenter and the feature amounts of the other person's face are close to the other subcenter.
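Preparing one subcenter per person in each class can be sketched as follows. The counts use the example values mentioned for FIG. 5 (KA = 2, KB = 3, KC = 2, KD = 4); the random unit-vector initialisation, the feature dimension, and the function names are assumptions made for illustration only.

```python
import math
import random


def unit_vector(values):
    """Scale a vector to length 1 so that inner products equal cosines."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


def init_subcenters(persons_per_class, dim, seed=0):
    """Create as many subcenters per class as there are persons in that class."""
    rng = random.Random(seed)
    return {
        label: [unit_vector([rng.gauss(0.0, 1.0) for _ in range(dim)])
                for _ in range(count)]
        for label, count in persons_per_class.items()
    }


# Example person counts per twin ID label (KA, KB, KC, KD in the text).
persons_per_class = {"LA": 2, "LB": 3, "LC": 2, "LD": 4}
subcenters = init_subcenters(persons_per_class, dim=8)
total_subcenters = sum(len(group) for group in subcenters.values())
```

Because the count per class equals the known number of persons, the total here is 11 subcenters rather than a fixed, larger number per class.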
  • θi,j represented by the following [Formula 2] may be applied to θi,j of the above [Formula 1].
  • The expression inside the parentheses of the arccos function in [Formula 2] above indicates max processing that selects one of the plurality of prepared subcenters. That is, the learning unit 214 may calculate the cross-entropy error by adopting, among the plurality of subcenters W jk, the subcenter W jk having the largest inner product with the extracted feature quantity.
  • In other words, the learning unit 214 may calculate the cross-entropy error by adopting the W jk that maximizes cos θi,j, which is also the W jk that minimizes θi,j.
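The body of [Formula 2] does not appear in this text. Based on the description of the max processing inside the arccos function, it could be written as below, where x_i denotes the feature quantity extracted from sample i and K_j the number of subcenters of class j. Both symbols, and the assumption that x_i and W_{jk} are normalized so that their inner product equals a cosine, are introduced here for illustration.

```latex
\theta_{i,j} = \arccos\Bigl(\max_{k \in \{1,\dots,K_j\}} W_{jk}^{\top} x_i\Bigr)
```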
  • The feature quantity extraction model EM2 may assign the subclass of the subcenter selected by the max processing to the face image from which the feature quantity is extracted. That is, the feature quantity extraction model EM2 can also perform class assignment during learning.
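The max processing and the resulting subclass assignment can be sketched as follows. The function names and the two-dimensional example vectors are hypothetical; cosine similarity is used so that the vectors need not be normalized in advance.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_subcenter(feature, class_subcenters):
    """Max processing: pick the subcenter with the largest cosine to the feature.

    Returns the index of the selected subcenter (the assigned subclass) and
    the corresponding angle theta.
    """
    cosines = [cosine(feature, w) for w in class_subcenters]
    best = max(range(len(cosines)), key=lambda k: cosines[k])
    clipped = max(-1.0, min(1.0, cosines[best]))  # guard against rounding error
    return best, math.acos(clipped)


# A feature close to the second of two subcenters.
subclass, theta = select_subcenter([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]])
```

Choosing the largest cosine is the same as choosing the smallest angle, so the returned `theta` is the value that would be used in the loss for that class.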
  • As the probability that the plurality of persons commonly belong to one class among the plurality of classes, the class identification information may indicate the estimated class using the probability that each of the plurality of persons belongs to any one of the plurality of subclasses included in that class.
  • More specifically, the class identification information may indicate the estimated class using the probability that each of the plurality of persons belongs to the one subclass, among the plurality of subclasses included in that class, whose subclass feature amount is most similar to the feature amount extracted by the feature quantity extraction model EM2.
  • For example, when twins belong to a twin ID class, two subcenters W1 and W2 may be provided.
  • The feature quantity extraction model EM2 may be machine-learned so that it extracts the feature quantity of the face image of one twin so as to approach the subcenter W1 and extracts the feature quantity of the face image of the other twin so as to approach the subcenter W2. That is, in the second embodiment, the plurality of subcenters can capture the characteristics of each of the plurality of persons belonging to the twin ID class.
  • In the above, an example of selecting the subcenter by max processing has been described, but the subcenter may be selected using another method, such as an attention mechanism.
[2-4: Learning operation by information processing device 2]
  • FIG. 7 is a flow chart showing the flow of the learning operation performed by the information processing apparatus 2 according to the second embodiment.
  • The acquisition unit 211 acquires a data set including a plurality of face images in which the faces of a plurality of persons whose faces are similar to a predetermined degree or more are captured, and label information about the correct class to which the plurality of persons commonly belong among a plurality of classes (step S21).
  • For example, the acquisition unit 211 may acquire a data set including the L face images 1a, 2a, 3a, 4a, ..., the M face images 1b, 2b, 3b, 4b, ..., and so on, together with the corresponding label information, up to the label information "LD" for the twin ID class "CD".
  • the label information may indicate the correct class using the correct value of the probability that a plurality of persons commonly belong to each of a plurality of classes.
  • The feature amount extraction unit 212 extracts the feature amount of the face of each of the plurality of persons based on the plurality of face images (step S22).
  • The feature amount extraction unit 212 may use the feature amount extraction model EM2 to extract the feature amount of the face of each of the plurality of persons.
  • Based on the feature amounts, the class identification unit 213 generates class identification information regarding the estimated class to which the plurality of persons commonly belong among the plurality of classes (step S23).
  • The class identification information may indicate the estimated class using the probability that the plurality of persons commonly belong to each of the plurality of classes.
  • Based on the label information and the class identification information, the learning unit 214 performs machine learning for setting the operating characteristics of the feature quantity extraction unit 212 (step S24).
  • the learning unit 214 may perform machine learning based on the cross-entropy error calculated based on the label information and the class identification information.
  • the learning unit 214 may perform machine learning based on a cross-entropy error calculated using a cross-entropy-type loss function using label information and class identification information.
  • the learning unit 214 causes the feature quantity extraction unit 212 to learn a method of extracting a feature quantity from a face image.
  • the learning unit 214 may cause the feature amount extraction model EM2 used by the feature amount extraction unit 212 to learn a method of extracting feature amounts from the face image, and construct the feature amount extraction model EM2.
  • The learning unit 214 may calculate the gradient of the learning parameters included in the feature quantity extraction model EM2 based on the cross-entropy error, and may update the values of the learning parameters using that gradient. Updating the values of the learning parameters corresponds to learning of the feature quantity extraction model EM2. For example, the learning unit 214 may optimize the values of the learning parameters so that the cross-entropy error is minimized.
  • Step S24 may be executed for each batch of sample data SD of the batch size.
  • The batch size is not particularly limited, and any value can be used.
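A single gradient-based update can be sketched as below. To stay self-contained, the sketch updates the weights of a small linear classifier with plain gradient descent; these weights merely stand in for the learning parameters of the feature quantity extraction model EM2, whose actual architecture and optimizer are not specified here. The function names and the learning rate are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(weights, feature):
    """One logit per class: inner product of that class's weights and the feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def cross_entropy(weights, feature, correct):
    return -math.log(softmax(class_logits(weights, feature))[correct])


def sgd_step(weights, feature, correct, learning_rate=0.1):
    """Update the weights once along the negative gradient of the cross-entropy."""
    probs = softmax(class_logits(weights, feature))
    updated = []
    for j, row in enumerate(weights):
        # d(loss)/d(logit_j) = p_j - 1 for the correct class, p_j otherwise
        grad_logit = probs[j] - (1.0 if j == correct else 0.0)
        updated.append([w - learning_rate * grad_logit * x
                        for w, x in zip(row, feature)])
    return updated


weights = [[0.0, 0.0], [0.0, 0.0]]
feature = [1.0, 0.5]
loss_before = cross_entropy(weights, feature, correct=0)
weights_after = sgd_step(weights, feature, correct=0)
loss_after = cross_entropy(weights_after, feature, correct=0)
```

In a batched setting, the same update would be applied with the gradient averaged over the samples of one batch.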
  • The acquisition unit 211 determines whether or not there is unprocessed learning data TD (step S25). If there is unprocessed learning data TD (step S25: Yes), the process returns to step S22. If there is no unprocessed learning data TD (step S25: No), the computing device 21 stores the feature quantity extraction model EM2 in the storage device 22 (step S26).
  • The learning unit 214 may store, in the storage device 22, the optimized feature quantity extraction model EM2 including the optimally updated learning parameters.
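The control flow of steps S22 to S26 can be sketched as a batch loop. The function `run_learning` and its callback parameters are hypothetical; the callbacks stand in for the processing of steps S22 to S24 and for storing the model in step S26. Unlike FIG. 7, the check for remaining data sits at the top of the loop, so an empty data set goes straight to storing the model.

```python
def run_learning(dataset, batch_size, learn_on_batch, store_model):
    """Process the data set batch by batch, then store the model once.

    Returns the number of batches processed.
    """
    position = 0
    steps = 0
    while position < len(dataset):                 # S25: unprocessed data remains
        batch = dataset[position:position + batch_size]
        learn_on_batch(batch)                      # S22 to S24 for this batch
        position += len(batch)
        steps += 1
    store_model()                                  # S26: no unprocessed data left
    return steps


processed_batches = []
stored = []
step_count = run_learning(
    dataset=["1a", "2a", "3a", "1b", "2b"],
    batch_size=2,
    learn_on_batch=processed_batches.append,
    store_model=lambda: stored.append(True),
)
```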
  • The extraction operation may be learned so that the feature amount extracted from one person's face is closer to one subcenter and the feature amount extracted from the other person's face is closer to the other subcenter. Two faces may be determined to be those of different persons if they fall into different subclasses.
  • The learning of the extraction operation may be advanced so that the feature amounts extracted from the two faces are closer to different subclasses.
  • Using the feature quantity extraction model EM2 generated in the second embodiment, it may be determined whether a pair of face images is a stranger pair (different persons) or a true pair (the same person).
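One possible way to apply this rule is to assign each face to its nearest subcenter and compare the assignments. The sketch below assumes cosine similarity as the measure of closeness; the function names and the example vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_subclass(feature, class_subcenters):
    """Index of the subcenter most similar to the feature."""
    return max(range(len(class_subcenters)),
               key=lambda k: cosine(feature, class_subcenters[k]))


def is_stranger_pair(feature_1, feature_2, class_subcenters):
    """Treat two faces as different persons if they fall into different subclasses."""
    return (nearest_subclass(feature_1, class_subcenters)
            != nearest_subclass(feature_2, class_subcenters))


twin_subcenters = [[1.0, 0.0], [0.0, 1.0]]  # one subcenter per twin
different = is_stranger_pair([0.9, 0.2], [0.1, 0.8], twin_subcenters)
same = is_stranger_pair([0.9, 0.2], [0.8, 0.3], twin_subcenters)
```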
  • The feature amount extraction model EM2 can extract feature amounts so that the feature amounts of face images belonging to the same twin ID class are close to each other, so face images belonging to the same twin ID class can be accurately identified.
  • The learning unit 214 performs machine learning based on the cross-entropy error calculated from the label information and the class identification information, and machine learning can be advanced on that basis. Also, if the number of subcenters is the same as the number of persons belonging to the class, there is no need to prepare a large number of subcenters, and because the number of subcenters is kept small, the amount of calculation can be reduced. Furthermore, since the feature quantity extraction model EM2 is constructed so that each feature quantity is close to one of the subcenters, it is possible to identify which person a face image shows.
[3: Addendum]
  • An information processing apparatus comprising: storage means for storing in association with each other;
  • [Appendix 2] The information processing apparatus according to Appendix 1, wherein the determining means determines that the two faces are similar to a predetermined degree or more when a matching score between the feature amounts of the two faces is equal to or greater than a predetermined value.
  • [Appendix 3] Acquiring a data set including a plurality of face images in which the faces of a plurality of persons whose faces are similar to a predetermined degree or more are captured, and label information relating to a correct class to which the plurality of persons commonly belong among a plurality of classes.
  • An information processing apparatus comprising: learning means for performing machine learning for setting operation characteristics of the extraction means based on the label information and the class identification information.
  • [Appendix 4] The information processing apparatus according to Appendix 3, wherein the label information indicates the correct class using a correct value of the probability that the plurality of persons commonly belong to each of the plurality of classes; the class identification information indicates the estimated class using a probability that the plurality of persons commonly belong to each of the plurality of classes; and the learning means performs the machine learning based on a cross-entropy error calculated based on the label information and the class identification information.
  • [Appendix 5] The information processing apparatus according to Appendix 4, wherein each of the plurality of classes includes a plurality of subclasses; the class identification information indicates the estimated class using, as the probability that the plurality of persons commonly belong to one class among the plurality of classes, the probability that each of the plurality of persons belongs to any one of the plurality of subclasses included in the one class; and the number of the plurality of subclasses included in each class is the same as the number of the plurality of persons.
  • [Appendix 6] The information processing apparatus according to Appendix 5, wherein the estimated class is indicated using the probability that each of the plurality of persons belongs to the one subclass, among the plurality of subclasses included in the one class, whose subclass feature amount is most similar to the feature amount extracted by the extraction means.
  • [Appendix 7] An information processing method comprising: acquiring a determination target image in which a plurality of faces are captured; determining whether two faces among the plurality of faces appearing in the determination target image are similar to a predetermined degree or more; and, when the two faces are similar to the predetermined degree or more, storing in association with each other two face images each containing one of the two faces and a stranger label indicating that the two persons corresponding to the two face images are not the same person.
  • [Appendix 8] Acquiring a data set including a plurality of face images in which the faces of a plurality of persons whose faces are similar to a predetermined degree or more are captured, and label information relating to a correct class to which the plurality of persons commonly belong among a plurality of classes.
  • [Appendix 10] A recording medium recording a computer program that causes a computer to execute an information processing method comprising: acquiring a data set including a plurality of face images in which the faces of a plurality of persons whose faces are similar to a predetermined degree or more are captured, and label information relating to a correct class to which the plurality of persons commonly belong among a plurality of classes; extracting the facial feature amount of each of the plurality of persons based on the plurality of face images; generating, based on the feature amounts, class identification information relating to an estimated class to which the plurality of persons commonly belong among the plurality of classes; and performing machine learning for setting operating characteristics of the extraction means based on the label information and the class identification information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
PCT/JP2021/046251 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium Ceased WO2023112198A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/718,834 US20250054336A1 (en) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium
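PCT/JP2021/046251 WO2023112198A1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium
JP2023567383A JP7718506B2 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium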

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/046251 WO2023112198A1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium

Publications (1)

Publication Number Publication Date
WO2023112198A1 (ja) 2023-06-22

Family

ID=86773765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/046251 Ceased WO2023112198A1 (ja) 2021-12-15 2021-12-15 Information processing apparatus, information processing method, and recording medium

Country Status (3)

Country Link
US (1) US20250054336A1
JP (1) JP7718506B2
WO (1) WO2023112198A1

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
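JP2011237911A (ja) * 2010-05-07 2011-11-24 Seiko Epson Corp Image processing apparatus, image processing method, and image processing program
JP2015179423A (ja) * 2014-03-19 2015-10-08 キヤノン株式会社 Person registration apparatus, person recognition apparatus, and program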

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7460450B2 (ja) 2020-06-05 2024-04-02 矢崎総業株式会社 Gaze estimation system, gaze estimation method, gaze estimation program, learning data generation device, and gaze estimation device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
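CN118097739A (zh) * 2023-12-13 2024-05-28 深圳云天励飞技术股份有限公司 Face data annotation method and apparatus, electronic device, and storage medium
CN118097739B (zh) * 2023-12-13 2025-10-28 深圳云天励飞技术股份有限公司 Face data annotation method and apparatus, electronic device, and storage medium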

Also Published As

Publication number Publication date
JP7718506B2 (ja) 2025-08-05
JPWO2023112198A1 2023-06-22
US20250054336A1 (en) 2025-02-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21968115

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023567383

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 18718834

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21968115

Country of ref document: EP

Kind code of ref document: A1