WO2021017096A1 - Method and device for recording face information into a database
- Publication number: WO2021017096A1
- Application number: PCT/CN2019/104108
- Authority: WIPO (PCT)
Classifications
- G01S5/18—Position-fixing using ultrasonic, sonic or infrasonic waves
- G06F16/41—Indexing; data structures therefor; storage structures for multimedia retrieval
- G06F16/7834—Video retrieval using metadata automatically derived from the content, using audio features
- G06F16/784—Video retrieval using objects detected or recognised in the video content, the detected or recognised objects being people
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; localisation; normalisation
- G06V40/166—Detection, localisation or normalisation using acquisition arrangements
- G06V40/172—Classification, e.g. identification
- G06V40/50—Maintenance of biometric data or enrolment thereof
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
- G10L15/00—Speech recognition
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; pattern matching strategies
Definitions
- the present disclosure relates to face recognition, and particularly to a method and device for recording face information into a database.
- Face recognition is a kind of biometric recognition technology based on the facial feature information of people. Face recognition technology uses a video camera or camera to collect images or video streams containing faces, automatically detects the faces in the images, and then performs face recognition on the detected faces. Building a face information database is a prerequisite for face recognition. In the process of entering face information into the database, the information corresponding to the collected face information is usually entered manually by the user of the image and video acquisition device.
- An object of the present disclosure is to provide a method, processor chip, electronic device, and storage medium for recording face information into a database.
- a method for recording face information into a database, including: video shooting one or more subjects, and extracting the face information of the one or more subjects from the video frames during the shooting; recording the voice of at least one of the one or more subjects during the shooting; performing semantic analysis on the recorded voice to extract corresponding information from it; and associating the extracted information with the face information of the person who spoke the information and entering them into the database.
- a processor chip circuit for recording face information into a database including a circuit unit configured to perform the steps of the above method.
- an electronic device including: a video sensor for taking video shots of one or more subjects; an audio sensor for recording the voice of at least one subject during the shooting period; and the above-mentioned processor chip circuit for associating the information of the person being photographed with the face information and recording them in the database.
- a computer-readable storage medium wherein a program including instructions is stored on the storage medium, which when executed by a processor of the electronic device cause the electronic device to perform the steps of the above method.
- Fig. 1 shows a flowchart of associating face information with information extracted from speech according to the first embodiment.
- Fig. 2 exemplarily shows a scene in which face information is entered for multiple subjects.
- Fig. 3 shows a first arrangement of the microphone array and the camera.
- Fig. 4 shows a second arrangement of the microphone array and the camera.
- Fig. 5 exemplarily displays video images and audio waveforms in association based on a common time axis.
- Fig. 6 shows a flowchart of associating face information with information extracted from speech according to the second embodiment.
- Fig. 7 shows a flowchart of associating face information with information extracted from speech according to the third embodiment.
- Fig. 8 shows a structural block diagram of an exemplary computing device that can be applied to the exemplary embodiments.
- the use of first, second, etc. to describe various elements is not intended to limit the positional, timing, or importance relationship of these elements; such terms are only used to distinguish one element from another.
- the first element and the second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
- Fig. 1 shows a flow chart of associating face information with information extracted from speech according to the first embodiment of the present disclosure.
- step S101 a video is taken of a subject, and the face information of the subject is extracted from the video screen during the shooting.
- Video shooting can be done by means of a video camera, camera or other video capture unit with image sensor.
- the video acquisition unit can automatically search for a face by using face recognition technology, and then extract the face information of the photographed person for face recognition.
- the facial information includes facial feature information that can be used to identify the subject.
- the features that can be used by the face recognition system include visual features, pixel statistical features, face image transformation coefficient features, and face image algebra features. For example, the geometric description of the structural relationship between the eyes, nose, mouth, chin and other parts of the face, as well as the iris, can be used as important features to recognize the face.
- the extracted face information is searched and matched with the face information template stored in the database, and the identity information of the face is judged according to the degree of similarity.
- deep learning can be used to train a neural network to perform the above-mentioned similarity judgment.
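The disclosure leaves the similarity judgment unspecified beyond mentioning trained neural networks. As an illustrative sketch only, not part of the original filing, a common approach compares a face embedding against stored templates by cosine similarity; the vectors and the 0.8 threshold below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_face(embedding, templates, threshold=0.8):
    """Return the identity of the best-matching stored template,
    or None if no template is similar enough."""
    best_id, best_score = None, threshold
    for identity, template in templates.items():
        score = cosine_similarity(embedding, template)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id

# Hypothetical 3-dimensional templates; real embeddings are much longer.
templates = {"Wang Jun": [0.9, 0.1, 0.4], "Li Lei": [0.1, 0.9, 0.2]}
print(match_face([0.88, 0.12, 0.38], templates))  # Wang Jun
```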
- step S103 the voice of the subject during the shooting is recorded.
- the voice can include the speaker's own identity information; as an alternative and supplement, the voice can also include information related to the speaker's own scene.
- for example, in a hospital scene, a doctor's speech can include not only identity information such as the doctor's name, department, and position, but also useful voice information about treatment methods and medication methods.
- Voice collection can be achieved through audio collection units such as microphones.
- the person being photographed may take the initiative to speak out information, for example his own identity information, such as "I am Wang Jun".
- the identity information includes at least the name, but according to the different uses of the database, it can also include other information such as age, hometown, and the above-mentioned work unit and position.
- step S105 semantic analysis is performed on the recorded speech, and corresponding information is extracted therefrom.
- Extracting information from speech can be achieved through speech recognition technology, and the extracted information can be stored in the form of text. Based on the speech database of various languages such as Chinese (including different dialects) and English provided by the speech recognition technology provider, the information reported in multiple languages can be recognized. As described above, the extracted information may be the speaker's own identity information; as an alternative and supplement, the extracted information may also include information related to the scene in which the speaker himself is located. It should be pointed out that the identity information extracted by semantic analysis is different from the speaker's voiceprint information.
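As a hedged illustration of the semantic-analysis step (the patent names no specific algorithm), self-reported identity such as "I am Wang Jun" can be pulled from a recognized transcript. The regular-expression patterns below are toy assumptions standing in for a real language-understanding pipeline.

```python
import re

# Hypothetical self-introduction patterns; a production system would use a
# full speech-recognition plus language-understanding pipeline instead.
PATTERNS = [
    re.compile(r"(?:I am|my name is)\s+(\w+(?: \w+)?)", re.IGNORECASE),
]

def extract_identity(transcript):
    """Pull a speaker-reported name out of recognized speech text."""
    for pattern in PATTERNS:
        match = pattern.search(transcript)
        if match:
            return match.group(1).strip()
    return None

print(extract_identity("Hello everyone, I am Wang Jun from the surgery department"))
# Wang Jun
```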
- the degree of cooperation of the subject may affect the result of voice recognition. It is understandable that if the person being photographed clearly speaks the corresponding information at an appropriate rate of speech, the result of voice recognition will be more accurate.
- step S107 the extracted information is associated with the face information of the person who spoke the information and stored in the database.
- it is determined that the extracted face information and the extracted information belong to the same subject, and then the two are stored in the database in an associated form.
- the information is stored in the database in the form of text information.
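A minimal sketch of storing the associated pair, assuming a SQLite table whose schema and field names are invented for illustration and do not appear in the patent:

```python
import sqlite3

# In-memory database standing in for the face information database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE faces (id INTEGER PRIMARY KEY, "
    "face_template BLOB, name TEXT, extra_info TEXT)"
)

def enroll(conn, face_template, name, extra_info=""):
    """Store a face template together with the text information
    extracted from the same subject's speech."""
    cur = conn.execute(
        "INSERT INTO faces (face_template, name, extra_info) VALUES (?, ?, ?)",
        (face_template, name, extra_info),
    )
    return cur.lastrowid

row_id = enroll(conn, b"\x01\x02", "Wang Jun", "surgeon")
name = conn.execute("SELECT name FROM faces WHERE id = ?", (row_id,)).fetchone()[0]
print(name)  # Wang Jun
```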
- the above-mentioned face information entry method automatically recognizes the information broadcast by the subject and associates it with the face information, which reduces the risk that the user of the video capture unit enters the subject's information (especially identity information) incorrectly, and improves the efficiency of face information entry.
- the method according to the present disclosure makes it possible to input other information related to the scene at the same time, so that the user's use requirements in different scenes can be met.
- the steps in the flowchart in Figure 1 can also be applied to a scene where there are multiple subjects.
- This scenario is, for example, a person with low vision attending a multi-person meeting or in a social situation.
- step S101 video shooting is performed on multiple subjects, and the face information of each subject is extracted from the video screen during the shooting.
- within the shooting range of the video capture unit 204 (the fan-shaped area defined by the two dotted lines in Fig. 2), there are three subjects 201, 202 and 203 at the same time.
- the face recognition technology is used to automatically search for the faces of multiple subjects, and then the corresponding face information is extracted for all the photographed faces.
- step S103 the voice of at least one of the plurality of subjects is recorded while they are being photographed.
- Multiple subjects can broadcast their own information in turn, and the recorded voice can be stored in the memory.
- step S105 semantic analysis is performed on each recorded voice, and corresponding information is extracted therefrom.
- the voice can also include information related to the scene where the speaker is located; such information can likewise be extracted by analyzing the voice and stored in the database in association with the face information. For simplicity of description, the present disclosure will be described below taking the identity information in the voice as an example.
- step S107 the extracted information is associated with the face information of the person who spoke the information and entered into the database.
- the correlation between the extracted corresponding information and the face information can be realized in the following two ways:
- the device 200 for inputting face information further includes an audio collection unit 205. It should be pointed out that FIG. 2 is not intended to limit the relative positions of the audio collection unit 205 and the video collection unit 204.
- the audio collection unit 205 may be an array including three microphones, where the microphones are, for example, non-directional microphone elements with high sensitivity to sound pressure.
- three microphones 305-1, 305-2, and 305-3 are arranged in a straight line above the camera 304.
- three microphones 405-1, 405-2, and 405-3 form an equilateral triangle with the camera 404 as the center.
- the form of the microphone array is not limited to the modes shown in FIG. 3 and FIG. 4, and it is important that the three microphones are installed at known and different positions on the face information recording devices 200, 300, and 400, respectively.
- the speech sound waves propagate to the three microphones 305-1, 305-2, and 305-3 of the audio collection unit. Because the microphone positions differ, the audio signals collected by the three microphones have phase differences with respect to each other, and from these phase differences the direction of the sound source relative to the face information recording device can be determined. For example, as shown in Fig. 3, one of the three microphones, 305-2, may be set on the vertical axis of the face information recording device 300, with the remaining two microphones 305-1 and 305-3 arranged symmetrically with respect to microphone 305-2; the normal line passing through microphone 305-2 and perpendicular to the plane in which it lies is used as the reference line, and the specific direction of the sound source is calibrated by angle.
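The arrival-time (phase) difference geometry described above can be sketched for the far-field case of a linear array: a delay of τ between two microphones spaced d apart places the source at an angle of asin(c·τ/d) from the reference normal. This is an illustrative simplification, not the patent's algorithm, and the spacing and delay values are invented.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def source_angle(delay_s, mic_spacing_m):
    """Far-field estimate of the sound-source angle (degrees) from the
    reference normal line, given the arrival-time difference between two
    microphones of a linear array."""
    sin_theta = SPEED_OF_SOUND * delay_s / mic_spacing_m
    sin_theta = max(-1.0, min(1.0, sin_theta))  # clamp numerical noise
    return math.degrees(math.asin(sin_theta))

# A delay corresponding to sin(theta) = 0.5 across a 0.2 m spacing
# puts the source near 30 degrees off the reference line.
print(round(source_angle(0.2 * 0.5 / SPEED_OF_SOUND, 0.2), 1))  # 30.0
```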
- the subject 1 is sending out a voice to announce his identity information.
- the direction of the subject 1 relative to the audio collection unit 205 can be accurately located.
- the accuracy of sound source localization is related to the sensitivity of the microphones used by the audio collection unit: if the spacing between subjects in the shooting range is large, the accuracy requirements for sound source localization are relatively low; conversely, if the spacing between subjects is small, the accuracy requirements are relatively high.
- those skilled in the art can determine the performance of the audio collection unit according to specific application scenarios (for example, according to the number of people within the shooting range at the same time).
- the video capture units 304 and 404 may be used to map between orientations in the real scene where the subject is located and positions in the video scene. This mapping can be achieved by preset reference markers 206 and 207 in the real scene (in which case the distance from the video capture unit to the reference marker is known), or by using the distance measurement function of the camera.
- the distance measurement using the camera can be achieved, for example, as follows: sensors such as gyroscopes can be used to estimate the changes in the camera's viewing angle and the displacement of the video acquisition unit, so as to infer the actual spatial distance corresponding to the pixel displacement in the image.
- the position of the speaker (photographed person 1) in the video frame can be calculated to complete the association between the extracted identity information and the extracted facial information.
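Computing the speaker's position in the video frame from the localized direction can be sketched, under an idealized pinhole-camera assumption, as a mapping from angle to pixel column. The field of view and resolution below are invented values, not taken from the patent.

```python
import math

def angle_to_pixel_column(angle_deg, image_width_px, horizontal_fov_deg):
    """Map a sound-source direction (angle from the camera's optical axis)
    to an approximate pixel column, assuming an ideal pinhole camera."""
    # Focal length in pixels for the given horizontal field of view.
    f_px = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg / 2))
    return int(image_width_px / 2 + f_px * math.tan(math.radians(angle_deg)))

# A source on the optical axis lands in the middle column of a 1280 px frame.
print(angle_to_pixel_column(0.0, 1280, 60.0))  # 640
```

The face detected nearest this column would then be the one associated with the extracted identity information.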
- the above-mentioned sound source localization involves the association of audio and video in spatial orientation, and the implementation of capturing lip movements involves the association of video and audio in time.
- Fig. 5 correlates the captured video frames and the recorded audio waveform using a common time axis.
- the face information recording device 200, 300, 400 retrieves the recorded video images and compares the frame 502 at time t1 with the frame 501 at the immediately preceding time; if the lips of the subject on the left are detected to start moving at t1 and stop moving at t2, the identity information collected by the audio collection unit during the interval from t1 to t2 should be associated with the subject on the left.
- the above-mentioned method of associating identity information and face information by capturing lip movements can be used to reinforce the implementation of sound source localization, or can be used alone as an alternative to sound source localization.
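The lip-movement association over the common time axis might be sketched as follows; the mouth-region pixel values and the motion threshold are toy assumptions standing in for a real mouth-region detector.

```python
def lip_motion(prev_mouth, curr_mouth, threshold=10):
    """Report whether the mouth region changed enough between two frames
    to count as lip movement (sum of absolute pixel differences)."""
    diff = sum(abs(a - b) for a, b in zip(prev_mouth, curr_mouth))
    return diff > threshold

def speaker_for_interval(frames_by_subject, t1, t2):
    """Pick the subject whose lips move throughout [t1, t2); the speech
    recorded in that interval is then associated with that subject."""
    for subject, frames in frames_by_subject.items():
        moving = [
            lip_motion(frames[t - 1], frames[t])
            for t in range(max(t1, 1), t2)
        ]
        if moving and all(moving):
            return subject
    return None

# Toy mouth-region signals per frame: the left subject's values change
# during frames 1..3, the right subject's stay constant.
frames = {
    "left":  [[0, 0], [20, 20], [0, 0], [20, 20]],
    "right": [[5, 5], [5, 5], [5, 5], [5, 5]],
}
print(speaker_for_interval(frames, 1, 4))  # left
```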
- in this way, the identity information of the people present is quickly grasped, and the identity information of strangers is associated with the corresponding face information and stored in the database.
- the localization technology explained above can also be used to confirm the position of the current speaker in the video frame and perform face recognition on him, for example to broadcast the current speaker's identity information to a person with low vision through a loudspeaker, which provides great convenience for people with low vision to carry out normal social activities.
- in addition, the corresponding semantics can also be analyzed from the captured lip movements in the video, and different sound sources can be separated by the audio collection device; the semantics obtained from the video lip-movement analysis can then be compared with each single-channel sound source separated by the audio collection device to establish the association.
- Fig. 6 shows a flowchart of associating face information and extracted corresponding information into a database according to the second embodiment of the present disclosure.
- the difference from the embodiment shown in FIG. 1 is that the second embodiment determines whether the extracted face information has been stored in the database before extracting corresponding information from the voice.
- step S601 a video is taken of one or more subjects, the face information of the subject is extracted from the video screen, and the voice of the subject is recorded.
- step S602 the extracted face information is compared with face information templates already stored in the database.
- if it is determined that the face information has already been stored in the database, the method proceeds to step S605 to exit the face information entry mode.
- when the name to be entered has already been stored in the database (but the corresponding face information is different), the name to be entered can be distinguished before being entered into the database. For example, when there is already a "Wang Jun" in the database, the new entry is recorded as "Wang Jun No. 2" to distinguish it from the "Wang Jun" already in the database, so that when information is subsequently broadcast to the user, different voice message codes allow the user to correspond to different face information.
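The "Wang Jun No. 2" disambiguation rule from the description can be sketched directly:

```python
def disambiguate(name, existing_names):
    """Return a distinguishable label when the name already exists in the
    database, following the "Wang Jun No. 2" style from the description."""
    if name not in existing_names:
        return name
    n = 2
    while f"{name} No. {n}" in existing_names:
        n += 1
    return f"{name} No. {n}"

db_names = {"Wang Jun"}
print(disambiguate("Wang Jun", db_names))  # Wang Jun No. 2
print(disambiguate("Li Lei", db_names))    # Li Lei
```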
- step S604 the extracted information is associated with the face information and entered into the database.
- the manner of associating voice and human face explained above in conjunction with FIGS. 1 to 5 can also be applied to this second embodiment.
- the input efficiency of the extracted corresponding information and facial information can be further improved.
- it should be noted that the corresponding information, including identity information, extracted according to the present disclosure is text information recognized from voice information in audio format; therefore, the above information is stored in the database as text information instead of voice information.
- Fig. 7 shows a flowchart of associating face information and identity information into a database according to the third embodiment of the present disclosure.
- step S701 a video is taken of one or more subjects, and the face information of the subject is extracted from the video screen during the shooting.
- step S703 semantic analysis is performed on the voice of the subject during the shooting, and the voice may contain the speaker's own identity information.
- step S705 it is determined whether the extracted face information is already in the database.
- if it is determined that the relevant face information has not been stored in the database, the method proceeds to step S707, and the extracted information and the face information are stored in the database in an associated form.
- the manner of associating the voice and the human face explained in conjunction with FIGS. 1 to 5 can also be applied to the third embodiment.
- if the relevant face information has already been stored in the database, the method proceeds to S710 to further determine whether the extracted information can supplement the existing information in the database.
- for example, the name of the subject may already exist in the database while the extracted information also includes other information such as age or hometown, or new information related to the scene where the speaker is located; in this case the new information supplements the existing record.
- a more comprehensive identity information database can be obtained with higher efficiency.
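The supplement step of the third embodiment might look like the following sketch, where the record fields are illustrative and only missing or empty fields are filled in:

```python
def supplement_record(existing, extracted):
    """Merge newly extracted fields into an existing database record,
    adding only fields that are missing or empty; field names are
    illustrative, not taken from the patent."""
    added = {}
    for key, value in extracted.items():
        if key not in existing or not existing[key]:
            existing[key] = value
            added[key] = value
    return added

record = {"name": "Wang Jun", "age": ""}
new_info = {"name": "Wang Jun", "age": "42", "hometown": "Beijing"}
print(supplement_record(record, new_info))  # {'age': '42', 'hometown': 'Beijing'}
```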
- FIG. 8 is a computing device 2000 for implementing the method or process of the present disclosure, which is an example of a hardware device that can be applied to various aspects of the present disclosure.
- the computing device 2000 may be any machine configured to perform processing and/or calculations. Especially in the above-mentioned conference or social scenes where multiple people are present, the computing device 2000 may be implemented as a wearable device, preferably as smart glasses. In addition, the computing device 2000 may also be implemented as a tablet computer, a smart phone, or any combination thereof.
- the apparatus for inputting facial information according to the present disclosure may be implemented in whole or at least in part by the computing device 2000 or similar devices or systems.
- the computing device 2000 may include elements connected to or in communication with the bus 2002 (possibly via one or more interfaces).
- the computing device 2000 may include a bus 2002, one or more processors 2004, one or more input devices 2006, and one or more output devices 2008.
- the one or more processors 2004 may be any type of processor, and may include, but are not limited to, one or more general-purpose processors and/or one or more special-purpose processors (for example, special processing chips).
- the input device 2006 may be any type of device that can input information to the computing device 2000, and may include, but is not limited to, a camera.
- the output device 2008 may be any type of device that can present information, and may include, but is not limited to, a speaker, an audio output terminal, a vibrator, or a display.
- the computing device 2000 may also include a non-transitory storage device 2010 or be connected to a non-transitory storage device 2010.
- the non-transitory storage device may be any storage device that is non-transitory and can realize data storage, and may include, but is not limited to, a disk drive, optical storage device, solid-state memory, floppy disk, flexible disk, hard disk, tape or any other magnetic medium, optical disc or any other optical medium, ROM (read-only memory), RAM (random access memory), cache memory and/or any other memory chip or cartridge, and/or any other medium from which a computer can read data, instructions, and/or code.
- the non-transitory storage device 2010 can be detached from the interface.
- the non-transitory storage device 2010 may have data/programs (including instructions)/code for implementing the above-mentioned methods and steps.
- the computing device 2000 may also include a communication device 2012.
- the communication device 2012 may be any type of device or system that enables communication with external devices and/or with a network, and may include, but is not limited to, wireless communication devices and/or chipsets, such as Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices and/or the like.
- the computing device 2000 may also include a working memory 2014, which may be any type of working memory that stores programs (including instructions) and/or data useful for the work of the processor 2004, and may include, but is not limited to, random access memory and/or read-only memory devices.
- the software elements may be located in the working memory 2014, including but not limited to the operating system 2016, one or more applications 2018, drivers, and/or other data and codes. Instructions for performing the above methods and steps may be included in one or more applications 2018.
- the memory 2014 may store program code for executing the flowcharts shown in Figs. 1, 6 and 7, as well as captured video and/or audio files, where the applications 2018 may include face recognition applications, voice recognition applications, camera ranging applications, etc. provided by third parties.
- the input device 2006 may be a sensor for acquiring video and audio, such as a camera and a microphone.
- the storage device 2010 is, for example, used to store a database, so that the associated identity information and face information can be written into the database.
- the processor 2004 is configured to execute the method steps according to various aspects of the present disclosure according to the program code in the working memory 2014.
- the components of the computing device 2000 may be distributed across a network. For example, one processor may perform some processing while another, remote processor performs other processing. Other components of the computing device 2000 may be similarly distributed. In this way, the computing device 2000 can be interpreted as a distributed computing system that performs processing in multiple locations.
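The description above ties a camera and a microphone to a database through sound source localization. One step it depends on is mapping a real-world sound-source direction to a position in the video frame. The following is a hypothetical sketch, not code from the patent: it assumes an ideal pinhole camera with a known horizontal field of view whose optical axis coincides with the microphone array's forward direction.

```python
import math

def azimuth_to_pixel_x(azimuth_deg: float, image_width: int, hfov_deg: float) -> int:
    """Map a sound-source azimuth (degrees, 0 = straight ahead, positive to
    the right) to a horizontal pixel coordinate, assuming a pinhole camera.
    Directions outside the field of view are clamped to the frame edge."""
    half_fov = math.radians(hfov_deg / 2.0)
    # focal length in pixels implied by the horizontal field of view
    f = (image_width / 2.0) / math.tan(half_fov)
    x = image_width / 2.0 + f * math.tan(math.radians(azimuth_deg))
    return int(min(max(x, 0), image_width - 1))
```

For a 1920-pixel-wide frame and a 90° field of view, an azimuth of 0° lands at the frame center, and azimuths at or beyond ±45° clamp to the left or right edge; a face detected near the returned column can then be taken as the speaker.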
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Library & Information Science (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Collating Specific Patterns (AREA)
- Image Analysis (AREA)
- Studio Devices (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (16)
- A method for entering face information into a database, comprising: capturing video of one or more subjects and, during capture, extracting face information of the one or more subjects from the video frames; recording speech uttered by at least one of the one or more subjects while being filmed; performing semantic analysis on the recorded speech to extract corresponding information therefrom; and associating the extracted information with the face information of the subject who spoke that information, and entering the association into the database.
- The method according to claim 1, wherein the face information comprises facial feature information capable of being used to identify the one or more subjects.
- The method according to claim 1 or 2, wherein the speech of the at least one subject includes the speaker's own identity information, and the extracted corresponding information includes the speaker's own identity information.
- The method according to claim 3, wherein the identity information includes a name.
- The method according to claim 1 or 2, wherein the speech of the at least one subject includes information about the scene in which the speaker is located, and the extracted corresponding information includes said information about the scene in which the speaker is located.
- The method according to claim 1, wherein associating the extracted information with the face information of the subject who spoke that information comprises: determining, by sound source localization, the direction of the speaking subject in the real-world scene.
- The method according to claim 6, wherein associating the extracted information with the face information of the subject who spoke that information further comprises: mapping the real-world scene to the video scene with respect to direction; and determining the position of the speaking subject in the video scene from his or her direction in the real-world scene.
- The method according to claim 1, wherein associating the extracted information with the face information of the subject who spoke that information comprises: analyzing, from the video frames during capture, the lip movement of the one or more subjects.
- The method according to claim 8, wherein the start time of the lip movement is compared with the start time at which the speech was recorded.
- The method according to claim 1, wherein it is detected whether the face information of the at least one subject is already stored in the database, and if the face information of the at least one subject is not in the database, the recorded speech is analyzed.
- The method according to claim 1, wherein it is detected whether the face information of the at least one subject is already stored in the database, and if the face information of the at least one subject is already stored in the database, the extracted information is used to supplement the information already stored in the database in association with the face information of the at least one subject.
- The method according to claim 1, wherein the information is stored in the database as text.
- A processor chip circuit for entering face information into a database, comprising: circuit units configured to perform the steps of the method according to any one of claims 1 to 12.
- An electronic device, comprising: a video sensor for capturing video of one or more subjects; an audio sensor for recording the speech of at least one of the one or more subjects while being filmed; and the processor chip circuit according to claim 13, for associating the information of the corresponding subject with his or her face information and entering the association into a database.
- The electronic device according to claim 14, wherein the electronic device is implemented as a wearable device, the wearable device further comprising a speaker for audibly playing the information content when information corresponding to a recognized face exists in the database.
- A computer-readable storage medium storing a program, the program comprising instructions that, when executed by a processor of an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 12.
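The association and entry steps in the claims above (lip-movement timing in claims 8-9, insert-or-supplement in claims 10-11) can be sketched as follows. This is an illustrative sketch only; the patent publishes no code, and every name and data structure here is hypothetical. The face "database" is a plain dict keyed by a face identifier standing in for the extracted facial-feature information.

```python
from dataclasses import dataclass, field

@dataclass
class FaceRecord:
    face_id: str                          # stands in for facial-feature data
    info: dict = field(default_factory=dict)  # extracted info, stored as text

def match_speaker(lip_onsets: dict, speech_onset: float, tolerance: float = 0.5):
    """Claims 8-9: choose the face whose lip movement began closest in time
    to the start of the recorded speech, within a tolerance in seconds.
    Returns None if no face's lip-motion onset falls within the tolerance."""
    best_id, best_gap = None, tolerance
    for fid, onset in lip_onsets.items():
        gap = abs(onset - speech_onset)
        if gap <= best_gap:
            best_id, best_gap = fid, gap
    return best_id

def enter_into_database(db: dict, face_id: str, extracted: dict) -> FaceRecord:
    """Claims 10-11: create a new record if the face is unknown; otherwise
    supplement the information already associated with the stored face."""
    if face_id in db:
        db[face_id].info.update(extracted)
    else:
        db[face_id] = FaceRecord(face_id, dict(extracted))
    return db[face_id]
```

A typical use, under these assumptions: `match_speaker({"f1": 3.1, "f2": 0.4}, speech_onset=0.5)` picks `"f2"`, after which `enter_into_database(db, "f2", {"name": "Zhang San"})` either creates or supplements that subject's record.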
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227006755A KR20220041891A (ko) | 2019-07-29 | 2019-09-03 | 얼굴 정보를 데이터베이스에 입력하는 방법 및 설치 |
US16/678,838 US10922570B1 (en) | 2019-07-29 | 2019-11-08 | Entering of human face information into database |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910686122.3 | 2019-07-29 | ||
CN201910686122.3A CN110196914B (zh) | 2019-07-29 | 2019-07-29 | 一种将人脸信息录入数据库的方法和装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/678,838 Continuation US10922570B1 (en) | 2019-07-29 | 2019-11-08 | Entering of human face information into database |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021017096A1 true WO2021017096A1 (zh) | 2021-02-04 |
Family
ID=67756178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/104108 WO2021017096A1 (zh) | 2019-07-29 | 2019-09-03 | 一种将人脸信息录入数据库的方法和装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US10922570B1 (zh) |
EP (1) | EP3772016B1 (zh) |
JP (1) | JP6723591B1 (zh) |
KR (1) | KR20220041891A (zh) |
CN (1) | CN110196914B (zh) |
WO (1) | WO2021017096A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110544270A (zh) * | 2019-08-30 | 2019-12-06 | 上海依图信息技术有限公司 | 结合语音识别且实时预测人脸追踪轨迹方法及装置 |
CN110767226B (zh) * | 2019-10-30 | 2022-08-16 | 山西见声科技有限公司 | 具有高准确度的声源定位方法、装置、语音识别方法、系统、存储设备及终端 |
CN114420131B (zh) * | 2022-03-16 | 2022-05-31 | 云天智能信息(深圳)有限公司 | 低弱视力智能语音辅助识别系统 |
CN114863364B (zh) * | 2022-05-20 | 2023-03-07 | 碧桂园生活服务集团股份有限公司 | 一种基于智能视频监控的安防检测方法及系统 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020106114A1 (en) * | 2000-12-01 | 2002-08-08 | Jie Yan | System and method for face recognition using synthesized training images |
CN101021897A (zh) * | 2006-12-27 | 2007-08-22 | 中山大学 | 一种基于块内相关性的二维线性鉴别分析人脸识别方法 |
CN105512348A (zh) * | 2016-01-28 | 2016-04-20 | 北京旷视科技有限公司 | 用于处理视频和相关音频的方法和装置及检索方法和装置 |
Family Cites Families (86)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6111517A (en) | 1996-12-30 | 2000-08-29 | Visionics Corporation | Continuous video monitoring using face recognition for access control |
US6243683B1 (en) * | 1998-12-29 | 2001-06-05 | Intel Corporation | Video control of speech recognition |
US6567775B1 (en) * | 2000-04-26 | 2003-05-20 | International Business Machines Corporation | Fusion of audio and video based speaker identification for multimedia information access |
US20030154084A1 (en) | 2002-02-14 | 2003-08-14 | Koninklijke Philips Electronics N.V. | Method and system for person identification using video-speech matching |
US7472063B2 (en) * | 2002-12-19 | 2008-12-30 | Intel Corporation | Audio-visual feature fusion and support vector machine useful for continuous speech recognition |
EP1602063A1 (en) * | 2003-03-13 | 2005-12-07 | Intelligent Mechatronic Systems, Inc. | Automotive occupant detection and classification method and system |
WO2005106841A1 (en) * | 2004-04-28 | 2005-11-10 | Koninklijke Philips Electronics N.V. | Adaptive beamformer, sidelobe canceller, handsfree speech communication device |
JP5134876B2 (ja) * | 2007-07-11 | 2013-01-30 | 株式会社日立製作所 | 音声通信装置及び音声通信方法並びにプログラム |
US20090055180A1 (en) * | 2007-08-23 | 2009-02-26 | Coon Bradley S | System and method for optimizing speech recognition in a vehicle |
US8219387B2 (en) * | 2007-12-10 | 2012-07-10 | Microsoft Corporation | Identifying far-end sound |
US8624962B2 (en) * | 2009-02-02 | 2014-01-07 | Ydreams—Informatica, S.A. Ydreams | Systems and methods for simulating three-dimensional virtual interactions from two-dimensional camera images |
JP2011186351A (ja) * | 2010-03-11 | 2011-09-22 | Sony Corp | 情報処理装置、および情報処理方法、並びにプログラム |
US9183560B2 (en) * | 2010-05-28 | 2015-11-10 | Daniel H. Abelow | Reality alternate |
US9396385B2 (en) * | 2010-08-26 | 2016-07-19 | Blast Motion Inc. | Integrated sensor and video motion analysis method |
US8700392B1 (en) * | 2010-09-10 | 2014-04-15 | Amazon Technologies, Inc. | Speech-inclusive device interfaces |
US10289288B2 (en) * | 2011-04-22 | 2019-05-14 | Emerging Automotive, Llc | Vehicle systems for providing access to vehicle controls, functions, environment and applications to guests/passengers via mobile devices |
US10572123B2 (en) * | 2011-04-22 | 2020-02-25 | Emerging Automotive, Llc | Vehicle passenger controls via mobile devices |
US20130030811A1 (en) * | 2011-07-29 | 2013-01-31 | Panasonic Corporation | Natural query interface for connected car |
US8913103B1 (en) * | 2012-02-01 | 2014-12-16 | Google Inc. | Method and apparatus for focus-of-attention control |
KR101971697B1 (ko) * | 2012-02-24 | 2019-04-23 | 삼성전자주식회사 | 사용자 디바이스에서 복합 생체인식 정보를 이용한 사용자 인증 방법 및 장치 |
US9153084B2 (en) * | 2012-03-14 | 2015-10-06 | Flextronics Ap, Llc | Destination and travel information application |
US9922646B1 (en) * | 2012-09-21 | 2018-03-20 | Amazon Technologies, Inc. | Identifying a location of a voice-input device |
US9008641B2 (en) * | 2012-12-27 | 2015-04-14 | Intel Corporation | Detecting a user-to-wireless device association in a vehicle |
CN103973441B (zh) * | 2013-01-29 | 2016-03-09 | 腾讯科技(深圳)有限公司 | 基于音视频的用户认证方法和装置 |
EP2974124A4 (en) * | 2013-03-14 | 2016-10-19 | Intel Corp | VOICE AND / OR FACE RECOGNITION BASED SERVICE DELIVERY |
US9747898B2 (en) * | 2013-03-15 | 2017-08-29 | Honda Motor Co., Ltd. | Interpretation of ambiguous vehicle instructions |
US9317736B1 (en) * | 2013-05-08 | 2016-04-19 | Amazon Technologies, Inc. | Individual record verification based on features |
US9680934B2 (en) * | 2013-07-17 | 2017-06-13 | Ford Global Technologies, Llc | Vehicle communication channel management |
US9892745B2 (en) * | 2013-08-23 | 2018-02-13 | At&T Intellectual Property I, L.P. | Augmented multi-tier classifier for multi-modal voice activity detection |
JP6148163B2 (ja) * | 2013-11-29 | 2017-06-14 | 本田技研工業株式会社 | 会話支援装置、会話支援装置の制御方法、及び会話支援装置のプログラム |
US9390726B1 (en) * | 2013-12-30 | 2016-07-12 | Google Inc. | Supplementing speech commands with gestures |
US9582246B2 (en) * | 2014-03-04 | 2017-02-28 | Microsoft Technology Licensing, Llc | Voice-command suggestions based on computer context |
KR102216048B1 (ko) * | 2014-05-20 | 2021-02-15 | 삼성전자주식회사 | 음성 명령 인식 장치 및 방법 |
US9373200B2 (en) * | 2014-06-06 | 2016-06-21 | Vivint, Inc. | Monitoring vehicle usage |
JP6464449B2 (ja) * | 2014-08-29 | 2019-02-06 | 本田技研工業株式会社 | 音源分離装置、及び音源分離方法 |
US20160100092A1 (en) * | 2014-10-01 | 2016-04-07 | Fortemedia, Inc. | Object tracking device and tracking method thereof |
US9881610B2 (en) * | 2014-11-13 | 2018-01-30 | International Business Machines Corporation | Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities |
US10318575B2 (en) * | 2014-11-14 | 2019-06-11 | Zorroa Corporation | Systems and methods of building and using an image catalog |
US9741342B2 (en) * | 2014-11-26 | 2017-08-22 | Panasonic Intellectual Property Corporation Of America | Method and apparatus for recognizing speech by lip reading |
US9734410B2 (en) * | 2015-01-23 | 2017-08-15 | Shindig, Inc. | Systems and methods for analyzing facial expressions within an online classroom to gauge participant attentiveness |
DE102015201369A1 (de) * | 2015-01-27 | 2016-07-28 | Robert Bosch Gmbh | Verfahren und Vorrichtung zum Betreiben eines zumindest teilautomatisch fahrenden oder fahrbaren Kraftfahrzeugs |
US9300801B1 (en) * | 2015-01-30 | 2016-03-29 | Mattersight Corporation | Personality analysis of mono-recording system and methods |
US20160267911A1 (en) * | 2015-03-13 | 2016-09-15 | Magna Mirrors Of America, Inc. | Vehicle voice acquisition system with microphone and optical sensor |
US10305895B2 (en) * | 2015-04-14 | 2019-05-28 | Blubox Security, Inc. | Multi-factor and multi-mode biometric physical access control device |
US9641585B2 (en) * | 2015-06-08 | 2017-05-02 | Cisco Technology, Inc. | Automated video editing based on activity in video conference |
DE102015210430A1 (de) * | 2015-06-08 | 2016-12-08 | Robert Bosch Gmbh | Verfahren zum Erkennen eines Sprachkontexts für eine Sprachsteuerung, Verfahren zum Ermitteln eines Sprachsteuersignals für eine Sprachsteuerung und Vorrichtung zum Ausführen der Verfahren |
US10178301B1 (en) * | 2015-06-25 | 2019-01-08 | Amazon Technologies, Inc. | User identification based on voice and face |
US20170068863A1 (en) * | 2015-09-04 | 2017-03-09 | Qualcomm Incorporated | Occupancy detection using computer vision |
US9764694B2 (en) * | 2015-10-27 | 2017-09-19 | Thunder Power Hong Kong Ltd. | Intelligent rear-view mirror system |
US9832583B2 (en) * | 2015-11-10 | 2017-11-28 | Avaya Inc. | Enhancement of audio captured by multiple microphones at unspecified positions |
US11437020B2 (en) * | 2016-02-10 | 2022-09-06 | Cerence Operating Company | Techniques for spatially selective wake-up word recognition and related systems and methods |
US11783524B2 (en) * | 2016-02-10 | 2023-10-10 | Nitin Vats | Producing realistic talking face with expression using images text and voice |
US10476888B2 (en) * | 2016-03-23 | 2019-11-12 | Georgia Tech Research Corporation | Systems and methods for using video for user and message authentication |
EP3239981B1 (en) * | 2016-04-26 | 2018-12-12 | Nokia Technologies Oy | Methods, apparatuses and computer programs relating to modification of a characteristic associated with a separated audio signal |
US9984314B2 (en) * | 2016-05-06 | 2018-05-29 | Microsoft Technology Licensing, Llc | Dynamic classifier selection based on class skew |
US10089071B2 (en) * | 2016-06-02 | 2018-10-02 | Microsoft Technology Licensing, Llc | Automatic audio attenuation on immersive display devices |
CN109313935B (zh) * | 2016-06-27 | 2023-10-20 | 索尼公司 | 信息处理系统、存储介质和信息处理方法 |
US10152969B2 (en) * | 2016-07-15 | 2018-12-11 | Sonos, Inc. | Voice detection by multiple devices |
US10026403B2 (en) * | 2016-08-12 | 2018-07-17 | Paypal, Inc. | Location based voice association system |
JP6631445B2 (ja) * | 2016-09-09 | 2020-01-15 | トヨタ自動車株式会社 | 車両用情報提示装置 |
US10198626B2 (en) * | 2016-10-19 | 2019-02-05 | Snap Inc. | Neural networks for facial modeling |
JP2018074366A (ja) * | 2016-10-28 | 2018-05-10 | 京セラ株式会社 | 電子機器、制御方法およびプログラム |
CN106782545B (zh) * | 2016-12-16 | 2019-07-16 | 广州视源电子科技股份有限公司 | 一种将音视频数据转化成文字记录的系统和方法 |
US10497382B2 (en) * | 2016-12-16 | 2019-12-03 | Google Llc | Associating faces with voices for speaker diarization within videos |
US10403279B2 (en) * | 2016-12-21 | 2019-09-03 | Avnera Corporation | Low-power, always-listening, voice command detection and capture |
US20180190282A1 (en) * | 2016-12-30 | 2018-07-05 | Qualcomm Incorporated | In-vehicle voice command control |
US20180187969A1 (en) * | 2017-01-03 | 2018-07-05 | Samsung Electronics Co., Ltd. | Refrigerator |
WO2018147687A1 (en) * | 2017-02-10 | 2018-08-16 | Samsung Electronics Co., Ltd. | Method and apparatus for managing voice-based interaction in internet of things network system |
US10467510B2 (en) * | 2017-02-14 | 2019-11-05 | Microsoft Technology Licensing, Llc | Intelligent assistant |
WO2018150758A1 (ja) * | 2017-02-15 | 2018-08-23 | ソニー株式会社 | 情報処理装置、情報処理方法及び記憶媒体 |
US10748542B2 (en) * | 2017-03-23 | 2020-08-18 | Joyson Safety Systems Acquisition Llc | System and method of correlating mouth images to input commands |
DK179867B1 (en) * | 2017-05-16 | 2019-08-06 | Apple Inc. | RECORDING AND SENDING EMOJI |
US20180357040A1 (en) * | 2017-06-09 | 2018-12-13 | Mitsubishi Electric Automotive America, Inc. | In-vehicle infotainment with multi-modal interface |
US10416671B2 (en) * | 2017-07-11 | 2019-09-17 | Waymo Llc | Methods and systems for vehicle occupancy confirmation |
US20190037363A1 (en) * | 2017-07-31 | 2019-01-31 | GM Global Technology Operations LLC | Vehicle based acoustic zoning system for smartphones |
CN107632704B (zh) * | 2017-09-01 | 2020-05-15 | 广州励丰文化科技股份有限公司 | 一种基于光学定位的混合现实音频控制方法及服务设备 |
JP2019049829A (ja) * | 2017-09-08 | 2019-03-28 | 株式会社豊田中央研究所 | 目的区間判別装置、モデル学習装置、及びプログラム |
JP7123540B2 (ja) * | 2017-09-25 | 2022-08-23 | キヤノン株式会社 | 音声情報による入力を受け付ける情報処理端末、方法、その情報処理端末を含むシステム |
US11465631B2 (en) * | 2017-12-08 | 2022-10-11 | Tesla, Inc. | Personalization system and method for a vehicle based on spatial locations of occupants' body portions |
US10374816B1 (en) * | 2017-12-13 | 2019-08-06 | Amazon Technologies, Inc. | Network conference management and arbitration via voice-capturing devices |
US10834365B2 (en) * | 2018-02-08 | 2020-11-10 | Nortek Security & Control Llc | Audio-visual monitoring using a virtual assistant |
US11335079B2 (en) * | 2018-03-05 | 2022-05-17 | Intel Corporation | Method and system of reflection suppression for image processing |
US10699572B2 (en) * | 2018-04-20 | 2020-06-30 | Carrier Corporation | Passenger counting for a transportation system |
US11196669B2 (en) * | 2018-05-17 | 2021-12-07 | At&T Intellectual Property I, L.P. | Network routing of media streams based upon semantic contents |
US20190355352A1 (en) * | 2018-05-18 | 2019-11-21 | Honda Motor Co., Ltd. | Voice and conversation recognition system |
DK201870683A1 (en) * | 2018-07-05 | 2020-05-25 | Aptiv Technologies Limited | IDENTIFYING AND AUTHENTICATING AUTONOMOUS VEHICLES AND PASSENGERS |
2019
- 2019-07-29 CN CN201910686122.3A patent/CN110196914B/zh active Active
- 2019-09-03 WO PCT/CN2019/104108 patent/WO2021017096A1/zh active Application Filing
- 2019-09-03 KR KR1020227006755A patent/KR20220041891A/ko active Search and Examination
- 2019-10-08 JP JP2019184911A patent/JP6723591B1/ja active Active
- 2019-11-08 US US16/678,838 patent/US10922570B1/en active Active
- 2019-11-26 EP EP19211509.5A patent/EP3772016B1/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020106114A1 (en) * | 2000-12-01 | 2002-08-08 | Jie Yan | System and method for face recognition using synthesized training images |
CN101021897A (zh) * | 2006-12-27 | 2007-08-22 | 中山大学 | 一种基于块内相关性的二维线性鉴别分析人脸识别方法 |
CN105512348A (zh) * | 2016-01-28 | 2016-04-20 | 北京旷视科技有限公司 | 用于处理视频和相关音频的方法和装置及检索方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN110196914A (zh) | 2019-09-03 |
US10922570B1 (en) | 2021-02-16 |
JP6723591B1 (ja) | 2020-07-15 |
JP2021022351A (ja) | 2021-02-18 |
EP3772016A1 (en) | 2021-02-03 |
EP3772016B1 (en) | 2022-05-18 |
CN110196914B (zh) | 2019-12-27 |
KR20220041891A (ko) | 2022-04-01 |
US20210034898A1 (en) | 2021-02-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021017096A1 (zh) | 一种将人脸信息录入数据库的方法和装置 | |
CN112037791B (zh) | 会议纪要转录方法、设备和存储介质 | |
US9769367B2 (en) | Speech and computer vision-based control | |
US20190215464A1 (en) | Systems and methods for decomposing a video stream into face streams | |
JP5456832B2 (ja) | 入力された発話の関連性を判定するための装置および方法 | |
JP6999734B2 (ja) | オーディオビジュアルデータに基づく話者ダイアライゼーション方法および装置 | |
CN113874936A (zh) | 用于优化分布式系统中的用户偏好的定制输出 | |
US11527242B2 (en) | Lip-language identification method and apparatus, and augmented reality (AR) device and storage medium which identifies an object based on an azimuth angle associated with the AR field of view | |
CN113906503A (zh) | 处理来自分布式设备的重叠语音 | |
WO2021120190A1 (zh) | 数据处理方法、装置、电子设备和存储介质 | |
CN111551921A (zh) | 一种声像联动的声源定向系统及方法 | |
EP3963575A1 (en) | Distributed device meeting initiation | |
US11875571B2 (en) | Smart hearing assistance in monitored property | |
KR20210066774A (ko) | 멀티모달 기반 사용자 구별 방법 및 장치 | |
US20220180535A1 (en) | Presenter-tracker management in a videoconferencing environment | |
JPWO2021230180A5 (zh) | ||
US20230136553A1 (en) | Context-aided identification | |
WO2021134720A1 (zh) | 一种会议数据处理方法及相关设备 | |
Korchagin et al. | Just-in-time multimodal association and fusion from home entertainment | |
US20220329755A1 (en) | Tracker Activation and Deactivation in a Videoconferencing System | |
Korchagin et al. | Multimodal cue detection engine for orchestrated entertainment | |
Jain et al. | Survey on Various Techniques based on Voice Assistance for Blind | |
JP2023117068A (ja) | 音声認識装置、音声認識方法、音声認識プログラム、音声認識システム | |
JP2022098561A (ja) | プログラム、方法、情報処理装置、システム | |
Raheja et al. | Lip-Contour based Speaker Activity Detection in smart environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19940046 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20227006755 Country of ref document: KR Kind code of ref document: A |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19940046 Country of ref document: EP Kind code of ref document: A1 |
|