CN112700576B - Multi-modal recognition algorithm based on images and characters


Info

Publication number
CN112700576B
Authority
CN
China
Prior art keywords
image information
mode
information
dimensional
static
Prior art date
Legal status
Active
Application number
CN202011587116.1A
Other languages
Chinese (zh)
Other versions
CN112700576A (en)
Inventor
王勇
Current Assignee
Chengdu Qiyuan Xipu Technology Co., Ltd.
Original Assignee
Chengdu Qiyuan Xipu Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Chengdu Qiyuan Xipu Technology Co., Ltd.
Priority to CN202011587116.1A
Publication of CN112700576A
Application granted granted Critical
Publication of CN112700576B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a multi-modal recognition algorithm based on images and characters. The algorithm rests on the matching and fusion of two-dimensional and three-dimensional information: character image information obtained from an on-site signature is combined with three-dimensional image information obtained through face recognition, which reduces the dependence on biometric features and prevents counterfeiters from defeating face recognition with three-dimensional printing technology.

Description

Multi-modal recognition algorithm based on images and characters
Technical Field
The invention relates to a multi-modal recognition algorithm based on images and characters.
Background
An access control system controls the entry and exit of personnel and records their access information. With measurement and control devices installed at the entrances, the opening and closing of doors can be managed effectively, allowing authorized personnel to pass freely while protecting people and property. With the economic and social development of China, access control security management systems have penetrated deeply into everyday life and provide an important guarantee for the personal, property and information security of the public. An access control security management system is a modern security management system that involves many new technologies, such as electronics, mechanics, optics, computer technology, communication technology and biotechnology. It is an effective measure for securing the entrances and exits of important departments and is suitable for many settings, such as banks, hotels, parking lot management, machine rooms, military warehouses, confidential rooms, offices, intelligent residential communities and factories. Although a traditional access control security management system combines multiple detection means, modern high-tech counterfeiting technology may still allow the various biometric checks to be spoofed, so that a counterfeiter or an impersonator may gain entry, creating a potential access security risk.
Disclosure of Invention
In order to solve the access control security problem that may arise in the prior art when high-tech means, such as a 3D-printed mask, are used to imitate biometric features, the invention provides a multi-modal recognition algorithm based on images and characters, comprising the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information obtained from the on-site signature of the person to be detected, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information (a sketch of this layered data layout follows step (6) below);
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform; performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model.
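By way of illustration only, the following Python sketch shows one possible in-memory layout for the paired acquisitions described in steps (1)-(3). The class and field names (SignatureCapture, FaceCapture, PairedCapture and so on) are not part of the invention; they are assumptions chosen purely to make the layered, direction-paired structure concrete.

```python
# Illustrative sketch only: one possible data layout for the paired signature /
# face acquisitions of steps (1)-(3). All names are hypothetical.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class SignatureCapture:
    """Static (two-dimensional) name character image information."""
    shape_layer: np.ndarray   # first modality: shape, H x W
    color_layer: np.ndarray   # second modality: color, same region of interest


@dataclass
class FaceCapture:
    """Dynamic (three-dimensional) face image information, one entry per time step."""
    shape_layers: List[np.ndarray]     # shape modality
    color_layers: List[np.ndarray]     # color modality, aligned with shape_layers
    speed_layers: List[np.ndarray]     # speed modality
    distance_layers: List[np.ndarray]  # distance modality


@dataclass
class PairedCapture:
    """Signature and face captures taken at the same moment, paired by the
    acquisition direction of the sensors that recorded them."""
    direction_deg: float               # acquisition direction, e.g. 75, +90 or 105 degrees
    signature: SignatureCapture
    face: FaceCapture
```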
Further, the preprocessing includes threshold processing to eliminate the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information.
Further, the directions include the three angles 75°, +90° and 105°.
Further, each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
Further, each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model.
Further, image information of the same modality is acquired by the same device.
Further, the device is a three-dimensional camera.
The beneficial effects of the invention are as follows: the character image information obtained from the on-site signature is combined with the three-dimensional image information obtained through face recognition, which reduces the dependence on biometric features and prevents counterfeiters from defeating face recognition with three-dimensional printing technology.
Drawings
Fig. 1 shows a flow diagram of the present algorithm.
Detailed Description
A multi-modal recognition algorithm based on images and characters comprises the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information obtained from the on-site signature of the person to be detected, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information;
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform (a sketch of the hole filling and frequency-domain fusion follows step (6) below); performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model (a sketch of this comparison loop also follows below).
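As a non-authoritative illustration of the hole filling and discrete-cosine-transform fusion mentioned in step (4), the sketch below fills holes in a binarized first-modality slice and then fuses it with the corresponding second-modality slice in the DCT domain by keeping, for each frequency, the coefficient with the larger magnitude. The maximum-magnitude fusion rule, the binarization threshold and the random stand-in data are assumptions; the patent does not specify the exact coefficient-combination rule.

```python
# Illustrative sketch of step (4): morphological hole filling followed by
# frequency-domain fusion of two registered, same-size slices via the discrete
# cosine transform. The max-magnitude coefficient rule is an assumption.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import binary_fill_holes


def clean_slice(shape_slice: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a first-modality (shape) slice and fill its interior holes."""
    mask = shape_slice > threshold
    return binary_fill_holes(mask).astype(np.float64)


def dct_fuse(shape_slice: np.ndarray, color_slice: np.ndarray) -> np.ndarray:
    """Fuse two slices in the DCT domain, keeping the larger-magnitude coefficient."""
    a = dctn(shape_slice, norm="ortho")
    b = dctn(color_slice, norm="ortho")
    fused = np.where(np.abs(a) >= np.abs(b), a, b)
    return idctn(fused, norm="ortho")


# Random data stands in for one slice pair of the same group.
rng = np.random.default_rng(0)
shape_layer = clean_slice(rng.random((64, 64)))
color_layer = rng.random((64, 64))
fused_layer = dct_fuse(shape_layer, color_layer)

# The three-dimensional fused information of step (4) stacks the fused slice,
# the color slice, and a distance plane that is initialized to 0.
fused_3d = np.stack([fused_layer, color_layer, np.zeros_like(fused_layer)], axis=-1)
print(fused_3d.shape)  # (64, 64, 3)
```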
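The following sketch illustrates, under stated assumptions, the iterative similarity comparison of step (6): the distance plane of each directional sub-block is dropped to obtain a two-dimensional block, a placeholder stands in for the trained neural network model of step (5), and comparison against candidate two-dimensional face images stops as soon as a cosine similarity reaches the preset threshold. The embedding function, the cosine-similarity measure and the threshold value are assumptions; the patent does not fix a particular network or similarity metric.

```python
# Illustrative sketch of step (6): reduce directional sub-blocks to two
# dimensions, run them through a placeholder model, and compare the result
# against candidate two-dimensional face images until a threshold is reached.
import numpy as np


def embed(block_2d: np.ndarray) -> np.ndarray:
    """Stand-in for the trained neural network model of step (5)."""
    return block_2d.ravel() / (np.linalg.norm(block_2d) + 1e-12)


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def recognize(sub_blocks_3d, candidate_faces_2d, threshold: float = 0.8):
    """Return the index of the first candidate whose similarity reaches the threshold."""
    for block in sub_blocks_3d:
        # Keep only the fused plane; the distance plane is zeroed and dropped,
        # one interpretation of "setting the third dimension to 0".
        block_2d = block[..., 0]
        result = embed(block_2d)
        for idx, face in enumerate(candidate_faces_2d):
            if cosine_similarity(result, embed(face)) >= threshold:
                return idx  # stop the iterative comparison once the threshold is met
    return None


rng = np.random.default_rng(1)
sub_blocks = [rng.random((64, 64, 3)) for _ in range(3)]  # directional sub-blocks
faces = [rng.random((64, 64)) for _ in range(5)]          # two-dimensional face images
print(recognize(sub_blocks, faces))
```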
Preferably, the preprocessing includes threshold processing to remove the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information, as sketched below.
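A minimal sketch of this preprocessing, assuming OpenCV is used: Otsu thresholding suppresses background noise in the signature image, and each plane of the face image information is interpolated to a common resolution. The choice of Otsu's method, bilinear interpolation and the 256x256 target size are assumptions; the patent only requires thresholding and interpolation in general.

```python
# Minimal preprocessing sketch: thresholding of the character (signature) image
# and interpolation of the face image planes to one common resolution.
# Otsu's method and bilinear interpolation are assumptions, not requirements.
import cv2
import numpy as np


def threshold_signature(gray: np.ndarray) -> np.ndarray:
    """Suppress background noise in an 8-bit grayscale signature image."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def unify_resolution(planes, size=(256, 256)):
    """Interpolate every plane of the face image information to the same resolution."""
    return [cv2.resize(p, size, interpolation=cv2.INTER_LINEAR) for p in planes]


# Random data stands in for a captured signature and face planes of mixed sizes.
rng = np.random.default_rng(2)
signature = (rng.random((120, 360)) * 255).astype(np.uint8)
planes = [rng.random((200 + 10 * i, 200)).astype(np.float32) for i in range(3)]

clean_signature = threshold_signature(signature)
uniform_planes = unify_resolution(planes)
print(clean_signature.shape, [p.shape for p in uniform_planes])
```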
Preferably, the directions include the three angles 75°, +90° and 105°.
Preferably, each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
Preferably, each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model, as sketched below.
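As a sketch of how the matched and mismatched ("confusion") groups described above might be assembled into a training set, the code below pairs each first-modality sample either with its own second-modality counterpart (label 1) or with that of a different person (label 0). The dictionary layout, the 1/0 labels and the confusion ratio are assumptions for illustration only.

```python
# Illustrative sketch: building training groups where the two modalities either
# come from the same person (genuine) or from different persons (confusion data).
import random

import numpy as np

# Random stand-ins for per-person shape and color layers of the signature image.
people = {f"person_{i}": {"shape": np.random.rand(64, 64),
                          "color": np.random.rand(64, 64)} for i in range(10)}


def build_training_groups(samples, confusion_ratio=0.5, seed=0):
    rng = random.Random(seed)
    names = list(samples)
    groups = []
    for name in names:
        if rng.random() < confusion_ratio:
            other = rng.choice([n for n in names if n != name])
            groups.append((samples[name]["shape"], samples[other]["color"], 0))  # confusion pair
        else:
            groups.append((samples[name]["shape"], samples[name]["color"], 1))   # genuine pair
    return groups


training_groups = build_training_groups(people)
genuine = sum(label for *_, label in training_groups)
print(f"{genuine} genuine pairs out of {len(training_groups)} groups")
```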
Preferably, image information of the same modality is acquired by the same device.
Preferably, the device is a three-dimensional camera.
The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be defined by the appended claims.

Claims (7)

1. A multi-modal recognition algorithm based on images and characters, characterized by comprising the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information;
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform; performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model.
2. The multi-modal recognition algorithm based on images and characters of claim 1, wherein the preprocessing includes threshold processing to eliminate the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information.
3. The multi-modal recognition algorithm based on images and characters of claim 1, wherein the directions include the three angles 75°, +90° and 105°.
4. The multi-modal recognition algorithm based on images and characters of claim 1, wherein each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
5. The multi-modal recognition algorithm based on images and characters of claim 1, wherein each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model.
6. The multi-modal recognition algorithm based on images and characters of claim 1, wherein image information of the same modality is acquired by the same device.
7. The multi-modal recognition algorithm based on images and characters of claim 6, wherein the device is a three-dimensional camera.
CN202011587116.1A 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters Active CN112700576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011587116.1A CN112700576B (en) 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters


Publications (2)

Publication Number Publication Date
CN112700576A CN112700576A (en) 2021-04-23
CN112700576B (en) 2021-08-03

Family

ID=75511427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011587116.1A Active CN112700576B (en) 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters

Country Status (1)

Country Link
CN (1) CN112700576B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3271045D1 (en) * 1981-11-20 1986-06-12 Siemens Ag Method of identifying a person by speech and face recognition, and device for carrying out the method
CN103903319A (en) * 2014-02-10 2014-07-02 袁磊 Electronic lock system based on internet dynamic authorization
CN104573634A (en) * 2014-12-16 2015-04-29 苏州福丰科技有限公司 Three-dimensional face recognition method
CN204331744U (en) * 2014-11-27 2015-05-13 天津和财世纪信息技术有限公司 3 D stereo intelligent face recognition system
CN109785483A (en) * 2018-12-28 2019-05-21 杭州文创企业管理有限公司 A kind of wisdom garden access control system
CN111597928A (en) * 2020-04-29 2020-08-28 深圳市商汤智能传感科技有限公司 Three-dimensional model processing method and device, electronic device and storage medium
CN111625100A (en) * 2020-06-03 2020-09-04 浙江商汤科技开发有限公司 Method and device for presenting picture content, computer equipment and storage medium
CN111862413A (en) * 2020-07-28 2020-10-30 公安部第三研究所 Method and system for realizing epidemic situation resistant non-contact multidimensional identity rapid identification

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107724900B (en) * 2017-09-28 2018-10-23 楷模居品(江苏)有限公司 A kind of family security door based on personal recognition
CN108596171A (en) * 2018-03-29 2018-09-28 青岛海尔智能技术研发有限公司 Enabling control method and system
CN111311786A (en) * 2018-11-23 2020-06-19 杭州眼云智家科技有限公司 Intelligent door lock system and intelligent door lock control method thereof
CN111401160A (en) * 2020-03-03 2020-07-10 北京三快在线科技有限公司 Hotel authentication management method, system and platform and hotel PMS system


Also Published As

Publication number Publication date
CN112700576A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN103984915B (en) Pedestrian's recognition methods again in a kind of monitor video
CN101763671B (en) System for monitoring persons by using cameras
CN108875341A (en) A kind of face unlocking method, device, system and computer storage medium
CN106919921B (en) Gait recognition method and system combining subspace learning and tensor neural network
CN104766063A (en) Living body human face identifying method
CN102629320A (en) Ordinal measurement statistical description face recognition method based on feature level
CN104700094A (en) Face recognition method and system for intelligent robot
CN111507320A (en) Detection method, device, equipment and storage medium for kitchen violation behaviors
Sabourin et al. Shape matrices as a mixed shape factor for off-line signature verification
Barni et al. Iris deidentification with high visual realism for privacy protection on websites and social networks
CN114758440B (en) Access control system based on image and text mixed recognition
CN114758439B (en) Multi-mode access control system based on artificial intelligence
RU2316051C2 (en) Method and system for automatically checking presence of a living human face in biometric safety systems
CN112700576B (en) Multi-modal recognition algorithm based on images and characters
CN106845500A (en) A kind of human face light invariant feature extraction method based on Sobel operators
Daramola et al. Algorithm for fingerprint verification system
Ali et al. Image forgery localization using image patches and deep learning
Kalangi et al. Deployment of Haar Cascade algorithm to detect real-time faces
Chua et al. Fingerprint Singular Point Detection via Quantization and Fingerprint Classification.
CN203415026U (en) Palmar venous access control system
JP2008129679A (en) Fingerprint discrimination model construction method, fingerprint discrimination method, identification method, fingerprint discrimination device and identification device
CN109035171A (en) A kind of reticulate pattern facial image restorative procedure
Kundu et al. An efficient chain code based face identification system for biometrics
Chen et al. Broad learning with uniform local binary pattern for fingerprint liveness detection
Jyothsna et al. Facemask detection using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant