CN112700576B - Multi-modal recognition algorithm based on images and characters


Info

Publication number
CN112700576B
Authority
CN
China
Prior art keywords
image information
mode
information
dimensional
static
Prior art date
Legal status
Active
Application number
CN202011587116.1A
Other languages
Chinese (zh)
Other versions
CN112700576A (en)
Inventor
王勇
Current Assignee
Chengdu Qiyuan Xipu Technology Co., Ltd.
Original Assignee
Chengdu Qiyuan Xipu Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Chengdu Qiyuan Xipu Technology Co., Ltd.
Priority to CN202011587116.1A
Publication of CN112700576A
Application granted granted Critical
Publication of CN112700576B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a multi-modal recognition algorithm based on images and characters. The algorithm rests on the matching and fusion of two-dimensional and three-dimensional information: character image information obtained from an on-site signature is combined with three-dimensional image information obtained through face recognition, which reduces the dependence on biometric features and prevents counterfeiters from defeating face recognition with three-dimensional printing technology.

Description

Multi-modal recognition algorithm based on images and characters
Technical Field
The invention relates to a multi-modal recognition algorithm based on images and characters.
Background
An access control system controls the entry and exit of personnel and records their access information. With measurement and control devices installed at the entrances, the opening and closing of doors can be managed effectively, allowing authorized personnel to pass freely while protecting people and property. With the economic and social development of China, access control security management systems have penetrated deeply into everyday life and provide an important guarantee for the personal, property and information security of the public. An access control security management system is a modern security management system that involves many new technologies, such as electronics, mechanics, optics, computer technology, communication technology and biotechnology. It is an effective measure for securing the entrances and exits of important departments and is suitable for many settings, such as banks, hotels, parking lot management, machine rooms, military warehouses, confidential rooms, offices, intelligent residential communities and factories. Although a traditional access control security management system combines multiple detection means, modern high-tech counterfeiting technology may still allow the various biometric checks to be spoofed, so that a counterfeiter or an impersonator may gain entry, creating a potential access security risk.
Disclosure of Invention
In order to solve the access control security problem that may arise in the prior art when high-tech means, such as a 3D-printed mask, are used to imitate biometric features, the invention provides a multi-modal recognition algorithm based on images and characters, comprising the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information obtained from the on-site signature of the person to be detected, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information (a sketch of this layered data layout follows step (6) below);
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform; performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model.
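By way of illustration only, the following Python sketch shows one possible in-memory layout for the paired acquisitions described in steps (1)-(3). The class and field names (SignatureCapture, FaceCapture, PairedCapture and so on) are not part of the invention; they are assumptions chosen purely to make the layered, direction-paired structure concrete.

```python
# Illustrative sketch only: one possible data layout for the paired signature /
# face acquisitions of steps (1)-(3). All names are hypothetical.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class SignatureCapture:
    """Static (two-dimensional) name character image information."""
    shape_layer: np.ndarray   # first modality: shape, H x W
    color_layer: np.ndarray   # second modality: color, same region of interest


@dataclass
class FaceCapture:
    """Dynamic (three-dimensional) face image information, one entry per time step."""
    shape_layers: List[np.ndarray]     # shape modality
    color_layers: List[np.ndarray]     # color modality, aligned with shape_layers
    speed_layers: List[np.ndarray]     # speed modality
    distance_layers: List[np.ndarray]  # distance modality


@dataclass
class PairedCapture:
    """Signature and face captures taken at the same moment, paired by the
    acquisition direction of the sensors that recorded them."""
    direction_deg: float               # acquisition direction, e.g. 75, +90 or 105 degrees
    signature: SignatureCapture
    face: FaceCapture
```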
Further, the preprocessing includes threshold processing to eliminate the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information.
Further, the directions include the three angles 75°, +90° and 105°.
Further, each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
Further, each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model.
Further, image information of the same modality is acquired by the same device.
Further, the device is a three-dimensional camera.
The beneficial effects of the invention are as follows: the character image information obtained from the on-site signature is combined with the three-dimensional image information obtained through face recognition, which reduces the dependence on biometric features and prevents counterfeiters from defeating face recognition with three-dimensional printing technology.
Drawings
Fig. 1 shows a flow diagram of the present algorithm.
Detailed Description
A multi-modal recognition algorithm based on images and characters comprises the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information obtained from the on-site signature of the person to be detected, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information;
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform (a sketch of the hole filling and frequency-domain fusion follows step (6) below); performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model (a sketch of this comparison loop also follows below).
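As a non-authoritative illustration of the hole filling and discrete-cosine-transform fusion mentioned in step (4), the sketch below fills holes in a binarized first-modality slice and then fuses it with the corresponding second-modality slice in the DCT domain by keeping, for each frequency, the coefficient with the larger magnitude. The maximum-magnitude fusion rule, the binarization threshold and the random stand-in data are assumptions; the patent does not specify the exact coefficient-combination rule.

```python
# Illustrative sketch of step (4): morphological hole filling followed by
# frequency-domain fusion of two registered, same-size slices via the discrete
# cosine transform. The max-magnitude coefficient rule is an assumption.
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import binary_fill_holes


def clean_slice(shape_slice: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize a first-modality (shape) slice and fill its interior holes."""
    mask = shape_slice > threshold
    return binary_fill_holes(mask).astype(np.float64)


def dct_fuse(shape_slice: np.ndarray, color_slice: np.ndarray) -> np.ndarray:
    """Fuse two slices in the DCT domain, keeping the larger-magnitude coefficient."""
    a = dctn(shape_slice, norm="ortho")
    b = dctn(color_slice, norm="ortho")
    fused = np.where(np.abs(a) >= np.abs(b), a, b)
    return idctn(fused, norm="ortho")


# Random data stands in for one slice pair of the same group.
rng = np.random.default_rng(0)
shape_layer = clean_slice(rng.random((64, 64)))
color_layer = rng.random((64, 64))
fused_layer = dct_fuse(shape_layer, color_layer)

# The three-dimensional fused information of step (4) stacks the fused slice,
# the color slice, and a distance plane that is initialized to 0.
fused_3d = np.stack([fused_layer, color_layer, np.zeros_like(fused_layer)], axis=-1)
print(fused_3d.shape)  # (64, 64, 3)
```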
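The following sketch illustrates, under stated assumptions, the iterative similarity comparison of step (6): the distance plane of each directional sub-block is dropped to obtain a two-dimensional block, a placeholder stands in for the trained neural network model of step (5), and comparison against candidate two-dimensional face images stops as soon as a cosine similarity reaches the preset threshold. The embedding function, the cosine-similarity measure and the threshold value are assumptions; the patent does not fix a particular network or similarity metric.

```python
# Illustrative sketch of step (6): reduce directional sub-blocks to two
# dimensions, run them through a placeholder model, and compare the result
# against candidate two-dimensional face images until a threshold is reached.
import numpy as np


def embed(block_2d: np.ndarray) -> np.ndarray:
    """Stand-in for the trained neural network model of step (5)."""
    return block_2d.ravel() / (np.linalg.norm(block_2d) + 1e-12)


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def recognize(sub_blocks_3d, candidate_faces_2d, threshold: float = 0.8):
    """Return the index of the first candidate whose similarity reaches the threshold."""
    for block in sub_blocks_3d:
        # Keep only the fused plane; the distance plane is zeroed and dropped,
        # one interpretation of "setting the third dimension to 0".
        block_2d = block[..., 0]
        result = embed(block_2d)
        for idx, face in enumerate(candidate_faces_2d):
            if cosine_similarity(result, embed(face)) >= threshold:
                return idx  # stop the iterative comparison once the threshold is met
    return None


rng = np.random.default_rng(1)
sub_blocks = [rng.random((64, 64, 3)) for _ in range(3)]  # directional sub-blocks
faces = [rng.random((64, 64)) for _ in range(5)]          # two-dimensional face images
print(recognize(sub_blocks, faces))
```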
Preferably, the preprocessing includes threshold processing to remove the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information, as sketched below.
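A minimal sketch of this preprocessing, assuming OpenCV is used: Otsu thresholding suppresses background noise in the signature image, and each plane of the face image information is interpolated to a common resolution. The choice of Otsu's method, bilinear interpolation and the 256x256 target size are assumptions; the patent only requires thresholding and interpolation in general.

```python
# Minimal preprocessing sketch: thresholding of the character (signature) image
# and interpolation of the face image planes to one common resolution.
# Otsu's method and bilinear interpolation are assumptions, not requirements.
import cv2
import numpy as np


def threshold_signature(gray: np.ndarray) -> np.ndarray:
    """Suppress background noise in an 8-bit grayscale signature image."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def unify_resolution(planes, size=(256, 256)):
    """Interpolate every plane of the face image information to the same resolution."""
    return [cv2.resize(p, size, interpolation=cv2.INTER_LINEAR) for p in planes]


# Random data stands in for a captured signature and face planes of mixed sizes.
rng = np.random.default_rng(2)
signature = (rng.random((120, 360)) * 255).astype(np.uint8)
planes = [rng.random((200 + 10 * i, 200)).astype(np.float32) for i in range(3)]

clean_signature = threshold_signature(signature)
uniform_planes = unify_resolution(planes)
print(clean_signature.shape, [p.shape for p in uniform_planes])
```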
Preferably, the directions include the three angles 75°, +90° and 105°.
Preferably, each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
Preferably, each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model, as sketched below.
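As a sketch of how the matched and mismatched ("confusion") groups described above might be assembled into a training set, the code below pairs each first-modality sample either with its own second-modality counterpart (label 1) or with that of a different person (label 0). The dictionary layout, the 1/0 labels and the confusion ratio are assumptions for illustration only.

```python
# Illustrative sketch: building training groups where the two modalities either
# come from the same person (genuine) or from different persons (confusion data).
import random

import numpy as np

# Random stand-ins for per-person shape and color layers of the signature image.
people = {f"person_{i}": {"shape": np.random.rand(64, 64),
                          "color": np.random.rand(64, 64)} for i in range(10)}


def build_training_groups(samples, confusion_ratio=0.5, seed=0):
    rng = random.Random(seed)
    names = list(samples)
    groups = []
    for name in names:
        if rng.random() < confusion_ratio:
            other = rng.choice([n for n in names if n != name])
            groups.append((samples[name]["shape"], samples[other]["color"], 0))  # confusion pair
        else:
            groups.append((samples[name]["shape"], samples[name]["color"], 1))   # genuine pair
    return groups


training_groups = build_training_groups(people)
genuine = sum(label for *_, label in training_groups)
print(f"{genuine} genuine pairs out of {len(training_groups)} groups")
```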
Preferably, image information of the same modality is acquired by the same device.
Preferably, the device is a three-dimensional camera.
The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of this patent shall be defined by the appended claims.

Claims (7)

1. A multi-modal recognition algorithm based on images and characters, characterized by comprising the following steps:
(1) acquiring name character image information and face image information at the same moment, wherein the name character image information and the face image information are paired and associated through the acquisition direction of the sensor that captures each piece of information; the character image information is two-dimensional image information, and the face image information is three-dimensional image information;
(2) inputting the name character image information and preprocessing it, wherein the name character image information comprises a plurality of groups of static image information in a first modality and a second modality; the first modality and the second modality share the same region of interest, the static image information of the first modality represents shape, the static image information of the second modality corresponds to the first modality and represents color, and the first modality and the second modality lie on different layers of the data structure of the static image information;
(3) inputting face image information matched with the name character image information, wherein the face image information comprises a plurality of groups of dynamic image information consisting of at least a shape modality with the same region of interest, together with a color modality, a speed modality and a distance modality corresponding to the shape modality; the shape modality, the color modality, the speed modality and the distance modality lie on different layers of the data structure of the dynamic image information;
(4) judging whether the image information of the layers of the different modalities within each group of static image information matches; if the layers match, dividing the name character image information of each corresponding layer into a plurality of two-dimensional image blocks; if the static image information of the layers of the different modalities within a group does not completely match, performing three-dimensional reconstruction and registration on the first-modality static image information of that group and then segmenting it to obtain a first set containing m layers of first-modality static image information, where m is a natural number greater than 5; cleaning the information of the first set with a morphological hole-filling method; fusing each sliced layer of the first-modality static image information with the corresponding second-modality static image information of the same group using a frequency-domain information fusion method based on the discrete cosine transform; performing three-dimensional reconstruction and registration to obtain three-dimensional fused information, in which the first dimension is the information obtained by fusing the corresponding first-modality and second-modality static image information, the second dimension is the second-modality static image information representing color, and the third dimension represents distance and is set to 0; fusing the reconstructed three-dimensional information with the dynamic image information; and marking the fused information, according to the acquisition direction, as directional image sub-blocks to be recognized;
(5) training a neural network model with pre-collected name character image information;
(6) for the image sub-blocks to be recognized in each direction, setting the third dimension to 0 to reduce them to two dimensions and obtain two-dimensional image sub-blocks; inputting the two-dimensional image sub-blocks into the neural network model; and comparing the similarity of the recognition result with one piece of two-dimensional face image information: if the similarity of the comparison is smaller than a preset threshold, continuing the similarity comparison with the other pieces of two-dimensional face image information; otherwise, stopping the iterative similarity comparison and saving the model.
2. The multi-modal recognition algorithm based on images and characters of claim 1, wherein the preprocessing includes threshold processing to eliminate the influence of noise that may be present in the character image information, and/or interpolation processing of the face image information to unify the resolution of the different planes of the face image information.
3. The multi-modal recognition algorithm based on images and characters of claim 1, wherein the directions include the three angles 75°, +90° and 105°.
4. The multi-modal recognition algorithm based on images and characters of claim 1, wherein each group of first-modality static image information and second-modality static image information of the name character image information comes from the same person to be detected.
5. The multi-modal recognition algorithm based on images and characters of claim 1, wherein each group of first-modality static image information and second-modality static image information of the name character image information comes from different persons to be detected and serves as confusion data when training the model.
6. The multi-modal recognition algorithm based on images and characters of claim 1, wherein image information of the same modality is acquired by the same device.
7. The multi-modal recognition algorithm based on images and characters of claim 6, wherein the device is a three-dimensional camera.
CN202011587116.1A 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters Active CN112700576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011587116.1A CN112700576B (en) 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters


Publications (2)

Publication Number Publication Date
CN112700576A CN112700576A (en) 2021-04-23
CN112700576B (en) 2021-08-03

Family

ID=75511427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011587116.1A Active CN112700576B (en) 2020-12-29 2020-12-29 Multi-modal recognition algorithm based on images and characters

Country Status (1)

Country Link
CN (1) CN112700576B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3271045D1 (en) * 1981-11-20 1986-06-12 Siemens Ag Method of identifying a person by speech and face recognition, and device for carrying out the method
CN103903319A (en) * 2014-02-10 2014-07-02 袁磊 Electronic lock system based on internet dynamic authorization
CN104573634A (en) * 2014-12-16 2015-04-29 苏州福丰科技有限公司 Three-dimensional face recognition method
CN204331744U (en) * 2014-11-27 2015-05-13 天津和财世纪信息技术有限公司 3 D stereo intelligent face recognition system
CN109785483A (en) * 2018-12-28 2019-05-21 杭州文创企业管理有限公司 A kind of wisdom garden access control system
CN111597928A (en) * 2020-04-29 2020-08-28 深圳市商汤智能传感科技有限公司 Three-dimensional model processing method and device, electronic device and storage medium
CN111625100A (en) * 2020-06-03 2020-09-04 浙江商汤科技开发有限公司 Method and device for presenting picture content, computer equipment and storage medium
CN111862413A (en) * 2020-07-28 2020-10-30 公安部第三研究所 Method and system for realizing epidemic situation resistant non-contact multidimensional identity rapid identification

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107724900B (en) * 2017-09-28 2018-10-23 楷模居品(江苏)有限公司 A kind of family security door based on personal recognition
CN108596171A (en) * 2018-03-29 2018-09-28 青岛海尔智能技术研发有限公司 Enabling control method and system
CN111311786A (en) * 2018-11-23 2020-06-19 杭州眼云智家科技有限公司 Intelligent door lock system and intelligent door lock control method thereof
CN111401160A (en) * 2020-03-03 2020-07-10 北京三快在线科技有限公司 Hotel authentication management method, system and platform and hotel PMS system


Also Published As

Publication number Publication date
CN112700576A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN103984915B (en) Pedestrian's recognition methods again in a kind of monitor video
CN101763671B (en) System for monitoring persons by using cameras
CN108875341A (en) A kind of face unlocking method, device, system and computer storage medium
CN106919921B (en) Gait recognition method and system combining subspace learning and tensor neural network
CN104766063A (en) Living body human face identifying method
CN102629320A (en) Ordinal measurement statistical description face recognition method based on feature level
CN104700094A (en) Face recognition method and system for intelligent robot
CN111507320A (en) Detection method, device, equipment and storage medium for kitchen violation behaviors
Sabourin et al. Shape matrices as a mixed shape factor for off-line signature verification
Barni et al. Iris deidentification with high visual realism for privacy protection on websites and social networks
CN114758440B (en) Access control system based on image and text mixed recognition
CN114758439B (en) Multi-mode access control system based on artificial intelligence
RU2316051C2 (en) Method and system for automatically checking presence of a living human face in biometric safety systems
CN112700576B (en) Multi-modal recognition algorithm based on images and characters
CN106845500A (en) A kind of human face light invariant feature extraction method based on Sobel operators
Daramola et al. Algorithm for fingerprint verification system
Ali et al. Image forgery localization using image patches and deep learning
Kalangi et al. Deployment of Haar Cascade algorithm to detect real-time faces
Chua et al. Fingerprint Singular Point Detection via Quantization and Fingerprint Classification.
CN203415026U (en) Palmar venous access control system
JP2008129679A (en) Fingerprint discrimination model construction method, fingerprint discrimination method, identification method, fingerprint discrimination device and identification device
CN109035171A (en) A kind of reticulate pattern facial image restorative procedure
Kundu et al. An efficient chain code based face identification system for biometrics
Chen et al. Broad learning with uniform local binary pattern for fingerprint liveness detection
Jyothsna et al. Facemask detection using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant