CN111209859B

CN111209859B - Method for dynamically adapting display to visual angle based on face recognition

Info

Publication number: CN111209859B
Application number: CN202010010053.7A
Authority: CN
Inventors: 王卫; 杨天浩
Original assignee: Nanjing Jusha Display Technology Co Ltd; Nanjing Jusha Medical Technology Co Ltd
Current assignee: Nanjing Jusha Display Technology Co Ltd; Nanjing Jusha Medical Technology Co Ltd
Priority date: 2020-01-06
Filing date: 2020-01-06
Publication date: 2023-09-19
Anticipated expiration: 2040-01-06
Also published as: CN111209859A

Abstract

The invention discloses a method for dynamically adapting a display's viewing angle based on face recognition, comprising the following steps. Step SS1: enrolling the face information of a key person through a device built into the display. Step SS2: recognizing all face information within the camera's field of view. Step SS3: comparing the enrolled face information with all face information in the field of view to determine whether the target key person is in front of the display. Step SS4: if the determination in step SS3 is yes, dynamically adapting the display to the key person; if it is no, orienting the display so that its limited field of view takes in as many faces as possible. The invention addresses the need to guarantee the chief surgeon an optimal viewing angle in an operating room. In multi-person settings such as conferences, consultations and morning meetings, it likewise guarantees the key person an optimal viewing angle or gives more people a better viewing experience.

Description

Method for dynamically adapting display to visual angle based on face recognition

Technical Field

The invention relates to a method for dynamically adapting a display's viewing angle based on face recognition, and belongs to the technical field of face recognition applications.

Background

As modern surgery grows more complex, the integrated operating room has emerged. It brings a variety of medical instruments together in one room so that operations can be performed efficiently, safely and conveniently, and the device that interacts with doctors most is the medical display. In minimally invasive surgery in particular, the display faithfully reproduces the conditions inside the body through imaging, helping doctors diagnose in real time and plan the next surgical step. Where each doctor stands affects the angle from which they see the display, and a doctor's standing position is usually the best position for operating; if the viewing angle is poor, the doctor's judgment may suffer, which in turn affects surgical safety. An operating room typically has only one large endoscopic display, so guaranteeing the chief surgeon an optimal viewing angle is an urgent problem. How to dynamically adapt a display to the viewing angle of a key person is therefore a technical challenge in the art.

Disclosure of Invention

The invention aims to overcome the deficiencies of the prior art and solve the above technical problem by providing a method for dynamically adapting a display's viewing angle based on face recognition.

The invention adopts the following technical scheme: a method for dynamically adapting a display's viewing angle based on face recognition, characterized by comprising the following steps:

step SS1: enrolling the face information of a key person through a device built into the display;

step SS2: recognizing all face information within the camera's field of view;

step SS3: comparing the enrolled face information with all face information in the field of view to determine whether the target key person is in front of the display;

step SS4: if the determination in step SS3 is yes, dynamically adapting the display to the key person; if it is no, orienting the display so that its limited field of view takes in as many faces as possible.
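The control flow of steps SS1 to SS4 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `adapt_display`, the action labels, and the `is_same_person` callback are hypothetical names introduced here, and the face comparison itself is left to the caller.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

F = TypeVar("F")  # whatever object represents one recognized face


def adapt_display(
    enrolled_key: Optional[F],
    visible_faces: Iterable[F],
    is_same_person: Callable[[F, F], bool],
) -> Tuple[str, Optional[F]]:
    """Decide what the display should do for one scan (hypothetical helper)."""
    faces = list(visible_faces)              # SS2: faces currently in the field of view
    if enrolled_key is not None:             # SS1 has enrolled a key person
        for face in faces:                   # SS3: compare one by one
            if is_same_person(enrolled_key, face):
                return ("track_key_person", face)   # SS4, "yes" branch
    return ("maximize_faces_in_view", None)         # SS4, "no" branch
```

For example, `adapt_display("alice", ["bob", "alice"], lambda a, b: a == b)` selects the tracking branch, while an empty enrollment falls through to the branch that tries to fit as many faces as possible.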

As a preferred embodiment, step SS1 specifically includes: extracting and storing a person's facial information using a Softmax loss function and a discriminative face recognition algorithm, thereby completing enrollment of the key person's face information.

As a preferred embodiment, step SS1 further includes: the Softmax loss function is expressed as:

$$L_s = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}} \tag{1}$$

where: m is the number of samples input per training batch; n is the number of classes; $x_i$ is the feature vector of the i-th sample; $y_i$ is the corresponding class label; W and b are the weight matrix and bias vector of the last fully connected layer; $W_j$ is the weight vector of the j-th class (the j-th column of W); $b_j$ is the corresponding bias term.

As a preferred embodiment, step SS1 further includes: to eliminate the large intra-class variation produced by the Softmax loss function, so that each class becomes more compact and the features more discriminative, an intra-class cosine similarity loss function is adopted, expressed as:

where $\theta_{y_i}$ is the angle between the feature vector of the i-th sample and the weight vector of its class.

As a preferred embodiment, step SS1 further includes: to facilitate forward and backward propagation, equation (2) is converted into:

where:

Equation (3) effectively describes the intra-class variation. For the actual input to the loss layer, forward propagation only needs to compute:

During backward propagation, the gradient of $L_{c3}$ with respect to $z_i$ is:

As a preferred embodiment, step SS1 further includes: to make the learned features discriminative, training is carried out under the joint supervision of the Softmax loss function and the intra-class cosine similarity loss function, and the resulting discriminative face recognition algorithm is expressed as:

where λ is a scalar that balances the two loss functions; the key person's face information is enrolled using the Softmax loss function and the discriminative face recognition algorithm.

As a preferred embodiment, step SS3 specifically includes: comparing all face information in the field of view that has been entered into the database, one by one, with the enrolled face information of the key person to determine whether the target key person is in front of the display; if the target key person is present, an instruction is sent to the display's rotating device, and the screen is rotated left or right so that the key person lies on the central axis of the screen.

As a preferred embodiment, step SS3 further includes: the display camera takes the central axis perpendicular to the display as its reference; when the key person is at an angle a to the central axis, the display is rotated dynamically so that the key person returns to the central axis.

As a preferred embodiment, step SS3 further includes: with a display-camera viewing angle of 30° and a viewing distance of 3 m, when no key person is in the display's field of view, the ant colony algorithm module is started and iterates with the number of people contained in the field of view as the result.

As a preferred embodiment, the ant colony algorithm specifically includes: after each ant takes one step or completes a traversal of all n nodes, that is, at the end of one cycle, the residual information is updated;

the pheromone on path (i, j) at time t+n, which here stands for the number of people contained in the field of view, can be expressed as:

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{6}$$

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t) \tag{7}$$

The ant colony algorithm iterates according to equations (6) and (7); when the iteration converges, the final angle of the display is determined, and the best result of the iteration is taken as the maximum number of people accommodated in the field of view. Here ρ is the pheromone evaporation factor and 1-ρ is the residual factor; m is the total number of ants; $\tau_{ij}(t+n)$ is the amount of pheromone on path (i, j) at time t+n; $\tau_{ij}(t)$ is the amount of pheromone on path (i, j) at time t; $\Delta\tau_{ij}(t)$ is the change in the amount of pheromone between time t and time t+n, which by equation (7) equals the sum of the pheromone increments $\Delta\tau_{ij}^{k}(t)$ deposited on path (i, j) by ants 1 through m at time t.

The beneficial effects of the invention are as follows. First, it addresses the need to guarantee the chief surgeon an optimal viewing angle in an operating room: by enrolling the chief surgeon's face information, the display can present the patient's surgical image information to the chief surgeon quickly and accurately, improving the safety and efficiency of the operation. Second, in multi-person settings such as conferences, consultations and morning meetings, it guarantees the key person an optimal viewing angle or gives more people a better viewing experience.

Drawings

Fig. 1 is a schematic flow chart of a preferred embodiment of the present invention.

FIG. 2 is a schematic diagram of the initial state of the process of dynamically adapting a display to a key character according to the present invention.

FIG. 3 is a schematic diagram of the end state of the process of dynamically adapting a display to a key character in accordance with the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

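As shown in fig. 1, the invention provides a method for dynamically adapting a display's viewing angle based on face recognition, comprising the following steps:

step SS1: enrolling the face information of a key person through a device built into the display;

step SS2: recognizing all face information within the camera's field of view;

step SS3: comparing the enrolled face information with all face information in the field of view to determine whether the target key person is in front of the display;

step SS4: if the determination in step SS3 is yes, dynamically adapting the display to the key person; if it is no, orienting the display so that its limited field of view takes in as many faces as possible.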

In step SS1, the display needs a built-in camera that supports liveness detection, face capture, face comparison, face library management and similar functions in order to enroll and process face information. Because the display must rotate while dynamically adapting its viewing angle, the joint between the screen and its bracket must support left-right rotation so that the screen can be turned toward the viewers. Corresponding software is also developed: clicking its "start input" button enrolls the key person's face information through face recognition and stores it in the database.

The invention extracts and stores a person's facial information using a Softmax loss function and a discriminative face recognition algorithm, thereby completing enrollment of the key person's face information.

The Softmax loss function is mainly used for multi-class classification. From the viewpoint of probability theory, the Softmax function converts a real-valued score vector into a probability distribution, and the Softmax loss is the cross-entropy computed on that distribution. The Softmax loss function is expressed as:

$$L_s = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}} \tag{1}$$

where: m is the number of samples input per training batch; n is the number of classes; $x_i$ is the feature vector of the i-th sample; $y_i$ is the corresponding class label; W and b are the weight matrix and bias vector of the last fully connected layer; $W_j$ is the weight vector of the j-th class (the j-th column of W); $b_j$ is the corresponding bias term.
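The Softmax loss described above can be evaluated with a few lines of plain Python. This is a minimal sketch under stated assumptions: the function name `softmax_loss` and the list-based data layout are illustrative choices, and the maximum score is subtracted before exponentiation purely for numerical stability (it does not change the value of the loss).

```python
import math
from typing import Sequence


def softmax_loss(
    features: Sequence[Sequence[float]],   # m feature vectors x_i
    labels: Sequence[int],                 # m class indices y_i in [0, n)
    weights: Sequence[Sequence[float]],    # n class weight vectors W_j
    biases: Sequence[float],               # n bias terms b_j
) -> float:
    """Mean cross-entropy of the Softmax output over one batch."""
    total = 0.0
    for x, y in zip(features, labels):
        # Class scores W_j^T x_i + b_j for every class j.
        scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        # log(sum_j exp(score_j)), computed stably by factoring out the max.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[y]      # equals -log softmax(scores)[y_i]
    return total / len(features)
```

With all weights and biases set to zero every class is equally likely, so the loss reduces to log n, which is a convenient sanity check.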

To eliminate the large intra-class variation produced by the Softmax loss function, so that each class becomes more compact and the features more discriminative, this patent adopts an intra-class cosine similarity loss function, expressed as:

where $\theta_{y_i}$ is the angle between the feature vector of the i-th sample and the weight vector of its class.

To facilitate forward and backward propagation, equation (2) is converted into:

where:

Equation (3) effectively describes the intra-class variation. For the actual input to the loss layer, forward propagation only needs to compute:

During backward propagation, the gradient of $L_{c3}$ with respect to $z_i$ is:

To make the learned features discriminative, training is carried out under the joint supervision of the Softmax loss and the intra-class cosine similarity loss, and the resulting discriminative face recognition algorithm is expressed as:

where λ is a scalar used to balance the two loss functions.
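The idea of balancing the two losses with λ can be illustrated as follows. The exact expressions of the intra-class cosine similarity loss and of the joint objective are not reproduced in this text, so the sketch makes two clearly labeled assumptions: the cosine term is taken as the batch mean of 1 - cos θ_{y_i}, and the joint loss is the weighted sum "Softmax loss + λ × cosine loss". All function names are hypothetical.

```python
import math
from typing import Sequence


def cos_theta(x: Sequence[float], w: Sequence[float]) -> float:
    """Cosine of the angle between feature vector x and class weight vector w."""
    dot = sum(a * b for a, b in zip(x, w))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_x * norm_w)


def intra_class_cosine_loss(features, labels, weights) -> float:
    """ASSUMED form: mean of (1 - cos theta_{y_i}); not taken from the patent."""
    terms = [1.0 - cos_theta(x, weights[y]) for x, y in zip(features, labels)]
    return sum(terms) / len(terms)


def joint_loss(softmax_term: float, cosine_term: float, lam: float) -> float:
    """ASSUMED weighted sum; lam is the balancing scalar (lambda in the text)."""
    return softmax_term + lam * cosine_term
```

Under these assumptions a feature aligned with its class weight vector contributes zero to the cosine term, and a larger λ pushes training harder toward compact classes.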

The invention enrolls the key person's face information using this discriminative face recognition algorithm.

In step SS2, most existing display cameras have a maximum viewing angle of 30°. With the viewers standing within a certain distance in front of the display (3 m in this invention), clicking the software's "face information in recognition range" button makes the display recognize all face information in its field of view and enter it into the database. The software then makes a determination: if the display has already enrolled the key person's face information, step SS3 is performed; if no key person's information has been enrolled, step SS4 is performed.

In step SS3, if the display has already enrolled the key person's face information, all face information in the field of view that has been entered into the database is compared, one by one, with the enrolled face information of the key person to determine whether the target key person is in front of the display. If the target key person is present, the software sends an instruction to the display's rotating device, and the screen is rotated left or right so that the key person lies on the central axis of the screen, as shown in fig. 2 and 3.
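The one-by-one comparison can be sketched as a similarity search over feature vectors. This is an illustration only: the patent does not state how two faces are scored against each other, so cosine similarity, the threshold value 0.7, and the names `cosine_similarity` and `find_key_person` are assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_key_person(
    key_embedding: Sequence[float],
    visible_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.7,            # hypothetical acceptance threshold
) -> Optional[int]:
    """Index of the best-matching visible face, or None if nobody matches."""
    best_index, best_score = None, threshold
    for index, embedding in enumerate(visible_embeddings):  # compare one by one
        score = cosine_similarity(key_embedding, embedding)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index
```

A return value of `None` corresponds to the "no key person" outcome that leads to step SS4.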

The display camera takes the central axis perpendicular to the display as its reference. When the key person is at an angle a to the central axis (as shown in fig. 2), the display is rotated dynamically so that the key person returns to the central axis (as shown in fig. 3).
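One way to obtain the angle a from a camera image is a pinhole-camera estimate, sketched below. The patent does not say how a is measured, so the pinhole model, the sign convention (positive means right of centre), the 1° dead band, and the names `bearing_from_pixel` and `rotation_command` are all assumptions for illustration; only the 30° viewing angle comes from the text.

```python
import math


def bearing_from_pixel(face_x_px: float, image_width_px: int,
                       fov_deg: float = 30.0) -> float:
    """Angle a in degrees between the face and the camera's central axis."""
    half_width = image_width_px / 2.0
    # Focal length in pixels of a pinhole camera with this horizontal field of view.
    focal_px = half_width / math.tan(math.radians(fov_deg / 2.0))
    return math.degrees(math.atan((face_x_px - half_width) / focal_px))


def rotation_command(angle_a_deg: float, deadband_deg: float = 1.0) -> float:
    """Degrees to rotate the screen so the key person returns to the central axis.

    Small offsets inside the (hypothetical) dead band are ignored to avoid jitter.
    """
    return 0.0 if abs(angle_a_deg) < deadband_deg else angle_a_deg
```

For a 1920-pixel-wide frame, a face centred at pixel 960 gives a = 0°, and a face at the right edge gives a = 15°, half of the 30° viewing angle.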

If not, go to step SS4.

In step SS4, the display camera's viewing angle is still 30° as in step SS2 and the viewing distance is 3 m. When no key person is in the display's field of view, the software starts the ant colony algorithm module and iterates with the number of people contained in the field of view as the result.
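The quantity being optimized in step SS4, the number of people inside the 30° field of view for a given screen orientation, can be written as a small objective function. This is a sketch under assumptions: each person is represented by a bearing in degrees measured in a fixed room frame (how those bearings are obtained is not described in the text), and the name `people_in_view` is hypothetical.

```python
from typing import Iterable


def people_in_view(pan_deg: float, bearings_deg: Iterable[float],
                   fov_deg: float = 30.0) -> int:
    """Count people whose bearing lies within the field of view centred on pan_deg."""
    half_fov = fov_deg / 2.0
    return sum(1 for bearing in bearings_deg if abs(bearing - pan_deg) <= half_fov)
```

For people at bearings -20°, -10°, 0°, 14° and 40°, pointing the screen at 0° covers three of them; a search procedure such as the ant colony algorithm would look for the orientation that maximizes this count.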

The ant colony algorithm (AG) is an optimization algorithm that simulates the foraging behavior of ants. Its basic principle is as follows:

1. Ants release pheromone along the paths they travel.

2. On reaching an intersection it has not passed before, an ant picks a path at random and releases pheromone in an amount related to the path length.

3. Pheromone concentration is inversely proportional to path length. When later ants reach the same intersection, they choose the path with the higher pheromone concentration.

4. The pheromone concentration on the optimal path keeps increasing.

5. Eventually the colony finds the optimal foraging path.

To keep excessive residual pheromone from drowning out the heuristic information, the residual information is updated after each ant takes one step or completes a traversal of all n nodes (that is, at the end of one cycle).

The update formula for the pheromone on path (i, j) at time t+n (the pheromone here stands for the number of people contained in the field of view, which is to be maximized) can be expressed as:

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{6}$$

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t) \tag{7}$$

This patent iterates with the above algorithm; when the iteration converges, the final angle of the display is determined, and the best result of the iteration is taken as the maximum number of people accommodated in the field of view. In actual operation, the number of iterations can be customized as needed. Here ρ is the pheromone evaporation factor and 1-ρ is the residual factor; m is the total number of ants; $\tau_{ij}(t+n)$ is the amount of pheromone on path (i, j) at time t+n; $\tau_{ij}(t)$ is the amount of pheromone on path (i, j) at time t; $\Delta\tau_{ij}(t)$ is the change in the amount of pheromone between time t and time t+n, which by equation (7) equals the sum of the pheromone increments $\Delta\tau_{ij}^{k}(t)$ deposited on path (i, j) by ants 1 through m at time t.
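The pheromone update described above, evaporation by the factor 1-ρ followed by the summed deposits of all m ants, can be sketched as below. The dictionary-based representation of paths and the name `update_pheromone` are assumptions for the example; how each ant's deposit is derived from the number of people in view is not specified in the text and is left to the caller.

```python
from typing import Dict, Hashable, List


def update_pheromone(
    tau: Dict[Hashable, float],               # pheromone on each path at time t
    deposits: List[Dict[Hashable, float]],    # one dict per ant: its deposit per path
    rho: float,                               # evaporation factor, 0 <= rho <= 1
) -> Dict[Hashable, float]:
    """Return the pheromone on each path at time t+n."""
    updated = {}
    for path, level in tau.items():
        # Total deposit on this path, summed over all ants.
        delta = sum(ant.get(path, 0.0) for ant in deposits)
        # Residual pheromone (1 - rho) * tau plus the new deposits.
        updated[path] = (1.0 - rho) * level + delta
    return updated
```

With ρ = 0.5, a path holding 1.0 unit of pheromone that receives deposits of 0.5 and 0.25 from two ants ends the cycle with 0.5 + 0.75 = 1.25 units, while an unvisited path simply decays to half its previous level.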

The software rescans the face information in the current field of view at a fixed period (for example, every 2 min); a scan can also be triggered immediately by clicking the "face information in recognition range" button, which starts a new round of dynamic adaptation to the current viewers' viewing angle.
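The rescan rule, periodic with a manual override, amounts to a single check. The sketch below assumes timestamps in seconds and uses the 2-minute example period from the text as the default; the name `rescan_due` and its parameters are hypothetical.

```python
def rescan_due(last_scan_s: float, now_s: float,
               period_s: float = 120.0, button_pressed: bool = False) -> bool:
    """True when a new scan of the field of view should start."""
    # A button press forces an immediate scan; otherwise wait for the period to elapse.
    return button_pressed or (now_s - last_scan_s) >= period_s
```

A caller would typically pass a monotonic clock reading for `now_s` so that system clock adjustments do not trigger or delay a scan.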

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. A method for dynamically adapting a display's viewing angle based on face recognition, characterized by comprising the following steps:

step SS1: enrolling the face information of a key person through a device built into the display;

step SS2: recognizing all face information within the camera's field of view;

step SS3: comparing the enrolled face information with all face information in the field of view to determine whether the target key person is in front of the display;

step SS4: if the determination in step SS3 is yes, dynamically adapting the display to the key person; if it is no, orienting the display so that its limited field of view takes in as many faces as possible;

the step SS1 specifically includes: extracting and storing a person's facial information using a Softmax loss function and a discriminative face recognition algorithm, thereby completing enrollment of the key person's face information;

the step SS1 further includes: the Softmax loss function is expressed as:

$$L_s = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{W_{y_i}^{T}x_i + b_{y_i}}}{\sum_{j=1}^{n}e^{W_j^{T}x_i + b_j}} \tag{1}$$

where: m is the number of samples input per training batch; n is the number of classes; $x_i$ is the feature vector of the i-th sample; $y_i$ is the corresponding class label; W and b are the weight matrix and bias vector of the last fully connected layer; $W_j$ is the weight vector of the j-th class (the j-th column of W); $b_j$ is the corresponding bias term;

the step SS1 further includes: to eliminate the large intra-class variation produced by the Softmax loss function, so that each class becomes more compact and the features more discriminative, an intra-class cosine similarity loss function is adopted, expressed as:

where $\theta_{y_i}$ is the angle between the feature vector of the i-th sample and the weight vector of its class;

the step SS1 further includes: to facilitate forward and backward propagation, equation (2) is converted into:

where:

during backward propagation, the gradient of $L_{c3}$ with respect to $z_i$ is:

the step SS1 further includes: to make the learned features discriminative, training is carried out under the joint supervision of the Softmax loss function and the intra-class cosine similarity loss function, and the resulting discriminative face recognition algorithm is expressed as:

2. The method for dynamically adapting a display's viewing angle based on face recognition according to claim 1, wherein the step SS3 specifically includes: comparing all face information in the field of view that has been entered into the database, one by one, with the enrolled face information of the key person to determine whether the target key person is in front of the display; if the target key person is present, an instruction is sent to the display's rotating device, and the screen is rotated left or right so that the key person lies on the central axis of the screen.

3. The method for dynamically adapting a display's viewing angle based on face recognition according to claim 2, wherein the step SS3 further includes: the display camera takes the central axis perpendicular to the display as its reference; when the key person is at an angle a to the central axis, the display is rotated dynamically so that the key person returns to the central axis.

4. The method for dynamically adapting a display's viewing angle based on face recognition according to claim 3, wherein the step SS3 further includes: with a display-camera viewing angle of 30° and a viewing distance of 3 m, when no key person is in the display's field of view, the ant colony algorithm module is started and iterates with the number of people contained in the field of view as the result.

5. The method for dynamically adapting a display's viewing angle based on face recognition according to claim 4, wherein the ant colony algorithm specifically includes: after each ant takes one step or completes a traversal of all n nodes, that is, at the end of one cycle, the residual information is updated;

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{6}$$

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{m}\Delta\tau_{ij}^{k}(t) \tag{7}$$

the ant colony algorithm iterates according to equations (6) and (7); when the iteration converges, the final angle of the display is determined, and the best result of the iteration is taken as the maximum number of people accommodated in the field of view; here ρ is the pheromone evaporation factor and 1-ρ is the residual factor; m is the total number of ants; $\tau_{ij}(t+n)$ is the amount of pheromone on path (i, j) at time t+n; $\tau_{ij}(t)$ is the amount of pheromone on path (i, j) at time t; $\Delta\tau_{ij}(t)$ is the change in the amount of pheromone between time t and time t+n, which by equation (7) equals the sum of the pheromone increments $\Delta\tau_{ij}^{k}(t)$ deposited on path (i, j) by ants 1 through m at time t.