CN117275080A

CN117275080A - Eye state identification method and system based on computer vision

Info

Publication number: CN117275080A
Application number: CN202311562635.6A
Authority: CN
Inventors: 陈国泉; 李小洁; 周智闯; 常博
Original assignee: Shenzhen Meiaitang Technology Co ltd
Current assignee: Shenzhen Meiaitang Technology Co ltd
Priority date: 2023-11-22
Filing date: 2023-11-22
Publication date: 2023-12-22

Abstract

The invention relates to the technical field of image processing, in particular to an eye state recognition method and system based on computer vision, comprising the steps of obtaining an image to be recognized and dividing a plurality of local areas; constructing a two-dimensional distribution matrix for each local area and carrying out SVD (singular value decomposition) to obtain a feature vector; acquiring the spatial information distribution similarity between the target local area and the local area to be calculated; obtaining the frequency domain information distribution similarity between the target local area and the local area to be calculated; acquiring the comprehensive similarity of the target local area and the local area to be calculated; adjusting the gradient histogram of the target local area according to the comprehensive similarity, and acquiring an adjusted HOG feature descriptor according to the adjusted gradient histogram; and the eye state is identified according to the adjusted HOG feature descriptors, so that the accuracy of eye state identification is improved.

Description

Eye state identification method and system based on computer vision

Technical Field

The invention relates to the technical field of image processing, in particular to an eye state identification method and system based on computer vision.

Background

In the fields of cosmetology, medical treatment, facial expression recognition and the like, accurate recognition of eye states is required, wherein the recognition of eye angular lines in the eye state recognition process can influence the accuracy of eye state recognition, and the recognition becomes complex due to the fact that the shape, depth and quantity of the eye angular lines are large along with individual differences, and further an incorrect recognition result is obtained due to the fact that external factors such as illumination conditions and the like can cause the fact that the judgment of eye states in the fields of cosmetology, medical treatment, facial expression recognition and the like is influenced.

In order to accurately obtain specific details of the eye and canthus lines, the HOG feature descriptors are used for extracting feature descriptors from the collected eye images, and the feature descriptors are analyzed to obtain specific states of the eyes. In the extraction process, as the principle of the HOG feature descriptors is only analyzed according to the features of the local areas, the relations among the local areas are destroyed, so that the HOG feature descriptors do not contain the integral features of the eye angular patterns, further, the training result of the eye state recognition network is influenced in the subsequent training process of the eye state recognition network, and further, the effectiveness of eye state recognition is influenced.

Disclosure of Invention

In order to solve the problems, the invention discloses an eye state identification method and system based on computer vision.

The eye state identification method and system based on computer vision adopts the following technical scheme:

one embodiment of the present invention provides a computer vision-based eye state recognition method, which includes the steps of:

acquiring an eye image and dividing the eye image to acquire an image to be identified, and dividing a plurality of local areas on the image to be identified;

constructing a two-dimensional distribution matrix for each local area and carrying out SVD (singular value decomposition) to obtain a feature vector of the local area;

Marking any one local area as a target local area, marking a local area adjacent to the target local area as a local area to be calculated, and acquiring airspace information distribution similarity between the target local area and the local area to be calculated according to cosine similarity between the characteristic vector of the target local area and the characteristic vector of the local area to be calculated and gray value difference between the target local area and the local area to be calculated;

different scale windows are preset, and frequency domain information distribution similarity between the target local area and the local area to be calculated is obtained according to frequency spectrums of the target local area and the local area to be calculated under the different scale windows;

acquiring the comprehensive similarity of the target local area and the local area to be calculated according to the spatial domain information distribution similarity and the frequency domain information distribution similarity between the target local area and the local area to be calculated;

adjusting the gradient histogram of the target local area according to the comprehensive similarity, and acquiring an adjusted HOG feature descriptor according to the adjusted gradient histogram;

and identifying the eye state according to the adjusted HOG feature descriptors.

Further, the method for constructing a two-dimensional distribution matrix for each local area and performing SVD decomposition to obtain the feature vector of the local area comprises the following specific steps:

The local areas are square windows, gray values of all pixel points in each local area form a two-dimensional distribution matrix, and each element in the two-dimensional distribution matrix is a gray value of each pixel point in each position in the local area; and inputting the two-dimensional distribution matrix into an SVD algorithm to obtain a left singular matrix of the two-dimensional distribution matrix, and taking a column vector of the left singular matrix as a characteristic vector of each local area.

Further, the method for obtaining the spatial information distribution similarity between the target local area and the local area to be calculated according to the cosine similarity between the feature vector of the target local area and the feature vector of the local area to be calculated and the gray value difference between the target local area and the local area to be calculated comprises the following specific steps:

acquiring a first cosine similarity data set of each feature vector of the target local area; acquiring a second cosine similarity data set of each feature vector of the local area to be calculated;

and acquiring airspace information distribution similarity between the target local area and the local area to be calculated according to the first cosine similarity data set, the second cosine similarity data set and the gray value difference between the target local area and the local area to be calculated of the target local area.

Further, the first cosine similarity data set of each feature vector of the target local area is obtained; the method for acquiring the second cosine similarity data set of each feature vector of the local area to be calculated comprises the following specific steps:

marking any one of the feature vectors of the target local area as a first feature vector, respectively obtaining cosine similarity values between the first feature vector and all feature vectors of the local area to be calculated, and forming a first cosine similarity data set of the first feature vector;

and marking any one of the feature vectors of the local area to be calculated as a second feature vector, respectively obtaining cosine similarity between the second feature vector and all feature vectors of the target local area, and forming a second cosine similarity data set of the second feature vector.

Further, the method for obtaining the spatial information distribution similarity between the target local area and the local area to be calculated according to the first cosine similarity data set, the second cosine similarity data set and the gray value difference between the target local area and the local area to be calculated of the target local area comprises the following specific steps:

For the firstA target local area and a first ∈of the target local area>Spatial information distribution similarity between the partial regions to be calculated +.>For->A target local area and a first ∈of the target local area>Spatial information distribution similarity between the partial regions to be calculated +.>The calculation method of (1) is as follows:

wherein,indicate->Gray value average values of the local areas of the targets; />Indicate->The>The gray value average value of each local area to be calculated; />Indicate->The number of feature vectors for each target local region; />Indicate->The>The number of feature vectors of the local area to be calculated; />Indicate->The first target local areaA first cosine similarity dataset of the feature vectors of the individuals; />Indicate->The>The +.o. of the local area to be calculated>A second cosine similarity dataset of the feature vectors of the individuals; />Representing a maximum function; />An exponential function based on a natural constant is represented.

Further, the method for obtaining the frequency domain information distribution similarity between the target local area and the local area to be calculated according to the frequency spectrums of the target local area and the local area to be calculated under different scale windows comprises the following specific steps:

Under the same size of the scale window, acquiring a first amplitude and a first phase under different sizes of the scale window for the target local area; acquiring a first amplitude to be calculated and a first phase to be calculated under different scale window sizes for a local area to be calculated;

for the firstA target local area and a first ∈of the target local area>Similarity of frequency domain information distribution between the individual local areas to be calculated +.>The calculation method of (1) is as follows:

wherein,representing the number of dimensions; />Representation officeThe number of pixels in the partial region, wherein the number of pixels in each partial region is the same; />A sequential position number indicating a pixel point in the local area; />Indicate->Under the scale->The>A first amplitude of each pixel point; />Indicate->Under the scale->The>The +.o. of the local area to be calculated>A first magnitude to be calculated for each pixel point; / >Indicate->Maximum value of amplitude values of all pixel points in image to be identified under scale, < >>Representing a maximum function; />Indicate->Under the scale->The>A first phase of each pixel; />Indicate->Under the scale->The>The +.o. of the local area to be calculated>A first phase to be calculated of the pixel points; />An exponential function based on a natural constant is represented.

Further, under the same size of the scale window, acquiring a first amplitude and a first phase under different sizes of the scale window for the target local area; the method for obtaining the first amplitude to be calculated and the first phase to be calculated under different scale window sizes for the local area to be calculated comprises the following specific steps:

for any one scale window, for any one pixel point in any one target local area, marking the pixel point as a first pixel point of the target local area, taking the first pixel point as a central pixel point of the window, acquiring a neighborhood pixel point of the first pixel point under the scale window, and marking the neighborhood pixel point as a second pixel point under the scale window of the first pixel point;

Performing fast Fourier transform on the first pixel point and the second pixel point, obtaining a spectrum amplitude mean value and a spectrum phase mean value between the first pixel point and all the second pixel points under the scale window, and respectively marking the spectrum amplitude mean value and the spectrum phase mean value as a first amplitude value and a first phase under the scale window;

for any one pixel point in any one local area to be calculated, marking the pixel point as a first pixel point to be calculated of the local area to be calculated, taking the first pixel point to be calculated as a central pixel point of a window, acquiring a neighborhood pixel point of the first pixel point to be calculated under the scale window, and marking the neighborhood pixel point as a second pixel point to be calculated under the scale window of the first pixel point to be calculated;

and performing fast Fourier transform on the first pixel point to be calculated and the second pixel points to be calculated, obtaining a spectrum amplitude mean value and a spectrum phase mean value between the first pixel point to be calculated and all the second pixel points under the scale window, and respectively recording the spectrum amplitude mean value and the spectrum phase mean value as a first amplitude to be calculated and a first phase to be calculated under the scale window.

Further, the method for obtaining the comprehensive similarity of the target local area and the local area to be calculated according to the spatial domain information distribution similarity and the frequency domain information distribution similarity between the target local area and the local area to be calculated comprises the following specific steps:

For the firstThe target local area is +.>Comprehensive similarity between the individual partial regions to be calculated +.>The calculation method of (1) is as follows:

wherein,indicate->Accumulating and summing the airspace similarity between each target local area and all to-be-calculated local areas of the target local area; />Indicate->The frequency domain information distribution similarity between each target local area and all to-be-calculated local areas of the target local area is accumulated and summed; />Indicate->The target local area is +.>Spatial domain information distribution similarity among the local areas to be calculated; />Indicate->The target local area is +.>The frequency domain information distribution similarity between the partial areas to be calculated.

Further, the method for adjusting the gradient histogram of the target local area according to the comprehensive similarity and obtaining the adjusted HOG feature descriptor according to the adjusted gradient histogram includes the following specific steps:

according to the comprehensive similarity of the target local area and each local area to be calculated, obtaining a comprehensive similarity data set of the target local area, and carrying out each data in the comprehensive similarity data set Normalizing to obtain a normalized comprehensive similarity data set;

gradient angles and gradient amplitudes of all pixel points in a target local area are obtained; acquiring a HOG feature descriptor of a target local area, wherein the HOG feature descriptor comprises a gradient histogram, and the gradient histogram comprises a vote value of each gradient amplitude;

obtaining a local area to be calculated corresponding to the maximum value in the normalized comprehensive similarity data set of the local area, and marking the local area as a reference local area; acquiring a connecting line between the center of the target local area and the center of the reference local area, and recording the angle of the connecting line as a reference angle;

acquiring gradient angles of all pixel points in a target local area, and if the absolute value of the difference value between the gradient angles and a reference angle is smaller than a preset angle difference threshold value, marking the gradient angles as angles to be adjusted;

the maximum value in the comprehensive similarity data set is marked as the direction weight of the extension distribution informationThe voting number of the gradient amplitude corresponding to the angle to be adjusted is adjusted, and the voting number after adjustment is +.>The calculation method of (1) is as follows:

wherein,indicating the direction weight of the spread distribution information,/-, and >Voting value representing the magnitude of the gradient corresponding to the angle to be adjusted, +.>Representing a downward rounding function;

obtaining an adjusted gradient histogram according to the vote count value of the adjusted votes, and obtaining an adjusted HOG feature descriptor according to the adjusted gradient histogram.

The invention also provides an eye state recognition system based on computer vision, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes any one step of the eye state recognition method based on computer vision when executing the computer program.

The technical scheme of the invention has the beneficial effects that: according to the invention, local areas are obtained through eye image acquisition, a two-dimensional distribution matrix is constructed on the local areas, SVD decomposition is carried out to obtain feature vectors, spatial domain information distribution similarity between the local areas is obtained according to gray level difference between each local area and adjacent local areas and cosine similarity between the feature vectors, and information on a frequency domain is fully combined to obtain frequency domain information distribution similarity between local area values, so that comprehensive similarity characteristics are obtained for different local area characteristics. And adjusting gradient amplitude votes of gradient angles in the gradient histogram according to the obtained comprehensive similarity characteristics of the local region to obtain an adjusted histogram and HOG feature descriptors, and taking the adjusted HOG feature descriptors as recognition references of eye states. The defect that the traditional HOG feature descriptors lack overall distribution features is avoided, and the spatial domain information and the frequency domain information are fully utilized, so that the obtained HOG feature descriptors are more accurate, and the accuracy of eye state identification is improved.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart illustrating steps of a computer vision-based eye state recognition method according to the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to the specific implementation, structure, characteristics and effects of the eye state recognition method and system based on computer vision according to the invention with reference to the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the eye state recognition method and system based on computer vision provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a flowchart illustrating a method for identifying eye states based on computer vision according to an embodiment of the invention is shown, the method includes the following steps:

s001, acquiring an eye image, and carrying out semantic segmentation pretreatment on the eye image to acquire an image to be identified.

The purpose of this embodiment is to realize eye state recognition based on computer vision by extracting HOG feature descriptors from the collected eye images and using the HOG feature descriptors as input data of an eye state recognition network, so that the collected eye images need to be collected as a data set, and background information of the collected eye images need to be removed for facilitating extraction of HOG feature descriptors later.

Specifically, in this embodiment, eye images of different users are collected by a high-resolution industrial CCD camera, where the collected eye images include other image areas except for the eye areas, including an image background and other face areas, and a semantic segmentation network is constructed to perform semantic segmentation on the collected eye images, so as to obtain an image to be identified. In this embodiment, the semantic segmentation network is deep labv3, the eye region in the dataset is marked as 0 by a professional marking mode, and other image regions are marked as 1, wherein the input data of the semantic segmentation network is the acquired eye image, the output data is the marked eye image, and the used loss function is a cross entropy function.

S002, dividing local areas of the image to be identified, and constructing a two-dimensional distribution matrix for each local area; SVD decomposition is carried out on the two-dimensional distribution matrix to obtain a feature vector; and acquiring the target local area and the local area to be calculated, and acquiring the spatial information distribution similarity of the target local area and the local area to be calculated through the gray value difference between the target local area and the local area to be calculated and the cosine similarity difference between the feature vectors.

It should be noted that, because the canthus lines have a certain detailed texture and are spread in multiple directions, and the differences among the collected different eye images are very large, and meanwhile the collected eye images are easily affected by noise, in order to make the eye state recognition network accurately obtain the distribution characteristics of the canthus lines in the eye images, the preprocessed image needs to obtain the feature descriptors, and the HOG feature descriptors can describe local information and are not affected by illumination invariance, so that the HOG feature descriptors are used for processing the images to be recognized, and when the eye state recognition network is used as input data in the training process of the subsequent eye state recognition network, the feature of the canthus lines in the eye image can be learned by the eye state recognition network. However, in the conventional HOG feature descriptor acquiring process, the image to be identified is divided into regions, and a gradient histogram is calculated for each region, so as to obtain a corresponding HOG feature descriptor. Therefore, the traditional HOG feature descriptors are analyzed according to gradient distribution features of pixel points in a local area, but in order to better reflect detailed texture features and extension distribution features of an eye area in an image to be identified, the features of the local area need to be considered and the integral features need to be considered, so that the HOG feature descriptors also contain the integral features of the image to be identified in the acquisition process, and the integral features between the local areas are prevented from being damaged.

It should be further noted that, in order to make the obtained HOG feature descriptor in each region include the distribution feature of the local region and also include the integrity feature, the relationship between the single local region and each local region in its neighborhood needs to be considered in obtaining the HOG feature descriptor. Because the image to be identified contains extension distribution information, the direction weight of the extension distribution information is obtained by analyzing the relation between the local areas, and the gradient histogram containing the integral characteristic is obtained according to the direction weight of the extension distribution information, so that the HOG characteristic descriptor containing the integral information is obtained. The method comprises the steps of combining the similarity between information distribution of each local area in a space domain of an image to be identified and the local area of a neighborhood of the image to be identified, constructing and obtaining comprehensive similarity characteristics of the local areas in order to combine edge characteristics in canthus lines in an eye image and simultaneously consider the similarity between information distribution on a frequency domain, and further obtaining the direction weight of extension distribution information according to the comprehensive similarity characteristics of the local areas.

In particular, the image to be identified is divided into a plurality of misaligned images And the gray values of all the pixel points in each local area form a two-dimensional distribution matrix, and each element in the two-dimensional distribution matrix is the gray value of the pixel point in each position in the local area. In particular, there may be partial pixels in the partial region dividing process, the partial region being not large enough, and other defects in the partial region where the pixels are locatedAnd the positions of the pixel points are subjected to linear interpolation to obtain the gray values of the missing pixel points.

Further, for any two-dimensional distribution matrix, SVD decomposition is carried out on the two-dimensional distribution matrix to obtain a left singular matrix of the two-dimensional distribution matrix, wherein a column vector of the left singular matrix is used as a feature vector of each local area, a plurality of feature vectors are contained, and each feature vector contains detail feature information of the two-dimensional distribution matrix. For any one local area, the local area is marked as a target local area, and the adjacent 8 neighborhood local areas in the target local area are marked as local areas to be calculated, and it is required to be specially noted that if the number of the local areas to be calculated in the target local area is less than 8, the local areas to be calculated in the target area are only the number of the existing local areas to be calculated. Marking any one of the feature vectors of the target local area as a first feature vector, respectively obtaining cosine similarity values between the first feature vector and all feature vectors of the local area to be calculated, and forming a first cosine similarity data set of the first feature vector; and marking any one of the feature vectors of the local area to be calculated as a second feature vector, respectively obtaining cosine similarity between the second feature vector and all feature vectors of the target local area, and forming a second cosine similarity data set of the second feature vector. And acquiring airspace information distribution similarity between the target local area and the local area to be calculated according to the first cosine similarity data set, the second cosine similarity data set and the gray value difference between the target local area and the local area to be calculated of the target local area. For the first The target local area and->Spatial information distribution similarity between the partial regions to be calculated +.>Is calculated by the method of (a)The method comprises the following steps:

wherein,indicate->Gray value average values of the local areas of the targets; />Indicate->The>The gray value average value of each local area to be calculated; />Indicate->The number of feature vectors for each target local region; />Indicate->The>The number of feature vectors of the local area to be calculated; />Indicate->The first target local areaA first cosine similarity dataset of the feature vectors of the individuals; />Indicate->The>The +.o. of the local area to be calculated>A second cosine similarity dataset of the feature vectors of the individuals; />Representing a maximum function; />An exponential function based on a natural constant is represented. It should be noted that +.>The model is used only to represent the negative correlation and the result of the constraint model output is at +.>Within the interval, other models with the same purpose can be replaced in the implementation, and the embodiment is only to +.>The model is described as an example, without specific limitation, wherein +.>Representing the input of the model. Wherein (1) >A reference value representing the spatial similarity between the target local region and the local region to be calculated is obtained by obtaining each characteristic direction of the target local regionThe maximum value of cosine similarity between the quantity and all the feature vectors of the local area to be calculated and the maximum value of cosine similarity between each feature vector of the local area to be calculated and all the feature vectors of the target local area are represented, if the maximum cosine similarity value in the cosine similarity data set of each feature vector is larger, the detail feature information of the target local area and the detail feature information of the local area to be calculated are similar, and the airspace information distribution between the target local area and the local area to be calculated is similar; by->The adjustment value of the reference value of the spatial similarity between the target local area and the local area to be calculated is used for indicating that the reference value of the spatial similarity between the target local area and the local area to be calculated should be amplified if the average difference of gray values between the target local area and the local area to be calculated is smaller, wherein the amplification degree is ∈>Corresponding values.

S003, according to the fast Fourier transform of the pixel points in the local area under different scales, obtaining a first amplitude and a first phase of each pixel point in the target local area, and a first amplitude to be calculated and a first phase to be calculated of each pixel point in the local area to be calculated, and obtaining the frequency domain information distribution similarity of the target local area and the local area to be calculated.

It should be noted that, because the spatial information between the local areas represents the gray value of the pixel point and the similarity in gray space, only the similarity of the brightness characteristics between the target local area and the local area to be calculated, that is, the similarity of the brightness characteristics of the canthus line features in the eye image and the similarity of the surrounding skin color textures are included, but because there is a lot of high-frequency information distribution, such as the extending distribution feature of canthus lines, in the eye image, the comprehensive similarity feature of the local area is constructed by considering the similarity of the frequency domain information distribution on the basis of the similarity of the acquired spatial information distribution. When the similarity between the frequency domain information distribution is obtained, the similarity between the frequency domain information distribution between the target local area and the local area to be calculated is obtained by analyzing the similarity between the frequency domain information between the target local area and the local area to be calculated under different scales because the amplitude information and the phase information of the pixel points under different scales are different, and the amplitude information and the phase information reflect the similarity between the frequency domain information distribution between the target local area and the local area to be calculated.

Specifically, different scale windows are preset, and the embodiment is set to 2 different scale windows, which are respectivelyScale Window and +.>A scale window. Similarly, for any one local area, the local area is marked as a target local area, and the adjacent 8 neighborhood local areas in the target local area are marked as local areas to be calculated. And for any pixel point in any one target local area, marking the pixel point as a first pixel point of the target local area. At->Under a scale window, taking a first pixel point of a target local area as a central pixel point of the window, acquiring a neighborhood pixel point of the first pixel point under the scale window, and marking the neighborhood pixel point as +.>A second pixel point under the scale window. Performing fast Fourier transform on the first pixel point and a second pixel point under the scale window of the first pixel point to obtain the first pixel point under the scale window and +.>The average value of the spectrum amplitude values and the average value of the spectrum phase values among all second pixel points under the scale window.Are respectively marked as->A first amplitude and a first phase under a scale window.

The same operation is performed for any one pixel point in any one local area to be calculated. The pixel point is marked as a first pixel point to be calculated of the local area to be calculated, and the pixel point is marked as a first pixel point to be calculated of the local area to be calculatedObtaining a neighborhood pixel point of the first pixel point to be calculated under the scale window, and marking the neighborhood pixel point as a second pixel point to be calculated, thereby obtaining +.>Under the scale window, the first pixel to be calculated and the first pixel to be calculated of the local area to be calculated are +.>The average value of the frequency spectrum amplitude and the average value of the frequency spectrum phase among all the second pixel points to be calculated under the scale window are respectively marked as +.>A first magnitude to be calculated and a first phase to be calculated under a scale window.

Similar operation, acquisition of target local areaFirst pixel point under scale window and first pixel pointFirst amplitude and first phase between all second pixel points under the scale window, and local area to be calculatedFirst pixel to be calculated under scale window and +.f. of the first pixel to be calculated>Under the scale windowThe first to-be-calculated amplitude values and the first to-be-calculated phases between all the second pixel points. It should be noted that +.>Scale Window +.>When the second pixel point in the scale window may exceed the boundary of the image, the embodiment uses the quadratic linear interpolation method to interpolate and fill the data in the portion of the image beyond the boundary.

Further, it willScale window is marked as->Dimension (L)>The scale window is marked as scale +.>Scale, for->The target local area and->Similarity of frequency domain information distribution between the individual local areas to be calculated +.>The calculation method of (1) is as follows:

wherein,represents the number of dimensions, in this example +.>；/>Representing the number of pixels in the local area, wherein the number of pixels in each local area is the same; />The sequence position serial numbers of the pixel points in the local areas are represented, wherein the sequence positions from top to bottom and from left to right in each local area are adopted for serial number labeling; />Indicate->Under the scale->The>A first amplitude of each pixel point; />Indicate->Under the scale->The>The +.o. of the local area to be calculated>A first magnitude to be calculated for each pixel point; />Indicate->Maximum value of amplitude values of all pixel points in image to be identified under scale, < >>Representing a maximum function; />Indicate->Under the scale->The>A first phase of each pixel; />Indicate->Under the scale->The>The +.o. of the local area to be calculated>A first phase to be calculated of the pixel points; / >An exponential function based on a natural constant is represented. It should be noted that +.>The model is used only to represent the negative correlation and the result of the constraint model output is at +.>Within the interval, other models with the same purpose can be replaced in the implementation, and the embodiment is only toThe model is described as an example, without specific limitation, wherein +.>An input representing the model; />The maximum angle value is represented and used for angle normalization processing. Wherein (1)>Indicate->The target local area is +.>The energy relation of the pixel points at the same position between the local areas to be calculated is generally represented by using the square of the amplitude, so that the similarity of the energy relation is represented by calculating the difference between the square of the first amplitude between the target local area and the local area to be calculated and the square of the first amplitude to be calculated, and if the difference is smaller, the energy between the target local area and the local area to be calculated is indicated to be more similar, namely the similarity of the frequency domain information distribution between the two local areas is larger; />And carrying out characteristic combination on the amplitude difference value and the phase difference value of the pixel points at the same position between the target local area and the local area to be calculated under the same scale in an L2 norm mode, wherein the smaller the amplitude difference value and the phase difference value of the pixel points at the same position between the two local areas, the larger the frequency domain information distribution similarity between the target local area and the local area to be calculated under the scale is.

S004, acquiring comprehensive similarity of the target local area and the local area to be calculated according to the acquired spatial domain information distribution similarity and the frequency domain information distribution similarity of the target local area and the local area to be calculated; and adjusting the gradient histogram according to the comprehensive similarity of the target local area and the local area to be calculated.

It should be noted that, in acquiring the spatial domain information distribution similarity and the frequency domain information distribution similarity between the target local area and the local area to be calculated, it is necessary to construct and acquire the comprehensive similarity characteristics between the target local area and each local area to be calculated, and further acquire the direction weight of the extension distribution information according to the comprehensive similarity characteristics of the local areas. When the comprehensive similarity characteristics between the local areas and each local area to be calculated are obtained, as the characteristic information of the eye images represented by different local areas is different, the partial local areas are represented by the texture characteristics of the surface of the eye images, and the partial local areas are represented by the extending distribution characteristics of the eye angular lines, different fusion weights are required to be set for different local areas, and the determination of the fusion weights is related to the spatial domain information distribution similarity and the frequency domain information distribution similarity. After the comprehensive similarity features are obtained, the distribution of the similarity between the local area and each part to be calculated can be determined, and then in the corresponding HOG feature descriptor generation process, the angle weight of the vote of the gradient angle in the gradient histogram generation process, namely the direction weight of the extension distribution information, can be obtained.

Specifically, will beThe sum of the spatial information distribution similarities between the individual target partial areas and all the partial areas to be calculated of the target partial areas is marked +.>Wherein->，/>Indicate->The number of partial areas to be calculated of the target partial areas, is->Indicate->The target local area is +.>Spatial domain information distribution similarity among the local areas to be calculated; likewise, get->The sum of the frequency domain information distribution similarities between the individual target local area and all the local areas to be calculated of the target local area is denoted +.>WhereinWherein->Indicate->The target local area is +.>The frequency domain information distribution similarity between the partial areas to be calculated. For->The target local area is +.>To-be-calculated partsComprehensive similarity between regions->The calculation method of (1) is as follows:

wherein,indicate->Accumulating and summing the airspace similarity between each target local area and all to-be-calculated local areas of the target local area; />Indicate->The frequency domain information distribution similarity between each target local area and all to-be-calculated local areas of the target local area is accumulated and summed; / >Indicate->The target local area is +.>Spatial domain information distribution similarity among the local areas to be calculated; />Indicate->The target local area is +.>Local regions to be calculatedAnd (5) frequency domain information distribution similarity among the two. The method comprises the steps that through the accumulation sum of spatial domain information distribution similarities of a target local area and a local area to be calculated and the fusion weight of a frequency domain information distribution similarity accumulation sum as the target local area, when the frequency domain information distribution similarities are smaller, the target local area mainly contains more extension distribution information, the frequency domain information distribution similarities which need to be considered when the comprehensive similarities are obtained are reduced, the fusion weight of the spatial domain information distribution similarities in calculating the comprehensive similarities is reduced, and the fusion weight of the frequency domain information distribution similarities in calculating the comprehensive similarities is amplified; on the contrary, when the space domain information distribution similarity is smaller, the fact that the target local area mainly contains more texture distribution information is indicated, the space domain information distribution similarity is considered when the comprehensive similarity is acquired, the fusion weight of the space domain information distribution similarity in calculating the comprehensive similarity is amplified, and the fusion weight of the frequency domain information distribution similarity in calculating the comprehensive similarity is reduced.

Further, gradient angles and gradient amplitudes of all pixel points in the target local area are obtained; acquiring a HOG feature descriptor of a target local area, wherein the HOG feature descriptor comprises a gradient histogram, and the gradient histogram comprises a vote value of each gradient amplitude; according to the comprehensive similarity of the target local area and each local area to be calculated, obtaining a comprehensive similarity data set of the target local area, and carrying out each data in the comprehensive similarity data setAnd (5) carrying out normalization processing to obtain a normalized comprehensive similarity data set. And acquiring the local area to be calculated corresponding to the maximum value in the normalized comprehensive similarity data set of the local area, and recording the local area as a reference local area. Acquiring a connecting line of the center of the target local area and the center of the reference local area, and recording the angle of the connecting line as a reference angle, wherein the value range of the angle is +.>. The preset angle difference threshold (10 in this embodiment, may be set according to the implementation)According to the specific implementation situation), in the target local area, acquiring gradient angles of all pixel points, and if the absolute value of the difference value between the gradient angles and the reference angle is smaller than a preset angle difference threshold value, marking the gradient angle as an angle to be adjusted. The maximum value in the comprehensive similarity data set is marked as the extension distribution information direction weight +. >The voting number of the gradient amplitude corresponding to the angle to be adjusted is adjusted, and the voting number after adjustment is +.>The calculation method of (1) is as follows:

wherein,indicating the direction weight of the spread distribution information,/-, and>voting value representing gradient amplitude corresponding to angle to be adjusted,/->Representing a downward rounding function. Obtaining an adjusted gradient histogram according to the adjusted vote count value, and further obtaining an adjusted HOG feature descriptor, wherein the process of constructing the gradient histogram by the HOG feature descriptor is a well-known technology, and this embodiment is not described in detail.

S005, training according to the adjusted HOG feature descriptors of all the images to be identified, which are obtained as input data of the eye state identification network, and obtaining the eye state identification network.

Inputting the adjusted HOG feature descriptor into a trained neural network, the network outputting eye states including: no four eye states of canthus, slight canthus, moderate canthus, severe canthus. The neural network used in this embodiment is ResNet500, and other neural networks and other eye states may be used in other embodiments, and this embodiment is not particularly limited. The training method of the neural network comprises the following steps: collecting a large number of eye images, obtaining adjusted HOG feature descriptors on each eye head portrait by using the method of the embodiment, marking the eye state of the eye images by a professional as labels of the adjusted HOG feature descriptors in the eye images, taking the adjusted HOG feature descriptors and the corresponding labels in all the eye images as a data set, training a neural network by using the data set, and training a loss function used as a cross entropy loss function. The training method of the second neural network is a known technology, and the present embodiment is not limited in detail.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the invention, but any modifications, equivalent substitutions, improvements, etc. within the principles of the present invention should be included in the scope of the present invention.

Claims

1. A computer vision-based eye state recognition method, characterized in that the method comprises the following steps:

2. The method for recognizing eye states based on computer vision according to claim 1, wherein the steps of constructing a two-dimensional distribution matrix for each local region and performing SVD decomposition to obtain feature vectors of the local region include the following specific steps:

3. The method for recognizing eye states based on computer vision according to claim 1, wherein the step of obtaining the spatial domain information distribution similarity between the target local area and the local area to be calculated based on the cosine similarity between the feature vector of the target local area and the feature vector of the local area to be calculated and the gray value difference between the target local area and the local area to be calculated comprises the following specific steps:

4. The computer vision based eye state recognition method of claim 3, wherein the first cosine similarity dataset for each feature vector of the target local area is obtained; the method for acquiring the second cosine similarity data set of each feature vector of the local area to be calculated comprises the following specific steps:

5. The method for recognizing eye states based on computer vision according to claim 3, wherein the step of obtaining the spatial domain information distribution similarity between the target local area and the local area to be calculated based on the first cosine similarity data set and the second cosine similarity data set and the gray value difference between the target local area and the local area to be calculated of the target local area comprises the following specific steps:

for the firstA target local area and a first ∈of the target local area>Spatial information distribution similarity between the partial regions to be calculated +. >The calculation method of (1) is as follows:wherein (1)>Indicate->Gray value average values of the local areas of the targets; />Indicate->The>The gray value average value of each local area to be calculated;indicate->The number of feature vectors for each target local region; />Indicate->The>The number of feature vectors of the local area to be calculated; />Indicate->The>A first cosine similarity dataset of the feature vectors of the individuals; />Indicate->The>The +.o. of the local area to be calculated>A second cosine similarity dataset of the feature vectors of the individuals; />Representing a maximum function; />An exponential function based on a natural constant is represented.

6. The method for identifying the eye state based on computer vision according to claim 1, wherein the step of obtaining the similarity of the frequency domain information distribution between the target local area and the local area to be calculated according to the frequency spectrums of the target local area and the local area to be calculated under different scale windows comprises the following specific steps:

For the firstA target local area and a first ∈of the target local area>Similarity of frequency domain information distribution between the individual local areas to be calculated +.>The calculation method of (1) is as follows:wherein (1)>Representing the number of dimensions; />Representing the number of pixels in the local area, wherein the number of pixels in each local area is the same; />A sequential position number indicating a pixel point in the local area; />Indicate->Under the scale->The>A first amplitude of each pixel point; />Indicate->Under the scale->The>The first local area to be calculatedA first magnitude to be calculated for each pixel point; />Indicate->Maximum value of amplitude values of all pixel points in image to be identified under scale, < >>Representing a maximum function; />Indicate->Under the scale->The>A first phase of each pixel; />Indicate->Under the scale->The>The +.o. of the local area to be calculated>A first phase to be calculated of the pixel points; />An exponential function based on a natural constant is represented.

7. The method for identifying eye states based on computer vision according to claim 6, wherein the first amplitude and the first phase under different scale window sizes are obtained for the target local area under the same scale window size; the method comprises the following specific steps of:

8. The method for identifying the eye state based on computer vision according to claim 1, wherein the step of obtaining the comprehensive similarity of the target local area and the local area to be calculated according to the spatial domain information distribution similarity and the frequency domain information distribution similarity between the target local area and the local area to be calculated comprises the following specific steps:

for the firstThe target local area is +.>Comprehensive similarity between individual local regions to be calculatedThe calculation method of (1) is as follows: />Wherein,indicate->Accumulating and summing the airspace similarity between each target local area and all to-be-calculated local areas of the target local area; />Indicate->The frequency domain information distribution similarity between each target local area and all to-be-calculated local areas of the target local area is accumulated and summed; / >Indicate->The target local area is +.>Spatial domain information distribution similarity among the local areas to be calculated; />Indicate->The target local area is +.>The frequency domain information distribution similarity between the partial areas to be calculated.

9. The method for recognizing eye states based on computer vision according to claim 1, wherein the step of adjusting the gradient histogram of the target local area according to the comprehensive similarity, and obtaining the adjusted HOG feature descriptor according to the adjusted gradient histogram, comprises the specific steps of:

according to the comprehensive similarity of the target local area and each local area to be calculated, obtaining a comprehensive similarity data set of the target local area, and carrying out each data in the comprehensive similarity data setNormalizing to obtain a normalized comprehensive similarity data set;

the maximum value in the comprehensive similarity data set is marked as the direction weight of the extension distribution informationThe voting number of the gradient amplitude corresponding to the angle to be adjusted is adjusted, and the voting number after adjustment is +.>The calculation method of (1) is as follows: />Wherein (1)>Indicating the direction weight of the spread distribution information,/-, and>voting value representing gradient amplitude corresponding to angle to be adjusted,/->Representing a downward rounding function;

10. Computer vision based eye state recognition system comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the computer vision based eye state recognition method according to any one of claims 1-9 when the computer program is executed.