CN115599937A - Retrieval and classification-based image content attribute extraction method and system - Google Patents

Retrieval and classification-based image content attribute extraction method and system Download PDF

Info

Publication number
CN115599937A
Authority
CN
China
Prior art keywords
image
classification
retrieval
attribute
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211138151.4A
Other languages
Chinese (zh)
Inventor
杜学绘
王娜
李峰
任志宇
王文娟
曹利峰
刘敖迪
单棣斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202211138151.4A priority Critical patent/CN115599937A/en
Publication of CN115599937A publication Critical patent/CN115599937A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and particularly relates to an image content attribute extraction method and system based on retrieval and classification. A retrieval set containing the images of the training set of each image classification task is constructed, together with an image attribute generator for extracting image content attributes; the generator is composed of image-related elements in a resource set and the corresponding relations among those elements. Images are represented by sample points in a feature space; the image most similar to the image to be detected is retrieved from an image retrieval library according to the feature space distance between images, the feature space distance to that most similar image is recorded, and the corresponding image content attribute pair is obtained through the most similar image. The obtained image content attribute pair is taken as a recommended attribute, and the image content attribute of the image to be detected is predicted from the recommended attribute. The invention can learn continuously while ensuring image classification accuracy, can effectively identify group classes that have not been learned, provides high-quality image content attributes for attribute-based access control, and is convenient to apply in actual scenes.

Description

Retrieval and classification-based image content attribute extraction method and system
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image content attribute extraction method and system based on retrieval and classification.
Background
The large-scale application of computer networks brings convenience, but it also creates information security risks, such as illegal access and private data leakage, which cause unnecessary trouble to normal life and business activities. Besides information encryption, firewalls, and user authentication, the most important means of information security is access control. Access control not only requires that only legitimate users can access the resources in the system, but also ensures information security to the maximum extent, because legitimate users can access resources only within their specified authority. Over decades of development, access control models have evolved from the initial Discretionary Access Control (DAC) and Mandatory Access Control (MAC) to Task-Based Access Control (TBAC), Role-Based Access Control (RBAC), and Attribute-Based Access Control (ABAC). ABAC is highly flexible and its policies are simple to formulate, so it is better suited to dynamic access control scenarios under big data. If ABAC is taken as the standard, TBAC and RBAC become two special cases, i.e., ABAC is compatible with TBAC and RBAC. In the zero-trust architecture that has emerged in recent years, ABAC is more suitable than RBAC.
In access control, there are four basic elements: subject, object, environment, and policy. The subject is the initiator of the access and can be an ordinary user or a process; the object is the accessed resource, generally a file or a service; the policy is the basis for judging whether the subject may access the object, and is expressed through constraint conditions describing the subject, object, and environment attributes. Access control objects include various system resources such as data and services, and the most common data comprises structured and unstructured data. In conventional access control methods, the attributes of an object are usually attribute identifiers with a uniform standard, such as title, ownership, creation time, and content. The content attributes of structured data can be obtained by keyword matching. For unstructured data, however, the semantic description of its content is highly abstract and general, and such data can usually only be labeled manually. An image is typical unstructured data whose most basic data units are pixels, and the image content generally refers to the semantics of the subject in the image. Because it is difficult to establish a strict mapping from pixels to semantics, image content attributes could for a long time only be obtained by manual labeling, which hardly meets real-time and accuracy requirements. Using deep learning, the image category label can serve as the description of the content attribute; this is simple to implement and highly accurate when training data are sufficient and relatively fixed. In practical applications, however, image data are generated quickly and in large volume, and the classification method itself is limited, so this attribute extraction approach mainly faces two problems: the classification model's ability to extract information drops sharply when crossing domains, and it lacks the ability to adjust dynamically (i.e., to learn continuously).
Disclosure of Invention
Therefore, the invention provides an image content attribute extraction method and system based on retrieval and classification, which can learn continuously while ensuring image classification accuracy, can effectively identify group classes that have not been learned, provides high-quality image content attributes for attribute-based access control, and is convenient to apply in actual scenes.
According to the design scheme provided by the invention, the image content attribute extraction method based on retrieval and classification is provided, and comprises the following contents:
constructing a retrieval set containing the images of each image classification task training set, and an image attribute generator for extracting image content attributes, wherein the image attribute generator is composed of image-related elements in a resource set and the corresponding relations among the elements, and the related elements at least comprise: a classification model for obtaining image category attributes, the class name of the category to which the image belongs, and the name of the classification task group corresponding to that category; any image in the image retrieval library has a unique group name and a unique class name, and the group name and the class name form the corresponding image content attribute pair;
representing images by sample points in the feature space, retrieving from the image retrieval library the image most similar to the image to be detected according to the feature space distance between images, recording the feature space distance between the image to be detected and the most similar image, and obtaining the corresponding image content attribute pair through the most similar image; and taking the obtained image content attribute pair as a recommended attribute, and predicting the image content attribute of the image to be detected according to the recommended attribute.
As the image content attribute extraction method based on retrieval and classification, further, the retrieval set is composed of a feature extraction model, an image feature vector library and a vector attribute library, and the resource set of the image attribute generator is composed of several group structures, each of which comprises: a group name representing the group ID, a class name list giving all class names in the group and the number of images of each class added to the retrieval set, a classification model for obtaining the image category attribute, an M value that is an output parameter of the classification model, and a vector ID list recording which vectors in the image feature vector library belong to each class of the group.
As the image content attribute extraction method based on retrieval and classification, a lightweight deep learning model is adopted as a classification model, the labeled data set is used for training and learning of the model, and the training and learning process of the model is completed before the group structure is established.
As the image content attribute extraction method based on retrieval and classification, the feature extraction model is pre-trained by using a known data set, the image feature vector library comprises image feature vectors of various categories in each group of the resource set, each image feature vector is provided with an ID, and a corresponding relation is generated by using the ID and the image attribute.
As the image content attribute extraction method based on retrieval and classification, further, for the most similar image, its image feature vector is matched in the vector attribute library, and the image content attribute pair corresponding to that feature vector is obtained through the match.
As the image content attribute extraction method based on retrieval and classification, the invention further provides that, for the acquired feature space distance and the recommended attribute: if the feature space distance between the image to be detected and the most similar image is greater than a preset distance threshold, the image to be detected is judged to belong to an unknown group; otherwise, the group name in the recommended attribute is accepted as the group name of the image to be detected, and the class name of the image to be detected is predicted according to the classification probability and the power law distribution of image data of the same group.
As the image content attribute extraction method based on retrieval and classification, further, for setting the preset distance threshold, the class name in the acquired image content attribute is used as the category keyword, the nearest neighbor image feature vectors of that category are retrieved in the image feature vector library according to a ranking condition, the maximum feature space distance is obtained from these nearest neighbor vectors, and the distance threshold is set according to that maximum feature space distance.
As the retrieval and classification-based image content attribute extraction method, in predicting the class name of the image to be detected, firstly, for the image to be detected, an image probability vector is obtained through the classification model, the logarithm of each probability element of the vector is taken, and the power law distribution of the probability vector is obtained from the logarithmic values; then, sample image data of the same group are taken to carry out a plurality of power law distribution experiments, an xy coordinate system is established with L_(⌈√n⌉−1) as the abscissa x and L_0 as the ordinate y, and the concentrated distribution of positive and negative samples is obtained in this coordinate system, where n is the dimension of the image probability vector output by the classification model, L is the descending-order sequence of the class probabilities output by the classification model for the input image I to be detected, and L_0 is the head of the descending sequence L; then, the class name of the image to be detected is obtained by retrieval or classification according to the concentrated distribution rule of the positive and negative samples.
The image content attribute extraction method based on retrieval and classification further uses the concentrated distribution rule of positive and negative samples to obtain the class name of the image to be detected, taking
L_(⌈√n⌉−1)/(2M) + L_0 > 1
as the classification condition: when the classification condition is met, the class name of the image to be detected is obtained by the image classification method using the classification model corresponding to the group name; otherwise, the class name in the recommended attribute corresponding to the group name is accepted by the image retrieval method, where M is the M-value output parameter of the classification model, M = L_(⌈√n⌉−1).
Further, the present invention also provides a system for extracting image content attributes based on retrieval and classification, comprising: a data construction module and a prediction output module, wherein,
the data construction module is used for constructing a retrieval set containing the images of each image classification task training set, and an image attribute generator for extracting image content attributes, the image attribute generator being composed of image-related elements in the resource set and the corresponding relations among the elements, wherein the related elements at least comprise: a classification model for obtaining image category attributes, the class name of the category to which the image belongs, and the name of the classification task group corresponding to that category; any image in the image retrieval library has a unique group name and a unique class name, and the group name and the class name form the corresponding image content attribute pair;
the prediction output module is used for representing images by sample points in the feature space, retrieving from the image retrieval library the image most similar to the image to be detected according to the feature space distance between images, recording the feature space distance between the most similar image and the image to be detected, and obtaining the corresponding image content attribute pair through the most similar image; and taking the obtained image content attribute pair as a recommended attribute, and predicting the image content attribute of the image to be detected through the recommended attribute.
The invention has the beneficial effects that:
the invention divides the image content attribute into two levels of groups and classes, and continuously learns by taking the groups as units. Giving high quality attributes of the image data by combining image retrieval and classification; the known and unknown group classes are distinguished in groups, the group name of the image to be detected with the known class is judged, the class name is obtained in the group by classification and combination of a classification and retrieval method, and misjudgment caused by the simple use of an image classification method is reduced; and for the images of the unknown group class, new groups can be continuously learned, and the knowledge of the system can be expanded, so that the images of the unknown group class are converted into the known group class. Further, experimental data prove that the scheme can improve the classification accuracy, and has certain continuous learning capacity while enhancing the robustness.
Description of the drawings:
FIG. 1 is a schematic diagram illustrating an exemplary process for extracting image content attributes;
FIG. 2 is a schematic diagram of an embodiment in which the probability vectors and feature vectors are taken from different layers in a deep neural network;
FIG. 3 is a schematic diagram of spatial sample distribution of image classification and retrieval in an embodiment;
FIG. 4 is a schematic illustration of the positioning of the image attribute generator IAE in an embodiment;
FIG. 5 is an illustration of an embodiment of an image attribute generator architecture;
FIG. 6 is a schematic power law distribution diagram of an embodiment;
FIG. 7 is a schematic diagram showing that the output of the image classification model in the embodiment follows a power law distribution;
FIG. 8 is a schematic diagram showing the distribution of positive (green) and negative (red) samples in the example;
FIG. 9 is a diagram illustrating the determination of the group name of an image to be detected by retrieving nearest neighbor vectors in the embodiment;
FIG. 10 is a flowchart of an image content attribute extraction algorithm in the embodiment;
fig. 11 is an effect illustration of access control in the embodiment.
The specific implementation mode is as follows:
in order to make the objects, technical solutions and advantages of the present invention clearer and more obvious, the present invention is further described in detail below with reference to the accompanying drawings and technical solutions.
Access control is the last line of defense of information security and ensures that a legitimate user can only access resources within a certain range of authority. In the big data era, traditional access control methods are difficult to apply because information is updated quickly and in large volume, so attribute-based access control has emerged, and extracting high-quality attributes becomes a key element. An image is typical unstructured data, and how to effectively extract its content attributes directly influences the effective implementation of access control. Most existing research is limited to using image classification methods to obtain attributes, which depend heavily on training data, are weakly robust, and lack cross-group knowledge.
To this end, an embodiment of the present invention provides an image content attribute extraction method based on retrieval and classification, including:
s101, constructing a retrieval set containing images of each image classification task training set and an image attribute generator for extracting image content attributes, wherein the image attribute generator is formed by image related elements in a resource set and corresponding relations among the elements, and the related elements at least comprise: the image retrieval system comprises a classification model, a classification task group name and a classification task group name, wherein the classification model is used for obtaining image category attributes, the class name of the category to which the image belongs and the classification task group name corresponding to the category to which the image belongs, any one image in an image retrieval library has a unique group name and a unique class name, and a corresponding image content attribute pair is formed by the group name and the class name;
s102, representing images by using sample points in a feature space, retrieving and acquiring a most similar image to the image to be detected from an image retrieval library according to a feature space distance between the images, recording the feature space distance between the most similar image and the image to be detected, and acquiring a corresponding image content attribute pair through the most similar image; and taking the obtained image content attribute pair as a recommended attribute, and predicting the image content attribute of the image to be measured according to the recommended attribute.
Attributes are the basis of ABAC access control policies, and high-quality attributes help simplify the expression of policies and improve the overall efficiency of the system. There is relatively little research on access control specific to image data; existing work mostly relies on manually synthesized and labeled images and focuses on geographic data access control and the like. Compared with traditional structured data, unstructured image data is characterized by a large data volume, and there is no clear-cut definition connecting its basic units (pixels) with high-level semantics (such as class labels). For a long time it has been converted into structured data by manual annotation. Under big data conditions, manual labeling suffers from poor timeliness and high labeling cost, as well as highly subjective descriptions and the lack of a unified standard. With the continuous development of artificial intelligence theory, deep learning represented by Convolutional Neural Networks (CNNs) has in recent decades gradually become the mainstream method for image classification tasks; for example, residual networks (ResNet) outperform human testers in image classification, and classification accuracy keeps improving with deeper and larger models such as InceptionV4, ViT, DenseNet, and ConvNeXt. Extracting image content attributes using an image classification model is therefore the most straightforward and efficient method.
The general method of image classification using deep learning is as follows: with manually labeled images of each category as the training set, a neural network model with a suitable parameter scale is selected, and a loss function is used to judge whether the classification result meets the requirement. The model parameters are updated iteratively through the back-propagation algorithm until the value of the loss function meets the preset requirement, and the model architecture and all parameters are saved for classification prediction. As shown in fig. 2, when the model is used to predict the class of an image, a class probability vector is output for any input image, each component of the vector represents the probability that the input image belongs to the corresponding class, and the class with the highest probability is output as the prediction result. The model can generally be divided into convolutional layers, a fully connected layer and Softmax, and the input vector of the fully connected layer is generally called the feature vector of the image and is often used for image retrieval.
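To make the relationship in fig. 2 concrete, the following sketch shows how a single forward pass yields both the class probability vector and the fully-connected-layer input used as a feature vector. PyTorch/torchvision and an ImageNet-pretrained ResNet50 are assumed purely for illustration; they are not the specific model choices fixed by this description.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained classifier (illustrative choice, requires a recent torchvision).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

# Everything up to (but not including) the final fully connected layer:
# its flattened output is the "feature vector" referred to above.
backbone = torch.nn.Sequential(*list(model.children())[:-1])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def probs_and_feature(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feature = backbone(x).flatten(1)        # input of the FC layer
        probs = torch.softmax(model(x), dim=1)  # class probability vector
    return probs.squeeze(0), feature.squeeze(0)
```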
The image classification method based on deep learning has great limitations. On the one hand, its robustness is weak: classification accuracy is very sensitive to the choice of the training set, and even for the same classes, test results differ greatly when images come from a different domain than the training set; the subject position and orientation of the images in the training set can also affect classification. On the other hand, because it lacks continuous learning capability, the model has difficulty providing relevant knowledge when the class of the image to be tested does not belong to the classes of the training set. Regarding continuous learning, researchers have found that when a model trained on one task is fine-tuned on another task, its performance on the original task drops dramatically, a phenomenon known as catastrophic forgetting. Orthogonal Weight Modification (OWM) helps to overcome catastrophic forgetting, but it requires constructing a super model, which brings difficulties to practical application, and it provides no solution to the problem of weak model robustness.
Image retrieval is another basic problem in image processing; its purpose is to quickly find the image closest to a query image in a huge image library, and it can be divided into Text-Based Image Retrieval (TBIR) and Content-Based Image Retrieval (CBIR). Global description is a very effective content-based image retrieval method: it uses the fully-connected-layer output of an image classification model to extract image features (in the form of a high-dimensional feature vector) and converts the search for the closest image into a search for the nearest neighbor vector. In sharp contrast to the weak cross-domain capability of image classification methods, the image retrieval method performs well across a wide range of scenes and, after the model has learned sufficient feature expression through a large classification task, has strong generalization capability.
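As a minimal illustration of the retrieval idea above, content-based retrieval with global descriptors reduces to nearest-neighbour search in feature space; the function below is a sketch under that assumption, with the distance metric and array layout chosen only for the example.

```python
import numpy as np

def nearest_image(query_vec, library_vecs, library_ids):
    """Return the ID and distance of the library image closest to the query.

    library_vecs: 2-D array, one feature vector per row.
    """
    d = np.linalg.norm(library_vecs - query_vec, axis=1)  # Euclidean distances
    i = int(np.argmin(d))
    return library_ids[i], float(d[i])
```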
Different from a common image classification application scene (for example, defective product detection in industrial production only requires to distinguish qualified products from unqualified products), the access control system faces a wide range of data sources, and the attribute extraction method is required to have stronger generalization capability and continuous learning capability so as to ensure that the safety of the system is not affected when the difference between the image data to be processed and the images in the training set is larger. In the embodiment of the scheme, the image attributes are extracted by combining image retrieval with a plurality of image classification tasks, and the image content attributes are divided into two levels: group name G and class name C, i.e. attributes of an image may be represented by a string pair (const char G, const char C), the class name being taken from a category in the image classification, the group name representing to which image classification task the class belongs, both the group name and the class name having global uniqueness.
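The two-level attribute itself is just a pair of globally unique strings; the tiny sketch below shows one possible in-memory representation (the field names and the example values are hypothetical, not taken from the description).

```python
from typing import NamedTuple

class ContentAttribute(NamedTuple):
    group_name: str   # G: which classification task group the class belongs to
    class_name: str   # C: the category name within that group

# Example values are purely illustrative.
attr = ContentAttribute(group_name="Birds", class_name="example_class")
```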
As shown in fig. 3, an image may be represented by a sample point in a feature space. To distinguish the various classes of images in the feature space, the image classification method (fig. 3, left) divides the entire feature space with classification planes; the feature space and its division correspond to one specific classification task, and the separability of the class samples is high, but other classification tasks are not considered. Image retrieval (fig. 3, right) uses a general feature space and computes the most similar image from distances. The linear separability between different classes within a single group is slightly lower than when image classification uses a task-specific space, but samples of the same class are still largely clustered. With multiple groups, intra-group sample distances are small, inter-group sample distances are large, and images of unknown group classes are significantly far from the samples of known group classes.
The image retrieval library comprises the images of the training sets of all classification tasks, and any image in the library has a unique group name and a unique class name. When extracting attributes for a given image I to be detected, the image I′ closest to I in the library is first obtained by image retrieval, and its attribute (G′, C′) is used as the recommended attribute. Whether to accept the recommended group name is judged according to the relative distance between samples, and whether to change the class name is judged in combination with the image classification model, thereby realizing high-quality attribute extraction for image data by combining retrieval and classification.
As a preferred embodiment, further, the retrieval set is composed of a feature extraction model, an image feature vector library and a vector attribute library, and the resource set of the image attribute generator is composed of several group structures, each of which comprises: a group name representing the group ID, a class name list giving all class names in the group and the number of images of each class added to the retrieval set, a classification model for obtaining the image category attribute, an M value that is an output parameter of the classification model, and a vector ID list recording which vectors in the image feature vector library belong to each class of the group.
In the standard ABAC model, as in fig. 4, four main components are involved: the PEP (Policy Enforcement Point) accepts the user request and returns the requested resource or refuses to access according to the Policy matching result; the PDP (Policy Decision Point) makes the Decision of access or denial according to the requested attribute and Policy; PAP (Policy Administration Point) manages all policies; PIP (Policy Information Point) acquires and manages related attributes of a subject, environment, and resource.
The image itself has general attributes of the file such as author, creation time, extension, etc. In the embodiment of the present invention, an Image Attribute generator (IAE) is used as a branch of the PIP for extracting an Image content Attribute (hereinafter referred to as an Attribute). The specific names of the attributes are controlled by the administrator and updated in time according to task changes. When new image resources are added into the system, the IAE is responsible for extracting the attributes of the image file resources. An access control policy may then be formulated based on the resource attributes.
As the image classification and retrieval method is involved, the IAE needs to contain various related elements and corresponding relations, and as shown in fig. 5, the structure thereof is divided into three parts: resource sets, search sets, and method sets.
The resource set contains several group structures, each of which includes a group name, a class name list, a classification model, an M value, and a vector ID list, as shown in fig. 5. In the IAE, group names and class names must be unique, and the class name list contains all class names in the group together with the number of images k_i of each class added to the retrieval set. The classification model (model) is the tool for obtaining the image class attribute and is trained before the group structure is built; training uses the deep-learning image classification training method to learn the model parameters on a labeled data set until the expected classification accuracy is reached. When the IAE contains many groups, a lightweight model can be selected to reduce system resource consumption. The M value represents the height of the long-tail part of the classification model output and is an important parameter for calculating the classification condition when image attributes are extracted; it is defined by formula (1), where n is the number of classes in the group and L is the descending-order sequence of the class probabilities output by the model for an input image I to be detected, as in formula (2):
M = L_(⌈√n⌉−1)    (1)
L = sort_reverse(model(I))    (2)
The definition of M is based on an analysis of the power law distribution. The power law distribution is a commonly used probability distribution characterized by a peak and a long tail. The peak reflects the strong separation between a small number of large values and the whole; for a system whose output follows a power law distribution, the system state can be approximately represented by a small number of large values. The values in the long tail are very small, and their distribution rule needs to be analyzed after taking logarithms. Assuming the functional relationship y = f(x) obeys a power law distribution, it is generally expressed as equation (3):
y = a·x^(−k),  (x, a, k > 0)    (3)
Taking the logarithm of both sides gives:
lg(y) = lg(a) − k·lg(x),  (a, k > 0)
Let:
y′ = lg(y);  x′ = lg(x)
to obtain:
y′ = lg(a) − k·x′    (4)
whose graph is a straight line. Thus, according to equation (4), a power law distribution can be identified by checking whether the plot of the logarithm of the independent variable against the logarithm of the function value exhibits a linear characteristic, as shown in fig. 6.
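A quick numerical check of equation (4), under arbitrarily chosen values of a and k for the example: sampling y = a·x^(−k) and fitting a line to the log-log points recovers slope −k and intercept lg(a).

```python
import numpy as np

a, k = 2.0, 1.5                          # arbitrary example parameters
x = np.arange(1, 11, dtype=float)
y = a * x ** (-k)                        # equation (3)

slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
assert abs(slope + k) < 1e-9             # slope ≈ -k, per equation (4)
assert abs(intercept - np.log10(a)) < 1e-9
```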
In engineering applications, the probability distribution of a single independent variable generally follows a normal distribution, whereas the output of a complex system with self-organized criticality often exhibits a power law distribution, as in the sandpile model. Power law distributions not only occur widely in the brain; some layers in deep neural networks also follow them. For an input image I, as shown in fig. 7(a), let L = (L_0, L_1, …, L_i, …, L_(n−1)) be the n-dimensional probability vector output by the classification model, with the components arranged in descending order. Viewing L_i as a function of i, the function image is the set of discrete points (1, L_0), (2, L_1), …, (i, L_(i−1)), …, (n, L_(n−1)), as shown in fig. 7(c); taking the base-10 logarithm of both coordinates gives the points (lg(1), lg(L_0)), (lg(2), lg(L_1)), …, (lg(i), lg(L_(i−1))), …, (lg(n), lg(L_(n−1))), as shown in fig. 7(d). The plot of L conforms to a power law distribution, and the linear characteristic of its middle section is obvious; the horizontal center of the plot is 0.5·lg(n), and the nearest point on its right has sequence number ⌈√n⌉, corresponding to the ordinate L_(⌈√n⌉−1).
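The M value of formulas (1) and (2), as reconstructed above, can then be read off directly from the sorted probabilities; a small sketch (numpy assumed, names illustrative):

```python
import math
import numpy as np

def m_value(prob_vector):
    """Height of the long tail: the component at sequence number ceil(sqrt(n))."""
    L = np.sort(np.asarray(prob_vector, dtype=float))[::-1]  # formula (2)
    n = L.size
    return float(L[math.ceil(math.sqrt(n)) - 1])             # formula (1)
```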
The vector ID list records which vectors in the image feature vector library are included in each class in the group and is created when the group structure is generated.
The retrieval set comprises a general feature extraction model Model_f, an image feature vector library, and a vector attribute library. The feature extraction model is a classification model pre-trained on a large data set, whose fully-connected-layer output is used; the feature vector corresponding to the image I to be detected is denoted I_f. The image feature vector library contains the image feature vectors of every class in every group of the resource set; each vector is assigned a unique ID when it is stored, and the feature vectors are associated with their images through these IDs. To control the size of the library, the number of images k_i of any one category is limited to 0 < k_i ≤ k_0, where the value of k_0 must be specified when establishing the IAE; a default of k_0 = 100 may be set.
The vector attribute library records the attributes of the image corresponding to all vectors in the image feature vector library, and the image attributes can be inquired by inputting the vector ID.
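For illustration only, the two libraries can be pictured as a pair of keyed stores indexed by vector ID; a production system might back them with a vector database and a key-value store, which is not prescribed here.

```python
feature_library = {}    # vector ID -> image feature vector
attribute_library = {}  # vector ID -> (group name, class name)

def add_vector(vector_id, feature, group_name, class_name):
    feature_library[vector_id] = feature
    attribute_library[vector_id] = (group_name, class_name)

def query_attributes(vector_id):
    """Look up the image attribute pair of a stored vector by its ID."""
    return attribute_library.get(vector_id)
```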
As a preferred embodiment, further, when predicting the class name of the image to be detected, an image probability vector is first obtained for the image through the classification model, the logarithm of each probability element of the vector is taken, and the power law distribution of the probability vector is obtained from the logarithmic values; then multiple power law distribution experiments are carried out on sample image data of the same group, and the concentrated distribution of positive and negative samples is obtained by constructing an xy coordinate system.
Taking images from the same training set and carrying out multiple experiments, with L_(⌈√n⌉−1) as the abscissa x and L_0 as the ordinate y, it can be seen that the positive and negative samples are concentrated in two different directions, as shown in fig. 8. For all samples in the same group, taking the group's M value, the point (M, 0.5) is marked as A and (0, 1) as B. When the sample to be detected lies above the line AB, the accuracy of obtaining the attribute by the classification method is high; when it lies below AB, the classification accuracy is low, and it is more suitable to obtain the attribute by the retrieval method.
The equation of line AB is:
x/(2M) + y = 1
Substituting formula (1), i.e., taking x = L_(⌈√n⌉−1) and y = L_0, a point above the line satisfies:
L_(⌈√n⌉−1)/(2M) + L_0 > 1
Letting the left-hand side equal T, inequality (5) is called the classification condition:
T = L_(⌈√n⌉−1)/(2M) + L_0 > 1    (5)
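A hedged sketch of the classification condition (5) as reconstructed above; the group's M value is passed in, and the function simply reports whether the sample lies above line AB.

```python
import math
import numpy as np

def classification_condition(prob_vector, M):
    """True if T = L_(ceil(sqrt(n))-1)/(2M) + L_0 > 1, i.e. classify; else retrieve."""
    L = np.sort(np.asarray(prob_vector, dtype=float))[::-1]
    n = L.size
    T = float(L[math.ceil(math.sqrt(n)) - 1]) / (2.0 * M) + float(L[0])
    return T > 1.0
```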
In the unified feature vector space, the distance between images within a group is relatively small, and the distance between groups is relatively large; if the group name of the image to be detected is unknown, a group name can be recommended by retrieving the most similar image, and when the distance in feature space between the image to be detected and its nearest similar image is too large, the image belongs to an unknown group. Because the feature vectors are not uniformly distributed in the space, judging whether a distance is relatively large requires comparing it with the distances among the existing vectors in the library:
as shown in FIG. 9, the feature vector corresponding to the image I to be measured is I f The most recent similar image obtained by the search is I 1 The corresponding vector is I 1f A distance of D 1 By querying the vector attribute library with its ID, I is obtained 1f Property (G) of 1 ,C 1 ),C 1 Class retrieval in image feature vector library I 1f Top (80% 1 ) Nearest neighbor vector, and the maximum distance is denoted as D 80 If:
D 1 <D 80 (6)
the attribute G1 is accepted as the group name of I, otherwise I is marked as an unknown group class image. Equation (6) may be set as the grouping condition.
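A sketch of the grouping condition (6) under the reconstruction above: D_1 is compared with the spread of the matched class's own vectors, where k_1 is the number of retrieval-set images of class C_1 and the 80% fraction follows that reconstruction.

```python
import math
import numpy as np

def grouping_condition(d1, class_vectors, matched_vector):
    """True if D_1 < D_80, i.e. the recommended group name is accepted.

    class_vectors: 2-D array of the k_1 feature vectors of class C_1.
    matched_vector: the retrieved nearest vector I_1f.
    """
    k1 = len(class_vectors)
    d = np.sort(np.linalg.norm(class_vectors - matched_vector, axis=1))
    d80 = float(d[: max(1, math.ceil(0.8 * k1))].max())  # max distance among top 80% neighbours
    return d1 < d80
```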
In the architecture shown in fig. 5, the method set can be set to include three methods according to practical applications:
the method comprises the following steps: one group is added. The input is as follows: (group name, class name list (containing number k of images of each class), classification model, and training image set of each class satisfying number limitation). To generate a group structure, M is first calculated using the formula (1,2); then, converting the training image into a feature vector by using a feature extraction model, adding the feature vector into an image feature vector library, and adding the corresponding relation between the ID and the attribute into a vector attribute library; all intra-group vector IDs are recorded in a vector ID list. And finally adding the obtained group structure body into the resource set, and outputting the group name and the class name to the PIP.
Method 2: delete a group. Deleting a group only requires inputting the group name. According to the input group name, the PIP is first informed to delete the group name and the related class names; all vectors and attributes listed in the vector ID list are deleted from the image feature vector library and the vector attribute library; finally the group structure is deleted from the resource set. If the accuracy of attribute extraction within a group is low, or aliasing exists among groups during image retrieval, the administrator can delete the related groups and add new ones after retraining the classification model or adjusting the grouping structure.
Method 3: extract image attributes. The input is: an image file path. As shown in fig. 10, the image I_1 most similar to the image I to be detected is first obtained by image retrieval, the distance D_1 is recorded, and the attribute (G_1, C_1) of I_1 is taken as the recommended attribute. The grouping condition (6) is then checked: if it is not satisfied, the recommended attribute is rejected and the image is marked as belonging to an unknown group; if it is satisfied, the recommended group name G_1 is accepted. The classification condition (5) is then checked: if it is not satisfied, the recommended class name C_1 is accepted; if it is satisfied, the classification model of group G_1 is used to predict the image class name.
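Putting the two conditions together, the following self-contained toy sketch walks through the fig. 10 flow end to end. The in-memory data structures, the 80% fraction, the `classify` callback, and the formula reconstructions used here are simplifying assumptions for illustration, not the patented implementation itself.

```python
import math
import numpy as np

class ToyIAE:
    def __init__(self):
        self.vectors = []   # image feature vector library (list index = vector ID)
        self.attrs = []     # vector attribute library: ID -> (group name, class name)
        self.groups = {}    # resource set: group name -> {"M": ..., "classes": {...}}

    def add_group(self, group_name, M, class_features):
        """class_features: dict class_name -> iterable of feature vectors."""
        self.groups[group_name] = {"M": float(M), "classes": {}}
        for class_name, feats in class_features.items():
            ids = []
            for f in feats:
                self.vectors.append(np.asarray(f, dtype=float))
                self.attrs.append((group_name, class_name))
                ids.append(len(self.vectors) - 1)
            self.groups[group_name]["classes"][class_name] = ids

    def extract_attributes(self, feature, classify):
        """classify(group_name) -> (probability vector, predicted class) for this image."""
        library = np.stack(self.vectors)
        dists = np.linalg.norm(library - np.asarray(feature, dtype=float), axis=1)
        i = int(np.argmin(dists))
        d1, (g1, c1) = float(dists[i]), self.attrs[i]           # recommended attribute
        ids = self.groups[g1]["classes"][c1]
        dc = np.sort(np.linalg.norm(library[ids] - library[i], axis=1))
        d80 = float(dc[: max(1, math.ceil(0.8 * len(ids)))].max())
        if not d1 < d80:                                        # grouping condition (6)
            return ("unknown_group", None)
        probs, predicted_class = classify(g1)
        L = np.sort(np.asarray(probs, dtype=float))[::-1]
        n = L.size
        T = L[math.ceil(math.sqrt(n)) - 1] / (2.0 * self.groups[g1]["M"]) + L[0]
        return (g1, predicted_class) if T > 1.0 else (g1, c1)   # classification condition (5)
```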
Further, based on the foregoing method, an embodiment of the present invention further provides a system for extracting an image content attribute based on retrieval and classification, including: a data construction module and a prediction output module, wherein,
the data construction module is used for constructing a retrieval set containing the images of each image classification task training set, and an image attribute generator for extracting image content attributes, the image attribute generator being composed of image-related elements in the resource set and the corresponding relations among the elements, wherein the related elements at least comprise: a classification model for obtaining image category attributes, the class name of the category to which the image belongs, and the name of the classification task group corresponding to that category; any image in the image retrieval library has a unique group name and a unique class name, and the group name and the class name form the corresponding image content attribute pair;
the prediction output module is used for representing images by sample points in the feature space, retrieving from the image retrieval library the image most similar to the image to be detected according to the feature space distance between images, recording the feature space distance between the most similar image and the image to be detected, and obtaining the corresponding image content attribute pair through the most similar image; and taking the obtained image content attribute pair as a recommended attribute, and predicting the image content attribute of the image to be detected according to the recommended attribute.
To verify the validity of the scheme, the following further explanation is made by combining experimental data:
and (4) selecting a digital library virtual scene through experiments, and extracting attributes from the internal resource image for access control. The effectiveness of the attribute extraction method is checked from two aspects of robustness and continuous learning respectively, and related relevant elements are as follows:
setting a scene: some digital library is planned to host protected wildbird campaigns for 24-5-30 of 2022 months, and the event preparation team needs to access resources in the library image database in order to prepare multimedia courseware. The library contains public and non-public images, with three working groups for non-public images: "Land" (UC Mercded Land Use Dataset), "Office" (Office 10_ caltech), "bird" (10 Birds screened from ILSVRC2012, 1000 images per category). The class names contained in each group are shown in table 1:
TABLE 1 group name and class name
(The contents of Table 1 are provided as images in the original publication.)
Training each group of classification models: for each group, 70 images of each category are selected as the training set, and after training the model is tested on all the images. The classification model is ResNet50; after training for 100 epochs, the model parameters with the highest classification accuracy on the test set are taken and saved.
Constructing the retrieval set: the retrieval set extracts image feature vectors using a VGG19 model pre-trained on the ILSVRC2012 dataset and builds the vector database with Milvus, using Euclidean distance as the retrieval distance. Each category uses all 70 images of its training data to extract features, and the vector attribute library is built with Redis.
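The feature-extraction step could look roughly like the sketch below, assuming torchvision's pre-trained VGG19 and taking the output of the last hidden fully connected layer as the descriptor; storage in Milvus and Redis is omitted and a plain numpy array is returned instead.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()

# Convolutional backbone + all classifier layers except the final 1000-way one,
# yielding a 4096-dimensional fully-connected-layer descriptor.
extractor = torch.nn.Sequential(
    vgg.features, vgg.avgpool, torch.nn.Flatten(1),
    *list(vgg.classifier.children())[:-1],
)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def vgg19_feature(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return extractor(x).squeeze(0).numpy()
```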
Constructing the IAE: Method 1 of the IAE method set is called to add the Office group for the robustness experiment, and the Land and Birds groups are added one by one for the multi-group image attribute extraction experiment, completing the construction of the IAE.
Baseline method: for the robustness experiment and the multi-group image attribute extraction experiment respectively, all known groups are combined, a unified classification model is trained, and class attributes are extracted with the image classification method.
Robustness experiment: assuming that the only known group is Office, the experiment uses the Office10 data set, extracts image attributes by image classification (Baseline) and by the proposed scheme, and compares the accuracy, as shown in Table 2.
Table 2 robustness test (accuracy (%))
Serial number   Domain name   Baseline   Proposed scheme
1               amazon        85.80      90.71
2               caltech       96.17      96.35
3               dslr          75.16      82.80
4               webcam        67.12      76.61
The Office10 data set has four domains with the same image class structure, but the images in different domains differ slightly. The accuracy of a classification model trained in only one domain drops significantly when it is used for classification in the other domains, so this data set is usually used to check the robustness of a classification method. The caltech domain was used as the Office group for training the classification model, so it has high accuracy. As can be seen from the table, compared with using image classification alone, extracting image attributes with the proposed scheme achieves higher accuracy in all four domains, with an average improvement of 5.56%, showing stronger robustness.
Multi-group image attribute extraction experiment: on the basis of the robustness experiment, the two known groups Land and Birds were added. The experiment used the images of each known group and images of an unknown group (100 images randomly selected from non-bird classes in ILSVRC2012), and extracted the attributes using image classification and the proposed scheme; the accuracy is shown in Table 3.
Table 3 multiple sets of image attribute extraction (accuracy (%))
(The contents of Table 3 are provided as images in the original publication.)
As seen from the table, the continued learning ability of the IAE was shown by adding multiple groups. For the known images of each group, the accuracy of the scheme is higher; for an unknown group of images, the image classification method cannot effectively identify the unknown group of images, and the scheme can distinguish the unknown group of images from the known group of images with higher accuracy.
Access control policy comparison: according to the scene setting and work tasks, registered users of the preparation group need to access images of birds and their habitats, including all images of the Birds group and some images of the Land group. Access is allowed until the activity ends, and the access control policy is expressed as follows:
Effect: allow
Operation: read
Subject: registered users of the preparation group
Resource: (expressed with the group and class attributes; the expression is provided as an image in the original publication)
Environment: before May 31, 2022
Due to the adoption of two-level attributes of groups and classes, the description of resources by using the image attributes in the scheme is simpler and more convenient, and the access control effect is as shown in figure 11. Compared with the method for extracting attributes in a classified manner, the scheme has higher robustness while keeping higher accuracy; IAEs can learn continuously in groups and can distinguish between known and unknown groups. The robustness is improved because the grouping condition is set, the image to be detected with larger difference with the training set is screened out, and the attribute is extracted by using the image retrieval method, so that the rapid decrease of the accuracy of the image classification method caused by weak algorithm robustness is avoided. The relative distance setting enables the IAE to continuously learn and identify unknown group images. In an image retrieval feature space, various sample points have the characteristic of clustering, various clusters in the same group are generally close to each other, but the distribution of samples is not uniform and has no regular shape, so that the scheme takes classes as a unit, and a threshold value of a distance is established locally to distinguish known groups from unknown groups. The method has dynamic property and ambiguity, and after a new group is learned, the image of the original unknown group name can be judged as a known group; when different images are selected to establish a search base, the sample points and the local thresholds of all groups of images are changed, and the experimental result has a certain degree of uncertainty. In addition, the two-level attributes of groups and classes also make the access control policy simpler.
In the embodiment of the scheme, an attribute extraction method combining image retrieval and classification is provided on the basis of image classification and attribute extraction based on access control requirements. The advantages of the classification and retrieval methods are combined by grouping and then classifying and setting classification conditions, and the effectiveness of the scheme is verified by using experimental results.
Unless specifically stated otherwise, the relative steps, numerical expressions and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The elements of the various examples and method steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and the components and steps of the examples have been described in a functional generic sense in the foregoing description for clarity of hardware and software interchangeability. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, which may be stored in a computer-readable storage medium, such as: read-only memory, magnetic or optical disk, and the like. Alternatively, all or part of the steps of the foregoing embodiments may also be implemented by using one or more integrated circuits, and accordingly, each module/unit in the foregoing embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: those skilled in the art can still make modifications or changes to the embodiments described in the foregoing embodiments, or make equivalent substitutions for some features, within the scope of the disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An image content attribute extraction method based on retrieval and classification is characterized by comprising the following steps:
constructing a retrieval set containing the images of the training set of each image classification task and an image attribute generator for extracting image content attributes, wherein the image attribute generator is composed of the image-related elements in a resource set and the corresponding relations between those elements, and the related elements at least comprise: a classification model for obtaining the image category attribute, the class name of the category to which the image belongs, and the group name of the classification task corresponding to that category; any image in the image retrieval library has a unique group name and a unique class name, and the group name and the class name form the corresponding image content attribute pair;
representing images as sample points in a feature space, retrieving from the image retrieval library the image most similar to the image to be detected according to the feature-space distance between images, recording the feature-space distance between the image to be detected and the most similar image, and obtaining the corresponding image content attribute pair from the most similar image; and taking the obtained image content attribute pair as a recommended attribute and predicting the image content attribute of the image to be detected from the recommended attribute.
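As an illustration of the retrieval step recited in claim 1, the hypothetical sketch below finds the most similar gallery image in feature space, records the distance, and returns that image's (group name, class name) pair as the recommended attribute; the class name RetrievalSet and the use of scikit-learn's NearestNeighbors are assumptions for the example, not part of the claims.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

class RetrievalSet:
    """Feature vectors of the retrieval-set images together with their
    (group name, class name) attribute pairs."""
    def __init__(self, features, attribute_pairs):
        self.features = np.asarray(features, dtype=np.float32)
        self.attribute_pairs = attribute_pairs           # list of (group, class)
        self.index = NearestNeighbors(n_neighbors=1).fit(self.features)

    def recommend(self, query_feature):
        """Return (feature-space distance, (group, class)) of the most similar image."""
        dist, idx = self.index.kneighbors(np.asarray(query_feature).reshape(1, -1))
        i = int(idx[0, 0])
        return float(dist[0, 0]), self.attribute_pairs[i]
```

The recorded distance is what later claims compare against the preset distance threshold before the recommended attribute is accepted.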
2. The retrieval and classification-based image content attribute extraction method according to claim 1, wherein the retrieval set is composed of a feature extraction model, an image feature vector library and a vector attribute library, and the resource set of the image attribute generator is composed of a plurality of group structures, each group structure comprising: a group name representing the group ID, a class name list recording all class names in the group and the number of images of each class added to the retrieval set, a classification model for obtaining the image category attribute, an M value representing an output parameter of the classification model, and a vector ID list recording, for each class in the group, the vectors contained in the image feature vector library.
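The group structure enumerated in claim 2 can be pictured as a simple record; the sketch below is an assumed data-model illustration whose field names paraphrase the claim and are not an implementation disclosed in the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class GroupStructure:
    group_name: str                                   # group ID
    class_names: Dict[str, int] = field(default_factory=dict)       # class name -> number of images added to the retrieval set
    classification_model: Any = None                  # model used to obtain the image category attribute
    m_value: int = 0                                  # output parameter M of the classification model
    vector_ids: Dict[str, List[int]] = field(default_factory=dict)  # class name -> IDs of its vectors in the image feature vector library
```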
3. The retrieval and classification-based image content attribute extraction method according to claim 1 or 2, wherein the classification model adopts a lightweight deep learning model, the model is trained on a labeled data set, and the training of the model is completed before the group structure is established.
4. The retrieval and classification-based image content attribute extraction method according to claim 2, wherein the feature extraction model is pre-trained on a known data set, the image feature vector library comprises the image feature vectors of each category in each group of the resource set, each image feature vector is provided with an ID, and a corresponding relation is generated between the ID and the image attribute.
5. The retrieval and classification-based image content attribute extraction method according to claim 2, wherein, for the most similar image, matching is performed in the vector attribute library according to the image feature vector of the most similar image, and the image content attribute pair corresponding to that image feature vector is obtained through the matching.
6. The retrieval and classification-based image content attribute extraction method according to claim 1 or 5, wherein, for the obtained feature-space distance and recommended attribute, if the feature-space distance between the image to be detected and the most similar image is greater than a preset distance threshold, it is determined that the current image to be detected is an image of an unknown group; otherwise, the group name in the recommended attribute is accepted as the group name of the image to be detected, and the class name of the image to be detected is predicted through the classification probability and the power-law distribution of the image data of the same group.
7. The retrieval and classification-based image content attribute extraction method according to claim 6, wherein, for the preset distance threshold, the class name in the obtained image content attribute pair is used as a category key, the nearest-neighbor image feature vectors of that category key are retrieved from the image feature vector library according to a ranking condition, the maximum feature-space distance is obtained from these nearest-neighbor image feature vectors, and the distance threshold is set according to the maximum feature-space distance.
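A sketch of the threshold rule in claim 7, under the assumption that the "ranking condition" means taking the k nearest neighbours within the category key's own vectors; the helper name, the choice of k and the margin factor are illustrative assumptions.

```python
import numpy as np

def distance_threshold_for_class(class_vectors, k=5, margin=1.0):
    """Illustrative rule for claim 7: within the vectors of one category key,
    take each vector's k nearest neighbours, use the maximum feature-space
    distance found, and scale it by an assumed margin factor to obtain the
    distance threshold."""
    v = np.asarray(class_vectors, dtype=np.float32)
    if len(v) < 2:
        return 0.0
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distances
    k = min(k, len(v) - 1)
    nearest = np.sort(d, axis=1)[:, :k]         # k nearest-neighbour distances per vector
    return float(nearest.max()) * margin        # maximum feature-space distance sets the threshold
```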
8. The retrieval and classification-based image content attribute extraction method according to claim 6, wherein, in predicting the class name of the image to be detected, firstly, for the image to be detected, an image probability vector is obtained through the classification model, the logarithm of each probability element in the image probability vector is taken, and the power-law distribution of the image probability vector is obtained from the logarithmic values; then, sample image data of the same group are taken to carry out a plurality of power-law distribution experiments, an xy coordinate system is established with the expression shown in Figure FDA0003853023740000021 as the abscissa x and L_0 as the ordinate y, and the concentrated distribution of positive and negative samples is obtained by means of the xy coordinate system, wherein n is the dimension of the image probability vector output by the classification model, L is the descending-order sequence of the class probabilities output by the classification model for the input image I to be detected, and L_0 is the head of the descending sequence L; and then the class name of the image to be detected is acquired by retrieval or by classification according to the concentrated-distribution rule of the positive and negative samples.
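The log-probability construction in claim 8 might be sketched as follows; because the abscissa expression is given only in Figure FDA0003853023740000021 and is not reproduced in the text, it is passed in here as a caller-supplied function, and all names are illustrative assumptions.

```python
import numpy as np

def power_law_point(prob_vector, abscissa_stat):
    """Sort the class probabilities in descending order to obtain the sequence L
    (L_0 is its head), take the logarithm of each probability element for the
    power-law view, and return one (x, y) point of the xy coordinate system:
    x is the statistic shown in Figure FDA0003853023740000021 (supplied by the
    caller), y is L_0."""
    p = np.asarray(prob_vector, dtype=np.float64)
    L = np.sort(p)[::-1]                  # descending sequence of class probabilities
    log_L = np.log(L + 1e-12)             # logarithm of each probability element
    x = abscissa_stat(L, log_L)           # expression of Figure FDA0003853023740000021
    y = L[0]                              # L_0, the head of the descending sequence
    return x, y
```

Repeating this over the sample image data of one group yields the point cloud from which the concentrated distribution of positive and negative samples is read off.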
9. The retrieval and classification-based image content attribute extraction method according to claim 8, wherein, in acquiring the class name of the image to be detected according to the concentrated-distribution rule of the positive and negative samples, the expression shown in Figure FDA0003853023740000022 is taken as the classification condition; when the classification condition is satisfied, the class name of the image to be detected is acquired by the image classification method using the classification model corresponding to the group name; otherwise, the class name in the recommended attribute corresponding to the group name is accepted by the image retrieval method; wherein M is the M value of the output parameter of the classification model and takes the value shown in Figure FDA0003853023740000023.
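The decision in claim 9 between the classification path and the retrieval path can be sketched as below; the classification condition itself is given only in Figure FDA0003853023740000022 (and the value of M in Figure FDA0003853023740000023), so it is passed in here as an assumed predicate, and all names are illustrative.

```python
def predict_class_name(query_feature, group, classification_condition,
                       classify_fn, recommended_class):
    """If the classification condition (Figure FDA0003853023740000022) holds,
    use the classification model of the group to predict the class name;
    otherwise accept the class name from the recommended attribute obtained
    by image retrieval."""
    if classification_condition(query_feature, group):
        return classify_fn(query_feature, group)   # image classification path
    return recommended_class                        # image retrieval path
```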
10. An image content attribute extraction system based on retrieval and classification, comprising: a data construction module and a prediction output module, wherein,
the data construction module is used for constructing a retrieval set containing the images of the training set of each image classification task and an image attribute generator for extracting image content attributes, the image attribute generator being composed of the image-related elements in a resource set and the corresponding relations between those elements, wherein the related elements at least comprise: a classification model for obtaining the image category attribute, the class name of the category to which the image belongs, and the group name of the classification task corresponding to that category; any image in the image retrieval library has a unique group name and a unique class name, and the group name and the class name form the corresponding image content attribute pair;
the prediction output module is used for representing images as sample points in a feature space, retrieving from the image retrieval library the image most similar to the image to be detected according to the feature-space distance between images, recording the feature-space distance between the image to be detected and the most similar image, and obtaining the corresponding image content attribute pair from the most similar image; and taking the obtained image content attribute pair as a recommended attribute and predicting the image content attribute of the image to be detected from the recommended attribute.
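A bare skeleton of the two-module system of claim 10 might look like the sketch below; the class and method names are hypothetical placeholders rather than an implementation disclosed in the patent.

```python
class DataConstructionModule:
    """Builds the retrieval set and the image attribute generator."""
    def build(self, training_sets):
        # extract features of every training-set image, assign (group, class)
        # attribute pairs, and populate the resource set of group structures
        raise NotImplementedError

class PredictionOutputModule:
    """Retrieves the most similar image and predicts the content attributes."""
    def predict(self, image_feature, retrieval_set):
        distance, recommended = retrieval_set.recommend(image_feature)
        # decide known/unknown group via the distance threshold, then choose
        # classification or retrieval for the class name (cf. claims 6 to 9)
        return recommended
```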
CN202211138151.4A 2022-09-19 2022-09-19 Retrieval and classification-based image content attribute extraction method and system Pending CN115599937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211138151.4A CN115599937A (en) 2022-09-19 2022-09-19 Retrieval and classification-based image content attribute extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211138151.4A CN115599937A (en) 2022-09-19 2022-09-19 Retrieval and classification-based image content attribute extraction method and system

Publications (1)

Publication Number Publication Date
CN115599937A true CN115599937A (en) 2023-01-13

Family

ID=84842395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211138151.4A Pending CN115599937A (en) 2022-09-19 2022-09-19 Retrieval and classification-based image content attribute extraction method and system

Country Status (1)

Country Link
CN (1) CN115599937A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination