CN116824248A - Scene-based image recognition method and device incorporating a personalized tag - Google Patents

Scene-based image recognition method and device incorporating a personalized tag

Info

Publication number
CN116824248A
CN116824248A (application number CN202310771392.0A)
Authority
CN
China
Prior art keywords
sample
image
scene
information
differential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310771392.0A
Other languages
Chinese (zh)
Inventor
周源杰
林静
吴佳彧
陈建军
王利琴
Current Assignee
Zhejiang Public Information Industry Co ltd
Original Assignee
Zhejiang Public Information Industry Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Public Information Industry Co ltd filed Critical Zhejiang Public Information Industry Co ltd
Priority to CN202310771392.0A priority Critical patent/CN116824248A/en
Publication of CN116824248A publication Critical patent/CN116824248A/en
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Abstract

The invention discloses a scene-based image recognition method and device incorporating a personalized tag, relating to the field of computer vision. The method comprises: acquiring a target scene image; and inputting the target scene image into a trained scene image recognition model to obtain the scene information, output by the model, that corresponds to the target scene. The scene image recognition model is trained on sample images and on sample differentiation information corresponding to those sample images; the sample differentiation information is a personalized tag for each sample image, obtained from elements of the sample scene itself and/or elements generated by user behavior in the sample scene. The invention can clearly and accurately recognize images from different scenes, and is particularly suitable for recognizing scenes that differ only in part.

Description

Scene-based image recognition method and device incorporating a personalized tag
Technical Field
The invention relates to the field of computer vision, and in particular to a scene-based image recognition method and device incorporating a personalized tag.
Background
Image recognition in computer vision maps from image space to feature space, and then from feature space to class space. Selecting and using these features depends heavily on training data that is well organized and evenly distributed across samples; the data set must cover the many kinds of data occurring in real scenes, yet its size then grows exponentially, which poses a great challenge for image recognition.
In practical applications, data imbalance and data complexity are widespread. In some special scenes in particular, influences such as installation angle, height, distance, illumination conditions and other special conditions prevent the image recognition model from recognizing accurately, so model performance degrades greatly.
Disclosure of Invention
In view of the above, embodiments of the invention provide a scene-based image recognition method and device incorporating a personalized tag, so as to solve the problem that existing models cannot accurately recognize special scenes.
According to a first aspect, an embodiment of the present invention provides a scene-based image recognition method incorporating a personalized tag, the method comprising:
acquiring a target scene image;
inputting the target scene image into a trained scene image recognition model to obtain the scene information, output by the model, that corresponds to the target scene; wherein the scene image recognition model is trained on sample images and on sample differentiation information corresponding to the sample images, the sample differentiation information being a personalized tag for each sample image, obtained from elements of the sample scene itself and/or elements generated by user behavior in the sample scene.
With reference to the first aspect, in a first implementation manner of the first aspect, the scene image recognition model includes:
a feature extraction layer, a feature fusion layer, a similarity calculation layer and a multi-center classification layer;
the feature extraction layer is used for extracting features from the target scene image;
the feature fusion layer is used for fusing at least one differential feature vector extracted from the target scene image, wherein the differential feature vectors comprise element information and position information of elements of the target scene itself and/or elements generated by user behavior in the target scene, and the number of differential feature vectors corresponds to the number of differentiated elements in the target scene;
the similarity calculation layer is used for determining the similarity between the differentiation information obtained by fusing the at least one differential feature vector and sample differentiation information stored in a preset database, wherein the sample differentiation information is obtained by fusing at least one sample differentiation feature vector corresponding to a sample scene, and the sample differentiation feature vectors are extracted from sample images;
the multi-center classification layer is used for classifying the differentiation information based on the similarity.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the inputting of the target scene image into the trained scene image recognition model to obtain the scene information corresponding to the target scene output by the scene image recognition model specifically includes:
inputting the target scene image into the feature extraction layer to obtain at least one differentiated feature vector corresponding to the target scene image output by the feature extraction layer;
inputting at least one differential feature vector into the feature fusion layer to obtain the differential information output by the feature fusion layer;
inputting the differential information into the similarity calculation layer to obtain the similarity between the differential information output by the similarity calculation layer and each sample differential feature vector;
and inputting the similarity and the differentiation information into the multi-center classification layer to obtain scene information corresponding to the target scene output by the multi-center classification layer.
With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, the scene image recognition model is obtained through the following training steps:
acquiring the sample image;
extracting at least one sample differentiation feature vector from the sample image, and fusing the at least one sample differentiation feature vector to obtain the sample differentiation information corresponding to the sample image; the sample differentiation feature vectors comprise element information and position information of elements of the sample scene corresponding to the sample image and/or elements generated by user behavior in the sample scene, and the number of sample differentiation feature vectors corresponds to the number of differentiated elements in the sample scene;
and taking the sample image as input data for training, taking sample differentiation information as a label for training, and generating the scene image recognition model for obtaining scene information corresponding to the target scene image by adopting a machine learning mode.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the scene image recognition model includes:
a feature extraction model and a personalized tag model;
the feature extraction model is used for extracting features from the target scene image;
the personalized tag model comprises a feature connection layer, a similarity calculation layer, a multi-center classification layer and a posterior verification layer;
the feature connection layer is used for fusion connection of at least one differential feature vector extracted by the feature extraction model; the differential feature vectors comprise element information and position information of elements of the target scene and/or elements generated by user behaviors in the target scene, and the number of the differential feature vectors corresponds to the number of the differential elements of the target scene;
the similarity calculation layer is used for determining similarity between the differential information obtained by fusing at least one differential feature vector and sample differential information stored in a preset database; the sample differentiation information is obtained by fusing at least one sample differentiation characteristic vector corresponding to a sample scene, and the sample differentiation characteristic vector is extracted from a sample image;
the multi-center classification layer is used for classifying the differentiated information based on the similarity;
the posterior verification layer is used for updating the multi-center classification layer based on posterior probability obtained after verification of the classification result of the differential information.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model specifically includes:
inputting the target scene image into the feature extraction model to obtain at least one differentiated feature vector corresponding to the target scene image output by the feature extraction model;
inputting at least one differential feature vector into the feature connection layer to obtain the differential information output by the feature connection layer;
inputting the differential information into the similarity calculation layer to obtain the similarity between the differential information output by the similarity calculation layer and each sample differential feature vector;
and inputting the similarity and the differentiation information into the multi-center classification layer to obtain scene information corresponding to the target scene output by the multi-center classification layer.
With reference to the fourth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the scene image recognition model is obtained through the following training steps:
pre-training to obtain the feature extraction model;
acquiring the personalized label marked by the sample image;
inputting the sample image into the feature extraction model to obtain at least one sample differentiation vector output by the feature extraction model;
and taking the sample differential vector as input data for training, taking the personalized label as a label for training, and generating the personalized label model for obtaining scene information corresponding to the target scene image by adopting a machine learning mode.
According to a second aspect, an embodiment of the present invention further provides a scene-based image recognition device incorporating a personalized tag, the device comprising:
the acquisition module is used for acquiring the target scene image;
the recognition module is used for inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
According to a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the scene-based image recognition method incorporating a personalized tag according to any one of the above.
According to a fourth aspect, embodiments of the present invention further provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the scene-based image recognition method incorporating a personalized tag according to any one of the above.
With the scene-based recognition method and device incorporating a personalized tag provided by the invention, the trained scene recognition model extracts user behavior and scene-specific elements from the target scene image as the differentiation information of the target scene, and uses that differentiation information as an important reference factor during recognition. This enlarges the differences between different scenes and allows a personalized-tag scene recognition model to be built for different service systems. As a result, the accuracy of scene recognition improves, the recognition effect in practical application scenes improves greatly, and images from different scenes can be recognized more clearly and accurately; the approach is particularly suitable for recognizing scenes that differ only in part.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the invention in any way, in which:
Fig. 1 shows a schematic flow chart of the scene-based image recognition method incorporating a personalized tag according to the present invention;
Fig. 2 shows a first schematic structural diagram of the scene image recognition model in the scene-based image recognition method incorporating a personalized tag according to the present invention;
Fig. 3 shows a second schematic structural diagram of the scene image recognition model in the scene-based image recognition method incorporating a personalized tag according to the present invention;
Fig. 4 is a schematic structural diagram of the scene-based image recognition device incorporating a personalized tag according to the present invention;
Fig. 5 shows a schematic structural diagram of an electronic device for the scene-based image recognition method incorporating a personalized tag.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
In practical applications, image recognition models in computer vision are trained offline on collected image data. Because of data imbalance and data complexity, especially in special scenes affected by installation angle, height, distance, illumination conditions and other special conditions, such a model cannot recognize accurately; its performance degrades greatly and the intended image recognition task cannot be completed.
In order to solve the above problems, this embodiment provides a scene-based image recognition method incorporating a personalized tag. The method according to the embodiment of the invention can be used in electronic equipment, including but not limited to a computer, a mobile terminal and the like. Fig. 1 is a schematic flow chart of the scene-based image recognition method incorporating a personalized tag according to the embodiment of the invention; as shown in fig. 1, the method comprises the following steps:
s10, acquiring a target scene image. The target scene image is the image information to be processed (identified) of the target scene to be identified, and real-time online analysis is performed later.
In the embodiment of the invention, the target scene image may be acquired by the electronic device from outside, for example captured by a pre-arranged image capturing device and then obtained by the electronic device from that external device, or it may be stored in the electronic device in advance. The specific acquisition form of the target scene image is not limited, as long as the electronic device can obtain it.
S20, inputting the target scene image into the trained scene image recognition model to obtain the scene information, output by the model, that corresponds to the target scene. The scene image recognition model is trained on sample images and on sample differentiation information corresponding to the sample images; the sample differentiation information is a personalized tag for each sample image, obtained from elements of the sample scene itself and/or elements generated by user behavior in the sample scene.
The scene information indicates which specific scene the target scene image depicts.
Although some scenes have very similar visual appearances, each scene may still have some differences. These differences may be difference information generated by user behavior, may be defined by special elements of the scene itself, or may arise from multiple factors such as user behavior together with special scene elements. They often manifest as subtle differences between two instances of the same kind of scene; for example, scene A has a special warning sign, scene B requires users to wear special clothing, and scene C requires users to stay within a designated area.
For a particular scene, a definition can be based on the special elements that distinguish it from other scenes; these special elements constitute the differentiation information mentioned above. The scene image recognition model can therefore be used for scene-based image recognition tasks that hinge on such differences, such as smart-community access control recognition, chef-uniform recognition for open-kitchen ("bright kitchen, bright stove") programs, and checkpoint recognition in "beautiful countryside" projects.
With the scene-based image recognition method incorporating a personalized tag provided by the invention, the trained scene image recognition model extracts user behavior and scene-specific elements from the target scene image as the differentiation information of the target scene and uses that differentiation information as an important reference factor during recognition. This enlarges the differences between different scenes, allows a personalized-tag scene recognition model to be built for different service systems, improves the accuracy of scene recognition, and greatly improves the recognition effect in practical application scenes; images from different scenes can be recognized more clearly and accurately, and the method is particularly suitable for recognizing scenes that differ only in part.
The following describes, with reference to fig. 2, the scene-based image recognition method incorporating a personalized tag according to the present invention. A specific model architecture of the scene image recognition model includes:
a feature extraction layer, a feature fusion layer, a similarity calculation layer and a multi-center classification layer.
The feature extraction layer is used for extracting at least one differential feature vector from the target scene image. The differential feature vectors comprise element information and position information of elements of the target scene itself and/or elements generated by user behavior in the target scene; this element and position information is the differentiation information of the scene, and for one scene a number of differential feature vectors corresponding to its number of differentiated elements can be extracted. For a given target scene image or sample image x, the feature extraction layer extracts an H×W×d feature tensor:

$$\phi = f(x; \theta) \in \mathbb{R}^{H \times W \times d}$$

where x denotes the input image, which may be a target scene image or a sample image; $\phi$ denotes the extracted differential feature tensor, i.e. the output of the feature extraction layer (the feature information it extracts); $\theta$ denotes the feature extraction parameters of the feature extraction layer f(·); H×W denotes the size of the feature map output by the feature extraction layer; and d denotes the number of channels of the feature extraction layer.
The output of the feature extraction layer can thus be expressed as a 3rd-order tensor with H×W×d elements, containing two-dimensional feature maps, so that the feature information extracted by the feature extraction layer carries both local visual information and spatial information.
In the embodiment of the invention, the feature extraction layer is the designated layer of the scene image recognition model that outputs tag information; that is, the scene image recognition model takes the output of the feature extraction layer as the tag information of the input data, i.e. the personalized tag.
More specifically, in the embodiment of the present invention, the feature extraction layer includes four first convolution layers and three first pooling layers, with adjacent first convolution layers connected through a first pooling layer. Each first convolution layer comprises a second convolution layer, a batch normalization layer (Batch Normalization, BN) and an activation layer (ReLU) connected in sequence, so that the corresponding feature information is obtained through one group of convolution, normalization and activation processing.
Preferably, the second convolution layer employs a convolution kernel of size 3×3.
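As an illustrative sketch only (not part of the patent disclosure), one such convolution-normalization-activation block can be written in pure Python for a single channel. The kernel values, the identity-like running statistics, and valid padding are all assumptions for the example:

```python
# Minimal single-channel sketch of one "first convolution layer" block:
# 3x3 convolution -> inference-time batch norm -> ReLU.

def conv3x3(image, kernel):
    """Valid 3x3 convolution (cross-correlation) over a 2-D list of floats."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def batch_norm(fmap, mean=0.0, var=1.0, eps=1e-5):
    """Inference-time batch norm with assumed running statistics."""
    scale = (var + eps) ** -0.5
    return [[(v - mean) * scale for v in row] for row in fmap]

def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]

def conv_block(image, kernel):
    """Convolution, normalization and activation in sequence, as described."""
    return relu(batch_norm(conv3x3(image, kernel)))

# 4x4 input with a vertical edge; an edge-detecting kernel (illustrative values).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
k = [[-1, 0, 1],
     [-1, 0, 1],
     [-1, 0, 1]]
feat = conv_block(img, k)
```

A real implementation would stack four such blocks with pooling between them and learn the kernels; the sketch only shows the per-block data flow.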
To optimize the performance of the feature extraction layer, in the embodiment of the invention the acquired sample scenes include image sets from side, bottom, top and other viewing angles, forming a general-purpose sample image set. This enriches the sample images and safeguards the performance of the feature extraction layer/feature extraction model during subsequent training.
The feature fusion layer is used for fusing the at least one differential feature vector to obtain the differentiation information of the target scene. Since each extracted differential feature vector carries position information among other things, the at least one differential feature vector can be assembled and concatenated into differentiation information that covers the various differentiated elements, the categories of those elements, the spatial distances between elements, and so on. It can be understood that when only one differential feature vector is extracted, the feature fusion layer outputs that differential feature vector directly.
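A minimal sketch of this fusion step, assuming (purely for illustration) that each differential feature vector carries an element category, an (x, y) position, and an embedding; the record layout and names are not from the patent:

```python
import math

def fuse(diff_vectors):
    """Fuse differential feature vectors into one differentiation record.

    Each input is (category, (x, y), embedding). The fused output keeps the
    element categories, the concatenated embeddings, and the pairwise spatial
    distances between elements, mirroring the description above. If only one
    vector is present it is passed through unchanged.
    """
    if len(diff_vectors) == 1:
        return diff_vectors[0]
    cats = [c for c, _, _ in diff_vectors]
    embedding = [v for _, _, emb in diff_vectors for v in emb]
    dists = {}
    for i in range(len(diff_vectors)):
        for j in range(i + 1, len(diff_vectors)):
            (xi, yi), (xj, yj) = diff_vectors[i][1], diff_vectors[j][1]
            dists[(i, j)] = math.hypot(xi - xj, yi - yj)
    return {"categories": cats, "embedding": embedding, "distances": dists}

# Two hypothetical differentiated elements of one scene.
info = fuse([("warning_sign", (0.0, 0.0), [0.1, 0.2]),
             ("work_uniform", (3.0, 4.0), [0.3, 0.4])])
```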
The similarity calculation layer is used for calculating the similarity between pieces of differentiation information. For two given comparison objects $z_i$ and $z_j$, the similarity (a similarity vector) between them is calculated by the similarity calculation layer:

$$S_{ij} = \mathrm{softmax}\big(F(z_i, z_j; \varphi)\big)$$

where $S_{ij}$ denotes the similarity between $z_i$ and $z_j$, constrained to lie between 0 and 1 by the softmax function; $\varphi$ denotes the similarity parameters of the similarity calculation layer F(·); and N denotes the number of nodes of the corresponding network.
Finally, a similarity matrix $S^{lc} \in \mathbb{R}^{N \times N}$ is obtained through the similarity calculation layer.
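The softmax-bounded similarity can be sketched as follows. Using the negative Euclidean distance as the pre-softmax score F(·) is an assumption for the example, since the text does not pin down the exact form of F:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def similarity_matrix(vectors):
    """N x N similarity matrix: each row is the softmax of negative
    Euclidean distances, so every entry lies in (0, 1) and each row
    sums to 1, as the layer above requires."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    n = len(vectors)
    return [softmax([-dist(vectors[i], vectors[j]) for j in range(n)])
            for i in range(n)]

# Three hypothetical differentiation-information vectors.
S = similarity_matrix([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
```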
The multi-center classification layer is used for determining the scene information of the target scene based on the similarity. It classifies using a multi-center nearest-neighbor mean classifier, obtaining a set of multiple centers per category via k-means clustering, where each category corresponds to one kind of scene information.
The output of the multi-center classification layer is the output of the scene image recognition model.
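The multi-center nearest-neighbor mean idea can be sketched in pure Python: cluster each category's sample vectors into several centers with k-means, then assign a query to the category owning the nearest center. The initialization scheme, distances and all data are illustrative assumptions:

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; initial centers are the first k points (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centers[c]))
            groups[i].append(p)
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def build_center_set(samples_by_class, centers_per_class):
    """For each scene category, cluster its sample differentiation vectors
    into multiple centers, as in the multi-center classification layer."""
    return {cls: kmeans(pts, centers_per_class)
            for cls, pts in samples_by_class.items()}

def classify(z, center_set):
    """Nearest-center classification: the category owning the closest
    center wins."""
    return min(((dist2(z, c), cls)
                for cls, cs in center_set.items() for c in cs))[1]

# Hypothetical 2-D differentiation vectors for two scene categories.
centers = build_center_set(
    {"scene_A": [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]],
     "scene_B": [[10.0, 0.0], [10.2, 0.1]]},
    centers_per_class=2)
```

Multiple centers per category let one scene class cover visually distinct sub-modes (e.g. day and night views) without merging them into a single mean.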
Accordingly, the scene image recognition model shown in fig. 2 is obtained through the following training steps:
a10, acquiring a sample image.
In the embodiment of the present invention, the sample image may be acquired by the electronic device from outside, for example captured by a pre-arranged image capturing device and then obtained by the electronic device from that external device, or it may be stored in the electronic device in advance. The specific acquisition form of the sample image is not limited, as long as the electronic device can obtain it.
A20, extracting at least one sample differentiation feature vector from the sample image, and fusing the at least one sample differentiation feature vector to obtain the sample differentiation information corresponding to the sample image; that is, the tag information is extracted through the feature extraction layer.
A30, taking the sample image as training input data and the sample differentiation information as the training label, and generating, by machine learning, the scene image recognition model for obtaining the scene information corresponding to the target scene image.
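Steps A10-A30 amount to a small data-preparation loop. In the sketch below the extraction and fusion functions are illustrative stand-ins for the feature extraction and fusion layers, and an "image" is already a list of vectors purely for simplicity:

```python
def extract_diff_vectors(sample_image):
    """Stand-in for the feature extraction layer (A20, first half):
    returns the differential feature vectors found in a sample image."""
    return sample_image

def fuse_vectors(vectors):
    """Stand-in for the feature fusion layer (A20, second half):
    simple concatenation of the vectors."""
    return [v for vec in vectors for v in vec]

def build_training_set(samples):
    """A10-A30: each sample image yields a pair of (training input image,
    fused sample differentiation information used as its training label)."""
    dataset = []
    for image in samples:
        diff_info = fuse_vectors(extract_diff_vectors(image))
        dataset.append((image, diff_info))
    return dataset

# One hypothetical sample image containing two differential feature vectors.
data = build_training_set([[[1.0, 2.0], [3.0, 4.0]]])
```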
The following describes, with reference to fig. 3, the scene-based image recognition method incorporating a personalized tag according to the present invention. A specific model architecture of the scene image recognition model includes:
a feature extraction model and a personalized tag model.
As a preferred implementation of the embodiment of the present invention, a feature extraction model for feature extraction is first pre-trained from the sample images. Its specific structure is that of the feature extraction layer shown in fig. 2.
The model framework of the personalized tag model comprises a feature connection layer, a similarity calculation layer, a multi-center classification layer and a posterior verification layer.
The specific structures of the feature connection layer, similarity calculation layer and multi-center classification layer are those of the feature fusion layer, similarity calculation layer and multi-center classification layer shown in fig. 2.
When the personalized tag model is trained, the sample images used for training are annotated to obtain a personalized tag for each sample image. The personalized tag can be extracted by a model such as the feature extraction model, or obtained through manual annotation.
In the multi-center classification layer below, the center set is

$$S = \bigcup_i \{ s_i^1, \dots, s_i^c \}$$

where S denotes the set of category centers; $s_i^j$ denotes a center of category i; c denotes the number of centers of each category; $n_O$ denotes the number of sample images that participated in pre-training the feature extraction model; and $n_N$ denotes the number of sample images annotated to obtain personalized tags.
In the embodiment of the invention, the performance, namely the accuracy, of the scene image recognition model is continuously verified in the process of training the scene image recognition model.
Specifically, the posterior verification layer calculates the posterior probability of each category and optimizes and corrects the scene image recognition model based on those posterior probabilities:

$$P(i \mid z) = \frac{\exp\big(-d(z, s_i)\big)}{\sum_{j} \exp\big(-d(z, s_j)\big)}$$

where $P(i \mid z)$ denotes the posterior probability of the i-th category; $d(z, s_i)$ denotes the Euclidean distance from the feature z (the feature extracted by the feature extraction model) to the center $s_i$; $\hat{y} = \arg\max_i P(i \mid z)$ denotes the predicted label of a sample image, i.e. its classification result, while $y_i$ denotes its annotated label; and m denotes the total number of sample images, $m = n_O + n_N$.
The scene image recognition model built from the feature extraction model and the personalized label model is decoupled from the service system: the two models can be packaged into independent components and docked with any service system. Because scene image recognition is not carried out through a real-time similarity-learning algorithm, the requirements on server configuration are lower and the recognition efficiency is higher. The recognition effect on images in actual application scenes is greatly improved, and images in different scenes can be recognized clearly and accurately. Only the basic feature extraction model needs to be pre-trained; scene-specific training is then carried out on top of the feature extraction model, which speeds up the reconstruction of the model.
Accordingly, the scenerized image recognition model shown in fig. 3 is obtained through training by the following steps:
B10, pre-training to obtain a feature extraction model;
and B20, acquiring a sample image and the personalized label marked on the sample image. The personalized label may be extracted by a model such as the feature extraction model, or obtained by manual labeling.
B30, inputting the sample image into the feature extraction model to obtain at least one sample differentiation vector output by the feature extraction model;
and B40, taking the sample differential vector as input data for training, taking the personalized label as a label for training, and generating a personalized label model for obtaining scene information corresponding to the target scene image by adopting a machine learning mode.
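Steps B10–B40 can be sketched as a minimal pipeline. This is illustrative only: the fixed random projection and the nearest-centroid classifier below are stand-ins assumed for the pre-trained feature extraction model and the machine-learned personalized label model, not the patent's actual networks.

```python
import numpy as np

def pretrain_feature_extractor():
    # B10: stand-in for a pre-trained feature extraction model --
    # here just a fixed linear projection of the image pixels.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16))            # 16-pixel "image" -> 8-D feature
    return lambda img: W @ img

def train_personalized_label_model(features, labels):
    # B40: stand-in personalized label model -- a nearest-centroid
    # classifier fitted on the sample differential vectors.
    cents = {y: np.mean([f for f, l in zip(features, labels) if l == y], axis=0)
             for y in set(labels)}
    def predict(f):
        return min(cents, key=lambda y: np.linalg.norm(cents[y] - f))
    return predict

# B20: sample images and their (here: manually assigned) personalized labels
images = [np.full(16, 0.0), np.full(16, 0.2), np.full(16, 5.0), np.full(16, 5.2)]
labels = [0, 0, 1, 1]

extract = pretrain_feature_extractor()
feats = [extract(img) for img in images]    # B30: sample differential vectors
model = train_personalized_label_model(feats, labels)
```

In this toy run the model recovers the labels of the training samples, mirroring how the personalized label model maps differential vectors to scene information.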
The personalized-tag-incorporated scenerised image recognition device provided by the embodiment of the invention is described below; the device described below and the personalized-tag-incorporated scenerised image recognition method described above may be referred to in correspondence with each other.
In order to solve the above-described problems, a scenerised image recognition apparatus incorporating a personalized tag is provided in the present embodiment. The personalized tag-incorporated scenerised image recognition device according to the embodiment of the present invention may be used in an electronic device, including but not limited to a computer, a mobile terminal, etc., and fig. 4 is a schematic structural diagram of the personalized tag-incorporated scenerised image recognition device according to the embodiment of the present invention, as shown in fig. 4, the device includes:
an acquisition module 10, configured to acquire a target scene image. The target scene image is the to-be-processed (to-be-identified) image information of the target scene, which is subsequently analyzed online in real time.
In the embodiment of the invention, the target scene image may be acquired by the electronic device from the outside, for example captured by a pre-arranged image capturing device and then obtained by the electronic device from that external device, or it may be stored in the electronic device in advance. The specific acquisition form of the target scene image is not limited, so long as the electronic device can obtain it.
The recognition module 20 is configured to input the target scene image into the trained recognition model of the scene image, and obtain scene information corresponding to the target scene output by the recognition model of the scene image. The scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
The scene information indicates the specific scene to which the target scene image belongs.
Although some scenes have very similar visual appearances, each scene may still differ from the others. The differences may arise from user behaviors, may be defined by special elements of the scene itself, or may result from multiple factors such as user behaviors together with special scene elements. These differences often manifest as subtle distinctions between two instances of the same kind of scene, e.g., scene A has a special warning sign, scene B requires users to wear special apparel, scene C requires users to stay within a designated area, etc.
Therefore, the scenerised image recognition model can be used for scene recognition tasks that hinge on such differences, such as smart-community access-control recognition, "bright kitchen" chef-uniform recognition, and "beautiful countryside" checkpoint recognition.
According to the scene image recognition device integrated with the personalized tag, the trained scene image recognition model extracts the difference information of the target scene image formed by user behaviors and special scene elements, and uses it as an important reference factor during recognition. This enlarges the differences between scenes, so that a personalized-label scene recognition model can be formed for different service systems. The accuracy of scene recognition is thereby improved, the recognition effect in actual application scenes is greatly enhanced, and images in different scenes can be recognized more clearly and accurately; the device is particularly suitable for recognizing scenes that differ from one another only in part.
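The two-module device (acquisition module 10 and recognition module 20) can be sketched structurally as follows. This is an illustration only: the image source and the recognition model below are placeholders assumed in place of the real capture device and trained scenerised image recognition model.

```python
class AcquisitionModule:
    """Module 10: obtains the target scene image (from a camera,
    an external device, or local storage; the source is not limited)."""
    def __init__(self, source):
        self.source = source

    def acquire(self):
        return self.source()

class RecognitionModule:
    """Module 20: feeds the target scene image into the trained
    scenerised image recognition model and returns scene information."""
    def __init__(self, model):
        self.model = model

    def recognize(self, image):
        return self.model(image)

# Usage with stand-ins: a fixed "image" and a dummy model that always
# reports the same scene label.
acq = AcquisitionModule(source=lambda: [[0, 1], [1, 0]])
rec = RecognitionModule(model=lambda img: "scene_A")
scene_info = rec.recognize(acq.acquire())
```

Keeping acquisition and recognition in separate modules mirrors the decoupling described above: either side can be swapped (new camera, retrained model) without touching the other.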
Fig. 5 illustrates a physical schematic diagram of an electronic device, as shown in fig. 5, which may include: processor 810, communication interface (Communications Interface) 820, memory 830, and communication bus 840, wherein processor 810, communication interface 820, memory 830 accomplish communication with each other through communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a method of scenic image recognition incorporating personalized tags, the method comprising:
acquiring a target scene image;
inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the method for identifying a scenerised image incorporating a personalized tag provided by the methods described above, the method comprising:
acquiring a target scene image;
inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method for identifying a scenerised image incorporating a personalized tag provided by the methods described above, the method comprising:
acquiring a target scene image;
inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for identifying a scenerised image incorporating a personalised tag, the method comprising:
acquiring a target scene image;
inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
2. The method for identifying a scenerised image incorporating a personalised label according to claim 1, wherein the scenerised image identification model comprises:
the device comprises a feature extraction layer, a feature fusion layer, a similarity calculation layer and a multi-center classification layer;
the feature extraction layer is used for extracting features of the target scene image;
the feature fusion layer is used for fusing at least one differential feature vector extracted from the target scene image; the differential feature vectors comprise element information and position information of elements of the target scene and/or elements generated by user behaviors in the target scene, and the number of the differential feature vectors corresponds to the number of the differential elements of the target scene;
the similarity calculation layer is used for determining similarity between the differential information obtained by fusing at least one differential feature vector and sample differential information stored in a preset database; the sample differentiation information is obtained by fusing at least one sample differentiation characteristic vector corresponding to a sample scene, and the sample differentiation characteristic vector is extracted from a sample image;
the multi-center classification layer is used for classifying the differentiated information based on the similarity.
3. The method for identifying a scenerised image incorporated with a personalized tag according to claim 2, wherein the step of inputting the target scenerised image into the trained scenerised image identification model to obtain the scenerised information corresponding to the target scenerised output by the scenerised image identification model comprises the following steps:
inputting the target scene image into the feature extraction layer to obtain at least one differentiated feature vector corresponding to the target scene image output by the feature extraction layer;
inputting at least one differential feature vector into the feature fusion layer to obtain the differential information output by the feature fusion layer;
inputting the differential information into the similarity calculation layer to obtain the similarity between the differential information output by the similarity calculation layer and each sample differential feature vector;
and inputting the similarity and the differentiation information into the multi-center classification layer to obtain scene information corresponding to the target scene output by the multi-center classification layer.
4. The method for identifying the scenerised image integrated into the personalized tag according to claim 2, wherein the scenerised image identification model is obtained through training of the following steps:
acquiring the sample image;
extracting at least one sample differential vector from the sample image, and fusing the at least one sample differential vector to obtain sample differential information corresponding to the sample image; the sample differentiation feature vectors comprise element information and position information of elements of the sample scene corresponding to the sample image and/or elements generated by user behaviors in the sample scene, and the number of the sample differentiation feature vectors corresponds to the number of differentiated elements of the sample scene;
and taking the sample image as input data for training, taking sample differentiation information as a label for training, and generating the scene image recognition model for obtaining scene information corresponding to the target scene image by adopting a machine learning mode.
5. The method for identifying a scenerised image incorporating a personalised label according to claim 1, wherein the scenerised image identification model comprises:
a feature extraction model and a personalized tag model;
the feature extraction model is used for extracting features of the target scene image;
the device comprises a feature connection layer, a similarity calculation layer, a multi-center classification layer and a posterior verification layer;
the feature connection layer is used for fusion connection of at least one differential feature vector extracted by the feature extraction model; the differential feature vectors comprise element information and position information of elements of the target scene and/or elements generated by user behaviors in the target scene, and the number of the differential feature vectors corresponds to the number of the differential elements of the target scene;
the similarity calculation layer is used for determining similarity between the differential information obtained by fusing at least one differential feature vector and sample differential information stored in a preset database; the sample differentiation information is obtained by fusing at least one sample differentiation characteristic vector corresponding to a sample scene, and the sample differentiation characteristic vector is extracted from a sample image;
the multi-center classification layer is used for classifying the differentiated information based on the similarity;
the posterior verification layer is used for updating the multi-center classification layer based on posterior probability obtained after verification of the classification result of the differential information.
6. The method for identifying a scenerised image incorporated in a personalized tag according to claim 5, wherein the step of inputting the target scenerised image into the trained scenerised image identification model to obtain the scenerised information corresponding to the target scenerised image output by the scenerised image identification model comprises the following steps:
inputting the target scene image into the feature extraction model to obtain at least one differentiated feature vector corresponding to the target scene image output by the feature extraction model;
inputting at least one differential feature vector into the feature connection layer to obtain the differential information output by the feature connection layer;
inputting the differential information into the similarity calculation layer to obtain the similarity between the differential information output by the similarity calculation layer and each sample differential feature vector;
and inputting the similarity and the differentiation information into the multi-center classification layer to obtain scene information corresponding to the target scene output by the multi-center classification layer.
7. The method for identifying the scenerised image integrated into the personalized tag according to claim 5, wherein the scenerised image identification model is obtained through training by the following steps:
pre-training to obtain the feature extraction model;
acquiring the personalized label marked by the sample image;
inputting the sample image into the feature extraction model to obtain at least one sample differentiation vector output by the feature extraction model;
and taking the sample differential vector as input data for training, taking the personalized label as a label for training, and generating the personalized label model for obtaining scene information corresponding to the target scene image by adopting a machine learning mode.
8. A scenerised image recognition device incorporating a personalised tag, the device comprising:
the acquisition module is used for acquiring the target scene image;
the recognition module is used for inputting the target scene image into the trained scene image recognition model to obtain scene information corresponding to the target scene output by the scene image recognition model; the scene image recognition model is trained by sample images and sample differentiation information corresponding to the sample images, the sample differentiation information is a personalized label corresponding to the sample images, and the sample differentiation information is obtained based on elements of the sample scene and/or elements generated by user behaviors in the sample scene.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the scenic image recognition method of any one of claims 1 to 7 incorporated into a personalized tag when the program is executed.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for scenerised image recognition incorporating a personalized tag according to any one of claims 1 to 7.
CN202310771392.0A 2023-06-27 2023-06-27 Scenerised image recognition method and device integrated with personalized tag Pending CN116824248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771392.0A CN116824248A (en) 2023-06-27 2023-06-27 Scenerised image recognition method and device integrated with personalized tag


Publications (1)

Publication Number Publication Date
CN116824248A true CN116824248A (en) 2023-09-29

Family

ID=88125322



Similar Documents

Publication Publication Date Title
CN109255364B (en) Scene recognition method for generating countermeasure network based on deep convolution
US10762376B2 (en) Method and apparatus for detecting text
CN111709409B (en) Face living body detection method, device, equipment and medium
CN108734162B (en) Method, system, equipment and storage medium for identifying target in commodity image
CN109446889B (en) Object tracking method and device based on twin matching network
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN110598019B (en) Repeated image identification method and device
CN112686812A (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN109145964B (en) Method and system for realizing image color clustering
CN115457531A (en) Method and device for recognizing text
CN111967429A (en) Pedestrian re-recognition model training method and device based on active learning
CN115862055A (en) Pedestrian re-identification method and device based on comparison learning and confrontation training
US20200050899A1 (en) Automatically filtering out objects based on user preferences
CN113553975B (en) Pedestrian re-identification method, system, equipment and medium based on sample pair relation distillation
CN114626476A (en) Bird fine-grained image recognition method and device based on Transformer and component feature fusion
CN112927783B (en) Image retrieval method and device
CN113936175A (en) Method and system for identifying events in video
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN117152459A (en) Image detection method, device, computer readable medium and electronic equipment
CN114119970B (en) Target tracking method and device
CN115909335A (en) Commodity labeling method and device
CN113743251B (en) Target searching method and device based on weak supervision scene
CN116258937A (en) Small sample segmentation method, device, terminal and medium based on attention mechanism
CN114155540B (en) Character recognition method, device, equipment and storage medium based on deep learning
CN116824248A (en) Scenerised image recognition method and device integrated with personalized tag

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination