CN116523028B - Image characterization model training method and device based on image space position - Google Patents

Image characterization model training method and device based on image space position

Info

Publication number
CN116523028B
CN116523028B (application CN202310779761.0A)
Authority
CN
China
Prior art keywords
image
small
model
characterization
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310779761.0A
Other languages
Chinese (zh)
Other versions
CN116523028A (en)
Inventor
孙海亮 (Sun Hailiang)
暴宇健 (Bao Yujian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310779761.0A
Publication of CN116523028A
Application granted
Publication of CN116523028B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The disclosure relates to the technical field of machine learning, and provides an image characterization model training method and device based on image spatial position. The method includes the following steps: determining a low-dimensional vector characterization of each small image through a nonlinear coding network, and determining front and rear spatial latent vectors of the first K small images through an autoregressive network according to their low-dimensional vector characterizations; passing the front and rear spatial latent vector of the K-th small image through N-K fully connected neural networks, respectively, to obtain predictive vector characterizations of the (K+1)-th to N-th small images; calculating a self-supervision loss according to the low-dimensional vector characterizations and predictive vector characterizations of the (K+1)-th to N-th small images; calculating a contrast learning loss according to the low-dimensional vector characterizations of the first K small images and of their positive and negative samples; and updating model parameters of the image characterization model according to the self-supervision loss and the contrast learning loss to complete training of the image characterization model.

Description

Image characterization model training method and device based on image space position
Technical Field
The disclosure relates to the technical field of machine learning, in particular to an image characterization model training method and device based on image space positions.
Background
Image characterization represents image information numerically; image coding is one image characterization method. Image characterization may be used for image recognition, image processing, image transmission, image utilization, and the like, and machine learning can be used to make it more efficient. However, commonly used machine learning models, such as diffusion models and generative adversarial networks, characterize only a single image at a time. In many cases there is correlation between images: for example, when multiple images all relate to one scene, a spatial position relationship exists among them, and characterizing one image should take the information of the other images into account. Existing image characterization methods often ignore this correlation between images. Furthermore, the commonly used loss functions have a weak guiding effect on the training of the image characterization model.
In the process of implementing the disclosed concept, the inventors found at least the following technical problems in the related art: the model cannot take the correlation between images into account when characterizing them, and the guiding effect of the loss function on model training is weak.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide an image characterization model training method, apparatus, electronic device, and computer-readable storage medium based on image spatial position, to solve the problems in the prior art that correlation between images cannot be considered when a model characterizes images and that the guiding effect of the loss function on model training is weak.
In a first aspect of the embodiments of the present disclosure, an image characterization model training method based on image spatial position is provided, including: constructing a nonlinear coding network from a plurality of convolution layers and a plurality of pooling layers, taking a recurrent neural network as an autoregressive network, and constructing an image characterization model from the nonlinear coding network, the autoregressive network, and N-K fully connected neural networks, where each fully connected neural network is randomly initialized; acquiring a training data set, and dividing each training image in the training data set into N equal-sized, ordered small images according to image spatial position; determining a low-dimensional vector characterization of each small image through the nonlinear coding network, and determining front and rear spatial latent vectors of the first K small images through the autoregressive network according to their low-dimensional vector characterizations, where the front and rear spatial latent vector of each small image is related to that image's low-dimensional vector characterization and to the front and rear spatial latent vector of the preceding small image; passing the front and rear spatial latent vector of the K-th small image through the N-K fully connected neural networks, respectively, to obtain predictive vector characterizations of the (K+1)-th to N-th small images; calculating a self-supervision loss according to the low-dimensional vector characterizations and predictive vector characterizations of the (K+1)-th to N-th small images; determining positive and negative samples of the first K small images from the training data set, and calculating a contrast learning loss according to the low-dimensional vector characterizations of the first K small images and of their positive and negative samples; and updating model parameters of the image characterization model according to the self-supervision loss and the contrast learning loss to complete training of the image characterization model.
In a second aspect of the embodiments of the present disclosure, an image characterization model training device based on image spatial position is provided, including: a construction module configured to construct a nonlinear coding network from a plurality of convolution layers and a plurality of pooling layers, take a recurrent neural network as an autoregressive network, and construct an image characterization model from the nonlinear coding network, the autoregressive network, and N-K fully connected neural networks, where each fully connected neural network is randomly initialized; a segmentation module configured to acquire a training data set and divide each training image in the training data set into N equal-sized, ordered small images according to image spatial position; a first determination module configured to determine a low-dimensional vector characterization of each small image through the nonlinear coding network; a second determination module configured to determine front and rear spatial latent vectors of the first K small images through the autoregressive network according to their low-dimensional vector characterizations, where the front and rear spatial latent vector of each small image is related to that image's low-dimensional vector characterization and to the front and rear spatial latent vector of the preceding small image; a third determination module configured to pass the front and rear spatial latent vector of the K-th small image through the N-K fully connected neural networks, respectively, to obtain predictive vector characterizations of the (K+1)-th to N-th small images; a first calculation module configured to calculate a self-supervision loss according to the low-dimensional vector characterizations and predictive vector characterizations of the (K+1)-th to N-th small images; a second calculation module configured to determine positive and negative samples of the first K small images from the training data set and calculate a contrast learning loss according to the low-dimensional vector characterizations of the first K small images and of their positive and negative samples; and an updating module configured to update the model parameters of the image characterization model according to the self-supervision loss and the contrast learning loss to complete training of the image characterization model.
In a third aspect of the disclosed embodiments, an electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. A nonlinear coding network is constructed from a plurality of convolution layers and a plurality of pooling layers; a recurrent neural network serves as an autoregressive network; and an image characterization model is constructed from the nonlinear coding network, the autoregressive network, and N-K randomly initialized fully connected neural networks. A training data set is acquired, and each training image in it is divided into N equal-sized, ordered small images according to image spatial position. A low-dimensional vector characterization of each small image is determined through the nonlinear coding network, and front and rear spatial latent vectors of the first K small images are determined through the autoregressive network according to their low-dimensional vector characterizations, where the front and rear spatial latent vector of each small image is related to that image's low-dimensional vector characterization and to the front and rear spatial latent vector of the preceding small image. The front and rear spatial latent vector of the K-th small image is passed through the N-K fully connected neural networks to obtain predictive vector characterizations of the (K+1)-th to N-th small images. A self-supervision loss is calculated according to the low-dimensional vector characterizations and predictive vector characterizations of the (K+1)-th to N-th small images. Positive and negative samples of the first K small images are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector characterizations of the first K small images and of their positive and negative samples. Model parameters of the image characterization model are updated according to the self-supervision loss and the contrast learning loss to complete its training. This solves the prior-art problems that correlation between images cannot be considered when the model characterizes images and that the guiding effect of the loss function on model training is weak, thereby improving both the expression of related-image information in the image characterization and the guiding effect of the loss function on model training.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present disclosure; other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic flow chart (I) of an image characterization model training method based on image space position according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart (II) of an image characterization model training method based on image space position according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an image representation model training device based on image space position according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Fig. 1 is a schematic flow chart (I) of an image characterization model training method based on image spatial position according to an embodiment of the disclosure. The method of fig. 1 may be performed by a computer or a server, or by software on a computer or server. As shown in fig. 1, the image characterization model training method based on image spatial position includes:
s101, constructing a nonlinear coding network by utilizing a plurality of convolution layers and a plurality of pooling layers, and constructing an image characterization model by utilizing the nonlinear coding network, the autoregressive network and N-K fully-connected neural networks by taking a circulating neural network as an autoregressive network, wherein each fully-connected neural network is randomly initialized;
s102, acquiring a training data set, and dividing a training image in the training data set into N small images with equal size and sequence according to the spatial position of the image;
s103, determining low-dimensional vector characterization of each small image through a nonlinear coding network;
s104, determining front and rear space latent vectors of the front K small images through an autoregressive network according to the low-dimensional vector representation of the front K small images, wherein the front and rear space latent vectors of each small image are related to the low-dimensional vector representation of the small image and the front and rear space latent vectors of the front small image of the small image;
s105, the front and rear space latent vectors of the Kth small image are respectively subjected to N-K fully connected neural networks to obtain predictive vector characterization from the Kth small image to the Nth small image;
s106, calculating self-supervision loss according to the low-dimensional vector representation and the predictive vector representation of the (K+1) th small image to the (N) th small image;
s107, positive samples and negative samples of the first K small images are determined from the training data set, and contrast learning loss is calculated according to the first K small images and low-dimensional vector characterization of the positive samples and the negative samples of the first K small images;
s108, updating model parameters of the image representation model according to the self-supervision loss and the contrast learning loss so as to complete training of the image representation model.
The recurrent neural network may be a GRU (Gated Recurrent Unit). The nonlinear coding network is connected to the autoregressive network, and the autoregressive network is connected to N-K mutually parallel fully connected neural networks, yielding the image characterization model. Because each fully connected neural network is randomly initialized, the parameters of the fully connected neural networks all differ. The training data set contains a plurality of training images; for ease of understanding, the following describes processing a single training image. The image spatial order may run left to right, then top to bottom. For example, cutting a training image twice horizontally and twice vertically yields nine small images in three rows and three columns: the first row contains images No. 1, 2 and 3, the second row images No. 4, 5 and 6, and the third row images No. 7, 8 and 9. The image spatial positions of the training image define the order of its nine small images, which may be images No. 1 through No. 9 in sequence.
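The spatial ordering just described can be sketched as follows; the 6x6 toy image and the 3x3 grid are illustrative assumptions:

```python
import numpy as np

# Cut an image into a rows x cols grid of equal-sized patches, ordered
# left to right, then top to bottom, matching the numbering in the text.
def split_into_patches(image: np.ndarray, rows: int = 3, cols: int = 3):
    h, w = image.shape[:2]
    ph, pw = h // rows, w // cols  # equal patch height and width
    patches = []
    for r in range(rows):          # top to bottom
        for c in range(cols):      # left to right
            patches.append(image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
    return patches

image = np.arange(36).reshape(6, 6)   # toy 6x6 "training image"
patches = split_into_patches(image)   # nine 2x2 patches, No. 1 .. No. 9
```

With this ordering, patch No. 1 is the top-left block and patch No. 9 the bottom-right block.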
N is the number of small images and K is a fixed value; for example, with N = 9 and K = 4, one training image is divided into nine ordered small images and there are 5 fully connected neural networks. All small images pass through the nonlinear coding network to obtain the low-dimensional vector characterization of each small image, and the low-dimensional vector characterizations of the first 4 small images pass through the autoregressive network to obtain their front and rear spatial latent vectors (one per small image). The front and rear spatial latent vector of the 4th small image passes through the 5 fully connected neural networks, respectively, giving 5 results used as the predictive vector characterizations of the last 5 small images (one per small image). The self-supervision loss is calculated from the low-dimensional vector characterizations and predictive vector characterizations of the last 5 small images; the contrast learning loss is calculated from the low-dimensional vector characterizations of the first 4 small images and of their positive and negative samples (one of each per small image). The positive and negative samples of a small image are determined by inter-image distance: for example, the positive sample of the 4th small image may be the 5th small image and its negative sample the 8th small image; that is, the (j+1)-th small image is taken as the positive sample of the j-th small image and the (j+4)-th small image as its negative sample.
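For N = 9 and K = 4, the sampling rule in the example above (positive sample: the next patch; negative sample: the patch four positions later) can be sketched as:

```python
# Positive/negative sample indices for the j-th small image (1-based),
# following the example in the text; the offsets +1 and +4 are taken from
# that example and assumed to generalize to the other values of j.
def sample_indices(j: int) -> tuple:
    return j + 1, j + 4   # (positive index, negative index)

pairs = {j: sample_indices(j) for j in range(1, 5)}   # first K = 4 patches
```

For instance, the 4th small image gets the 5th as positive sample and the 8th as negative sample.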
Note that since the 1st small image has no preceding small image, its front and rear spatial latent vector is related only to its own low-dimensional vector characterization. The preceding small image of the 3rd small image is the 2nd small image.
Optionally, each fully connected neural network may be followed by a ReLU activation function; the image characterization model is then constructed from the nonlinear coding network, the autoregressive network, the N-K fully connected neural networks, and the ReLU activation function following each fully connected neural network. Each fully connected neural network may be a single layer.
Optionally, a training data set is acquired comprising a plurality of image groups, each group comprising N images related to one object. Because the N images of each group express information at different spatial positions of the object, they have a defined order. The image characterization model is trained on the multiple image groups, and the method of training on each group is the same as the training method applied after a training image is divided into N small images according to image spatial position.
According to the technical solution provided by the embodiments of the present disclosure, a nonlinear coding network is constructed from a plurality of convolution layers and a plurality of pooling layers, a recurrent neural network serves as an autoregressive network, and an image characterization model is constructed from the nonlinear coding network, the autoregressive network, and N-K randomly initialized fully connected neural networks. A training data set is acquired, and each training image in it is divided into N equal-sized, ordered small images according to image spatial position. A low-dimensional vector characterization of each small image is determined through the nonlinear coding network, and front and rear spatial latent vectors of the first K small images are determined through the autoregressive network according to their low-dimensional vector characterizations, where the front and rear spatial latent vector of each small image is related to that image's low-dimensional vector characterization and to the front and rear spatial latent vector of the preceding small image. The front and rear spatial latent vector of the K-th small image is passed through the N-K fully connected neural networks to obtain predictive vector characterizations of the (K+1)-th to N-th small images. A self-supervision loss is calculated according to the low-dimensional vector characterizations and predictive vector characterizations of the (K+1)-th to N-th small images. Positive and negative samples of the first K small images are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector characterizations of the first K small images and of their positive and negative samples. Model parameters of the image characterization model are updated according to the self-supervision loss and the contrast learning loss to complete its training. This solves the prior-art problems that correlation between images cannot be considered when the model characterizes images and that the guiding effect of the loss function on model training is weak, thereby improving both the expression of related-image information in the image characterization and the guiding effect of the loss function on model training.
The low-dimensional vector characterization and the front and rear spatial latent vector of the K-th small image are calculated as follows.
z_K = G_enc(x_K);
c_K = G_ar(z_K, c_{K-1});
where G_enc(·) denotes the nonlinear coding network, x_K the K-th small image, z_K the low-dimensional vector characterization of the K-th small image, G_ar(·) the autoregressive network, c_K the front and rear spatial latent vector of the K-th small image, and c_{K-1} the front and rear spatial latent vector of the (K-1)-th small image.
The low-dimensional vector characterizations and front and rear spatial latent vectors of the other small images are computed in the same way.
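A toy numpy sketch of the recurrence just described: an encoder G_enc producing z_k, and a GRU-like autoregressive step producing c_k from z_k and c_{k-1}. The dimensions and the tanh-based stand-in networks are assumptions, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 4                         # latent and context sizes (assumed)
W_enc = rng.standard_normal((D, 16))
W_z = rng.standard_normal((H, D))
W_c = rng.standard_normal((H, H))

def g_enc(x):
    # Stand-in for the conv/pool nonlinear coding network: x -> z.
    return np.tanh(W_enc @ x)

def g_ar(z, c_prev):
    # Stand-in for the autoregressive network: (z_k, c_{k-1}) -> c_k.
    return np.tanh(W_z @ z + W_c @ c_prev)

patches = [rng.standard_normal(16) for _ in range(4)]  # first K = 4 patches
c = np.zeros(H)                     # the 1st patch has no predecessor
contexts = []
for x in patches:
    c = g_ar(g_enc(x), c)           # c_k depends on z_k and c_{k-1}
    contexts.append(c)
```

The zero initial context reflects the note above that the 1st small image's latent vector depends only on its own characterization.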
The self-supervision loss is calculated by the following formula:
L_self = Σ_{i=1}^{N-K} MSE(p_{K+i}, z_{K+i})
where MSE(·) is the mean square error function, p_{K+i} denotes the predictive vector characterization of the (K+i)-th small image, z_{K+i} denotes the low-dimensional vector characterization of the (K+i)-th small image, and i is a natural number taking values from 1 to N-K.
The contrast learning loss is calculated by the following formula:
L_con = Σ_{j=1}^{K} L_triplet(z_j, z_{j+}, z_{j-})
where L_triplet(·) is a triplet loss function; j, j+ and j- are natural numbers; j takes values from 1 to K; the j+-th small image is the positive sample of the j-th small image and the j--th small image is its negative sample; j+ takes values between 2 and K+1 and j- between K+1 and N; and z_j, z_{j+} and z_{j-} are the low-dimensional vector characterizations of the j-th, j+-th and j--th small images, respectively.
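One term of the contrast learning loss can be sketched with a standard triplet loss. The Euclidean distance and the margin of 1.0 are assumptions, since the text only names "a triplet loss function":

```python
import numpy as np

# Triplet loss over (anchor z_j, positive z_{j+}, negative z_{j-}):
# pull the anchor toward its positive sample and push it away from its
# negative sample by at least the margin.
def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

z_j   = np.array([0.0, 0.0])
z_pos = np.array([0.1, 0.0])   # spatially adjacent patch: small distance
z_neg = np.array([3.0, 4.0])   # distant patch: large distance
loss = triplet_loss(z_j, z_pos, z_neg)   # 0.1 - 5.0 + 1.0 < 0, so 0.0
```

Swapping the positive and negative samples makes the term large, which is what drives the encoder to respect spatial proximity.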
Updating the model parameters of the image characterization model based on the self-supervision loss and the contrast learning loss includes: calculating the total loss according to the following formula, and updating the model parameters of the image characterization model according to the total loss:
L_total = L_self + λ · L_con
where L_total is the total loss, L_self the self-supervision loss, L_con the contrast learning loss, and λ a weight adjustment factor whose value lies between 0 and 1 and may be set freely.
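Assuming the weighted-sum combination L_total = L_self + lambda * L_con (the precise form of the lost formula is a reconstruction from the surrounding description), the update target can be sketched as:

```python
# Total loss combining the self-supervision and contrast learning losses.
# The additive form and the placement of the weight factor lam are assumptions;
# lam is a user-chosen value between 0 and 1.
def total_loss(l_self: float, l_con: float, lam: float = 0.5) -> float:
    return l_self + lam * l_con

combined = total_loss(2.0, 4.0, lam=0.25)  # 2.0 + 0.25 * 4.0 = 3.0
```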
In an alternative embodiment, the image characterization model is trained in multiple stages: the model parameters of the autoregressive network and of all fully connected neural networks in the image characterization model are frozen, and the model parameters of the nonlinear coding network are updated according to the contrast learning loss to complete the first-stage training; then the model parameters of the nonlinear coding network are frozen, and the model parameters of the autoregressive network and of all fully connected neural networks are updated according to the self-supervision loss to complete the second-stage training.
In the first-stage training, the model parameters of the autoregressive network and of all fully connected neural networks are frozen, and the model parameters of the nonlinear coding network are updated according to the contrast learning loss; the autoregressive network and the fully connected neural networks do not participate in this process. After the first stage ends, the model parameters of the autoregressive network and of all fully connected neural networks are unfrozen and the second stage begins: the model parameters of the nonlinear coding network are frozen, and the model parameters of the autoregressive network and of all fully connected neural networks are updated according to the self-supervision loss. The nonlinear coding network participates in this process, because the input to the autoregressive network uses the output of the nonlinear coding network, but its model parameters are not updated. After the second stage ends, the model parameters of the nonlinear coding network are unfrozen, and training of the image characterization model is determined to be complete.
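The freeze/unfreeze schedule above can be sketched with per-module trainability flags, standing in for a deep-learning framework's requires_grad mechanism (the module names are illustrative):

```python
# Trainability flags for the three parts of the image characterization model.
modules = {"encoder": True, "autoregressive": True, "predictors": True}

def set_trainable(active):
    # Freeze every module not named in `active`; unfreeze the rest.
    for name in modules:
        modules[name] = name in active

# Stage 1: the contrast learning loss updates only the nonlinear coding network.
set_trainable({"encoder"})
stage1 = dict(modules)

# Stage 2: the self-supervision loss updates the autoregressive network and the
# fully connected predictor heads; the encoder is frozen (it still runs forward,
# since the autoregressive network consumes its output).
set_trainable({"autoregressive", "predictors"})
stage2 = dict(modules)
```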
Fig. 2 is a schematic flow chart (II) of an image characterization model training method based on image spatial position according to an embodiment of the disclosure. As shown in fig. 2, the method includes:
s201, extracting hash features of the first K small images by using a hash algorithm;
s202, determining a comprehensive hash vector of each small image according to the hash characteristics of the small image and all small images before the small image;
s203, calculating characterization loss according to front and rear space latent vectors and comprehensive hash vectors of the front K small images;
s204, performing multi-stage training on the image characterization model: freezing model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model, and updating model parameters of a nonlinear coding network in the image characterization model according to contrast learning loss so as to complete first-stage training of the image characterization model;
s205, freezing model parameters of a nonlinear coding network and all fully-connected neural networks in the image characterization model, and updating model parameters of an autoregressive network in the image characterization model according to the characterization loss so as to complete second-stage training of the image characterization model;
s206, freezing model parameters of the nonlinear coding network and the autoregressive network in the image representation model, and updating model parameters of all the fully-connected neural networks in the image representation model according to the self-supervision loss so as to complete the third-stage training of the image representation model.
The hash feature of the 1st small image is itself the comprehensive hash vector of the 1st small image; the comprehensive hash vector of the 3rd small image is determined from the hash features of the 1st, 2nd, and 3rd small images. The hash features of a small image and of all preceding small images may be spliced together to form that image's comprehensive hash vector. A cross entropy loss function may be used to calculate the characterization loss between the front and rear spatial latent vectors and the comprehensive hash vectors.
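The comprehensive hash vector construction can be sketched as follows. MD5 over raw patch bytes is an illustrative stand-in for the unspecified hash algorithm, and string concatenation for the splicing step:

```python
import hashlib

# Hash feature of one patch: a fixed-length digest of its bytes (MD5 here is
# an assumed stand-in for the unspecified hash algorithm).
def hash_feature(patch_bytes: bytes) -> str:
    return hashlib.md5(patch_bytes).hexdigest()

# Comprehensive hash vector of the k-th patch (1-based): the spliced hash
# features of patches 1..k, as described in the text.
def comprehensive_hash(patches, k: int) -> str:
    return "".join(hash_feature(p) for p in patches[:k])

patches = [b"patch-1", b"patch-2", b"patch-3"]
h1 = comprehensive_hash(patches, 1)  # equals the 1st patch's own hash feature
h3 = comprehensive_hash(patches, 3)  # hashes of patches 1, 2 and 3 spliced
```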
The multi-stage training in the embodiment of the present application is similar to that in the previous embodiment, and will not be described again.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 3 is a schematic diagram of an image characterization model training device based on an image spatial position according to an embodiment of the present disclosure. As shown in fig. 3, the image characterization model training device based on the image space position includes:
a construction module 301 configured to construct a nonlinear coding network using a plurality of convolutional layers and a plurality of pooling layers, take a recurrent neural network as the autoregressive network, and construct an image characterization model using the nonlinear coding network, the autoregressive network, and N-K fully connected neural networks, with each fully connected neural network being randomly initialized;
a segmentation module 302 configured to acquire a training data set, and segment each training image in the training data set into N equally sized small images ordered according to image spatial position;
a first determination module 303 configured to determine a low-dimensional vector representation of each small image through the nonlinear coding network;
a second determining module 304 configured to determine front and rear spatial latent vectors of the first K small images from the low-dimensional vector representations of the first K small images through the autoregressive network, wherein the front and rear spatial latent vectors of each small image are related to the low-dimensional vector representation of the small image and the front and rear spatial latent vectors of the previous small image of the small image;
a third determining module 305 configured to pass the front and rear spatial latent vectors of the Kth small image through the N-K fully connected neural networks, respectively, to obtain predictive vector representations of the (K+1)th to the Nth small images;
a first calculation module 306 configured to calculate a self-supervision loss from the low-dimensional vector representations and the predictive vector representations of the (K+1)th to the Nth small images;
a second calculation module 307 configured to determine positive samples and negative samples of the first K small images from the training data set, and calculate a contrast learning loss from the low-dimensional vector representations of the first K small images and of their positive and negative samples;
an updating module 308 configured to update model parameters of the image characterization model according to the self-supervision loss and the contrast learning loss, so as to complete training of the image characterization model.
According to the technical scheme provided by the embodiment of the present disclosure, a nonlinear coding network is constructed using a plurality of convolutional layers and a plurality of pooling layers, a recurrent neural network is taken as the autoregressive network, and an image characterization model is constructed using the nonlinear coding network, the autoregressive network and N-K fully connected neural networks, with each fully connected neural network being randomly initialized; a training data set is acquired, and each training image in the training data set is divided into N equally sized small images ordered according to image spatial position; a low-dimensional vector representation of each small image is determined through the nonlinear coding network, and the front and rear spatial latent vectors of the first K small images are determined through the autoregressive network according to the low-dimensional vector representations of the first K small images, wherein the front and rear spatial latent vector of each small image is related to the low-dimensional vector representation of that small image and to the front and rear spatial latent vector of the preceding small image; the front and rear spatial latent vectors of the Kth small image are passed through the N-K fully connected neural networks, respectively, to obtain predictive vector representations of the (K+1)th to the Nth small images; a self-supervision loss is calculated according to the low-dimensional vector representations and the predictive vector representations of the (K+1)th to the Nth small images; positive samples and negative samples of the first K small images are determined from the training data set, and a contrast learning loss is calculated according to the low-dimensional vector representations of the first K small images and of their positive and negative samples; and the model parameters of the image characterization model are updated according to the self-supervision loss and the contrast learning loss to complete training of the image characterization model. This solves the problem in the prior art that correlation between images is not considered when the model characterizes an image, so that the guiding effect of the loss function on model training is weak, thereby improving both the expression of related image information in the image characterization and the guiding effect of the loss function on model training.
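The segmentation step — dividing a training image into N equally sized small images ordered by spatial position — can be sketched as follows. Row-major (left-to-right, top-to-bottom) ordering is an assumption; the patent only requires some ordering by image spatial position.

```python
import numpy as np

def split_into_patches(image, rows, cols):
    """Split an H x W image into rows*cols equally sized patches,
    ordered left-to-right, top-to-bottom (row-major)."""
    h, w = image.shape[:2]
    assert h % rows == 0 and w % cols == 0, "image must divide evenly into patches"
    ph, pw = h // rows, w // cols
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(rows) for c in range(cols)]

img = np.arange(64).reshape(8, 8)
patches = split_into_patches(img, 4, 4)  # N = 16 small images of shape 2 x 2
```

The resulting list order is what gives "preceding" small images their meaning in the autoregressive step: patch j-1 is the spatial predecessor of patch j.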
Optionally, the first determining module 303 is further configured to calculate the low-dimensional vector representation and the front and rear spatial latent vector of the Kth small image as follows:

z_K = Genc(x_K);

c_K = Gar(z_K, c_(K-1));

where Genc() represents the nonlinear coding network, x_K represents the Kth small image, z_K represents the low-dimensional vector representation of the Kth small image, Gar() represents the autoregressive network, c_K represents the front and rear spatial latent vector of the Kth small image, and c_(K-1) represents the front and rear spatial latent vector of the (K-1)th small image.

The low-dimensional vector representations and front and rear spatial latent vectors of the other small images are calculated in the same way.
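A minimal numeric sketch of this recurrence follows, with toy stand-ins for Genc (really a conv/pool stack) and Gar (really a recurrent network); the matrices, dimensions, and zero initial state c_0 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                   # latent dimension (illustrative)
W_enc = rng.standard_normal((D, 16))    # stand-in for Genc
W_z = rng.standard_normal((D, D))       # stand-ins for the autoregressive cell Gar
W_c = rng.standard_normal((D, D))

def g_enc(x):
    """z_K = Genc(x_K): toy nonlinear projection of a flattened patch."""
    return np.tanh(W_enc @ x)

def g_ar(z, c_prev):
    """c_K = Gar(z_K, c_(K-1)): toy tanh recurrence over patch latents."""
    return np.tanh(W_z @ z + W_c @ c_prev)

patches = [rng.standard_normal(16) for _ in range(3)]  # first K = 3 flattened patches
c = np.zeros(D)                         # assumed initial state c_0
latents = []
for x in patches:
    c = g_ar(g_enc(x), c)               # each c_K depends on z_K and c_(K-1)
    latents.append(c)
```

The loop makes the stated dependency explicit: each front and rear spatial latent vector is a function of the current patch's low-dimensional representation and the previous patch's latent vector.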
Optionally, the first calculation module 306 is further configured to calculate the self-supervision loss by the following formula:

L_self = Σ_(i=1)^(N-K) MSE(ẑ_(K+i), z_(K+i))

where MSE() is the mean square error function, ẑ_(K+i) represents the predictive vector representation of the (K+i)th small image, z_(K+i) is the low-dimensional vector representation of the (K+i)th small image, and i is a natural number with a value between 1 and N-K.
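Under the reconstruction above, the self-supervision loss is a sum of per-image MSE terms; the vectors below are illustrative:

```python
import numpy as np

def self_supervision_loss(preds, targets):
    """L_self = sum over i of MSE(pred_i, target_i), where pred_i is the
    predictive vector of the (K+i)th small image and target_i its
    low-dimensional vector representation."""
    return sum(np.mean((p - t) ** 2) for p, t in zip(preds, targets))

preds = [np.array([1.0, 2.0]), np.array([0.0, 0.0])]    # N - K = 2 predictions
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
loss = self_supervision_loss(preds, targets)  # MSE terms 2.0 + 2.0 -> 4.0
```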
Optionally, the second calculation module 307 is further configured to calculate the contrast learning loss by the following formula:

L_con = Σ_(j=1)^(K) L_tri(z_j, z_(j+), z_(j-))

where L_tri() is a triplet loss function; j, j+ and j- are natural numbers; j takes a value between 1 and K; the (j+)th small image is the positive sample of the jth small image and the (j-)th small image is the negative sample of the jth small image; j+ takes a value between 2 and K+1, and j- takes a value between K+1 and N; z_j is the low-dimensional vector representation of the jth small image, z_(j+) is the low-dimensional vector representation of the (j+)th small image, and z_(j-) is the low-dimensional vector representation of the (j-)th small image.
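One standard realization of the triplet loss function L_tri is the margin-based form below; the margin value and the use of squared Euclidean distance are assumptions, as the patent does not specify them:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin), with squared Euclidean distance:
    pulls the positive toward the anchor and pushes the negative away."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def contrast_learning_loss(z, pos, neg):
    """Sum of the triplet loss over the first K small images."""
    return sum(triplet_loss(a, p, n) for a, p, n in zip(z, pos, neg))

z = [np.array([0.0, 0.0])]       # K = 1 anchor representation
pos = [np.array([0.0, 1.0])]     # d_pos = 1
neg = [np.array([2.0, 0.0])]     # d_neg = 4
loss = contrast_learning_loss(z, pos, neg)  # max(0, 1 - 4 + 1) = 0.0
```

The loss is zero here because the negative is already more than a margin farther from the anchor than the positive; otherwise the excess distance is penalized.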
Optionally, the second calculation module 307 is further configured to calculate a total loss by the following formula, and to update the model parameters of the image characterization model according to the total loss:

L_total = L_self + λ · L_con

where L_total is the total loss, L_self is the self-supervision loss, L_con is the contrast learning loss, and λ is a weight adjustment factor.
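The weighted combination above is then a one-liner; λ = 0.5 below is an arbitrary illustrative value, not one given by the patent:

```python
def total_loss(l_self, l_con, lam=0.5):
    """L_total = L_self + lam * L_con (lam is the weight adjustment factor)."""
    return l_self + lam * l_con

loss = total_loss(4.0, 2.0)  # 4.0 + 0.5 * 2.0 = 5.0
```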
Optionally, the update module 308 is further configured to multi-stage train the image characterization model: freezing model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model, and updating model parameters of a nonlinear coding network in the image characterization model according to contrast learning loss so as to complete first-stage training of the image characterization model; model parameters of the nonlinear coding network in the image characterization model are frozen, and model parameters of the autoregressive network and all the fully-connected neural networks in the image characterization model are updated according to the self-supervision loss, so that second-stage training of the image characterization model is completed.
Optionally, the updating module 308 is further configured to extract hash features of the first K small images using a hash algorithm; determining a comprehensive hash vector of each small image according to the hash characteristics of the small images and all small images before the small image; calculating characterization loss according to front and rear space latent vectors and comprehensive hash vectors of the front K small images; performing multi-stage training on the image characterization model: freezing model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model, and updating model parameters of a nonlinear coding network in the image characterization model according to contrast learning loss so as to complete first-stage training of the image characterization model; freezing model parameters of a nonlinear coding network and all fully-connected neural networks in the image characterization model, and updating model parameters of an autoregressive network in the image characterization model according to the characterization loss so as to complete second-stage training of the image characterization model; model parameters of a nonlinear coding network and an autoregressive network in the image characterization model are frozen, and model parameters of all fully-connected neural networks in the image characterization model are updated according to the self-supervision loss, so that the third-stage training of the image characterization model is completed.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 4 is a schematic diagram of an electronic device 4 provided by an embodiment of the present disclosure. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps of the various method embodiments described above are implemented by processor 401 when executing computer program 403. Alternatively, the processor 401, when executing the computer program 403, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 4 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the electronic device 4 and is not limiting of the electronic device 4 and may include more or fewer components than shown, or different components.
The processor 401 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 4. Memory 402 may also include both internal storage units and external storage devices of electronic device 4. The memory 402 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method of the above-described embodiments, or may be implemented by a computer program to instruct related hardware, and the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of the jurisdiction's jurisdiction and the patent practice, for example, in some jurisdictions, the computer readable medium does not include electrical carrier signals and telecommunication signals according to the jurisdiction and the patent practice.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (10)

1. An image characterization model training method based on image space positions is characterized by comprising the following steps:
constructing a nonlinear coding network by utilizing a plurality of convolution layers and a plurality of pooling layers, taking a recurrent neural network as an autoregressive network, and constructing an image characterization model by utilizing the nonlinear coding network, the autoregressive network and N-K fully-connected neural networks, wherein each fully-connected neural network is randomly initialized, N is the number of small images into which a training image is divided, K is a set fixed value, and K is smaller than N;
acquiring a training data set, and dividing a training image in the training data set into N small images with equal size and sequencing according to the image space position;
determining a low-dimensional vector representation of each small image through the nonlinear encoding network;
determining front and rear spatial latent vectors of the front K small images through the autoregressive network according to the low-dimensional vector representation of the front K small images, wherein the front and rear spatial latent vectors of each small image are related to the low-dimensional vector representation of the small image and the front and rear spatial latent vectors of the front small image of the small image;
respectively passing the front and rear space latent vectors of the Kth small image through N-K fully connected neural networks to obtain predictive vector characterization from the Kth+1th small image to the Nth small image;
calculating self-supervision loss according to the low-dimensional vector representation and the predictive vector representation of the (K+1) th small image to the (N) th small image;
positive samples and negative samples of the first K small images are determined from the training data set, and contrast learning loss is calculated according to the first K small images and low-dimensional vector characterization of the positive samples and the negative samples of the first K small images;
and updating model parameters of the image representation model according to the self-supervision loss and the contrast learning loss so as to complete training of the image representation model.
2. The method of claim 1, wherein the self-supervision loss is calculated by the following formula:

L_self = Σ_(i=1)^(N-K) MSE(ẑ_(K+i), z_(K+i))

where MSE() is the mean square error function, ẑ_(K+i) represents the predictive vector representation of the (K+i)th small image, z_(K+i) is the low-dimensional vector representation of the (K+i)th small image, and i is a natural number with a value between 1 and N-K.
3. The method of claim 1, wherein the contrast learning loss is calculated by the following formula:

L_con = Σ_(j=1)^(K) L_tri(z_j, z_(j+), z_(j-))

where L_tri() is a triplet loss function; j, j+ and j- are natural numbers; j takes a value between 1 and K; the (j+)th small image is the positive sample of the jth small image and the (j-)th small image is the negative sample of the jth small image; j+ takes a value between 2 and K+1, and j- takes a value between K+1 and N; z_j is the low-dimensional vector representation of the jth small image, z_(j+) is the low-dimensional vector representation of the (j+)th small image, and z_(j-) is the low-dimensional vector representation of the (j-)th small image.
4. The method of claim 1, wherein updating model parameters of the image characterization model according to the self-supervision loss and the contrast learning loss comprises:

calculating the total loss by the following formula, and updating the model parameters of the image characterization model according to the total loss:

L_total = L_self + λ · L_con

where L_total is the total loss, L_self is the self-supervision loss, L_con is the contrast learning loss, and λ is a weight adjustment factor.
5. The method according to claim 1, wherein the method further comprises:
performing multi-stage training on the image characterization model:
freezing model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model, and updating model parameters of a nonlinear coding network in the image characterization model according to the contrast learning loss so as to complete first-stage training of the image characterization model;
and freezing model parameters of a nonlinear coding network in the image characterization model, and updating model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model according to the self-supervision loss to complete second-stage training of the image characterization model.
6. The method according to claim 1, wherein the method further comprises:
performing multi-stage training on the image characterization model:
freezing model parameters of an autoregressive network and all fully-connected neural networks in the image characterization model, and updating model parameters of a nonlinear coding network in the image characterization model according to the contrast learning loss so as to complete first-stage training of the image characterization model;
freezing model parameters of a nonlinear coding network and all fully-connected neural networks in the image characterization model, and updating model parameters of an autoregressive network in the image characterization model according to characterization loss so as to complete second-stage training of the image characterization model;
and freezing model parameters of a nonlinear coding network and an autoregressive network in the image representation model, and updating model parameters of all fully-connected neural networks in the image representation model according to the self-supervision loss to complete the third-stage training of the image representation model.
7. The method of claim 6, wherein the method further comprises:
extracting hash characteristics of the first K small images by using a hash algorithm;
determining a comprehensive hash vector of each small image according to the hash characteristics of the small images and all small images before the small image;
and calculating the characterization loss according to the front-back space latent vectors and the comprehensive hash vectors of the first K small images.
8. An image representation model training device based on image space position, which is characterized by comprising:
the construction module is configured to construct a nonlinear coding network by utilizing a plurality of convolution layers and a plurality of pooling layers, take a recurrent neural network as an autoregressive network, and construct an image characterization model by utilizing the nonlinear coding network, the autoregressive network and N-K fully connected neural networks, wherein each fully connected neural network is randomly initialized, N is the number of small images into which a training image is divided, K is a set fixed value, and K is smaller than N;
the segmentation module is configured to acquire a training data set, and segment training images in the training data set into N small images with equal size and sequencing according to the image space position;
a first determination module configured to determine a low-dimensional vector representation of each small image through the nonlinear encoding network;
a second determination module configured to determine front and rear spatial latent vectors of the first K small images from the low-dimensional vector representations of the first K small images through the autoregressive network, wherein the front and rear spatial latent vectors of each small image are related to the low-dimensional vector representation of the small image and the front and rear spatial latent vectors of a previous small image of the small image;
the third determining module is configured to enable the front space latent vector and the rear space latent vector of the Kth small image to pass through N-K fully connected neural networks respectively to obtain predictive vector characterization of the (K+1) th small image to the (N) th small image;
a first calculation module configured to calculate a self-supervising penalty from the low-dimensional vector characterizations and the predictive vector characterizations of the k+1th through nth small images;
a second calculation module configured to determine positive and negative samples of the first K small images from the training dataset, calculate a contrast learning loss from the first K small images and low-dimensional vector characterizations of the positive and negative samples of the first K small images;
and the updating module is configured to update model parameters of the image representation model according to the self-supervision loss and the contrast learning loss so as to complete training of the image representation model.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202310779761.0A 2023-06-29 2023-06-29 Image characterization model training method and device based on image space position Active CN116523028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310779761.0A CN116523028B (en) 2023-06-29 2023-06-29 Image characterization model training method and device based on image space position

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310779761.0A CN116523028B (en) 2023-06-29 2023-06-29 Image characterization model training method and device based on image space position

Publications (2)

Publication Number Publication Date
CN116523028A CN116523028A (en) 2023-08-01
CN116523028B true CN116523028B (en) 2023-10-03

Family

ID=87406669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310779761.0A Active CN116523028B (en) 2023-06-29 2023-06-29 Image characterization model training method and device based on image space position

Country Status (1)

Country Link
CN (1) CN116523028B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112667841A (en) * 2020-12-28 2021-04-16 山东建筑大学 Weak supervision depth context-aware image characterization method and system
CN113657411A (en) * 2021-08-23 2021-11-16 北京达佳互联信息技术有限公司 Neural network model training method, image feature extraction method and related device
CN113822428A (en) * 2021-08-06 2021-12-21 中国工商银行股份有限公司 Neural network training method and device and image segmentation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8873838B2 (en) * 2013-03-14 2014-10-28 Google Inc. Method and apparatus for characterizing an image


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
StackDA: A Stacked Dual Attention Neural Network for Multivariate Time-Series Forecasting;Hong, Jungsoo et al.;《IEEE ACCESS》;第9卷;第145955-145967页 *

Also Published As

Publication number Publication date
CN116523028A (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US20230196061A1 (en) System and method for compact and efficient sparse neural networks
CN108345939B (en) Neural network based on fixed-point operation
US8954365B2 (en) Density estimation and/or manifold learning
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN113222998B (en) Semi-supervised image semantic segmentation method and device based on self-supervised low-rank network
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN111489803B (en) Report form coding model generation method, system and equipment based on autoregressive model
CN110796485A (en) Method and device for improving prediction precision of prediction model
CN114974421B (en) Diffusion-noise reduction-based single-cell transcriptome sequencing data interpolation method and system
CN116612500B (en) Pedestrian re-recognition model training method and device
CN116523028B (en) Image characterization model training method and device based on image space position
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
CN116010615A (en) Entity alignment method and device, electronic equipment and computer storage medium
CN112561180B (en) Short-term wind speed prediction method and device based on meta-learning, computer equipment and storage medium
CN115619729A (en) Face image quality evaluation method and device and electronic equipment
CN115238134A (en) Method and apparatus for generating a graph vector representation of a graph data structure
CN116502640B (en) Text characterization model training method and device based on context
CN114595641A (en) Method and system for solving combined optimization problem
CN116341640B (en) Text processing model training method and device
CN116504235B (en) Audio recognition model training method and device based on time sequence
CN113870290B (en) Image segmentation method based on edge distribution guidance
CN117235584B (en) Picture data classification method, device, electronic device and storage medium
CN117830615A (en) Target re-identification method and device based on global higher-order relation
CN114549905A (en) Image classification method based on improved online knowledge distillation algorithm
CN116912920A (en) Expression recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant