CN109492120B - Model training method, retrieval method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN109492120B
Authority
CN
China
Prior art keywords
vectors; dimensional; feature; vector; class center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811292397.0A
Other languages
Chinese (zh)
Other versions
CN109492120A (en)
Inventor
Lei Yinjie (雷印杰)
Zhou Ziqin (周子钦)
Liu Yan (刘砚)
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811292397.0A priority Critical patent/CN109492120B/en
Publication of CN109492120A publication Critical patent/CN109492120A/en
Application granted
Publication of CN109492120B publication Critical patent/CN109492120B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiments of the invention provide a model training method, a retrieval method, a device, electronic equipment and a storage medium. A sample picture and a label used for representing the image category in the sample picture are obtained, wherein the sample picture comprises a two-dimensional hand-drawn sketch and projection pictures from multiple viewing angles; the sample picture is input into a pre-trained three-dimensional shape retrieval model, and a first feature vector and a plurality of second feature vectors are extracted; the first feature vector and each second feature vector are fused into the same high-dimensional subspace to obtain a third feature vector and a plurality of fourth feature vectors; and the feature vector of the three-dimensional shape of the sample picture, together with the updated parameters and class center vectors, is obtained based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors and a preset criterion, so that the training time is reduced and the retrieval precision is improved.

Description

Model training method, retrieval method, device, electronic equipment and storage medium
Technical Field
The invention relates to the field of image processing, and in particular to a model training method, a retrieval method, a device, an electronic device and a storage medium.
Background
The last decade has been a period of rapid development for computer vision technology; especially with the gradual maturation of deep learning, core problems in the field (such as image recognition, target tracking, image segmentation and image labeling) have advanced rapidly.
As far as the modality of the studied object is concerned, computer vision techniques can mainly be divided into two categories: 1) those based on color and texture information (two-dimensional image analysis); and 2) those based on spatial and shape information (three-dimensional shape analysis). However, a two-dimensional image is a planar projection of a spatial object: a large amount of spatial and shape information is lost in the projection process, and the image is easily affected by changes in illumination and pose. In contrast, a three-dimensional shape is not obviously influenced by illumination and pose changes and can compensate for the inherent defects of the two-dimensional image. Therefore, three-dimensional shape analysis is receiving more and more attention, and substantial manpower and material resources are being invested in its research both at home and abroad.
Due to the limitations of three-dimensional shape data acquisition methods, three-dimensional shape databases are generally small compared to traditional two-dimensional image databases, which are on the order of tens of millions of images. For example, the current largest three-dimensional shape database, ShapeNet, contains about 3 million shapes, and its core subset ShapeNetCore contains only 51,300 three-dimensional shapes in 55 categories. Smaller data sets present obstacles to the training of complex deep neural networks. However, despite the difference in modality, two-dimensional images and three-dimensional shapes are highly correlated in describing the objective world, and knowledge is transferable between the modalities. Therefore, cross-modal three-dimensional shape retrieval based on hand-drawn sketches is an important research direction in the field.
However, the prior art has the following problems: the training process is very complex, and positive and negative samples need to be generated for training, which increases the training difficulty and the training time; the generalization ability of twin networks trained on positive and negative cross-modal pairs is weak; and the modality gap between the high-dimensional-space centers of the multi-view model representation and the hand-drawn sketch is large, so the retrieval effect finally obtained is poor.
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a model training method, a retrieval method, a device, an electronic device and a storage medium, so as to alleviate the above problems.
In a first aspect, an embodiment of the present invention provides a model training method, where the method includes: obtaining a sample picture and a label for representing the image category in the sample picture, wherein the sample picture comprises a two-dimensional hand-drawn sketch and projection pictures from multiple viewing angles; inputting the sample picture into a pre-trained three-dimensional shape retrieval model, and extracting a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures; merging the first feature vector and each second feature vector into the same high-dimensional subspace, and acquiring a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to each second feature vector; and updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors and a preset criterion, and acquiring the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture as well as the updated parameters and class center vectors.
In a second aspect, an embodiment of the present invention provides a retrieval method, which is applied to the three-dimensional shape retrieval model, where the method includes: acquiring a target picture to be retrieved; acquiring a target characteristic vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model; and respectively calculating Euclidean distances between the target characteristic vector and characteristic vectors corresponding to all three-dimensional shapes in the three-dimensional shape database, and retrieving and sorting the Euclidean distances from small to large to obtain a sorting linked list.
In a third aspect, an embodiment of the present invention provides a model training apparatus, where the apparatus includes: a first obtaining module, configured to obtain a sample picture and a label used for representing the image category in the sample picture, wherein the sample picture comprises a two-dimensional hand-drawn sketch and projection pictures from multiple viewing angles; an input module, configured to input the sample picture into a pre-trained three-dimensional shape retrieval model and extract a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures; a feature fusion module, configured to fuse the first feature vector and each second feature vector into the same high-dimensional subspace to obtain a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to each second feature vector; and an updating module, configured to update the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors and a preset criterion, and to acquire the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture as well as the updated parameters and class center vectors.
In a fourth aspect, an embodiment of the present invention provides a retrieval apparatus, which is applied to the three-dimensional shape retrieval model, where the apparatus includes: the target picture acquisition module is used for acquiring a target picture to be retrieved; the target characteristic vector acquisition module is used for acquiring a target characteristic vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model; and the retrieval module is used for respectively calculating Euclidean distances between the target characteristic vector and the characteristic vectors corresponding to all the three-dimensional shapes in the three-dimensional shape database, and retrieving and sorting the Euclidean distances from small to large to obtain a sorting linked list.
In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory connected to the processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the electronic device is caused to perform the methods of the first and second aspects.
In a sixth aspect, an embodiment of the present invention provides a storage medium, where a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method according to the first aspect and the second aspect.
Compared with the prior art, the model training method, the retrieval method, the device, the electronic equipment and the storage medium provided by the embodiments of the invention have the following beneficial effects. A sample picture and a label for representing the image category in the sample picture are obtained, wherein the sample picture comprises a two-dimensional hand-drawn sketch and projection pictures from multiple viewing angles; the sample picture is input into a pre-trained three-dimensional shape retrieval model, and a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures are extracted; the first feature vector and each second feature vector are merged into the same high-dimensional subspace, and a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to each second feature vector are acquired; the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label are updated based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors and a preset criterion, and the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture, as well as the updated parameters and class center vectors, are acquired.
The classification of the images is realized by using the class center vector, a positive sample and a negative sample are not required to be generated for training, the feature vectors corresponding to the projection pictures of multiple visual angles and the feature vectors corresponding to the two-dimensional hand-drawn sketch are merged into the same high-dimensional subspace, the defect that the mode difference between the center of the high-dimensional space represented by the multi-visual-angle model and the hand-drawn sketch is large is overcome, the training time is shortened, and the retrieval precision is improved.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a model training method according to an embodiment of the present invention;
fig. 3 is a flowchart of a retrieval method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a model training apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of a retrieval apparatus according to an embodiment of the present invention.
Reference numerals: 100-electronic device; 110-memory; 120-memory controller; 130-processor; 140-peripheral interface; 150-input/output unit; 170-display unit; 200-model training device; 210-first obtaining module; 220-input module; 230-feature fusion module; 240-updating module; 300-retrieval device; 310-target picture acquisition module; 320-target feature vector acquisition module; 330-retrieval module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
An embodiment of the present invention provides a schematic structural diagram of an electronic device 100, where the electronic device 100 may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or the like.
As shown in fig. 1, the electronic device 100 may include: the search device 300, the model training device 200, the memory 110, the memory controller 120, the processor 130, the peripheral interface 140, the input/output unit 150, and the display unit 170.
The memory 110, the memory controller 120, the processor 130, the peripheral interface 140, the input/output unit 150, and the display unit 170 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The retrieval device 300 and the model training device 200 each comprise at least one software function module, which can be stored in the memory 110 in the form of software or firmware or solidified in an operating system (OS) of the client device. The processor 130 is used to execute the executable modules, such as the software function modules, stored in the memory 110.
The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 110 is configured to store a program, and the processor 130 executes the program after receiving an execution instruction. The method executed by the electronic device 100 defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 130, or implemented by the processor 130.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor 130 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The peripheral interface 140 couples various input/output devices to the processor 130 and to the memory 110. In some embodiments, the peripheral interface 140, the processor 130, and the memory controller 120 may be implemented in a single chip. In other examples, they may be implemented by separate chips.
The input/output unit 150 is used for receiving input data from the user to realize interaction between the user and the electronic device 100. The input/output unit 150 may be, but is not limited to, a mouse, a keyboard, and the like.
The display unit 170 provides an interactive interface (e.g., a user operation interface) between the electronic device 100 and a user, or is used to display image data for the user's reference. In this embodiment, the display unit 170 may be a liquid crystal display or a touch display. In the case of a touch display, the display can be a capacitive touch screen or a resistive touch screen, which supports single-point and multi-point touch operations. Supporting single-point and multi-point touch operations means that the touch display can sense touch operations at one or more locations on the display at the same time and send the sensed touch operations to the processor 130 for calculation and processing.
Examples
Referring to fig. 2, fig. 2 is a flowchart of a model training method according to an embodiment of the present invention. As will be described in detail below with respect to the flow shown in fig. 2, the method is applied to the electronic device 100 shown in fig. 1, and the method includes:
s100: obtaining a sample picture and a label for representing image categories in the sample picture, wherein the sample picture comprises: the two-dimensional hand-drawn sketch map comprises a two-dimensional hand-drawn sketch map and projection pictures of multiple visual angles.
In an actual implementation process, the sample picture is an electronic picture that can be input into the electronic device 100 with the image processing capability, the contents in the sample picture may be a cat, a dog, an apple, a table, a person, and the like, where cats belong to one category, dogs belong to one category, apples belong to one category, tables belong to one category, and persons belong to one category, and the sample picture and a tag used for representing the image category in the sample picture are input into the electronic device 100 with the image processing capability.
S200: and inputting the sample picture into a pre-trained three-dimensional shape retrieval model, and extracting a first characteristic vector of the two-dimensional hand-drawn sketch and a second characteristic vector of each of a plurality of projection pictures.
In practical implementation, the sample picture is input into a pre-trained three-dimensional shape retrieval model stored in the electronic device 100, where the pre-trained model may be AlexNet, VGG11, VGG16, VGG19, ResNet18, ResNet34, ResNet50, and the like. Through the pre-trained three-dimensional shape retrieval model, the electronic device 100 can extract a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures.
S300: and merging the first feature vector and each second feature vector into the same high-dimensional subspace, and acquiring a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to the second feature vector.
In order to place the feature vectors corresponding to the two-dimensional hand-drawn sketch and the multi-view projection pictures in the same subspace, as one implementation, this embodiment uses a matrix network to merge the first feature vector and each second feature vector into the same high-dimensional subspace, and then obtains a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to each second feature vector.
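The fusion step above maps the sketch feature and every view feature through one common transformation so that both modalities share a subspace. The sketch below is a minimal NumPy illustration; the 4096-dimensional backbone features, the 100-dimensional subspace, the 12 viewing angles, and the single shared linear map (W, b) are illustrative assumptions, not the patent's exact matrix network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared linear map (W, b) embedding both modalities into one
# 100-dimensional subspace (dimensions are assumed for illustration).
W = rng.standard_normal((4096, 100)) * 0.01   # shared projection weights
b = np.zeros(100)                             # shared bias

def fuse(feature):
    """Map a backbone feature vector into the shared 100-d subspace."""
    return feature @ W + b

first_vec = rng.standard_normal(4096)          # sketch (first feature vector)
second_vecs = rng.standard_normal((12, 4096))  # 12 view features (second vectors)

third_vec = fuse(first_vec)                            # third feature vector
fourth_vecs = np.stack([fuse(v) for v in second_vecs]) # fourth feature vectors
```

Because one map serves both modalities, distances computed later between sketch and view embeddings are meaningful in the same space.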
Since not every one of the multi-view projection pictures is useful, in order to reduce computational complexity, the fourth feature vectors corresponding to the multi-view projection pictures need to be screened so that only the useful fourth feature vectors are retained, thereby reducing the computational complexity of model training. The screening method is explained below:
The plurality of fourth feature vectors are screened based on a first preset criterion to obtain the screened fourth feature vectors, where the number of the screened fourth feature vectors is less than the number before screening.
Assume that, before screening, the fourth feature vectors of the same three-dimensional sample model with n viewing angles form a matrix M of dimension (100 × n). The first preset criterion is M* = M × f(M), where f(M) denotes a weight matrix of dimension n × m computed from M by an attention model. Substituting M into the criterion yields a high-dimensional feature M* of dimension (100 × m), where m is less than n. Here, W_Attn and b_Attn denote the weight and the bias of the attention model, respectively, and W_Attn^T denotes the transpose of W_Attn. Dimension reduction is realized through the above steps, thereby reducing the computational complexity.
S400: updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vectors corresponding to the labels based on the third feature vectors, the fourth feature vectors, the labels, the class center vectors and a preset criterion, and acquiring the feature vectors of the three-dimensional shapes corresponding to the three-dimensional sample model of the sample picture and the updated parameters and class center vectors.
As an embodiment, the method further includes computing, based on a second preset criterion, a first distance between the third feature vector and each class center vector in the plurality of class center vectors. The second preset criterion is:

dis_Sk = ||(W^T S + b) − C_k||_2^2

By substituting the third feature vector (W^T S + b) and the k-th class center vector C_k of the plurality of class center vectors into the second preset criterion, the first distance between the third feature vector and each class center vector is obtained, where dis_Sk denotes the first distance between the third feature vector and the k-th class center vector C_k (S_k being the subscript of dis), ||·||_2^2 denotes the squared L2 norm, and C_k ∈ R^(100×1), where R^(100×1) denotes a column vector of dimension 100 × 1.
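A minimal sketch of the first-distance computation, assuming the third feature vector is the already-projected sketch feature (W^T S + b) and the class centers are stacked row-wise; the class count K = 55 and the 100-dimensional subspace are illustrative values.

```python
import numpy as np

def first_distances(third_vec, centers):
    """dis_Sk = ||(W^T S + b) - C_k||_2^2 for every class center C_k.

    third_vec: (100,) projected sketch feature; centers: (K, 100).
    """
    diffs = centers - third_vec               # broadcast over the K centers
    return np.sum(diffs * diffs, axis=1)      # squared L2 norm per class

rng = np.random.default_rng(0)
centers = rng.standard_normal((55, 100))      # e.g. K = 55 class center vectors
third_vec = centers[7] + 0.01 * rng.standard_normal(100)  # sample near class 7
dis_S = first_distances(third_vec, centers)
nearest = int(np.argmin(dis_S))               # smallest distance -> class 7
```

The vector of K distances is exactly what the later loss negates and feeds to the cross-entropy term.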
As an embodiment, the method further includes computing, based on a third preset criterion, a second distance between the plurality of fourth feature vectors and each class center vector. The third preset criterion is:

dis_Mk = ||(W^T M + b)a − C_k||_2^2, a ∈ R^(m×1)

By substituting the matrix (W^T M + b) formed by the plurality of fourth feature vectors and the k-th class center vector C_k of the plurality of class center vectors into the third preset criterion, the second distance between the plurality of fourth feature vectors and each class center vector is obtained, where a is the aggregation weight vector over the m screened viewing angles, dis_Mk denotes the second distance between the matrix (W^T M + b) and the k-th class center vector C_k (M_k being the subscript of dis), and R^(100×m) denotes a matrix of dimension 100 × m.
As an embodiment, the parameters are updated based on the plurality of first distances, the plurality of second distances, and a fourth preset criterion, where the fourth preset criterion is:

L = CE(−dis, k) + λ × dis_k

where CE denotes the cross-entropy loss, dis_k denotes the distance between the third feature vector (or the matrix formed by the plurality of fourth feature vectors) and the k-th class center vector, that is, dis_k may denote dis_Sk or dis_Mk, and dis = {dis_1, dis_2, dis_3, ..., dis_K}. The plurality of first distances and the plurality of second distances are substituted into the fourth preset criterion. In this embodiment, assuming that there are K classes, it can be understood that there are K class center vectors C, K first distances, and K second distances. The parameters W and b in the pre-trained three-dimensional shape retrieval model and the class center vector C_k corresponding to the label representing the k-th class are updated, and the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture, together with the updated parameters and class center vectors, is obtained.
Through step S400, the feature vectors of all two-dimensional sketches and three-dimensional sample models of the same class are drawn close to the class center vector corresponding to that class, while different classes are kept far enough apart in the high-dimensional space, so classification can be realized without positive and negative samples, reducing the training time and the computational cost.
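The training criterion of step S400 (cross-entropy over negated distances plus a pull toward the ground-truth center) can be sketched numerically; λ = 0.1 and the three-class toy distances below are illustrative assumptions.

```python
import numpy as np

def center_loss(dis, k, lam=0.1):
    """L = CE(-dis, k) + lambda * dis_k.

    Negated distances act as logits: the cross-entropy term makes the
    true class center the relatively nearest one, while the dis_k term
    shrinks the absolute distance to it. lam is an assumed trade-off.
    """
    logits = -dis
    log_probs = logits - logits.max()
    log_probs = log_probs - np.log(np.sum(np.exp(log_probs)))
    ce = -log_probs[k]                        # cross-entropy with label k
    return ce + lam * dis[k]

dis = np.array([4.0, 0.5, 3.0])               # K = 3 distances to class centers
loss_correct = center_loss(dis, k=1)          # label is the nearest center
loss_wrong = center_loss(dis, k=0)            # label is a far center
```

A sample near its own class center yields a small loss; the gradient of both terms pushes features and centers together, which is why no positive/negative pairs are needed.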
Referring to fig. 3, fig. 3 is a flowchart of a retrieval method according to an embodiment of the present invention. As will be explained in detail below with respect to the flow shown in fig. 3, the method is applied to the electronic device in fig. 1, and the method includes:
s600: and acquiring a target picture to be retrieved.
The target picture to be retrieved may be a two-dimensional hand-drawn sketch, a picture shot by a camera, or a projection picture of a three-dimensional model, and the picture to be retrieved is input to the electronic device 100 with image processing capability, wherein the electronic device 100 can implement three-dimensional shape retrieval corresponding to the target picture to be retrieved through a pre-trained three-dimensional shape retrieval model.
S700: and acquiring a target characteristic vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model.
And inputting the target picture into a pre-trained three-dimensional shape retrieval model through terminal equipment, and acquiring a target characteristic vector corresponding to the target picture.
S800: and respectively calculating Euclidean distances between the target characteristic vector and characteristic vectors corresponding to all three-dimensional shapes in the three-dimensional shape database, and retrieving and sorting the Euclidean distances from small to large to obtain a sorting linked list.
The three-dimensional shape database stores each three-dimensional sample shape and its corresponding feature vector. A sorted linked list is obtained by sorting the Euclidean distances, and the three-dimensional shape corresponding to the feature vector with the smallest Euclidean distance can be selected according to the sorted linked list.
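The retrieval steps S600–S800 reduce to a nearest-neighbor ranking under Euclidean distance. The sketch below returns the sorted index list (the "sorting linked list" of the text, as a plain index array); the database size and dimensionality are illustrative assumptions.

```python
import numpy as np

def retrieve(target_vec, db_vecs):
    """Rank database shapes by Euclidean distance to the target vector.

    Returns database indices sorted from smallest to largest distance.
    """
    dists = np.linalg.norm(db_vecs - target_vec, axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(2)
db = rng.standard_normal((1000, 100))              # feature vectors of 1000 shapes
target = db[42] + 0.01 * rng.standard_normal(100)  # query close to shape 42
ranking = retrieve(target, db)
best = int(ranking[0])                             # top-ranked database shape
```

For large databases, the same ranking can be served by an approximate nearest-neighbor index instead of a full linear scan, at some loss of exactness.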
Referring to fig. 4, fig. 4 is a block diagram of a model training apparatus 200 according to an embodiment of the present invention. The block diagram of fig. 4 will be explained, and the apparatus shown comprises:
a first obtaining module 210, configured to obtain a sample picture and a label for characterizing an image category in the sample picture, where the sample picture includes: the two-dimensional hand-drawn sketch map comprises a two-dimensional hand-drawn sketch map and projection pictures of multiple visual angles.
The input module 220 is configured to input the sample picture into a pre-trained three-dimensional shape retrieval model, and extract a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures.
A feature fusion module 230, configured to fuse the first feature vector and each second feature vector into the same high-dimensional subspace, and obtain a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to the second feature vector.
An updating module 240, configured to update the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors, and a preset criterion, and obtain the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture, and the updated parameters and the class center vectors.
As an embodiment, the apparatus further comprises:
and the screening module is used for screening the plurality of fourth feature vectors based on a first preset criterion to obtain the screened fourth feature vectors, wherein the number of the screened fourth feature vectors is less than that before screening.
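The patent does not spell out the "first preset criterion" in this excerpt; one plausible reading is to keep only the view features closest to the relevant class center, which the following sketch illustrates (the criterion, the `keep` count, and the toy 4-D features are all assumptions):

```python
import numpy as np

def screen_views(fourth_vecs, class_center, keep=6):
    """Keep the `keep` fourth feature vectors closest to the class center.

    This shrinks the set of view features, matching the requirement that
    the number after screening is less than before screening.
    """
    fourth_vecs = np.asarray(fourth_vecs)
    dists = np.linalg.norm(fourth_vecs - class_center, axis=1)
    return fourth_vecs[np.argsort(dists)[:keep]]

# 12 dummy view features; row i is filled with the value i.
views = np.stack([np.full(4, float(i)) for i in range(12)])
kept = screen_views(views, class_center=np.zeros(4), keep=6)
```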
As an embodiment, the update module 240 includes:
and the first distance acquisition module is used for acquiring a first distance between the third feature vector and each class center vector in the plurality of class center vectors based on a second preset criterion.
And the second distance acquisition module is used for acquiring a second distance between the plurality of fourth feature vectors and each class center vector based on a third preset criterion.
And the updating submodule is used for updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label based on the plurality of first distances, the plurality of second distances and a third preset criterion, and acquiring the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model of the sample picture, the updated parameters and the updated class center vector.
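The two distance computations and the center update above can be sketched together. The exact "second" and "third preset criteria" are not given in this excerpt, so the sketch uses Euclidean distances and a center-loss-style update rule (pull the labeled center toward the fused features); the learning rate and dimensions are hypothetical:

```python
import numpy as np

def train_step(third, fourths, label, centers, lr=0.1):
    """One illustrative update step.

    d1: first distances, between the third feature vector and every class
        center. d2: second distances, between each fourth feature vector and
        every class center. The labeled center then moves toward the mean of
        the class's fused features (an assumed, center-loss-style rule).
    """
    centers = centers.copy()
    fourths = np.asarray(fourths)
    d1 = np.linalg.norm(centers - third, axis=1)                       # (C,)
    d2 = np.linalg.norm(centers[None, :, :] - fourths[:, None, :], axis=2)  # (V, C)
    target = (third + fourths.mean(axis=0)) / 2.0
    centers[label] += lr * (target - centers[label])
    return d1, d2, centers

centers = np.zeros((3, 4))  # 3 classes, 4-D toy features
d1, d2, new_centers = train_step(np.ones(4), [np.ones(4)] * 5,
                                 label=1, centers=centers)
```

In a full implementation the distances would feed a loss whose gradient also updates the retrieval model's parameters; only the center update is shown here.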
Referring to fig. 5, fig. 5 is a block diagram of a shape retrieval apparatus 300 according to an embodiment of the present invention. The apparatus shown in fig. 5 comprises:
and a target picture obtaining module 310, configured to obtain a target picture to be retrieved.
A target feature vector obtaining module 320, configured to obtain a target feature vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model.
And the retrieval module 330 is configured to calculate the Euclidean distances between the target feature vector and the feature vectors corresponding to all three-dimensional shapes in the three-dimensional shape database, and sort the Euclidean distances in ascending order to obtain a sorted list.
In addition, an embodiment of the present invention further provides a storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the training method and the retrieval method provided in any one of the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the model training apparatus 200 and the retrieving apparatus 300 described above may refer to the corresponding processes in the model training method and the retrieving method, and will not be described in detail herein.
In summary, the model training method, the retrieval method, the apparatus, the electronic device and the storage medium provided by the embodiments of the present invention: obtain a sample picture and a label representing the image category of the sample picture, wherein the sample picture comprises a two-dimensional hand-drawn sketch and projection pictures of multiple viewing angles; input the sample picture into a pre-trained three-dimensional shape retrieval model, and extract a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of the plurality of projection pictures; merge the first feature vector and each second feature vector into the same high-dimensional subspace, and obtain a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to each second feature vector; update the parameters in the pre-trained three-dimensional shape retrieval model and the class center vector corresponding to the label based on the third feature vector, the plurality of fourth feature vectors, the label, the plurality of class center vectors and a preset criterion, and obtain the feature vector of the three-dimensional shape corresponding to the three-dimensional sample model as well as the updated parameters and class center vectors. Because classification is carried out with class center vectors, positive and negative sample pairs need not be generated for training; and merging the feature vectors of the multi-view projection pictures and the feature vector of the two-dimensional hand-drawn sketch into the same high-dimensional subspace overcomes the large modality gap between multi-view model representations and hand-drawn sketches in the high-dimensional space, shortens the training time, and improves the retrieval precision.
The functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (6)

1. A method of model training, the method comprising:
obtaining a sample picture and a label for representing the image category of the sample picture, wherein the sample picture comprises: a two-dimensional hand-drawn sketch and projection pictures of multiple viewing angles;
inputting the sample picture into a pre-trained three-dimensional shape retrieval model, and extracting a first feature vector of the two-dimensional hand-drawn sketch and a second feature vector of each of a plurality of projection pictures;
merging the first feature vector and each second feature vector into the same high-dimensional subspace, and acquiring a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to the second feature vector;
updating parameters in the pre-trained three-dimensional shape retrieval model and class center vectors corresponding to the labels based on the third feature vectors, the fourth feature vectors, the labels, the class center vectors and a preset criterion, and acquiring feature vectors of three-dimensional shapes corresponding to the three-dimensional sample model of the sample picture and the updated parameters and class center vectors;
after obtaining a class center vector corresponding to the label based on the label, the method further includes:
screening the plurality of fourth feature vectors based on a first preset criterion to obtain screened fourth feature vectors, wherein the number of the screened fourth feature vectors is less than that before screening;
wherein, the updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vectors corresponding to the labels based on the third feature vectors, the plurality of fourth feature vectors, the labels, the plurality of class center vectors and a preset criterion to obtain the feature vectors of the three-dimensional shapes corresponding to the three-dimensional sample model of the sample picture and the updated parameters and class center vectors comprises:
acquiring a first distance between the third feature vector and each class center vector in a plurality of class center vectors based on a second preset criterion;
acquiring a second distance between a plurality of fourth feature vectors and each class center vector based on a third preset criterion;
updating parameters in the pre-trained three-dimensional shape retrieval model and class center vectors corresponding to the labels based on the first distances, the second distances and a third preset criterion, and obtaining feature vectors of three-dimensional shapes corresponding to the three-dimensional sample model of the sample picture and the updated parameters and class center vectors.
2. A retrieval method applied to the three-dimensional shape retrieval model trained by the method recited in claim 1, the method comprising:
acquiring a target picture to be retrieved;
acquiring a target characteristic vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model;
and calculating Euclidean distances between the target feature vector and the feature vectors corresponding to all three-dimensional shapes in the three-dimensional shape database, and sorting the Euclidean distances in ascending order to obtain a sorted list.
3. A model training apparatus, the apparatus comprising:
the image processing device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a sample picture and a label used for representing image categories in the sample picture, and the sample picture comprises: a two-dimensional hand-drawn sketch of the three-dimensional sample model and projection pictures of a plurality of visual angles;
the input module is used for inputting the sample picture into a pre-trained three-dimensional shape retrieval model and extracting a first characteristic vector of the two-dimensional hand-drawn sketch and a second characteristic vector of each of a plurality of projection pictures;
the feature fusion module is used for fusing the first feature vector and each second feature vector into the same high-dimensional subspace to obtain a third feature vector corresponding to the first feature vector and a fourth feature vector corresponding to the second feature vector;
the updating module is used for updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vectors corresponding to the labels based on the third feature vectors, the fourth feature vectors, the labels, the class center vectors and a preset criterion, and acquiring the feature vectors of the three-dimensional shapes corresponding to the three-dimensional sample model of the sample picture and the updated parameters and class center vectors;
the screening module is used for screening the plurality of fourth feature vectors based on a first preset criterion to obtain screened fourth feature vectors, wherein the number of the screened fourth feature vectors is less than that before screening;
wherein the update module comprises:
a first distance obtaining module, configured to obtain, based on a second preset criterion, a first distance between the third feature vector and each class center vector of the multiple class center vectors;
a second distance obtaining module, configured to obtain, based on a third preset criterion, a second distance between the plurality of fourth feature vectors and each class center vector;
and the updating submodule is used for updating the parameters in the pre-trained three-dimensional shape retrieval model and the class center vectors corresponding to the labels based on the plurality of first distances, the plurality of second distances and a third preset criterion, and acquiring the feature vector of the three-dimensional shape corresponding to the sample picture and the updated parameters and class center vectors.
4. A retrieval apparatus applied to the three-dimensional shape retrieval model recited in claim 3, the apparatus comprising:
the target picture acquisition module is used for acquiring a target picture to be retrieved;
the target characteristic vector acquisition module is used for acquiring a target characteristic vector corresponding to the target picture based on the target picture and the three-dimensional shape retrieval model;
and the retrieval module is used for calculating Euclidean distances between the target feature vector and the feature vectors corresponding to all the three-dimensional shapes in the three-dimensional shape database, and sorting the Euclidean distances in ascending order to obtain a sorted list.
5. An electronic device, comprising a processor and a memory coupled to the processor, the memory storing a computer program that, when executed by the processor, causes the electronic device to perform the method of any of claims 1-2.
6. A storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method according to any one of claims 1-2.
CN201811292397.0A 2018-10-31 2018-10-31 Model training method, retrieval method, device, electronic equipment and storage medium Active CN109492120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811292397.0A CN109492120B (en) 2018-10-31 2018-10-31 Model training method, retrieval method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811292397.0A CN109492120B (en) 2018-10-31 2018-10-31 Model training method, retrieval method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109492120A CN109492120A (en) 2019-03-19
CN109492120B true CN109492120B (en) 2020-07-03

Family

ID=65692107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811292397.0A Active CN109492120B (en) 2018-10-31 2018-10-31 Model training method, retrieval method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109492120B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796180B (en) * 2019-10-12 2022-06-07 吉林大学 Model training system and method based on artificial intelligence
CN113191400B (en) * 2021-04-14 2022-04-19 中国海洋大学 Method and device for retrieving corresponding three-dimensional model based on two-dimensional image
CN113240716B (en) * 2021-05-31 2023-04-18 西安电子科技大学 Twin network target tracking method and system with multi-feature fusion

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361347A (en) * 2014-10-21 2015-02-18 浙江大学 Numerically-controlled machine tool design module three-dimension model retrieval method based on single image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6268303B2 (en) * 2014-02-04 2018-01-24 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン 2D image analyzer
CN107122396B (en) * 2017-03-13 2019-10-29 西北大学 Method for searching three-dimension model based on depth convolutional neural networks
CN107220277A (en) * 2017-04-14 2017-09-29 西北大学 Image retrieval algorithm based on cartographical sketching

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361347A (en) * 2014-10-21 2015-02-18 浙江大学 Numerically-controlled machine tool design module three-dimension model retrieval method based on single image

Also Published As

Publication number Publication date
CN109492120A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
Caetano et al. Learning graph matching
US20200193206A1 (en) Scene and user-input context aided visual search
Binford Survey of model-based image analysis systems
Da Silva et al. Active learning paradigms for CBIR systems based on optimum-path forest classification
CN109492120B (en) Model training method, retrieval method, device, electronic equipment and storage medium
CN111581414B (en) Method, device, equipment and storage medium for identifying, classifying and searching clothes
CN107209762A (en) Visual interactive formula is searched for
CN113360701B (en) Sketch processing method and system based on knowledge distillation
CN110580489B (en) Data object classification system, method and equipment
Ali et al. Contextual object category recognition for RGB-D scene labeling
Bayraktar et al. A hybrid image dataset toward bridging the gap between real and simulation environments for robotics: Annotated desktop objects real and synthetic images dataset: ADORESet
US11605176B2 (en) Retrieving images that correspond to a target body type and pose
Kasaei et al. Local-LDA: Open-ended learning of latent topics for 3D object recognition
Halley Perceptually relevant browsing environments for large texture databases
Li et al. Neural networks for fashion image classification and visual search
Lei et al. A new clothing image retrieval algorithm based on sketch component segmentation in mobile visual sensors
Benhabiles et al. Convolutional neural network for pottery retrieval
CN113704617A (en) Article recommendation method, system, electronic device and storage medium
Jayaprabha et al. Content based image retrieval methods using self supporting retrieval map algorithm
Men et al. Retrieval of spatial–temporal motion topics from 3D skeleton data
Schulz et al. Human-in-the-loop extension to stream classification for labeling of low altitude drone imagery
Gupta et al. Image feature detection using an improved implementation of maximally stable extremal regions for augmented reality applications
Wang et al. A feature extraction and similarity metric-learning framework for urban model retrieval
Idrees et al. Research Article Online Price Recommendation System for Shopping Used Cell Phones
Rao et al. VIRNet for Image Retrieval: One for All Top Based on Feature Fusion Technique

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant