CN115687676B - Information retrieval method, terminal and computer-readable storage medium - Google Patents

Information retrieval method, terminal and computer-readable storage medium

Info

Publication number
CN115687676B
CN115687676B CN202211707254.8A
Authority
CN
China
Prior art keywords
information
feature
queried
characteristic
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211707254.8A
Other languages
Chinese (zh)
Other versions
CN115687676A (en)
Inventor
邸德宁
廖紫嫣
杨凯航
郝敬松
朱树磊
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211707254.8A priority Critical patent/CN115687676B/en
Publication of CN115687676A publication Critical patent/CN115687676A/en
Application granted granted Critical
Publication of CN115687676B publication Critical patent/CN115687676B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an information retrieval method, a terminal and a computer-readable storage medium. The information retrieval method includes: performing feature extraction on information to be queried to obtain first feature information and second feature information corresponding to the information to be queried; retrieving in a preset database based on the second feature information to obtain a candidate retrieval data set corresponding to the second feature information; performing feature fusion on the second feature information and the candidate neighbor features contained in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried; and retrieving in the candidate retrieval data set based on the fusion feature information of the information to be queried to determine a retrieval result corresponding to the information to be queried. In this method, the second feature information of the information to be queried is feature-upgraded based on its candidate neighbor features, which improves the detail expression capability of the second feature information, improves the recall of difficult samples, and achieves high recall of the retrieval results.

Description

Information retrieval method, terminal and computer-readable storage medium
Technical Field
The present invention relates to the field of information retrieval technologies, and in particular, to an information retrieval method, a terminal, and a computer-readable storage medium.
Background
With the proliferation of mass storage and digital imaging devices (video cameras, still cameras), a large number of pictures of different types, such as scientific, medical, geographic and everyday-life images, are produced every day. How to retrieve these massive pictures effectively, so that people can conveniently browse, search and manage the pictures they are interested in, is a problem that urgently needs to be solved.
At present, original image and text data are compressed into D-dimensional feature vectors by a deep learning model; the feature similarity between similar contents is high, and the similarity between dissimilar contents is low. A D-dimensional feature vector is extracted from an uploaded query image and then compressed into an S-dimensional feature vector, and the retrieval result is determined by comparing its similarity with the S-dimensional feature vectors contained in a preset database one by one. However, the S-dimensional feature vector lacks semantic information compared with the D-dimensional feature vector of the original picture, so retrieval based on such short features is not reliable enough, and the retrieval result is not accurate enough.
Disclosure of Invention
The invention mainly solves the technical problem of providing an information retrieval method, a terminal and a computer-readable storage medium, which address the problem in the prior art that retrieval based on the short features of a query image has difficulty recalling the correct results.
In order to solve the above technical problem, the first technical solution adopted by the invention is to provide an information retrieval method, including:
performing feature extraction on information to be queried to obtain first feature information and second feature information corresponding to the information to be queried; the second feature information is dimension-reduced feature information of the first feature information;
retrieving in a preset database based on the second feature information to obtain a candidate retrieval data set corresponding to the second feature information; the preset database includes a plurality of pieces of preset second feature information;
performing feature fusion on the second feature information and candidate neighbor features contained in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried; the candidate neighbor features are a preset number of pieces of preset second feature information with the largest similarity to the second feature information;
and retrieving in the candidate retrieval data set based on the fusion feature information of the information to be queried, and determining a retrieval result corresponding to the information to be queried.
The method for extracting the features of the information to be queried to obtain the first feature information and the second feature information corresponding to the information to be queried includes the following steps:
performing feature extraction on information to be queried to obtain first feature information of the information to be queried;
respectively performing dimensionality reduction processing on the first feature information based on at least two conversion models to obtain sub-feature information of the information to be queried;
and splicing the sub-characteristic information respectively output by each conversion model to obtain second characteristic information of the information to be queried.
Wherein the at least two conversion models comprise a first conversion model and a second conversion model with different model parameters;
the training method of the second conversion model comprises the following steps:
acquiring first training data, wherein the first training data is associated with corresponding first sample characteristics, and the first training data has corresponding labeled neighbor information sequences and identifiers; the identifiers include a first identifier and a second identifier;
inputting the first sample characteristics corresponding to the first training data into a second conversion model for characteristic conversion to obtain second sample characteristics corresponding to the first training data;
searching in a preset database based on the second sample characteristics to obtain a predicted neighbor information sequence corresponding to the first training data;
in response to the first training data having the first identifier, training a second conversion model based on an error value between the labeled neighbor information sequence and the predicted neighbor information sequence of the first training data;
in response to the first training data having the second identifier, training the second conversion model after doubling the error value between the labeled neighbor information sequence and the predicted neighbor information sequence of the first training data.
The determination mode of the identifier of the first training data comprises the following steps:
performing feature conversion on the first sample feature of the first training data through the trained first conversion model to obtain a sample feature and a confidence corresponding to the first training data;
in response to the confidence exceeding a confidence threshold, labeling the first training data corresponding to the confidence as the first identifier;
in response to the confidence not exceeding the confidence threshold, labeling the first training data corresponding to the confidence as the second identifier.
The feature fusion is performed on the second feature information and candidate neighbor features contained in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried, and the method includes the following steps:
determining the similarity corresponding to the second feature information of the information to be queried and each candidate neighbor feature corresponding to the information to be queried based on the second feature information of the information to be queried and each candidate neighbor feature corresponding to the information to be queried;
and generating fusion feature information corresponding to the information to be queried by adopting the graph neural network based on the corresponding similarity between the second feature information of the information to be queried and the candidate neighbor features.
The training method of the graph neural network comprises the following steps:
acquiring second training data, wherein the second training data comprises an undirected graph formed by a plurality of feature nodes, and labeling similarities are associated between the feature nodes;
generating fusion features corresponding to the feature nodes respectively through a graph neural network based on an undirected graph of the second training data;
determining the prediction similarity between the feature nodes based on the fusion features respectively corresponding to the feature nodes;
and training the graph neural network based on the weighted sum of error values between the corresponding prediction similarity and the labeling similarity between the feature nodes corresponding to the undirected graph.
The plurality of feature nodes comprise a main feature node and a neighbor feature node;
training a graph neural network based on a weighted sum of error values between corresponding prediction similarity and labeling similarity between feature nodes corresponding to an undirected graph, comprising:
in response to one of the two feature nodes being a master feature node and the other being a neighbor feature node, multiplying a corresponding error value between the two feature nodes by a first weight;
and in response to that the two feature nodes are both neighbor feature nodes, multiplying the corresponding error value between the two feature nodes by a second weight, wherein the first weight is greater than the second weight.
The method for searching in the candidate search data set based on the fusion characteristic information of the information to be queried and determining the search result corresponding to the information to be queried comprises the following steps:
calculating the similarity between the fusion feature information of the information to be queried and the fusion feature information of each candidate neighbor feature in the candidate retrieval data set;
selecting the first feature information associated with a preset number of candidate neighbor features with the largest similarity values;
and determining a retrieval result corresponding to the information to be queried based on the similarity between the first characteristic information of the information to be queried and the first characteristic information of each selected candidate neighbor characteristic.
In order to solve the above technical problems, the second technical solution adopted by the present invention is: there is provided a terminal comprising a memory, a processor and a computer program stored in the memory and running on the processor, the processor being configured to execute the computer program to implement the steps in the information retrieval method described above.
In order to solve the technical problems, the third technical scheme adopted by the invention is as follows: there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps in the above-mentioned information retrieval method.
The invention has the following beneficial effects. The information retrieval method comprises: performing feature extraction on information to be queried to obtain first feature information and second feature information corresponding to the information to be queried, the second feature information being dimension-reduced feature information of the first feature information; retrieving in a preset database based on the second feature information to obtain a candidate retrieval data set corresponding to the second feature information; performing feature fusion on the second feature information and the candidate neighbor features contained in the candidate retrieval data set to obtain fusion feature information corresponding to the information to be queried; and retrieving in the candidate retrieval data set based on the fusion feature information of the information to be queried to determine a retrieval result corresponding to the information to be queried. By performing feature fusion between the second feature information of the information to be queried and each of its candidate neighbor features, the second feature information is feature-upgraded, its detail expression capability is improved and its semantic information is enriched, so the retrieval results obtained from the fusion feature information are more comprehensive, the recall of difficult samples is improved, and high recall of the retrieval results is achieved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an information retrieval method provided by the present invention;
FIG. 2 is a flowchart illustrating a method for training a first transformation model in an information retrieval method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a training method of a graph neural network in the information retrieval method according to an embodiment of the present invention;
FIG. 4 is a block diagram of a terminal according to an embodiment of the present invention;
FIG. 5 is a block diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, "plurality" herein means two or more than two.
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description is made with reference to the accompanying drawings and the detailed description for an information retrieval method provided by the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of an information retrieval method according to the present invention. The present embodiment provides an information retrieval method, which includes the following steps.
S1: performing feature extraction on information to be queried to obtain first feature information and second feature information corresponding to the information to be queried; the second feature information is dimension reduction feature information of the first feature information.
S2: and searching in a preset database based on the second characteristic information to obtain a candidate search data set corresponding to the second characteristic information.
S3: and performing feature fusion on the second feature information and the candidate neighbor features contained in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried.
S4: and retrieving in the candidate retrieval data set based on the fusion characteristic information of the information to be queried, and determining a retrieval result corresponding to the information to be queried.
In an embodiment, the specific steps of obtaining the first feature information and the second feature information corresponding to the information to be queried in step S1 are as follows.
And acquiring the information to be inquired. The information to be queried may be image data or text data.
And performing feature extraction on the information to be queried through a feature extraction network to obtain first feature information of the information to be queried.
Dimension reduction processing is performed on the first feature information based on at least two conversion models respectively, to obtain sub-feature information of the information to be queried. The sub-feature information output by each conversion model is spliced to obtain the second feature information of the information to be queried.
In a specific embodiment, the at least two conversion models comprise a first conversion model and a second conversion model with different model parameters. The first conversion model and the second conversion model respectively perform characteristic conversion on the first characteristic information to obtain different sub-characteristic information. And the dimensions of each sub-feature information are the same.
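As an illustration of this step, the following sketch shows two dimension-reduction heads whose outputs are spliced into the second feature information. The dimensions (D = 512, S = 128), the layer structure and the names head_a/head_b are assumptions made for the example, not details taken from this embodiment.

```python
import torch
import torch.nn as nn

class ConversionModel(nn.Module):
    """One dimension-reduction head: maps a D-dim first feature to an S/2-dim sub-feature."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.normalize(self.proj(x), dim=-1)

D, S = 512, 128                      # illustrative dimensions, not values from the patent
head_a = ConversionModel(D, S // 2)  # first conversion model
head_b = ConversionModel(D, S // 2)  # second conversion model (independently trained parameters)

first_feature = torch.randn(1, D)                            # first feature information of a query
sub_a, sub_b = head_a(first_feature), head_b(first_feature)  # S/2-dim sub-feature information
second_feature = torch.cat([sub_a, sub_b], dim=-1)           # spliced S-dim second feature information
```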
In this embodiment, the first conversion model and the second conversion model are trained in turn on the same training data set. During training, cascade matching supervision information is added: the training data are feature-converted by the already trained first conversion model, the training data whose second feature information converted by the first conversion model is unreliable are marked, and these marked training data are emphasized when the second conversion model is trained.
In one embodiment, the training method of the first conversion model includes the following steps.
Referring to fig. 2, fig. 2 is a flowchart illustrating a training method of a first transformation model in an information retrieval method according to an embodiment of the present invention.
S21: acquiring a plurality of first training data; and each first training data is associated with the corresponding first sample characteristic and the labeled neighbor information sequence corresponding to the first training data.
Specifically, the type of the first training data is determined according to the type of data that needs to be detected. The first training data may be training image data or training text data.
Feature extraction is performed on the first training data to obtain the first sample feature corresponding to the first training data. The dimension of the first sample feature is the same as the dimension of the first feature information corresponding to the information to be queried.
Retrieval is performed in the preset database based on the first sample feature to obtain a plurality of pieces of neighbor preset information corresponding to the first training data. Specifically, the similarity between the first sample feature and the preset first feature information corresponding to each piece of preset information in the preset database is calculated, and the preset information corresponding to each piece of preset first feature information is ranked by similarity to obtain the labeled neighbor information sequence corresponding to the first training data.
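The labeled neighbor information sequence can be pictured as a similarity ranking over the preset database. A minimal sketch, assuming cosine similarity over feature vectors; the function and variable names are illustrative:

```python
import numpy as np

def neighbor_sequence(query_feat: np.ndarray, db_feats: np.ndarray) -> np.ndarray:
    """Rank the preset database entries by descending cosine similarity to the given feature."""
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    d = db_feats / (np.linalg.norm(db_feats, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(d @ q))

# labeled_seq = neighbor_sequence(first_sample_feature, preset_db_features)   # labeled neighbor sequence (S21)
# The predicted neighbor sequence of step S23 would be obtained the same way,
# using the converted second sample feature against the preset database.
```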
S22: and inputting the first sample features corresponding to the first training data into the first conversion model for feature conversion to obtain second sample features corresponding to the first training data.
Specifically, a first sample feature corresponding to the first training data is input into the first conversion model for feature conversion, so as to obtain a second sample feature corresponding to the first training data.
S23: and searching in a preset database based on the second sample characteristics to obtain a predicted neighbor information sequence corresponding to the first training data.
Specifically, retrieval is performed in the preset database based on the second sample feature: the similarity between the second sample feature and each piece of preset first feature information in the preset database is calculated, and the preset information corresponding to each piece of preset first feature information is ranked by similarity to obtain the predicted neighbor information sequence corresponding to the first training data.
S24: and training the first conversion model based on the labeled neighbor information sequence corresponding to the first training data and the error value between the predicted neighbor information sequences.
And training the first conversion model based on the error value between the labeled neighbor information sequence and the predicted neighbor information sequence corresponding to the first training data until the first conversion model converges.
The first sample features corresponding to the plurality of first training data are feature-converted by the trained first conversion model to obtain the sample features and confidences corresponding to each first training data. An error value is calculated between the predicted neighbor information sequence obtained from the converted second sample feature and the labeled neighbor information sequence of the first training data. The larger the error value, the smaller the confidence, i.e. confidence = 1 - error value. In response to the confidence exceeding a confidence threshold, the first training data corresponding to the confidence is labeled with the first identifier; in response to the confidence not exceeding the confidence threshold, the first training data corresponding to the confidence is labeled with the second identifier.
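A sketch of the identifier assignment. The error value between two neighbor sequences is measured here by the overlap of their top-k entries, and the confidence threshold is set to 0.8; both are illustrative choices, not values specified by this embodiment:

```python
import numpy as np

def sequence_error(pred_seq: np.ndarray, labeled_seq: np.ndarray, k: int = 20) -> float:
    """Illustrative stand-in for the error value between two neighbor sequences:
    the fraction of the labeled top-k neighbors missing from the predicted top-k."""
    return 1.0 - len(set(pred_seq[:k].tolist()) & set(labeled_seq[:k].tolist())) / k

CONFIDENCE_THRESHOLD = 0.8  # assumed value, not specified in the patent

def assign_identifier(pred_seq: np.ndarray, labeled_seq: np.ndarray) -> str:
    confidence = 1.0 - sequence_error(pred_seq, labeled_seq)   # confidence = 1 - error value
    return "first" if confidence > CONFIDENCE_THRESHOLD else "second"  # second identifier marks hard samples
```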
In one embodiment, the training method of the second conversion model includes the following steps.
Acquiring first training data, wherein the first training data is associated with corresponding first sample characteristics, and the first training data has a corresponding labeled neighbor information sequence and an identifier; the identifiers include a first identifier and a second identifier; inputting the first sample characteristics corresponding to the first training data into a second conversion model for characteristic conversion to obtain second sample characteristics corresponding to the first training data; searching in a preset database based on the second sample characteristics to obtain a prediction neighbor information sequence corresponding to the first training data; in response to the first training data having the first identifier, training a second conversion model based on an error value between the labeled neighbor information sequence and the predicted neighbor information sequence of the first training data; in response to the first training data having the second identifier, the second conversion model is trained after doubling the error value between the labeled neighbor information sequence and the predicted neighbor information sequence of the first training data.
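The cascade supervision reduces to a simple weighting rule on the error value; a minimal sketch, with the identifier represented by the strings "first" and "second" purely for illustration:

```python
def weighted_error(error_value: float, identifier: str) -> float:
    """Cascade supervision for the second conversion model: samples the first model
    found unreliable (second identifier) contribute a doubled error value."""
    return 2.0 * error_value if identifier == "second" else error_value
```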
In response to the at least two conversion models further comprising other conversion models, feature conversion is performed on the first training data through the trained first conversion model and second conversion model respectively to determine the identifier of each piece of training data. The other conversion models are then trained with emphasis on the training data that are difficult for the first conversion model and the second conversion model to recall.
In this embodiment, the first feature information is subjected to dimension reduction by the first conversion model and the second conversion model obtained through the above training, yielding S/2-dimensional sub-feature information from the first conversion model and S/2-dimensional sub-feature information from the second conversion model. The two pieces of S/2-dimensional sub-feature information are spliced to obtain the S-dimensional second feature information corresponding to the information to be queried. The second feature information obtained in this way covers the features more comprehensively and avoids omitting detail features.
Retrieval is then performed in the preset database based on the second feature information corresponding to the information to be queried. The preset database contains a plurality of pieces of preset second feature information.
Specifically, the similarity between the second feature information of the information to be queried and each piece of preset second feature information in the preset database is calculated, the preset second feature information is ranked by similarity, and a preset number of pieces of preset second feature information with the largest similarity are selected as candidate neighbor features of the information to be queried. The candidate neighbor features form the candidate retrieval data set corresponding to the information to be queried.
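A sketch of building the candidate retrieval data set, assuming L2-normalised features so that a dot product serves as the similarity, and an illustrative preset number k = 50:

```python
import numpy as np

def candidate_retrieval_set(second_feature: np.ndarray, preset_second_feats: np.ndarray, k: int = 50):
    """Pick the preset number (k, assumed here) of preset second features most similar to the query."""
    sims = preset_second_feats @ second_feature   # assumes normalised feature vectors
    candidate_idx = np.argsort(-sims)[:k]         # indices of the candidate neighbor features
    return candidate_idx, sims[candidate_idx]
```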
In an embodiment, the specific steps of obtaining the fusion feature information corresponding to the information to be queried in step S3 are as follows.
Determining the similarity between the second feature information of the information to be queried and each candidate neighbor feature corresponding to the information to be queried; and generating, by a graph neural network, the fusion feature information corresponding to the information to be queried based on the similarities between the second feature information of the information to be queried and the candidate neighbor features.
Specifically, an undirected graph is established between the second feature information of the information to be queried and each corresponding candidate neighbor feature. In the undirected graph, the second feature information of the information to be queried is used as a master node, and the corresponding candidate neighbor features are used as neighbor nodes. And determining edges among the nodes according to the similarity respectively corresponding to the second feature information of the information to be inquired and the candidate neighbor features.
And inputting an undirected graph corresponding to the information to be queried into a graph neural network, and performing feature fusion on the second feature information and each candidate neighbor feature by the graph neural network based on the candidate neighbor features in the undirected graph and the similarity between the candidate neighbor features and the second feature information to generate fused feature information corresponding to the information to be queried.
Traversing all candidate neighbor features in the undirected graph, and performing feature fusion on the selected candidate neighbor features, other candidate neighbor features connected through edges and the second feature information to generate fusion feature information of the selected candidate neighbor features.
By the method, the second feature information and each candidate neighbor feature are respectively subjected to feature upgrading to obtain fusion feature information respectively corresponding to the second feature information and the candidate neighbor feature.
The fusion feature information has higher discrimination in a local space than the second feature information or the candidate neighbor features, the features of the fusion feature information are refined, and the recall rate of the retrieval result can be improved.
This solution proposes using a GCN (graph convolutional network) to upgrade the second feature information. A GCN is good at mining the relevance among different nodes in graph data; it continuously aggregates the information of each node based on this relevance, fuses the information, and then upgrades each node, so that the overall relevance is captured better.
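A minimal sketch of the graph construction and one similarity-weighted graph-convolution step. The real network may be a multi-layer GCN such as AMNet, so the single layer, the dimensions and the names below are illustrative only:

```python
import torch
import torch.nn as nn

def build_undirected_graph(query_feat: torch.Tensor, neighbor_feats: torch.Tensor):
    """Master node = the query's second feature; neighbor nodes = candidate neighbor features;
    edge weights = pairwise cosine similarities."""
    nodes = torch.cat([query_feat.unsqueeze(0), neighbor_feats], dim=0)
    nodes = nn.functional.normalize(nodes, dim=-1)
    adjacency = nodes @ nodes.t()
    return nodes, adjacency

class FusionLayer(nn.Module):
    """One graph-convolution step: every node aggregates the others, weighted by similarity,
    yielding a fusion feature for the master node and for each candidate neighbor."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, nodes: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        weights = adjacency / adjacency.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        return torch.relu(self.linear(weights @ nodes))

# nodes, adj = build_undirected_graph(second_feature.squeeze(0), candidate_neighbor_feats)
# fused = FusionLayer(nodes.shape[-1])(nodes, adj)   # row 0: fusion feature information of the query
```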
In one embodiment, the training method of the graph neural network is as follows. The graph neural network may be a multi-layer GCN network; specifically, the graph neural network may be an AMNet network.
Referring to fig. 3, fig. 3 is a flowchart illustrating a training method of a graph neural network in an information retrieval method according to an embodiment of the present invention.
S31: acquiring second training data, wherein the second training data comprise an undirected graph formed by a plurality of characteristic nodes; and the characteristic nodes are associated with labeling similarity.
Specifically, the second training data includes an undirected graph composed of a plurality of feature nodes. The length of the edge between the feature nodes in the undirected graph represents the similarity of the feature nodes at the two ends of the edge. And taking the corresponding real similarity between the characteristic nodes as the labeling similarity.
S32: generating, through the graph neural network, fusion features corresponding to the feature nodes respectively, based on the undirected graph of the second training data.
Specifically, a feature node is selected as a main feature node, other feature nodes are selected as neighbor feature nodes, and fusion features corresponding to the main feature node are generated based on the main feature node and the node features of the neighbor nodes connected with the main feature node.
And traversing all the feature nodes in the undirected graph to obtain fusion features corresponding to the feature nodes respectively.
S33: and determining the prediction similarity between the feature nodes based on the fusion features respectively corresponding to the feature nodes.
Specifically, the prediction similarity between feature nodes is determined based on the similarity between the fusion features respectively corresponding to the feature nodes.
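For example, the predicted similarity can be taken as the pairwise cosine similarity between the fusion features of the feature nodes; this concrete measure is an assumption for illustration, the embodiment only requires some similarity between the fusion features:

```python
import torch
import torch.nn as nn

def predicted_similarity(fused_nodes: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between the fusion features of all feature nodes."""
    normed = nn.functional.normalize(fused_nodes, dim=-1)
    return normed @ normed.t()
```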
S34: and training the graph neural network based on the weighted sum of error values between the corresponding prediction similarity and the labeling similarity between the feature nodes corresponding to the undirected graph.
Specifically, the graph neural network is trained through the error values between the corresponding predicted similarities and labeled similarities among the feature nodes in the undirected graph.
In a specific embodiment, the plurality of feature nodes include a main feature node and neighbor feature nodes. That is, the undirected graph is a connection graph formed by the main feature node and its corresponding neighbor feature nodes.
In response to one of the two feature nodes being a master feature node and the other being a neighbor feature node, the corresponding error value between the two feature nodes is multiplied by the first weight to ensure the prediction accuracy between the master feature node and the neighbor feature node. And in response to that the two feature nodes are both neighbor feature nodes, multiplying the corresponding error value between the two feature nodes by a second weight to assist the convergence and stabilization of the model. The first weight is greater than the second weight.
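A sketch of the weighted loss described above. The concrete weight values are assumptions; the embodiment only requires that the first weight be greater than the second weight:

```python
import torch

def weighted_similarity_loss(pred_sim: torch.Tensor, label_sim: torch.Tensor,
                             is_master: torch.Tensor, first_weight: float = 1.0,
                             second_weight: float = 0.5) -> torch.Tensor:
    """Weighted sum of per-edge errors between predicted and labeled similarity.
    Edges with the master feature node as one endpoint get the larger first weight;
    edges between two neighbor feature nodes get the smaller second weight."""
    error = (pred_sim - label_sim).abs()
    master_edge = is_master.unsqueeze(0) | is_master.unsqueeze(1)   # at least one endpoint is the master node
    weights = master_edge.float() * first_weight + (~master_edge).float() * second_weight
    return (weights * error).sum()

# pred_sim, label_sim: [N, N] similarity matrices over the undirected graph's feature nodes
# is_master: [N] boolean mask that is True only at the master feature node
```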
In an embodiment, the specific steps of obtaining the search result corresponding to the information to be queried in step S4 are as follows.
Calculating the similarity between the fusion feature information of the information to be queried and the fusion feature information of each candidate neighbor feature in the candidate retrieval data set; selecting the first feature information associated with a preset number of candidate neighbor features with the largest similarity values; and determining the retrieval result corresponding to the information to be queried based on the similarity between the first feature information of the information to be queried and the first feature information of each selected candidate neighbor feature.
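Putting the final step together, a coarse-to-fine sketch that shortlists candidates by fusion-feature similarity and re-ranks the shortlist with the first (long) features; the preset number top_m = 10 is illustrative:

```python
import numpy as np

def coarse_to_fine_retrieve(query_fused: np.ndarray, cand_fused: np.ndarray,
                            query_first: np.ndarray, cand_first: np.ndarray,
                            top_m: int = 10) -> np.ndarray:
    """Shortlist candidates by fused-feature similarity, then re-rank the shortlist
    with the long first features to produce the final retrieval result."""
    shortlist = np.argsort(-(cand_fused @ query_fused))[:top_m]   # preset number of most similar candidates
    fine = cand_first[shortlist] @ query_first                    # similarity of the associated first features
    return shortlist[np.argsort(-fine)]                           # final retrieval result order
```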
In the information retrieval method provided by this embodiment, feature extraction is performed on the information to be queried to obtain the first feature information and the second feature information corresponding to the information to be queried, the second feature information being dimension-reduced feature information of the first feature information; retrieval is performed in a preset database based on the second feature information to obtain a candidate retrieval data set corresponding to the second feature information; feature fusion is performed on the second feature information and the candidate neighbor features contained in the candidate retrieval data set to obtain fusion feature information corresponding to the information to be queried; and retrieval is performed in the candidate retrieval data set based on the fusion feature information of the information to be queried to determine the retrieval result corresponding to the information to be queried. Feature fusion between the second feature information of the information to be queried and each of its candidate neighbor features upgrades the second feature information, improves its detail expression capability and enriches its semantic information, so the retrieval results obtained from the fusion feature information are more comprehensive, the recall of difficult samples is improved, and high recall of the retrieval results is achieved.
Referring to fig. 4, fig. 4 is a schematic diagram of a framework of an embodiment of a terminal according to the present invention. The terminal 80 comprises a memory 81 and a processor 82 coupled to each other, the processor 82 being configured to execute program instructions stored in the memory 81 to implement the steps of any of the above-described embodiments of the information retrieval method. In one specific implementation scenario, the terminal 80 may include, but is not limited to, a microcomputer or a server; in addition, the terminal 80 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 82 is configured to control itself and the memory 81 to implement the steps of any of the above-described embodiments of the information retrieval method. Processor 82 may also be referred to as a CPU (Central Processing Unit). The processor 82 may be an integrated circuit chip having signal processing capabilities. The Processor 82 may also be a general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 82 may be collectively implemented by an integrated circuit chip.
Referring to fig. 5, fig. 5 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention. The computer readable storage medium 90 stores program instructions 901 capable of being executed by a processor, the program instructions 901 being for implementing the steps of any of the above-described embodiments of the information retrieval method.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight different aspects of the various embodiments that are the same or similar, which can be referenced with one another and therefore are not repeated herein for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, which are essential or contributing to the prior art, or all or part of the technical solutions may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. An information retrieval method, characterized by comprising:
performing feature extraction on information to be queried to obtain first feature information and second feature information corresponding to the information to be queried; the second characteristic information is dimension reduction characteristic information of the first characteristic information;
retrieving in a preset database based on the second characteristic information to obtain a candidate retrieval data set corresponding to the second characteristic information; the preset database comprises a plurality of preset second characteristic information;
performing feature fusion on the second feature information and candidate neighbor features contained in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried; the candidate neighbor features are a preset number of pieces of preset second feature information with the largest similarity to the second feature information;
retrieving in the candidate retrieval data set based on the fusion characteristic information of the information to be queried, and determining a retrieval result corresponding to the information to be queried;
the performing feature fusion on the second feature information and the candidate neighboring features included in the candidate retrieval data set corresponding to the second feature information to obtain fusion feature information corresponding to the information to be queried includes:
determining the similarity corresponding to the second feature information of the information to be queried and the candidate neighbor features corresponding to the information to be queried respectively based on the second feature information of the information to be queried and the candidate neighbor features corresponding to the information to be queried;
generating fusion feature information corresponding to the information to be queried based on the second feature information of the information to be queried and the corresponding similarity between the candidate neighbor features by adopting a graph neural network;
the retrieving in the candidate retrieval data set based on the fusion feature information of the information to be queried and determining a retrieval result corresponding to the information to be queried comprises the following steps:
calculating the similarity between the fusion characteristic information of the information to be inquired and the fusion characteristic information of each candidate neighbor characteristic in the candidate retrieval data set;
selecting the first feature information associated with a preset number of the candidate neighbor features with the largest similarity values;
and determining a retrieval result corresponding to the information to be queried based on the similarity between the first feature information of the information to be queried and the first feature information of each selected candidate neighbor feature.
2. The information retrieval method according to claim 1,
the method for extracting the features of the information to be queried to obtain the first feature information and the second feature information corresponding to the information to be queried comprises the following steps:
performing feature extraction on the information to be queried to obtain first feature information of the information to be queried;
respectively performing dimension reduction processing on the first characteristic information based on at least two conversion models to obtain sub-characteristic information of the information to be queried;
and splicing the sub-feature information respectively output by each conversion model to obtain second feature information of the information to be queried.
3. The information retrieval method according to claim 2, wherein the at least two conversion models include a first conversion model and a second conversion model that differ in model parameters;
the training method of the second conversion model comprises the following steps:
acquiring first training data, wherein the first training data is associated with corresponding first sample characteristics, and the first training data has a corresponding labeled neighbor information sequence and an identifier; the identifier comprises a first identifier and a second identifier;
inputting the first sample features corresponding to the first training data into the second conversion model for feature conversion to obtain second sample features corresponding to the first training data;
retrieving in the preset database based on the second sample characteristic to obtain a predicted neighbor information sequence corresponding to the first training data;
in response to the first training data having the first identifier, training the second conversion model based on an error value between a sequence of labeled neighbor information of the first training data and the sequence of predicted neighbor information;
in response to the first training data having the second identifier, the second conversion model is trained after doubling the error value between the labeled neighbor information sequence and the predicted neighbor information sequence of the first training data.
4. The information retrieval method according to claim 3,
the determination mode of the identifier of the first training data comprises the following steps:
performing feature conversion on the first sample feature of the first training data through the trained first conversion model to obtain a sample feature and a confidence coefficient corresponding to the first training data;
in response to the confidence level exceeding a confidence level threshold, labeling the first training data corresponding to the confidence level as a first identifier;
in response to the confidence level not exceeding the confidence level threshold, labeling the first training data corresponding to the confidence level as a second identifier.
5. The information retrieval method according to claim 1,
the training method of the graph neural network comprises the following steps:
acquiring second training data, wherein the second training data comprise an undirected graph formed by a plurality of characteristic nodes; the characteristic nodes are associated with labeling similarity;
generating fusion features corresponding to the feature nodes respectively through the graph neural network based on the undirected graph of the second training data;
determining the prediction similarity between the feature nodes based on the fusion features respectively corresponding to the feature nodes;
and training the graph neural network based on the weighted sum of error values between the prediction similarity and the labeling similarity corresponding to the characteristic nodes of the undirected graph.
6. The information retrieval method of claim 5, wherein the plurality of feature nodes include a master feature node and a neighbor feature node;
the training the graph neural network based on a weighted sum of error values between the prediction similarity and the labeling similarity corresponding to each feature node of the undirected graph comprises:
in response to one of the two feature nodes being the master feature node and the other being the neighbor feature node, multiplying the corresponding error value between the two feature nodes by a first weight;
in response to both of the feature nodes being the neighbor feature node, multiplying the error value corresponding between the two feature nodes by a second weight, the first weight being greater than the second weight.
7. A terminal, comprising a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor is configured to execute the computer program to implement the steps in the information retrieval method according to any one of claims 1 to 6.
8. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when being executed by a processor, implements the steps of the information retrieval method according to any one of claims 1 to 6.
CN202211707254.8A 2022-12-29 2022-12-29 Information retrieval method, terminal and computer-readable storage medium Active CN115687676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211707254.8A CN115687676B (en) 2022-12-29 2022-12-29 Information retrieval method, terminal and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211707254.8A CN115687676B (en) 2022-12-29 2022-12-29 Information retrieval method, terminal and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN115687676A CN115687676A (en) 2023-02-03
CN115687676B true CN115687676B (en) 2023-03-31

Family

ID=85056369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211707254.8A Active CN115687676B (en) 2022-12-29 2022-12-29 Information retrieval method, terminal and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115687676B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117010920A (en) * 2023-06-13 2023-11-07 中企筑链科技有限公司 Mortgage grading method based on neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357120A (en) * 2022-01-12 2022-04-15 平安科技(深圳)有限公司 Non-supervision type retrieval method, system and medium based on FAQ
CN114861016A (en) * 2022-07-05 2022-08-05 人民中科(北京)智能技术有限公司 Cross-modal retrieval method and device and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120879B2 (en) * 2013-11-29 2018-11-06 Canon Kabushiki Kaisha Scalable attribute-driven image retrieval and re-ranking
CN111753060B (en) * 2020-07-29 2023-09-26 腾讯科技(深圳)有限公司 Information retrieval method, apparatus, device and computer readable storage medium
WO2022261550A1 (en) * 2021-06-11 2022-12-15 Trustees Of Tufts College Method and apparatus for image processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357120A (en) * 2022-01-12 2022-04-15 平安科技(深圳)有限公司 Non-supervision type retrieval method, system and medium based on FAQ
CN114861016A (en) * 2022-07-05 2022-08-05 人民中科(北京)智能技术有限公司 Cross-modal retrieval method and device and storage medium

Also Published As

Publication number Publication date
CN115687676A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN110276406B (en) Expression classification method, apparatus, computer device and storage medium
CN111506773B (en) Video duplicate removal method based on unsupervised depth twin network
US20230245455A1 (en) Video processing
CN112364204B (en) Video searching method, device, computer equipment and storage medium
CN115687676B (en) Information retrieval method, terminal and computer-readable storage medium
CN113254711A (en) Interactive image display method and device, computer equipment and storage medium
CN112149604A (en) Training method of video feature extraction model, video recommendation method and device
CN113434716A (en) Cross-modal information retrieval method and device
CN111368176B (en) Cross-modal hash retrieval method and system based on supervision semantic coupling consistency
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN112364184A (en) Method, device, server and storage medium for ordering multimedia data
CN112597292B (en) Question reply recommendation method, device, computer equipment and storage medium
EP3166022A1 (en) Method and apparatus for image search using sparsifying analysis operators
CN116883740A (en) Similar picture identification method, device, electronic equipment and storage medium
CN115146103A (en) Image retrieval method, image retrieval apparatus, computer device, storage medium, and program product
CN114449342A (en) Video recommendation method and device, computer readable storage medium and computer equipment
CN115168609A (en) Text matching method and device, computer equipment and storage medium
CN114969439A (en) Model training and information retrieval method and device
CN113779248A (en) Data classification model training method, data processing method and storage medium
CN112417290A (en) Training method of book sorting push model, electronic equipment and storage medium
CN112100412A (en) Picture retrieval method and device, computer equipment and storage medium
CN111782762A (en) Method and device for determining similar questions in question answering application and electronic equipment
CN113965803B (en) Video data processing method, device, electronic equipment and storage medium
CN112000888B (en) Information pushing method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant