WO2023213233A1

WO2023213233A1 - Task processing method, neural network training method, apparatus, device, and medium

Info

Publication number: WO2023213233A1
Application number: PCT/CN2023/091356
Authority: WO
Inventors: 邰骋; 汤林鹏
Original assignee: 墨奇科技（北京）有限公司
Priority date: 2022-05-06
Filing date: 2023-04-27
Publication date: 2023-11-09

Abstract

A task processing method, a neural network training method, an apparatus, a device, and a medium. The task processing method comprises: obtaining first data and second data; obtaining respective graph representations of a first scale and respective graph representations of a second scale of the first data and the second data, the second scale being lower than the first scale, the graph representation of each scale comprising a node of the scale, the node of each scale comprising an attribute of a vector type, a node of at least one scale of each piece of data being obtained by sparsifying dense data corresponding to the data, and the graph representation of at least one scale of each piece of data comprising adjacent edges representing a relative relationship of the node of the scale; performing graph matching on the first scale and the second scale on the first data and the second data, respectively to obtain a first matching result and a second matching result; and determining a multi-scale matching result on the basis of the first matching result and/or the second matching result, and further determining a task processing result.

Description

Task processing methods, neural network training methods, devices, equipment and media

Cross-references to related applications

This application claims priority to Chinese Patent Application No. 202210488516X, filed on May 6, 2022, and Chinese Patent Application No. 2022104884665, filed on May 6, 2022, the contents of which are incorporated herein by reference in their entirety.

Technical field

The present disclosure relates to the field of artificial intelligence technology, and specifically relates to a task processing method, a neural network training method, a task processing device, a neural network training device, electronic equipment, a computer-readable storage medium, and a computer program product.

Background art

When analyzing and processing unstructured data such as images, videos, voices, texts, molecular structures, and protein sequences, the original form of the data is often difficult to use directly to produce effective results. A more effective approach is to convert the unstructured data into semi-structured intermediate representations and then perform analysis on those intermediate representations. Therefore, determining a suitable intermediate representation of unstructured data, and how to use that representation to analyze and process the data effectively, has become an urgent problem to be solved.

The approaches described in this section are not necessarily those that have been previously envisioned or employed. Unless otherwise indicated, it should not be assumed that any method described in this section is prior art merely by virtue of its inclusion in this section. Similarly, unless otherwise indicated, the issues mentioned in this section should not be considered to be recognized in any prior art.

Summary of the invention

The present disclosure provides a task processing method, a neural network training method, a task processing device, a neural network training device, electronic equipment, a computer-readable storage medium, and a computer program product.

According to an aspect of the present disclosure, a task processing method is provided, including: acquiring first data and second data, where the first data and the second data are each one of image data, audio data, text data, and sequence data; acquiring a graph representation of the first scale of each of the first data and the second data, where the graph representation of the first scale includes at least one node of the first scale, the node of the first scale has attributes, and the attributes of the node of the first scale include an attribute of vector type; acquiring a graph representation of the second scale of each of the first data and the second data, where the second scale is lower than the first scale, the graph representation of the second scale includes at least one node of the second scale, the node of the second scale has attributes, and the attributes of the node of the second scale include an attribute of vector type, where the nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to that data, the graph representation of at least one scale of each data includes at least one adjacent edge, each of the at least one adjacent edge is used to represent the relative relationship between two nodes of the same scale, and the adjacent edges have attributes; performing graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result; performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result; determining a multi-scale matching result based on the first matching result and the second matching result; and determining a task processing result based on the multi-scale matching result.

According to another aspect of the present disclosure, a neural network training method is provided. The method includes: acquiring first sample data and second sample data, where the first sample data and the second sample data are each one of image data, audio data, text data, molecular structure data, and sequence data; obtaining respective multi-scale graph representations of the first sample data and the second sample data, where the multi-scale graph representation is determined using a graph representation extraction network and includes a graph representation of the first scale and a graph representation of the second scale; performing graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result characterizing the matching degree at the first scale; performing graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result characterizing the matching degree at the second scale; obtaining a target matching result and/or a target task processing result of the first sample data and the second sample data; determining a loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and training the graph representation extraction network based on the loss value.

According to another aspect of the present disclosure, a task processing device is provided, including: a first acquisition unit configured to acquire first data and second data, where the first data and the second data are each one of image data, audio data, text data, molecular structure data, and sequence data; a second acquisition unit configured to acquire a graph representation of the first scale of each of the first data and the second data, where the graph representation of the first scale includes at least one node of the first scale, the node of the first scale has attributes, and the attributes of the node of the first scale include an attribute of vector type; a third acquisition unit configured to acquire a graph representation of the second scale of each of the first data and the second data, where the second scale is lower than the first scale, the graph representation of the second scale includes at least one node of the second scale, the node of the second scale has attributes, and the attributes of the node of the second scale include an attribute of vector type, where the nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to that data, the graph representation of at least one scale of each data includes at least one adjacent edge, each of the at least one adjacent edge is used to represent the relative relationship between two nodes of the same scale, and the adjacent edges have attributes; a first graph matching unit configured to perform graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result; a second graph matching unit configured to perform graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result; a first determination unit configured to determine a multi-scale matching result based on the first matching result and the second matching result; and a second determination unit configured to determine a task processing result based on the multi-scale matching result.

According to another aspect of the present disclosure, a neural network training device is provided. The device includes: a fourth acquisition unit configured to acquire first sample data and second sample data, where the first sample data and the second sample data are each one of image data, audio data, text data, molecular structure data, and sequence data; a fifth acquisition unit configured to acquire respective multi-scale graph representations of the first sample data and the second sample data, where the multi-scale graph representation is determined using a graph representation extraction network and includes a graph representation of the first scale and a graph representation of the second scale; a third graph matching unit configured to perform graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result characterizing the matching degree at the first scale; a fourth graph matching unit configured to perform graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result characterizing the matching degree at the second scale; a seventh acquisition unit configured to obtain a target matching result and/or a target task processing result of the first sample data and the second sample data; a third determination unit configured to determine a loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and a training unit configured to train the graph representation extraction network according to the loss value.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause the computer to perform the above method.

According to another aspect of the present disclosure, a computer program product is also provided, including a computer program, wherein the computer program implements the above method when executed by a processor.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of the drawings

The drawings illustrate exemplary embodiments and constitute a part of the specification, and together with the written description, serve to explain exemplary implementations of the embodiments. The embodiments shown are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numbers refer to similar, but not necessarily identical, elements.

Figure 1 shows a flowchart of a task processing method according to an embodiment of the present disclosure;

Figure 2 shows a schematic diagram of a multi-scale graph representation according to an embodiment of the present disclosure;

Figure 3 shows a flow chart of the graph matching process at each scale in the method shown in Figure 1;

Figure 4 shows a flow chart for determining the matching results of candidate matching point pairs in the method shown in Figure 3;

Figure 5 shows a flow chart for determining the matching results of candidate matching edge pairs in the method shown in Figure 3;

Figure 6 shows a flow chart of a training method of a neural network according to an embodiment of the present disclosure;

Figure 7 shows a flow chart for obtaining first sample data and second sample data in the method shown in Figure 6;

Figure 8 shows a flow chart for determining the loss value in the method shown in Figure 6;

Figure 9 shows a flow chart of a training method of a neural network according to an embodiment of the present disclosure;

Figure 10 shows a structural block diagram of a task processing device according to an embodiment of the present disclosure;

Figure 11 shows a structural block diagram of a neural network training device according to an embodiment of the present disclosure; and

Figure 12 shows a structural block diagram of an electronic device of a server or a client according to an embodiment of the present disclosure.

Detailed description

Example embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding and should be considered as illustrative only. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.

In this disclosure, unless otherwise stated, the use of the terms “first”, “second”, etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of these elements. Such terms are only used to distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of the element, and in some cases, based on contextual description, they may refer to different instances.

The terminology used in the description of various examples in this disclosure is for the purpose of describing the particular example only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the element may be one or more. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

Figure 1 shows a flow chart of a task processing method 100 according to an embodiment of the present disclosure. The method 100 includes:

Step 101: Obtain first data and second data. The first data and second data are respectively one of image data, audio data, text data, molecular structure data and sequence data;

Step 102: Obtain a graph representation of the first scale of each of the first data and the second data. The graph representation of the first scale includes at least one node of the first scale, where the node of the first scale has attributes, and the attributes of the node of the first scale include attributes of vector type;

Step 103: Obtain a graph representation of the second scale of each of the first data and the second data. The second scale is lower than the first scale. The graph representation of the second scale includes at least one node of the second scale, where the node of the second scale has attributes, and the attributes of the node of the second scale include attributes of vector type. The nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to that data. The graph representation of at least one scale of each data includes at least one adjacent edge, each of the at least one adjacent edge is used to represent the relative relationship between two nodes of the same scale, and the adjacent edges have attributes;

Step 104: Perform graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result;

Step 105: Perform graph matching on the second-scale graph representation of the first data and the second-scale graph representation of the second data to obtain a second matching result;

Step 106: Determine the multi-scale matching result based on the first matching result and the second matching result; and

Step 107: Determine the task processing result based on the multi-scale matching results.

According to the method of this embodiment, by extracting decoupled features of the first data and the second data at multiple scales, a more general and more powerful multi-scale graph representation of each data can be obtained. Performing graph matching on the multi-scale graph representations and determining the task processing result based on the graph matching results then makes it possible to use the rich information contained in the data efficiently and fully, so as to obtain accurate task processing results. In addition, graph matching of multi-scale graph representations can enhance robustness to better cope with changes in image perspective, changes in text expression, and differences between speakers in speech.
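Steps 104 to 107 can be illustrated with a minimal sketch. It assumes each scale is reduced to a list of node feature vectors and uses cosine similarity as the per-scale matcher; the function names, the weighted-sum fusion, and the decision threshold are all hypothetical choices made for illustration and are not specified by the disclosure.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_scale(nodes_a, nodes_b):
    """Assumed per-scale matcher: mean best-match similarity of nodes_a against nodes_b."""
    if not nodes_a or not nodes_b:
        return 0.0
    return sum(max(cosine(u, v) for v in nodes_b) for u in nodes_a) / len(nodes_a)

def multi_scale_match(graph_a, graph_b, weights=(0.5, 0.5)):
    """Steps 104-106: match each scale, then fuse the results (weighted sum is an assumption)."""
    first = match_scale(graph_a["first"], graph_b["first"])
    second = match_scale(graph_a["second"], graph_b["second"])
    return weights[0] * first + weights[1] * second

def task_result(score, threshold=0.8):
    """Step 107: an assumed downstream task, deciding whether the two inputs match."""
    return score >= threshold

# Two toy inputs: one first-scale node and two second-scale nodes each.
a = {"first": [[1.0, 0.0]], "second": [[1.0, 0.0], [0.0, 1.0]]}
b = {"first": [[1.0, 0.0]], "second": [[0.0, 1.0], [1.0, 0.0]]}
score = multi_scale_match(a, b)
is_match = task_result(score)
```

Because the second-scale nodes of `b` are a reordering of those of `a`, every node finds a perfect counterpart and the fused score is 1.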

The first data and the second data may respectively be one of image data (including pictures and videos), audio data, text data, molecular structure data and sequence data. Sequence data can be, for example, protein sequence data, gene sequence data, or data in other sequence forms. The first data and the second data may be the same type of data or may be different types of data, which is not limited here.

The first data and the second data may be original data or data obtained after specific processing. In some embodiments, the image data may be an original image or a preprocessed image obtained by preprocessing the original image; the audio data may be the original sampling data of the audio or preprocessed data obtained by preprocessing the original sampling data (such as a spectrogram); and the text data may be multiple original strings or preprocessed data obtained by preprocessing the text data, which is not limited here.

After obtaining the first data and the second data, a multi-scale graph representation corresponding to each data can be obtained. In a multi-scale graph representation, the graph representation of each scale may include at least one node. Nodes can have attributes, and the attributes of nodes can include attributes of vector type or attributes of scalar type. The scalar type attributes may further include categorical attributes (for example, discrete values) and numerical attributes (for example, continuous values). In an example embodiment, the nodes in the graph representation of a certain scale may be, for example, multiple objects obtained by performing target detection on the original data (or on dense data, described in detail below, which may be, for example, a feature map obtained by feature extraction from the original data or preprocessed data obtained by preprocessing the original data). The vector type attributes of a node may include, for example, the feature vector corresponding to the object. The numerical attributes of the node may include, for example, the coordinates and size of the object, the direction field, gradient field, and texture density of the object's neighborhood, and the saliency of the node. The categorical attributes may include, for example, the classification category of the object. It is understood that different nodes may include different attributes.

According to some embodiments, in the multi-scale graph representation, the graph representation of at least one scale may further include at least one adjacent edge. Adjacent edges can be used to characterize the relative relationship between two nodes on the same scale. Adjacent edges can have attributes, and the attributes of adjacent edges can include attributes of vector type or scalar type. In an example embodiment, the vector type attributes of an adjacent edge may include, for example, the feature vectors of its two corresponding nodes and/or the results of further processing of the two feature vectors. The numerical attributes of the adjacent edge may include, for example, position information and/or geometric information such as the coordinates, length, and angle of the adjacent edge, and may also include the saliency of the adjacent edge. The categorical attributes of the adjacent edge may include the category of the adjacent edge, such as different types of chemical bonds, different types of forces, and so on.

According to some embodiments, the multi-scale graph representation may also include at least one dependent edge. Dependent edges can be used to characterize the dependence relationship between two nodes at different scales. A dependent edge can have attributes, and the attributes of a dependent edge can include attributes of vector type or scalar type. In an example embodiment, a dependent edge may represent a relationship between target detection objects at two scales; for example, a vehicle detected at the first scale and a wheel of that vehicle detected at the second scale can be connected by a dependent edge. The vector type attributes of the dependent edge may include, for example, the feature vectors of the two corresponding nodes and/or the results of further processing of the two feature vectors. The numerical attributes of the dependent edge may include, for example, position information and geometric information such as the coordinates, length, and angle of the dependent edge, as well as the correlation between the nodes connected by the dependent edge. The categorical attributes of the dependent edge may include the categorical attributes of the nodes connected by the dependent edge.
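One possible in-memory layout for the nodes, adjacent edges, and dependent edges described above is sketched below. The class and field names (`Node`, `Edge`, `MultiScaleGraph`, `numeric`, `category`) are hypothetical, and the choice to store both edge kinds in a single list and tell them apart by the scales of their endpoints is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

NodeId = Tuple[int, int]  # (scale, index of the node within that scale)

@dataclass
class Node:
    vector: List[float]                                      # vector type attribute (feature vector)
    numeric: Dict[str, float] = field(default_factory=dict)  # numerical scalar attributes
    category: str = ""                                       # categorical scalar attribute

@dataclass
class Edge:
    source: NodeId
    target: NodeId
    numeric: Dict[str, float] = field(default_factory=dict)
    category: str = ""

    @property
    def is_dependent(self) -> bool:
        """Dependent edges link different scales; adjacent edges stay within one scale."""
        return self.source[0] != self.target[0]

@dataclass
class MultiScaleGraph:
    nodes: Dict[int, List[Node]] = field(default_factory=dict)  # scale -> nodes
    edges: List[Edge] = field(default_factory=list)

    def adjacent_edges(self, scale: int) -> List[Edge]:
        return [e for e in self.edges if not e.is_dependent and e.source[0] == scale]

    def dependent_edges(self) -> List[Edge]:
        return [e for e in self.edges if e.is_dependent]

# The vehicle/wheel example: scale 1 is the first (higher) scale, scale 2 the second.
graph = MultiScaleGraph()
graph.nodes[1] = [Node([0.2, 0.9], {"x": 40.0, "y": 30.0}, "vehicle")]
graph.nodes[2] = [
    Node([0.7, 0.1], {"x": 25.0, "y": 45.0}, "wheel"),
    Node([0.6, 0.2], {"x": 55.0, "y": 45.0}, "wheel"),
]
graph.edges = [
    Edge((2, 0), (2, 1), {"length": 30.0}),  # adjacent edge between the two wheels
    Edge((1, 0), (2, 0)),                    # dependent edge: vehicle -> first wheel
    Edge((1, 0), (2, 1)),                    # dependent edge: vehicle -> second wheel
]
```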

It should be noted that the graph referred to in the embodiments of the present disclosure is a generalized graph and may be a single-node graph or a multi-node graph. Graph matching can match the nodes contained in the graph, or it can match both the nodes and the edges contained in the graph. When the graph of a scale is a single-node graph, graph matching refers to matching between the vectors corresponding to the nodes. When the graph of a scale is a multi-node graph, graph matching can include graph matching in the traditional sense, graph matching using attributes of nodes and edges (including attributes of vector type and scalar type), node/edge pairing checks (for example, a pairing check that solves for geometric relationships through projective transformation), or a combination of the above. Graph matching using the attributes of nodes and edges will be explained in detail later.
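As a rough illustration of a node pairing check, the sketch below keeps only those candidate node pairs whose pairwise distances are preserved between the two graphs. This is a deliberately simplified stand-in that tolerates only rigid motion; it does not solve for a projective transformation, and the function name, the tolerance, and the majority-vote rule are assumptions.

```python
from math import dist

def consistent_pairs(coords_a, coords_b, candidates, tol=0.1):
    """Keep candidate pairs (i, j) whose distances to most other pairs agree in both graphs."""
    kept = []
    for (i, j) in candidates:
        agree = 0
        for (k, l) in candidates:
            if (i, j) == (k, l):
                continue
            da = dist(coords_a[i], coords_a[k])  # edge length in the first graph
            db = dist(coords_b[j], coords_b[l])  # length of the corresponding edge in the second
            if abs(da - db) <= tol * max(da, db, 1e-9):
                agree += 1
        # Majority vote: at least half of the other candidates must be consistent.
        if agree * 2 >= len(candidates) - 1:
            kept.append((i, j))
    return kept

# Nodes 0-2 of the second graph are the first graph shifted by (2, 3); node 3 is an outlier.
coords_a = [(0, 0), (1, 0), (0, 1), (5, 5)]
coords_b = [(2, 3), (3, 3), (2, 4), (0, 0)]
candidates = [(0, 0), (1, 1), (2, 2), (3, 3)]
verified = consistent_pairs(coords_a, coords_b, candidates)
```

The three translated nodes support one another, while the pair involving the outlier is rejected because none of its edge lengths agree.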

Figure 2 shows a schematic diagram of a multi-scale graph representation according to one embodiment of the present disclosure. As shown in Figure 2, graph representations 202, 204, and 206 of three scales from high to low constitute a multi-scale graph representation. Each graph representation includes multiple nodes, and the graph representation 206 includes multiple adjacent edges. Graph representations 202, 204, 206 may be obtained by sparsifying dense data 208. Specifically, the dense data 208 includes three dense data corresponding to the three scales respectively. By sparsifying the three dense data respectively, the graph representations 202, 204, and 206 of the three scales can be obtained.

As a result, by obtaining graph representations that include different features such as scalars, vectors, and graphs at different scales, a more versatile and powerful intermediate representation of various types of data can be obtained, which can improve the accuracy of the results of downstream matching tasks, retrieval tasks, classification tasks, recognition tasks, generation tasks, and other tasks related to data analysis and processing. In addition, by using dependent edges, the correlation between graph representations at different scales can be strengthened, thereby further enriching the information included in the multi-scale graph representation.

It will be appreciated that the present disclosure does not limit the number of scales included in a multi-scale graph representation. In some embodiments, the multi-scale graph representation may include two-scale, three-scale, or more-scale graph representations, which are not limited here. For ease of description, this disclosure uses the first scale and the second scale lower than the first scale as examples to illustrate the form, generation method, matching method, etc. of the multi-scale graph representation, but is not intended to limit the scope of the disclosure.

It should be noted that the level of a scale can be understood as the degree to which the corresponding graph representation emphasizes the whole or the parts of the data. For example, the level of a scale can be measured by the size of the portion of the original data corresponding to each node in the graph representation, the size of the portion corresponding to the graph representation of that scale, the number of nodes, and so on. For example, the nodes of a high-scale graph representation may correspond to an entire image or a text paragraph, and the nodes of a low-scale graph representation may correspond to parts of the image or to characters or words in the text.

How to obtain a multi-scale graph representation will be described below with reference to embodiments.

According to some embodiments, the nodes of at least one scale may be obtained by sparsifying dense data corresponding to the data.

Dense data, or dense graphs, may include, for example, original images with dense pixels, feature maps including dense feature vectors obtained by convolving the original images, audio data including dense sampling points (and spectrograms including dense pixels obtained by converting the audio data to a spectrum), text paragraphs including dense characters or words, and dense molecular structure data and sequence data. By sparsifying dense data, multiple nodes, that is, a sparse graph, can be obtained. Each node can correspond to a part of the area in the dense data and has attributes. It can be understood that dense data may also include multiple nodes, such as pixels in images, sampling points in audio data, and characters or words in text data, and each such node may include scalar type attributes (for example, its location in the dense data and its category) and vector type attributes (for example, feature vectors).

According to some embodiments, at least one node of the first scale and at least one node of the second scale may be obtained by sparsifying the same dense data respectively. In other words, the same dense data can be sparsified to different degrees to obtain nodes of different scales.

According to some embodiments, dense data may include multiple scales. Dense data that includes multiple scales can be, for example, the feature maps at multiple scales in a feature pyramid. At least one node of the first scale and at least one node of the second scale may be obtained by separately sparsifying each of the two scales of the dense data. In other words, multiple dense data of different scales can first be obtained, and the dense data of each scale can then be sparsified separately to obtain the nodes of the corresponding scale. In an example embodiment, the original image can be convolved with different downsampling factors to obtain feature maps of different sizes, that is, dense data of different scales. These dense data can then be sparsified separately to obtain nodes of different scales. Both of the above methods can generate nodes on multiple scales in parallel after obtaining dense data.
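The idea of building dense data at several scales and sparsifying each one separately can be sketched as follows. Average pooling stands in for strided convolution, and a "keep cells close to the maximum" rule stands in for the sparsification step; both substitutions, along with the function names and the `ratio` parameter, are assumptions for illustration only.

```python
def downsample(grid, factor):
    """Average-pool a 2D grid by an integer factor (a stand-in for strided convolution)."""
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def sparsify(grid, ratio=0.75):
    """Keep the (row, col) positions whose value is at least `ratio` times the maximum."""
    peak = max(max(row) for row in grid)
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value >= ratio * peak
    ]

# A 4x4 toy "image"; the first (higher) scale is the 2x downsampled map.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = {"first_scale": downsample(image, 2), "second_scale": image}

# Each scale is sparsified independently, so the two calls could run in parallel.
nodes = {name: sparsify(grid) for name, grid in pyramid.items()}
```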

In some embodiments, the saliency of dense nodes can be used to determine nodes in dense data. The scalar type attributes of dense nodes can include saliency. Saliency represents the importance of each dense node in the dense data and can be represented by a probability distribution over all dense nodes. In some embodiments, the saliency of a dense node may be determined based on the feature vector of the dense node. In an example embodiment, the feature vectors of all dense nodes may be processed using a saliency network to determine the saliency of each dense node.

Sparsifying the dense data may include, for example, determining, among at least a part of the plurality of dense nodes, the nodes whose saliency satisfies a third preset condition as the sparsified nodes. It can be understood that those skilled in the art can set the third preset condition as needed, which is not limited here. In an example embodiment, the third preset condition may be top-k, that is, selecting the k dense nodes with the highest saliency as the sparsified nodes, and/or the third preset condition may be that the saliency of the node is greater than a saliency threshold.

In some embodiments, in addition to saliency, the attention score generated by an attention mechanism or other measures of the importance of dense nodes can also be used as the basis for filtering nodes during the sparsification process. These methods are all within the scope of this disclosure.
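A minimal sketch of saliency-based selection follows. It assumes raw per-node scores are already available (for example, from some saliency network, which is not modeled here) and turns them into a probability distribution with a softmax; the function names are hypothetical, and applying both conditions together when both are supplied is one assumed reading of "and/or".

```python
from math import exp

def saliency_distribution(scores):
    """Softmax over raw scores, so that saliency sums to 1 over all dense nodes."""
    peak = max(scores)                        # subtract the max for numerical stability
    weights = [exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def select_nodes(saliency, k=None, threshold=None):
    """Indices of the dense nodes kept: top-k, above threshold, or both conditions together."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    if k is not None:
        order = order[:k]
    if threshold is not None:
        order = [i for i in order if saliency[i] > threshold]
    return order

scores = [0.1, 2.0, 0.3, 1.5, -1.0]           # raw scores of five dense nodes
saliency = saliency_distribution(scores)
top2 = select_nodes(saliency, k=2)
above = select_nodes(saliency, threshold=0.2)
```

Here nodes 1 and 3 carry most of the probability mass, so both the top-2 rule and the 0.2 threshold keep exactly those two nodes.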

In some embodiments, detection-based approaches may be used to determine nodes in dense data. The detection-based method may include key point detection, target detection, or other types of detection, which are not limited here.

It can be understood that node sparsification can be performed through a node sparsification network, and the node sparsification network can include a detection network, a saliency network, and so on. When the node sparsification network is a detection network, the dense data is input into the detection network to obtain the sparsified nodes corresponding to the dense data and the confidence of each node. When the node sparsification network is a saliency network, the dense data and the feature vectors corresponding to the dense data are input into the saliency network to obtain the saliency score of each dense node corresponding to the dense data, and the dense nodes whose saliency score is greater than the saliency threshold and/or the top k dense nodes with the highest saliency scores are used as the sparsified nodes. In some embodiments, conditions such as non-maximum suppression can also be considered jointly for comprehensive screening.
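Non-maximum suppression can be combined with saliency scores as in the sketch below. It uses a radius-based variant suited to point-like nodes rather than the box-overlap variant common in object detection; the function name, the `radius` parameter, and the toy data are assumptions.

```python
from math import dist

def non_max_suppression(points, scores, radius):
    """Greedy NMS: visit points from highest to lowest score and drop any point
    that lies within `radius` of a point already kept."""
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(dist(points[i], points[j]) > radius for j in kept):
            kept.append(i)
    return kept

points = [(0, 0), (1, 0), (5, 5), (5, 6), (9, 9)]  # positions of candidate nodes
scores = [0.9, 0.8, 0.4, 0.7, 0.2]                 # their saliency scores
kept = non_max_suppression(points, scores, radius=1.5)
```

Candidates 1 and 2 are suppressed because each lies within the radius of a higher-scoring neighbor, which keeps the resulting nodes spatially spread out.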

According to some embodiments, the nodes of at least one scale may be obtained by sparsifying the part of the dense data corresponding to the position of a node of another scale, and the nodes of that other scale may themselves be obtained by sparsifying the dense data. In an example embodiment, the nodes of the first scale may be determined, for example, through target detection, and each node of the first scale may correspond to a part of the area in the dense data (that is, the detection frame output by target detection). The nodes of the second scale can then be obtained by sparsifying the part of the dense data corresponding to each node of the first scale, thereby obtaining the nodes of the second scale corresponding to each node of the first scale. In this way, more valuable second-scale nodes can be obtained, thereby improving the processing efficiency and accuracy of subsequent matching tasks and downstream tasks.
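The hierarchical variant, in which second-scale nodes are drawn only from the regions of first-scale nodes, might look like the following sketch. The detection frames are given directly as half-open `(row_start, col_start, row_end, col_end)` boxes rather than produced by a detector, and the function name and the per-box top-k rule are assumptions.

```python
def nodes_inside_boxes(boxes, saliency_map, k):
    """For each first-scale box, pick the k most salient cells inside it.

    Returns a dict mapping the index of each first-scale node to the positions of
    its second-scale nodes, which also records the cross-scale correspondence.
    """
    result = {}
    for index, (r0, c0, r1, c1) in enumerate(boxes):
        cells = [
            (saliency_map[r][c], (r, c))
            for r in range(r0, r1)
            for c in range(c0, c1)
        ]
        cells.sort(key=lambda item: item[0], reverse=True)
        result[index] = [pos for _, pos in cells[:k]]
    return result

saliency_map = [
    [0.1, 0.9, 0.0, 0.0],
    [0.2, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.1],
    [0.0, 0.0, 0.6, 0.8],
]
boxes = [(0, 0, 2, 2), (2, 2, 4, 4)]  # two first-scale detections
children = nodes_inside_boxes(boxes, saliency_map, k=2)
```

Restricting the search to each box means the cells outside any detection never become second-scale nodes, and each selected cell is already tied to its parent node.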

According to some embodiments, nodes of at least one scale may be obtained by merging dense data. In some example embodiments, clustering or graph neural network methods may be used to sparsify the dense data to obtain low-scale nodes. These low-scale nodes can then be further processed to obtain high-scale nodes.

According to some embodiments, the nodes of at least one scale may be obtained by merging low-scale nodes, and the low-scale nodes may be obtained by sparsifying dense data. In some example embodiments, a clustering method may be used to cluster multiple low-scale nodes obtained by sparsification, or a subgraph containing multiple low-scale nodes may be input into a graph neural network, to obtain higher-scale nodes and/or node attributes.

The above-mentioned merging of nodes can be based on the scalar type attributes of the nodes (for example, location information), on the vector type attributes of the nodes (for example, feature vectors), or on the scalar or vector type attributes of the adjacent edges in the graph representation (for example, the co-occurrence probability or correlation of two connected nodes), which is not limited here.

The attributes of nodes at each scale may be determined before, at the same time as, or after determining the locations of the nodes at each scale and their correspondence with nodes at other scales or with dense nodes.

According to some embodiments, the attributes of a node obtained by sparsification may be determined based on the attributes of at least a part of the dense nodes, among the plurality of dense nodes, that correspond to the node. In some embodiments, the attributes of the node may be determined based on the attributes of neighbor nodes within a certain range of the dense node corresponding to the node's location in the dense data. For example, these neighbor nodes can be input into a feature extraction network to obtain the feature vector of the node, or the average of the feature vectors of these neighbor nodes can be determined as the feature vector of the node, or the saliency-weighted average of the feature vectors of these neighbor nodes can be determined as the feature vector of the node. In an example embodiment in which the node is determined through target detection, the dense nodes in the detection box corresponding to the node in the dense data can be input into the feature extraction network to extract the feature vector (a vector type attribute) corresponding to the node. In another example embodiment in which the node is determined through merging, the attributes of the node can be determined through clustering or a graph neural network based on the attributes of all low-scale nodes that were merged to obtain the node.
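As a non-limiting illustration, the saliency-weighted averaging of neighbor feature vectors mentioned above can be sketched as follows. The function name `weighted_feature` and the sample values are assumptions made for the example.

```python
def weighted_feature(features, saliencies):
    """Saliency-weighted average of neighbor feature vectors (illustrative)."""
    total = sum(saliencies)
    if total == 0:
        raise ValueError("saliency weights must not sum to zero")
    dim = len(features[0])
    # Each output component is the weighted mean of that component over all neighbors.
    return [
        sum(w * f[d] for f, w in zip(features, saliencies)) / total
        for d in range(dim)
    ]

neighbor_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbor_saliency = [1.0, 1.0, 2.0]
node_feature = weighted_feature(neighbor_features, neighbor_saliency)
print(node_feature)
```

Setting all saliencies to the same value reduces this to the plain average that the paragraph also mentions.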

According to some embodiments, the nodes of at least one scale may be obtained by merging nodes of another scale obtained by sparsification, and the attributes of a merged node may be determined based on the attributes of the nodes of the other scale that have a subordinate relationship with that node.

In some embodiments, the attributes of at least a part of the dense nodes corresponding to the node or the nodes having a subordinate relationship with the node may be further processed to obtain the attributes of the node. In an example embodiment, a graph neural network may be used to process the attributes of the nodes corresponding to the node to obtain the attributes of the node. In addition to the above methods, various attributes of nodes can also be determined through other methods, which are not limited here.

Multi-scale graphs can include adjacent edges when the relative relationships between nodes are helpful in characterizing the data. Examples of such relationships include the distance between two targets in an image, the interaction between two targets in an image, the association between preceding and following words in speech, and the interaction of different groups in a sequence.

According to some embodiments, at least one adjacent edge may be determined based on the respective attributes of at least one node of the same scale. By analyzing the attributes of the nodes, node pairs with associated relationships can be determined in a single-scale graph representation, so as to generate the corresponding adjacent edges.

In some embodiments, adjacent edges may be generated based on rules. In some embodiments, adjacent edges may be generated between node pairs whose distance is less than a preset threshold and/or between the top k closest node pairs. In some embodiments, adjacent edges may be generated only along specific directions. It can be understood that those skilled in the art can set corresponding adjacent edge generation rules based on prior knowledge and generate adjacent edges according to the set rules, which is not limited here.
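As a non-limiting illustration, one possible rule of this kind connects each node to its k nearest neighbors, provided they lie within a distance threshold. The function name `build_adjacent_edges` and the specific way the two conditions are combined are assumptions made for the example.

```python
import math

def build_adjacent_edges(positions, max_dist=2.5, k=2):
    """Connect each node to at most k nearest neighbors closer than max_dist (illustrative)."""
    edges = set()
    for i, p in enumerate(positions):
        # Distances from node i to every other node, nearest first.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(positions) if j != i
        )
        for d, j in neighbors[:k]:
            if d < max_dist:
                # Store undirected edges with the smaller index first.
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)

positions = [(0, 0), (1, 0), (2, 0), (10, 0)]
edges = build_adjacent_edges(positions)
print(edges)
```

The isolated node at `(10, 0)` receives no edge because none of its neighbors fall within `max_dist`.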

In some embodiments, candidate adjacent edges may be generated first, and then adjacent edges may be selected from the candidate adjacent edges. According to some embodiments, the at least one adjacent edge is determined by performing the following steps: determining at least one candidate adjacent edge based on at least one node of the same scale; determining the respective saliency of the at least one candidate adjacent edge based on the respective attributes of the at least one node of the same scale; and determining, as the at least one adjacent edge, each candidate adjacent edge whose saliency satisfies a fourth preset condition. By using saliency to generate adjacent edges, the generation process of adjacent edges can be optimized through training to improve the effectiveness of the generated adjacent edges. It can be understood that those skilled in the art can set the fourth preset condition according to needs. In an example embodiment, the fourth preset condition may be that the saliency is greater than a saliency threshold and/or that the saliency is among the top k highest.

According to some embodiments, the attributes of each of the at least one adjacent edge may be determined based on at least one of respective attributes of two nodes connected by the adjacent edge and a relative relationship between the two nodes. In an example embodiment, the position, length, angle, interaction size, etc. of the adjacent edge connecting the two nodes can be determined as attributes of the adjacent edge based on the respective positions/attributes of the two nodes. In some embodiments, a priori knowledge can be used to determine the relative relationship between two nodes based on rules, and the attributes of the adjacent edges can be determined based on the relative relationship.
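As a non-limiting illustration, geometric attributes of an adjacent edge can be derived from the positions of its two endpoint nodes as follows. The function name `edge_attributes` and the choice of midpoint, length, and angle as the attribute set are assumptions made for the example.

```python
import math

def edge_attributes(pos_a, pos_b):
    """Derive position, length, and angle of an edge from its endpoints (illustrative)."""
    dx, dy = pos_b[0] - pos_a[0], pos_b[1] - pos_a[1]
    return {
        "midpoint": ((pos_a[0] + pos_b[0]) / 2, (pos_a[1] + pos_b[1]) / 2),
        "length": math.hypot(dx, dy),
        "angle": math.atan2(dy, dx),  # radians, measured from the x-axis
    }

attrs = edge_attributes((0.0, 0.0), (3.0, 4.0))
print(attrs)
```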

According to some embodiments, at least one subordinate edge may be directly determined based on the subordinate relationships between nodes at two scales. In an example embodiment, a first node at the first scale is obtained by performing object detection on the dense data, and a second node at the second scale is obtained by performing further object detection on the region corresponding to the first node in the dense data. The first node and the second node then have a subordinate relationship, and a subordinate edge can be generated between the first node and the second node. In another example embodiment, the nodes at the second scale are obtained by clustering the dense data, and the nodes at the first scale are obtained by merging the nodes at the second scale. There is then a subordinate relationship between each node of the first scale and the nodes of the second scale that were merged to obtain it, and subordinate edges can be generated between these nodes of the second scale and the node of the first scale.

According to some embodiments, the attributes of a subordinate edge may be determined based on the attributes of the two nodes connected by the subordinate edge. As described above, the attributes of the subordinate edge can be determined in various ways according to the vector type attributes and/or the scalar type attributes of the two nodes connected by the subordinate edge, which is not limited here.

According to some embodiments, the graph representation of the first scale of each of the first data and the second data may be generated using a first network, and/or the graph representation of the second scale of each of the first data and the second data may be generated using a second network. In some embodiments, the generation of nodes, adjacent edges, and subordinate edges and the determination of their attributes may be performed entirely or partially using the first network or the second network, or entirely or partially using a rule-based method; some stages may be performed using the first network or the second network while other stages are performed using a rule-based method, which is not limited here. When a network is used to generate nodes, adjacent edges, and subordinate edges and/or to determine attributes, differentiable parts can be added to the matching results, so that the generation process and/or the determination of attributes can be optimized through training to further improve the expressive ability of the graph representation.

After obtaining the multi-scale graph representations of the first data and the second data, graph matching can be performed on the graph representations of the first data and the second data at different scales to obtain the matching result corresponding to each scale, and the multi-scale matching result can then be determined according to these matching results.

According to some embodiments, as shown in Figure 3, the graph matching process for each of the first scale and the second scale may include:

Step 301: Determine a candidate matching point pair according to at least one node included in the graph representation of the scale of the first data and at least one node included in the graph representation of the scale of the second data, wherein the candidate matching point pair includes a first candidate matching node belonging to the graph representation of the scale of the first data and a second candidate matching node belonging to the graph representation of the scale of the second data;

Step 302: For the candidate matching point pair, determine the matching result of the candidate matching point pair based on the feature vector of the first candidate matching node included in the candidate matching point pair and the feature vector of the second candidate matching node included in the candidate matching point pair; and

Step 303: Based on the matching results of the candidate matching point pairs, determine the matching results of the graph representation of the scale of the first data and the graph representation of the scale of the second data.

It can be understood that the matching result of the graph representation of the scale of the first data and the graph representation of the scale of the second data can be determined based on the matching results of multiple candidate matching point pairs and, when the graph representation of the scale includes adjacent edges, can also be determined based on the matching results of multiple candidate matching edge pairs.

As a result, matching is performed in two dimensions: the graph structure composed of the nodes in the graph representation, and the attributes of the nodes themselves (such as feature vectors). This makes it possible to make full use of the information contained in the data for matching, improving the accuracy of the matching results and of the results of subsequent tasks.
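As a non-limiting illustration, steps 301 to 303 can be sketched for a single scale as follows. The node layout (a dictionary with `pos` and `feat`), the use of nearest position to propose candidates, cosine similarity as the pair result, and a plain sum as the graph-level result are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_graphs(nodes_a, nodes_b):
    # Step 301: propose one candidate pair per node of A (nearest node of B by position).
    pairs = []
    for i, a in enumerate(nodes_a):
        j = min(range(len(nodes_b)), key=lambda j: math.dist(a["pos"], nodes_b[j]["pos"]))
        pairs.append((i, j))
    # Step 302: pair-level matching result from the feature vectors.
    pair_scores = [cosine(nodes_a[i]["feat"], nodes_b[j]["feat"]) for i, j in pairs]
    # Step 303: graph-level matching result (here simply the sum of pair scores).
    return pairs, sum(pair_scores)

nodes_a = [{"pos": (0, 0), "feat": [1.0, 0.0]}, {"pos": (5, 5), "feat": [0.0, 1.0]}]
nodes_b = [{"pos": (5, 4), "feat": [0.0, 2.0]}, {"pos": (0, 1), "feat": [3.0, 0.0]}]
pairs, graph_score = match_graphs(nodes_a, nodes_b)
print(pairs, graph_score)
```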

In step 301, the matching relationship between nodes in the graph representations of different data can be determined based on the similarity information of the nodes in the graph representations of the different data (and, in some examples, of the structure formed by the adjacent edges between nodes), so as to obtain candidate matching point pairs. Existing matching algorithms can be combined to perform node matching between the graph representations of different data to obtain candidate matching point pairs.

In some embodiments, point-by-point matching can be used to quickly obtain candidate matching point pairs.

In an example embodiment, when it is determined in step 302 that the matching result of the candidate matching point pair is a match, a new candidate matching point pair may be determined (for example, based on the confirmed matching point pair A and B, the nearest neighbor C of node A and the nearest neighbor D of node B are determined as a new candidate matching point pair), and step 302 is then performed for the new candidate matching point pair, until the new candidate matching point pair does not match or no new candidate matching point pair can be determined. Furthermore, in step 303, the matching result of the graph representation of the scale of the first data and the graph representation of the scale of the second data is determined based on the matching results of all historical candidate matching point pairs in the graph representations of the scale.

In some example embodiments, steps 302 and 303 may be performed each time a new candidate matching point pair is obtained, and whether to continue searching for more candidate matching point pairs is then determined based on the currently obtained matching result of the graph representation. If the matching result of the graph representation at this point is sufficient to determine that the two pieces of data match (for example, the matching score is greater than a preset threshold), the search can be stopped and the result returned; otherwise, the search can be continued until no more candidate matching point pairs can be found.

In some embodiments, by combining a tree growing algorithm and a beam search, a branch can be grown on the tree composed of the matched nodes at each step of the recursion, the scores of the new leaves (i.e., all possible growing branches) can be calculated, and the best k leaves can be selected as the next branches, thereby achieving point-by-point matching. It can be understood that other methods can also be used to achieve point-by-point matching, which is not limited here.
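As a non-limiting illustration, a beam search over partial matchings can be sketched as follows. The function name `beam_search_match`, the precomputed similarity table `sim`, and the fixed order in which nodes of the first graph are extended are assumptions made for the example; the sketch does not model the tree structure of the matched nodes.

```python
def beam_search_match(sim, beam_width=2):
    """Grow partial matchings one node at a time, keeping the best k (illustrative).

    sim[i][j] is the assumed score for pairing node i of graph A with node j of graph B.
    """
    # Each beam entry: (accumulated score, tuple of B indices chosen for A nodes 0..i-1).
    beams = [(0.0, ())]
    for i in range(len(sim)):
        leaves = []
        for score, assigned in beams:
            for j in range(len(sim[i])):
                if j not in assigned:  # each node of B is used at most once
                    leaves.append((score + sim[i][j], assigned + (j,)))
        if not leaves:  # no node of B left to extend with
            break
        # Keep only the best k leaves as the branches for the next step.
        beams = sorted(leaves, reverse=True)[:beam_width]
    return beams[0]

sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.10, 0.20],
    [0.10, 0.20, 0.70],
]
score, assignment = beam_search_match(sim)
print(score, assignment)
```

With `beam_width=1` the search degenerates into a greedy choice, which in this example commits to the locally best first pair and ends with a lower total score.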

In some embodiments, a global matching method (such as the Hungarian algorithm) can be used to obtain candidate matching point pairs.
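As a non-limiting illustration, global matching selects the assignment that maximizes the total pairing score over all nodes at once. The Hungarian algorithm solves this assignment problem in polynomial time; the sketch below instead enumerates all assignments, which is only practical for very small graphs and is used here purely as a stand-in. The function name `best_assignment` and the similarity table are assumptions made for the example.

```python
from itertools import permutations

def best_assignment(sim):
    """Exhaustive search for the assignment with the highest total score (illustrative).

    sim[i][j] is the assumed score for pairing node i of graph A with node j of graph B;
    graph B must have at least as many nodes as graph A.
    """
    n_a, n_b = len(sim), len(sim[0])
    best = max(
        permutations(range(n_b), n_a),
        key=lambda perm: sum(sim[i][j] for i, j in enumerate(perm)),
    )
    return list(enumerate(best))

sim = [
    [0.5, 0.9, 0.4],
    [0.6, 0.8, 0.1],
]
candidate_pairs = best_assignment(sim)
print(candidate_pairs)
```

Both rows score highest against column 1, so a per-node choice would conflict; the global search resolves the conflict by pairing node 1 of A with node 0 of B.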

In some embodiments, dynamic programming may be used to obtain candidate matching point pairs. The dynamic programming method can obtain the globally optimal matching result. In an example embodiment, the matching result may include multiple candidate matching point pairs, step 302 may be performed for each candidate matching point pair to obtain the corresponding matching result, and in step 303, the matching result of the graph representation is determined based on the matching results of all candidate matching point pairs.

In step 302, various methods may be used to determine the matching results of the two nodes based on the attributes of the first candidate matching node and the attributes of the second candidate matching node.

In some embodiments, the matching result of the candidate matching point pair may be, for example, the similarity between the feature vector of the first candidate matching node and the feature vector of the second candidate matching node. In some embodiments, the matching result of the candidate matching point pair may also be the product of the saliency of the first candidate matching node, the saliency of the second candidate matching node, and the similarity between the feature vector of the first candidate matching node and the feature vector of the second candidate matching node. Such a numerical matching result can also be called the matching score of the nodes.
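As a non-limiting illustration, the matching score formed as the product of the two saliencies and the feature similarity can be sketched as follows. The use of cosine similarity and the function names are assumptions made for the example; the text does not fix a particular similarity measure.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_match_score(feat_a, saliency_a, feat_b, saliency_b):
    """Matching score = saliency of A * saliency of B * feature similarity."""
    return saliency_a * saliency_b * cosine_similarity(feat_a, feat_b)

score = pair_match_score([1.0, 0.0], 0.5, [2.0, 0.0], 0.8)
print(score)
```

Because the saliencies multiply the similarity, a pair involving a low-saliency node contributes little to the graph-level result even when its features agree.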

In some embodiments, a first point pair matching result can be determined using the scalar type attributes of the nodes, and whether it is necessary to further use the vector type attributes of the nodes to determine a second point pair matching result is then determined based on the first point pair matching result. As shown in Figure 4, step 302, determining the matching result of the candidate matching point pair, may include: step 401, determining the first point pair matching result of the candidate matching point pair based on the scalar type attribute of the first candidate matching node included in the candidate matching point pair and the scalar type attribute of the second candidate matching node included in the candidate matching point pair; step 402, in response to determining that the first point pair matching result of the candidate matching point pair satisfies a first preset condition, determining the second point pair matching result of the candidate matching point pair based on the feature vector of the first candidate matching node included in the candidate matching point pair and the feature vector of the second candidate matching node included in the candidate matching point pair; and step 403, determining the matching result of the candidate matching point pair based on the second point pair matching result. In this way, on the one hand, prior knowledge can be used to judge the matching result based on the scalar type attributes; on the other hand, the amount of calculation can be reduced and the calculation speed of the matching result can be improved.

In step 401, for example, the consistency or correlation of the category attributes included in the scalar type attributes may be determined as the first point pair matching result, or the difference, ratio, or other calculation result of the numerical attributes included in the scalar type attributes may be determined as the first point pair matching result. The first point pair matching result can also be determined through other methods, which are not limited here.

In step 402, the first preset condition may correspond to the above-mentioned first point pair matching result; for example, it may be that the category attributes are consistent, or that the difference between the numerical attributes is less than a threshold. It can be understood that those skilled in the art can set the first preset condition according to needs, which is not limited here. The method of determining the second point pair matching result is similar to the method, described above, of determining the matching result of two nodes using their respective feature vectors, and will not be described again here.

In step 403, the second point pair matching result may be directly determined as the matching result of the candidate matching point pair, or the matching result of the candidate matching point pair may be determined based on the first point pair matching result and the second point pair matching result. In an example embodiment, the first point pair matching result is the ratio of the numerical attributes of the two nodes in the candidate matching point pair, and the second point pair matching result is the similarity of the feature vectors of the two nodes; a combined calculation result of the ratio and the similarity can then be determined as the matching result of the candidate matching point pair.
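As a non-limiting illustration, steps 401 to 403 can be sketched as a two-stage check in which inexpensive scalar attributes gate the more expensive feature comparison. The attribute names `category` and `size`, the ratio limit, and the way the ratio and the similarity are combined (similarity divided by ratio) are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_pair(node_a, node_b, max_size_ratio=2.0):
    # Step 401: first point pair matching result from scalar attributes
    # (category consistency and the ratio of a positive numerical attribute).
    if node_a["category"] != node_b["category"]:
        return 0.0
    ratio = max(node_a["size"], node_b["size"]) / min(node_a["size"], node_b["size"])
    # Step 402: compare feature vectors only if the first preset condition holds.
    if ratio > max_size_ratio:
        return 0.0
    similarity = cosine(node_a["feat"], node_b["feat"])
    # Step 403: combine both results (assumed combination rule).
    return similarity / ratio

a = {"category": "A", "size": 4.0, "feat": [1.0, 0.0]}
b = {"category": "A", "size": 2.0, "feat": [1.0, 0.0]}
c = {"category": "B", "size": 4.0, "feat": [1.0, 0.0]}
print(match_pair(a, b), match_pair(a, c))
```

Pairs rejected in step 401 never reach the feature comparison, which is where the reduction in calculation comes from.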

In some embodiments, after obtaining the candidate matching point pairs, the candidate matching point pairs can also be filtered using the scalar type attributes of the nodes, so that some unmatched point pairs can be filtered out to obtain a more accurate graph representation matching result and to reduce the amount of calculation required to compute the graph representation matching result.

Returning to Figure 3, in some embodiments, the attributes of the neighbor nodes and neighbor adjacent edges of the first candidate matching node and the attributes of the neighbor nodes and neighbor adjacent edges of the second candidate matching node may also be used to determine the matching result of the two candidate matching nodes. It can be understood that when the neighbor nodes of two nodes are relatively similar and the edges connecting the two nodes are relatively similar, the probability that the two nodes match is higher.

In step 303, the matching result of the graph representation of the first data and the second data at the scale may be, for example, the sum of the matching scores of all candidate matching point pairs. It will be appreciated that other means of determining the matching result of the graph representation may also be used. In one embodiment, a comparison result of a comprehensive matching score and a preset threshold may be determined as the final matching result. In one embodiment, each candidate matching point pair may have a weight, and the final matching result may be, for example, a weighted sum of matching scores of all candidate matching point pairs. In one embodiment, the matching result of the candidate matching point pair indicates whether the attributes of the candidate matching point pair are consistent, and then the matching result represented by the graph can be determined based on these binary judgment results.
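As a non-limiting illustration, the weighted-sum and threshold variants of step 303 can be sketched together as follows. The function name `graph_match_result`, the weights, and the threshold value are assumptions made for the example.

```python
def graph_match_result(pair_scores, weights=None, threshold=1.5):
    """Weighted sum of pair scores, plus its comparison with a threshold (illustrative)."""
    if weights is None:
        weights = [1.0] * len(pair_scores)  # unweighted sum by default
    total = sum(w * s for w, s in zip(weights, pair_scores))
    return total, total > threshold

total, is_match = graph_match_result([0.5, 0.25, 0.75], weights=[2.0, 1.0, 1.0])
print(total, is_match)
```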

When performing graph matching, the adjacent edges included in the graph representation can also be matched, and the matching result of the graph representation can be determined based on the matching results of the adjacent edges. In some embodiments, if there are only nodes in the graph representation, matching can be performed based on the nodes; if the graph representation includes both nodes and adjacent edges, both can be used for matching at the same time.

According to some embodiments, as shown in Figure 3, the graph matching process for each scale in the first scale and the second scale may further include:

Step 304: Determine a candidate matching edge pair according to at least one adjacent edge included in the graph representation of the scale of the first data and at least one adjacent edge included in the graph representation of the scale of the second data, wherein the candidate matching edge pair includes a first candidate matching adjacency edge belonging to the graph representation of the scale of the first data and a second candidate matching adjacency edge belonging to the graph representation of the scale of the second data;

Step 305: For the candidate matching edge pair, determine the matching result of the candidate matching edge pair based on the attributes of the first candidate matching adjacent edge included in the candidate matching edge pair and the attributes of the second candidate matching adjacent edge included in the candidate matching edge pair; and

Step 306: Based on the matching results of the candidate matching edge pairs, determine the matching results of the graph representation of the scale of the first data and the graph representation of the scale of the second data.

Therefore, by matching both the graph structure composed of nodes and adjacent edges in the graph representation and the attributes of the adjacent edges, the information contained in the data can be fully utilized for matching, and the accuracy of the matching results and of the results of subsequent tasks is improved.

In some embodiments, step 304 may be performed simultaneously with step 301. In other words, the method described above can be used to obtain candidate matching point pairs and candidate matching edge pairs at the same time. In some embodiments, candidate matching point pairs may be determined first, and candidate matching edge pairs may then be determined based on the adjacent edges between the nodes included in these candidate matching point pairs.

It can be understood that the method of determining the matching result of a candidate matching edge pair is similar to the method of determining the matching result of a candidate matching point pair, and that the method of determining the matching result of the graph representation based on the matching results of the candidate matching edge pairs is similar to the method of determining it based on the matching results of the candidate matching point pairs, so these will not be described in detail here.

In step 305, various methods may be used to determine the matching results of the two adjacent edges based on the attributes of the first candidate matching adjacent edge and the attributes of the second candidate matching adjacent edge.

In some embodiments, the matching result of the candidate matching edge pair may be, for example, the similarity between the feature vector of the first candidate matching adjacent edge and the feature vector of the second candidate matching adjacent edge. In some embodiments, the matching result of the candidate matching edge pair may be the product of the saliency of the first candidate matching adjacent edge, the saliency of the second candidate matching adjacent edge, and the similarity between the feature vector of the first candidate matching adjacent edge and the feature vector of the second candidate matching adjacent edge.

In some embodiments, a first edge pair matching result can first be determined using the scalar type attributes of the adjacent edges, and whether it is necessary to further use the vector type attributes of the adjacent edges to determine a second edge pair matching result is then determined based on the first edge pair matching result. As shown in Figure 5, step 305, determining the matching result of the candidate matching edge pair, may include: step 501, determining the first edge pair matching result of the candidate matching edge pair based on the scalar type attribute of the first candidate matching adjacent edge included in the candidate matching edge pair and the scalar type attribute of the second candidate matching adjacent edge included in the candidate matching edge pair; step 502, in response to determining that the first edge pair matching result of the candidate matching edge pair satisfies a second preset condition, determining the second edge pair matching result of the candidate matching edge pair based on the feature vector of the first candidate matching adjacent edge included in the candidate matching edge pair and the feature vector of the second candidate matching adjacent edge included in the candidate matching edge pair; and step 503, determining the matching result of the candidate matching edge pair based on the second edge pair matching result.

It can be understood that the operations on the candidate matching edge pairs in steps 501 to 503 are similar to the operations on the candidate matching point pairs in steps 401 to 403, respectively, and will not be described again here. Those skilled in the art can set the second preset condition according to their needs, which is not limited here.

Return to Figure 3. In some embodiments, the neighbor nodes of the first candidate matching adjacent edge and the neighbor nodes of the second candidate matching adjacent edge may also be used to determine the matching results of the two candidate matching adjacent edges.

In step 303, the matching result of the graph representation of the first data and the second data at this scale may be the sum of the matching scores of all candidate matching point pairs and/or the matching scores of all candidate matching edge pairs, or it may be obtained by other methods based on the matching results of the candidate matching point pairs and/or the matching results of the candidate matching edge pairs, which is not limited here.

In some embodiments, in addition to the matching score, the matching result may also be determined based on the pairing check results of the nodes/edges. For example, the node/edge pairing check includes a check that solves for the geometric relationship through projective transformation, etc. It can be understood that the matching of nodes in different graph representations can also correspond to a transformation relationship in geometric space between the data. Explicit transformations can include projective transformation in scene matching and isometric transformation in fingerprint matching. Implicit transformations can include changes of speaker and environment in speech-related tasks. The pairing check results of the nodes/edges can affect the matching results in two ways. First, in the process of node matching to obtain candidate matching point pairs, constraints can be added so that only the point pairs/edge pairs satisfying the constraints are taken as candidate matching point pairs/edge pairs, thereby bringing prior knowledge into the matching process and speeding it up. Second, after the initial graph matching result is obtained based on the matching results of the candidate matching point pairs/edge pairs, a check result can be determined based on the pairing check of the node/edge pairs, and the final graph matching result can be determined based on the initial graph matching result and the check result. For example, if the initial graph matching result shows a matching degree of 80% and the check result indicates a mismatch, the final graph matching result can be obtained by weighting, for example, 70%.
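As a non-limiting illustration, the second way of using a pairing check can be sketched as follows. For simplicity the check tests whether all matched node pairs agree on a common translation, which is a much simpler stand-in for the projective or isometric transformations named above. The function names, the tolerance, and the weighting scheme (chosen so that an 80% initial result with a failed check yields 70%) are assumptions made for the example.

```python
def translation_consistent(pairs, tol=0.5):
    """True if all matched position pairs imply roughly the same translation (illustrative)."""
    offsets = [(xb - xa, yb - ya) for (xa, ya), (xb, yb) in pairs]
    mean_dx = sum(dx for dx, _ in offsets) / len(offsets)
    mean_dy = sum(dy for _, dy in offsets) / len(offsets)
    return all(
        abs(dx - mean_dx) <= tol and abs(dy - mean_dy) <= tol for dx, dy in offsets
    )

def final_match(initial, check_passed, check_weight=0.125):
    """Weighted combination of the initial matching degree and the check result."""
    return (1 - check_weight) * initial + check_weight * (1.0 if check_passed else 0.0)

good = [((0, 0), (1, 1)), ((2, 0), (3, 1))]
bad = good + [((0, 2), (5, 9))]
print(final_match(0.8, translation_consistent(good)))
print(final_match(0.8, translation_consistent(bad)))
```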

In the process of graph matching for multi-scale graph representations, graph matching at each scale can be performed independently; alternatively, graph matching at a certain scale can be performed first, and the matching result at that scale is then used to determine whether to perform graph matching at other scales, or to adjust the graph matching strategies or graph matching parameters at other scales.

According to some embodiments, step 105, performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result, may include: in response to determining that the first matching result is a successful match, performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result. Thus, by first performing the first-scale graph matching, which reflects overall information (a smaller amount of information), and then deciding, based on the first-scale graph matching result, whether to perform the second-scale graph matching, which reflects local information (a larger amount of information), the number of second-scale graph matching operations can be reduced, thereby reducing the overall time consumed by the matching process and improving task processing efficiency without affecting the matching results and subsequent task processing results.

According to some embodiments, step 105, performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result, may include: in response to determining that the first matching result is a successful match, matching a first subgraph of the second scale of the first data with a second subgraph of the second scale of the second data. Here, the first matching result indicates that a first node in the graph representation of the first scale of the first data and a second node in the graph representation of the first scale of the second data are successfully matched; the first subgraph may include nodes having a subordinate relationship with the first node in the second-scale graph representation of the first data, and the second subgraph may include nodes having a subordinate relationship with the second node in the second-scale graph representation of the second data.

Therefore, by first performing graph matching at the first scale and then matching only the subgraphs of the nodes that the first-scale graph matching result indicates are successfully matched, there is no need to match parts of the graph representations that have a high probability of mismatch. This reduces the number of nodes and/or adjacent edges for which matching results need to be calculated, further reducing the overall time consumed by the matching process and improving task processing efficiency without affecting the matching results and subsequent task processing results.
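As a non-limiting illustration, the coarse-to-fine procedure can be sketched as follows: second-scale subgraphs are compared only for first-scale node pairs that already matched. The data layout, the function names, and the use of a Jaccard overlap of node labels as the subgraph comparison are assumptions made for the example; any subgraph matching method could be substituted.

```python
def jaccard(labels_a, labels_b):
    """Toy subgraph comparison: overlap of the node labels of two subgraphs."""
    set_a, set_b = set(labels_a), set(labels_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def coarse_to_fine(first_scale_matches, children_a, children_b, match_subgraph=jaccard):
    """Match second-scale subgraphs only under successfully matched first-scale pairs."""
    results = {}
    for (node_a, node_b), matched in first_scale_matches.items():
        if not matched:
            continue  # skip subgraphs that are unlikely to match
        # Subgraphs consist of the second-scale nodes subordinate to each first-scale node.
        results[(node_a, node_b)] = match_subgraph(children_a[node_a], children_b[node_b])
    return results

first_scale_matches = {("a1", "b1"): True, ("a2", "b2"): False}
children_a = {"a1": ["x", "y", "z"], "a2": ["p"]}
children_b = {"b1": ["x", "y", "w"], "b2": ["q"]}
second_scale = coarse_to_fine(first_scale_matches, children_a, children_b)
print(second_scale)
```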

According to some embodiments, step 105, performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result, may include: determining the matching result of a current node based on the attributes of the current node and on whether the node of the first scale with which the current node has a subordinate relationship is successfully matched, where the current node is a node of the second scale. By considering vertical relationships in the low-scale graph matching process and using the matching results of the high-scale nodes that have a subordinate relationship with a node as a reference, the accuracy of the low-scale graph matching results can be improved.
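
One possible way to combine a node's own attribute similarity with the matching status of its first-scale parent is sketched below. The additive bonus/penalty rule and the parameter names `threshold` and `parent_bonus` are hypothetical; the disclosure does not prescribe a particular formula.

```python
def fine_node_matches(attr_similarity, parent_matched, threshold=0.8, parent_bonus=0.1):
    """Decide whether a second-scale node matches.

    attr_similarity: similarity of the node's own attributes, in [0, 1].
    parent_matched:  whether the first-scale node it is subordinate to matched.
    The parent's result shifts the score up or down (an assumed rule).
    """
    score = attr_similarity + (parent_bonus if parent_matched else -parent_bonus)
    return score >= threshold


# A borderline node is accepted only when its parent also matched.
accepted = fine_node_matches(0.75, parent_matched=True)
rejected = fine_node_matches(0.75, parent_matched=False)
```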

In step 106, the method and logic for determining the multi-scale matching result based on the first matching result and the second matching result can be set as required. In some embodiments, when both the first matching result and the second matching result are successful matches, the multi-scale matching result is determined to be a successful match. In some embodiments, when the low-scale second matching result is a successful match, the multi-scale matching result is determined to be a successful match. In some embodiments, the first matching result and the second matching result may be, for example, the matching degrees of the graph representations at the first scale and the second scale, and the multi-scale matching result may be a value calculated from the matching degrees at the two scales, such as their average. It can be understood that the multi-scale matching result can also be determined in other ways, which are not limited here.
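
The three options mentioned for step 106 can be illustrated with a small helper. The function name, the strategy labels, and the 0.8 threshold are assumptions made for this sketch only.

```python
def multi_scale_result(first_degree, second_degree, strategy="both", threshold=0.8):
    """Combine first-scale and second-scale matching degrees (values in [0, 1]).

    "both":      success only if both scales reach the threshold.
    "fine_only": success if the low-scale (second) result reaches the threshold.
    "average":   return the mean matching degree instead of a yes/no decision.
    """
    if strategy == "both":
        return first_degree >= threshold and second_degree >= threshold
    if strategy == "fine_only":
        return second_degree >= threshold
    if strategy == "average":
        return (first_degree + second_degree) / 2
    raise ValueError(f"unknown strategy: {strategy}")


strict = multi_scale_result(0.9, 0.7)                # second scale is below 0.8
fine_only = multi_scale_result(0.6, 0.85, "fine_only")
averaged = multi_scale_result(0.9, 0.7, "average")
```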

According to some embodiments, the task is a matching task, and step 107, determining the task processing result based on the multi-scale matching result may include: using the multi-scale matching result as the result of the final task.

According to some embodiments, the second data may be obtained from a database. Step 107, determining the task processing result based on the multi-scale matching result, may include: determining at least one piece of second data that matches the first data based on the multi-scale matching results of the first data and multiple pieces of second data in the database; and determining the task processing result based on the at least one piece of second data. In this way, other types of tasks based on multi-scale graph representations can be converted into matching tasks on multi-scale graph representations. In an example embodiment, the final task may be a recognition task, a matching task, or a search task implemented by means of matching, in which case the matched at least one piece of second data may be used directly as the search result. In an example embodiment, the final task is a classification task, and the first data and the matched at least one piece of second data can all be input into a model for the classification task, so that the model can use the at least one piece of second data as a reference for classification to complete the classification of the first data. In an example embodiment, the final task may be a generation task (for example, filling in blanks in text or images), in which case the partially vacant first data and the matched at least one piece of second data may all be input into a model for the generation task, so that the model uses the at least one piece of second data as a reference for generation to complete the generation of the first data. In this way, the task can be completed with the help of data similar to the first data. Compared with inputting only the first data into the model, this provides the model with richer information, so that the results of classification and generation tasks can be obtained more accurately without increasing the complexity of the model.
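
As a rough sketch of the database step, the snippet below selects the pieces of second data whose multi-scale matching degree with the first data is highest. It assumes the matching degrees have already been computed; the names `retrieve`, `k`, and `min_score` are hypothetical.

```python
def retrieve(scores, k=2, min_score=0.5):
    """Return up to k database ids, best first, whose matching degree >= min_score.

    scores: dict mapping database id -> multi-scale matching degree with the
    first data (assumed to be precomputed by the graph matching steps).
    """
    hits = [(score, item_id) for item_id, score in scores.items() if score >= min_score]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [item_id for _, item_id in hits[:k]]


# Hypothetical matching degrees between the first data and four database entries.
matched_ids = retrieve({"x": 0.9, "y": 0.3, "z": 0.7, "w": 0.6})
```

For a search task the returned ids are the result themselves; for classification or generation, the corresponding data would be passed to the downstream model together with the first data.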

Multi-scale graph representation, graph matching, and task processing of different types of data will be described below with reference to embodiments.

In an example embodiment, both the first data and the second data may be image data, the dense data may be a feature map obtained based on the corresponding image data, and the multiple dense nodes in the dense data may be multiple pixels in the feature map. A plurality of nodes at the second scale may be obtained by sparsifying the first data (e.g., based on saliency). The attributes of these nodes may include the location of the node in the first data and the feature vector corresponding to the node (for example, the feature vector of the node is determined based on the node's neighborhood in the feature map, or based on the local image corresponding to the node in the first data; the feature vector of the node can be used to describe attributes of the neighborhood where the node is located, such as the direction field; if the first data is fingerprint data, the feature vector of the node can be used to describe the texture density of the neighborhood where the node is located, and so on). Similarly, nodes of the second scale of the second data can be obtained. Multiple nodes of the first scale can be obtained by merging the nodes of the second scale of the first data. The attributes of these first-scale nodes may also include the location of the node in the first data and the feature vector corresponding to the node. Similarly, the nodes of the first scale of the second data can be obtained. The graph representation of the second scale may also include adjacent edges that connect multiple second-scale nodes having a subordinate relationship with the same first-scale node, and there may also be subordinate edges between first-scale nodes and second-scale nodes that have a subordinate relationship. The attributes of adjacent edges and subordinate edges may include, for example, the length of the edge (for example, used to describe the force acting between the nodes it connects) and the relative position and angle between the edge and its corresponding nodes.
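
The construction described above can be sketched for a tiny feature map as follows. This is an illustrative assumption, not the disclosed extraction network: second-scale nodes are taken as the `top_k` most salient pixels, first-scale nodes are formed by merging second-scale nodes that fall in the same `cell` x `cell` block, and edge attributes are limited to length and angle.

```python
import math


def build_two_scale_graph(saliency, features, top_k, cell):
    """Build second-scale (fine) and first-scale (coarse) nodes from dense data."""
    height, width = len(saliency), len(saliency[0])
    # Sparsify: keep only the top_k most salient pixels as second-scale nodes.
    ranked = sorted(
        ((saliency[y][x], y, x) for y in range(height) for x in range(width)),
        reverse=True,
    )[:top_k]
    fine = [{"pos": (y, x), "feat": features[y][x]} for _, y, x in ranked]

    # Merge: second-scale nodes in the same block share one first-scale node.
    groups = {}
    for idx, node in enumerate(fine):
        y, x = node["pos"]
        groups.setdefault((y // cell, x // cell), []).append(idx)

    coarse, subordinate, adjacent = [], [], []
    for c_idx, members in enumerate(groups.values()):
        n = len(members)
        pos = tuple(sum(fine[i]["pos"][d] for i in members) / n for d in range(2))
        dim = len(fine[members[0]]["feat"])
        feat = [sum(fine[i]["feat"][d] for i in members) / n for d in range(dim)]
        coarse.append({"pos": pos, "feat": feat})
        # Subordinate edges: (first-scale index, second-scale index).
        subordinate += [(c_idx, i) for i in members]
        # Adjacent edges between second-scale nodes under the same first-scale node.
        for k, i in enumerate(members):
            for j in members[k + 1:]:
                dy = fine[j]["pos"][0] - fine[i]["pos"][0]
                dx = fine[j]["pos"][1] - fine[i]["pos"][1]
                adjacent.append(
                    {"nodes": (i, j), "length": math.hypot(dy, dx), "angle": math.atan2(dy, dx)}
                )
    return fine, coarse, adjacent, subordinate


# Hypothetical 4x4 saliency map; each pixel's feature is simply its coordinates.
saliency = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
features = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
fine, coarse, adjacent, subordinate = build_two_scale_graph(saliency, features, top_k=3, cell=2)
```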

The multi-scale graph structure of image data can extract the geometric information of the targets in the image (for example, the positional relationship of multiple targets in the image, or the positional relationship between different parts of the same target) while retaining rich detailed information (for example, the feature vectors of nodes). Graph representations of different scales take into account both the whole and the local parts, are more robust to defects, deformations, perspective changes, occlusions, attack samples, etc., and have stronger interpretability. By using such multi-scale graph representations for graph matching, and using graph matching to solve complex downstream tasks (such as image matching, image search, image classification, and image generation), more accurate and reliable results can be obtained. In addition, owing to the multi-scale characteristics, when performing tasks such as image data retrieval and comparison, preliminary screening can be carried out based on the high-scale graph representations, and the low-scale graph representations can then be used for accurate retrieval and comparison. At the same time, constraints based on prior knowledge (e.g., geometric constraints) can be applied to obtain accurate results.

In an example embodiment, both the first data and the second data may be text data, the dense data may be text paragraphs, and the nodes in the dense data may be the characters/words in the text paragraphs. It can be understood that the nodes in the dense data can also be the text features corresponding to these characters/words. By sparsifying the first data and the second data, nodes of the first scale and the second scale can be obtained. These nodes can correspond to text fragments of different scales in the text paragraph, such as sentences, clauses, phrases, and vocabulary. The attributes of these nodes may include, for example, the word embedding of the corresponding text fragment, and may also include its position in the text paragraph. For example, the adjacent edges between nodes can be used to reflect the relationships between different text fragments, and the subordinate edges can be used to reflect the subordinate relationships between text fragments of different scales.

The multi-scale graph representation of text data can extract the structural relationships and/or logical relationships between text fragments of different scales in the text paragraph, such as characters, words, phrases, clauses, sentences, and paragraphs, and can retain the text feature vectors corresponding to the elements in these relationships, which enables better processing of various natural language processing tasks. Graph representations of different scales take into account both the whole and the part, and are more robust to incomplete sentences, sentence deformations, different languages, etc. Downstream tasks can be text translation, text continuation, automatic question answering, etc.

In an example embodiment, both the first data and the second data may be audio data, in which case the dense data may be a spectrogram of the audio data, and the nodes in the dense data may be pixels in the spectrogram. The nodes at the first scale may be, for example, multiple segment regions obtained by segmenting the spectrogram in the time direction, and the nodes at the second scale may be, for example, feature points extracted from the spectrogram. There may be adjacent edges between the second-scale nodes for connecting adjacent feature points.

The multi-scale graph representation of audio data can extract multiple segments in the time direction, as well as multiple feature points in each segment and the correlations between these feature points (for example, time distance and frequency-domain distance), and can retain the feature vectors corresponding to these feature points. This makes it possible to handle the problems caused by the variability of voices, intonations, speaking styles, and content when completing audio-related tasks, especially speech-related tasks. Graph representations of different scales take into account both the whole and the part, and are more robust to incomplete speech, noise, etc. Downstream tasks can be speech translation, etc.

In some example embodiments, the first data and the second data can also be various types of complex data such as molecules, genes, proteins, and sequences, in which case the nodes in the dense data can be the smallest units of the corresponding data type, for example, atoms, base pairs, or amino acids. The nodes of the graph representation can be the same as those of the dense data, or they can be higher-scale units, such as atomic groups, functional groups, segments composed of multiple base pairs (for example, coding regions and non-coding regions, or, at lower scales, enhancers, promoters, exons, introns, terminators, etc.), amino acid sequences in proteins, peptide chains, etc. Adjacent edges and subordinate edges between nodes can be used to reflect various relationships between units of the same scale (for example, chemical bonds and hydrogen bonds) and various relationships between units of different scales (for example, subordinate relationships). In addition, the multi-scale graph representation can also reflect the structure of these data at different scales, such as the primary, secondary, tertiary, and quaternary structures of proteins. Downstream tasks can be property/structure prediction for molecular structure data and sequence data, etc.

The multi-scale graph representation of such complex data can represent its complex spatial structure and detailed information, and can reflect the different relationships between the various types of units in the complex data. Therefore, using the multi-scale graph representation makes it possible to fully exploit the above-mentioned information of the complex data for matching tasks or other downstream tasks.

In some embodiments, other types of data can be converted into image data first, and then a multi-scale graph representation is generated based on the converted image data. In an example, other types of data such as audio data and text data can be converted into image data, and then a multi-scale graph representation can be extracted based on the image data, and then various downstream tasks can be completed based on the graph representation.

In some embodiments, graph matching can also be performed between graph representations of different types of data to accomplish specific cross-modal tasks.

Figure 6 shows a flow chart of a neural network training method 600 according to an embodiment of the present disclosure. The method 600 includes:

Step 601: Obtain first sample data and second sample data. The first sample data and the second sample data are respectively one of image data, audio data, text data, molecular structure data and sequence data;

Step 602: Obtain multi-scale graph representations of the first sample data and the second sample data, where the multi-scale graph representations are determined using the graph representation extraction network, and each multi-scale graph representation includes a graph representation of the first scale and a graph representation of the second scale;

Step 603: Perform graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result that represents the matching degree of the first scale;

Step 604: Perform graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result that represents the matching degree of the second scale;

Step 605: Obtain the target matching results and/or target task processing results of the first sample data and the second sample data;

Step 606: Determine the loss value based on the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and

Step 607: Train the graph representation extraction network based on the loss value.

According to the method of this embodiment, by training the graph representation extraction network with the loss value determined from the graph matching result and the target matching result and/or the target task processing result, the graph representation extraction network can be used to obtain multi-scale graph representations that are accurate and suitable for downstream tasks, which helps downstream tasks obtain accurate task processing results.

It can be understood that the first sample data and the second sample data are similar to the first data and the second data described above. The operations in steps 601 to 604 of obtaining the first sample data and the second sample data, obtaining their multi-scale graph representations, and performing graph matching on the graph representations of different scales are similar to the operations of steps 101 to 105 in Figure 1 and will not be described again here.

According to some embodiments, the graph representation of each scale in the multi-scale graph representation may include at least one node, the node may include attributes, and the attributes of the node may include scalar-type attributes and vector-type attributes. The graph representation of at least one scale in the multi-scale graph representation may include at least one adjacent edge. Each of the at least one adjacent edge is used to characterize the relative relationship between two nodes of the same scale. The adjacent edge has attributes, and the attributes of the adjacent edge include scalar-type attributes and vector-type attributes.

According to some embodiments, the scalar-type attributes of a node may include the node's saliency, label, and other attributes, and the vector-type attributes of the node may include the feature vector of the node; the scalar-type attributes of an adjacent edge may include the saliency, label, and other attributes of the adjacent edge, and the vector-type attributes of the adjacent edge may include the feature vector of the adjacent edge.

The result of graph matching can be the similarity of two graph representations. The similarity of two graph representations can be the sum of the similarities of the individual nodes/edges, or the sum over the nodes/edges of saliency × similarity. In an example, the similarity of a node/edge can be determined according to its attributes. In this way, a supervision signal can be generated for each local feature (attribute of a node/edge), so that a given local feature can be trained individually.
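
The two similarity definitions above can be written down directly. In this sketch the per-node/edge similarity is taken to be the cosine similarity of feature vectors, and the correspondence between nodes/edges of the two graphs is assumed to be given; both are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def graph_similarity(pairs, weighted=True):
    """Sum of per-element similarities, optionally weighted by saliency.

    pairs: list of (saliency, feature_a, feature_b), one entry per corresponding
    node or edge of the two graph representations.
    """
    return sum((s if weighted else 1.0) * cosine(fa, fb) for s, fa, fb in pairs)


# Hypothetical correspondences: two identical elements and one orthogonal one.
pairs = [
    (0.5, [1.0, 0.0], [1.0, 0.0]),
    (2.0, [0.0, 1.0], [0.0, 1.0]),
    (1.0, [1.0, 0.0], [0.0, 1.0]),
]
plain_sum = graph_similarity(pairs, weighted=False)
weighted_sum = graph_similarity(pairs, weighted=True)
```

Because the total is a sum of per-element terms, each node or edge contributes its own term to any loss built on it, which is what allows a local feature to receive its own supervision signal.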

The target matching result can be a matching or non-matching result, or a result that represents the degree of matching (for example, a matching degree of 99%); the target task can be a matching task, a retrieval task, a classification task, a recognition task, a fill-in-the-blank task, or any of various other tasks related to data analysis and processing. When the target task is a matching task, the result of the target task is the target matching result.

In a specific implementation, the target matching result and/or the target task processing result may be determined based on the annotation of the sample data. For example, if the two sample data are marked as positive samples of each other, the target matching result is marked as "matching". For example, if the target task is a classification task of classifying sample images, the target task processing result can be marked as category "1". In this way, the matching results and/or the final task results can be annotated without the need to annotate the specific graph representation extracted by the graph representation extraction network.

According to some embodiments, the target matching result and/or the target task processing result may be determined according to one of the following: manual annotation, a teacher model and/or a pre-trained model, auxiliary constraint information, or rules.

Specifically, the target matching results and/or target task processing results can be manually annotated. It can be understood that manual annotation can be performed at the data level rather than the scale level. For example, it can be annotated whether the first data and the second data match, without annotating whether a certain scale of the first data matches a certain scale of the second data. In fact, if it is known whether the first data and the second data match, it is also known whether the various scales match. In this way, scale-level labels can be obtained from data-level labels, which greatly increases the number of supervision signals.

In another specific implementation, the target matching result and/or the target task processing result may be determined according to the teacher model and/or the pre-trained model. The teacher model and the pre-trained model can be models with a certain reasoning capability that have previously been trained using a large amount of data. Such models can also be used to perform knowledge distillation to train the graph representation extraction network. For example, the teacher model/pre-trained model is used to extract multi-scale graph representations of the first data and the second data, the matching results and/or task processing results are judged based on the multi-scale graph representations, and the target matching result and/or the target task processing result is determined from these matching results and/or task processing results (for example, by selecting the matching results or task processing results with high confidence as the target matching result and/or the target task processing result).

In another specific implementation, the target matching result and/or the target task processing result may be determined based on rules. The rules in the rule-based approach can be determined based on prior knowledge. For example, multi-scale graph representations of the first data and the second data are extracted based on a specific rule, the matching result and/or the task processing result is judged based on the multi-scale graph representations, and the matching result and/or the task processing result is used as the target matching result and/or the target task processing result.

It can be understood that other methods can also be used to obtain the target matching results and/or the target task processing results, which are not limited here.

According to some embodiments, the target matching result may be determined using the network that has undergone the Nth round of training. As shown in Figure 7, step 601, obtaining the first sample data and the second sample data, may include:

Step 701: Use the network that has undergone the Nth round of training to extract the respective multi-scale graph representations of the first unlabeled data and the second unlabeled data;

Step 702: Perform graph matching on the graph representation of the first scale of the first unlabeled data and the graph representation of the first scale of the second unlabeled data to obtain a first unlabeled data matching result that represents the matching degree of the first scale;

Step 703: Perform graph matching on the graph representation of the second scale of the first unlabeled data and the graph representation of the second scale of the second unlabeled data to obtain a second unlabeled data matching result that represents the matching degree of the second scale;

Step 704: Determine the unlabeled data matching result based on the first unlabeled data matching result and/or the second unlabeled data matching result;

Step 705: In response to determining that the first unlabeled data and the second unlabeled data satisfy a first condition, determine the first unlabeled data and the second unlabeled data as first sample data and second sample data that are positive samples of each other, wherein the first unlabeled data and the second unlabeled data satisfying the first condition includes the unlabeled data matching result satisfying a first matching condition, and the target matching result of the positive samples indicates that the corresponding first sample data and second sample data match; and/or

Step 706: In response to determining that the first unlabeled data and the second unlabeled data satisfy a second condition, determine the first unlabeled data and the second unlabeled data as first sample data and second sample data that are negative samples of each other, wherein the first unlabeled data and the second unlabeled data satisfying the second condition includes the unlabeled data matching result satisfying a second matching condition, and the target matching result of the negative samples indicates that the corresponding first sample data and second sample data do not match.

The unlabeled data matching results can be floating point numbers or integers. For example, the first unlabeled data matching result is a similarity, which is a floating point number, and the second unlabeled data matching result is the number of nodes/edges matched, which is an integer.

The first condition, the second condition, the first matching condition, and the second matching condition can be set by the user. For example, the first matching condition can be that the first unlabeled data matching result is greater than 80% and the second unlabeled data matching result is greater than 5 nodes/edges. It can be understood that the more stringent the first matching condition and the second matching condition are, the more reliable the target matching results corresponding to the positive samples/negative samples generated from the unlabeled data will be.

In addition to the matching conditions, the first condition and the second condition may also include auxiliary conditions when determining the first unlabeled data and the second unlabeled data as sample data. Auxiliary conditions can be time and place conditions, secondary confirmation by an expert, etc. For example, when the first unlabeled data and the second unlabeled data are image data, the time and place at which they were captured can be used as an auxiliary condition to determine whether they are positive samples/negative samples. For example, if the similarity between two images is high, the number of matching nodes/edges is large, and the capture times and locations are close, then the probability that the two images contain the same object is greater, and the probability that the two images are positive samples of each other is greater.
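
A minimal sketch of how unlabeled pairs might be turned into positive/negative samples is given below. The positive thresholds (similarity above 80%, more than 5 matched nodes/edges) follow the example in the text; the negative thresholds, the `auxiliary_ok` flag, and the function name are assumptions for illustration.

```python
def pseudo_label(similarity, matched_count, auxiliary_ok=True,
                 pos_sim=0.8, pos_count=5, neg_sim=0.2, neg_count=0):
    """Label a pair of unlabeled data as "positive", "negative", or None.

    similarity:    first-scale matching result (float in [0, 1]).
    matched_count: second-scale matching result (number of matched nodes/edges).
    auxiliary_ok:  outcome of an auxiliary check, e.g. close capture time/place.
    Pairs that satisfy neither condition are left unlabeled (None).
    """
    if similarity > pos_sim and matched_count > pos_count and auxiliary_ok:
        return "positive"
    if similarity < neg_sim and matched_count <= neg_count:
        return "negative"
    return None


labels = [
    pseudo_label(0.9, 8),                      # clearly matching
    pseudo_label(0.9, 8, auxiliary_ok=False),  # fails the auxiliary check
    pseudo_label(0.1, 0),                      # clearly not matching
    pseudo_label(0.5, 3),                      # ambiguous, not used as a sample
]
```

Stricter thresholds leave more pairs unlabeled but make the generated labels more reliable, which is the trade-off noted above.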

Therefore, the network that has undergone the Nth round of training is used to generate positive samples and/or negative samples in the above way, so that the graph representation extraction network can use these samples for the (N+1)th round of training. Only a small amount of sample data needs to be labeled to obtain the model after the Nth round of training, and that model can then be used to obtain more sample data for further training, which greatly reduces the amount of annotation required during model training. Moreover, when positive samples and negative samples are generated at the same time, they can be used for contrastive learning, so that the network acquires the ability to extract accurate graph representations while the cost of obtaining samples is reduced.

In an example embodiment, if, in the multi-scale graph representation generated by the graph representation extraction network, the features on a certain node are worse than other features (insufficiently robust), then in the graph matching of positive samples the matching error will come mainly from the features of this node, and the supervision signal will focus on the features of this node during training, thereby enhancing the robustness of this feature.

It can be understood that, in steps 701 to 703 in Figure 7, the operations of extracting the respective multi-scale graph representations of the first unlabeled data and the second unlabeled data and performing graph matching on the graph representations of different scales of the first unlabeled data and the second unlabeled data are similar to the operations of steps 101 to 105 in Figure 1 and will not be described again here.

In step 704, the unlabeled data matching result may be determined based on one or both of the first unlabeled data matching result and the second unlabeled data matching result. In an embodiment with relatively strict requirements on data quality, in response to determining that both the first unlabeled data matching result and the second unlabeled data matching result indicate a successful match, the unlabeled data matching result is determined to be a match. In some embodiments, when the first scale and the second scale meet specific matching conditions, the unlabeled data matching result is determined to be a match. For example, if the first-scale similarity is greater than 80% and 5 nodes are matched at the second scale, the unlabeled data matching result is determined to be a match. In some cases, cross-validation can be performed between different scales to generate more supervision signals. In some embodiments, since the matching results of lower-scale graph representations involve more detailed features and are more credible than macroscopic features, the unlabeled data matching result can be determined to be a match when the matching result of the lower-scale graph representation indicates a successful match. In some embodiments, when the multi-scale graph representation includes three or more scales, the unlabeled data matching result may be determined to be a match when the matching results of the highest-scale and lowest-scale graph representations indicate a successful match. In an embodiment with more relaxed requirements on data quality, the unlabeled data matching result may be determined to be a match when the matching result of the higher-scale graph representation indicates a successful match.

It is understood that cross-validation between different scales can also be performed in other ways to generate supervision signals, which is not limited here.

It can be understood that "the Nth round of training" means that the network has undergone at least one round of training and thus has a certain inference ability, but it is not intended to limit the specific number of training rounds of the network.

According to some embodiments, the loss value may include a matching loss value and/or a task loss value. As shown in Figure 8, step 606, determining the loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result, may include: step 801, determining the matching loss value according to the first current matching result and/or the second current matching result; and/or step 802, determining the current task result according to the first current matching result and/or the second current matching result, and determining the task loss value according to the target task processing result and the current task result.

In some embodiments, the target matching result at one or more scales, or the target matching result between the multi-scale graph representations, can be obtained directly; the corresponding matching loss value can then be determined from the target matching result and the first current matching result and/or the second current matching result, thereby generating a supervision signal to train the network.

In some embodiments, for example in a fill-in-the-blank task, if the corresponding target task processing result can be obtained, the current task processing result can be determined according to the first current matching result and/or the second current matching result, and the corresponding task loss value can then be determined according to the target task processing result and the current task processing result, thereby generating the corresponding supervision signal to train the network. For example, if there are a plurality of pieces of second data, the current matching result between the first data and each piece of second data is determined according to the first current matching result and/or the second current matching result, and the first data, together with those pieces of second data whose current matching result with the first data is a match, are input into the fill-in-the-blank network to obtain the task processing result. In this case the target matching result may not be known, but the target task processing result is known, and the supervision signal can be determined based on the target task processing result.

According to some embodiments, step 801, determining the matching loss value according to the first current matching result and/or the second current matching result, may include: determining the current matching result according to the first current matching result and/or the second current matching result; and determining the matching loss value based on the current matching result and the target matching result.

In some embodiments, if the target matching result between the multi-scale graph representations can be obtained directly, the current matching result can be determined based on the first current matching result and/or the second current matching result, and the corresponding matching loss value can then be determined based on the target matching result and the current matching result, thereby generating the corresponding supervision signal to train the network.

According to some embodiments, the graph representation extraction network may include a first network for extracting a graph representation at a first scale. In some embodiments, step 606, determining the matching loss value according to the first current matching result and/or the second current matching result, may include: determining a first-scale matching loss value according to the target matching result and the first current matching result. Step 607, training the graph representation extraction network according to the loss value, may include: training the first network according to the first-scale matching loss value.

According to some embodiments, the graph representation extraction network may include a second network for extracting a graph representation at a second scale. In some embodiments, step 606, determining the matching loss value according to the first current matching result and/or the second current matching result, may include: determining a second-scale matching loss value according to the target matching result and the second current matching result. Step 607, training the graph representation extraction network according to the loss value, may include: training the second network according to the second-scale matching loss value. In this way, the loss values of the first scale and the second scale can be calculated separately, and the corresponding networks can be trained separately.
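As a minimal sketch of computing the two scale-specific loss values separately, the following assumes that each current matching result is a score in [0, 1] per sample pair and that the target matching result is a 0/1 label. The binary cross-entropy form, the function name `matching_loss`, and the sample values are illustrative assumptions; the disclosure does not prescribe a particular loss function.

```python
import math

def matching_loss(current, target, eps=1e-7):
    """Mean binary cross-entropy between current scores and 0/1 targets (assumed loss form)."""
    total = 0.0
    for p, t in zip(current, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(current)

# Hypothetical current matching results for three sample pairs at each scale.
first_current = [0.9, 0.2, 0.7]
second_current = [0.8, 0.4, 0.6]
# Hypothetical target matching results (1 = should match, 0 = should not match).
target = [1, 0, 1]

# Each loss value would be used to train only the network of its own scale.
first_scale_loss = matching_loss(first_current, target)
second_scale_loss = matching_loss(second_current, target)
print(round(first_scale_loss, 4), round(second_scale_loss, 4))
```

Keeping the two loss values separate means that a poor result at one scale does not directly change the parameters of the network for the other scale.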

According to some embodiments, the graph representation extraction network may include at least one of the following: a network module for determining a scalar-type attribute of a node; a network module for determining a vector-type attribute of a node; a network module for determining a scalar-type attribute of an adjacent edge; and a network module for determining a vector-type attribute of an adjacent edge. It can be understood that the graph representation extraction network may also include a feature extraction network module that obtains dense data from raw data. The loss value can act on the differentiable parts corresponding to these network modules, thereby achieving the training of these network modules.

According to some embodiments, the aforementioned sparsification module for sparsifying dense data to obtain sparse nodes and the aforementioned merging module for merging low-scale nodes obtained by sparsification to obtain high-scale nodes are also implemented through neural networks. Correspondingly, the graph representation extraction network includes at least one of the following: a sparsification module for sparsifying dense data to obtain sparse nodes; and a merging module for merging the low-scale nodes obtained by sparsification to obtain high-scale nodes.

According to some embodiments, nodes are obtained by the sparsification module and connected to form adjacent edges; the adjacent edges whose saliency is greater than a threshold are determined as retained adjacent edges by the network module used to determine the saliency attribute of adjacent edges; and the vector-type attributes of the nodes/edges are extracted by the modules used to determine the vector-type attributes of nodes/edges.
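The node and edge selection described here can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the function names, the thresholds, and in particular `toy_edge_saliency` (a hand-written stand-in for the network module that scores adjacent edges) are assumptions, not part of the disclosure.

```python
from itertools import combinations

def sparsify(dense_nodes, node_threshold):
    """Keep the dense nodes whose saliency exceeds the threshold (assumed condition)."""
    return [n for n in dense_nodes if n["saliency"] > node_threshold]

def build_edges(nodes, edge_saliency, edge_threshold):
    """Connect every pair of nodes, then retain edges whose saliency exceeds the threshold."""
    edges = []
    for a, b in combinations(nodes, 2):
        s = edge_saliency(a, b)
        if s > edge_threshold:
            edges.append({"nodes": (a["id"], b["id"]), "saliency": s})
    return edges

def toy_edge_saliency(a, b):
    """Stand-in for a learned module: close, salient node pairs score higher."""
    dx = a["pos"][0] - b["pos"][0]
    dy = a["pos"][1] - b["pos"][1]
    dist = (dx * dx + dy * dy) ** 0.5
    return min(a["saliency"], b["saliency"]) / (1.0 + dist)

# Hypothetical dense nodes with positions and saliency values.
dense = [
    {"id": 0, "pos": (0.0, 0.0), "saliency": 0.9},
    {"id": 1, "pos": (1.0, 0.0), "saliency": 0.8},
    {"id": 2, "pos": (5.0, 5.0), "saliency": 0.7},
    {"id": 3, "pos": (0.5, 0.5), "saliency": 0.1},
]
nodes = sparsify(dense, node_threshold=0.5)
edges = build_edges(nodes, toy_edge_saliency, edge_threshold=0.3)
print([n["id"] for n in nodes], [e["nodes"] for e in edges])
```

In this toy input, node 3 is dropped by sparsification and only the edge between the two nearby nodes 0 and 1 is retained; attribute extraction for the surviving nodes and edges would follow as a separate step.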

According to some embodiments, both nodes and edges have determination modules and attribute extraction modules. A determination module is used to determine nodes/edges. The node determination module may include a sparsification module (for example, a detection module or a saliency module) or a merging module. The edge determination module may include a saliency module. The attribute extraction module may be a module for determining attributes other than saliency. According to some embodiments, these modules are network modules.

In one embodiment, the current matching result can be obtained based on the matching degree of the two graph representations. The matching degree of the graph representations can be expressed as the sum of the matching degrees of all candidate matching point pairs and the matching degrees of all candidate matching adjacent edge pairs. The matching degree of a candidate matching point pair is the product of the saliency of the first candidate matching point, the saliency of the second candidate matching point, and the similarity between the feature vector of the first candidate matching point and the feature vector of the second candidate matching point. The matching degree of a candidate matching adjacent edge pair is the product of the saliency of the first candidate matching adjacent edge, the saliency of the second candidate matching adjacent edge, and the similarity between the feature vector of the first candidate matching adjacent edge and the feature vector of the second candidate matching adjacent edge. In this way, unmatched nodes/edges are weakened, so that stable and reliable local features can be retained at different scales.
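The matching degree described above can be sketched in a few lines. Cosine similarity is used here as one possible similarity measure; that choice, the dictionary layout, and the function names are assumptions made for illustration, since the text only requires some similarity between feature vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two vectors (assumed similarity measure); 0.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def pair_degree(x, y):
    """saliency(x) * saliency(y) * similarity(feature(x), feature(y))."""
    return x["saliency"] * y["saliency"] * cosine_similarity(x["feature"], y["feature"])

def graph_matching_degree(node_pairs, edge_pairs):
    """Sum of degrees over candidate point pairs plus candidate adjacent-edge pairs."""
    node_part = sum(pair_degree(a, b) for a, b in node_pairs)
    edge_part = sum(pair_degree(a, b) for a, b in edge_pairs)
    return node_part + edge_part

# Hypothetical candidate pairs: each element has a saliency and a feature vector.
node_pairs = [
    ({"saliency": 1.0, "feature": [1.0, 0.0]}, {"saliency": 0.5, "feature": [1.0, 0.0]}),
    ({"saliency": 0.1, "feature": [1.0, 0.0]}, {"saliency": 0.1, "feature": [0.0, 1.0]}),
]
edge_pairs = [
    ({"saliency": 0.5, "feature": [0.0, 2.0]}, {"saliency": 0.5, "feature": [0.0, 1.0]}),
]
degree = graph_matching_degree(node_pairs, edge_pairs)
print(degree)
```

Because each term is multiplied by both saliency values, a pair in which either element has low saliency contributes little to the sum, which is the weakening effect on unmatched nodes/edges that the paragraph describes.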

According to some embodiments, the graph representation extraction network may include a rule module and a network module. The rule module may be, for example, a rule-based module that utilizes prior knowledge. Such modules can be used without training, but compared with trained network modules they are less accurate, less robust, and more limited, and they are usually difficult to train or optimize. A trained network module, by contrast, can output accurate results, adapts to a wider range of inputs, and is robust, but it is difficult to make converge quickly when the training task is hard.

According to some embodiments, as shown in Figure 9, the training method 900 further includes at least one of the following steps: step 901, in response to determining that the fifth preset condition is met, replacing the first rule module in the rule module with a network module; and step 902, in response to determining that the sixth preset condition is met, adding a network module to the graph representation extraction network. The operations of steps 903 to 909 in Figure 9 are similar to the operations of steps 601 to 607 in Figure 6, and are not described again here. Step 909, training the graph representation extraction network according to the loss value, may include: training the network module according to the loss value.

In some embodiments, in the initial stage of training, rule modules can be used for some parts of the graph representation extraction network and network modules can be used for the other parts, and these network modules are trained. After these network modules converge, more network modules can be added, or a rule module can be replaced with a network module, and training continues to improve the performance of the network. In this way, prior knowledge can be fully utilized, and the speed and effect of network training can also be improved.

In some embodiments, the fifth preset condition and the sixth preset condition may be, for example, a specific number of training rounds, the current matching accuracy of the network, or other preset conditions such as convergence speed or trend. It can be understood that those skilled in the art can determine the fifth preset condition and the sixth preset condition by themselves according to needs, which are not limited here.
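The staged replacement of rule modules can be sketched as a small scheduling function. Everything concrete here is an assumption for illustration: the module names, the use of a training-round count as the fifth preset condition, the use of matching accuracy as the sixth preset condition, and the threshold values.

```python
def update_modules(modules, epoch, accuracy, replace_after_epoch=10, add_at_accuracy=0.9):
    """Return an updated copy of the module list for the next training stage."""
    updated = [dict(m) for m in modules]  # copy so the caller's list is not modified
    # Assumed fifth preset condition: enough training rounds have elapsed.
    if epoch >= replace_after_epoch:
        for m in updated:
            if m["kind"] == "rule":
                m["kind"] = "network"  # replace one rule module (here, the first found)
                break
    # Assumed sixth preset condition: the current matching accuracy is high enough.
    has_extra = any(m["name"] == "edge_vector" for m in updated)
    if accuracy >= add_at_accuracy and not has_extra:
        updated.append({"name": "edge_vector", "kind": "network"})
    return updated

# Hypothetical initial configuration: one rule module and one network module.
modules = [
    {"name": "node_saliency", "kind": "rule"},
    {"name": "node_vector", "kind": "network"},
]
early = update_modules(modules, epoch=3, accuracy=0.5)
late = update_modules(modules, epoch=12, accuracy=0.95)
print([m["kind"] for m in early], [m["name"] for m in late])
```

In a training loop, a function of this kind would be called between rounds, and only the modules whose kind is "network" would receive updates from the loss value.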

Figure 10 shows a structural block diagram of a task processing device 1000 according to an embodiment of the present disclosure. The device 1000 includes: a first acquisition unit 1010 configured to acquire first data and second data, the first data and the second data each being one of image data, audio data, text data, molecular structure data, and sequence data; a second acquisition unit 1020 configured to acquire a graph representation of a first scale of each of the first data and the second data, the graph representation of the first scale including at least one node of the first scale, wherein the node of the first scale has attributes, and the attributes of the node of the first scale include an attribute of vector type; a third acquisition unit 1030 configured to acquire a graph representation of a second scale of each of the first data and the second data, the second scale being lower than the first scale, the graph representation of the second scale including at least one node of the second scale, wherein the node of the second scale has attributes, and the attributes of the node of the second scale include an attribute of vector type, wherein the nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to that data, the graph representation of at least one scale of each data includes at least one adjacent edge, each of the at least one adjacent edge is used to characterize the relative relationship between two nodes of the same scale, and the adjacent edge has attributes; a first graph matching unit 1040 configured to perform graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result; a second graph matching unit 1050 configured to perform graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result; a first determination unit 1060 configured to determine a multi-scale matching result; and a second determination unit 1070 configured to determine the task processing result based on the multi-scale matching result.

It can be understood that the operations of units 1010 to 1070 in the device 1000 are similar to the operations of steps 101 to 107 in the method 100, and will not be described again.

Figure 11 shows a structural block diagram of a neural network training device 1100 according to an embodiment of the present disclosure. The device 1100 includes: a fourth acquisition unit 1110 configured to acquire first sample data and second sample data, the first sample data and the second sample data each being one of image data, audio data, text data, molecular structure data, and sequence data; a fifth acquisition unit 1120 configured to acquire a multi-scale graph representation of each of the first sample data and the second sample data, wherein the multi-scale graph representation is determined using the graph representation extraction network, and the multi-scale graph representation includes a graph representation of the first scale and a graph representation of the second scale; a third graph matching unit 1130 configured to perform graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result that represents the matching degree of the first scale; a fourth graph matching unit 1140 configured to perform graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result that represents the matching degree of the second scale; a seventh acquisition unit 1150 configured to acquire the target matching result and/or the target task processing result of the first sample data and the second sample data; a third determination unit 1160 configured to determine the loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and a training unit 1170 configured to train the graph representation extraction network according to the loss value.

It can be understood that the operations of units 1110 to 1170 in the device 1100 are similar to the operations of steps 601 to 607 in the method 600, and will not be described again here.

According to embodiments of the present disclosure, an electronic device, a readable storage medium, and a computer program product are also provided.

Illustrative examples of such electronic devices, non-transitory computer-readable storage media, and computer program products are described below in conjunction with FIG. 12 .

Figure 12 illustrates an example configuration of an electronic device 1200 that may be used to implement the methods described herein. Each of the above-described apparatus 1000 and apparatus 1100 may also be fully or at least partially implemented by an electronic device 1200 or similar device or system.

Electronic device 1200 may be a variety of different types of devices. Examples of electronic device 1200 include, but are not limited to: desktop computers, server computers, laptop or netbook computers, mobile devices (e.g., tablet computers, cellular or other wireless phones (e.g., smartphones), notepad computers, mobile stations), wearable devices (e.g., glasses, watches), entertainment devices (e.g., entertainment appliances, set-top boxes communicatively coupled to display devices, game consoles), televisions or other display devices, automotive computers, and the like.

Electronic device 1200 may include at least one processor 1202, memory 1204, communication interface(s) 1206, display device 1208, other input/output (I/O) devices 1210, and one or more mass storage devices 1212, which are capable of communicating with each other, such as through system bus 1214 or other suitable connections.

Processor 1202 may be a single processing unit or multiple processing units, and each processing unit may include a single computing unit or multiple computing units or cores. Processor 1202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any device that manipulates signals based on operating instructions. Among other capabilities, processor 1202 may be configured to retrieve and execute computer-readable instructions stored in memory 1204, mass storage device 1212, or other computer-readable media, such as program code of operating system 1216, program code of applications 1218, program code of other programs 1220, etc.

Memory 1204 and mass storage device 1212 are examples of computer-readable storage media for storing instructions executed by processor 1202 to implement the various functions previously described. For example, memory 1204 may generally include both volatile memory and non-volatile memory (e.g., RAM, ROM, etc.). Additionally, mass storage device 1212 may generally include hard drives, solid state drives, removable media including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network attached storage, storage area networks, etc. Memory 1204 and mass storage device 1212 may both be collectively referred to herein as memory or computer-readable storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code, which computer program code may be executed by processor 1202 as a particular machine configured to perform the operations and functions described in the examples herein.

Multiple programs may be stored on the mass storage device 1212. These programs include an operating system 1216, one or more applications 1218, other programs 1220, and program data 1222, and they may be loaded into memory 1204 for execution. Examples of such applications or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: method 100, method 600, and/or method 900 (including any suitable step of method 100, method 600, and method 900), and/or additional embodiments described herein.

Although illustrated in FIG. 12 as being stored in memory 1204 of electronic device 1200 , modules 1216 , 1218 , 1220 , and 1222 , or portions thereof, may be implemented using any form of computer-readable media accessible by electronic device 1200 . As used herein, "computer-readable media" includes at least two types of computer-readable media, namely, computer-readable storage media and communication media.

Computer-readable storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage devices, magnetic cassettes, tapes, disk storage devices or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by an electronic device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism. Computer-readable storage media, as defined herein, do not include communication media.

One or more communication interfaces 1206 are used to exchange data with other devices, such as over a network, direct connection, etc. Such a communication interface may be one or more of the following: any type of network interface (e.g., a network interface card (NIC)), a wired or wireless interface (such as an IEEE 802.11 wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth™ interface, a Near Field Communication (NFC) interface, etc. Communication interface 1206 can facilitate communications within a variety of network and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so on. Communication interface 1206 may also provide communication with external storage devices (not shown) such as in a storage array, network attached storage, storage area network, and the like.

In some examples, a display device 1208, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 1210 may be devices that receive various inputs from the user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so on.

The techniques described herein may be supported by these various configurations of electronic device 1200 and are not limited to specific examples of the techniques described herein. For example, this functionality can also be implemented in whole or in part on the "cloud" through the use of distributed systems. A cloud includes and/or represents a platform for resources. The platform abstracts the underlying functionality of the cloud's hardware (e.g., servers) and software resources. Resources may include applications and/or data that may be used while performing computing processing on a server remote from electronic device 1200 . Resources may also include services provided over the Internet and/or through subscriber networks such as cellular or Wi-Fi networks. The platform can abstract resources and functionality to connect electronic device 1200 with other electronic devices. Therefore, implementation of the functionality described in this article can be distributed throughout the cloud. For example, functionality may be implemented partly on the electronic device 1200 and partly through a platform that abstracts the functionality of the cloud.

While the present disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and exemplary rather than restrictive; the disclosure is not limited to the disclosed embodiments. By studying the drawings, the disclosure, and the appended claims, those skilled in the art will be able to understand and implement variations to the disclosed embodiments in practicing the claimed subject matter. In the claims, the word "comprising" does not exclude other elements or steps not listed, the indefinite article "a" or "an" does not exclude a plurality, the term "plurality" means two or more, and the term "based on" shall be construed to mean "based at least in part on." The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

A task processing method including:

Obtain first data and second data, where the first data and the second data are respectively one of image data, audio data, text data, molecular structure data and sequence data;

Obtaining a graph representation of a first scale of each of the first data and the second data, the graph representation of the first scale including at least one node of the first scale, wherein the node of the first scale has an attribute, The attributes of the nodes of the first scale include attributes of vector type;

Obtaining a graph representation of a second scale of each of the first data and the second data, the second scale being lower than the first scale, the graph representation of the second scale including at least one node of the second scale , wherein the nodes of the second scale have attributes, and the attributes of the nodes of the second scale include vector type attributes,

Wherein, the nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to the data, and the at least one scale of each of the data The graph representation includes at least one adjacent edge, each of the at least one adjacent edge is used to characterize the relative relationship between two nodes of the same scale, and the adjacent edge has an attribute;

Perform graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result;

Perform graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result;

determining a multi-scale matching result based on the first matching result and the second matching result; and

Based on the multi-scale matching results, task processing results are determined.
The method of claim 1, wherein the multi-scale graph representation of each data includes the graph representation of the first scale and the graph representation of the second scale, the multi-scale graph representation of the data including at least one subordinate edge, each subordinate edge of the at least one subordinate edge being used to represent a subordinate relationship between two nodes at different scales, and the subordinate edge having an attribute.
The method of claim 2, wherein the attributes of the subordinate edge are determined based on attributes of the two nodes connected by the subordinate edge.
The method according to any one of claims 1-3, wherein at least one of the graph representation of the first scale and the graph representation of the second scale of each data satisfies at least one of the following:

The attributes of nodes of this scale include attributes of scalar type;

The properties of the adjacent edges of this scale include properties of scalar type;

The attributes of the subordinate edges of this scale include attributes of scalar type;

The attributes of the adjacent edges of the scale include attributes of vector type; and

The attributes of the subordinate edges of this scale include attributes of vector type.
The method according to any one of claims 1-4, wherein the graph representation of the first scale of each of the first data and the second data is generated using a first network, and/or the graph representation of the second scale of each of the first data and the second data is generated using a second network.
The method according to any one of claims 1-5, wherein the vector type attribute of the node includes a feature vector, and wherein the graph matching process of each scale includes:

A candidate matching point pair is determined based on at least one node included in the graph representation of the scale of the first data and at least one node included in the graph representation of the scale of the second data, wherein the candidate matching point pair comprising a first candidate matching node belonging to the graph representation of the scale of the first data and a second candidate matching node belonging to the graph representation of the scale of the second data;

For the candidate matching point pair, determining the matching result of the candidate matching point pair based on the feature vector of the first candidate matching node included in the candidate matching point pair and the feature vector of the second candidate matching node included in the candidate matching point pair;

Based on the matching result of the candidate matching point pair, determine the matching result of the graph representation of the scale of the first data and the graph representation of the scale of the second data;

and / or,

A candidate matching edge pair is determined based on at least one adjacent edge included in the graph representation of the scale of the first data and at least one adjacent edge included in the graph representation of the scale of the second data, wherein the candidate matching edge pair An edge pair includes a first candidate matching adjacency edge belonging to the graph representation of the scale of the first data and a second candidate matching adjacency edge belonging to the graph representation of the scale of the second data;

For the candidate matching edge pair, based on the attributes of the first candidate matching adjacent edge included in the candidate matching edge pair and the attribute of the second candidate matching adjacent edge included in the candidate matching edge pair to determine the matching result of the candidate matching edge pair; and

Based on the matching result of the candidate matching edge pair, a matching result of the graph representation of the scale of the first data and the graph representation of the scale of the second data is determined.
The method according to claim 6, wherein the attributes of the node further include attributes of a scalar type, and wherein determining the matching result of the candidate matching point pair includes:

Determining a first point pair matching result of the candidate matching point pair based on the scalar type attribute of the first candidate matching node included in the candidate matching point pair and the scalar type attribute of the second candidate matching node included in the candidate matching point pair;

In response to determining that the first point pair matching result of the candidate matching point pair satisfies the first preset condition, determining the second point pair matching result of the candidate matching point pair based on the feature vector of the first candidate matching node included in the candidate matching point pair and the feature vector of the second candidate matching node included in the candidate matching point pair; and

Based on the second point pair matching result, determine the matching result of the candidate matching point pair,

and / or

Wherein, the attributes of the adjacent edges include scalar type attributes and vector type attributes, and the vector type attributes of the adjacent edges include feature vectors, wherein determining the matching result of the candidate matching edge pair includes:

Determining a first edge pair matching result of the candidate matching edge pair based on the scalar type attribute of the first candidate matching adjacent edge included in the candidate matching edge pair and the scalar type attribute of the second candidate matching adjacent edge included in the candidate matching edge pair;

In response to determining that the first edge pair matching result of the candidate matching edge pair satisfies the second preset condition, determining the second edge pair matching result of the candidate matching edge pair based on the feature vector of the first candidate matching adjacent edge included in the candidate matching edge pair and the feature vector of the second candidate matching adjacent edge included in the candidate matching edge pair; and

Based on the second edge pair matching result, a matching result of the candidate matching edge pair is determined.
The method of claim 6, wherein the scalar-type attribute of a node includes the saliency of the node, and/or the scalar-type attribute of the adjacent edge includes the saliency of the adjacent edge.
The method according to claim 8, wherein determining the matching result of the candidate matching point pair includes:

Determining, as the matching result of the candidate matching point pair, the product of the following three: the significance of the first candidate matching node included in the candidate matching point pair, the significance of the second candidate matching node included in the candidate matching point pair, and the similarity between the feature vector of the first candidate matching node and the feature vector of the second candidate matching node, and/or,

Wherein, determining the matching result of the candidate matching edge pair includes:

Determining, as the matching result of the candidate matching edge pair, the product of the following three: the significance of the first candidate matching adjacent edge included in the candidate matching edge pair, the significance of the second candidate matching adjacent edge included in the candidate matching edge pair, and the similarity between the feature vector of the first candidate matching adjacent edge and the feature vector of the second candidate matching adjacent edge.
The method according to any one of claims 1 to 9, wherein performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result includes:

In response to determining that the first matching result is a successful match, graph matching is performed on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result.
The method according to any one of claims 1 to 9, wherein performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result includes:

In response to determining that the first matching result is a successful match, matching a first subgraph of the second scale of the first data and a second subgraph of the second scale of the second data, wherein the first matching result indicates that a first node in the graph representation of the first scale of the first data and a second node in the graph representation of the first scale of the second data are successfully matched, the first subgraph includes a node in the graph representation of the second scale of the first data that has a subordinate relationship with the first node, and the second subgraph includes a node in the graph representation of the second scale of the second data that has a subordinate relationship with the second node.
The method according to any one of claims 1 to 9, wherein performing graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain the second matching result includes:

The matching result of the current node is determined based on the attributes of the current node and whether the node of the first scale that has a subordinate relationship with the current node is successfully matched, wherein the current node is a node of the second scale.
The method according to any one of claims 1 to 12, wherein the at least one first scale node and the at least one second scale node are obtained by sparsifying the same dense data respectively,

and / or,

Wherein, the dense data includes multiple scales, and the at least one first scale node and the at least one second scale node are obtained by respectively sparsifying two scales among the multiple scales of the dense data.
The method according to any one of claims 1-12, wherein the nodes of at least one scale of each data are obtained by sparsifying the part of the dense data that corresponds to the node positions of another scale, and the nodes of the other scale are obtained by sparsifying the dense data.
The method according to any one of claims 1 to 12, wherein the nodes of at least one scale of each data are obtained by merging nodes of another scale, and the nodes of the other scale are obtained by sparsifying the dense data.
The method according to any one of claims 13-15, wherein the dense data includes a plurality of dense nodes, the dense nodes have attributes, the attributes of the dense nodes include scalar type attributes and vector type attributes, the scalar type attribute of the dense node includes significance, and the vector type attribute of the dense node includes a feature vector,

Wherein, the significance of the dense node is determined according to the feature vector of the dense node, and wherein sparsifying the dense data includes determining, as sparse nodes, the dense nodes whose significance satisfies a third preset condition among at least a part of the plurality of dense nodes.
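By way of non-limiting illustration, the sparsification described above may be sketched as follows. Two assumptions are made that the claim does not state: the significance is taken to be the L2 norm of the feature vector, and the third preset condition is taken to be a fixed threshold. The names `saliency` and `sparsify` are hypothetical.

```python
import math


def saliency(feature):
    """Significance derived from the feature vector (assumed: its L2 norm)."""
    return math.sqrt(sum(x * x for x in feature))


def sparsify(dense_nodes, threshold):
    """Keep the dense nodes whose significance meets the threshold.

    dense_nodes maps a position (e.g. a pixel coordinate) to a feature vector.
    Each retained sparse node carries a scalar attribute and a vector attribute.
    """
    sparse = {}
    for pos, feat in dense_nodes.items():
        s = saliency(feat)
        if s >= threshold:  # assumed form of the "third preset condition"
            sparse[pos] = {"saliency": s, "feature": feat}
    return sparse
```

Other conditions, such as keeping the top-k nodes or local maxima of the significance, would fit the same interface.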
The method according to any one of claims 13-15, wherein the dense data includes a plurality of dense nodes, the dense nodes have attributes,

Wherein, the attributes of the nodes obtained by sparsification are determined based on the attributes of at least a part of the dense nodes corresponding to the node among the plurality of dense nodes,

and / or,

Wherein, the nodes of at least one scale are obtained by merging the nodes of another scale, and the attributes of a merged node are determined based on the attributes of those nodes of the other scale that have a subordinate relationship with the merged node.
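By way of non-limiting illustration, determining the attributes of a merged node from its subordinate nodes may be sketched as follows. The aggregation rules (element-wise mean of feature vectors, maximum of significances) are assumptions chosen for this sketch; the claim only requires that the merged attributes be based on the attributes of the subordinate nodes.

```python
def merge_nodes(children):
    """Build a merged node from its subordinate nodes.

    children is a non-empty list of dicts with "saliency" (float) and
    "feature" (list of floats of equal length).
    """
    dim = len(children[0]["feature"])
    mean_feature = [
        sum(child["feature"][i] for child in children) / len(children)
        for i in range(dim)
    ]
    return {
        "saliency": max(child["saliency"] for child in children),  # assumed rule
        "feature": mean_feature,  # assumed rule: element-wise mean
    }
```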
The method according to any one of claims 1 to 17, wherein the at least one adjacent edge is determined according to respective attributes of at least one node of the same scale, wherein the attributes of each of the at least one adjacent edge are determined based on at least one of the attributes of the two nodes connected by the adjacent edge and the relative relationship between the two nodes.
The method of claim 18, wherein the at least one adjacent edge is determined by performing the following steps:

Determine at least one candidate adjacency edge based on at least one node of the same scale;

determining respective saliencies of the at least one candidate adjacent edge based on respective attributes of the at least one node of the same scale; and

The adjacent edge whose significance satisfies the fourth preset condition among the at least one candidate adjacent edge is determined as the at least one adjacent edge.
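By way of non-limiting illustration, the three steps above may be sketched as follows. The sketch assumes that every pair of same-scale nodes is a candidate edge, that the edge significance is the product of the two node significances, and that the fourth preset condition is a fixed threshold; none of these choices is stated in the claim, and all names are hypothetical.

```python
from itertools import combinations


def build_adjacent_edges(nodes, min_edge_saliency):
    """nodes maps a node id to {"pos": (x, y), "saliency": float}."""
    edges = []
    # Step 1: candidate edges are all unordered pairs of same-scale nodes.
    for (id_a, a), (id_b, b) in combinations(nodes.items(), 2):
        # Step 2: edge significance from node attributes (assumed: product).
        edge_saliency = a["saliency"] * b["saliency"]
        # Step 3: keep edges meeting the assumed threshold condition.
        if edge_saliency >= min_edge_saliency:
            offset = (b["pos"][0] - a["pos"][0], b["pos"][1] - a["pos"][1])
            edges.append(
                {"nodes": (id_a, id_b), "saliency": edge_saliency, "offset": offset}
            )
    return edges
```

The `offset` field stands in for the relative relationship between the two nodes; a distance or angle could be stored instead.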
The method according to any one of claims 1-19, wherein the second data is obtained from a database, wherein determining the task processing result based on the multi-scale matching result includes:

determining at least one second data that matches the first data based on a multi-scale matching result of the first data and a plurality of second data in the database; and

Based on the at least one second data, a task processing result is determined.
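By way of non-limiting illustration, retrieval against a database may be sketched as follows. The multi-scale matching itself is abstracted behind a caller-supplied `match_fn`; the `jaccard` function is only a toy stand-in used to make the sketch runnable, and the score threshold and `top_k` cutoff are assumptions.

```python
def jaccard(a, b):
    """Toy stand-in for a multi-scale matching score in [0, 1]."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def retrieve(query, database, match_fn, min_score, top_k=1):
    """Return keys of the best-matching database entries, best first.

    database maps a key to a stored item (the "second data").
    """
    scored = [(match_fn(query, item), key) for key, item in database.items()]
    hits = [(score, key) for score, key in scored if score >= min_score]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in hits[:top_k]]
```

The returned keys can then feed a downstream step, for example identity verification or labelling, to produce the task processing result.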
The method according to any one of claims 1-20, wherein the first data and the second data are image data, the dense data is a feature map obtained based on the corresponding image data, and the multiple dense nodes in the dense data are multiple pixels in the feature map.
A neural network training method, the method includes:

Obtain first sample data and second sample data, where the first sample data and the second sample data are respectively one of image data, audio data, text data, molecular structure data and sequence data;

Obtain a multi-scale graph representation of each of the first sample data and the second sample data, wherein the multi-scale graph representation is determined using a graph representation extraction network, and the multi-scale graph representation includes a graph representation of a first scale and a graph representation of a second scale;

Perform graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result that represents the matching degree of the first scale;

Perform graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result that represents the matching degree of the second scale;

Obtain the target matching results and/or target task processing results of the first sample data and the second sample data;

Determine a loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and

Based on the loss value, the graph representation extraction network is trained.
The method of claim 22, wherein the loss value includes a matching loss value and/or a task loss value,

Wherein, determining the loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result includes:

Determine the matching loss value according to the first current matching result and/or the second current matching result and the target matching result;

and / or,

Determine the current task result according to the first current matching result and/or the second current matching result; and

The task loss value is determined based on the target task processing result and the current task result.
The method according to claim 23, wherein determining the matching loss value according to the first current matching result and/or the second current matching result includes:

Determine the current matching result according to the first current matching result and/or the second current matching result; and

Determine the matching loss value based on the current matching result and the target matching result.
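By way of non-limiting illustration, the two steps above may be sketched as follows. The sketch assumes each per-scale matching result is a score in [0, 1], combines them by a weighted average, and uses binary cross-entropy as the matching loss; the claim does not prescribe either choice, and the names `combine` and `matching_loss` are hypothetical.

```python
import math


def combine(first_score, second_score, weight_first=0.5):
    """Current matching result from the two per-scale results (assumed:
    a weighted average)."""
    return weight_first * first_score + (1.0 - weight_first) * second_score


def matching_loss(current, target, eps=1e-7):
    """Binary cross-entropy between the current matching result and the
    target matching result (1.0 for a match, 0.0 for a non-match)."""
    p = min(max(current, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```

In an actual training setup the same computation would be expressed in a framework with automatic differentiation so that the loss can be back-propagated into the graph representation extraction network.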
The method according to claim 23, wherein the graph representation extraction network includes a first network for extracting a graph representation of a first scale, wherein determining the matching loss value according to the first current matching result and/or the second current matching result includes:

Determine a first scale matching loss value according to the target matching result and the first current matching result;

Wherein, according to the loss value, training the graph representation extraction network includes:

Train the first network according to the first scale matching loss value,

and / or,

Wherein, the graph representation extraction network includes a second network for extracting a graph representation of a second scale, wherein determining the matching loss value according to the first current matching result and/or the second current matching result includes:

Determine a second scale matching loss value according to the target matching result and the second current matching result;

Wherein, according to the loss value, training the graph representation extraction network includes:

The second network is trained according to the second scale matching loss value.
The method according to any one of claims 22-25, wherein the target matching result and/or the target task processing result is determined according to one of the following:

Manual annotation, a teacher model and/or a pre-trained model, auxiliary constraint information, and a rule-based approach.
The method according to any one of claims 22-26, wherein the target matching result is determined using a network that has undergone the Nth round of training, wherein obtaining the first sample data and the second sample data includes:

Using the network that has undergone the Nth round of training to extract the respective multi-scale graph representations of the first unlabeled data and the second unlabeled data;

Perform graph matching on the graph representation of the first scale of the first unlabeled data and the graph representation of the first scale of the second unlabeled data to obtain a first unlabeled data matching result that represents the matching degree of the first scale;

Perform graph matching on the graph representation of the second scale of the first unlabeled data and the graph representation of the second scale of the second unlabeled data to obtain a second unlabeled data matching result that represents the matching degree of the second scale;

Determine the unlabeled data matching result according to the first unlabeled data matching result and/or the second unlabeled data matching result;

In response to determining that the first unlabeled data and the second unlabeled data satisfy a first condition, determining the first unlabeled data and the second unlabeled data as first sample data and second sample data that are positive samples, wherein the first unlabeled data and the second unlabeled data satisfying the first condition includes the unlabeled data matching result satisfying a first matching condition, and the target matching result of the positive samples indicates that the corresponding first sample data and second sample data match; and/or

In response to determining that the first unlabeled data and the second unlabeled data satisfy a second condition, determining the first unlabeled data and the second unlabeled data as first sample data and second sample data that are negative samples, wherein the first unlabeled data and the second unlabeled data satisfying the second condition includes the unlabeled data matching result satisfying a second matching condition, and the target matching result of the negative samples indicates that the corresponding first sample data and second sample data do not match.
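By way of non-limiting illustration, selecting positive and negative sample pairs from unlabeled data may be sketched as follows. The first and second matching conditions are assumed here to be a high and a low score threshold respectively, with scores produced by the network after the Nth round of training; the threshold values and function names are hypothetical.

```python
def pseudo_label(score, positive_threshold=0.9, negative_threshold=0.1):
    """Return 1 (positive), 0 (negative), or None when the pair is too
    ambiguous to be used as a training sample."""
    if score >= positive_threshold:  # assumed "first matching condition"
        return 1
    if score <= negative_threshold:  # assumed "second matching condition"
        return 0
    return None


def select_samples(scored_pairs, positive_threshold=0.9, negative_threshold=0.1):
    """scored_pairs is an iterable of (first_item, second_item, score)."""
    samples = []
    for first, second, score in scored_pairs:
        label = pseudo_label(score, positive_threshold, negative_threshold)
        if label is not None:
            samples.append((first, second, label))
    return samples
```

Pairs with intermediate scores are left out of the next training round rather than being assigned an unreliable target matching result.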
The method according to claim 27, wherein determining the unlabeled data matching result according to the first unlabeled data matching result and/or the second unlabeled data matching result includes:

In response to determining that the second unlabeled data matching result indicates that the graph representation of the second scale of the first unlabeled data and the graph representation of the second scale of the second unlabeled data are successfully matched, the unlabeled data matching result is determined to be a match.
The method according to any one of claims 22-28, wherein the graph representation extraction network includes a rule module and a network module, wherein the method further includes at least one of the following steps:

In response to determining that the fifth preset condition is met, replace the first rule module in the rule modules with a network module; and

In response to determining that the sixth preset condition is met, adding a network module to the graph representation extraction network,

Wherein, according to the loss value, training the graph representation extraction network includes:

According to the loss value, the network module is trained.
The method of any one of claims 22-29, wherein each scale of the multi-scale graph representation includes at least one node, the node has attributes, and the attributes of the node include an attribute of a scalar type and an attribute of a vector type,

Wherein, the graph representation of at least one scale in the multi-scale graph representation includes at least one adjacent edge, each adjacent edge in the at least one adjacent edge is used to represent the relative relationship between two nodes of the same scale, the adjacent edge has attributes, and the attributes of the adjacent edge include an attribute of a scalar type and an attribute of a vector type,

Wherein, the graph representation extraction network includes at least one of the following:

a network module for determining the scalar type attributes of nodes;

a network module for determining the vector type attributes of nodes;

a network module for determining the scalar type attributes of adjacent edges; and

a network module for determining the vector type attributes of adjacent edges.
The method according to claim 29, wherein the scalar type attribute of the node includes the significance of the node, and/or the vector type attribute of the node includes the feature vector of the node, and/or the scalar type attribute of the adjacent edge includes the significance of the adjacent edge, and/or the vector type attribute of the adjacent edge includes the feature vector of the adjacent edge.
A task processing device including:

A first acquisition unit configured to acquire first data and second data, where the first data and the second data are respectively one of image data, audio data, text data, molecular structure data, and sequence data;

The second acquisition unit is configured to acquire a graph representation of a first scale of each of the first data and the second data, where the graph representation of the first scale includes at least one node of the first scale, wherein the nodes of the first scale have attributes, and the attributes of the nodes of the first scale include attributes of vector type;

A third acquisition unit configured to acquire a graph representation of a second scale of each of the first data and the second data, the second scale being lower than the first scale, and the graph representation of the second scale including at least one node of a second scale, wherein the node of the second scale has attributes, and the attributes of the nodes of the second scale include attributes of a vector type,

Wherein, the nodes of at least one scale of each of the first data and the second data are obtained by sparsifying the dense data corresponding to the data, and the graph representation of at least one scale of each of the data includes at least one adjacent edge, each of the at least one adjacent edge is used to characterize the relative relationship between two nodes of the same scale, and the adjacent edge has an attribute;

A first graph matching unit configured to perform graph matching on the graph representation of the first scale of the first data and the graph representation of the first scale of the second data to obtain a first matching result;

A second graph matching unit configured to perform graph matching on the graph representation of the second scale of the first data and the graph representation of the second scale of the second data to obtain a second matching result;

A first determining unit configured to determine a multi-scale matching result based on the first matching result and the second matching result; and

The second determination unit is configured to determine the task processing result based on the multi-scale matching result.
A neural network training device, including:

The fourth acquisition unit is configured to acquire first sample data and second sample data, where the first sample data and the second sample data are respectively one of image data, audio data, text data, molecular structure data and sequence data;

The fifth acquisition unit is configured to acquire a multi-scale graph representation of each of the first sample data and the second sample data, wherein the multi-scale graph representation is determined using a graph representation extraction network, and the multi-scale graph representation includes the graph representation of the first scale and the graph representation of the second scale;

A third graph matching unit configured to perform graph matching on the graph representation of the first scale of the first sample data and the graph representation of the first scale of the second sample data to obtain a first current matching result that represents the matching degree of the first scale;

A fourth graph matching unit configured to perform graph matching on the graph representation of the second scale of the first sample data and the graph representation of the second scale of the second sample data to obtain a second current matching result that represents the matching degree of the second scale;

A seventh acquisition unit configured to acquire target matching results and/or target task processing results of the first sample data and the second sample data;

A third determination unit configured to determine a loss value according to the target matching result and/or the target task processing result, and the first current matching result and/or the second current matching result; and

A training unit configured to train the graph representation extraction network according to the loss value.
An electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-31.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method according to any one of claims 1-31.
A computer program product comprising a computer program, wherein the computer program implements the method of any one of claims 1-31 when executed by a processor.