Detailed Description
The embodiments of the present disclosure are described clearly and fully below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the disclosure. All other embodiments that one of ordinary skill in the art can obtain without undue burden based on the embodiments of the present disclosure also fall within the scope of the present disclosure.
As used in this disclosure and in the claims, the terms "a," "an," and/or "the" are not specific to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may include other steps or elements.
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
To address the above technical problems, the technical concept of the present disclosure is to use a deep-learning-based natural language processing technique to extract semantic features from the digital collection search query and from the text description of the alternative digital collection, so as to intelligently realize semantic matching between the digital collection search query and the text description of the alternative digital collection.
Based on this, FIG. 1 shows a flowchart of a data processing method for digital collection management according to an embodiment of the present disclosure. FIG. 2 shows an architectural diagram of the data processing method for digital collection management according to an embodiment of the present disclosure. As shown in FIGS. 1 and 2, the data processing method for digital collection management according to an embodiment of the present disclosure includes the steps of: S110, acquiring a digital collection search query input by a user; S120, extracting a text description of an alternative digital collection from a database; S130, performing joint semantic analysis on the digital collection search query and the text description of the alternative digital collection to obtain a search query-alternative digital collection semantic matching feature vector; and S140, determining whether to return the alternative digital collection based on the search query-alternative digital collection semantic matching feature vector.
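The control flow of steps S110 to S140 can be sketched as follows. This is a minimal illustration only: the function names `retrieve`, `toy_joint_analysis`, and `toy_decision` are hypothetical, and the two toy callables are simple stand-ins (word overlap and a threshold) for the deep-learning encoder, inter-feature attention layer, and classifier described in this disclosure.

```python
from typing import Callable, Iterable, List

Vector = List[float]


def retrieve(query: str,
             descriptions: Iterable[str],
             joint_analysis: Callable[[str, str], Vector],
             should_return: Callable[[Vector], bool]) -> List[str]:
    """Run S110-S140 for one query over all alternative descriptions."""
    results = []
    for description in descriptions:                          # S120: description from the database
        matching_vector = joint_analysis(query, description)  # S130: joint semantic analysis
        if should_return(matching_vector):                    # S140: return decision
            results.append(description)
    return results


def toy_joint_analysis(query: str, description: str) -> Vector:
    """Stand-in for S130 (assumption): fraction of query words found in the description."""
    query_words = set(query.lower().split())
    if not query_words:
        return [0.0]
    description_words = set(description.lower().split())
    return [len(query_words & description_words) / len(query_words)]


def toy_decision(matching_vector: Vector) -> bool:
    """Stand-in for S140 (assumption): a fixed threshold instead of a trained classifier."""
    return matching_vector[0] >= 0.5


database = ["a blue dragon card", "red phoenix art"]
hits = retrieve("blue dragon", database, toy_joint_analysis, toy_decision)  # S110: user query
```

Passing the analysis and decision steps in as callables keeps the flow of FIG. 1 separate from any particular model, so the stand-ins can be replaced by trained components without changing `retrieve`.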
Specifically, in the technical solution of the present disclosure, a digital collection search query input by a user is first obtained; at the same time, a text description of the alternative digital collection is extracted from the database.
In digital collection management systems, the search query entered by the user is typically in natural language, and the text description of the alternative digital collection also contains rich semantic information. Keyword matching or simple text similarity calculation alone often cannot sufficiently capture the query intent of the user or the content features of the digital collection. Therefore, in the technical solution of the present disclosure, it is desired to perform joint semantic analysis on the digital collection search query and the text description of the alternative digital collection to obtain a search query-alternative digital collection semantic matching feature vector. In this way, the model can better understand the semantic relationship between the two and match them at a deeper level.
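To illustrate why keyword matching alone can miss the user's intent, the following minimal sketch computes a Jaccard word-overlap score. The function name `keyword_overlap` and the example strings are illustrative assumptions and are not part of the disclosure.

```python
def keyword_overlap(query: str, description: str) -> float:
    """Jaccard similarity between the word sets of two texts."""
    query_words = set(query.lower().split())
    description_words = set(description.lower().split())
    union = query_words | description_words
    if not union:
        return 0.0
    return len(query_words & description_words) / len(union)


# Two texts about the same subject that share no surface keywords.
score = keyword_overlap("portrait of a feline", "cat illustration")
```

Because the two texts have no word in common, the overlap score is zero even though they describe related content, which is the gap that semantic encoding is intended to close.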
In a specific example of the present disclosure, the encoding process for performing joint semantic analysis on the digital collection search query and the text description of the alternative digital collection to obtain the search query-alternative digital collection semantic matching feature vector includes: first, semantically encoding the digital collection search query to obtain a digital collection search query semantic coding feature vector; at the same time, semantically encoding the text description of the alternative digital collection to obtain an alternative digital collection text description semantic coding feature vector; and then using an inter-feature attention layer to perform attention-based feature interaction on the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector to obtain the search query-alternative digital collection semantic matching feature vector.
It is worth mentioning that the goal of the traditional attention mechanism is to learn an attention weight matrix that is applied to the individual neural nodes of the current layer, giving greater weight to important nodes and less weight to secondary nodes. Because each neural node contains certain feature information, this operation lets the neural network select, from among many pieces of feature information, the information that is most critical to the current task. The inter-feature attention layer is different: it focuses more on the dependency relationships among the feature information.
Accordingly, as shown in FIG. 3, performing joint semantic analysis on the digital collection search query and the text description of the alternative digital collection to obtain a search query-alternative digital collection semantic matching feature vector includes: S131, semantically encoding the digital collection search query to obtain a digital collection search query semantic coding feature vector; S132, semantically encoding the text description of the alternative digital collection to obtain an alternative digital collection text description semantic coding feature vector; and S133, fusing the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector to obtain the search query-alternative digital collection semantic matching feature vector. It should be appreciated that the joint semantic analysis thus involves three steps, S131, S132, and S133. The purpose of step S131 is to convert the search query into a semantic code that represents the meaning of the query; the resulting feature vector can capture key semantic information of the query, such as its subject matter, intent, and features. The purpose of step S132 is to convert the text description of the alternative digital collection into a semantic code that represents the meaning of the description; the resulting feature vector can capture key semantic information of the description, such as its features, content, and style. 
The purpose of step S133 is to fuse the semantic coding feature vector of the query with the semantic coding feature vector of the alternative digital collection to obtain a comprehensive feature vector that reflects the degree of semantic matching between the query and the alternative digital collection. This feature vector may be used to calculate the similarity or degree of matching between the query and the alternative digital collection, so as to support retrieval and recommendation of digital collections. Through the joint semantic analysis of these three steps, the semantic information of the query and of the alternative digital collection can be encoded and matched, so that more accurate digital collection retrieval and recommendation are realized.
More specifically, in step S133, fusing the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector to obtain the search query-alternative digital collection semantic matching feature vector includes: performing attention-based feature interaction on the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector using an inter-feature attention layer to obtain the search query-alternative digital collection semantic matching feature vector. It should be appreciated that the role of step S133 is to capture important semantic interaction information between the query and the alternative digital collection through an attention mechanism, so as to better gauge the degree of matching between them. The inter-feature attention layer can automatically learn and assign different attention weights according to the similarity and degree of association between the semantic coding feature vectors of the query and of the alternative digital collection. Important semantic interaction information thus receives more attention and unimportant information receives less, which improves the expressive power and discriminability of the semantic matching feature vector. Combining feature interaction with the attention mechanism better captures the semantic relationship between the query and the alternative digital collection, thereby improving the accuracy of both retrieval and recommendation. With the semantic matching feature vector, the similarity or degree of matching between the alternative digital collection and the query can be better evaluated, and more accurate digital collection recommendation and retrieval can be realized.
It is worth mentioning that the inter-feature attention layer is a layer in a neural network for learning the relevance and importance between different features. It dynamically assigns weights between different features through an attention mechanism so that important features receive more focus during feature interaction. In the inter-feature attention layer, the input is typically a set of feature vectors, which may come from different sources or represent different aspects. Each feature vector is associated with a weight that represents the importance of that feature in the feature interaction. These weights are calculated and adjusted by the attention mechanism, typically based on the similarity or correlation between feature vectors. Different methods, such as dot-product attention, additive attention, or multi-head attention, may be used to calculate the degree of association between features and generate the corresponding weights. The weights are then used to form a weighted linear combination of the features, which realizes the interaction between features. Through the inter-feature attention layer, the network can automatically learn the association and importance between features and adjust the feature weights according to the needs of the task. This may improve the model's ability to capture interactions and associations between different features, thereby improving its expressive power and performance. In the semantic matching task, the inter-feature attention layer can help capture important semantic interaction information between the query and the alternative digital collection, so that matching accuracy is improved.
Then, the search query-alternative digital collection semantic matching feature vector is passed through a classifier to obtain a classification result, wherein the classification result is used for indicating whether to return the alternative digital collection.
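The weighting scheme described above can be sketched with scaled dot-product attention, one of the methods the paragraph names. This is a minimal sketch under stated assumptions: the function names, the toy three-dimensional vectors, and the final concatenation into a matching vector are illustrative choices, and the disclosure does not specify which attention variant or fusion operation its inter-feature attention layer uses.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features: List[Vector]) -> List[Vector]:
    """Row i holds the weights feature i assigns to every feature (scaled dot product)."""
    scale = math.sqrt(len(features[0]))
    weights = []
    for f_i in features:
        scores = [sum(a * b for a, b in zip(f_i, f_j)) / scale for f_j in features]
        weights.append(softmax(scores))
    return weights


def inter_feature_attention(features: List[Vector]) -> List[Vector]:
    """Replace each feature by the attention-weighted linear combination of all features."""
    dim = len(features[0])
    return [[sum(w * f[k] for w, f in zip(row, features)) for k in range(dim)]
            for row in attention_weights(features)]


# Toy semantic coding feature vectors (assumed values, same dimension).
query_vec = [1.0, 0.0, 1.0]
description_vec = [0.5, 0.5, 1.0]
attended = inter_feature_attention([query_vec, description_vec])
# Assumption: concatenate the two attended vectors to form the matching feature vector.
matching_vector = attended[0] + attended[1]
```

Each row of weights sums to one, so every output is a convex combination of the input features; features that are more similar to the attending feature contribute more to its output.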
Accordingly, as shown in FIG. 4, determining whether to return the alternative digital collection based on the search query-alternative digital collection semantic matching feature vector includes: S141, performing feature distribution optimization on the search query-alternative digital collection semantic matching feature vector to obtain an optimized search query-alternative digital collection semantic matching feature vector; and S142, passing the optimized search query-alternative digital collection semantic matching feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether to return the alternative digital collection. It should be understood that step S141 aims to improve the expressive power and discriminability of the feature vector by adjusting the distribution of its features. Through feature distribution optimization, semantic matching information between the query and the alternative digital collection can be captured better, and matching accuracy is improved. In step S142, the optimized search query-alternative digital collection semantic matching feature vector is passed through a classifier to obtain the classification result indicating whether to return the alternative digital collection. The classifier may be a binary classifier that determines whether to return the alternative digital collection, or a multi-class classifier that determines the category or ranking of the returned alternative digital collection. Using the classifier, alternative digital collections can be screened and ranked according to the degree of semantic matching expressed by the feature vector, so that the most relevant and suitable alternative digital collections are provided. The purpose of these two steps is to further optimize the semantic matching feature vector and then decide, on that basis, whether to return the alternative digital collection. 
Feature distribution optimization improves the representation capability of the feature vector, making it better suited to the classification decision. The classifier then makes the specific decision, according to the degree of matching expressed by the feature vector, on whether to return the alternative digital collection. The goal of these steps is to improve the accuracy of digital collection retrieval and recommendation, so that users obtain alternative digital collections that better meet their needs.
In the technical solution of the present disclosure, the digital collection search query semantic coding feature vector and the alternative digital collection text description semantic coding feature vector respectively express the text semantic features of the digital collection search query and of the text description of the alternative digital collection. Thus, when attention-based feature interaction is performed using the inter-feature attention layer, the search query-alternative digital collection semantic matching feature vector can express the text semantic dependency features between the two sets of text semantic features. However, when the text semantic features of the digital collection search query and of the text description of the alternative digital collection are taken as foreground object features, the inter-feature attention layer also introduces background distribution noise related to feature distribution interference from the source text semantic features. In addition, the search query-alternative digital collection semantic matching feature vector has hierarchical spatial features at both the single-sample semantic level and the inter-sample dependency level. It is therefore desirable to improve the expression effect of the search query-alternative digital collection semantic matching feature vector based on its feature distribution characteristics.
Accordingly, the applicant of the present disclosure applies a distribution gain based on a probability density feature imitation paradigm to the search query-alternative digital collection semantic matching feature vector.
Accordingly, in one particular example, performing feature distribution optimization on the search query-alternative digital collection semantic matching feature vector to obtain an optimized search query-alternative digital collection semantic matching feature vector includes: performing feature distribution optimization on the search query-alternative digital collection semantic matching feature vector using the following optimization formula to obtain the optimized search query-alternative digital collection semantic matching feature vector; wherein the optimization formula is: [formula image missing]; wherein $V$ is the search query-alternative digital collection semantic matching feature vector, $L$ is the length of the search query-alternative digital collection semantic matching feature vector, $v_i$ is the feature value at the $i$-th position of the search query-alternative digital collection semantic matching feature vector, $\|V\|_2^2$ denotes the square of the two-norm of the search query-alternative digital collection semantic matching feature vector, $\alpha$ is a weighting hyperparameter, $\exp(\cdot)$ denotes the exponential operation on a numerical value, that is, the natural exponential function raised to the power of that value, and $v_i'$ is the feature value at the $i$-th position of the optimized search query-alternative digital collection semantic matching feature vector.
Here, because the standard Cauchy distribution imitates the natural Gaussian distribution in terms of probability density, the distribution gain based on the probability density feature imitation paradigm can use the feature scale as an imitation mask to distinguish foreground object features from background distribution noise in the high-dimensional feature space. Based on the hierarchical semantic space of the high-dimensional features, it performs an association-semantic-cognition distribution soft matching of the feature space mapping on the high-dimensional space, thereby obtaining an unconstrained distribution gain of the high-dimensional feature distribution. This improves the expression effect of the search query-alternative digital collection semantic matching feature vector based on its feature distribution characteristics, and in turn improves the accuracy of the classification result obtained by passing the search query-alternative digital collection semantic matching feature vector through the classifier.
More specifically, in step S142, passing the optimized search query-alternative digital collection semantic matching feature vector through a classifier to obtain a classification result, where the classification result is used to indicate whether to return the alternative digital collection, includes: performing fully connected encoding on the optimized search query-alternative digital collection semantic matching feature vector using the fully connected layer of the classifier to obtain an encoded classification feature vector; and inputting the encoded classification feature vector into the Softmax classification function of the classifier to obtain the classification result.
That is, in the technical solution of the present disclosure, the labels of the classifier include returning the alternative digital collection (first label) and not returning the alternative digital collection (second label), and the classifier determines, through the Softmax function, to which classification label the optimized search query-alternative digital collection semantic matching feature vector belongs. It is noted that the first label p1 and the second label p2 do not carry any manually set concept. In fact, during training the computer model has no concept of "whether to return an alternative digital collection"; there are simply two classification labels, and the model outputs the probability of the feature under each of the two labels, so that the sum of p1 and p2 is one. Therefore, the classification result of whether to return the alternative digital collection is actually converted, through the classification labels, into a two-class probability distribution conforming to natural law; what is essentially used is the physical meaning of the natural probability distribution of the labels, rather than the linguistic meaning of "whether to return the alternative digital collection."
It should be appreciated that the role of the classifier is to learn classification rules from training data with known classes and then to classify (or predict) unknown data. Logistic regression, SVM, and similar methods are commonly used for binary classification problems. They can also be applied to multi-class classification, but only by composing multiple binary classifiers, which is error-prone and inefficient; the commonly used multi-class method is the Softmax classification function. It should be noted that fully connected encoding refers to inputting the optimized search query-alternative digital collection semantic matching feature vector into the fully connected layer for encoding. A fully connected layer is a common layer type in neural networks, in which each node is connected to all nodes of the previous layer. During fully connected encoding, each node applies a linear transformation and a nonlinear activation to the input feature vector to generate the encoded classification feature vector. The purpose of fully connected encoding is to map the input feature vector into a higher-dimensional feature space through multi-layer nonlinear transformation and to extract a richer and more abstract feature representation. The encoded classification feature vector can better capture the semantic and structural information of the input features, thereby improving the performance and decision-making capability of the classifier. Fully connected encoding may be implemented by stacking multiple fully connected layers, each of which applies a linear transformation and a nonlinear activation to its input feature vector. 
The parameters of these layers are learned and optimized during training by the back-propagation algorithm, so that the encoded classification feature vector better represents the important information of the input features. The encoded classification feature vector is then input into the Softmax classification function of the classifier, which converts it into the classification result. The Softmax function maps the encoded classification feature vector to a probability distribution with one probability value for each class. These probability values represent the likelihood that the input feature vector belongs to the respective class and can be used for decision and prediction in multi-class tasks. Through fully connected encoding and the Softmax classification function, the optimized search query-alternative digital collection semantic matching feature vector is converted into an encoded classification feature vector with higher expressive power and decision capability, and the classification result is represented as a probability distribution. This helps to improve the performance of the classifier, enabling it to determine more accurately whether to return an alternative digital collection.
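The fully connected encoding and Softmax classification described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it uses a single linear layer without a nonlinear activation, and the weights, biases, and input values are hand-picked assumptions rather than parameters learned by back-propagation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def fully_connected(x: Vector, weights: List[Vector], biases: Vector) -> Vector:
    """One fully connected layer: every output node sees every input value (W x + b)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits: Vector) -> Vector:
    """Map logits to a probability distribution over the class labels."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(matching_vector: Vector,
             weights: List[Vector],
             biases: Vector) -> Tuple[float, float, bool]:
    """Return (p1, p2, decision): p1 = 'return', p2 = 'do not return'."""
    p1, p2 = softmax(fully_connected(matching_vector, weights, biases))
    return p1, p2, p1 >= p2


# Hypothetical parameters for a two-dimensional optimized matching vector.
WEIGHTS = [[1.0, 1.0], [-1.0, -1.0]]
BIASES = [0.0, 0.0]
p1, p2, should_return = classify([0.8, 0.6], WEIGHTS, BIASES)
```

Because the Softmax outputs are normalized, p1 and p2 always sum to one, and the decision reduces to comparing the two label probabilities.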
In summary, according to the data processing method for digital collection management disclosed by the embodiment of the disclosure, semantic matching between digital collection search query and alternative digital collection text description can be intelligently realized.
FIG. 5 illustrates a block diagram of a data processing system 100 for digital collection management according to an embodiment of the present disclosure. As shown in FIG. 5, the data processing system 100 for digital collection management according to an embodiment of the present disclosure includes: an input acquisition module 110, configured to acquire a digital collection search query input by a user; a text description extraction module 120, configured to extract a text description of the alternative digital collection from the database; a joint semantic analysis module 130, configured to perform joint semantic analysis on the digital collection search query and the text description of the alternative digital collection to obtain a search query-alternative digital collection semantic matching feature vector; and a return confirmation module 140, configured to determine whether to return the alternative digital collection based on the search query-alternative digital collection semantic matching feature vector.
In one possible implementation, the joint semantic analysis module 130 includes: a search query semantic coding unit, configured to semantically encode the digital collection search query to obtain a digital collection search query semantic coding feature vector; a text description semantic coding unit, configured to semantically encode the text description of the alternative digital collection to obtain an alternative digital collection text description semantic coding feature vector; and a fusion unit, configured to fuse the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector to obtain the search query-alternative digital collection semantic matching feature vector.
In one possible implementation, the fusion unit is configured to: perform attention-based feature interaction on the alternative digital collection text description semantic coding feature vector and the digital collection search query semantic coding feature vector using an inter-feature attention layer to obtain the search query-alternative digital collection semantic matching feature vector.
In one possible implementation, the return confirmation module 140 includes: a feature distribution optimization unit, configured to perform feature distribution optimization on the search query-alternative digital collection semantic matching feature vector to obtain an optimized search query-alternative digital collection semantic matching feature vector; and a classification unit, configured to pass the optimized search query-alternative digital collection semantic matching feature vector through a classifier to obtain a classification result, the classification result being used for indicating whether to return the alternative digital collection.
In one possible implementation, the feature distribution optimization unit is configured to: perform feature distribution optimization on the search query-alternative digital collection semantic matching feature vector using the following optimization formula to obtain the optimized search query-alternative digital collection semantic matching feature vector; wherein the optimization formula is: [formula image missing]; wherein $V$ is the search query-alternative digital collection semantic matching feature vector, $L$ is the length of the search query-alternative digital collection semantic matching feature vector, $v_i$ is the feature value at the $i$-th position of the search query-alternative digital collection semantic matching feature vector, $\|V\|_2^2$ denotes the square of the two-norm of the search query-alternative digital collection semantic matching feature vector, $\alpha$ is a weighting hyperparameter, $\exp(\cdot)$ denotes the exponential operation on a numerical value, that is, the natural exponential function raised to the power of that value, and $v_i'$ is the feature value at the $i$-th position of the optimized search query-alternative digital collection semantic matching feature vector.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described data processing system 100 for digital collection management have been described in detail in the above description of the data processing method for digital collection management with reference to fig. 1 to 4, and thus, repetitive descriptions thereof will be omitted.
As described above, the data processing system 100 for digital collection management according to the embodiment of the present disclosure may be implemented in various wireless terminals, such as a server or the like having a data processing algorithm for digital collection management. In one possible implementation, data processing system 100 for digital collection management according to embodiments of the present disclosure may be integrated into a wireless terminal as one software module and/or hardware module. For example, the data processing system 100 for digital collection management may be a software module in the operating system of the wireless terminal, or may be an application developed for the wireless terminal; of course, the data processing system 100 for digital collection management may also be one of the many hardware modules of the wireless terminal.
Alternatively, in another example, the data processing system 100 for digital collection management and the wireless terminal may be separate devices, and the data processing system 100 for digital collection management may be connected to the wireless terminal through a wired and/or wireless network and transmit interactive information in an agreed data format.
FIG. 6 illustrates an application scenario diagram of a data processing method for digital collection management according to an embodiment of the present disclosure. As shown in FIG. 6, in this application scenario, a digital collection search query input by a user (e.g., D1 illustrated in FIG. 6) is first acquired, and a text description of an alternative digital collection (e.g., D2 illustrated in FIG. 6) is acquired from a database. The digital collection search query and the text description of the alternative digital collection are then input into a server (e.g., S illustrated in FIG. 6) on which a data processing algorithm for digital collection management is deployed, and the server processes the digital collection search query and the text description of the alternative digital collection using that algorithm to obtain a classification result indicating whether to return the alternative digital collection.
Further, it is worth mentioning that the present disclosure relates to a data processing system for digital collection management that is capable of efficient storage, retrieval, presentation, and analysis of digital collections. The system comprises: a digital collection database for storing the metadata and content of the digital collections, together with other related information such as author, source, copyright, history, and classification; a digital collection acquisition module for acquiring digital collections from different data sources, converting them into a uniform format such as XML or JSON, and storing them in the digital collection database; a digital collection retrieval module for retrieving, from the digital collection database, the digital collections that satisfy the user's query conditions and returning them to the user; a digital collection display module for displaying digital collections in a manner selected by the user, such as a list, grid, slideshow, or map, and for providing additional related information such as comments and links; and a digital collection analysis module for performing analyses such as statistics, clustering, classification, association, and recommendation on the digital collections and presenting the analysis results to the user as charts, reports, and the like. It should be appreciated that a digital blind box is a digital collection based on blockchain technology that is scarce, unique, and tamper-proof.
The present disclosure has the advantage of enabling comprehensive management of digital collections and improving their utilization value and social influence.
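The conversion performed by the digital collection acquisition module, from source-specific records into a uniform JSON format, can be sketched as follows. This is an illustration only: the unified field names (`title`, `author`, `source`), the source field names, and the function `to_unified_record` are hypothetical, since the disclosure does not define a schema.

```python
import json
from typing import Dict


def to_unified_record(raw: Dict[str, str], field_map: Dict[str, str]) -> Dict[str, str]:
    """Rename source-specific fields to the unified schema; missing fields become ''."""
    return {unified: raw.get(source, "") for unified, source in field_map.items()}


# Hypothetical mapping for one data source: unified field -> source field.
MUSEUM_FIELD_MAP = {"title": "name", "author": "creator", "source": "origin"}

raw_record = {"name": "Blue Dragon", "creator": "A. Artist", "origin": "museum-feed"}
record = to_unified_record(raw_record, MUSEUM_FIELD_MAP)

# Serialize to JSON before storing in the digital collection database.
serialized = json.dumps(record, sort_keys=True)
```

Keeping one field map per data source means that a new source can be supported by adding a mapping, without changing the retrieval, display, or analysis modules that read the unified records.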
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.