CN112329933B - Data processing method, device, server and storage medium - Google Patents


Info

Publication number
CN112329933B
Authority
CN
China
Prior art keywords
data
multimedia data
target
expression
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011200592.3A
Other languages
Chinese (zh)
Other versions
CN112329933A (en
Inventor
欧子菁
王婧雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011200592.3A priority Critical patent/CN112329933B/en
Publication of CN112329933A publication Critical patent/CN112329933A/en
Application granted granted Critical
Publication of CN112329933B publication Critical patent/CN112329933B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

The embodiment of the invention discloses a data processing method, a data processing device, a server and a storage medium, wherein the method comprises the following steps: acquiring an initial characteristic expression of target multimedia data to be processed; converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting one data feature of the target multimedia data; acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data; and re-determining the target characteristic expression of the target multimedia data according to the expression segment after weighting processing, and identifying the original multimedia data by adopting a new characteristic expression, thereby improving the processing efficiency of the server on data processing based on the new characteristic expression.

Description

Data processing method, device, server and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, a server, and a storage medium.
Background
With the continued development of internet technology, a server that needs to compare data efficiently usually performs feature analysis on the data first and encodes the original data, based on its data features, into a coding sequence (such as a hash code sequence) that represents the original data. In subsequent data comparison, the server can then compare the coding sequences directly, without analyzing and comparing the features of the original data. However, when the server compares data based on coding sequences, each coding sequence must in turn be taken as a reference and compared one by one with the coding sequences of all other data. Because the current data volume is huge and growing rapidly, how to represent the original data with a new feature expression so as to improve the data processing efficiency of the server has become a current research hotspot.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, a server and a storage medium, and original multimedia data are represented by adopting a new characteristic expression, so that the processing efficiency of the server on data processing based on the new characteristic expression can be improved.
In one aspect, an embodiment of the present invention provides a data processing method, including:
acquiring an initial feature expression of target multimedia data to be processed, wherein the initial feature expression is used for reflecting N data features of the target multimedia data, and N is a positive integer greater than or equal to 1;
converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting a data feature of the target multimedia data;
acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data;
and re-determining a target characteristic expression of the target multimedia data according to the expression segment after the weighting processing, wherein the target characteristic expression is used for acquiring multimedia data associated with the target multimedia data.
In another aspect, an embodiment of the present invention provides a data processing apparatus, including:
an acquisition unit, configured to acquire an initial feature expression of target multimedia data to be processed, wherein the initial feature expression is used for reflecting N data features of the target multimedia data, and N is a positive integer greater than or equal to 1;
a conversion unit, configured to convert the initial feature expression into a plurality of expression segments, where each expression segment is used to reflect a data feature of the target multimedia data;
a processing unit, configured to acquire the importance degree of each data feature in the target multimedia data and to perform weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data;
and a determining unit, configured to re-determine a target feature expression of the target multimedia data according to the expression segment after the weighting processing, wherein the target feature expression is used for acquiring multimedia data associated with the target multimedia data.
In another aspect, an embodiment of the present invention provides a server, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, where the memory is used to store a computer program that supports a terminal to execute the foregoing method, where the computer program includes program instructions, and the processor is configured to call the program instructions to perform the following steps:
acquiring an initial characteristic expression of target multimedia data to be processed, wherein the initial characteristic expression is used for reflecting N data characteristics of the target multimedia data, and N is a positive integer greater than or equal to 1;
converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting a data feature of the target multimedia data;
acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data;
and re-determining a target characteristic expression of the target multimedia data according to the expression segment after weighting, wherein the target characteristic expression is used for acquiring multimedia data associated with the target multimedia data.
In still another aspect, an embodiment of the present invention provides a computer-readable storage medium in which program instructions are stored, and when the program instructions are executed by a processor, they perform the data processing method according to the first aspect.
In the embodiment of the invention, after the server acquires the initial feature expression of the target multimedia data to be processed, it converts the initial feature expression into a plurality of expression segments, each reflecting one data feature. This conversion gives the server expression segments for different data features, so that the target multimedia data can be expressed differentially according to the differences between its data features. The server then weights the expression segment corresponding to each data feature according to the importance degree of that data feature, and re-determines the target feature expression of the target multimedia data from the weighted expression segments. The weighting makes the differences between the expression segments of different data features more pronounced, which enhances the feature differences between the data features of the target multimedia data and allows the target feature expression to represent the target multimedia data more accurately.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a model structure of a target model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of a data processing method provided by an embodiment of the invention;
FIG. 3a is a schematic diagram of the generation of a plurality of expression fragments according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of determining the importance of data features according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of a data processing method provided by an embodiment of the invention;
FIG. 5a is a schematic diagram of model training according to an embodiment of the present invention;
FIG. 5b is a schematic flow chart of hierarchical clustering according to an embodiment of the present invention;
FIG. 5c is a schematic diagram of a hierarchy of objects obtained by hierarchical clustering according to an embodiment of the present invention;
FIG. 5d is a diagram illustrating a model structure of a target model according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a data processing apparatus provided by an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, sense the environment, acquire knowledge and use the knowledge to obtain the best result, in other words, it is a comprehensive technique of computer science, it attempts to understand the essence of Intelligence and produces a new intelligent machine that can react in a way similar to human Intelligence, i.e. it is to study the design principle and implementation method of various intelligent machines to make the machines have the functions of sensing, reasoning and decision making. The artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence basic technology generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like, and the artificial intelligence software technology mainly includes several directions such as computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and the like. 
The data processing method provided by the embodiment of the invention mainly relates to Natural Language Processing (NLP) and Machine Learning (ML) in the field of artificial intelligence. NLP is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between people and computers using natural language, and is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, and is closely connected with the study of linguistics. Natural language processing technology generally includes text processing, semantic understanding, machine translation, robot question answering, knowledge graphs and the like. Machine Learning (ML) is an interdisciplinary subject that draws on probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or realize human learning behavior to acquire new knowledge or skills, and how it can reorganize its existing knowledge structure to keep improving its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The data processing method provided by the embodiment of the invention determines the feature weights of different data features by determining different importance degrees of the data features of target multimedia data, so that expression segments of different data features of the target multimedia data can be weighted according to the feature weights, and the target multimedia data can be characterized by a plurality of expression segments, wherein the expression segments can be hash code vector segments of the target multimedia data, and the target multimedia data can be characterized by hash code vectors on multiple granularity levels.
After the server obtains the feature representation of the target multimedia data on multiple granularity levels, when the feature representation of the target multimedia data is adopted for data search, the server can perform data search by utilizing the hash code vector (namely the expression fragment) in a segmented manner based on different granularity levels, and can perform hierarchical data screening according to different data features based on the segmented feature representation, so that multimedia data similar to each level is obtained through search and output. In a specific implementation, the server may determine an initial feature expression of the target multimedia data based on feature analysis of the target multimedia data, so as to convert the initial feature expression into a plurality of expression segments, where it may be understood that the initial feature expression of the target multimedia data is generated based on all data features of the target multimedia data, and one expression segment is only used for reflecting one data feature of the target multimedia data, that is, if the data features of the target multimedia data are N, the initial feature expression is generated based on N data features of the target multimedia data, where N is a positive integer greater than or equal to 1.
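The segment-by-segment search described above can be pictured with a minimal sketch. It assumes that each item is stored as a list of hash-code segments ordered from the coarsest data feature to the finest, and that segments are compared by Hamming distance; the function names, the `max_dist` tolerance and the sample items are hypothetical and do not come from the patent.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def hierarchical_search(query, database, max_dist=0):
    """Filter candidates one segment (granularity level) at a time.

    query    -- list of hash-code segments, coarsest level first
    database -- dict mapping an item id to its list of segments
    max_dist -- assumed per-segment Hamming tolerance (hypothetical parameter)
    Returns the surviving candidates after each level.
    """
    candidates = list(database)
    per_level = []
    for level, q_seg in enumerate(query):
        # Only items that survived the coarser levels are compared again.
        candidates = [
            item for item in candidates
            if hamming(q_seg, database[item][level]) <= max_dist
        ]
        per_level.append(list(candidates))
    return per_level


# Hypothetical two-level codes: first segment is coarse, second is fine.
database = {
    "football_clip": [[1, 0, 1], [0, 1, 1]],
    "tennis_clip": [[1, 0, 1], [1, 0, 0]],
    "cooking_clip": [[0, 1, 0], [0, 1, 1]],
}
levels = hierarchical_search([[1, 0, 1], [0, 1, 1]], database)
```

Because each level only re-examines the survivors of the previous level, finer segments are compared against a shrinking candidate set rather than the whole database.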
In one embodiment, the server may invoke a trained target model to perform feature analysis on the target multimedia data to obtain an initial feature expression of the target multimedia data. The target model is based on a Variational Auto-Encoder (VAE), which includes an encoder and a decoder. The encoder can learn the distribution of the feature expressions of sample multimedia data, which makes the feature space more regular and gives the model a certain ability to generate new multimedia data different from the original sample multimedia data. The initial feature expression of the multimedia data may be an implicit vector (latent vector) of the multimedia data, or a feature vector of the multimedia data. The variational auto-encoder is developed from the auto-encoder, which likewise consists of an encoder (or encoding network) and a decoder (or decoding network). The two differ in that the encoder of an auto-encoder directly encodes the input multimedia data into a feature expression (such as an implicit vector z), whereas the variational auto-encoder first obtains a mean and a variance from the input multimedia data and then samples according to the obtained mean and variance to generate the feature expression (such as an implicit vector z) of the multimedia data. The model structure of the variational auto-encoder (that is, the model structure of the target model) may be as shown in FIG. 1. If the target multimedia data input into the variational auto-encoder is $x_i$, as shown in FIG. 1, then after $x_i$ passes through the encoder, a mean $\mu$ and a variance $\sigma$ are determined from the target multimedia data, and an implicit vector z of the target multimedia data is obtained by sampling based on $\mu$ and $\sigma$. The decoder then reconstructs the target multimedia data from the implicit vector z, yielding the reconstructed multimedia data $x'_i$ of the target multimedia data.
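As an illustration of the sampling step, the following minimal sketch draws an implicit vector from a per-dimension mean and variance. It assumes a diagonal Gaussian and the usual reparameterised form z = mu + sqrt(var) * eps, which the patent text does not spell out; the function name and the numeric values are hypothetical stand-ins for real encoder outputs.

```python
import math
import random


def sample_latent(mu, var, rng):
    """Draw z = mu + sqrt(var) * eps with eps ~ N(0, 1), per dimension.

    Assumes a diagonal Gaussian, which is an assumption of this sketch.
    """
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


rng = random.Random(0)
mu = [0.5, -1.0, 2.0]     # hypothetical encoder output: mean
var = [0.01, 0.04, 0.0]   # hypothetical encoder output: variance
z = sample_latent(mu, var, rng)
```

A dimension with zero variance is returned unchanged, which makes the relationship between the encoder outputs and the sampled vector easy to check.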
According to the model structure of the target model shown in FIG. 1, training the target model is a process of continuously adjusting its model parameters (or network parameters), that is, the parameters of the encoder (or encoding network) and the decoder (or decoding network), so as to continuously reduce the difference between the reconstructed multimedia data $x'_i$ generated by the target model and the original target multimedia data $x_i$, and to continuously reduce the relative entropy (that is, the Kullback-Leibler (KL) divergence) between the distribution determined by the mean $\mu$ and variance $\sigma$ and a standard normal distribution. In one embodiment, so that the target model learns the data association relationships between multimedia data, model training must not only continuously reduce the data difference between the original multimedia data and the reconstructed multimedia data, but also reduce the difference between the feature expressions that the target model generates for two multimedia data having an association relationship. That is, if two multimedia data having a data association relationship are $x_1$ and $x_2$, then once the target model has learned the data association relationships through training, the feature expression that the trained target model generates for $x_1$ and the one it generates for $x_2$ are similar to each other.
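The three training objectives named above (reconstruction difference, KL divergence to a standard normal distribution, and closeness of the feature expressions of associated data) can be combined in a sketch such as the following. The squared-error terms, the diagonal-Gaussian KL formula and the `pair_weight` parameter are assumptions chosen for illustration; the patent does not state the exact loss.

```python
import math


def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


def squared_error(a, b):
    """Sum of squared element-wise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def training_loss(x, x_rec, mu, var, z, z_assoc, pair_weight=1.0):
    """Reconstruction term + KL term + assumed pair-similarity term.

    z and z_assoc are the feature expressions of two associated items;
    pair_weight is a hypothetical balancing factor.
    """
    return (squared_error(x, x_rec)
            + kl_to_standard_normal(mu, var)
            + pair_weight * squared_error(z, z_assoc))


# A perfect reconstruction, a standard-normal posterior and identical
# feature expressions give zero loss.
loss = training_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0], [0.5], [0.5])
```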
After the target model learns the data association relationship among the multimedia data, the server calls the trained target model to generate initial feature expressions of the multimedia data with the data association relationship, the similarity between the initial feature expressions is high, the similarity between the initial feature expressions of the multimedia data without the data association relationship is low, and further the server can classify the multimedia data or search the data and the like based on the similarity between the initial feature expressions of different multimedia data. Specifically, the server can classify the multimedia data similar to the corresponding initial characteristic expressions into the same category, and classify the multimedia data with larger difference of the corresponding initial characteristic expressions into different categories; or, when the server searches data, the server may compare the initial feature expression of the search data with the feature expression of the multimedia data in the database, so as to use the multimedia data similar to the initial feature expression of the search data as the search result of the search data.
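A minimal sketch of search by similarity of initial feature expressions follows. Cosine similarity is used here purely as an assumed similarity measure, since the patent does not name one; the item ids and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, database, top_k=1):
    """Return the ids of the top_k items most similar to the query."""
    ranked = sorted(database, key=lambda k: cosine(query, database[k]), reverse=True)
    return ranked[:top_k]


# Hypothetical initial feature expressions stored in the database.
database = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
result = search([1.0, 0.05], database, top_k=2)
```

Every query is compared against every stored expression in this sketch, which is the exhaustive comparison that the segmented representation is intended to reduce.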
When the server searches or classifies data based on the initial feature expression, any two multimedia data must be compared in turn, which increases the data processing load on the server. To further speed up data search or classification, after obtaining the initial feature expression of the target multimedia data the server can convert it into a plurality of expression segments, so that each expression segment reflects one data feature of the multimedia data. The server can then weight each expression segment according to the importance degree of its data feature and re-determine the target feature expression of the target multimedia data from the weighted expression segments. Converting the initial feature expression gives the server expression segments reflecting the different data features of the target multimedia data, that is, a multi-granularity characterization of the target multimedia data. Weighting each expression segment enlarges the differences between the expression segments of different data features. When the weighted expression segments are used to re-determine the target feature expression, and data search or data classification is performed based on that target feature expression, the large differences between the expression segments of different data features improve the accuracy of the search or classification. In addition, searching on the different expression segments separately can improve the data processing speed of the server.
Referring to fig. 2, which is a schematic flow chart of a data processing method according to an embodiment of the present invention, as shown in fig. 2, the method may include:
S201, acquiring an initial characteristic expression of target multimedia data to be processed.
S202, converting the initial characteristic expression into a plurality of expression segments, wherein each expression segment is used for reflecting a data characteristic of the target multimedia data.
In steps S201 and S202, the initial feature expression is used to reflect N data features of the target multimedia data, where N is a positive integer greater than or equal to 1. It can be understood that the N data features reflected by the initial feature expression are all of the data features of the target multimedia data. After acquiring the target multimedia data, the server may input it into a trained target model and obtain the initial feature expression from that model. In a specific implementation, the server may invoke the coding network of the target model to perform feature analysis on the target multimedia data and use the output of the coding network as the initial feature expression. Because the output of the coding network is a hidden vector (or a feature vector), the initial feature expression obtained by the server may be that hidden vector (or feature vector). After obtaining the initial feature expression, the server may convert it to obtain the expression segment corresponding to each data feature, so that different expression segments reflect different data features of the target multimedia data.
In one embodiment, to convert the initial feature expression into a plurality of expression segments, the server may pass the initial feature expression through a multilayer linear network whose number of layers equals the number of expression segments to be generated. That is, if n expression segments are needed, the server passes the initial feature expression through n layers of linear networks to obtain n expression segments. The segment length of each expression segment is determined by the length of the target feature expression to be generated and by the number of expression segments: if the length of the target feature expression to be generated is L and the number of expression segments is n, where L and n are both positive integers greater than 1, the length of each expression segment obtained through the multilayer linear network is $[L/n]$. In one embodiment, if the initial feature expression is $z_i$ and the length of the target feature expression to be generated is L, the server passes $z_i$ through the n layers of linear networks as shown in FIG. 3a to obtain n expression segments $\{z'_{i1}, \dots, z'_{in}\}$, each of length $[L/n]$.
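The conversion into n segments can be sketched as follows. This sketch reads "n layers of linear networks" as n separate linear layers that each map the initial feature expression to one segment, and treats [L/n] as integer division; both readings are assumptions, and the randomly initialised parameters stand in for trained ones.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for one linear layer with random parameters."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_linear(layer, vec):
    """Compute W @ vec + b for a single layer."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def to_segments(z, layers):
    """Produce one expression segment per linear layer."""
    return [apply_linear(layer, z) for layer in layers]


rng = random.Random(0)
L, n = 12, 3                    # target length and number of segments
z_i = [0.2, -0.4, 0.7, 0.1]     # hypothetical initial feature expression
layers = [make_linear(len(z_i), L // n, rng) for _ in range(n)]
segments = to_segments(z_i, layers)
```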
In another embodiment, the segment lengths need not be equal. If the length of the target feature expression of the target multimedia data to be generated is L and the number of expression segments obtained through conversion is n, the segment length of each expression segment may be determined randomly, provided that the lengths of the n segments sum to L.
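One way to choose random segment lengths that sum to L is to pick n - 1 distinct cut points between 1 and L - 1. This is only a sketch of one possible scheme; the patent does not say how the random lengths are drawn, and the function name is hypothetical.

```python
import random


def random_segment_lengths(total, n, rng):
    """Split `total` into n positive integer lengths chosen at random.

    Requires n <= total so that every segment has at least one element.
    """
    cuts = sorted(rng.sample(range(1, total), n - 1))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


lengths = random_segment_lengths(16, 4, random.Random(0))
```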
In an embodiment, if the trained target model includes the multilayer linear network, the server may invoke the trained target model itself to convert the initial feature expression and obtain the expression segment corresponding to each data feature. If the trained target model does not include the multilayer linear network, the server may instead invoke another model that includes a multilayer linear network and input the initial feature expression obtained from the target model into that model, so as to convert the initial feature expression into a plurality of expression segments. After obtaining the plurality of expression segments, and in order to merge the importance degree of each data feature in the target multimedia data into the corresponding expression segment, the server may assign a different weight to each expression segment according to the importance degree of its data feature, that is, the server proceeds to step S203.
S203, acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data.
After the server obtains the plurality of expression segments, it can further determine the importance degree of each data feature in the target multimedia data. The importance degree of a data feature is determined by how much that data feature affects the accuracy with which the target multimedia data is divided. For example, when data search or classification is performed on the data features of the target multimedia data, if an error in the search (or classification) based on a certain data feature causes the search (or classification) based on the subsequent data features to be wrong as well, the server can consider that data feature to have a higher importance degree, that is, that data feature has a greater influence on the accuracy of dividing the target multimedia data. In one embodiment, the data features of the target multimedia data obtained by the server are hierarchically related, and the data range covered by a previous-layer data feature is larger than the data range covered by a next-layer data feature. For example, sports and football are hierarchically related data features: sports is the previous-layer data feature, football is the next-layer data feature, and the data range covered by sports is larger than that covered by football.
After the server determines the importance degree of each data feature in the target multimedia data, it may weight the expression segment corresponding to each data feature according to that importance degree, and re-determine the target feature expression of the target multimedia data from the weighted expression segments, that is, proceed to step S204. In an embodiment, the server may determine the importance degree of each data feature based on the hierarchical relationship of the data features. Specifically, of two data features having a hierarchical relationship, the server may regard the data feature in the previous layer as more important than the data feature in the next layer. When weighting the expression segments, the server may therefore use a larger weight value for the expression segment of the previous-layer data feature and a smaller weight value for the expression segment of the next-layer data feature. In one embodiment, if two data features having a hierarchical relationship are recorded in feature nodes of a hierarchical structure, the feature node recording the previous-layer data feature is closer to the top node (or root node) than the feature node recording the next-layer data feature. As shown in FIG. 3b, feature a and feature b are two data features having a hierarchical relationship; because the feature node recording feature a is closer to the root node than the feature node recording feature b, the server may consider the importance degree of feature a to be higher than that of feature b.
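The rule that a data feature closer to the root node receives a larger weight can be sketched as below. The geometric decay with depth and the normalisation to a sum of 1 are hypothetical choices made for illustration; the patent only requires that the previous-layer data feature gets the larger weight.

```python
def depth_weights(depths, decay=0.5):
    """Map each data feature's depth in the hierarchy to a normalised weight.

    depths -- dict of feature name -> depth, with the root at depth 0
    decay  -- assumed factor (0 < decay < 1) applied once per level
    """
    raw = {name: decay ** depth for name, depth in depths.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# "sports" sits one level above "football" in the assumed hierarchy.
weights = depth_weights({"sports": 1, "football": 2})
```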
S204, re-determining the target feature expression of the target multimedia data according to the weighted expression segments.
After determining the weighted expression segments, the server may re-determine the target feature expression of the target multimedia data from them. The initial feature expression of the target multimedia data determined by the server may be a hidden vector; the plurality of expression segments obtained by converting the initial feature expression are then hidden sub-vectors, and weighting each expression segment amounts to weighting each hidden sub-vector, so that the weighted expression segments are weighted hidden sub-vectors. In one embodiment, the server re-determines the target feature expression by recombining the weighted expression segments. Specifically, the server may recombine the weighted expression segments corresponding to the data features in descending order of the importance degree of the data features in the target multimedia data, or in ascending order of that importance degree. In the embodiment of the present invention, the order and manner in which the server combines the weighted expression segments are not limited, and the server may combine the expression segments of all the data features in any order or manner.
In one embodiment, if the expression segments obtained by the server are the n segments $\{z'_{i1}, \dots, z'_{in}\}$ described above, then after determining the importance degree of the data feature corresponding to each expression segment and weighting the n expression segments based on the determined importance degrees, the server obtains n weighted expression segments, which may be represented by formula (1):
$\{\mathrm{weight}_1 \cdot z'_{i1}, \dots, \mathrm{weight}_n \cdot z'_{in}\}$ (1)
wherein $\mathrm{weight}_1$ to $\mathrm{weight}_n$ are the weight values, determined by the server based on the importance degree of each data feature, for weighting the expression segment of each data feature, and $z'_{i1}$ to $z'_{in}$ are the expression segments respectively corresponding to the n data features of the target multimedia data.
After determining the weighted expression segments, in order to further determine the target feature expression of the target multimedia data, the server may combine the weighted expression segments into a combined expression segment, which is represented by formula (2):
$z'_i = [\,\mathrm{weight}_1 \cdot z'_{i1}, \dots, \mathrm{weight}_n \cdot z'_{in}\,]$ (2)
wherein the first component is $z'_{i1}$ of formula (1) weighted by $\mathrm{weight}_1$ and, by analogy, the last component is $z'_{in}$ of formula (1) weighted by $\mathrm{weight}_n$.
In an embodiment, after combining the weighted expression segments into the combined expression segment shown in formula (2), the server may directly use the combined expression segment as the target feature expression of the target multimedia data, that is, the target feature expression is the hidden vector (or feature vector, or hash code vector) $z'_i$ indicated by formula (2). Alternatively, the server may binarize the combined expression segment and use the binarized expression as the target feature expression of the target multimedia data. Binarizing the expression indicated by formula (2) yields a hash code of the target multimedia data represented by 0/1. Specifically, the combined expression segment is binarized using formula (3):
$h_i = \mathrm{binarize}(z'_i)$ (3)
wherein $z'_i$ is the combined expression segment, $h_i$ is the binarized expression segment, and binarize denotes binarizing the expression segment indicated by $z'_i$. After binarizing the combined expression segment to obtain the hash code, the server may use the hash code as the target feature expression of the target multimedia data. A hash code is a string of codes consisting only of 0/1 that is obtained from multimedia data (including text data or image data) by a hash algorithm. In an embodiment, the target feature expression obtained by the server is used to obtain multimedia data associated with the target multimedia data. Since the target feature expression is built from the weighted expression segments, the server can proceed segment by segment, sequentially obtaining the multimedia data associated with each expression segment (that is, multimedia data whose similarity to that expression segment satisfies a preset similarity threshold), which reduces the data query amount of the server and improves its data processing capability.
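The weighting, combination, and binarization of formulas (1) to (3) can be sketched as follows. This is a minimal illustration rather than the method as claimed: the function names, the example values, and in particular the thresholding rule used for `binarize` are assumptions, since the text does not state how binarization is carried out.

```python
from typing import List, Sequence


def weight_segments(segments: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[List[float]]:
    """Formula (1): scale each expression segment by its weight value."""
    if len(segments) != len(weights):
        raise ValueError("one weight is required per expression segment")
    return [[w * v for v in seg] for seg, w in zip(segments, weights)]


def combine_segments(weighted: Sequence[Sequence[float]]) -> List[float]:
    """Formula (2): concatenate the weighted segments in the given order."""
    return [v for seg in weighted for v in seg]


def binarize(z: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Formula (3), ASSUMED rule: 1 if the component exceeds the threshold."""
    return [1 if v > threshold else 0 for v in z]


# Hypothetical segments, ordered from most to least important data feature.
segments = [[0.8, -0.2], [0.1, 0.4], [-0.3, 0.05]]
weights = [1.0, 0.5, 0.25]

z_prime = combine_segments(weight_segments(segments, weights))
h = binarize(z_prime)
```

Note that with a zero threshold and positive weights the resulting bits do not depend on the weights; under this assumed rule the weighting only affects the real-valued combined expression, so a different threshold or binarization rule would be needed for the weights to change the hash code.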
In one embodiment, the target multimedia data may be text data, image data, or audio data. More specifically, the text data may be, for example, a medical text or an educational text; the image data may be a medical image or an educational image; and the audio data may be a music file, a video file, or the like. If the target multimedia data is a medical text, after acquiring the medical text the server can call the trained target model to perform feature analysis on it and obtain the initial feature expression of the medical text. The initial feature expression is generated by the target model according to all the data features of the medical text, that is, it reflects all the data features of the medical text. After the initial feature expression of the medical text is generated, in order to represent the data features of the medical text hierarchically, the server may convert the initial feature expression into a plurality of expression segments, so that each expression segment reflects one data feature of the medical text.
In an embodiment, based on the expression segment corresponding to each data feature of the medical text, the server may weight each expression segment according to the importance degree of the corresponding data feature in the multimedia data, and determine the target feature expression of the target multimedia data from the weighted expression segments. Furthermore, the server may perform data search (or data classification) based on the target feature expressions of different medical texts. Specifically, when searching for medical texts, the server may first call the target model to obtain the initial feature expression of the search text, and then determine the target feature expression of the search text by converting that initial feature expression and dividing the data features of the search text by importance degree. After the target feature expression of each medical text in the medical text library has been determined, the server first takes the expression segment of the most important data feature in the search text and obtains, from the medical text library, the reference medical texts whose target feature expressions match that expression segment. The server then determines, from the obtained reference medical texts, the texts that match the expression segment of the second most important data feature in the search text, and continues screening in this way segment by segment, so that the medical texts that best match the search text are screened out.
In the screening process of the server, screening of each layer is executed based on the screening result of the upper layer, so that the screening efficiency of the server for the medical texts can be improved. In addition, when the target multimedia data is other text data, or when the target multimedia data is image data or audio/video data, the corresponding target characteristic expression can be determined according to the method, so that data search is performed.
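The layer-by-layer screening described above can be sketched as a coarse-to-fine filter over binary expression segments. The sketch assumes that every text is stored as a list of binary segments ordered from the most to the least important data feature, that matching is decided by a Hamming-distance threshold, and that screening stops when a layer would leave no candidates; all names, the threshold, and the stopping rule are hypothetical.

```python
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def layered_search(query: Sequence[Sequence[int]],
                   library: Dict[str, List[List[int]]],
                   max_dist: int = 0) -> List[str]:
    """Screen candidates segment by segment, most important segment first.

    Each layer only inspects the candidates kept by the previous layer,
    so later layers compare against fewer texts.
    """
    candidates = list(library)
    for level, q_seg in enumerate(query):
        kept = [doc for doc in candidates
                if hamming(q_seg, library[doc][level]) <= max_dist]
        if not kept:
            break  # assumed rule: fall back to the previous layer's result
        candidates = kept
    return candidates


# Hypothetical library: two binary segments per text.
library = {
    "t1": [[1, 0], [1, 1]],
    "t2": [[1, 0], [0, 0]],
    "t3": [[0, 1], [1, 1]],
}
exact = layered_search([[1, 0], [1, 1]], library)
partial = layered_search([[1, 0], [0, 1]], library)
```

Here `exact` keeps only the text that matches on both segments, while `partial` returns the texts that matched the first (most important) segment because no text matches the second one.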
In the embodiment of the invention, after the server acquires the initial feature expression of the target multimedia data to be processed, the initial feature expression is converted into a plurality of expression segments, so that each expression segment reflects one data feature, and based on the conversion of the initial feature expression of the target multimedia data, the server can acquire the expression segments under different data features, so that the target multimedia data can be differentially expressed based on the difference of the data features. After the server obtains the expression segment corresponding to each data feature, further, the server can perform weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature, so as to obtain the expression segment after weighting processing, and re-determine the target feature expression of the target multimedia data based on the expression segment after weighting processing, so that the difference of the expression segments corresponding to each data feature can be more obvious based on the weighting processing of the expression segments corresponding to each data feature by the server, thereby realizing the enhancement of the feature difference between different data features of the target multimedia data, and further enabling the representation of the target multimedia data by adopting the target feature expression to be more accurate.
Referring to fig. 4, which is a schematic flow chart of a data processing method according to an embodiment of the present invention, as shown in fig. 4, the method may include:
S401, obtaining target multimedia data to be processed and a trained target model.
S402, inputting the target multimedia data into the trained target model, and obtaining an initial characteristic expression of the target multimedia data from the trained target model.
In step S401 and step S402, the target model is a model based on a variational autoencoder, and training the target model is a process of continuously optimizing its network parameters. To enable the target model to learn the data association relationship between multimedia data, when training the target model with a sample multimedia data set, the server can determine at least one sample multimedia data included in the sample multimedia data set and the data association relationship between any two sample multimedia data, and adjust the network parameters of the target model according to the data association relationship and the at least one sample multimedia data to obtain the trained target model. In a specific implementation, the sample used by the server to train the target model is a triplet $(x_i, x_j, w_{ij})$, where $x_i$ is one sample multimedia data, $x_j$ is another sample multimedia data, and $w_{ij}$ is the data association relationship between $x_i$ and $x_j$. When $w_{ij} = 0$, there is no data association relationship between the sample multimedia data $x_i$ and $x_j$; when $w_{ij} = 1$, there is a data association relationship between them. In one embodiment, $x_i$ and $x_j$ may also be TF-IDF representations of the sample multimedia data (TF-IDF is a weighting technique commonly used in information retrieval and data mining), where TF is the term frequency and IDF is the inverse document frequency.
After the samples for training the target model are acquired, the target model is trained as shown in fig. 5a. The target model receives the triplet $(x_i, x_j, w_{ij})$ and first passes it through the encoding network (or encoder), which separately learns the parameters $(\mu_i, \sigma_i)$ and $(\mu_j, \sigma_j)$ of the distributions of the hidden vectors $z_i$ and $z_j$ corresponding to $x_i$ and $x_j$ (each distribution is assumed to be a normal distribution, and the learned parameters are its mean $\mu$ and variance $\sigma$). After the distribution parameters are learned, the hidden vectors $z_i$ and $z_j$ are sampled from the two distributions and input to the decoding network, which reconstructs them into the reconstructed multimedia data $x'_i$ and $x'_j$ of the original multimedia data. The decoding network also reconstructs, from $x'_i$ and $x'_j$, the data association relationship between the original sample multimedia data, obtaining $w'_{ij}$, which represents the relationship between $z_i$ and $z_j$.
In one embodiment, the coding network (or encoder) in the target model is composed of two linear layers, and the specific structure of the two linear layers can be shown as formulas (4) and (5):
$t_1 = \mathrm{ReLU}(W_1 x + b_1)$ (4)
$t_2 = \mathrm{ReLU}(W_2 t_1 + b_2)$ (5)
wherein ReLU is the rectified linear unit function, $t_1$ is the output of the first-layer linear network, $W_1$ is the network weight (network parameter) of the first-layer linear network, $b_1$ is the bias of the first-layer linear network, and $x$ is the sample multimedia data input to the first-layer linear network, which may be $x_i$ or $x_j$ above; $t_2$ is the output of the second-layer linear network, $W_2$ is the network weight (network parameter) of the second-layer linear network, and $b_2$ is the bias of the second-layer linear network. Based on this two-layer linear network structure, the encoder of the target model determines the parameters of the hidden-vector distribution to be learned, namely the mean $\mu$ and the variance $\sigma$, according to formulas (6) and (7):
$\mu = W_3 t_2 + b_3$ (6)
$\log \sigma = W_4 t_2 + b_4$ (7)
wherein $W_3$ and $W_4$ are the network parameters of two further layers of the encoder, and $b_3$ and $b_4$ are the biases of those two layers. Further, the target model may obtain the hidden vector $z$ by sampling according to formula (8):
$z \sim N(\mu(x), \mathrm{diag}(\sigma^2(x)))$ (8)
where $N$ denotes the normal distribution that $z$ obeys and diag is a function that constructs a diagonal matrix. Formula (8) thus indicates that the hidden vector $z$ is sampled from the normal distribution determined by the mean $\mu$ and the variance $\sigma^2$ learned for $z$.
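As an illustration of formulas (4) to (8), the following pure-Python sketch runs one forward pass of such an encoder and samples a hidden vector. The layer sizes, the random initialisation, and the helper names are hypothetical. Sampling is written as $z = \mu + \sigma \cdot \epsilon$ with $\epsilon \sim N(0, I)$, which draws from the distribution in formula (8) when $\sigma$ is read as the standard deviation; that reading is an assumption based on the $\sigma^2$ appearing in formula (8).

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map W x + b for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Hypothetical initialisation: small uniform weights, zero bias."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def encode(x: Vector, p: Dict[str, Tuple[Matrix, Vector]],
           rng: random.Random) -> Tuple[Vector, Vector, Vector]:
    t1 = relu(linear(*p["layer1"], x))        # formula (4)
    t2 = relu(linear(*p["layer2"], t1))       # formula (5)
    mu = linear(*p["mean"], t2)               # formula (6)
    log_sigma = linear(*p["log_sigma"], t2)   # formula (7)
    sigma = [math.exp(s) for s in log_sigma]
    # formula (8): z ~ N(mu, diag(sigma^2)), drawn as mu + sigma * eps
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return mu, sigma, z


rng = random.Random(0)
params = {
    "layer1": init_layer(3, 4, rng),
    "layer2": init_layer(3, 3, rng),
    "mean": init_layer(2, 3, rng),
    "log_sigma": init_layer(2, 3, rng),
}
x = [0.2, 0.0, 0.5, 0.3]  # e.g. a tiny TF-IDF-like input vector
mu, sigma, z = encode(x, params, rng)
```

Predicting $\log \sigma$ rather than $\sigma$, as formula (7) does, keeps the scale positive after exponentiation without constraining the linear layer.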
In one embodiment, the decoding network of the target model is used to reconstruct the generated hidden vector $z$ into multimedia data (or a TF-IDF representation of multimedia data). Specifically, the decoding network comprises a fully-connected layer and a softmax layer (a logistic regression network layer). It can be understood that the training of the target model covers the following five aspects:
1) reducing the reconstruction loss between the original sample multimedia data $x_i$ and the corresponding reconstructed multimedia data $x'_i$;
2) reducing the reconstruction loss between the original sample multimedia data $x_j$ and the corresponding reconstructed multimedia data $x'_j$;
3) reducing the reconstruction loss between the data association relationship $w_{ij}$ of the original sample multimedia data $x_i$ and $x_j$ and the correlation $w'_{ij}$ of the corresponding hidden vectors (initial feature expressions) $z_i$ and $z_j$;
4) reducing the KL divergence between the distribution $(\mu_i, \sigma_i)$ of the initial feature expression $z_i$ of the original sample multimedia data $x_i$ and the standard normal distribution;
5) reducing the KL divergence between the distribution $(\mu_j, \sigma_j)$ of the initial feature expression $z_j$ of the original sample multimedia data $x_j$ and the standard normal distribution.
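The five training objectives listed above can be summed into a single loss, as in the sketch below. The text does not specify the individual loss functions, so the cross-entropy reconstruction terms, the binary cross-entropy on the association label, and the unweighted sum are assumptions; only the closed-form KL divergence between a diagonal normal distribution and the standard normal distribution is standard. All names and example values are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def reconstruction_loss(x: Sequence[float], x_rec: Sequence[float]) -> float:
    """Assumed form: cross-entropy between x and a softmax-style output x_rec."""
    return -sum(a * math.log(b + EPS) for a, b in zip(x, x_rec))


def association_loss(w: float, w_rec: float) -> float:
    """Assumed form: binary cross-entropy between w_ij and w'_ij."""
    return -(w * math.log(w_rec + EPS) + (1.0 - w) * math.log(1.0 - w_rec + EPS))


def kl_to_standard_normal(mu: Sequence[float], sigma: Sequence[float]) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))


def total_loss(x_i, x_i_rec, x_j, x_j_rec, w_ij, w_ij_rec,
               mu_i, sigma_i, mu_j, sigma_j) -> float:
    return (reconstruction_loss(x_i, x_i_rec)       # aspect 1
            + reconstruction_loss(x_j, x_j_rec)     # aspect 2
            + association_loss(w_ij, w_ij_rec)      # aspect 3
            + kl_to_standard_normal(mu_i, sigma_i)  # aspect 4
            + kl_to_standard_normal(mu_j, sigma_j)) # aspect 5


loss = total_loss(
    x_i=[0.5, 0.5, 0.0], x_i_rec=[0.4, 0.4, 0.2],
    x_j=[0.0, 0.5, 0.5], x_j_rec=[0.2, 0.3, 0.5],
    w_ij=1.0, w_ij_rec=0.9,
    mu_i=[0.1, -0.2], sigma_i=[1.0, 0.8],
    mu_j=[0.0, 0.3], sigma_j=[1.2, 1.0],
)
```

In practice the five terms are often given separate weighting coefficients; the equal weighting here is only for brevity.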
It can be understood that, when adjusting the network parameters of the target model according to the data association relationship and the sample multimedia data, the server first calls the encoding network of the target model to perform feature analysis on any two sample multimedia data (such as $x_i$ and $x_j$ above), obtaining the initial feature expression of each sample multimedia data (such as $z_i$ for $x_i$ and $z_j$ for $x_j$) and the correlation between the initial feature expressions (that is, $w'_{ij}$ above). The server then calls the decoding network of the target model to perform data reconstruction on the initial feature expressions, obtaining the reconstructed multimedia data of each sample multimedia data, such as $x'_i$ for the original sample multimedia data $x_i$ and $x'_j$ for the original sample multimedia data $x_j$. The server adjusts the network parameters of the target model in the direction that reduces both the data difference between each sample multimedia data and its reconstructed multimedia data and the difference between the data association relationship and the correlation of the feature expressions. Training in this way enables the trained target model to learn the data association relationship among multimedia data: the initial feature expressions (that is, hidden vectors) that the trained target model generates for multimedia data having a data association relationship are similar (the similarity is greater than or equal to a preset similarity threshold), and the initial feature expressions it generates for multimedia data without a data association relationship are dissimilar (the similarity is less than the preset similarity threshold).
In an embodiment, after the server finishes training the target model, the server may invoke the target model that learns the data association relationship of the multimedia data (i.e., the trained target model) to process the target multimedia data to be processed, so as to obtain an initial feature expression in which the data association relationship of the target multimedia data is fused, and generate a target feature expression having a hierarchical structure based on the initial feature expression.
S403, converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting a data feature of the target multimedia data.
In an embodiment, after the initial feature expression is obtained, in order to obtain a target feature expression with a hierarchical structure, the server may convert the initial feature expression into a plurality of expression segments, so that each expression segment reflects one data feature of the target multimedia data. Specifically, the initial feature expression includes a hidden vector; the server performs vector conversion on the hidden vector to obtain a plurality of hidden sub-vectors and uses each hidden sub-vector as one expression segment. In a specific implementation, if the hidden vector (initial feature expression) determined by the server is $z_i$, the server may first pass $z_i$ through n layers of linear networks, so that $z_i$ undergoes feature reconstruction under the guidance of the different data features, yielding the reconstructed multimedia data $\{x'_{i1}, \dots, x'_{in}\}$ under the different data features. The server may then generate the expression segment of each data feature from the reconstructed multimedia data under that data feature; the resulting expression segments may be $\{z'_{i1}, \dots, z'_{in}\}$, each of which reflects one data feature of the target multimedia data.
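The conversion of a hidden vector into n expression segments can be pictured with the simplified sketch below. It assumes that the n linear networks act as n parallel linear maps, one per data feature, and it maps the hidden vector directly to each segment, skipping the intermediate reconstruction of $x'_{i1}, \dots, x'_{in}$ described above. The names and the hand-picked weights are hypothetical; in a trained model the weights would be learned.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def linear(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Affine map W x + b for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def split_into_segments(z: Sequence[float],
                        heads: Sequence[Tuple[Matrix, Vector]]) -> List[Vector]:
    """Apply one linear head per data feature to the same hidden vector."""
    return [linear(W, b, z) for W, b in heads]


# Two hypothetical heads; these particular weights simply select
# the first and the second half of the hidden vector.
heads = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.0]),
    ([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0]),
]
z_i = [0.5, -1.0, 2.0, 0.25]
segments = split_into_segments(z_i, heads)
```

Each head sees the whole hidden vector, so a learned head is free to mix components rather than slice them as the example weights do.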
After the server obtains the expression segments for representing each data feature, the expression segments corresponding to each data feature can be weighted based on the importance degree of each data feature in the target multimedia data, so that the target feature expression of the target multimedia data is determined based on the weighted expression segments.
S404, obtaining the importance degree of each data feature in the target multimedia data, and carrying out weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data.
S405, re-determining the target characteristic expression of the target multimedia data according to the expression segment after weighting processing.
In steps S404 and S405, when the server obtains the importance degree of each data feature in the target multimedia data, the server may first perform clustering processing on the multiple data features of the target multimedia data to obtain a hierarchical relationship between the multiple data features of the target multimedia data, so as to determine the importance degree between the data features based on the hierarchical relationship, where the importance degree of an upper data feature having the hierarchical relationship is higher than the importance degree of a lower data feature. In one embodiment, in order to perform clustering processing on the data features of the target multimedia data so as to obtain the hierarchical relationship between the plurality of data features of the target multimedia data, the server needs to first obtain a target hierarchical structure for recording different data features, wherein in order to obtain the target hierarchical structure for recording different data features, the server may perform clustering processing on the data features of the sample multimedia data so that the data features are aggregated into different hierarchies.
In one embodiment, in order to cluster the data features of the sample multimedia data and generate a hierarchy, the server may cluster the data features of each sample multimedia data in the sample multimedia data set by using a hierarchical clustering method. Hierarchical clustering is a kind of clustering method that can aggregate data into clusters at multiple levels of granularity, from fine to coarse. The hierarchical clustering method used here is incremental and may be, for example, the COBWEB clustering method (an incremental conceptual clustering method). The COBWEB clustering method sorts the sample multimedia data to be clustered into a classification tree one by one, and does not require the number of clusters or the number of layers to be specified in advance, so that the generated hierarchical structure can be adjusted dynamically.
When a hierarchical clustering method is adopted to cluster a sample multimedia data set to obtain a target hierarchical structure for recording different data characteristics, an initial hierarchical structure and the sample multimedia data set can be obtained first, the sample multimedia data set comprises a plurality of sample multimedia data and at least one data characteristic included in each sample multimedia data, wherein the initial hierarchical structure at least comprises a top node, and further, according to the at least one data characteristic included in any sample multimedia data, a server can perform structure adjustment on the initial hierarchical structure and take the adjusted initial hierarchical structure as the target hierarchical structure. The COBWEB clustering method mainly constructs a classification tree (i.e., the target hierarchy) through four operations, which are: adding a new cluster, adding the current multimedia data into the cluster, splitting the cluster, and combining the two clusters, that is, the server can perform structural adjustment on the initial hierarchy by using one or more of the following four adjustment operations: adding feature nodes to the initial hierarchy (i.e. adding one new cluster), merging feature nodes of the initial hierarchy (i.e. merging two clusters), splitting feature nodes of the initial hierarchy (i.e. splitting one cluster), and determining the correspondence between any sample multimedia data and the feature path in the initial hierarchy (i.e. adding the current multimedia data to the cluster).
Based on the above four adjustment operations for the initial hierarchical structure, when the server performs the structural adjustment on the initial hierarchical structure according to at least one data feature included in any sample multimedia data, the server may determine, according to at least one data feature included in any sample multimedia data, a classification utility parameter corresponding to any one of the adjustment operations to be performed; and adjusting the initial hierarchical structure according to the adjustment operation corresponding to the maximum classification utility parameter. In one embodiment, the server may determine a classification Utility parameter (CU) corresponding to any one of the adjustment operations according to equation (9), which is defined as:
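$CU = \dfrac{1}{k} \sum_{l=1}^{k} P(C_l) \left[ \sum_{i} \sum_{j} P(A_i = V_{i,j} \mid C_l)^2 - \sum_{i} \sum_{j} P(A_i = V_{i,j} \mid C_p)^2 \right]$ (9)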
where k is the number of layers (or clusters) in the target hierarchy, $A_i$ denotes the i-th dimension feature of any sample multimedia data, $V_{i,j}$ denotes the j-th possible value of that sample multimedia data under the i-th dimension feature, and $C_1, C_2, \dots, C_k$ are the sub-clusters of a larger cluster $C_p$. Specifically, the server may invoke code to adjust an initial hierarchical structure for recording different data features based on each sample multimedia data in the obtained sample multimedia data set, thereby obtaining the target hierarchical structure for recording different data features. The server may invoke COBWEB pseudo code to perform the adjustment of the initial hierarchical structure; the execution logic of the COBWEB pseudo code is shown in fig. 5b. Specifically, the pseudo code is invoked to execute steps s11 to s19 as follows:
s11, obtaining an initial hierarchical structure;
s12, determining whether the initial hierarchical structure includes child nodes in addition to the top node (i.e., the root node);
s13, if the initial hierarchical structure does not include child nodes, that is, it includes only the top node, adding a child node to the top node and determining, based on the sample multimedia data, the data feature recorded by the added child node;
s14, determining, by analyzing the data features of any multimedia data in the sample multimedia data, whether the adjustment operation for the child node is to split the child node or to add the data features of the multimedia data to the child node, thereby adjusting the initial hierarchical structure;
s15, if the initial hierarchical structure includes child nodes, calculating, for the included child nodes, the classification utility parameter (CU) corresponding to each adjustment operation performed for any sample multimedia data;
s16, if the CU corresponding to the operation of splitting a node is the largest, splitting one node into two nodes;
s17, if the CU corresponding to the operation of merging nodes is the largest, merging the two nodes;
s18, if the CU corresponding to the operation of deleting a node is the largest, deleting the node;
s19, if the CU corresponding to the operation of adding the sample multimedia data to a node is the largest, adding the sample multimedia data to the node.
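The selection of an adjustment operation by its classification utility can be sketched as follows. This is a reduced illustration, not the pseudo code of fig. 5b: it scores only two kinds of operation (adding the item to an existing cluster, or creating a new cluster), omits merging, splitting, and deletion, and uses the common category utility definition for nominal attributes, which is assumed to correspond to formula (9). The function names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Item = Dict[str, str]  # attribute name -> nominal value


def sum_squared_probs(items: List[Item]) -> float:
    """Sum over attributes i and values j of P(A_i = V_ij | cluster)^2."""
    if not items:
        return 0.0
    n = len(items)
    counts = Counter((attr, val) for it in items for attr, val in it.items())
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(clusters: List[List[Item]]) -> float:
    """CU of a partition, measured against the union of its clusters."""
    parent = [it for cluster in clusters for it in cluster]
    n = len(parent)
    base = sum_squared_probs(parent)
    gain = sum(len(c) / n * (sum_squared_probs(c) - base) for c in clusters)
    return gain / len(clusters)


def best_placement(clusters: List[List[Item]], item: Item) -> Tuple[str, float]:
    """Score each candidate operation and keep the one with the largest CU."""
    options: Dict[str, float] = {}
    for idx in range(len(clusters)):
        trial = [c + [item] if i == idx else c for i, c in enumerate(clusters)]
        options[f"add_to_cluster_{idx}"] = category_utility(trial)
    options["new_cluster"] = category_utility(clusters + [[item]])
    best = max(options, key=lambda name: options[name])
    return best, options[best]


football = {"topic": "sports", "sub": "football"}
stocks = {"topic": "finance", "sub": "stocks"}
clusters = [[football, dict(football)], [stocks]]
operation, cu = best_placement(clusters, dict(stocks))
```

In this example the new finance item is placed in the existing finance cluster, since that partition separates the attribute values most cleanly.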
In an embodiment, a target hierarchical structure determined by the server may be as shown in fig. 5c. The target hierarchical structure includes at least one feature path, each feature path includes at least one feature node, and each feature node records one data feature. An edge connecting any two feature nodes in a feature path indicates that the data features recorded by those two feature nodes have a hierarchical relationship. For example, fig. 5c shows a feature node recording the data feature sports and a feature node recording the data feature football; since the two feature nodes are connected by an edge, the data features they record have a hierarchical relationship. After obtaining the target hierarchical structure for recording data features, the server may cluster the plurality of data features of the target multimedia data and determine, from the target hierarchical structure, a target feature path that describes the hierarchical relationship between the plurality of data features of the target multimedia data. For any two feature nodes connected by an edge on the target feature path, the server may then take the data feature recorded by the lower-level feature node as the upper-layer data feature and the data feature recorded by the higher-level feature node as the lower-layer data feature.
Based on the determined target hierarchical structure for recording different data features, the server may determine the importance degree of each data feature in the target multimedia data, weight the expression segment corresponding to each data feature according to that importance degree, and re-determine the target feature expression of the target multimedia data from the weighted expression segments. In one embodiment, in order to combine hash codes at different levels, the server first needs to define a weight for each level. This is because, when data search (or classification) is performed using multiple data features of multimedia data, an error made at a lower-level feature node (closer to the root node) makes the subsequent search (or classification) very likely to be wrong as well, that is, the search performed on the data features recorded by the higher-level feature nodes (farther from the root node) is then unlikely to find similar multimedia data correctly. It can be seen that the lower the level of a data feature, the higher its importance degree. If the server has determined the target feature path of the target multimedia data in the target hierarchical structure, the server may determine the hierarchical relationship of each data feature in the target multimedia data based on that feature path. If the level of any data feature is k and the total number of levels is n, the importance degree of that data feature may be represented by formula (10):
[Equation (10): formula image BDA0002753629490000191, not reproduced in this text]
where k is the level of any data feature of the target multimedia data, and n is the total level determined by clustering all data features of the target multimedia data. In one embodiment, the trained target model includes an encoding network, a decoding network, and a network structure for generating a target feature expression, as shown in fig. 5d. In this case, the server may input any two sample multimedia data x_i and x_j from the sample multimedia data set into the target model. The encoding network of the target model learns the data features of the two sample multimedia data and the data association relationship between them, so that the encoding network parameters and the decoding network parameters of the target model are updated. The initial feature expression of the target multimedia data generated by the trained encoding network and decoding network may then be input into the network structure for generating the target feature expression. If the initial feature expression is z_i, z_i may be input into that network structure, which splits the initial feature expression into expression segments and weights them to obtain the target feature expression of the target multimedia data. The target feature expression may be, for example, the expression indicated by z'_i or, alternatively, the hash code sequence indicated by h_i that is obtained by applying semantic hashing to the expression indicated by p'_i. Semantic hashing is a hashing algorithm that maps a high-dimensional space vector to a low-dimensional Hamming space while preserving the similarity of the original space vectors, so that the Hamming distance between the new space vectors reflects the similarity of the original space vectors.
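The relationship between hash codes and Hamming distance can be illustrated with a minimal Python sketch. The embodiments do not specify how the semantic hashing is computed; the sketch below simply assumes a fixed threshold that turns each real-valued component into one bit, and the function names and example vectors are hypothetical.

```python
from typing import List, Sequence


def binarize(vector: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Map a real-valued vector to a bit sequence (assumed thresholding rule)."""
    return [1 if value > threshold else 0 for value in vector]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical feature expressions: z1 and z2 are similar, z3 is dissimilar.
z1 = [0.9, -0.2, 0.4, -0.7]
z2 = [0.8, -0.1, 0.5, -0.6]
z3 = [-0.9, 0.3, -0.4, 0.6]

h1, h2, h3 = binarize(z1), binarize(z2), binarize(z3)
print(hamming_distance(h1, h2))  # 0
print(hamming_distance(h1, h3))  # 4
```

Under this assumed rule, similar vectors receive hash codes with a small Hamming distance, which is the property the paragraph above attributes to semantic hashing.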
It can be understood that, in a data search scenario, the importance degree of a data feature is positively correlated with the accuracy of a data search performed according to that data feature. After re-determining the target feature expression of the target multimedia data according to the weighted expression segments, the server may also obtain the expression segments of the corresponding data features in order of importance degree from high to low, sequentially determine reference multimedia data whose similarity to the obtained expression segments meets a preset threshold, and use the obtained reference multimedia data as search data of the target multimedia data. With this hierarchical screening, the server screens the multimedia data that meets the search requirement based on the expression segment corresponding to the data feature of each layer. When screening on the data feature of the next layer, only the multimedia data that met the search requirement of the previous layer is compared, rather than the full set of multimedia data, which reduces the data processing pressure on the server and improves the processing efficiency of the server.
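The layer-by-layer screening described above can be sketched as follows. This is an illustrative Python sketch only: it assumes that expression segments are bit sequences, that similarity is the fraction of matching bits, and that the preset threshold is 0.75; the function names and the candidate data are hypothetical.

```python
from typing import Dict, List, Sequence


def segment_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of matching bits between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def hierarchical_search(
    query_segments: List[List[int]],
    importances: List[float],
    candidates: Dict[str, List[List[int]]],
    threshold: float = 0.75,
) -> List[str]:
    """Screen candidates segment by segment, most important segment first."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    survivors = list(candidates)
    for idx in order:
        # Only candidates that passed the previous layer are compared again.
        survivors = [
            name
            for name in survivors
            if segment_similarity(query_segments[idx], candidates[name][idx]) >= threshold
        ]
        if not survivors:
            break
    return survivors


query = [[1, 0, 1, 1], [0, 0, 1, 0]]
importances = [1.0, 0.5]
candidates = {
    "a": [[1, 0, 1, 1], [0, 0, 1, 0]],  # matches on both layers
    "b": [[1, 0, 1, 0], [1, 1, 0, 1]],  # passes the first layer only
    "c": [[0, 1, 0, 0], [0, 0, 1, 0]],  # rejected on the first layer
}
result = hierarchical_search(query, importances, candidates)
print(result)  # ['a']
```

Candidate "c" is never compared on the second layer, which is the saving in comparisons that the hierarchical screening is intended to provide.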
In the embodiment of the invention, because the server trains the target model, the target model can learn the data association relationship between multimedia data, so that the initial feature expression generated for multimedia data after training carries richer information. Meanwhile, because a hierarchical clustering method is incorporated when the target feature expression of the target multimedia data is generated, expression segments are learned on a plurality of levels, and the expression segments on the multiple levels are fused to form the final target feature expression of the target multimedia data, which enriches the semantic information of the target feature expression. In addition, data can be searched (or classified) hierarchically from top to bottom using the hierarchical structure, without comparing the feature similarity of all multimedia data in the whole multimedia data set, which greatly reduces the data search time and effectively reduces the data processing pressure on the server.
Based on the description of the above data processing method embodiment, an embodiment of the present invention further provides a data processing apparatus, which may be a computer program (including program code) running in the server. The data processing apparatus may be configured to execute the data processing method shown in fig. 2 and fig. 4. Referring to fig. 6, the data processing apparatus includes: an obtaining unit 601, a conversion unit 602, a processing unit 603, and a determining unit 604.
An obtaining unit 601, configured to obtain an initial feature expression of target multimedia data to be processed, where the initial feature expression is used to reflect N data features of the target multimedia data, and N is a positive integer greater than or equal to 1;
a conversion unit 602, configured to convert the initial feature expression into a plurality of expression segments, where each expression segment is used to reflect a data feature of the target multimedia data;
a processing unit 603, configured to obtain an importance degree of each data feature in the target multimedia data, and perform weighting processing on an expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data;
a determining unit 604, configured to re-determine a target feature expression of the target multimedia data according to the expression segment after the weighting processing, where the target feature expression is used to obtain multimedia data associated with the target multimedia data.
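The cooperation of the four units can be illustrated with a minimal Python sketch of the split, weight and recombine steps. The sketch assumes that the initial feature expression is a flat list of numbers split into equal-length segments and that recombination is a simple concatenation; these choices, the function names and the example values are assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_segments(expression: Sequence[float], num_segments: int) -> List[List[float]]:
    """Split the initial feature expression into equal-length expression segments."""
    if num_segments <= 0 or len(expression) % num_segments != 0:
        raise ValueError("expression length must be a multiple of num_segments")
    size = len(expression) // num_segments
    return [list(expression[i * size:(i + 1) * size]) for i in range(num_segments)]


def weight_segments(segments: List[List[float]], importances: Sequence[float]) -> List[List[float]]:
    """Scale each expression segment by the importance degree of its data feature."""
    if len(segments) != len(importances):
        raise ValueError("one importance degree is needed per segment")
    return [[w * v for v in seg] for seg, w in zip(segments, importances)]


def recombine(segments: List[List[float]]) -> List[float]:
    """Concatenate the weighted segments into the target feature expression."""
    return [v for seg in segments for v in seg]


initial_expression = [0.5, -1.0, 2.0, 4.0, 1.0, -3.0]
importances = [1.0, 0.5, 0.25]  # hypothetical importance degrees

segments = split_into_segments(initial_expression, 3)
target_expression = recombine(weight_segments(segments, importances))
print(target_expression)  # [0.5, -1.0, 1.0, 2.0, 0.25, -0.75]
```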
In an embodiment, the obtaining unit 601 is specifically configured to:
acquiring target multimedia data to be processed and a trained target model;
inputting the target multimedia data into the trained target model, and acquiring an initial characteristic expression of the target multimedia data from the trained target model.
In one embodiment, the apparatus further comprises: an adjustment unit 605.
The obtaining unit 601 is further configured to obtain a sample multimedia data set, where the sample multimedia data set includes at least one sample multimedia data and a data association relationship between any two sample multimedia data;
an adjusting unit 605, configured to adjust a network parameter of the target model according to the data association relationship and the at least one sample multimedia data, so as to obtain a trained target model.
In an embodiment, the adjusting unit 605 is specifically configured to:
calling an encoding network of the target model to perform feature analysis on any two sample multimedia data to obtain an initial feature expression of each sample multimedia data and an association relationship between the initial feature expressions;
calling a decoding network of the target model to perform data reconstruction on the initial feature expression to obtain reconstructed multimedia data of each sample multimedia data;
and adjusting the network parameters of the target model in the direction of reducing both the data difference between each sample multimedia data and the corresponding reconstructed multimedia data and the association difference between the data association relationship and the association relationship between the initial feature expressions.
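The quantity that such an adjustment reduces can be sketched as a single loss value for one pair of samples. The embodiments in this section do not give the concrete loss, so the following Python sketch is an assumption throughout: it uses squared error for the data difference, cosine similarity as the association relationship between initial feature expressions, and a hypothetical balancing factor `alpha`.

```python
import math
from typing import Sequence


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Data difference between a sample and its reconstruction (assumed form)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Association between two initial feature expressions (assumed form)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_loss(x_i, x_j, recon_i, recon_j, z_i, z_j, association: float, alpha: float = 1.0) -> float:
    """Sum of the data difference and the weighted association difference."""
    data_difference = squared_error(x_i, recon_i) + squared_error(x_j, recon_j)
    association_difference = (association - cosine_similarity(z_i, z_j)) ** 2
    return data_difference + alpha * association_difference


# Perfect reconstruction and identical codes for two associated samples.
x = [1.0, 2.0]
z = [1.0, 0.0]
loss_associated = pair_loss(x, x, x, x, z, z, association=1.0)
loss_unassociated = pair_loss(x, x, x, x, z, z, association=0.0)
```

In this sketch the loss is zero when reconstruction is exact and the similarity of the codes agrees with the labelled association, and it grows when either condition is violated; a training procedure would adjust the network parameters to lower it.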
In an embodiment, the initial feature expression includes an implicit vector, and the conversion unit 602 is specifically configured to:
performing vector conversion on the implicit vector to obtain a plurality of implicit sub-vectors, and taking each obtained implicit sub-vector as an expression segment.
In an embodiment, the obtaining unit 601 is specifically configured to:
clustering the plurality of data characteristics of the target multimedia data to obtain a hierarchical relationship among the plurality of data characteristics of the target multimedia data;
the hierarchical relationship is used for indicating the importance degree of different data characteristics, and the importance degree of the upper layer data characteristics with the hierarchical relationship is higher than that of the lower layer data characteristics.
In an embodiment, the determining unit 604 is further configured to determine, according to the hierarchical relationship, a hierarchy in which any data feature is located, and a total hierarchy included in the hierarchical relationship;
the determining unit 604 is further configured to use a ratio between a level where the any data feature is located and the total level as an importance degree of the any data feature.
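The ratio described above can be written as a one-line computation. The following Python sketch assumes that levels are numbered from 1 to the total level; which end of the hierarchy is numbered 1 is not stated here, so the numbering convention, like the function name, is an assumption.

```python
def importance_degree(level: int, total_levels: int) -> float:
    """Ratio between the level of a data feature and the total level."""
    if total_levels <= 0 or not 1 <= level <= total_levels:
        raise ValueError("level must lie between 1 and total_levels")
    return level / total_levels


# Importance degrees for a hypothetical hierarchy with four levels.
degrees = [importance_degree(k, 4) for k in range(1, 5)]
print(degrees)  # [0.25, 0.5, 0.75, 1.0]
```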
In an embodiment, the determining unit 604 is specifically configured to:
acquiring a target hierarchical structure for recording different data characteristics, wherein the target hierarchical structure comprises at least one characteristic path, one characteristic path comprises at least one characteristic node, each characteristic node records one data characteristic, and an edge connecting any two characteristic nodes in the one characteristic path represents that the data characteristics recorded by any two characteristic nodes have a hierarchical relationship;
clustering a plurality of data features of the target multimedia data, and determining a target feature path for describing the hierarchical relationship among the plurality of data features of the target multimedia data from the target hierarchical structure;
and in any two feature nodes connected with the edge of the target feature path, the data features recorded by the low-level feature nodes are used as the upper-layer data features, and the data features recorded by the high-level feature nodes are used as the lower-layer data features.
In an embodiment, the obtaining unit 601 is specifically configured to:
obtaining an initial hierarchical structure and a sample multimedia data set, wherein the sample multimedia data set comprises a plurality of sample multimedia data and at least one data feature included in each sample multimedia data, and the initial hierarchical structure at least comprises a top node;
and performing structural adjustment on the initial hierarchical structure according to at least one data characteristic included in any sample multimedia data, and taking the adjusted initial hierarchical structure as a target hierarchical structure.
In one embodiment, the adjustment operation performed to structurally adjust the initial hierarchy includes one or more of: adding feature nodes to the initial hierarchical structure, merging the feature nodes of the initial hierarchical structure, splitting the feature nodes of the initial hierarchical structure, and determining the corresponding relation between any sample multimedia data and feature paths in the initial hierarchical structure; the adjusting unit 605 is specifically configured to:
determining a classification utility parameter corresponding to the execution of any one adjustment operation according to at least one data feature included in any sample multimedia data;
and adjusting the initial hierarchical structure according to the adjustment operation corresponding to the maximum classification utility parameter.
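Selecting the adjustment operation with the maximum classification utility parameter can be illustrated as follows. The classification utility parameter is not defined in this passage; the Python sketch below assumes it resembles the category utility used in incremental conceptual clustering over nominal attributes, and the function names, the attribute "topic" and the two candidate operations are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Item = Dict[str, str]


def _sum_squared_probs(items: List[Item]) -> float:
    """Sum over attributes and values of P(attribute = value)^2 within items."""
    if not items:
        return 0.0
    attributes = {attr for item in items for attr in item}
    total = 0.0
    for attr in attributes:
        counts = Counter(item.get(attr) for item in items)
        total += sum((count / len(items)) ** 2 for count in counts.values())
    return total


def category_utility(clusters: List[List[Item]]) -> float:
    """Assumed form of the classification utility of one candidate partition."""
    clusters = [cluster for cluster in clusters if cluster]
    if not clusters:
        return 0.0
    everything = [item for cluster in clusters for item in cluster]
    baseline = _sum_squared_probs(everything)
    gain = sum(
        len(cluster) / len(everything) * (_sum_squared_probs(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)


def best_operation(candidates: Dict[str, List[List[Item]]]) -> str:
    """Pick the adjustment operation whose resulting partition scores highest."""
    return max(candidates, key=lambda name: category_utility(candidates[name]))


a, b = {"topic": "sports"}, {"topic": "sports"}
c, d = {"topic": "music"}, {"topic": "music"}
candidates = {
    "split": [[a, b], [c, d]],  # partition after a hypothetical split operation
    "merge": [[a, b, c, d]],    # partition after a hypothetical merge operation
}
chosen = best_operation(candidates)
print(chosen)  # split
```

Here the split partition scores 0.25 and the merged partition scores 0.0, so the split operation would be applied to the initial hierarchical structure.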
In an embodiment, the obtaining unit 601 is further configured to sequentially obtain expression segments of corresponding data features according to an order from the highest importance degree to the lowest importance degree;
the determining unit 604 is further configured to sequentially determine reference multimedia data whose similarity with the obtained expression segments meets a preset threshold, and use the obtained reference multimedia data as search data of the target multimedia data.
In this embodiment of the present invention, after the obtaining unit 601 obtains the initial feature expression of the target multimedia data to be processed, the conversion unit 602 may convert the initial feature expression into a plurality of expression segments, so that each expression segment reflects one data feature. Through this conversion, expression segments under different data features are obtained, which allows the target multimedia data to be represented differentially based on different data features. After the expression segment corresponding to each data feature is obtained, the processing unit 603 may weight the expression segment corresponding to each data feature according to the importance degree of that data feature, and the determining unit 604 may re-determine the target feature expression of the target multimedia data based on the weighted expression segments. Weighting the expression segments makes the differences between the expression segments corresponding to the different data features more pronounced, which enhances the feature differences between different data features of the target multimedia data and, in turn, makes the representation of the target multimedia data by the target feature expression more accurate.
Fig. 7 is a schematic block diagram of a server according to an embodiment of the present invention. As shown in fig. 7, the server in this embodiment may include: one or more processors 701, one or more input devices 702, one or more output devices 703, and a memory 704. The processor 701, the input device 702, the output device 703, and the memory 704 are connected by a bus 705. The memory 704 is used to store a computer program comprising program instructions, and the processor 701 is used to execute the program instructions stored in the memory 704.
The memory 704 may include volatile memory (volatile memory), such as random-access memory (RAM); the memory 704 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the memory 704 may also comprise a combination of the above types of memory.
The processor 701 may be a central processing unit (CPU). The processor 701 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), a generic array logic (GAL), or the like. The processor 701 may also be a combination of the above structures.
In an embodiment of the present invention, the memory 704 is configured to store a computer program, the computer program includes program instructions, and the processor 701 is configured to execute the program instructions stored in the memory 704, so as to implement the steps of the corresponding methods as described above in fig. 2 and fig. 4.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
acquiring an initial characteristic expression of target multimedia data to be processed, wherein the initial characteristic expression is used for reflecting N data characteristics of the target multimedia data, and N is a positive integer greater than or equal to 1;
converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting a data feature of the target multimedia data;
acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data;
and re-determining a target characteristic expression of the target multimedia data according to the expression segment after the weighting processing, wherein the target characteristic expression is used for acquiring multimedia data associated with the target multimedia data.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
acquiring target multimedia data to be processed and a trained target model;
inputting the target multimedia data into the trained target model, and acquiring an initial characteristic expression of the target multimedia data from the trained target model.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
acquiring a sample multimedia data set, wherein the sample multimedia data set comprises at least one sample multimedia data and a data association relation between any two sample multimedia data;
and adjusting the network parameters of the target model according to the data association relation and the at least one sample multimedia data to obtain the trained target model.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
calling an encoding network of the target model to perform feature analysis on any two sample multimedia data to obtain an initial feature expression of each sample multimedia data and an association relationship between the initial feature expressions;
calling a decoding network of the target model to perform data reconstruction on the initial feature expression to obtain reconstructed multimedia data of each sample multimedia data;
and adjusting the network parameters of the target model in the direction of reducing both the data difference between each sample multimedia data and the corresponding reconstructed multimedia data and the association difference between the data association relationship and the association relationship between the initial feature expressions.
In one embodiment, the initial feature expression comprises an implicit vector, and the processor 701 is configured to call the program instructions to perform:
performing vector conversion on the implicit vector to obtain a plurality of implicit sub-vectors, and taking each obtained implicit sub-vector as an expression segment.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
clustering the plurality of data characteristics of the target multimedia data to obtain a hierarchical relationship among the plurality of data characteristics of the target multimedia data;
the hierarchical relationship is used for indicating the importance degree of different data characteristics, and the importance degree of the upper layer data characteristics with the hierarchical relationship is higher than that of the lower layer data characteristics.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
determining the level of any data feature and the total level included by the level relation according to the level relation;
and taking the ratio of the level of any data characteristic to the total level as the importance degree of any data characteristic.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
acquiring a target hierarchical structure for recording different data characteristics, wherein the target hierarchical structure comprises at least one characteristic path, one characteristic path comprises at least one characteristic node, each characteristic node records one data characteristic, and an edge connecting any two characteristic nodes in the one characteristic path represents that the data characteristics recorded by any two characteristic nodes have a hierarchical relationship;
clustering a plurality of data features of the target multimedia data, and determining a target feature path for describing the hierarchical relationship among the plurality of data features of the target multimedia data from the target hierarchical structure;
and in any two feature nodes connected with the edge of the target feature path, the data features recorded by the low-level feature nodes are used as the upper-layer data features, and the data features recorded by the high-level feature nodes are used as the lower-layer data features.
In one embodiment, the processor 701 is configured to call the program instructions to perform:
obtaining an initial hierarchical structure and a sample multimedia data set, wherein the sample multimedia data set comprises a plurality of sample multimedia data and at least one data feature included in each sample multimedia data, and the initial hierarchical structure at least comprises a top node;
and performing structural adjustment on the initial hierarchical structure according to at least one data characteristic included in any sample multimedia data, and taking the adjusted initial hierarchical structure as a target hierarchical structure.
In one embodiment, the adjustment operation performed to structurally adjust the initial hierarchy includes one or more of: adding feature nodes to the initial hierarchical structure, merging the feature nodes of the initial hierarchical structure, splitting the feature nodes of the initial hierarchical structure, and determining the corresponding relation between any sample multimedia data and a feature path in the initial hierarchical structure; the processor 701 is configured to call the program instructions for performing:
the performing a structural adjustment on the initial hierarchical structure according to at least one data feature included in any sample multimedia data comprises:
determining a classification utility parameter corresponding to any one adjustment operation according to at least one data characteristic included in any sample multimedia data;
and adjusting the initial hierarchical structure according to the adjustment operation corresponding to the maximum classification utility parameter.
In one embodiment, the importance of the data features positively correlates with the accuracy of a data search performed according to the corresponding data features; the processor 701 is configured to invoke the program instructions for performing:
sequentially acquiring expression fragments of corresponding data characteristics according to the sequence of the importance degrees from high to low;
and sequentially determining reference multimedia data of which the similarity with the obtained expression segments meets a preset threshold, and taking the obtained reference multimedia data as the search data of the target multimedia data.
Embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method embodiments as shown in fig. 2 and 4. The computer-readable storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a particular embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (14)

1. A data processing method, comprising:
acquiring an initial characteristic expression of target multimedia data to be processed, wherein the initial characteristic expression is used for reflecting N data characteristics of the target multimedia data, and N is a positive integer greater than or equal to 1;
converting the initial feature expression into a plurality of expression segments, wherein each expression segment is used for reflecting one data feature of the target multimedia data;
acquiring the importance degree of each data feature in the target multimedia data, and performing weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data; the importance degree of different data characteristics in the target multimedia data is determined based on the precision degree of the data characteristics for dividing the target multimedia data; or the importance degree of different data characteristics in the target multimedia data is determined based on the hierarchical relationship of the data characteristics;
and recombining and determining a target characteristic expression of the target multimedia data according to the expression segments after weighting, wherein the target characteristic expression is used for acquiring multimedia data associated with the target multimedia data.
2. The method of claim 1, wherein obtaining the initial feature expression of the target multimedia data to be processed comprises:
acquiring target multimedia data to be processed and a trained target model;
inputting the target multimedia data into the trained target model, and acquiring an initial characteristic expression of the target multimedia data from the trained target model.
3. The method of claim 2, further comprising:
acquiring a sample multimedia data set, wherein the sample multimedia data set comprises at least one sample multimedia data and a data association relation between any two sample multimedia data;
and adjusting the network parameters of the target model according to the data association relation and the at least one sample multimedia data to obtain the trained target model.
4. The method of claim 3, wherein said adjusting network parameters of said object model based on said data association and said at least one sample multimedia data comprises:
calling an encoding network of the target model to perform feature analysis on any two sample multimedia data to obtain an initial feature expression of each sample multimedia data and an association relationship between the initial feature expressions;
calling a decoding network of the target model to perform data reconstruction on the initial feature expression to obtain reconstructed multimedia data of each sample multimedia data;
and adjusting the network parameters of the target model in the direction of reducing both the data difference between each sample multimedia data and the corresponding reconstructed multimedia data and the association difference between the data association relationship and the association relationship between the initial feature expressions.
5. The method of claim 1, wherein the initial feature expression comprises an implicit vector, and wherein converting the initial feature expression into a plurality of expression segments comprises:
performing vector conversion on the implicit vector to obtain a plurality of implicit sub-vectors, and taking each obtained implicit sub-vector as an expression segment.
6. The method of claim 1, wherein the obtaining the importance of each data feature in the target multimedia data comprises:
clustering the plurality of data characteristics of the target multimedia data to obtain a hierarchical relationship among the plurality of data characteristics of the target multimedia data;
the hierarchical relationship is used for indicating the importance degree of different data characteristics, and the importance degree of the upper layer data characteristics with the hierarchical relationship is higher than that of the lower layer data characteristics.
7. The method of claim 6, further comprising:
determining the level of any data feature and the total level included by the level relation according to the level relation;
and taking the ratio of the level of any data characteristic to the total level as the importance degree of any data characteristic.
8. The method of claim 6, wherein clustering the plurality of data features of the target multimedia data to obtain a hierarchical relationship between the plurality of data features of the target multimedia data comprises:
acquiring a target hierarchical structure for recording different data characteristics, wherein the target hierarchical structure comprises at least one characteristic path, one characteristic path comprises at least one characteristic node, each characteristic node records one data characteristic, and an edge connecting any two characteristic nodes in the one characteristic path represents that the data characteristics recorded by any two characteristic nodes have a hierarchical relationship;
clustering a plurality of data features of the target multimedia data, and determining a target feature path for describing the hierarchical relationship among the plurality of data features of the target multimedia data from the target hierarchical structure;
and in any two feature nodes connected by the edge of the target feature path, the data feature recorded by the low-level feature node is used as the upper-layer data feature, and the data feature recorded by the high-level feature node is used as the lower-layer data feature.
9. The method of claim 8, wherein obtaining a target hierarchy for recording different data characteristics comprises:
obtaining an initial hierarchical structure and a sample multimedia data set, wherein the sample multimedia data set comprises a plurality of sample multimedia data and at least one data feature included in each sample multimedia data, and the initial hierarchical structure at least comprises a top node;
and performing structural adjustment on the initial hierarchical structure according to at least one data characteristic included in any sample multimedia data, and taking the adjusted initial hierarchical structure as a target hierarchical structure.
10. The method of claim 9, wherein performing a structural adjustment of the initial hierarchy comprises one or more of: adding feature nodes to the initial hierarchical structure, merging the feature nodes of the initial hierarchical structure, splitting the feature nodes of the initial hierarchical structure, and determining the corresponding relation between any sample multimedia data and feature paths in the initial hierarchical structure;
the performing a structural adjustment on the initial hierarchical structure according to at least one data feature included in any sample multimedia data comprises:
determining a classification utility parameter corresponding to any one adjustment operation according to at least one data characteristic included in any sample multimedia data;
and adjusting the initial hierarchical structure according to the adjustment operation corresponding to the maximum classification utility parameter.
11. The method of claim 1, wherein the importance of the data features positively correlates with the accuracy of a data search performed on the basis of the corresponding data features; after the target feature expression of the target multimedia data is re-determined according to the expression segments after the weighting processing, the method further includes:
sequentially obtaining expression fragments of corresponding data characteristics according to the sequence of the importance degrees from high to low;
and sequentially determining reference multimedia data of which the similarity with the obtained expression segments meets a preset threshold, and taking the obtained reference multimedia data as the search data of the target multimedia data.
12. A data processing apparatus, comprising:
an obtaining unit, configured to obtain an initial feature expression of target multimedia data to be processed, wherein the initial feature expression is used for reflecting N data features of the target multimedia data, and N is a positive integer greater than or equal to 1;
a conversion unit, configured to convert the initial feature expression into a plurality of expression segments, where each expression segment is used to reflect a data feature of the target multimedia data;
the processing unit is used for acquiring the importance degree of each data feature in the target multimedia data and carrying out weighting processing on the expression segment corresponding to each data feature according to the importance degree of each data feature in the target multimedia data; the importance degree of different data characteristics in the target multimedia data is determined based on the precision degree of the data characteristics for dividing the target multimedia data; or the importance degree of different data characteristics in the target multimedia data is determined based on the hierarchical relationship of the data characteristics;
and the determining unit is used for recombining and determining a target characteristic expression of the target multimedia data according to the expression segments after the weighting processing, wherein the target characteristic expression is used for acquiring the multimedia data associated with the target multimedia data.
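The four units of claim 12 amount to a split, weight, and recombine pipeline, which can be sketched as follows. The sketch assumes that the initial feature expression is a flat list of floats, that the segments have equal length, and that recombination is plain concatenation. The claim fixes none of these details, and the function names are hypothetical.

```python
def split_into_segments(expression, n):
    """Conversion step: cut the expression into n equal-length segments."""
    if n < 1 or len(expression) % n != 0:
        raise ValueError("expression length must be a positive multiple of n")
    size = len(expression) // n
    return [expression[i * size:(i + 1) * size] for i in range(n)]

def weight_and_recombine(expression, importance):
    """Processing and determining steps: scale each segment, then concatenate."""
    segments = split_into_segments(expression, len(importance))
    weighted = [[w * v for v in seg] for seg, w in zip(segments, importance)]
    return [v for seg in weighted for v in seg]

# Hypothetical initial expression covering N = 3 data features.
initial = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
importance = [1.0, 0.5, 0.25]
target = weight_and_recombine(initial, importance)
```

With these inputs `target` is `[1.0, 2.0, 1.5, 2.0, 1.25, 1.5]`: the first segment is unchanged and the later, less important segments are scaled down.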
13. A server comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 11.
CN202011200592.3A 2020-10-30 2020-10-30 Data processing method, device, server and storage medium Active CN112329933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011200592.3A CN112329933B (en) 2020-10-30 2020-10-30 Data processing method, device, server and storage medium

Publications (2)

Publication Number Publication Date
CN112329933A (en) 2021-02-05
CN112329933B (en) 2022-09-27

Family

ID=74324136

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202011200592.3A Active CN112329933B (en) 2020-10-30 2020-10-30 Data processing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112329933B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254989B (en) * 2021-04-27 2022-02-15 支付宝(杭州)信息技术有限公司 Fusion method and device of target data and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019154262A1 (en) * 2018-02-07 2019-08-15 腾讯科技(深圳)有限公司 Image classification method, server, user terminal, and storage medium
CN110955789A (en) * 2019-12-31 2020-04-03 腾讯科技(深圳)有限公司 Multimedia data processing method and equipment
CN111241311A (en) * 2020-01-09 2020-06-05 腾讯科技(深圳)有限公司 Media information recommendation method and device, electronic equipment and storage medium
CN111563551A (en) * 2020-04-30 2020-08-21 支付宝(杭州)信息技术有限公司 Multi-mode information fusion method and device and electronic equipment

Also Published As

Publication number Publication date
CN112329933A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN111708873B (en) Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
Cao et al. Deep neural networks for learning graph representations
CN112966074B (en) Emotion analysis method and device, electronic equipment and storage medium
CN110532353B (en) Text entity matching method, system and device based on deep learning
US8280915B2 (en) Binning predictors using per-predictor trees and MDL pruning
CN112732916B (en) BERT-based multi-feature fusion fuzzy text classification system
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN114048350A (en) Text-video retrieval method based on fine-grained cross-modal alignment model
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111026869A (en) Method for predicting multi-guilty names by using sequence generation network based on multilayer attention
CN115393692A (en) Generation formula pre-training language model-based association text-to-image generation method
CN112115716A (en) Service discovery method, system and equipment based on multi-dimensional word vector context matching
CN112329933B (en) Data processing method, device, server and storage medium
CN111145914A (en) Method and device for determining lung cancer clinical disease library text entity
CN113920379A (en) Zero sample image classification method based on knowledge assistance
Jeyakarthic et al. Optimal bidirectional long short term memory based sentiment analysis with sarcasm detection and classification on twitter data
CN110941958A (en) Text category labeling method and device, electronic equipment and storage medium
CN116958622A (en) Data classification method, device, equipment, medium and program product
CN114490954A (en) Document level generation type event extraction method based on task adjustment
CN114329181A (en) Question recommendation method and device and electronic equipment
CN111708745A (en) Cross-media data sharing representation method and user behavior analysis method and system
Gabralla et al. Deep learning for document clustering: a survey, taxonomy and research trend
Alali A novel stacking method for multi-label classification
CN115688771B (en) Document content comparison performance improving method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40038853; Country of ref document: HK)
GR01 Patent grant