WO2018145604A1 - 样本选择方法、装置及服务器 - Google Patents

样本选择方法、装置及服务器 Download PDF

Info

Publication number
WO2018145604A1
WO2018145604A1 · PCT/CN2018/075114 · CN2018075114W
Authority
WO
WIPO (PCT)
Prior art keywords
sample
training
data
similarity
pairs
Prior art date
Application number
PCT/CN2018/075114
Other languages
English (en)
French (fr)
Inventor
黄圣君
高能能
袁坤
陈伟
王迪
Original Assignee
南京航空航天大学
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京航空航天大学, 腾讯科技(深圳)有限公司 filed Critical 南京航空航天大学
Priority to EP18750641.5A priority Critical patent/EP3582144A4/en
Publication of WO2018145604A1 publication Critical patent/WO2018145604A1/zh
Priority to US16/353,754 priority patent/US10885390B2/en

Classifications

    • G06N20/00 Machine learning
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results relating to different input data, e.g. multimodal recognition
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/772 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7788 Active pattern-learning of image or video features based on feedback from a human supervisor, e.g. interactive learning with a human teacher
    • G06V10/811 Fusion of classification results, the classifiers operating on different input data, e.g. multi-modal recognition
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • the present application relates to the field of metric learning technologies, and in particular, to a sample selection method, apparatus, and server.
  • Metric learning refers to automatically learning a distance metric that reasonably describes the semantic similarity between two objects from a sample pair with labeled relevance. It is a machine learning technique commonly used in the field of image retrieval.
  • metric learning of cross-modal data
  • a training sample set is pre-built; the training sample set includes a large number of training samples, each training sample includes a set of sample pairs with labeled correlations, and the training sample set is then used to train a metric model that measures the correlation between a set of cross-modal data.
  • the training sample set is usually constructed by randomly sampling the sample pairs from the unlabeled sample set as training samples, and the selected training samples are assigned to the labeling personnel for correlation labeling.
  • the quality of the training samples selected by the random sampling method is low, which results in low accuracy of the finally trained metric model, low training efficiency of the model, and more time and cost for labeling the training samples.
  • the embodiment of the present application provides a sample selection method, device, and server, which are used to solve the prior-art problems that the accuracy of the existing model is low, the training efficiency of the model is low, and labeling the training samples requires a lot of time and cost.
  • a sample selection method for use in a server, the method comprising:
  • n sets of sample pairs are selected from the unlabeled sample set, each set of sample pairs includes two samples, each sample including p modal data, the n being a positive integer, and the p being an integer greater than one;
  • a sample selection device for use in a server, the device comprising:
  • a first calculating module configured to calculate, for each of the n sets of sample pairs, a partial similarity between the data of each modality of one sample included in the sample pair and the data of each modality of the other sample, obtaining p × p partial similarities;
  • a second calculating module configured to calculate an overall similarity between the two samples included in the sample pair according to the p ⁇ p partial similarities
  • a third calculating module configured to acquire a degree of difference between the p ⁇ p partial similarities and the overall similarity
  • a selection module configured to select a training sample from the sample pairs, among the n sets of sample pairs, that meet a preset condition; wherein the preset condition is that the degree of difference is greater than a first threshold and the overall similarity is less than a second threshold.
  • a server comprising a processor and a memory, the memory storing a computer program loaded by the processor and executed to implement the sample selection method described in the above aspect.
  • a computer readable storage medium having stored thereon a computer program loaded by a processor and executed to implement a sample selection method as described in the above aspects.
  • a computer program product for performing the sample selection method of the above aspects when the computer program product is executed.
  • the quality of training samples can be significantly improved.
  • the most valuable sample pair refers to a sample pair that has a positive effect on improving the accuracy of the metric model, such as a sample pair that the metric model has not learned in the previous training process. Since the quality of the training samples selected by the embodiment of the present application is higher than that of the prior art, the following advantage is obtained: when an equal number of training samples is selected, the metric model trained on the samples selected by the method provided by the embodiment of the present application is more accurate.
  • the technical solution provided by the embodiment of the present application can train a higher-precision measurement model with fewer training samples by selecting a high-quality training sample training metric model.
  • FIG. 1 is a flowchart of a sample selection method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a sample selection method provided by another embodiment of the present application.
  • FIG. 3 is a flowchart of a model training process provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a model optimization process provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an annotation interface according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a search interface according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a sample selection apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
  • the active learning technology is applied to metric learning for cross-modal data; by actively selecting the most valuable sample pairs for correlation labeling and using them as training samples, the quality of the training samples is significantly improved, so as to improve the accuracy of the finally trained metric model, improve the training efficiency of the metric model, and reduce the time and cost required to label the training samples.
  • the method provided by the embodiment of the present application may be performed by a server.
  • the server can be a single server, a server cluster consisting of several servers, or a cloud computing service center.
  • the technical solution provided by the embodiment of the present application can be applied to the field of cross-modal data retrieval, for example, retrieval across the two modalities of images and text.
  • the metric model provided by the embodiment of the present application can accurately calculate the correlation between the same modal data and the cross-modal data, thereby achieving the purpose of accurately feeding back the content that needs to be retrieved to the user.
  • FIG. 1 shows a flowchart of a sample selection method provided by an embodiment of the present application.
  • the method can include the following steps:
  • Step 101 Select n sets of sample pairs from the unlabeled sample set, each set of sample pairs includes two samples, each sample includes p modal data, n is a positive integer, and p is an integer greater than 1.
  • the unlabeled sample set includes multiple samples, each of which includes data of multiple modalities; the data of the p modalities included in each sample are correlated with one another.
  • for example, each sample includes data of two modalities, image and text, where the text is textual information used to describe the image, such as text introducing the content of the image.
  • the perspective from which modalities are divided is not limited in this embodiment.
  • the modality may be divided from a data-type perspective, and the data of different modalities may be data of different data types, such as image and text, audio and text, video and text, image and audio, and the like.
  • the modality is divided from the perspective of data content, and the data of different modalities may be data of different data contents.
  • for example, data about physical health status may be divided into the following different modalities: blood pressure data, blood glucose data, ECG data, weight data, etc.
  • n sets of sample pairs are selected from the unlabeled sample set using random sampling.
  • an unlabeled sample set is represented by a set U, and the set U includes N1 samples.
  • n sets of sample pairs are selected from the N1 samples by random sampling; the n sets of sample pairs constitute a candidate training sample set, which is represented by a set P.
  • each sample includes data of two modalities, image and text.
  • Step 102: Calculate, for each set of sample pairs of the n sets of sample pairs, a partial similarity between the data of each modality of one sample included in the sample pair and the data of each modality of the other sample, obtaining p × p partial similarities.
  • the calculated p × p partial similarities include p similarities between same-modal data and p × p − p similarities between cross-modal data.
  • the similarities between same-modal data are the similarity between the image of the first sample and the image of the second sample, and the similarity between the text of the first sample and the text of the second sample; the two similarities between cross-modal data are the similarity between the image of the first sample and the text of the second sample, and the similarity between the text of the first sample and the image of the second sample.
  • step 102 includes the following sub-steps:
  • Step 102a: For each set of sample pairs of the n sets of sample pairs, features are extracted from the data of each modality of each sample included in the sample pair;
  • the extracted features are different for the data of different modalities, and may be preset according to actual application requirements, which is not limited in this embodiment.
  • the features extracted from each of the samples include image features and text features.
  • the image feature is extracted from the image included in the sample, and can be extracted using classical methods from digital image processing.
  • the image feature includes, but is not limited to, at least one of a color feature, a texture feature, and a shape feature.
  • the text features are extracted from the text included in the sample.
  • the text feature extraction method can adopt the classical method in the field of natural language processing.
  • the text features include but are not limited to TF-IDF (Term Frequency–Inverse Document Frequency) features and LDA (Latent Dirichlet Allocation) topic features.
  • the sample is represented by o; the image feature extracted from the sample o is x ∈ R^(D_x), and the text feature extracted from the sample o is z ∈ R^(D_z).
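  • As an illustrative sketch only (the patent does not specify an implementation), TF-IDF features for the text modality can be computed as follows; the tiny two-document corpus and the raw-count TF variant are assumptions for the example:

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Compute TF-IDF weight vectors for a small corpus of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    features = []
    for doc in docs:
        tf = Counter(doc)
        # Raw term count times inverse document frequency.
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocab]
        features.append(vec)
    return vocab, features

docs = [
    ["cat", "sits", "on", "mat"],
    ["dog", "sits", "on", "rug"],
]
vocab, feats = tfidf_features(docs)
# Terms shared by every document (e.g. "sits") get idf = log(1) = 0.
```

The resulting fixed-length vectors play the role of the text feature z described above.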
  • Step 102b: p × p partial similarities are calculated according to the features of the data of each modality of one sample included in the sample pair and the features of the data of each modality of the other sample.
  • the trained measurement model is obtained by training the already constructed training sample set.
  • the set of training samples that have been constructed is represented by a set L, which includes N 2 training samples, each of which includes a set of sample pairs with labeled correlations.
  • the high-quality training samples are selected by the method flow provided by this embodiment, the training sample set L is updated with the high-quality training samples, and the updated training sample set L is then used to retrain the metric model, thereby optimizing the metric model.
  • the initial metric model can be trained from the initial training sample set.
  • the initial training sample set may contain a small number of training samples, which may be randomly sampled from the unlabeled sample set; the selected training samples are assigned to the labeling personnel for correlation labeling and used for training the initial metric model.
  • the metric model is represented by M; the metric model M is a matrix of k rows and k columns, where k represents the sum of the numbers of dimensions (i.e., the numbers of terms) of the features of the p modal data included in one sample, and k is an integer greater than 1.
  • for example, the image feature extracted from the sample o is x ∈ R^(D_x) and the text feature extracted from the sample o is z ∈ R^(D_z); then the metric model M is a matrix of (D_x + D_z) rows and (D_x + D_z) columns.
  • the metric model M can be thought of as consisting of p ⁇ p sub metric models. Each sub-metric model is used to calculate a partial similarity based on the characteristics of one modal data of one object and the characteristics of one modal data of another object.
  • the metric model M is used to calculate the overall similarity between two objects based on p x p partial similarities.
  • the above object may be one sample in a training sample, or may be data input when the metric model is used to measure correlation.
  • each sample includes data of two modes of image and text
  • the image features extracted from the sample o are x ∈ R^(D_x) and the text features extracted from the sample o are z ∈ R^(D_z); then the metric model M can be expressed as the block matrix M = [M11 M12; M21 M22].
  • the metric model M can be regarded as composed of four sub-metric models, namely M11, M12, M21, and M22.
  • the sub-metric model M11 is a matrix of D_x rows and D_x columns, used to calculate the similarity between the images of the two objects according to the image feature of one object and the image feature of the other object;
  • the sub-metric model M12 is a matrix of D_x rows and D_z columns, used to calculate the similarity between the image of one object and the text of the other object according to the image feature of one object and the text feature of the other object;
  • the sub-metric model M21 is a matrix of D_z rows and D_x columns, used to calculate the similarity between the text of one object and the image of the other object according to the text feature of one object and the image feature of the other object;
  • the sub-metric model M22 is a matrix of D_z rows and D_z columns, used to calculate the similarity between the texts of the two objects according to the text feature of one object and the text feature of the other object.
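  • The block structure of the metric model M described above can be sketched as follows; the dimensions D_x = 4, D_z = 3, the identity-initialized M, and the random features are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Dx, Dz = 4, 3  # assumed image / text feature dimensions

# Metric model M as a (Dx+Dz) x (Dx+Dz) matrix, here the identity for simplicity.
M = np.eye(Dx + Dz)
M11 = M[:Dx, :Dx]        # image vs. image
M12 = M[:Dx, Dx:]        # image vs. text
M21 = M[Dx:, :Dx]        # text  vs. image
M22 = M[Dx:, Dx:]        # text  vs. text

# Features of the two samples o_i and o_j.
x_i, z_i = rng.standard_normal(Dx), rng.standard_normal(Dz)
x_j, z_j = rng.standard_normal(Dx), rng.standard_normal(Dz)

# The p x p = 4 partial similarities, one per sub-metric model.
partials = {
    "11": x_i @ M11 @ x_j,
    "12": x_i @ M12 @ z_j,
    "21": z_i @ M21 @ x_j,
    "22": z_i @ M22 @ z_j,
}
```

With the identity initialization, the same-modal blocks reduce to dot products and the cross-modal blocks are zero; a trained M would fill all four blocks.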
  • Step 103 Calculate the overall similarity between the two samples included in the sample pair according to the p ⁇ p partial similarities.
  • the overall similarity is represented by S M .
  • the overall similarity S_M(o_i, o_j) between the two samples o_i and o_j included in a sample pair (o_i, o_j) is calculated, for example, as: S_M(o_i, o_j) = x_i^T M11 x_j + x_i^T M12 z_j + z_i^T M21 x_j + z_i^T M22 z_j, where:
  • x i represents the image feature of the sample o i
  • z i represents the text feature of the sample o i
  • x j represents the image feature of the sample o j
  • z j represents the text feature of the sample o j .
  • the overall similarity between the two samples included in the sample pair is still calculated from the p ⁇ p partial similarities using the already trained metric model M.
  • Step 104 Obtain a degree of difference between p ⁇ p partial similarities and overall similarity.
  • the degree of difference can be determined by various measures such as variance, residual, and the like.
  • S_M represents the overall similarity between the two samples o_i and o_j included in the sample pair (o_i, o_j), that is, S_M(o_i, o_j), here abbreviated as S_M; S_M^(11), S_M^(12), S_M^(21), and S_M^(22) represent the partial similarities between the two samples o_i and o_j. The degree of difference can be computed, for example, as the variance var(o_i, o_j) = (1/(p × p)) Σ_(a,b) (S_M^(ab) − S_M)^2. The degree of difference var(o_i, o_j) takes into account the partial similarities between both same-modal and cross-modal data, and reflects the degree of inconsistency between each partial similarity and the overall similarity.
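  • Steps 103 and 104 can be sketched as below for the two-modality case; taking the overall similarity as the sum of the four partial similarities and the degree of difference as their variance around it is an illustrative reading consistent with the formulas above:

```python
import numpy as np

def overall_and_difference(partials):
    """partials: the p*p partial similarities of one sample pair.
    Returns the overall similarity S_M and the degree of difference var."""
    s = np.asarray(partials, dtype=float)
    overall = s.sum()                      # S_M(o_i, o_j)
    var = np.mean((s - overall) ** 2)      # var(o_i, o_j)
    return overall, var

# Four assumed partial similarities: image-image, image-text, text-image, text-text.
overall, var = overall_and_difference([0.9, 0.1, 0.2, 0.8])
```

A pair whose partial similarities disagree strongly with the overall score yields a large var, which is exactly what step 105 selects for.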
  • Step 105: Select a training sample from the sample pairs, among the n sets of sample pairs, that meet the preset condition; wherein the preset condition is that the degree of difference is greater than the first threshold and the overall similarity is less than the second threshold.
  • the most valuable sample pairs are selected accordingly as training samples.
  • the most valuable sample pair refers to a sample pair that has a positive effect on improving the accuracy of the metric model, such as a sample pair that the metric model has not learned in the previous training process.
  • the purpose of selecting a high-quality training sample can be achieved.
  • step 105 includes selecting, from the n sets of sample pairs, a sample pair having a degree of difference greater than a first threshold and an overall similarity being less than a second threshold as a training sample.
  • the value of the first threshold may be preset according to actual requirements. If the degree of difference of the sample pairs to be selected is larger, the value of the first threshold is larger.
  • the value of the second threshold may also be preset according to actual requirements. If the corresponding overall similarity of the sample pairs to be selected is smaller, the value of the second threshold is smaller.
  • step 105 includes the following sub-steps:
  • Step 105a For each pair of sample pairs in the n sets of sample pairs, calculate the corresponding information amount of the sample pair according to the corresponding overall similarity and degree of difference of the sample pair;
  • Step 105b Select a sample pair whose information amount is greater than a third threshold from the n sets of sample pairs as a training sample.
  • the value of the third threshold may be preset according to actual needs. If the degree of difference between the selected sample pairs is larger and the overall similarity is smaller, the value of the third threshold is larger.
  • the n sets of sample pairs are sorted in descending order of information amount to obtain a sample pair sequence, and the first m sample pairs are selected from the sample pair sequence as training samples, m being a positive integer.
  • the value of m can be preset according to the number of training samples selected in actual need.
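  • Step 105 can be sketched as follows; because the patent only states that the information amount combines the degree of difference and the overall similarity, the concrete formula info = var / (overall + ε) used here is an illustrative assumption:

```python
def select_training_samples(pairs, m, eps=1e-8):
    """pairs: list of (pair_id, overall_similarity, difference_degree).
    Returns the m pairs with the largest assumed information amount:
    a large difference degree and a small overall similarity score highest."""
    scored = [(var / (overall + eps), pid) for pid, overall, var in pairs]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:m]]

# Pair "b" has low overall similarity but high difference degree, so it ranks first.
pairs = [("a", 0.9, 0.1), ("b", 0.2, 0.8), ("c", 0.5, 0.5)]
top = select_training_samples(pairs, m=2)
```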
  • the method provided by this embodiment applies active learning to metric learning over cross-modal data; by actively selecting the most valuable sample pairs as training samples, the quality of the training samples is significantly improved. Since the selected training samples are of higher quality than in the prior art, two advantages follow: first, for an equal number of training samples, the metric model trained on the samples selected by the provided method is more accurate; second, to obtain a metric model of the same precision, fewer training samples are required, which helps improve model training efficiency and reduces the time and cost of labeling training samples. Compared with the prior art, the technical solution provided by the embodiment of the present application can therefore train a higher-precision metric model with fewer training samples by training the metric model on high-quality training samples.
  • after the above step 105, the method further includes the following steps.
  • Step 106: Acquire the correlation, marked by the labeling personnel, corresponding to each training sample; the correlation is used to indicate whether the two samples included in the training sample are related or unrelated.
  • After the high-quality training samples are screened by the embodiment shown in FIG. 1, the labeling personnel need to perform correlation labeling on the training samples; that is, the labeling personnel judge whether the two samples included in each training sample are related or unrelated, and mark the result of the judgment. The labeling personnel can be regarded as experts, and the correlations they mark are highly accurate and authoritative.
  • the training samples selected from the n sets of sample pairs constitute a training sample set to be labeled; the training sample set to be labeled is represented by the set Q, and each training sample in the set Q is provided to the labeling personnel for correlation labeling.
  • the server sends the training sample set Q to be marked to the user equipment corresponding to the labeling personnel.
  • After receiving the training sample set Q to be labeled, the user equipment displays each training sample, obtains the correlation marked by the labeling personnel for each training sample, and sends it to the server.
  • Step 107: The training samples are added to the training sample set.
  • the training samples with the labeled correlations are added to the training sample set L, and the training sample set L is updated.
  • Step 108 training a metric model using a training sample set, the metric model is used to measure the correlation between two objects, each object including at least one modal data.
  • the metric model M is retrained using the updated training sample set L to achieve the purpose of optimizing the accuracy of the metric model M.
  • the metric model can be retrained multiple times to finally train a high-precision metric model.
  • each time, a small number of high-quality training samples are selected from the unlabeled sample set, and the metric model is retrained with the existing training samples together with the newly selected training samples to obtain a more accurate metric model; this loop repeats until the accuracy of the retrained metric model meets the requirements.
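  • The iterative optimization described above can be sketched as the following generic loop; the stub callables and the fixed round budget are assumptions for illustration:

```python
def active_learning_loop(unlabeled, labeled, train, select, annotate,
                         accurate_enough, max_rounds=10):
    """Generic active-learning loop over a metric model.
    train/select/annotate/accurate_enough are caller-supplied callables."""
    model = train(labeled)
    for _ in range(max_rounds):
        if accurate_enough(model):
            break
        batch = select(model, unlabeled)          # most valuable sample pairs
        labeled.extend(annotate(batch))           # correlation labeling
        unlabeled = [p for p in unlabeled if p not in batch]
        model = train(labeled)                    # retrain on the enlarged set
    return model, labeled

# Toy run: the "model" is just the number of labeled pairs; training stops at 5.
model, labeled = active_learning_loop(
    unlabeled=list(range(1, 11)),
    labeled=[0],
    train=len,
    select=lambda m, u: u[:2],
    annotate=lambda b: b,
    accurate_enough=lambda m: m >= 5,
)
```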
  • step 108 includes the following sub-steps:
  • step 108a the metric model is initialized.
  • the metric model M is a matrix of k rows and k columns, k represents the sum of the number of dimensions (i.e., the number of terms) of the features of the p modal data included in one sample, and k is an integer greater than 1.
  • the metric model M is initialized to an identity matrix.
  • the Accelerated Proximal Gradient (APG) method is used to optimize the target function corresponding to the metric model M, and the search sequence Q corresponding to the metric model M needs to be initialized.
  • the search sequence Q is a temporary variable of the metric model M in the process of optimizing the objective function, which is used to record the suboptimal solution of the metric model M, and the optimal solution of the metric model M can be calculated according to the search sequence Q.
  • the search sequence Q is also a matrix of k rows and k columns.
  • the APG method is used to optimize the objective function, which can speed up the optimization process of the objective function.
  • the metric model M is initialized to an identity matrix
  • the search sequence Q is initialized to a matrix of all zeros.
  • each sample includes data of two modes of image and text
  • the image features extracted from the sample o are x ∈ R^(D_x) and the text features extracted from the sample o are z ∈ R^(D_z); then the metric model M and the search sequence Q are each a matrix of (D_x + D_z) × (D_x + D_z) size.
  • the training sample set L can be partitioned as follows: if the labeled correlation of a training sample, i.e., a sample pair (o_i, o_j), is related, the sample pair (o_i, o_j) is added to the set S and the correlation y_ij takes the value +1; if the correlation of the sample pair (o_i, o_j) is unrelated, the sample pair (o_i, o_j) is added to the set D and the correlation y_ij takes the value −1.
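  • A minimal sketch of this partitioning; representing the labeled correlation as the strings "related"/"unrelated" is an assumption for the example:

```python
def partition_labeled_pairs(labeled_pairs):
    """Split labeled sample pairs into the related set S (y_ij = +1)
    and the unrelated set D (y_ij = -1)."""
    S, D, y = [], [], {}
    for (o_i, o_j), label in labeled_pairs:
        if label == "related":
            S.append((o_i, o_j))
            y[(o_i, o_j)] = +1
        else:
            D.append((o_i, o_j))
            y[(o_i, o_j)] = -1
    return S, D, y

S, D, y = partition_labeled_pairs([
    (("img1", "img2"), "related"),
    (("img1", "img3"), "unrelated"),
])
```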
  • step 108b the objective function corresponding to the metric model is optimized by using the training sample set to obtain an optimized objective function.
  • the objective function is, for example: min_M Σ_(o_i, o_j) w_ij · max(0, 1 − y_ij · S_M(o_i, o_j)) + λ‖M‖_*, where:
  • w ij represents the weight of the sample pair (o i , o j )
  • y ij represents the correlation of the sample pair (o i , o j )
  • S M (o i , o j ) represents the sample pair (o i , o j ) corresponds to the overall similarity.
  • the overall similarity between the sample pair (o_i, o_j) is calculated by using a bilinear similarity measure function of the following form: S_M(o_i, o_j) = [x_i; z_i]^T M [x_j; z_j].
  • the weight w_ij corresponding to a sample pair (o_i, o_j) is set according to the labeled correlation, for example w_ij = 1/|S| for related pairs and w_ij = 1/|D| for unrelated pairs.
  • |·| indicates the number of elements in a set; that is, |S| and |D| are the numbers of related and unrelated sample pairs, respectively.
  • ‖M‖_* represents the nuclear norm of the matrix M.
  • the nuclear-norm regularization is applied to the matrix M in order to learn the relationship between the different modal data.
  • the objective function can be abbreviated as F(M) = ℓ(M) + λ‖M‖_*, where ℓ(M) denotes the weighted loss term.
  • the APG method is used to optimize the objective function; at each iteration, the optimized objective function is, for example, the proximal approximation built at the search point Q_t: min_M ⟨∇ℓ(Q_t), M − Q_t⟩ + (γ/2)‖M − Q_t‖_F^2 + λ‖M‖_*.
  • Step 108c: Determine the augmented matrix corresponding to the metric model according to the optimized objective function.
  • in step 108d, singular value decomposition is performed on the augmented matrix corresponding to the metric model, and the singular value decomposition result M̂ = U Σ V^T is obtained, where M̂ denotes the augmented matrix.
  • U is a (D_x + D_z) × (D_x + D_z) unitary matrix
  • Σ is a positive semi-definite (D_x + D_z) × (D_x + D_z) diagonal matrix
  • V^T, the conjugate transpose of V, is a (D_x + D_z) × (D_x + D_z) unitary matrix
  • the i-th diagonal element of Σ, denoted Σ_ii, is the i-th singular value of the augmented matrix.
  • Step 108e Update the metric model according to the singular value decomposition result to obtain the updated metric model.
  • the metric model M and the search sequence Q are updated according to the following formulas, i.e. the standard APG update in which the singular values of the augmented matrix are soft-thresholded:

    M_{t+1} = U · max(Σ − λ/η, 0) · V^T

    Q_{t+1} = M_{t+1} + ((α_t − 1)/α_{t+1}) · (M_{t+1} − M_t), where α_{t+1} = (1 + √(1 + 4·α_t²))/2 and η is the APG step-size parameter
  • α_1 = 1
  • M t represents the metric model M before update
  • M t+1 represents the updated metric model M
  • Q t represents the search sequence Q before update
  • Q t+1 represents the updated search sequence Q.
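One such update step can be sketched as follows (a hedged illustration assuming the standard APG form, in which singular value thresholding is the proximal operator of the nuclear norm; `step` stands for 1/η, `lam` for λ, and the function names are ours):

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: SVD A = U diag(s) V^T, then soft-threshold
    the singular values; this is the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def apg_step(M_t, Q_t, grad, alpha_t, step, lam):
    """One APG iteration: a gradient step from the search sequence Q_t,
    a nuclear-norm proximal step, then a momentum update of Q."""
    M_next = svt(Q_t - step * grad, step * lam)
    alpha_next = (1.0 + np.sqrt(1.0 + 4.0 * alpha_t ** 2)) / 2.0
    Q_next = M_next + ((alpha_t - 1.0) / alpha_next) * (M_next - M_t)
    return M_next, Q_next, alpha_next
```

The thresholding is what produces a low-rank metric model: singular values below the threshold are set to zero, which is how the nuclear-norm regularization couples the different modalities.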
  • in step 108f, it is judged whether the updated metric model meets the preset stop-training condition; if not, execution resumes from step 108b above; if so, the process ends.
  • the preset stop-training condition includes at least one of the following: the number of iterations reaches a preset value, or the metric model M no longer changes.
  • the preset value can be set in advance by weighing the training precision and training speed of the model: a larger value can be used when higher training precision is required, and a smaller value when higher training speed is required.
  • the metric model is retrained by using the updated training sample set, so that the accuracy of the metric model is optimized.
  • FIG. 4 shows a flowchart of a model optimization process provided by an embodiment of the present application.
  • the model optimization process includes the following steps:
  • Step 401 constructing an initial training sample set
  • the initial training sample set may contain a small number of training samples, which may be randomly sampled from the unlabeled sample set; the selected training samples are handed to the labeling personnel for correlation labeling and are then used for training the initial metric model.
  • Step 402 training a metric model by using a training sample set
  • Step 403 output a metric model
  • Step 404 verifying the accuracy of the metric model by using the verification sample set
  • the verification sample set includes at least one verification sample, and each verification sample includes a set of sample pairs with labeled correlation.
  • the metric model is used to predict the correlation between the sample pairs included in the verification samples, and the predicted correlation is compared with the annotated correlation to determine whether the prediction result is accurate.
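For instance, if the model's similarity score is thresholded into a binary prediction, the verification step can be sketched as follows (an illustration; the decision threshold and the helper names are assumptions, not from the patent):

```python
def prediction(model_score, threshold=0.0):
    """Predict 'related' (+1) when the metric model's similarity score
    exceeds a decision threshold, otherwise 'unrelated' (-1)."""
    return 1 if model_score > threshold else -1

def accuracy(scores, labels, threshold=0.0):
    """Fraction of verification pairs whose predicted correlation matches
    the annotated correlation (accuracy = correct / total)."""
    correct = sum(prediction(s, threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)
```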
  • Step 405 it is determined whether the accuracy of the metric model meets the requirements; if yes, the process ends; if not, then perform the following step 406;
  • optionally, it is determined whether the accuracy of the metric model is greater than or equal to a preset accuracy threshold; if so, the requirement is determined to be met, and if the accuracy is less than the preset accuracy threshold, the requirement is determined not to be met.
  • the accuracy threshold may be set in advance according to the accuracy requirement of the metric model: the higher the accuracy requirement, the larger the accuracy threshold.
  • Step 406: The active learning technique is used to select high-quality training samples from the unlabeled sample set; the selected samples are handed to the labeling personnel for correlation labeling and are then added to the training sample set; after step 406, execution resumes from step 402.
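The loop of steps 402 through 406 can be sketched as follows (a schematic only; `train`, `validate`, `select`, and `annotate` are placeholder callables standing in for the steps described above, not APIs defined by the patent):

```python
def optimize_metric_model(train, validate, select, annotate,
                          labeled, unlabeled, target_accuracy, batch_size):
    """Schematic of the loop in FIG. 4: train (402), verify (404-405), and,
    while accuracy is insufficient, actively select and annotate new
    training samples (406) before retraining."""
    while True:
        model = train(labeled)
        if validate(model) >= target_accuracy:
            return model
        for pair in select(model, unlabeled, batch_size):
            unlabeled.remove(pair)
            labeled.append((pair, annotate(pair)))
```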
  • the technical solution provided by the embodiments of the present application can be applied to the field of cross-modal data retrieval, for example, retrieval across the two modalities of image and text.
  • using active learning technology to select the most valuable sample pairs as training samples, and having them labeled by professional labelers, reduces the labeling cost and efficiently trains an accurate metric model.
  • the related information of an official account usually includes two modalities of cross-modal data: an image (such as the icon of the official account) and text (such as the introduction of the official account).
  • high-quality training samples are selected from the unlabeled sample set, such as the official accounts "Nanjing property market" and "Jiangbei property market".
  • the labeling personnel perform the correlation labeling; an exemplary labeling interface is shown in FIG. 5.
  • the labeled training samples are then added to the training sample set, and the updated training sample set is used to retrain the metric model. If the accuracy of the trained metric model does not meet the requirement, training samples continue to be selected to update the training sample set, and the metric model is retrained again. If the accuracy of the trained metric model meets the requirement, a precise metric model has been obtained. The metric model can then be used to retrieve the official accounts related to the search information input by the user.
  • the search interface can be exemplarily shown in FIG. 6.
  • FIG. 7 shows a block diagram of a sample selection device provided by an embodiment of the present application.
  • the apparatus has the function of implementing the above method examples, and the function may be implemented by hardware, or by hardware executing corresponding software.
  • the apparatus can include a selecting module 710, a first calculating module 720, a second calculating module 730, a third calculating module 740, and a selection module 750.
  • the selecting module 710 is configured to perform step 101 above.
  • the first calculating module 720 is configured to perform step 102 above.
  • the second calculating module 730 is configured to perform step 103 above.
  • the third calculating module 740 is configured to perform the above step 104.
  • the selection module 750 is configured to perform the above step 105.
  • the selection module 750 includes: a calculation unit and a selection unit.
  • the calculation unit is configured to perform the above step 105a.
  • the selection unit is configured to perform the above step 105b.
  • the first calculating module 720 includes: an extracting unit and a calculating unit.
  • the extracting unit is configured to perform the above step 102a.
  • the calculation unit is configured to perform the above step 102b.
  • the device further includes: an obtaining module, an adding module, and a training module.
  • the acquisition module is configured to perform step 106 above.
  • the adding module is used to perform the above step 107.
  • the training module is configured to perform the above step 108.
  • the training module includes: an initialization unit, an optimization unit, a determination unit, a decomposition unit, an update unit, and a determination unit.
  • the initialization unit is configured to perform the above step 108a.
  • the optimization unit is operative to perform step 108b above.
  • the determining unit is configured to perform the above step 108c.
  • the decomposition unit is configured to perform the above step 108d.
  • the update unit is configured to perform the above step 108e.
  • the judging unit is for performing the above step 108f.
  • FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application. This server is used to implement the method provided in the above embodiments. Specifically:
  • the server 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read only memory (ROM) 803, and a system bus 805 that connects the system memory 804 and the central processing unit 801.
  • the server 800 also includes a basic input/output system (I/O system) 806 that facilitates the transfer of information between the devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
  • the basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse or keyboard for user input of information.
  • the display 808 and input device 809 are both connected to the central processing unit 801 via an input and output controller 810 that is coupled to the system bus 805.
  • the basic input/output system 806 can also include an input output controller 810 for receiving and processing input from a plurality of other devices, such as a keyboard, mouse, or electronic stylus.
  • input and output controller 810 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 807 is connected to the central processing unit 801 by a mass storage controller (not shown) connected to the system bus 805.
  • the mass storage device 807 and its associated computer readable medium provide non-volatile storage for the server 800. That is, the mass storage device 807 can include a computer readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • the computer readable medium can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • according to various embodiments of the present application, the server 800 can also run by connecting, through a network such as the Internet, to a remote computer on the network. That is, the server 800 can be connected to the network 812 through a network interface unit 811 connected to the system bus 805, or the network interface unit 811 can be used to connect to other types of networks or remote computer systems (not shown).
  • the memory also includes one or more programs, the one or more programs being stored in a memory and configured to be executed by one or more processors.
  • the one or more programs described above include instructions for performing the above method.
  • in an exemplary embodiment, a non-transitory computer readable storage medium including a computer program is also provided, such as a memory including a computer program, and the computer program can be executed by a processor of the server to complete the steps of the above method embodiments.
  • the non-transitory computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • a plurality as referred to herein means two or more.
  • "and/or” describing the association relationship of the associated objects, indicating that there may be three relationships, for example, A and/or B, which may indicate that there are three cases where A exists separately, A and B exist at the same time, and B exists separately.
  • the character "/" generally indicates that the contextual object is an "or" relationship.
  • a person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the related hardware; the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A sample selection method, apparatus, and server, belonging to the technical field of metric learning. The method comprises: selecting n groups of sample pairs from an unlabeled sample set, each group of sample pairs comprising two samples and each sample comprising data of p modalities (101); for each group of sample pairs, calculating the partial similarity between the data of each modality of one sample of the pair and the data of each modality of the other sample, obtaining p×p partial similarities (102); calculating, according to the p×p partial similarities, the overall similarity between the two samples of the pair (103); obtaining the degree of difference between the p×p partial similarities and the overall similarity (104); and selecting, from the n groups of sample pairs, the pairs meeting a preset condition as training samples (105). By selecting high-quality training samples to train the metric model, the method, apparatus, and server can train a higher-precision metric model with fewer training samples.

Description

样本选择方法、装置及服务器
本申请要求于2017年2月8日提交国家知识产权局、申请号为201710069595.X、发明名称为“样本选择方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及度量学习技术领域,特别涉及一种样本选择方法、装置及服务器。
背景技术
度量学习(metric learning)是指从已标注相关性的样本对中自动学习出合理描述两个对象之间的语义相似度的距离度量,是图像检索领域中常用的一种机器学习技术。
目前,已经有一些较为成熟的有关度量学习的技术,但这些技术大多是针对单模态数据(如图像和图像)之间的相关性度量,并不适用于跨模态数据(如图像和文本)之间的相关性度量。针对跨模态数据的度量学习,在现有技术中,预先构建训练样本集,该训练样本集中包括大量的训练样本,每一个训练样本包括一组已标注相关性的样本对,而后采用该训练样本集训练度量模型,该度量模型用于度量一组跨模态数据之间的相关性。训练样本集的构建方式通常采用随机抽样方式从未标注样本集中选取样本对作为训练样本,并将选取的训练样本交由标注人员进行相关性标注。
在现有技术中,采用随机抽样方式选取的训练样本质量偏低,导致最终训练得到的度量模型的准确性较低。并且,由于需要大量已标注相关性的训练样本来训练度量模型,导致模型的训练效率较低,且对训练样本的标注需要耗费较多的时间和成本。
发明内容
本申请实施例提供了一种样本选择方法、装置及服务器,用以解决现有技 术在针对跨模态数据的度量学习中,所存在的模型的准确性较低,模型的训练效率较低,以及对训练样本的标注需要耗费较多的时间和成本的问题。所述技术方案如下:
一方面,提供了一种样本选择方法,应用于服务器中,所述方法包括:
从未标注样本集中选取n组样本对,每一组样本对包括两个样本,每一个样本包括p种模态的数据,所述n为正整数,所述p为大于1的整数;
对于所述n组样本对中的每一组样本对,计算所述样本对包括的一个样本的每一种模态的数据和另一个样本的每一种模态的数据之间的部分相似度,得到p×p个部分相似度;
根据所述p×p个部分相似度计算所述样本对包括的两个样本之间的整体相似度;
获取所述p×p个部分相似度与所述整体相似度之间的差异程度;
从所述n组样本对中符合预设条件的样本对中选择训练样本;其中,所述预设条件满足所述差异程度大于第一阈值且所述整体相似度小于第二阈值。
另一方面,提供了一种样本选择装置,应用于服务器中,所述装置包括:
选取模块,用于从未标注样本集中选取n组样本对,每一组样本对包括两个样本,每一个样本包括p种模态的数据,所述n为正整数,所述p为大于1的整数;
第一计算模块,用于对于所述n组样本对中的每一组样本对,计算所述样本对包括的一个样本的每一种模态的数据和另一个样本的每一种模态的数据之间的部分相似度,得到p×p个部分相似度;
第二计算模块,用于根据所述p×p个部分相似度计算所述样本对包括的两个样本之间的整体相似度;
第三计算模块,用于获取所述p×p个部分相似度与所述整体相似度之间的差异程度;
选择模块,用于从所述n组样本对中符合预设条件的样本对中选择训练样本;其中,所述预设条件满足所述差异程度大于第一阈值且所述整体相似度小于第二阈值。
再一方面,提供了一种服务器,所述服务器包括处理器和存储器,所述存储器中存储有计算机程序,所述计算机程序由所述处理器加载并执行以实现如上述方面所述的样本选择方法。
又一方面,提供了一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序由处理器加载并执行以实现如上述方面所述的样本选择方法。
还一方面,提供了一种计算机程序产品,当该计算机程序产品被执行时,其用于执行上述方面所述的样本选择方法。
本申请实施例提供的技术方案带来的有益效果包括:
将主动学习(active learning)技术应用于针对跨模态数据的度量学习中,通过主动选择最有价值的样本对作为训练样本,能够显著提升训练样本的质量。其中,最有价值的样本对是指对提升度量模型的准确性具有积极效果的样本对,例如度量模型在之前的训练过程中还未学习掌握的样本对。由于本申请实施例相较于现有技术选择的训练样本的质量更高,因此存在如下优势:第一,在选择同等数量的训练样本的情况下,采用本申请实施例提供的方法选择的训练样本训练得到的度量模型的准确性更高;第二,在为了获得同等精度的度量模型的情况下,采用本申请实施例提供的方法所需的训练样本的数量更少,有助于提高模型的训练效率,减少对训练样本的标注所需耗费的时间和成本。所以,相较于现有技术,本申请实施例提供的技术方案,通过选择高质量的训练样本训练度量模型,能够用更少的训练样本训练出更高精度的度量模型。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一个实施例提供的样本选择方法的流程图;
图2是本申请另一实施例提供的样本选择方法的流程图;
图3是本申请一个实施例提供的模型训练过程的流程图;
图4是本申请一个实施例提供的模型优化过程的流程图;
图5是本申请实施例涉及的标注界面的示意图;
图6是本申请实施例涉及的检索界面的示意图;
图7是本申请一个实施例提供的样本选择装置的框图;
图8是本申请一个实施例提供的服务器的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
在本申请实施例中,将主动学习技术应用于针对跨模态数据的度量学习中,通过主动选择最有价值的样本对作为训练样本进行相关性标注并用于模型训练,能够显著提升训练样本的质量,从而达到提升最终训练得到的度量模型的准确性,提高度量模型的训练效率,以及减少对训练样本的标注所需耗费的时间和成本的技术效果。下面将基于上面所述的本申请实施例涉及的共性方面,对本申请实施例进行进一步详细说明。
本申请实施例提供的方法,各步骤的执行主体可以是服务器。例如,该服务器可以是一台服务器,也可以是由若干台服务器组成的服务器集群,或者是一个云计算服务中心。
本申请实施例提供的技术方案可应用于跨模态数据检索领域,例如跨图像和文本两种模态数据的检索领域。通过本申请实施例提供的度量模型能够精确计算相同模态数据、跨模态数据之间的相关性,从而达到向用户精准反馈所需检索的内容的目的。
请参考图1,其示出了本申请一个实施例提供的样本选择方法的流程图。该方法可以包括如下几个步骤:
步骤101,从未标注样本集中选取n组样本对,每一组样本对包括两个样本,每一个样本包括p种模态的数据,n为正整数,p为大于1的整数。
未标注样本集中包括多个样本,每一个样本包括多种模态的数据。对于每一个样本包括的p种模态的数据,这p种模态的数据之间具有相关性。示例性地,每一个样本包括图像和文本两种模态的数据,其中文本是用于描述图像的文字信息,比如文本用于对图像的内容进行介绍。
在本申请实施例中,对模态的划分角度不作限定。在一个示例中,从数据类型角度对模态进行划分,不同种模态的数据可以是不同数据类型的数据,例如图像和文本、音频和文本、视频和文本、图像和音频等。在另一个示例中,从数据内容角度对模态进行划分,不同种模态的数据可以是不同数据内容的数据,例如有关身体健康状况的数据可分为如下多种不同的模态:血压数据、血 糖数据、心电图数据、体重数据等。
可选地,采用随机抽样方式从未标注样本集中选取n组样本对。
例如,未标注样本集以集合U表示,集合U中包括N 1个样本。采用随机抽样方式从N 1个样本中选取n组样本对,该n组样本对构成候选训练样本集,候选训练样本集以集合P表示。示例性地,假设每一个样本包括图像和文本两种模态的数据。
步骤102,对于n组样本对中的每一组样本对,计算样本对包括的一个样本的每一种模态的数据和另一个样本的每一种模态的数据之间的部分相似度,得到p×p个部分相似度。
对于每一组样本对,计算得到的p×p个部分相似度包括:p个相同模态数据之间的相似度,以及p×p-p个跨模态数据之间的相似度。仍然以上述示例为例,假设每一个样本包括图像和文本两种模态的数据,对于每一个样本对(包括第一样本和第二样本)能够计算得到4个部分相似度,其中2个相同模态数据之间的相似度分别为第一样本的图像和第二样本的图像之间的相似度、第一样本的文本和第二样本的文本之间的相似度,另外2个跨模态数据之间的相似度分别为第一样本的图像和第二样本的文本之间的相似度、第一样本的文本和第二样本的图像之间的相似度。
可选地,步骤102包括如下几个子步骤:
步骤102a,对于n组样本对中的每一组样本对,从样本对包括的每一个样本的每一种模态的数据中提取特征;
在实际应用中,针对不同模态的数据,所提取的特征也有所不同,其可根据实际应用需求预先设定,本实施例对此不作限定。示例性地,假设每一个样本包括图像和文本两种模态的数据,则从每一个样本中提取的特征包括图像特征和文本特征。其中,图像特征从样本包括的图像中提取,图像特征的提取方式可采用数字图像处理教科书中的经典方法,图像特征包括但不限于颜色特征、纹理特征、形状特征中的至少一种。文本特征从样本包括的文本中提取,文本特征的提取方式可采用自然语言处理领域中的经典方法,文本特征包括但不限于TF-IDF(Term Frequency–Inverse Document Frequency,词频-反文档频率)特征、LDA(Latent Dirichlet Allocation,隐含狄利克雷模型)特征、词性特征等。
例如,样本以o表示,从样本o中提取的图像特征为x∈R^{D_x},从样本o中提取的文本特征为z∈R^{D_z},该样本o即可以特征向量的形式表示为o=[x;z];其中,D_x表示图像特征的维度数(即项数),D_z表示文本特征的维度数(即项数)。
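As a toy illustration of the feature extraction in step 102a (the histogram and vocabulary-count features below are simplified stand-ins for the colour/texture/shape and TF-IDF/LDA features mentioned above; all function names are our own assumptions):

```python
import numpy as np

def image_features(pixels, bins=8):
    """Toy image feature x: a normalized intensity histogram (a stand-in
    for colour/texture/shape features)."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def text_features(text, vocabulary):
    """Toy text feature z: term counts over a fixed vocabulary (a stand-in
    for TF-IDF / LDA / part-of-speech features)."""
    words = text.lower().split()
    return np.array([words.count(term) for term in vocabulary], dtype=float)

def sample_vector(pixels, text, vocabulary):
    """A sample o is represented as the concatenation o = [x; z]."""
    return np.concatenate([image_features(pixels),
                           text_features(text, vocabulary)])
```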
步骤102b,根据样本对包括的一个样本的每一种模态的数据的特征和另一个样本的每一种模态的数据的特征,计算得到p×p个部分相似度。
采用已经训练得到的度量模型根据样本对包括的一个样本的每一种模态的数据的特征和另一个样本的每一种模态的数据的特征,计算得到p×p个部分相似度。其中,已经训练得到的度量模型采用已经构建的训练样本集进行训练得到。已经构建的训练样本集以集合L表示,其包括N 2个训练样本,每一个训练样本包括一组已标注相关性的样本对。在本申请实施例中,通过本实施例提供的方法流程选取高质量的训练样本,并采用这些高质量的训练样本更新训练样本集L,而后采用更新后的训练样本集L对度量模型重新进行训练,以达到优化度量模型的目的。
最初的度量模型可由最初的训练样本集进行训练得到。最初的训练样本集中可以包含少量的训练样本,这部分少量的训练样本可采用随机抽样方式从未标注样本集中选取,并将选取的训练样本交由标注人员进行相关性标注后用于训练最初的度量模型。在本实施例中,度量模型以M表示,度量模型M为k行k列的矩阵,k表示一个样本包括的p种模态数据的特征的维度数(即项数)之和,k为大于1的整数。示例性地,假设每一个样本包括图像和文本两种模态的数据,从样本o中提取的图像特征为x∈R^{D_x},从样本o中提取的文本特征为z∈R^{D_z},则度量模型M是一个(D_x+D_z)行(D_x+D_z)列的矩阵。度量模型M可以看作是由p×p个子度量模型构成。每一个子度量模型用于根据一个对象的一种模态的数据的特征和另一个对象的一种模态的数据的特征,计算得到一个部分相似度。度量模型M用于根据p×p个部分相似度计算两个对象之间的整体相似度。上述对象可以是训练样本中的一个样本,也可以是采用该度量模型进行相关性度量时输入的数据。
示例性地,假设p为2,每一个样本包括图像和文本两种模态的数据,从样本o中提取的图像特征为x∈R^{D_x},从样本o中提取的文本特征为z∈R^{D_z},则度量模型M可以表示为:

M = [ M_{11} M_{12} ; M_{21} M_{22} ]

度量模型M可以看作是由4个子度量模型构成,分别为M_{11}、M_{12}、M_{21}和M_{22}。其中,子度量模型M_{11}为D_x行D_x列的矩阵,用于根据一个对象的图像特征和另一个对象的图像特征计算这两个对象的图像之间的相似度;子度量模型M_{12}为D_x行D_z列的矩阵,用于根据一个对象的图像特征和另一个对象的文本特征计算这一个对象的图像和另一个对象的文本之间的相似度;子度量模型M_{21}为D_z行D_x列的矩阵,用于根据一个对象的文本特征和另一个对象的图像特征计算这一个对象的文本和另一个对象的图像之间的相似度;子度量模型M_{22}为D_z行D_z列的矩阵,用于根据一个对象的文本特征和另一个对象的文本特征计算这两个对象的文本之间的相似度。采用该度量模型M能够计算得到4个部分相似度,分别表示为S^{xx}、S^{xz}、S^{zx}和S^{zz}。
步骤103,根据p×p个部分相似度计算样本对包括的两个样本之间的整体相似度。
可选地,将p×p个部分相似度相加得到整体相似度。在本实施例中,整体相似度以S M表示。
以计算样本对(o_i,o_j)包括的两个样本o_i和o_j之间的整体相似度S_M(o_i,o_j)为例,计算过程如下:

S_M(o_i,o_j) = [x_i;z_i]^T M [x_j;z_j] = x_i^T M_{11} x_j + x_i^T M_{12} z_j + z_i^T M_{21} x_j + z_i^T M_{22} z_j

其中,x_i表示样本o_i的图像特征,z_i表示样本o_i的文本特征,x_j表示样本o_j的图像特征,z_j表示样本o_j的文本特征。
另外,仍然采用已经训练得到的度量模型M根据p×p个部分相似度计算样本对包括的两个样本之间的整体相似度。
步骤104,获取p×p个部分相似度与整体相似度之间的差异程度。
上述差异程度可以由多种衡量标准,例如方差、残差等。
可选地,当采用方差衡量差异程度时,样本对(o_i,o_j)对应的差异程度var(o_i,o_j)可采用如下形式的方差:

var(o_i,o_j) = (1/4)·Σ_{k,l∈{x,z}} ( S^{kl} − S_M/4 )²

其中,S_M表示样本对(o_i,o_j)包括的两个样本o_i和o_j之间的整体相似度,也即S_M(o_i,o_j),此处简写为S_M;S^{xx}、S^{xz}、S^{zx}、S^{zz}分别表示样本对(o_i,o_j)包括的两个样本o_i和o_j之间的4个部分相似度的简写。
差异程度var(o i,o j)同时考虑了相同模态数据和跨模态数据之间的部分相似度,并反映了各个部分相似度与整体相似度之间的不一致程度。
步骤105,从n组样本对中符合预设条件的样本对中选择训练样本;其中,预设条件满足差异程度大于第一阈值且整体相似度小于第二阈值。
在计算得到每一组样本对对应的整体相似度和差异程度之后,据此选择最有价值的样本对作为训练样本。其中,最有价值的样本对是指对提升度量模型的准确性具有积极效果的样本对,例如度量模型在之前的训练过程中还未学习掌握的样本对。
在本申请实施例中,通过选取差异程度较大且整体相似度较小的样本对作为训练样本,能够达到选取高质量的训练样本的目的。
在一个示例中,步骤105包括:从n组样本对中选择差异程度大于第一阈值且整体相似度小于第二阈值的样本对作为训练样本。第一阈值的取值可根据实际需求预先设定,若想选择的样本对对应的差异程度越大,则第一阈值的取值越大。第二阈值的取值也可根据实际需求预先设定,若想选择的样本对对应的整体相似度越小,则第二阈值的取值越小。
在另一个示例中,步骤105包括如下几个子步骤:
步骤105a,对于n组样本对中的每一组样本对,根据样本对对应的整体相似度和差异程度,计算样本对对应的信息量;
样本对(o_i,o_j)对应的信息量dis(o_i,o_j)由差异程度和整体相似度共同确定:差异程度var(o_i,o_j)越大、整体相似度S_M(o_i,o_j)越小,则信息量dis(o_i,o_j)越大,例如可取dis(o_i,o_j)=var(o_i,o_j)−S_M(o_i,o_j)。
步骤105b,从n组样本对中选择信息量大于第三阈值的样本对作为训练样本。
第三阈值的取值可根据实际需求预先设定,若想选择的样本对对应的差异程度越大且整体相似度越小,则第三阈值的取值较大。
在另一种可能的实施方式中,在计算得到每一组样本对对应的信息量之后,按照信息量由大到小的顺序对n组样本对进行排序得到样本对序列,从样本对序列中选择前m个样本对作为训练样本,m为正整数。其中,m的取值可根据实际需要选取的训练样本的数量预先设定。
通过计算样本对对应的信息量,能够实现将样本对对应的差异程度最大化,并将样本对对应的整体相似度最小化,从而选取符合预设条件的样本对作为训练样本。
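The selection criterion can be sketched as follows (an illustrative sketch only: the exact formulas for var(o_i, o_j) and dis(o_i, o_j) appear as images in the published text, so the variance-style measure and the linear trade-off below are plausible instantiations, not the patent's literal formulas; all names are ours):

```python
import numpy as np

def degree_of_difference(partials, overall):
    """Variance-style inconsistency between the p*p partial similarities
    and the overall similarity (one plausible reading of var(o_i, o_j))."""
    partials = np.asarray(partials, dtype=float)
    return float(np.mean((partials - overall / partials.size) ** 2))

def information_score(partials, overall, beta=1.0):
    """Trade-off dis(o_i, o_j): reward a large degree of difference and
    penalize a large overall similarity."""
    return degree_of_difference(partials, overall) - beta * overall

def select_training_pairs(candidates, m):
    """Pick the m candidates with the highest information score; each
    candidate is (pair_id, partial_similarities, overall_similarity)."""
    ranked = sorted(candidates, key=lambda c: information_score(c[1], c[2]),
                    reverse=True)
    return [pair_id for pair_id, _, _ in ranked[:m]]
```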
综上所述,本实施例提供的方法,将主动学习技术应用于针对跨模态数据的度量学习中,通过主动选择最有价值的样本对作为训练样本,能够显著提升训练样本的质量。由于本申请实施例相较于现有技术选择的训练样本的质量更高,因此存在如下优势:第一,在选择同等数量的训练样本的情况下,采用本申请实施例提供的方法选择的训练样本训练得到的度量模型的准确性更高;第二,在为了获得同等精度的度量模型的情况下,采用本申请实施例提供的方法所需的训练样本的数量更少,有助于提高模型的训练效率,减少对训练样本的标注所需耗费的时间和成本。所以,相较于现有技术,本申请实施例提供的技术方案,通过选择高质量的训练样本训练度量模型,能够用更少的训练样本训练出更高精度的度量模型。
在基于图1所示实施例提供的一个可选实施例中,如图2所示,上述步骤105之后还包括如下几个步骤。
步骤106,获取由标注人员标注的训练样本对应的相关性,训练样本对应的相关性用于指示训练样本包括的两个样本相关或不相关。
通过上述图1所示实施例筛选出高质量的训练样本之后,需要由标注人员对这些训练样本进行相关性标注,也即由标注人员判断训练样本包括的两个样本相关或不相关,并标注出判断结果。标注人员可认为是专家,其标注的相关性具有较高的准确性和权威性。
可选地,假设从上述n组样本对中选取的训练样本构成待标注训练样本集,待标注训练样本集以集合Q表示,将集合Q中的每一个训练样本提供给标注人员进行相关性标注。
示例性地,服务器向标注人员对应的用户设备发送待标注训练样本集Q,用户设备接收到待标注训练样本集Q之后,将各个训练样本进行显示,获取由 标注人员标注的各个训练样本对应的相关性,并发送给服务器。
步骤107,将训练样本添加至训练样本集。
在完成相关性标注之后,将已标注相关性的训练样本添加至训练样本集L,实现对训练样本集L进行更新。
步骤108,采用训练样本集训练度量模型,度量模型用于度量两个对象之间的相关性,每一个对象包括至少一种模态的数据。
采用更新后的训练样本集L重新训练度量模型M,以达到对度量模型M进行准确性优化的目的。
在实际实现时,可通过多次重新训练度量模型,以最终训练出一个高精度的度量模型。采用已经训练得到的度量模型从未标注样本集中选取少量的高质量的训练样本,结合已有的训练样本和新选取的训练样本重新训练得到一个更高精度的度量模型。之后,采用这个重新训练出的度量模型再次从未标注样本集中选取少量的高质量的训练样本,结合已有的训练样本和新选取的训练样本重新训练得到一个更高精度的度量模型,以此循环,直至重新训练出的度量模型的准确性达到要求时停止。
可选地,如图3所示,步骤108包括如下几个子步骤:
步骤108a,初始化度量模型。
在上文已经介绍,度量模型M为k行k列的矩阵,k表示一个样本包括的p种模态数据的特征的维度数(即项数)之和,k为大于1的整数。可选地,将度量模型M初始化为单位矩阵。
可选地,在本实施例中,采用加速近邻梯度(Accelerated Proximal Gradient,APG)方法优化度量模型M对应的目标函数,还需初始化度量模型M对应的搜索序列Q。搜索序列Q是在优化目标函数的过程中度量模型M的一个临时变量,其用于记录度量模型M的次优解,根据搜索序列Q便能够计算得到度量模型M的最优解。搜索序列Q也是一个k行k列的矩阵。在本实施例中,采用APG方法优化目标函数,可以加快目标函数的优化过程。
可选地,将度量模型M初始化为单位矩阵,将搜索序列Q初始化为元素全为零的矩阵。
示例性地,假设每一个样本包括图像和文本两种模态的数据,从样本o中提取的图像特征为x∈R^{D_x},从样本o中提取的文本特征为z∈R^{D_z},则度量模型M和搜索序列Q均是一个(D_x+D_z)×(D_x+D_z)大小的矩阵。
另外,对于训练样本集L可以做如下处理:
如果训练样本(也即样本对(o_i,o_j))标注的相关性为相关,则将样本对(o_i,o_j)添加至集合S中,其相关性y_ij取值为+1;如果样本对(o_i,o_j)标注的相关性为不相关,则将样本对(o_i,o_j)添加至集合D中,其相关性y_ij取值为-1。采用如下式子表示:

y_ij = +1,(o_i,o_j)∈S;y_ij = −1,(o_i,o_j)∈D。
步骤108b,采用训练样本集对度量模型对应的目标函数进行优化,得到优化后的目标函数。
可选地,目标函数为如下带核范数正则化的加权合页损失形式:

min_M Σ_{(o_i,o_j)∈S∪D} w_ij·[1 − y_ij·S_M(o_i,o_j)]_+ + λ‖M‖_*
其中,w_ij表示样本对(o_i,o_j)对应的权重,y_ij表示样本对(o_i,o_j)对应的相关性,S_M(o_i,o_j)表示样本对(o_i,o_j)对应的整体相似度。在本申请实施例中,采用如下形式的双线性相似性度量函数计算样本对(o_i,o_j)之间的整体相似度:

S_M(o_i,o_j) = [x_i;z_i]^T M [x_j;z_j]

可选地,为了便于计算,将标注的相关性为相关的样本对(o_i,o_j)对应的权重w_ij设为1/|S|,将标注的相关性为不相关的样本对(o_i,o_j)对应的权重w_ij设为1/|D|。
记号|·|表示集合中元素的个数,即|S|表示集合S中元素的个数,|D|表示集合D中元素的个数。
另外,||M|| *表示矩阵M的核范数。在本实施例中,对矩阵M施加核范数的正则化是为了学习不同模态数据之间的联系。
目标函数可以简写成:

min_M l(M) + λ‖M‖_*

其中,l(M)表示上述关于已标注样本对的加权经验损失项。在本实施例中,采用APG方法优化目标函数,在第t轮迭代中将l(M)在搜索序列Q_t处线性化,优化后的目标函数为:

Γ(M) = l(Q_t) + ⟨l′(Q_t), M − Q_t⟩ + (η/2)‖M − Q_t‖_F² + λ‖M‖_*

其中,l′(Q_t)为函数l(M)关于Q_t的一阶导数,η为步长参数。
需要说明的是,在本实施例中,仅以采用APG方法优化目标函数为例,本实施例并不限定采用其它方法对目标函数进行优化。
步骤108c,根据优化后的目标函数确定度量模型对应的增广矩阵。
将上述优化后的目标函数整理后可得:

M_{t+1} = argmin_M (1/2)‖M − M̂_t‖_F² + (λ/η)‖M‖_*

其中,M̂_t = Q_t − (1/η)·l′(Q_t),表示度量模型M的增广矩阵,η为步长参数。
步骤108d,对度量模型对应的增广矩阵进行奇异值分解,得到奇异值分解结果。

对增广矩阵M̂_t进行奇异值分解,得到奇异值分解结果:

M̂_t = U Σ V^T

其中,U是(D_x+D_z)×(D_x+D_z)阶酉矩阵;Σ是半正定(D_x+D_z)×(D_x+D_z)阶对角矩阵;V^T是V的共轭转置,是(D_x+D_z)×(D_x+D_z)阶酉矩阵。Σ的第i个对角元素Σ_ii即为M̂_t的第i个奇异值。
步骤108e,根据奇异值分解结果对度量模型进行更新,得到更新后的度量模型。
可选地,按照下述公式对度量模型M和搜索序列Q进行更新(即标准的APG更新,对增广矩阵的奇异值作软阈值处理):

M_{t+1} = U·max(Σ − λ/η, 0)·V^T

Q_{t+1} = M_{t+1} + ((α_t − 1)/α_{t+1})·(M_{t+1} − M_t)

其中,η为APG的步长参数,α_{t+1} = (1 + √(1 + 4·α_t²))/2,α_1=1,M_t表示更新前的度量模型M,M_{t+1}表示更新后的度量模型M,Q_t表示更新前的搜索序列Q,Q_{t+1}表示更新后的搜索序列Q。如果在下述步骤108f判断出更新后的度量模型未达到预设的停止训练条件,则还需要重复进行下一轮训练,在下一轮训练中,利用更新后的搜索序列Q计算度量模型的增广矩阵。
步骤108f,判断更新后的度量模型是否达到预设的停止训练条件;若否,则再次从上述步骤108b开始执行;若是,则结束流程。
其中,预设的停止训练条件包括以下至少一项:迭代轮数达到预设值,度量模型M不再发生变化。上述预设值可综合考虑模型的训练精度和速度后预先进行设定,若对模型的训练精度要求较高则可取较大值,若对模型的训练速度要求较高则可取较小值。
通过本实施例提供的方法,采用更新后的训练样本集重新训练度量模型,以使得度量模型的准确性得到优化。
请参考图4,其示出了本申请一个实施例提供的模型优化过程的流程图。该模型优化过程包括如下几个步骤:
步骤401,构建最初的训练样本集;
最初的训练样本集中可以包含少量的训练样本,这部分少量的训练样本可 采用随机抽样方式从未标注样本集中选取,并将选取的训练样本交由标注人员进行相关性标注后用于训练最初的度量模型。
步骤402,采用训练样本集训练度量模型;
有关模型的训练过程参见上述图3所示实施例中的介绍说明,此处不再赘述。
步骤403,输出度量模型;
步骤404,采用验证样本集对度量模型的准确性进行验证;
其中,验证样本集包括至少一个验证样本,每一个验证样本包括一组已标注相关性的样本对。通过将验证样本输入至度量模型,采用度量模型预测验证样本所包括的样本对之间的相关性,并将预测得到的相关性与标注的相关性进行比对,即可确定预测结果是否准确。在实际应用中,可综合考虑多个验证样本对应的预测结果,得到度量模型的准确性。例如,度量模型的准确性=预测结果为正确的验证样本的数量/验证样本的总数量。
步骤405,判断度量模型的准确性是否达到要求;若是,则结束流程;若否,则执行下述步骤406;
可选地,判断度量模型的准确性是否大于等于预设的准确性阈值;若度量模型的准确性大于等于预设的准确性阈值,则确定达到要求;若度量模型的准确性小于预设的准确性阈值,则确定未达到要求。其中,准确性阈值可预先根据对度量模型的精度要求进行设定,精度要求越高该准确性阈值设定地越大。
步骤406,采用主动学习技术从未标注样本集中选取高质量的训练样本,并将选取的训练样本交由标注人员进行相关性标注后添加至训练样本集中;步骤406之后再次从步骤402开始执行。
其中,训练样本的选取过程可参见上述图1实施例中的介绍说明,训练样本的标注过程可参见上述图2所示实施例中的介绍说明。
本申请实施例提供的技术方案可应用于跨模态数据检索领域,例如跨图像和文本两种模态数据的检索领域。利用主动学习技术选择最有价值的样本对作为训练样本,并交由专业的标注人员进行相关性标注,能够减少标注代价,且能够高效训练出精准的度量模型。示例性地,以公众号搜索为例,一个公众号的相关信息通常包括图像(如公众号的图标)和文本(如公众号的简介)两种跨模态数据。假设当前训练样本集中已有少量的训练样本,利用这些少量的训 练样本训练出最初的度量模型,采用主动学习技术从未标注样本集中选择高质量的训练样本(比如公众号南京楼市和江北楼市),交由标注人员进行相关性标注,相应地标注界面可参见图5示例性示出。而后将已标注相关性的训练样本添加至训练样本集中,采用更新后的训练样本集重新训练度量模型。如果训练出的度量模型准确性未达到要求,则继续选取训练样本更新训练样本集,并再次重新训练度量模型。如果训练出的度量模型的准确性达到要求,则说明已经得到精准的度量模型。而后,利用该度量模型即可根据用户输入的检索信息检索出与检索信息相关的公众号,检索界面可参见图6示例性示出。
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。
请参考图7,其示出了本申请一个实施例提供的样本选择装置的框图。该装置具有实现上述方法示例的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以包括:选取模块710、第一计算模块720、第二计算模块730、第三计算模块740和选择模块750。
选取模块710,用于执行上述步骤101。
第一计算模块720,用于执行上述步骤102。
第二计算模块730,用于执行上述步骤103。
第三计算模块740,用于执行上述步骤104。
选择模块750,用于执行上述步骤105。
可选地,选择模块750,包括:计算单元和选择单元。计算单元用于执行上述步骤105a。选择单元用于执行上述步骤105b。
可选地,第一计算模块720,包括:提取单元和计算单元。提取单元用于执行上述步骤102a。计算单元用于执行上述步骤102b。
可选地,所述装置还包括:获取模块、添加模块和训练模块。获取模块用于执行上述步骤106。添加模块用于执行上述步骤107。训练模块用于执行上述步骤108。
可选地,所述训练模块,包括:初始化单元、优化单元、确定单元、分解单元、更新单元和判断单元。初始化单元用于执行上述步骤108a。优化单元用于执行上述步骤108b。确定单元用于执行上述步骤108c。分解单元用于执行上述步骤108d。更新单元用于执行上述步骤108e。判断单元用于执行上述步 骤108f。
相关细节可参考上述方法实施例。
需要说明的是:上述实施例提供的装置在实现其功能时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的装置与方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
请参考图8,其示出了本申请一个实施例提供的服务器的结构示意图。该服务器用于实施上述实施例中提供的方法。具体来讲:
所述服务器800包括中央处理单元(CPU)801、包括随机存取存储器(RAM)802和只读存储器(ROM)803的系统存储器804,以及连接系统存储器804和中央处理单元801的系统总线805。所述服务器800还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)806,和用于存储操作系统813、应用程序814和其他程序模块815的大容量存储设备807。
所述基本输入/输出系统806包括有用于显示信息的显示器808和用于用户输入信息的诸如鼠标、键盘之类的输入设备809。其中所述显示器808和输入设备809都通过连接到系统总线805的输入输出控制器810连接到中央处理单元801。所述基本输入/输出系统806还可以包括输入输出控制器810以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器810还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备807通过连接到系统总线805的大容量存储控制器(未示出)连接到中央处理单元801。所述大容量存储设备807及其相关联的计算机可读介质为服务器800提供非易失性存储。也就是说,所述大容量存储设备807可以包括诸如硬盘或者CD-ROM驱动器之类的计算机可读介质(未示出)。
不失一般性,所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储其技术,CD-ROM、DVD或其他光学存储、磁带盒、磁带、磁盘存 储或其他磁性存储设备。当然,本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器804和大容量存储设备807可以统称为存储器。
根据本申请的各种实施例,所述服务器800还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即服务器800可以通过连接在所述系统总线805上的网络接口单元811连接到网络812,或者说,也可以使用网络接口单元811来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,且经配置以由一个或者一个以上处理器执行。上述一个或者一个以上程序包含用于执行上述方法的指令。
在示例性实施例中,还提供了一种包括计算机程序的非临时性计算机可读存储介质,例如包括计算机程序的存储器,上述计算机程序可由服务器的处理器执行以完成上述方法实施例中的各个步骤。例如,所述非临时性计算机可读存储介质可以是ROM、RAM、CD-ROM、磁带、软盘和光数据存储设备等。
在示例性实施例中,还提供了一种计算机程序产品,当该计算机程序产品被执行时,其用于实现上述方法实施例中的各个步骤的功能。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的较佳实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (4)

  1. A sample selection method, applied to a server, the method comprising:
    selecting n groups of sample pairs from an unlabeled sample set, each group of sample pairs comprising two samples, each sample comprising data of p modalities, n being a positive integer and p being an integer greater than 1;
    for each of the n groups of sample pairs, calculating the partial similarity between the data of each modality of one sample of the pair and the data of each modality of the other sample, obtaining p×p partial similarities;
    calculating, according to the p×p partial similarities, the overall similarity between the two samples of the pair;
    obtaining the degree of difference between the p×p partial similarities and the overall similarity;
    selecting training samples from the sample pairs, among the n groups, that meet a preset condition, wherein the preset condition is that the degree of difference is greater than a first threshold and the overall similarity is less than a second threshold.
  2. The method according to claim 1, wherein selecting training samples from the sample pairs, among the n groups, that meet the preset condition comprises:
    for each of the n groups of sample pairs, calculating the information amount of the pair according to the overall similarity and the degree of difference of the pair;
    selecting, from the n groups of sample pairs, the pairs whose information amount is greater than a third threshold as the training samples.
  3. The method according to claim 1 or 2, wherein, for each of the n groups of sample pairs, calculating the partial similarity between the data of each modality of one sample of the pair and the data of each modality of the other sample to obtain p×p partial similarities comprises:
    for each of the n groups of sample pairs, extracting features from the data of each modality of each sample of the pair;
    calculating the p×p partial similarities according to the features of the data of each modality of one sample of the pair and the features of the data of each modality of the other sample.
  4. The method according to any one of claims 1 to 3, wherein the selecting, from the n groups
PCT/CN2018/075114 2017-02-08 2018-02-02 样本选择方法、装置及服务器 WO2018145604A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18750641.5A EP3582144A4 (en) 2017-02-08 2018-02-02 SAMPLE SELECTION PROCESS, APPARATUS AND SERVER
US16/353,754 US10885390B2 (en) 2017-02-08 2019-03-14 Sample selection method and apparatus and server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710069595.X 2017-02-08
CN201710069595.XA CN108399414B (zh) 2017-02-08 2017-02-08 应用于跨模态数据检索领域的样本选择方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/353,754 Continuation US10885390B2 (en) 2017-02-08 2019-03-14 Sample selection method and apparatus and server

Publications (1)

Publication Number Publication Date
WO2018145604A1 true WO2018145604A1 (zh) 2018-08-16

Family

ID=63094358

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075114 WO2018145604A1 (zh) 2017-02-08 2018-02-02 样本选择方法、装置及服务器

Country Status (5)

Country Link
US (1) US10885390B2 (zh)
EP (1) EP3582144A4 (zh)
CN (1) CN108399414B (zh)
MA (1) MA47466A (zh)
WO (1) WO2018145604A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353626A (zh) * 2018-12-21 2020-06-30 阿里巴巴集团控股有限公司 数据的审核方法、装置及设备
CN112308170A (zh) * 2020-11-10 2021-02-02 维沃移动通信有限公司 建模方法、装置及电子设备
CN112861962A (zh) * 2021-02-03 2021-05-28 北京百度网讯科技有限公司 样本处理方法、装置、电子设备和存储介质

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472743A (zh) * 2019-07-31 2019-11-19 北京百度网讯科技有限公司 样本集中特征穿越的处理方法及装置、设备与可读介质
CN110533489B (zh) * 2019-09-05 2021-11-05 腾讯科技(深圳)有限公司 应用于模型训练的样本获取方法及装置、设备、存储介质
CN110543636B (zh) * 2019-09-06 2023-05-23 出门问问创新科技有限公司 一种对话系统的训练数据选择方法
CN110738476B (zh) * 2019-09-24 2021-06-29 支付宝(杭州)信息技术有限公司 一种样本迁移方法、装置及设备
CN111027318B (zh) * 2019-10-12 2023-04-07 中国平安财产保险股份有限公司 基于大数据的行业分类方法、装置、设备及存储介质
CN110766080B (zh) * 2019-10-24 2022-03-08 腾讯医疗健康(深圳)有限公司 一种标注样本确定方法、装置、设备及存储介质
US10856191B1 (en) * 2019-11-08 2020-12-01 Nokia Technologies Oy User equipment configuration
CN111369979B (zh) * 2020-02-26 2023-12-19 广州市百果园信息技术有限公司 训练样本获取方法、装置、设备及计算机存储介质
CN111461191B (zh) * 2020-03-25 2024-01-23 杭州跨视科技有限公司 为模型训练确定图像样本集的方法、装置和电子设备
CN113449750A (zh) * 2020-03-26 2021-09-28 顺丰科技有限公司 模型训练方法、使用方法、相关装置及存储介质
CN111651660B (zh) * 2020-05-28 2023-05-02 拾音智能科技有限公司 一种跨媒体检索困难样本的方法
CN112036491A (zh) * 2020-09-01 2020-12-04 北京推想科技有限公司 确定训练样本的方法及装置、训练深度学习模型的方法
CN112364999B (zh) * 2020-10-19 2021-11-19 深圳市超算科技开发有限公司 冷水机调节模型的训练方法、装置及电子设备
CN112308139B (zh) * 2020-10-29 2024-03-22 中科(厦门)数据智能研究院 一种基于主动学习的样本标注方法
CN112559602B (zh) * 2021-02-21 2021-07-13 北京工业大数据创新中心有限公司 一种工业设备征兆的目标样本的确定方法及系统
CN112990765B (zh) * 2021-04-16 2022-10-21 广东电网有限责任公司 一种电网调度操作票管理系统
CN112990375B (zh) * 2021-04-29 2021-09-24 北京三快在线科技有限公司 一种模型训练方法、装置、存储介质及电子设备
CN116821724B (zh) * 2023-08-22 2023-12-12 腾讯科技(深圳)有限公司 多媒体处理网络生成方法、多媒体处理方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100299144A1 (en) * 2007-04-06 2010-11-25 Technion Research & Development Foundation Ltd. Method and apparatus for the use of cross modal association to isolate individual media sources
CN103049526A (zh) * 2012-12-20 2013-04-17 中国科学院自动化研究所 Cross-media retrieval method based on dual-space learning
CN105930873A (zh) * 2016-04-27 2016-09-07 天津中科智能识别产业技术研究院有限公司 Subspace-based self-paced cross-modal matching method
CN106095893A (zh) * 2016-06-06 2016-11-09 北京大学深圳研究生院 Cross-media retrieval method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8682065B2 (en) * 2008-12-24 2014-03-25 Microsoft Corporation Distance metric learning with feature decomposition
US8171049B2 (en) * 2009-09-18 2012-05-01 Xerox Corporation System and method for information seeking in a multimedia collection
US8645287B2 (en) * 2010-02-04 2014-02-04 Microsoft Corporation Image tagging based upon cross domain context
CN102129477B (zh) * 2011-04-23 2013-01-09 山东大学 Multimodal joint image re-ranking method
US8706729B2 (en) * 2011-10-12 2014-04-22 California Institute Of Technology Systems and methods for distributed data annotation
CN102663447B (zh) * 2012-04-28 2014-04-23 中国科学院自动化研究所 Cross-media retrieval method based on discriminative correlation analysis
US9922272B2 (en) * 2014-09-25 2018-03-20 Siemens Healthcare Gmbh Deep similarity learning for multimodal medical images
CN104317834B (zh) * 2014-10-10 2017-09-29 浙江大学 Cross-media ranking method based on deep neural networks
CN104462380A (zh) * 2014-12-11 2015-03-25 北京中细软移动互联科技有限公司 Trademark retrieval method
CN105701504B (zh) * 2016-01-08 2019-09-13 天津大学 Multimodal manifold embedding method for zero-shot learning
US10002415B2 (en) * 2016-04-12 2018-06-19 Adobe Systems Incorporated Utilizing deep learning for rating aesthetics of digital images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3582144A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353626A (zh) * 2018-12-21 2020-06-30 阿里巴巴集团控股有限公司 Data review method, apparatus and device
CN111353626B (zh) * 2018-12-21 2023-05-26 阿里巴巴集团控股有限公司 Data review method, apparatus and device
CN112308170A (zh) * 2020-11-10 2021-02-02 维沃移动通信有限公司 Modeling method, apparatus and electronic device
CN112861962A (zh) * 2021-02-03 2021-05-28 北京百度网讯科技有限公司 Sample processing method, apparatus, electronic device and storage medium
CN112861962B (zh) * 2021-02-03 2024-04-09 北京百度网讯科技有限公司 Sample processing method, apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
EP3582144A4 (en) 2021-01-06
CN108399414B (zh) 2021-06-01
US20190213447A1 (en) 2019-07-11
MA47466A (fr) 2019-12-18
EP3582144A1 (en) 2019-12-18
US10885390B2 (en) 2021-01-05
CN108399414A (zh) 2018-08-14

Similar Documents

Publication Publication Date Title
WO2018145604A1 (zh) Sample selection method, apparatus and server
US11727243B2 (en) Knowledge-graph-embedding-based question answering
Yu et al. Diverse few-shot text classification with multiple metrics
CN108846126B (zh) Generation of an associated-question aggregation model, and question-answering aggregation method, apparatus and device
US20230351192A1 (en) Robust training in the presence of label noise
US20190258925A1 (en) Performing attribute-aware based tasks via an attention-controlled neural network
WO2018227800A1 (zh) Neural network training method and apparatus
CN111126574A (zh) Method, apparatus and storage medium for training a machine learning model based on endoscopic images
CN112119388A (zh) Training image embedding models and text embedding models
RU2664481C1 (ru) Method and system for selecting potentially erroneously ranked documents using a machine learning algorithm
Luo et al. Graph entropy guided node embedding dimension selection for graph neural networks
Goh et al. Food-image Classification Using Neural Network Model
CN111898703B (zh) Multi-label video classification method, model training method, apparatus and medium
US10163036B2 (en) System and method of analyzing images using a hierarchical set of models
CN110795938A (zh) Text sequence word segmentation method, apparatus and storage medium
CN112074828A (zh) Training image embedding models and text embedding models
WO2020151175A1 (zh) Text generation method, apparatus, computer device and storage medium
CN113821668A (zh) Data classification and recognition method, apparatus, device and readable storage medium
CN108228684A (zh) Clustering model training method, apparatus, electronic device and computer storage medium
CN111563158A (zh) Text ranking method, ranking apparatus, server and computer-readable storage medium
CN111858947A (zh) Automatic knowledge graph embedding method and system
CN110489613B (zh) Collaborative visual data recommendation method and apparatus
CN110457523B (zh) Cover image selection method, model training method, apparatus and medium
CN117056612B (zh) AI-assisted lesson preparation material data push method and system
WO2022160442A1 (zh) Answer generation method, apparatus, electronic device and readable storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 18750641; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2018750641; Country of ref document: EP; Effective date: 20190909