WO2023273572A1 - Feature extraction model construction method and target detection method, and device therefor - Google Patents

Feature extraction model construction method and target detection method, and device therefor

Info

Publication number
WO2023273572A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
image
model
feature extraction
similarity
Prior art date
Application number
PCT/CN2022/089230
Other languages
French (fr)
Chinese (zh)
Inventor
江毅
孙培泽
杨朔
袁泽寰
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023273572A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Definitions

  • the present application relates to the technical field of image processing, and in particular to a feature extraction model construction method, a target detection method, and devices thereof.
  • target detection, also known as target extraction, is an image segmentation technique based on the geometry and statistical features of targets; target detection has a wide range of applications (for example, it can be applied to robotics, autonomous driving, and other fields).
  • the present application provides a feature extraction model construction method, a target detection method and equipment thereof, which can improve the accuracy of target detection.
  • the embodiment of the present application provides a method for constructing a feature extraction model, the method comprising:
  • the sample pair includes a sample image and a sample object text identifier; the actual information similarity of the sample pair is used to describe the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
  • the sample pair is input into the model to be trained, and the extracted features of the sample pair output by the model to be trained are obtained; wherein, the extracted features of the sample pair include the extracted features of the sample image and the extracted features of the sample object text identifier;
  • the model to be trained is updated according to the actual information similarity of the sample pair and the predicted information similarity of the sample pair, and the step of inputting the sample pair into the model to be trained continues to be executed until a preset stop condition is reached, at which point the feature extraction model is determined according to the model to be trained.
  • the model to be trained includes a text feature extraction sub-model and an image feature extraction sub-model;
  • the process of determining the extracted features of the sample pair includes:
  • before inputting the sample pair into the model to be trained, the method further includes:
  • the text feature extraction sub-model is initialized, so that the similarity between the text features output by the initialized text feature extraction sub-model for any two objects is positively correlated with the degree of association between the two objects; wherein, preset prior knowledge is used to describe the degree of association between different objects.
  • the process of determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier includes:
  • the process of determining the actual information similarity of the sample pair includes:
  • if the sample object text identifier is used to uniquely identify the sample object, and the sample image includes the sample object, then the actual information similarity is determined according to the actual position of the sample object in the sample image.
  • the embodiment of the present application also provides a target detection method, the method comprising:
  • the feature extraction model is constructed using any implementation of the feature extraction model construction method provided in the embodiments of the present application;
  • a target detection result corresponding to the image to be detected is determined according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  • the embodiment of the present application also provides a feature extraction model construction device, including:
  • a sample acquisition unit configured to acquire a sample pair and the actual information similarity of the sample pair; wherein, the sample pair includes a sample image and a sample object text identifier; the actual information similarity of the sample pair is used to describe the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
  • a feature prediction unit configured to input the sample pair into the model to be trained, and to obtain the extracted features of the sample pair output by the model to be trained; wherein, the extracted features of the sample pair include the extracted features of the sample image and the extracted features of the sample object text identifier;
  • a model updating unit configured to update the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, and to continue executing the step of inputting the sample pair into the model to be trained until the preset stop condition is reached, and to determine a feature extraction model according to the model to be trained.
  • the embodiment of the present application also provides a target detection device, including:
  • an information acquisition unit configured to acquire the image to be detected and the text identifier of the object to be detected;
  • a feature extraction unit configured to input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model, and to obtain the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected output by the feature extraction model; wherein, the feature extraction model is constructed using any implementation of the feature extraction model construction method provided in the embodiments of the present application;
  • the result determination unit is configured to determine the target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  • the embodiment of the present application also provides a device, which includes a processor and a memory:
  • the memory is used to store a computer program;
  • the processor is configured to execute, according to the computer program, any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application has at least the following advantages:
  • the feature extraction model is first constructed using the sample pair and the actual information similarity of the sample pair, so that the constructed feature extraction model has better feature extraction performance;
  • the constructed feature extraction model then performs feature extraction on the image to be detected and the text identifier of the object to be detected, obtaining and outputting the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected; finally, the target detection result corresponding to the image to be detected is determined according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  • the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected, so that the target detection result determined based on this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether the image to be detected contains the target object uniquely identified by the text identifier of the object to be detected, and the position of that target object in the image to be detected), which is beneficial to improving the accuracy of target detection.
  • FIG. 1 is a flow chart of a method for constructing a feature extraction model provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the nth sample pair provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a sample image including multiple objects provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a model to be trained provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the relationship between different objects provided by the embodiment of the present application.
  • FIG. 6 is a flow chart of a target detection method provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a feature extraction model construction device provided in an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an object detection device provided by an embodiment of the present application.
  • if an image contains a target object (such as a cat), the information carried by the image should be similar to the information carried by the object text identifier of the target object (for example, the information carried by each pixel in the region of the image where the target object is located should be the same as the information carried by the object text identifier of the target object).
  • the embodiment of the present application provides a method for constructing a feature extraction model, the method including: obtaining a sample pair and the actual information similarity of the sample pair, where the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair is used to describe the similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier; inputting the sample pair into the model to be trained to obtain the extracted features of the sample pair output by the model to be trained, where the extracted features of the sample pair include the extracted features of the sample image and the extracted features of the sample object text identifier; and updating the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, continuing to execute the step of inputting the sample pair into the model to be trained until a preset stop condition is reached, and then determining the feature extraction model according to the trained model.
  • the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair can accurately represent the information carried by the sample image and the information carried by the sample object text identifier, so that the similarity between the two sets of extracted features is close to the actual information similarity of the sample pair. The trained model therefore has better feature extraction performance, and so does the feature extraction model built from it, which allows the subsequent target detection process to be performed more accurately and helps improve object detection accuracy.
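  • the training loop described above can be sketched numerically. In the toy sketch below, a linear "image branch" and a single text-feature vector are fit so that the predicted per-pixel similarity approaches a binary actual-similarity mask; the linear branches, the squared-error loss, and the gradient-descent update are all illustrative assumptions, not choices fixed by the present application.

```python
import numpy as np

# Toy sketch: W_img stands in for the image branch (RGB -> c-dim feature
# per pixel) and v_txt for the sample object text identifier's extracted
# feature. Loss and update rule are illustrative assumptions.
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
W_img = rng.standard_normal((3, c)) * 0.1   # toy image-branch weights
v_txt = rng.standard_normal(c) * 0.1        # toy text-identifier feature

image = rng.standard_normal((h, w, 3))               # one sample image
actual = np.zeros((h, w)); actual[1:3, 1:3] = 1.0    # actual info similarity

def loss_and_grads():
    img_feat = image @ W_img            # (h, w, c) extracted image features
    pred = img_feat @ v_txt             # (h, w) predicted info similarity
    err = pred - actual
    g_v = 2 * np.einsum('hw,hwk->k', err, img_feat) / (h * w)
    g_W = 2 * np.einsum('hw,hwd,k->dk', err, image, v_txt) / (h * w)
    return float((err ** 2).mean()), g_W, g_v

lr, losses = 0.1, []
for _ in range(500):
    loss, g_W, g_v = loss_and_grads()
    losses.append(loss)
    W_img -= lr * g_W                   # update both branches, then repeat
    v_txt -= lr * g_v

print(losses[0], '->', losses[-1])      # the similarity mismatch shrinks
```

In the real method the stop condition (S106) would end this loop; here a fixed iteration count is used for brevity.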
  • the embodiment of the present application does not limit the execution subject of the feature extraction model construction method.
  • the feature extraction model construction method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • the following first introduces the feature extraction model construction method (that is, the construction process of the feature extraction model), and then introduces the target detection method (that is, the application process of the feature extraction model).
  • this figure is a flow chart of a method for constructing a feature extraction model provided by an embodiment of the present application.
  • the feature extraction model construction method provided in the embodiment of the present application includes S101-S106:
  • S101: Obtain a sample pair and the actual information similarity of the sample pair.
  • the sample pair refers to the model input data that needs to be input to the model to be trained during the training process of the model to be trained; and the sample pair includes a sample image and a text identifier of a sample object.
  • the sample image refers to an image that needs to be subjected to target detection processing.
  • the sample object text identifier is used to uniquely identify the sample object.
  • the sample object text identifier may be an object category name (for example, "cat").
  • the embodiment of the present application does not limit the number of sample pairs, for example, the number of sample pairs may be N.
  • N is a positive integer. That is, the model to be trained can be trained using N sample pairs.
  • the embodiment of the present application does not limit the sample type of a sample pair. For example, when the nth sample pair includes the nth sample image and the nth sample object text identifier, and the nth sample object text identifier is used to uniquely identify the nth sample object: if the nth sample object exists in the nth sample image, the nth sample pair is a positive sample; if the nth sample object does not exist in the nth sample image, the nth sample pair is a negative sample.
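  • the positive/negative distinction above amounts to a membership test; a minimal sketch (the helper name is hypothetical):

```python
def pair_label(objects_in_image, sample_object_id):
    """A sample pair is a positive sample when the object uniquely identified
    by the sample object text identifier exists in the sample image, and a
    negative sample otherwise."""
    return "positive" if sample_object_id in objects_in_image else "negative"

print(pair_label({"cat", "dog"}, "cat"))    # positive
print(pair_label({"cat", "dog"}, "horse"))  # negative
```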
  • the actual information similarity of the sample pair is used to describe the similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier, so that it can accurately represent the relationship between the sample image and the sample object text identifier. Specifically, when the sample object text identifier is used to uniquely identify the sample object: the greater the actual information similarity of the sample pair, the greater the possibility that the sample object exists in the sample image; the smaller the actual information similarity, the smaller that possibility.
  • the information actually carried by the nth sample image should be as close as possible to the information actually carried by the nth sample object text identifier (for example, the information actually carried by each pixel in the area where the nth sample object is located in the nth sample image should be the same as the information actually carried by the nth sample object text identifier).
  • the embodiment of the present application provides a process of obtaining the actual information similarity of the sample pair, which may specifically include: if the sample object text identifier is used to uniquely identify the sample object, and the sample image includes the sample object, then determining the actual information similarity of the sample pair according to the actual position of the sample object in the sample image.
  • the embodiment of the present application does not limit the determination process of the actual information similarity of the sample pair.
  • it may specifically include: first, determining the image area of the sample object according to the actual position of the sample object in the sample image, so that the image area represents the area occupied by the sample object in the sample image; then determining the actual information similarity corresponding to each pixel in the image area of the sample object as a first preset similarity value (for example, 1), and determining the actual information similarity corresponding to each pixel in the sample image outside the image area of the sample object as a second preset similarity value (for example, 0).
  • if the nth sample pair includes the nth sample image and the nth sample object text identifier, and the nth sample image is an h × w × 3-dimensional image, then the actual information similarity of the nth sample pair can be an h × w-dimensional matrix Y^n determined according to formulas (1)-(2):
  • Y^n(i,j) = 1, if p^n(i,j) ∈ Z_n    (1)
  • Y^n(i,j) = 0, if p^n(i,j) ∉ Z_n    (2)
  • where Y^n denotes the actual information similarity of the nth sample pair; Y^n(i,j) denotes the similarity between the information actually carried by the pixel in the ith row and jth column of the nth sample image and the information actually carried by the nth sample object text identifier; p^n(i,j) denotes the position of the pixel in the ith row and jth column of the nth sample image; Z_n denotes the area where the nth sample object is located in the nth sample image; i, j, h, and w are positive integers with i ≤ h and j ≤ w. If p^n(i,j) ∈ Z_n, the area where the nth sample object is located includes that pixel, so the information it actually carries can be determined to be the same as the information actually carried by the nth sample object text identifier.
  • the determination process may specifically include: the actual information similarity of the nth sample pair includes the actual information similarity corresponding to each pixel in the nth sample image; if the pixel in the ith row and jth column of the nth sample image is located within the area where the nth sample object is located (such as within the object bounding box shown in FIG. 2), the actual information similarity corresponding to that pixel is 1; if the pixel is located outside that area (outside the object bounding box shown in FIG. 2), the actual information similarity corresponding to that pixel is 0.
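  • formulas (1)-(2) amount to building a binary mask from the object's bounding box; a minimal sketch (half-open row/column box coordinates are an assumption made for illustration):

```python
import numpy as np

def actual_similarity(h, w, box):
    """h x w actual-information-similarity matrix for one sample pair:
    1 for pixels inside the sample object's area Z_n (here a bounding
    box), 0 for pixels outside it."""
    r0, c0, r1, c1 = box                 # half-open (row, col) bounds
    y = np.zeros((h, w), dtype=np.float32)
    y[r0:r1, c0:c1] = 1.0
    return y

y = actual_similarity(4, 6, (1, 2, 3, 5))
print(int(y.sum()))   # 6 pixels fall inside the box
```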
  • if the nth sample pair includes the nth sample image and the nth sample object text identifier, there are Q (e.g., 3) objects in the nth sample image, and the nth sample object text identifier is used to uniquely identify the qth object in the nth sample image (such as the dog, person, or horse in FIG. 3), then the actual information similarity of the nth sample pair can be determined according to the area occupied by the qth object in the nth sample image. Specifically: the actual information similarity corresponding to each pixel within the area occupied by the qth object in the nth sample image is determined as the first preset similarity value (for example, 1), and the actual information similarity corresponding to each pixel outside that area is determined as the second preset similarity value (for example, 0). Here, q is a positive integer, Q is a positive integer, and q ≤ Q.
  • that is, each of the Q objects in the nth sample image can form a sample pair with its own object text identifier, and the area occupied by the qth object in the nth sample image is used to determine the actual information similarity of the corresponding sample pair.
  • in FIG. 3, "dog" refers to the object text identifier of a dog, "person" refers to the object text identifier of a person, and "horse" refers to the object text identifier of a horse.
  • after the sample image and the sample object text identifier are acquired, the similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier is determined according to the association between the sample image and the sample object text identifier (for example, whether the sample image contains the sample object uniquely identified by the sample object text identifier, and the location of that sample object in the sample image), so that this similarity can serve as the learning target during the subsequent training of the model to be trained.
  • S102: Input the sample pair into the model to be trained, and obtain the extracted features of the sample pair output by the model to be trained.
  • the extracted features of the sample pair are used to represent the information carried by the sample pair; and the extracted features of the sample pair include the extracted features of the sample image and the extracted features of the text identifier of the sample object.
  • the extracted features of the sample image are used to represent the information that the sample image is predicted to carry.
  • the embodiment of the present application does not limit the representation of the extracted features of the sample image. For example, if a sample image is h × w × 3-dimensional, its extracted features can be represented by an h × w × c-dimensional feature map.
  • the extracted features of the sample object text identifier are used to represent the information that the sample object text identifier is predicted to carry.
  • the embodiment of the present application does not limit the representation manner of the extracted features of the sample object text identifier.
  • the extracted features of a sample object text identifier may be represented by a 1 ⁇ c-dimensional feature vector.
  • the model to be trained is used to perform feature extraction on input data of the model to be trained (for example, perform text feature extraction on text data, and/or perform image feature extraction on image data).
  • the embodiment of the present application does not limit the structure of the model to be trained.
  • the model 400 to be trained may include a text feature extraction sub-model 401 and an image feature extraction sub-model 402.
  • the process of using the model to be trained 400 to determine the extracted features of the sample pair may specifically include steps 11-12:
  • Step 11: Input the sample image into the image feature extraction sub-model 402 to obtain the extracted features of the sample image output by the image feature extraction sub-model 402.
  • the image feature extraction sub-model 402 is used for image feature extraction; moreover, the embodiment of the present application does not limit the implementation of the image feature extraction sub-model 402, which can be implemented using any existing or future model structure with an image feature extraction function.
  • Step 12: Input the sample object text identifier into the text feature extraction sub-model 401 to obtain the extracted features of the sample object text identifier output by the text feature extraction sub-model 401.
  • the text feature extraction sub-model 401 is used for text feature extraction; moreover, the embodiment of the present application does not limit the implementation of the text feature extraction sub-model 401, which can be implemented using any existing or future model structure with a text feature extraction function (such as BERT, GPT-3, or other language models).
  • in this way, the image feature extraction sub-model 402 in the model to be trained 400 performs image feature extraction on the sample image in the sample pair, obtaining and outputting the extracted features of the sample image, and the text feature extraction sub-model 401 performs text feature extraction on the sample object text identifier in the sample pair, obtaining and outputting the extracted features of the sample object text identifier, so that these extracted features can represent the information that the sample image and the sample object text identifier are respectively predicted to carry.
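  • the two-branch structure of the model to be trained 400 can be sketched with toy linear encoders; all class names, dimensions, and the token-id text representation below are illustrative assumptions, not the architecture prescribed by the present application:

```python
import numpy as np

rng = np.random.default_rng(0)
c = 8  # shared feature dimension

class ImageEncoder:
    """Stands in for sub-model 402: maps an h x w x 3 image to an
    h x w x c feature map (one c-dim feature per pixel)."""
    def __init__(self):
        self.w = rng.standard_normal((3, c)) * 0.1
    def __call__(self, image):
        return image @ self.w

class TextEncoder:
    """Stands in for sub-model 401: maps an object text identifier
    (here reduced to a token id) to a 1 x c feature vector."""
    def __init__(self, vocab=100):
        self.emb = rng.standard_normal((vocab, c)) * 0.1
    def __call__(self, token_id):
        return self.emb[token_id][None, :]

img_feat = ImageEncoder()(rng.standard_normal((4, 6, 3)))
txt_feat = TextEncoder()(7)
print(img_feat.shape, txt_feat.shape)   # (4, 6, 8) (1, 8)
```

The point of the sketch is the output shapes: the image branch keeps spatial resolution (h × w × c) while the text branch yields a single 1 × c vector, which is what makes the later per-pixel similarity comparison possible.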
  • the embodiment of the present application also provides a possible implementation of the feature extraction model construction method.
  • the feature extraction model construction method includes S107 in addition to S101-S106:
  • the preset prior knowledge is used to describe the degree of association between different objects (for example, as shown in FIG. 5, cats and tigers both belong to the cat family, so the degree of association between cats and tigers is high; lions and lionesses are even more closely related, so the degree of association between lions and lionesses is higher still).
  • if the degree of association between two objects is 1, the two objects belong to the same type of object; if the degree of association between two objects is 0, the two objects have nothing in common (that is, there is no association between the two objects).
  • the embodiment of the present application does not limit the preset prior knowledge, for example, the preset prior knowledge may include a pre-built object knowledge graph.
  • the object knowledge graph can be used to describe the degree of association between different objects; and the object knowledge graph can be constructed in advance based on a large amount of knowledge information related to objects.
  • this embodiment of the present application does not limit the implementation manner of "initialization processing" in S107.
  • the "initialization processing" in S107 may refer to pre-training. That is, the text feature extraction sub-model 401 is pre-trained using the preset prior knowledge, so that the similarity between the text features output by the initialized text feature extraction sub-model 401 for any two objects is positively correlated with the degree of association between the two objects.
  • for the initialized text feature extraction sub-model 401: if the preset prior knowledge indicates that the degree of association between a first object (such as a cat) and a second object (such as a lion) is higher, then the similarity between the text features output by the initialized sub-model for the two objects (such as "v5" and "v3" in FIG. 5) is higher; if the preset prior knowledge indicates that the degree of association between the first object and the second object is lower, then the similarity between the text features output for them is lower.
  • "v1" in FIG. 5 indicates the text features output by the initialized text feature extraction sub-model 401 for "tiger"; "v2" indicates the text features output for "leopard"; ... (and so on); "v6" indicates the text features output for "lynx".
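  • the intended property, that text-feature similarity tracks the degree of association, can be checked with cosine similarity; the vectors below are toy stand-ins for the initialized text features, not values produced by sub-model 401:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings: a cat and a lion (both felines, high association) get
# nearer vectors than a cat and a weakly related object.
v_cat  = np.array([1.0, 0.9, 0.1])   # stand-in for "v5"
v_lion = np.array([1.0, 0.8, 0.2])   # stand-in for "v3"
v_misc = np.array([0.1, 0.2, 1.0])   # stand-in for a weakly related object

print(cosine(v_cat, v_lion) > cosine(v_cat, v_misc))  # True
```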
  • the embodiment of the present application does not limit the execution time of S107, and it only needs to be completed before S102 is executed (that is, S107 only needs to be completed before training the model to be trained).
  • by pre-training the text feature extraction sub-model 401, the text feature extraction sub-model 401 in the model to be trained 400 can learn to perform feature extraction according to the preset prior knowledge, and the training process of the model to be trained 400 then continues to optimize its text feature extraction performance. The text feature extraction sub-model 401 in the trained model 400 can therefore better perform feature extraction based on the preset prior knowledge, which helps improve the feature extraction performance of the model to be trained 400, and hence of the feature extraction model constructed from it, and further helps improve target detection performance when the feature extraction model is used for target detection.
  • the nth sample pair can be input into the model to be trained, so that the model to be trained can target the nth sample pair in the nth sample pair.
  • the nth sample image and the nth sample object text mark carry out feature extraction respectively, obtain and output the extraction feature of the nth sample image and the extraction feature of the nth sample object text mark (that is, the nth sample The extraction feature of the binary group), so that the extraction feature of the sample image and the extraction feature of the nth sample object text mark can respectively represent the information carried by the sample image prediction and the information carried by the sample object text mark prediction,
  • n is a positive integer
• n ≤ N; N is a positive integer
  • N represents the number of sample pairs.
• the predicted information similarity of the sample pair refers to the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, so that it can describe the degree of similarity between the information predicted to be carried by the sample image and the information predicted to be carried by the sample object text identifier.
  • the embodiment of the present application does not limit the method of determining the similarity of the prediction information of the sample pair (that is, the implementation of S103).
  • S103 may specifically include S1031-S1032:
  • S1031 Determine the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted feature of the sample object text identifier.
• the feature map of the sample image is used to represent the information carried by the sample image; moreover, the embodiment of the present application does not limit the feature map of the sample image. For example, if a sample image is h × w × 3 dimensional and the extracted feature of the sample object text identifier is 1 × c dimensional, the feature map of the sample image can be h × w × c dimensional.
  • h is a positive integer
  • w is a positive integer
  • c is a positive integer.
  • the embodiment of the present application does not limit the representation of the feature map of the sample image.
• the feature map of the sample image can be represented by h × w pixel-level extracted features, each of which is 1 × c dimensional.
• the pixel-level extracted feature located in the i-th row and j-th column of the feature map of the sample image is used to represent the information predicted to be carried by the pixel in the i-th row and j-th column of the sample image.
  • S1031 may be implemented using formula (3).
• b ij represents the similarity between the pixel-level extracted feature located in the i-th row and j-th column of the feature map of the sample image and the extracted feature of the sample object text identifier, so that b ij can describe the degree of similarity between the information predicted to be carried by the pixel in the i-th row and j-th column of the sample image and the information predicted to be carried by the sample object text identifier; the pixel-level extracted feature in the i-th row and j-th column of the feature map is a 1 × c-dimensional feature vector.
• H n represents the extracted feature of the sample object text identifier, so that H n describes the information predicted to be carried by the sample object text identifier; H n is a 1 × c-dimensional feature vector
• S(·) denotes the similarity calculation
  • i is a positive integer
• S1032 may be implemented using formula (4).
• the predicted information similarity of the nth sample pair can be calculated from the extracted features of the nth sample image and the extracted features of the nth sample object text identifier, so that the predicted information similarity of the nth sample pair can accurately describe the similarity between the information predicted to be carried by the nth sample image and the information predicted to be carried by the nth sample object text identifier. The feature extraction performance of the model to be trained can then subsequently be determined based on the predicted information similarity of the nth sample pair.
  • n is a positive integer
• n ≤ N; N is a positive integer
  • N represents the number of sample pairs.
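Formulas (3) and (4) themselves are not reproduced in this excerpt. A minimal sketch of the computation they describe is given below, assuming cosine similarity for S(·) and assuming (since the aggregation of formula (4) is not shown here) that the per-pixel similarities are aggregated by taking the maximum:

```python
import numpy as np

def pixelwise_similarity(feature_map, text_feature):
    """Formula (3) sketch: b_ij = S(pixel-level feature at (i, j), H_n).

    feature_map:  h x w x c feature map of the sample image
    text_feature: length-c extracted feature of the sample object text identifier
    Returns an h x w matrix of per-pixel cosine similarities.
    """
    fm = feature_map / np.linalg.norm(feature_map, axis=-1, keepdims=True)
    tf = text_feature / np.linalg.norm(text_feature)
    return fm @ tf

def pair_similarity(sim_matrix):
    # Formula (4) sketch (assumed aggregation): the best-matching pixel
    # stands in for the predicted information similarity of the sample pair.
    return float(sim_matrix.max())

h, w, c = 4, 4, 8
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(h, w, c))   # stand-in for the image branch output
text_feature = rng.normal(size=c)          # stand-in for the text branch output
b = pixelwise_similarity(feature_map, text_feature)
s = pair_similarity(b)                     # b is (4, 4); s is a scalar in [-1, 1]
```

The choice of maximum as the aggregation is an assumption for illustration; any reduction over the h × w similarity matrix would fit the description of S1032.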
• S104 Determine whether a preset stop condition is met; if yes, execute S106; if not, execute S105.
  • the preset stop condition can be set in advance.
• the preset stop condition may be that the loss value of the model to be trained is lower than a preset loss threshold, that the rate of change of the loss value of the model to be trained is lower than a preset rate-of-change threshold (that is, the model to be trained reaches convergence), or that the number of updates of the model to be trained reaches a preset number threshold.
• the loss value of the model to be trained is used to describe the feature extraction performance of the model to be trained; the embodiment of the present application does not limit the calculation method of this loss value, and any existing or future method capable of calculating the loss value of the model to be trained according to the predicted information similarity of the sample pair and the actual information similarity of the sample pair may be used.
  • S105 Update the model to be trained according to the predicted information similarity of the sample pair and the actual information similarity of the sample pair, and return to S102.
• the feature extraction performance of the current round of the model to be trained is still relatively poor, so the model to be trained can be updated based on the difference between the predicted information similarity of the sample pair and the actual information similarity of the sample pair, so that the updated model to be trained has better feature extraction performance; S102 and its subsequent steps are then executed again with the updated model to be trained.
• the feature extraction model can be constructed according to the current round of the model to be trained (for example, the current round of the model to be trained is directly determined as the feature extraction model; or the model structure and model parameters of the feature extraction model are determined according to the model structure and model parameters of the current round of the model to be trained, so that they remain the same as those of the current round of the model to be trained). In this way, the feature extraction performance of the constructed feature extraction model is consistent with that of the current round of the model to be trained, so the constructed feature extraction model also has better feature extraction performance.
• in the feature extraction model construction method, after the sample pair and the actual information similarity of the sample pair are obtained, the sample pair and its actual information similarity are first used to train the model to be trained, so that the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair is close to the actual information similarity of the sample pair. The trained model to be trained thus has better feature extraction performance, and the feature extraction model constructed from it also has better feature extraction performance. In this way, the subsequent target detection process can be performed more accurately based on the constructed feature extraction model, which is conducive to improving the accuracy of target detection.
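The training procedure of S102–S106 can be outlined as follows. The patent does not fix a concrete loss function, so a mean-squared error between predicted and actual information similarity, with a plain gradient step on a toy one-parameter-per-pair model, is used purely as an illustrative placeholder:

```python
import numpy as np

# Toy stand-in for the model to be trained: one learnable predicted
# similarity per sample pair, trained toward the actual similarity.
actual_sims = np.array([1.0, 0.0, 1.0])  # actual information similarities
params = np.zeros_like(actual_sims)      # predicted similarities (learnable)

def loss_value(params):
    # Placeholder loss (assumption): mean squared error between predicted
    # and actual information similarity of the sample pairs.
    return float(np.mean((params - actual_sims) ** 2))

loss_threshold = 1e-4   # preset loss threshold (one form of the S104 condition)
max_updates = 1000      # preset number-of-updates threshold (another form)
lr = 0.5

for step in range(max_updates):              # stop condition: update count
    if loss_value(params) < loss_threshold:  # stop condition: loss threshold
        break                                # S106: build the model from here
    grad = 2 * (params - actual_sims) / len(params)
    params -= lr * grad                      # S105: update, return to S102
```

With either stop condition triggered, the current round of the model (here, `params`) would be taken as the feature extraction model.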
• After the feature extraction model is constructed, it can be used for target detection. Based on this, an embodiment of the present application further provides a target detection method, which will be described below with reference to the accompanying drawings.
  • this figure is a flow chart of a target detection method provided by an embodiment of the present application.
  • the target detection method provided in the embodiment of this application includes S601-S603:
  • S601 Obtain an image to be detected and a text identification of an object to be detected.
  • the image to be detected refers to an image that needs to be subjected to target detection processing.
  • the text identifier of the object to be detected is used to uniquely identify the object to be detected. That is, S601-S603 may be used to determine whether there is an object to be detected that is uniquely identified by a text identifier of the object to be detected in the image to be detected.
  • the embodiment of the present application does not limit the text identification of the object to be detected.
• the text identifier of the object to be detected can be any sample object text identifier used in the process of building the feature extraction model, or any object text identifier other than the sample object text identifiers used in that construction process.
• for example, the text identifier of the object to be detected may be "tiger".
  • the object detection method provided in the embodiment of the present application is an open world-oriented object detection method.
  • S602 Input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model, and obtain the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected output by the feature extraction model.
• the feature extraction model is used to perform feature extraction on the input data of the feature extraction model; and the feature extraction model is constructed using any implementation of the feature extraction model construction method provided in the embodiment of the present application. For details, please refer to method embodiment one above.
  • the extracted features of the image to be detected are used to represent the information carried by the image to be detected.
  • the extracted features of the text identifier of the object to be detected are used to represent the information carried by the text identifier of the object to be detected.
• the image to be detected and the text identifier of the object to be detected can be input into a pre-built feature extraction model, so that the feature extraction model performs feature extraction on the image to be detected and the text identifier of the object to be detected, and obtains and outputs the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected. In this way, the extracted features of the image to be detected can represent the information carried by the image to be detected, and the extracted features of the text identifier of the object to be detected can represent the information carried by the text identifier of the object to be detected.
  • S603 Determine a target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  • the target detection result corresponding to the image to be detected is used to describe the relationship between the image to be detected and the text identifier of the object to be detected.
  • this embodiment of the present application does not limit the representation of the target detection result corresponding to the image to be detected.
• the target detection result corresponding to the image to be detected may include the possibility that the object to be detected exists in the image to be detected (such as the possibility that each pixel in the image to be detected is located in the region where the object to be detected is located in the image to be detected), and/or the position of the object to be detected in the image to be detected.
  • the embodiment of the present application does not limit the method of determining the target detection result corresponding to the image to be detected.
  • the process of determining the target detection result corresponding to the image to be detected may include steps 21-22:
  • Step 21 Calculate the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  • the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected is used to describe the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected.
  • this embodiment of the present application does not limit the representation of the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
• it can be represented by an h × w-dimensional similarity matrix.
• the similarity value in the i-th row and j-th column of the h × w-dimensional similarity matrix can describe the degree of similarity between the information carried by the pixel in the i-th row and j-th column of the image to be detected and the information carried by the text identifier of the object to be detected, and can therefore indicate the possibility that the pixel in the i-th row and j-th column of the image to be detected is located in the region where the object to be detected is located in the image to be detected.
• for the relevant content of step 21, please refer to the relevant content of S103 above; it suffices to replace "sample image" in S103 with "image to be detected" and "sample object text identifier" with "text identifier of the object to be detected".
  • Step 22 Determine the target detection result corresponding to the image to be detected according to the preset similarity condition and the similarity between the extracted features of the image to be detected and the extracted features of the text mark of the object to be detected.
• the preset similarity condition can be set in advance. For example, if the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected is represented by an h × w-dimensional similarity matrix, then the preset similarity condition may be being greater than a preset similarity threshold (e.g., 0.5).
• step 22 may specifically include: judging whether the similarity value in the i-th row and j-th column of the above h × w-dimensional similarity matrix is greater than the preset similarity threshold. If it is greater than the preset similarity threshold, it is determined that the information carried by the pixel in the i-th row and j-th column of the image to be detected is similar to the information carried by the text identifier of the object to be detected, so it can be determined that this pixel is located in the region where the object to be detected is located in the image to be detected; if it is not greater than the preset similarity threshold, it can be determined that this pixel is not located in that region.
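The thresholding described in step 22 can be sketched as follows; the 0.5 threshold matches the example value given above, and the dictionary result format is purely illustrative:

```python
import numpy as np

def detect(similarity_matrix, threshold=0.5):
    """Step 22 sketch: pixels whose similarity value exceeds the preset
    similarity threshold are judged to lie in the region where the object
    to be detected is located in the image to be detected."""
    mask = similarity_matrix > threshold
    return {
        "object_present": bool(mask.any()),  # does the object appear at all?
        "object_mask": mask,                 # h x w map of its location
    }

# Example h x w similarity matrix between the image to be detected and the
# text identifier of the object to be detected (values are illustrative).
sim = np.array([[0.1, 0.7],
                [0.6, 0.2]])
result = detect(sim)
# Pixels (0, 1) and (1, 0) exceed the threshold and are flagged as the
# region where the object to be detected is located.
```

Both parts of the target detection result described above (presence and position) fall out of the same thresholded mask.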
• the constructed feature extraction model can be used to perform feature extraction on the image to be detected and the text identifier of the object to be detected, obtaining and outputting the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected; the target detection result corresponding to the image to be detected is then determined according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
• the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected. Hence the target detection result determined for the image to be detected based on this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether the target object uniquely identified by the text identifier of the object to be detected exists in the image to be detected, and the position of that target object in the image to be detected), which is beneficial to improving the accuracy of target detection.
• the target detection method provided in the embodiment of the present application can perform target detection not only based on the sample object text identifiers used in the construction process of the feature extraction model, but also based on any object text identifier other than those sample object text identifiers. This is conducive to improving the target detection performance of the feature extraction model for non-sample objects, thereby helping to improve the target detection performance of the target detection method provided in the embodiment of the present application.
  • the embodiment of the present application does not limit the execution subject of the object detection method.
  • the object detection method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • an embodiment of the present application further provides a device for constructing a feature extraction model, which will be explained and described below with reference to the accompanying drawings.
  • FIG. 7 is a schematic structural diagram of a feature extraction model construction device provided in an embodiment of the present application.
  • the feature extraction model construction device 700 provided in the embodiment of the present application includes:
• the sample obtaining unit 701 is configured to obtain a sample pair and the actual information similarity of the sample pair; wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair is used to describe the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
  • a feature prediction unit 702 configured to input the sample pair into the model to be trained, and obtain the extracted features of the sample pair output by the model to be trained; wherein, the extracted features of the sample pair include the Extracting features of the sample image and extracting features of the sample object text identifier;
  • a model updating unit 703 configured to update the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier , and continue to execute the step of inputting the sample pair into the model to be trained until a preset stop condition is reached, and a feature extraction model is determined according to the model to be trained.
  • the model to be trained includes a text feature extraction sub-model and an image feature extraction sub-model;
• the process of determining the extracted features of the sample pair includes:
  • the feature extraction model building device 700 also includes:
  • the initialization unit is configured to use preset prior knowledge to initialize the text feature extraction sub-model; wherein the preset prior knowledge is used to describe the relationship between different objects.
• the process of determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier includes: determining that similarity according to the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier.
• the process of determining the actual information similarity of the sample pair includes: if the sample object text identifier is used to uniquely identify the sample object and the sample image includes the sample object, determining the actual information similarity according to the actual position of the sample object in the sample image.
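The excerpt only names this determination step. One plausible construction, consistent with the pixel-level predicted information similarity described earlier, is a ground-truth similarity map that equals 1 at pixels inside the sample object's actual region and 0 elsewhere; the bounding-box annotation format below is a hypothetical assumption:

```python
import numpy as np

def actual_similarity_map(h, w, bbox):
    """Hypothetical sketch: derive the actual information similarity from
    the actual position of the sample object in the sample image.

    bbox: (row_min, col_min, row_max, col_max), rows/cols half-open.
    Returns an h x w map: 1.0 inside the object's region, 0.0 outside.
    """
    gt = np.zeros((h, w))
    r0, c0, r1, c1 = bbox
    gt[r0:r1, c0:c1] = 1.0
    return gt

# A 2x2 object region centered in a 4x4 sample image.
gt = actual_similarity_map(4, 4, (1, 1, 3, 3))
```

Such a map can be compared entry-by-entry with the h × w predicted similarity matrix when computing the training loss.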
• the model to be trained is trained so that the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair is close to the actual information similarity of the sample pair. The trained model to be trained thus has better feature extraction performance, and the feature extraction model constructed from it also has better feature extraction performance, so that the subsequent target detection process can be performed more accurately based on the constructed feature extraction model, which is conducive to improving the accuracy of target detection.
  • the embodiment of the present application also provides a target detection device, which will be explained and described below with reference to the accompanying drawings.
• FIG. 8 is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
  • the target detection device 800 provided in the embodiment of the present application includes:
  • An information acquisition unit 801 configured to acquire an image to be detected and a text identification of an object to be detected
  • a feature extraction unit 802 configured to input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model, and obtain the extracted features of the image to be detected and the text of the object to be detected output by the feature extraction model The extracted features of the identification; wherein, the feature extraction model is constructed using any implementation of the feature extraction model construction method provided in the embodiment of the present application;
  • the result determination unit 803 is configured to determine a target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
• the constructed feature extraction model can be used to perform feature extraction on the image to be detected and the text identifier of the object to be detected, obtaining and outputting the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected; the target detection result corresponding to the image to be detected is then determined from these extracted features.
• the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected. Hence the target detection result determined for the image to be detected based on this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether the target object uniquely identified by the text identifier of the object to be detected exists in the image to be detected, and the position of that target object in the image to be detected), which is beneficial to improving the accuracy of target detection.
• the target detection device provided in the embodiment of the present application can perform target detection not only based on the sample object text identifiers used in the construction process of the feature extraction model, but also based on any object text identifier other than those sample object text identifiers. This is conducive to improving the target detection performance of the feature extraction model for non-sample objects, thereby helping to improve the target detection performance of the target detection device 800 provided in the embodiment of the present application.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the feature extraction model construction method provided in the embodiment of the present application according to the computer program, or execute any implementation of the target detection method provided in the embodiment of the application.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute the feature extraction model construction method provided in the embodiment of the present application. Any implementation manner, or execute any implementation manner of the target detection method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on the terminal device, enables the terminal device to execute any implementation manner of the feature extraction model construction method provided in the embodiment of the present application , or execute any implementation of the target detection method provided in the embodiment of the present application.
• "At least one (item)" means one or more, and "multiple" means two or more.
• "And/or" describes the association relationship between associated objects and indicates that three relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
• the character "/" generally indicates that the contextual objects are in an "or" relationship.
• "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items.
• "At least one item (piece) of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can be single or multiple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are a feature extraction model construction method and a target detection method, and a device therefor. Firstly, a feature extraction model is constructed by using a sample binary group and an actual information similarity of the sample binary group, such that the constructed feature extraction model has a better feature extraction performance; then, by using the constructed feature extraction model, feature extraction is performed on an image under test and an object text identifier under test, so as to obtain and output an extracted feature of the image under test and an extracted feature of the object text identifier under test; and finally, according to a similarity between the extracted feature of the image under test and the extracted feature of the object text identifier under test, a target detection result corresponding to the image under test is determined, such that the target detection result can accurately represent an association relationship between the image under test and the object text identifier under test, thereby facilitating an improvement in the target detection accuracy.

Description

A feature extraction model construction method, a target detection method, and a device therefor

This application claims priority to the Chinese patent application with application number 202110723063.X, entitled "A feature extraction model construction method, target detection method and device therefor", filed with the China National Intellectual Property Administration on June 28, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of image processing, and in particular to a feature extraction model construction method, a target detection method, and a device therefor.

Background Art

Target detection (also known as target extraction) is an image segmentation technology based on target geometric statistics and features, and its applications are wide-ranging (for example, target detection can be applied to fields such as robotics or automatic driving).

However, because the existing target detection technology still has some defects, how to improve the accuracy of target detection remains a technical problem to be solved urgently.
发明内容Contents of the invention
为了解决现有技术中存在的以上技术问题,本申请提供一种特征提取模型构建方法、目标检测方法及其设备,能够提高目标检测准确性。In order to solve the above technical problems in the prior art, the present application provides a feature extraction model construction method, a target detection method and equipment thereof, which can improve the accuracy of target detection.
为了实现上述目的,本申请实施例提供的技术方案如下:In order to achieve the above objectives, the technical solutions provided in the embodiments of the present application are as follows:
本申请实施例提供一种特征提取模型构建方法,所述方法包括:The embodiment of the present application provides a method for constructing a feature extraction model, the method comprising:
obtaining a sample pair and an actual information similarity of the sample pair, wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair describes the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
inputting the sample pair into a model to be trained to obtain extracted features of the sample pair output by the model to be trained, wherein the extracted features of the sample pair include extracted features of the sample image and extracted features of the sample object text identifier;
determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier as a predicted information similarity of the sample pair;
updating the model to be trained according to the actual information similarity of the sample pair and the predicted information similarity of the sample pair, and continuing to perform the step of inputting the sample pair into the model to be trained, until a preset stop condition is reached and a feature extraction model is determined according to the model to be trained.
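As an illustration, the training loop above can be sketched as follows. This is a minimal sketch under stated assumptions: the two sub-models are stood in for by single linear layers, the predicted information similarity is a sigmoid of the feature dot product, and the update rule is gradient descent on a squared-error loss against the actual information similarity; none of these specific choices (nor the names `train_step`, `W_img`, `W_txt`) are prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
c, img_dim, txt_dim = 8, 16, 10

# Toy stand-ins for the image / text feature extraction sub-models
# (real sub-models would be deep networks; linear maps keep the sketch short).
W_img = rng.normal(scale=0.1, size=(img_dim, c))
W_txt = rng.normal(scale=0.1, size=(txt_dim, c))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(image_vec, text_vec, actual_sim, lr=0.5):
    """One update: extract both features, compute the predicted information
    similarity, and descend the squared error against the actual similarity."""
    global W_img, W_txt
    f_img = image_vec @ W_img            # extracted feature of the sample image
    f_txt = text_vec @ W_txt             # extracted feature of the text identifier
    pred = sigmoid(f_img @ f_txt)        # predicted information similarity
    loss = (pred - actual_sim) ** 2
    g = 2.0 * (pred - actual_sim) * pred * (1.0 - pred)  # dL/d(dot product)
    W_img -= lr * g * np.outer(image_vec, f_txt)
    W_txt -= lr * g * np.outer(text_vec, f_img)
    return loss

# One positive sample pair (actual similarity 1) and one negative pair (0).
pos = (rng.normal(size=img_dim), rng.normal(size=txt_dim), 1.0)
neg = (rng.normal(size=img_dim), rng.normal(size=txt_dim), 0.0)
history = []
for _ in range(200):                     # stands in for "preset stop condition"
    history.append(train_step(*pos) + train_step(*neg))
```

After training, the predicted information similarity of the positive pair approaches 1 and that of the negative pair approaches 0, mirroring how the model to be trained is driven toward the actual information similarities.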
In a possible implementation, the model to be trained includes a text feature extraction sub-model and an image feature extraction sub-model;
the process of determining the extracted features of the sample pair includes:
inputting the sample image into the image feature extraction sub-model to obtain the extracted features of the sample image output by the image feature extraction sub-model;
inputting the sample object text identifier into the text feature extraction sub-model to obtain the extracted features of the sample object text identifier output by the text feature extraction sub-model.
In a possible implementation, before inputting the sample pair into the model to be trained, the method further includes:
initializing the text feature extraction sub-model using preset prior knowledge, so that, for any two objects, the similarity between the text features output by the initialized text feature extraction sub-model is positively correlated with the degree of association between the two objects, wherein the preset prior knowledge describes the degree of association between different objects.
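One hypothetical way to realize such an initialization is sketched below: given a symmetric positive semi-definite matrix of pairwise association degrees (the preset prior knowledge), one text feature per object can be initialized via an eigendecomposition so that the dot-product similarity of any two features reproduces their association degree. The factorization approach, the helper name, and the example matrix are illustrative assumptions, not part of the original text.

```python
import numpy as np

def init_text_features(relatedness):
    """Initialize one text feature per object such that the dot-product
    similarity of any two features equals their prior association degree.
    Requires `relatedness` to be symmetric positive semi-definite."""
    vals, vecs = np.linalg.eigh(relatedness)
    vals = np.clip(vals, 0.0, None)   # guard against tiny negative eigenvalues
    return vecs * np.sqrt(vals)       # rows are per-object features

# Hypothetical prior knowledge for three objects (e.g. cat, tiger, car):
# cat and tiger are strongly associated, neither is associated with car.
R = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
E = init_text_features(R)             # E @ E.T reproduces R
```

By construction, the initialized feature similarities are positively correlated with the association degrees (here, sim(cat, tiger) = 0.8 exceeds sim(cat, car) = 0.1), which is the property the initialization is required to have.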
In a possible implementation, if the extracted features of the sample image include a feature map of the sample image, the process of determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier includes:
separately determining the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier;
determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier according to the similarities between the individual pixel-level extracted features in the feature map of the sample image and the extracted features of the sample object text identifier.
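As a minimal sketch of these two steps, the snippet below computes a cosine similarity between every pixel-level feature of an h×w×c feature map and a c-dimensional text feature, then aggregates the pixel-level similarities into an image-level similarity by max-pooling. Both the cosine measure and the max aggregation are assumptions; the text does not prescribe a particular similarity measure or aggregation rule.

```python
import numpy as np

def pixel_level_similarities(feature_map, text_feature, eps=1e-8):
    """Cosine similarity between each pixel-level extracted feature of an
    (h, w, c) feature map and a (c,) text feature; returns an (h, w) map."""
    fm = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + eps)
    tf = text_feature / (np.linalg.norm(text_feature) + eps)
    return fm @ tf

def image_level_similarity(feature_map, text_feature):
    """Aggregate the pixel-level similarities into one image-level similarity
    (max-pooling here; other aggregations are possible)."""
    return float(pixel_level_similarities(feature_map, text_feature).max())

# Tiny example: a 2 x 2 x 3 feature map whose top-left pixel matches the text feature.
text = np.array([1.0, 0.0, 0.0])
fmap = np.zeros((2, 2, 3))
fmap[0, 0] = [2.0, 0.0, 0.0]   # same direction as `text` -> similarity ~1
fmap[1, 1] = [0.0, 3.0, 0.0]   # orthogonal to `text`     -> similarity ~0
```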
In a possible implementation, the process of determining the actual information similarity of the sample pair includes:
if the sample object text identifier uniquely identifies a sample object and the sample image includes the sample object, determining the actual information similarity of the sample pair according to the actual position of the sample object in the sample image.
An embodiment of the present application further provides a target detection method, the method comprising:
obtaining an image to be detected and a text identifier of an object to be detected;
inputting the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model to obtain extracted features of the image to be detected and extracted features of the text identifier of the object to be detected output by the feature extraction model, wherein the feature extraction model is built using any implementation of the feature extraction model construction method provided in the embodiments of the present application;
determining a target detection result corresponding to the image to be detected according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
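A minimal sketch of this determination step is given below: the pixel-wise similarity map between the extracted features of the image to be detected and the extracted feature of the object text identifier is thresholded to decide whether the object appears and, if so, to localize it with a bounding box over the high-similarity pixels. The cosine similarity, the threshold value, and the box-from-pixels rule are all illustrative assumptions.

```python
import numpy as np

def detect(image_feature_map, text_feature, threshold=0.5, eps=1e-8):
    """Return (present, box): whether the object identified by the text
    feature appears in the image, and a (top, left, bottom, right) box
    enclosing the pixels whose similarity exceeds the threshold."""
    fm = image_feature_map / (np.linalg.norm(image_feature_map, axis=-1, keepdims=True) + eps)
    tf = text_feature / (np.linalg.norm(text_feature) + eps)
    sim = fm @ tf                          # (h, w) similarity map
    hits = np.argwhere(sim >= threshold)
    if hits.size == 0:
        return False, None
    top, left = hits.min(axis=0)
    bottom, right = hits.max(axis=0) + 1   # exclusive lower-right corner
    return True, (int(top), int(left), int(bottom), int(right))

# Example: a 4 x 4 x 2 feature map in which a 2 x 2 block matches the text feature.
text = np.array([0.0, 1.0])
fmap = np.zeros((4, 4, 2))
fmap[1:3, 1:3] = text                      # the "object" region
present, box = detect(fmap, text)
```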
An embodiment of the present application further provides a feature extraction model construction apparatus, comprising:
a sample acquisition unit, configured to obtain a sample pair and an actual information similarity of the sample pair, wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair describes the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
a feature prediction unit, configured to input the sample pair into a model to be trained to obtain extracted features of the sample pair output by the model to be trained, wherein the extracted features of the sample pair include extracted features of the sample image and extracted features of the sample object text identifier;
a model updating unit, configured to update the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, and to continue performing the step of inputting the sample pair into the model to be trained, until a preset stop condition is reached and a feature extraction model is determined according to the model to be trained.
An embodiment of the present application further provides a target detection apparatus, comprising:
an information acquisition unit, configured to obtain an image to be detected and a text identifier of an object to be detected;
a feature extraction unit, configured to input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model to obtain extracted features of the image to be detected and extracted features of the text identifier of the object to be detected output by the feature extraction model, wherein the feature extraction model is built using any implementation of the feature extraction model construction method provided in the embodiments of the present application;
a result determination unit, configured to determine a target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
An embodiment of the present application further provides a device, the device comprising a processor and a memory:
the memory is configured to store a computer program;
the processor is configured to execute, according to the computer program, any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium, configured to store a computer program, the computer program being used to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
Compared with the prior art, the embodiments of the present application have at least the following advantages:
In the technical solutions provided by the embodiments of the present application, a feature extraction model is first built using a sample pair and the actual information similarity of the sample pair, so that the built feature extraction model has good feature extraction performance; the built feature extraction model is then used to perform feature extraction on an image to be detected and a text identifier of an object to be detected, obtaining and outputting the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected; finally, a target detection result corresponding to the image to be detected is determined according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
Because the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected, the target detection result determined based on this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether a target object uniquely identified by the text identifier of the object to be detected exists in the image to be detected, and the position of that target object in the image to be detected), which is beneficial to improving the accuracy of target detection.
Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flow chart of a feature extraction model construction method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of the n-th sample pair provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a sample image including multiple objects provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a model to be trained provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of association relationships between different objects provided by an embodiment of the present application;
Fig. 6 is a flow chart of a target detection method provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a feature extraction model construction apparatus provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present application.
Detailed Description
In research on target detection, the inventors found that if a target object (for example, a cat) exists in an image, the information carried by the image should be similar to the information carried by the object text identifier of the target object (for example, the information carried by each pixel within the region of the image occupied by the target object should be the same as the information carried by the object text identifier of the target object).
Based on the above finding, an embodiment of the present application provides a feature extraction model construction method, the method comprising: obtaining a sample pair and an actual information similarity of the sample pair, wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair describes the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier; inputting the sample pair into a model to be trained to obtain extracted features of the sample pair output by the model to be trained, wherein the extracted features of the sample pair include extracted features of the sample image and extracted features of the sample object text identifier; and updating the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, and continuing to perform the step of inputting the sample pair into the model to be trained, until a preset stop condition is reached and a feature extraction model is determined according to the model to be trained.
It can be seen that, because the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair can accurately represent the information carried by the sample image and the information carried by the sample object text identifier respectively, the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier is close to the actual information similarity of the sample pair. The trained model therefore has good feature extraction performance, and so does the feature extraction model built from it, enabling the subsequent target detection process based on the built feature extraction model to be carried out more accurately, which is beneficial to improving the accuracy of target detection.
In addition, the embodiments of the present application do not limit the execution subject of the feature extraction model construction method. For example, the feature extraction model construction method provided in the embodiments of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smartphone, a computer, a personal digital assistant (PDA), a tablet computer, or the like. The server may be an independent server, a cluster server, or a cloud server.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
To facilitate understanding of the technical solutions of the present application, the content related to the feature extraction model construction method (that is, the construction process of the feature extraction model) is introduced first, followed by the content related to the target detection method (that is, the application process of the feature extraction model).
Method Embodiment 1
Referring to Fig. 1, this figure is a flow chart of a feature extraction model construction method provided by an embodiment of the present application.
The feature extraction model construction method provided by this embodiment of the present application includes S101-S106:
S101: Obtain a sample pair and the actual information similarity of the sample pair.
The sample pair refers to the model input data that needs to be input into the model to be trained during the training process of the model to be trained; the sample pair includes a sample image and a sample object text identifier. The sample image refers to an image on which target detection processing is to be performed. The sample object text identifier uniquely identifies a sample object.
It should be noted that the embodiments of the present application do not limit the sample object text identifier; for example, the sample object text identifier may be an object category name (e.g., "cat").
In addition, the embodiments of the present application do not limit the number of sample pairs; for example, the number of sample pairs may be N, where N is a positive integer. That is, the model to be trained may be trained using N sample pairs.
Furthermore, the embodiments of the present application do not limit the sample type of a sample pair. For example, when the n-th sample pair includes the n-th sample image and the n-th sample object text identifier, and the n-th sample object text identifier uniquely identifies the n-th sample object: if the n-th sample object exists in the n-th sample image, the n-th sample pair can be determined to be a positive sample; if the n-th sample object does not exist in the n-th sample image, the n-th sample pair can be determined to be a negative sample.
The actual information similarity of a sample pair describes the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier, so that the actual information similarity can accurately represent the association between the sample image and the sample object text identifier. Specifically, when the sample object text identifier uniquely identifies a sample object, a larger actual information similarity of the sample pair indicates a greater possibility that the sample object exists in the sample image, and a smaller actual information similarity indicates a smaller possibility that the sample object exists in the sample image.
In theory, for the n-th sample pair (as shown in Fig. 2), if the n-th sample object exists in the n-th sample image, the information actually carried by the n-th sample image should be as close as possible to the information actually carried by the n-th sample object text identifier (for example, the information actually carried by each pixel within the region of the n-th sample image occupied by the n-th sample object should be the same as the information actually carried by the n-th sample object text identifier).
Based on the above theory, an embodiment of the present application provides a process for obtaining the actual information similarity of a sample pair, which may specifically include: if the sample object text identifier uniquely identifies a sample object and the sample image includes the sample object, determining the actual information similarity of the sample pair according to the actual position of the sample object in the sample image.
In addition, the embodiments of the present application do not limit the process of determining the actual information similarity of the sample pair. For example, in a possible implementation, the process may specifically include: first determining the image region of the sample object according to the actual position of the sample object in the sample image, so that the image region of the sample object represents the region occupied by the sample object in the sample image; then setting the actual information similarity corresponding to each pixel within the image region of the sample object to a first preset similarity value (for example, 1), and setting the actual information similarity corresponding to each pixel of the sample image outside the image region of the sample object to a second preset similarity value (for example, 0).
For ease of understanding, an example is given below.
As an example, if the n-th sample pair includes the n-th sample image and the n-th sample object text identifier, and the n-th sample image is an h×w×3-dimensional image, the actual information similarity of the n-th sample pair may be an h×w-dimensional matrix A_n determined according to formulas (1)-(2):
A_n = {a_ij}_(h×w)    (1)
a_ij = 1, if (i, j) ∈ Z_n;  a_ij = 0, if (i, j) ∉ Z_n    (2)
where A_n denotes the actual information similarity of the n-th sample pair; (i, j) denotes the position of the pixel in the i-th row and j-th column of the n-th sample image, where i and j are positive integers with i ≤ h and j ≤ w, and h and w are positive integers; Z_n denotes the region occupied by the n-th sample object in the n-th sample image; and a_ij denotes the similarity between the information actually carried by the pixel in the i-th row and j-th column of the n-th sample image and the information actually carried by the n-th sample object text identifier. If (i, j) ∈ Z_n, the region occupied by the n-th sample object in the n-th sample image includes the pixel in the i-th row and j-th column, so the information actually carried by that pixel is the same as the information actually carried by the n-th sample object text identifier, and a_ij = 1. If (i, j) ∉ Z_n, the region occupied by the n-th sample object does not include the pixel in the i-th row and j-th column, so the information actually carried by that pixel differs from the information actually carried by the n-th sample object text identifier, and a_ij = 0.
Based on the above formulas (1) and (2), for the n-th sample pair shown in Fig. 2, the actual information similarity of the n-th sample pair can be determined according to the position of the n-th sample object in the n-th sample image (that is, the position of the cat). Specifically, when the actual information similarity of the n-th sample pair includes the actual information similarity corresponding to each pixel in the n-th sample image: if the pixel in the i-th row and j-th column of the n-th sample image is located within the region occupied by the n-th sample object in the n-th sample image (inside the object bounding box shown in Fig. 2), the actual information similarity corresponding to that pixel can be determined to be 1; if the pixel is located outside that region (outside the object bounding box shown in Fig. 2), the actual information similarity corresponding to that pixel can be determined to be 0.
In addition, when the n-th sample pair includes the n-th sample image and the n-th sample object text identifier, Q (e.g., 3) objects exist in the n-th sample image (such as the image shown in Fig. 3), and the n-th sample object text identifier uniquely identifies the q-th object in the n-th sample image (e.g., the dog, person, or horse in Fig. 3), the actual information similarity of the n-th sample pair can be determined according to the region occupied by the q-th object in the n-th sample image. Specifically, the actual information similarity corresponding to each pixel within the region occupied by the q-th object in the n-th sample image is set to the first preset similarity value (for example, 1), and the actual information similarity corresponding to each pixel outside that region is set to the second preset similarity value (for example, 0), where q is a positive integer and q ≤ Q.
That is, to train the "model to be trained" described below using the n-th sample image and the q-th object in the n-th sample image, a sample pair needs to be constructed from the n-th sample image and the object text identifier of the q-th object, and the actual information similarity of that sample pair is determined using the region occupied by the q-th object in the n-th sample image.
It should be noted that, in Fig. 3, "dog" is the object text identifier of the dog, "person" is the object text identifier of the person, and "horse" is the object text identifier of the horse.
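As a concrete illustration of formulas (1)-(2), the snippet below builds the h×w actual-information-similarity matrix for one sample pair from the region occupied by the sample object, here represented as an axis-aligned bounding box (top, left, bottom, right). The box representation and the helper name are illustrative assumptions; the region Z_n could have any shape.

```python
import numpy as np

def actual_info_similarity(h, w, object_box):
    """Build the h x w matrix of formulas (1)-(2): a_ij = 1 for pixels
    inside the region Z_n occupied by the sample object, 0 elsewhere."""
    top, left, bottom, right = object_box   # exclusive bottom/right bounds
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:bottom, left:right] = 1.0
    return mask

# For a 6 x 8 sample image whose object occupies rows 1-3 and columns 2-5:
A_n = actual_info_similarity(6, 8, (1, 2, 4, 6))
```

For a sample image containing several objects (as in Fig. 3), one such matrix would be built per (image, object text identifier) sample pair, each from the corresponding object's region.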
Based on the above content of S101, after the sample image and the sample object text identifier are obtained, the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier can be determined according to the association between the sample image and the sample object text identifier (for example, whether the sample object uniquely identified by the sample object text identifier exists in the sample image, and the position of the sample object in the sample image), so that this degree of similarity can subsequently serve as the learning target during the training process of the model to be trained.
S102: Input the sample pair into the model to be trained, and obtain the extracted features of the sample pair output by the model to be trained.
The extracted features of the sample pair represent the information carried by the sample pair, and they include the extracted features of the sample image and the extracted features of the sample object text identifier.
The extracted features of the sample image represent the information the sample image is predicted to carry. The embodiments of the present application do not limit how these extracted features are represented; for example, if a sample image is h×w×3-dimensional, its extracted features may be represented by an h×w×c-dimensional feature map.
The extracted features of the sample object text identifier represent the information the sample object text identifier is predicted to carry. The embodiments of the present application do not limit how these extracted features are represented; for example, they may be represented by a 1×c-dimensional feature vector.
The model to be trained performs feature extraction on its input data (e.g., text feature extraction on text data and/or image feature extraction on image data). The embodiments of the present application do not limit the structure of the model to be trained; for example, in one possible implementation, as shown in Figure 4, the model 400 to be trained may include a text feature extraction sub-model 401 and an image feature extraction sub-model 402.
To facilitate understanding of how the model 400 to be trained works, the process of determining the extracted features of a sample pair is described below as an example.
As an example, the process of determining the extracted features of a sample pair with the model 400 to be trained may include steps 11 and 12:
Step 11: Input the sample image into the image feature extraction sub-model 402 to obtain the extracted features of the sample image output by the image feature extraction sub-model 402.
The image feature extraction sub-model 402 performs image feature extraction. The embodiments of the present application do not limit its implementation; any existing or future model structure with an image feature extraction function may be used.
Step 12: Input the sample object text identifier into the text feature extraction sub-model 401 to obtain the extracted features of the sample object text identifier output by the text feature extraction sub-model 401.
The text feature extraction sub-model 401 performs text feature extraction. The embodiments of the present application do not limit its implementation; any existing or future model structure with a text feature extraction function (e.g., a language model such as BERT or GPT-3) may be used.
Based on steps 11 and 12 above, after a sample pair is input into the model 400 to be trained, the image feature extraction sub-model 402 performs image feature extraction on the sample image in the sample pair and outputs the extracted features of the sample image, so that these features can represent the information the sample image is predicted to carry; and the text feature extraction sub-model 401 performs text feature extraction on the sample object text identifier in the sample pair and outputs the extracted features of the sample object text identifier, so that these features can represent the information the sample object text identifier is predicted to carry.
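Steps 11 and 12 above can be sketched as two independent branches. The classes below are toy stand-ins for sub-models 402 and 401 (a random per-pixel projection and a random embedding table); the real sub-models would be learned networks, so every detail here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

class ImageFeatureExtractor:
    """Stand-in for sub-model 402: maps an h x w x 3 image to an h x w x c feature map."""
    def __init__(self, c):
        self.proj = rng.standard_normal((3, c))  # toy per-pixel linear projection

    def __call__(self, image):                   # image: (h, w, 3)
        return image @ self.proj                 # -> (h, w, c)

class TextFeatureExtractor:
    """Stand-in for sub-model 401: maps a text identifier to a c-dimensional vector."""
    def __init__(self, c, vocab):
        self.table = {w: rng.standard_normal(c) for w in vocab}  # toy embedding table

    def __call__(self, identifier):
        return self.table[identifier]            # -> (c,)

img_model = ImageFeatureExtractor(c=16)
txt_model = TextFeatureExtractor(c=16, vocab=["dog", "person", "horse"])
feat_map = img_model(rng.standard_normal((8, 10, 3)))  # extracted features of the sample image
text_feat = txt_model("dog")                           # extracted features of the text identifier
```

The two outputs have exactly the shapes the text describes: an h×w×c feature map and a 1×c feature vector.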
In addition, to further improve the feature extraction performance of the model 400 to be trained, before training it, some prior knowledge may first be used to initialize the text feature extraction sub-model 401, so that the sub-model can subsequently perform text feature extraction based on this prior knowledge. Accordingly, the embodiments of the present application further provide a possible implementation of the feature extraction model construction method in which, in addition to S101-S106, the method further includes S107:
S107: Initialize the text feature extraction sub-model 401 using preset prior knowledge.
The preset prior knowledge describes the degree of association between different objects (for example, as shown in Figure 5, a cat and a tiger both belong to the cat family, so the degree of association between them is relatively high; likewise, a lion and a lioness are both lions, so the degree of association between them is even higher).
It should be noted that if the degree of association between two objects is 1, the two objects belong to the same class of objects; if the degree of association between two objects is 0, the two objects have no similarity at all (that is, no association exists between them).
In addition, the embodiments of the present application do not limit the preset prior knowledge; for example, it may include a pre-built object knowledge graph. The object knowledge graph may describe the degree of association between different objects, and it may be constructed in advance from a large amount of object-related knowledge information.
Furthermore, the embodiments of the present application do not limit how the "initialization" in S107 is implemented; for example, it may refer to pre-training. That is, the text feature extraction sub-model 401 is pre-trained with the preset prior knowledge so that, after pre-training, it can perform feature extraction according to that knowledge: for any two objects (more precisely, their object identifiers), the similarity between the text features output by the initialized sub-model 401 is positively correlated with the degree of association between the two objects.
That is, for the initialized text feature extraction sub-model 401, the higher the degree of association between a first object and a second object in the preset prior knowledge, the higher the similarity between the text features (e.g., "v5" and "v3" in Figure 5) output by the sub-model for the first object (e.g., a cat) and the second object (e.g., a lion); conversely, the lower the degree of association between the two objects in the preset prior knowledge, the lower the similarity between the text features output for them.
It should be noted that in Figure 5, "v1" denotes the text feature output by the initialized text feature extraction sub-model 401 for the tiger; "v2" denotes the text feature output for the leopard; and so on, with "v6" denoting the text feature output for the lynx.
It should also be noted that the embodiments of the present application do not limit when S107 is executed, as long as it is completed before S102 is executed (that is, S107 only needs to be completed before the model to be trained is trained).
Based on the content of S107 above, before the model 400 to be trained is trained with the sample pairs and their actual information similarities, the preset prior knowledge may first be used to pre-train the text feature extraction sub-model 401, so that the sub-model learns to extract features according to the preset prior knowledge. Its text feature extraction performance then continues to be optimized during the training of the model 400, so that after training the sub-model can better extract features based on the preset prior knowledge. This helps improve the feature extraction performance of the model 400 to be trained, which in turn helps improve the feature extraction performance of the feature extraction model built from it, and ultimately the target detection performance when that feature extraction model is used for target detection.
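One way to read S107 is as minimizing the gap between text-feature similarity and the preset degree of association. The sketch below computes such a pre-training objective; the cosine measure and the squared-gap loss are assumptions chosen for illustration, not details from the original disclosure:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two feature vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def prior_knowledge_loss(embeddings, relatedness):
    """Mean squared gap between embedding similarity and knowledge-graph relatedness.

    embeddings: {object_id: (c,) vector}; relatedness: {(a, b): degree in [0, 1]}.
    Pre-training would minimize this so that text-feature similarity is
    positively correlated with the preset degree of association.
    """
    gaps = [(cosine(embeddings[a], embeddings[b]) - r) ** 2
            for (a, b), r in relatedness.items()]
    return sum(gaps) / len(gaps)

# Toy check: identical vectors for fully associated objects give zero loss
emb = {"lion": np.array([1.0, 0.0]),
       "lioness": np.array([1.0, 0.0]),
       "cat": np.array([0.0, 1.0])}
rel = {("lion", "lioness"): 1.0, ("lion", "cat"): 0.0}
loss = prior_knowledge_loss(emb, rel)
```

A gradient-based pre-trainer would adjust the embedding table (or the sub-model parameters) to drive this loss down before S102 begins.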
Based on the content of S102 above, after the nth sample pair is acquired, it can be input into the model to be trained, so that the model performs feature extraction on the nth sample image and the nth sample object text identifier in the pair and outputs the extracted features of the nth sample image and the extracted features of the nth sample object text identifier (that is, the extracted features of the nth sample pair). These extracted features can respectively represent the information the sample image is predicted to carry and the information the sample object text identifier is predicted to carry, so that the feature extraction performance of the model to be trained can subsequently be determined from them. Here, n is a positive integer, n≤N, N is a positive integer, and N denotes the number of sample pairs.
S103: Calculate the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier as the predicted information similarity of the sample pair.
The predicted information similarity of a sample pair is the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier; it describes the degree of similarity between the information the sample image is predicted to carry and the information the sample object text identifier is predicted to carry.
In addition, the embodiments of the present application do not limit how the predicted information similarity of a sample pair is determined (that is, how S103 is implemented). For example, in one possible implementation, if the extracted features of the sample image include a feature map of the sample image, S103 may specifically include S1031-S1032:
S1031: Determine the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier.
The feature map of the sample image represents the information carried by the sample image. The embodiments of the present application do not limit the feature map; for example, if a sample image is h×w×3-dimensional and the extracted features of the sample object text identifier are 1×c-dimensional, the feature map of the sample image may be h×w×c-dimensional, where h, w, and c are positive integers.
In addition, the embodiments of the present application do not limit how the feature map of the sample image is represented. For example, if the feature map is h×w×c-dimensional, it may be represented by h×w pixel-level extracted features, each of which is 1×c-dimensional. The pixel-level extracted feature located at row i, column j of the feature map represents the information that the pixel at row i, column j of the sample image is predicted to carry, where i is a positive integer with i≤h and j is a positive integer with j≤w.
Furthermore, the embodiments of the present application do not limit how S1031 is implemented; for example, S1031 may be implemented using formula (3):
$$b_{ij} = S\left(F^{n}_{ij},\, H_{n}\right) \tag{3}$$

In formula (3), $b_{ij}$ denotes the similarity between the pixel-level extracted feature at row $i$, column $j$ of the feature map of the sample image and the extracted features of the sample object text identifier, so that $b_{ij}$ describes the degree of similarity between the information the pixel at row $i$, column $j$ of the sample image is predicted to carry and the information the sample object text identifier is predicted to carry. $F^{n}_{ij}$ (the original renders this symbol as an image; it is written here as $F^{n}_{ij}$) denotes the pixel-level extracted feature at row $i$, column $j$ of the feature map of the sample image; it describes the information the pixel at row $i$, column $j$ of the sample image is predicted to carry and is a 1×c-dimensional feature vector. $H_{n}$ denotes the extracted features of the sample object text identifier; it describes the information the sample object text identifier is predicted to carry and is a 1×c-dimensional feature vector. $S(\cdot)$ denotes a similarity calculation. $i$ is a positive integer, $i \le h$, and $h$ is a positive integer; $j$ is a positive integer, $j \le w$, and $w$ is a positive integer.
It should be noted that the embodiments of the present application do not limit how S(·) is implemented; any existing similarity calculation method (e.g., Euclidean distance, cosine distance) may be used.
S1032: Determine the predicted information similarity of the sample pair according to the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier.
The embodiments of the present application do not limit how S1032 is implemented; for example, S1032 may be calculated using formula (4).
$$\hat{y}_{n} = f\left(\{\, b_{ij} \mid 1 \le i \le h,\ 1 \le j \le w \,\}\right) \tag{4}$$

In formula (4), $\hat{y}_{n}$ denotes the predicted information similarity of the sample pair (that is, the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier); the original renders this symbol and the right-hand side as images, and $f$ stands for the aggregation specified there. $b_{ij}$ denotes the similarity between the pixel-level extracted feature at row $i$, column $j$ of the feature map of the sample image and the extracted features of the sample object text identifier. $i$ is a positive integer, $i \le h$, and $h$ is a positive integer; $j$ is a positive integer, $j \le w$, and $w$ is a positive integer.
Based on the content of S103, for the nth sample pair, which includes the nth sample image and the nth sample object text identifier, after the extracted features of both are obtained, the predicted information similarity of the nth sample pair can be calculated from them, so that it accurately describes the degree of similarity between the information the nth sample image is predicted to carry and the information the nth sample object text identifier is predicted to carry; the feature extraction performance of the model to be trained can subsequently be determined based on this predicted information similarity. Here, n is a positive integer, n≤N, N is a positive integer, and N denotes the number of sample pairs.
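Formulas (3) and (4) can be sketched as follows, with cosine similarity standing in for S(·) (one of the admissible choices named above) and the mean standing in for the aggregation in formula (4), which the excerpt leaves unspecified; both choices are illustrative assumptions:

```python
import numpy as np

def pixel_similarities(feat_map, text_feat):
    """b_ij of formula (3): cosine similarity between each 1 x c pixel-level
    feature of the h x w x c feature map and the 1 x c text feature."""
    fm = feat_map / np.linalg.norm(feat_map, axis=-1, keepdims=True)  # (h, w, c)
    tf = text_feat / np.linalg.norm(text_feat)                        # (c,)
    return fm @ tf                                                    # (h, w)

def predicted_information_similarity(feat_map, text_feat, agg=np.mean):
    """Formula (4): aggregate the b_ij into one scalar; `agg` (mean by
    default) is an illustrative stand-in for the aggregation used there."""
    return float(agg(pixel_similarities(feat_map, text_feat)))

# Toy check: if every pixel feature equals the text feature, similarity is 1
text_feat = np.array([3.0, 4.0])
feat_map = np.tile(text_feat, (2, 2, 1))    # (2, 2, 2) map, all pixels match
score = predicted_information_similarity(feat_map, text_feat)
```

Each b_ij is then comparable against the per-pixel actual information similarity defined in S101, which is what the training loss evaluates.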
S104: Determine whether a preset stop condition is met; if so, execute S106; if not, execute S105.
The preset stop condition may be set in advance. For example, it may be that the loss value of the model to be trained falls below a preset loss threshold, that the rate of change of the loss value falls below a preset change-rate threshold (that is, the model to be trained has converged), or that the number of updates of the model to be trained reaches a preset count threshold.
It should be noted that the loss value of the model to be trained describes its feature extraction performance. The embodiments of the present application do not limit how this loss value is calculated; any existing or future method that can calculate the loss value of the model to be trained from the predicted information similarity of a sample pair and the actual information similarity of that sample pair may be used.
S105: Update the model to be trained according to the predicted information similarity of the sample pair and the actual information similarity of the sample pair, and return to S102.
In this embodiment of the present application, once it is determined that the current round of the model to be trained has not met the preset stop condition, it can be concluded that its feature extraction performance is still relatively poor. The model to be trained is therefore updated according to the difference between the predicted information similarity of the sample pair and the actual information similarity of the sample pair, so that the updated model has better feature extraction performance, and S102 and its subsequent steps are then executed with the updated model.
S106: Determine the feature extraction model according to the model to be trained.
In this embodiment of the present application, once it is determined that the current round of the model to be trained has met the preset stop condition, it can be concluded that the model has good feature extraction performance (in particular, it can ensure that the extracted features of a sample image containing a sample object are as close as possible to the extracted features of the sample object text identifier that uniquely identifies that object). The feature extraction model can therefore be built from the current round of the model to be trained (e.g., the current model may be determined directly as the feature extraction model; or the model structure and model parameters of the feature extraction model may be determined from the model structure and model parameters of the current model, so that they remain identical). In this way, the feature extraction performance of the constructed feature extraction model is consistent with that of the current round of the model to be trained, so the constructed feature extraction model also has good feature extraction performance.
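The overall S102-S106 loop can be sketched as a skeleton. The `forward`/`update` interface, the squared-error loss, and the toy model are all hypothetical; the sketch only shows the stop conditions (loss threshold and update-count cap) and the update-then-retry control flow:

```python
def build_feature_extraction_model(model, sample_pairs, actual_sims,
                                   max_updates=100, loss_threshold=1e-3):
    """Skeleton of S102-S106. `model` is assumed to expose
    forward(pair) -> predicted similarity and update(pred, target)."""
    for step in range(max_updates):                      # S104: update-count cap
        loss = 0.0
        for pair, target in zip(sample_pairs, actual_sims):
            pred = model.forward(pair)                   # S102 + S103
            loss += (pred - target) ** 2                 # illustrative loss
        loss /= len(sample_pairs)
        if loss < loss_threshold:                        # S104: loss threshold
            break
        for pair, target in zip(sample_pairs, actual_sims):
            model.update(model.forward(pair), target)    # S105
    return model                                         # S106

class ToyModel:
    """Hypothetical model: predicted similarity is w * pair_value."""
    def __init__(self):
        self.w = 0.0
    def forward(self, pair):
        return self.w * pair
    def update(self, pred, target):
        self.w -= 0.5 * (pred - target)                  # toy gradient step

trained = build_feature_extraction_model(ToyModel(), [1.0], [0.5])
```

Here the trained model itself is returned as the feature extraction model, which corresponds to the "determine the current model directly as the feature extraction model" option above.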
Based on the content of S101 to S106 above, in the feature extraction model construction method, after a sample pair and its actual information similarity are obtained, the model to be trained is first trained with them, so that the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair is close to the actual information similarity of the sample pair. The trained model therefore has good feature extraction performance, and the feature extraction model built from it also has good feature extraction performance, which enables the subsequent target detection process to be performed more accurately based on the constructed feature extraction model and helps improve target detection accuracy.
After the feature extraction model is constructed, it can be used for target detection. Accordingly, the embodiments of the present application further provide a target detection method, which is described below with reference to the accompanying drawings.
Method Embodiment 2
Referring to Figure 6, this figure is a flowchart of a target detection method provided by an embodiment of the present application.
The target detection method provided by the embodiments of the present application includes S601-S603:
S601: Acquire an image to be detected and a text identifier of an object to be detected.
The image to be detected is an image on which target detection processing needs to be performed.
The text identifier of the object to be detected uniquely identifies the object to be detected. That is, S601-S603 can be used to determine whether the image to be detected contains the object uniquely identified by the text identifier of the object to be detected.
It should be noted that the embodiments of the present application do not limit the text identifier of the object to be detected. For example, it may be any sample object text identifier used during construction of the feature extraction model, or any other object text identifier not used during that construction. For instance, even if the object text identifier "tiger" was never used in the process of building the feature extraction model, the text identifier of the object to be detected may still be "tiger". It can thus be seen that the target detection method provided by the embodiments of the present application is an open-world target detection method.
S602: Input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model, and obtain the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected output by the feature extraction model.
The feature extraction model performs feature extraction on its input data; it is built using any implementation of the feature extraction model construction method provided by the embodiments of the present application (see Method Embodiment 1 above for details).
The extracted features of the image to be detected represent the information carried by the image to be detected.
The extracted features of the text identifier of the object to be detected represent the information carried by that text identifier.
Based on the content of S602, after the image to be detected and the text identifier of the object to be detected are acquired, they can be input into the pre-built feature extraction model, so that the model performs feature extraction on each of them and outputs the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected, so that the extracted features of the image to be detected can represent the information carried by the image to be detected, and the extracted features of the text identifier can represent the information carried by the text identifier of the object to be detected.
S603: Determine the target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
The target detection result corresponding to the image to be detected describes the association between the image to be detected and the text identifier of the object to be detected. The embodiments of the present application do not limit how this result is represented; for example, if the text identifier uniquely identifies the object to be detected, the target detection result may include the likelihood that the object to be detected is present in the image to be detected (e.g., the likelihood that each pixel of the image to be detected lies within the region occupied by the object to be detected in that image), and/or the position of the object to be detected in the image to be detected.
In addition, the embodiments of the present application do not limit how the target detection result corresponding to the image to be detected is determined; for example, the process of determining it may include steps 21 and 22:
步骤21:计算该待检测图像的提取特征与该待检测物体文本标识的提取特征之间的相似度。Step 21: Calculate the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
其中,待检测图像的提取特征与该待检测物体文本标识的提取特征之间的相似度用于描述该待检测图像携带的信息与该待检测物体文本标识携带的信息之间的相似程度。Wherein, the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected is used to describe the degree of similarity between the information carried by the image to be detected and the information carried by the text identifier of the object to be detected.
另外,本申请实施例不限定待检测图像的提取特征与该待检测物体文本标识的提取特征之间的相似度的表示方式,例如,可以利用h×w维的相似度矩阵进行表示,此时,该h×w维的相似度矩阵中位于第i行第j列的相似度值可以描述出该待检测图像中第i行第j列像素点携带的信息与该待检测物体文本标识携带的信息之间的相似程度,从而可以用于表示该待检测图像中第i行第j列像素点位于待检测物体在该待检测图像中所处区域内的可能性。In addition, this embodiment of the present application does not limit the representation of the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected. For example, it can be represented by an h×w-dimensional similarity matrix. , the similarity value in the i-th row and j-column in the h×w-dimensional similarity matrix can describe the information carried by the pixel in the i-th row and j-column in the image to be detected and the information carried by the text mark of the object to be detected The degree of similarity between the information can be used to indicate the possibility that the pixel point in row i and column j in the image to be detected is located in the area where the object to be detected is located in the image to be detected.
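The embodiment above leaves the concrete similarity measure open. As one illustrative sketch only (assuming cosine similarity and a d-dimensional extracted feature per pixel, neither of which is fixed by the text), the h×w-dimensional similarity matrix could be computed as follows:

```python
import numpy as np

def similarity_matrix(image_features, text_feature):
    """Cosine similarity between each pixel-level extracted feature and the
    extracted feature of the object's text identifier.

    image_features: (h, w, d) per-pixel extracted features of the image.
    text_feature:   (d,) extracted feature of the text identifier.
    Returns an (h, w) similarity matrix with values in [-1, 1].
    """
    img = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    txt = text_feature / np.linalg.norm(text_feature)
    return img @ txt  # broadcasts to shape (h, w)
```

Here the entry at row i, column j is exactly the similarity value described above for the pixel at row i, column j.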
It should be noted that, for the relevant content of step 21, reference may be made to the relevant content of S103 above; it suffices to replace "sample image" in S103 with "image to be detected" and "sample object text identifier" with "text identifier of the object to be detected".
Step 22: Determine the target detection result corresponding to the image to be detected according to a preset similarity condition and the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
The preset similarity condition may be set in advance. For example, if the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected is represented by an h×w-dimensional similarity matrix, the preset similarity condition may be "greater than a preset similarity threshold (e.g., 0.5)".
It can be seen that, when the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected is represented by an h×w-dimensional similarity matrix and the preset similarity condition is "greater than a preset similarity threshold", step 22 may specifically include: judging whether the similarity value at row i, column j of the h×w-dimensional similarity matrix is greater than the preset similarity threshold. If it is greater, it is determined that the information carried by the pixel at row i, column j of the image to be detected is relatively similar to the information carried by the text identifier of the object to be detected, so that this pixel can be determined to lie within the region occupied by the object to be detected in the image to be detected. If it is not greater, it is determined that the information carried by this pixel is not very similar to the information carried by the text identifier of the object to be detected, so that this pixel can be determined to lie outside that region.
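The per-pixel judgment of step 22 can be sketched as below. `detection_mask` follows the thresholding rule just described; `mask_to_box` is an assumed helper (not specified in the text) for turning the passing pixels into a reported object position:

```python
import numpy as np

def detection_mask(sim_matrix, threshold=0.5):
    """Pixels whose similarity value exceeds the preset similarity threshold
    are judged to lie within the region occupied by the object."""
    return sim_matrix > threshold

def mask_to_box(mask):
    """Assumed helper: smallest (row_min, col_min, row_max, col_max) box
    covering all passing pixels, or None when no pixel passes."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())
```

For instance, applying `detection_mask` with threshold 0.5 to the h×w similarity matrix yields a binary region estimate, and `mask_to_box` then gives one possible encoding of "the position of the object to be detected in the image to be detected".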
Based on the content of S601 to S603 above, after the image to be detected and the text identifier of the object to be detected are obtained, the constructed feature extraction model may first be used to perform feature extraction on the image to be detected and on the text identifier of the object to be detected, obtaining and outputting the extracted features of each; the target detection result corresponding to the image to be detected is then determined according to the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
It can be seen that, because the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image and the information carried by the text identifier, the target detection result determined on the basis of this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether the image contains the target object uniquely identified by the text identifier, and the position of that target object in the image), which helps improve target detection accuracy.
Furthermore, because the constructed feature extraction model can perform text feature extraction on any object text identifier according to the associations between different objects, the target detection method provided in the embodiment of the present application can perform target detection not only with the sample object text identifiers used during construction of the feature extraction model, but also with any object text identifier other than those sample object text identifiers. This helps improve the model's target detection performance on non-sample objects, and thereby the target detection performance of the target detection method provided in the embodiment of the present application.
In addition, the embodiment of the present application does not limit the execution subject of the target detection method. For example, the target detection method provided in the embodiment of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smartphone, a computer, a personal digital assistant (PDA), a tablet computer, or the like. The server may be a standalone server, a cluster server, or a cloud server.
Based on the feature extraction model construction method provided by the above method embodiment, an embodiment of the present application further provides a feature extraction model construction apparatus, which is explained and described below with reference to the accompanying drawings.
Apparatus Embodiment One
For the technical details of the feature extraction model construction apparatus provided in Apparatus Embodiment One, reference may be made to the above method embodiment.
Referring to FIG. 7, which is a schematic structural diagram of a feature extraction model construction apparatus provided by an embodiment of the present application.
The feature extraction model construction apparatus 700 provided in the embodiment of the present application includes:
a sample acquisition unit 701, configured to acquire a sample pair and the actual information similarity of the sample pair, wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair is used to describe the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
a feature prediction unit 702, configured to input the sample pair into a model to be trained and obtain the extracted features of the sample pair output by the model to be trained, wherein the extracted features of the sample pair include the extracted features of the sample image and the extracted features of the sample object text identifier; and
a model updating unit 703, configured to update the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, and to continue performing the step of inputting the sample pair into the model to be trained until, when a preset stop condition is reached, a feature extraction model is determined according to the model to be trained.
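The loop implemented by units 701 to 703 can be illustrated with a deliberately simplified sketch. Everything specific here is an assumption for illustration: the image encoder is held fixed, only a single d-dimensional text-identifier embedding is trained, the predicted information similarity is a per-pixel dot product, the loss is mean squared error against the actual information similarity, and the preset stop condition is the loss falling below a tolerance; the patent fixes none of these choices.

```python
import numpy as np

def train_text_embedding(image_feats, actual_sim, lr=0.5, max_steps=200, tol=1e-4):
    """Toy sketch of the train/update loop (assumptions listed above).

    image_feats : (h, w, d) extracted features of the sample image (fixed here)
    actual_sim  : (h, w)    actual information similarity of the sample pair
    Returns the learned text-identifier embedding t of shape (d,).
    """
    h, w, d = image_feats.shape
    t = np.zeros(d)                       # text-identifier embedding to learn
    for _ in range(max_steps):
        pred = image_feats @ t            # predicted information similarity, (h, w)
        err = pred - actual_sim
        loss = np.mean(err ** 2)
        if loss < tol:                    # stand-in for the preset stop condition
            break
        # gradient of the mean-squared loss with respect to t
        grad = 2 * np.einsum('hw,hwd->d', err, image_feats) / (h * w)
        t -= lr * grad                    # one "update the model to be trained" step
    return t
```

The point of the sketch is only the control flow: extract features, compare the predicted similarity with the actual information similarity, update, and repeat until the stop condition holds.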
In a possible implementation, the model to be trained includes a text feature extraction sub-model and an image feature extraction sub-model;
the process of determining the extracted features of the sample pair includes:
inputting the sample image into the image feature extraction sub-model to obtain the extracted features of the sample image output by the image feature extraction sub-model; and
inputting the sample object text identifier into the text feature extraction sub-model to obtain the extracted features of the sample object text identifier output by the text feature extraction sub-model.
In a possible implementation, the feature extraction model construction apparatus 700 further includes:
an initialization unit, configured to initialize the text feature extraction sub-model with preset prior knowledge, wherein the preset prior knowledge is used to describe the associations between different objects.
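The text does not specify what form the preset prior knowledge takes. One plausible instantiation, sketched here purely as an assumption, is to seed each object's text embedding from pretrained word vectors, so that objects that are associated with each other start with similar text features; `word_vectors` is a hypothetical name-to-vector lookup standing in for whatever prior-knowledge source is actually used.

```python
import numpy as np

def init_text_submodel(object_names, word_vectors, dim=4):
    """Assumed initialization scheme: unit-normalized word vectors as the
    initial text-identifier embeddings. Unknown names fall back to zeros."""
    table = {}
    for name in object_names:
        v = np.asarray(word_vectors.get(name, np.zeros(dim)), dtype=float)
        n = np.linalg.norm(v)
        table[name] = v / n if n > 0 else v
    return table
```

Under this scheme, two related objects (for example, two animal names whose word vectors point in similar directions) begin training with positively correlated text features, which is the effect the initialization unit is intended to achieve.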
In a possible implementation, the process of determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier includes:
determining, respectively, the similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier; and determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier according to the similarities between the pixel-level extracted features in the feature map of the sample image and the extracted features of the sample object text identifier.
In a possible implementation, the process of determining the actual information similarity of the sample pair includes:
if the sample object text identifier uniquely identifies a sample object and the sample image includes the sample object, determining the actual information similarity of the sample pair according to the actual position of the sample object in the sample image.
Based on the above content relating to the feature extraction model construction apparatus 700, after a sample pair and the actual information similarity of the sample pair are acquired, the sample pair and its actual information similarity are first used to train the model to be trained, so that the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier output by the trained model for the sample pair is close to the actual information similarity of the sample pair. The trained model thus has good feature extraction performance, and so does the feature extraction model constructed on the basis of it, which allows the subsequent target detection process based on the constructed feature extraction model to be carried out more accurately and helps improve target detection accuracy.
Based on the target detection method provided by the above method embodiment, an embodiment of the present application further provides a target detection apparatus, which is explained and described below with reference to the accompanying drawings.
Apparatus Embodiment Two
For the technical details of the target detection apparatus provided in Apparatus Embodiment Two, reference may be made to the above method embodiment.
Referring to FIG. 8, which is a schematic structural diagram of a target detection apparatus provided by an embodiment of the present application.
The target detection apparatus 800 provided in the embodiment of the present application includes:
an information acquisition unit 801, configured to acquire an image to be detected and a text identifier of an object to be detected;
a feature extraction unit 802, configured to input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model and obtain the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected output by the feature extraction model, wherein the feature extraction model is constructed using any implementation of the feature extraction model construction method provided in the embodiments of the present application; and
a result determination unit 803, configured to determine the target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
Based on the above content relating to the target detection apparatus 800, after the image to be detected and the text identifier of the object to be detected are obtained, the constructed feature extraction model may first be used to perform feature extraction on the image to be detected and on the text identifier of the object to be detected, obtaining and outputting the extracted features of each; the target detection result corresponding to the image to be detected is then determined according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
It can be seen that, because the similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected can accurately represent the degree of similarity between the information carried by the image and the information carried by the text identifier, the target detection result determined on the basis of this similarity can accurately represent the association between the image to be detected and the text identifier of the object to be detected (for example, whether the image contains the target object uniquely identified by the text identifier, and the position of that target object in the image), which helps improve target detection accuracy.
Furthermore, because the constructed feature extraction model can perform text feature extraction on any object text identifier according to the associations between different objects, the target detection method provided in the embodiment of the present application can perform target detection not only with the sample object text identifiers used during construction of the feature extraction model, but also with any object text identifier other than those sample object text identifiers. This helps improve the model's target detection performance on non-sample objects, and thereby the target detection performance of the target detection apparatus 800 provided in the embodiment of the present application.
Further, an embodiment of the present application also provides a device, the device including a processor and a memory:
the memory is configured to store a computer program;
the processor is configured to execute, according to the computer program, any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
Further, an embodiment of the present application also provides a computer-readable storage medium configured to store a computer program, the computer program being used to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the feature extraction model construction method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
It should be understood that, in the present application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following items" or a similar expression refers to any combination of those items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
The above are merely preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution of the present invention, or modify it into equivalent embodiments of equivalent variation. Therefore, any simple modifications, equivalent variations, and refinements made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still fall within the protection scope of the technical solution of the present invention.

Claims (12)

1. A feature extraction model construction method, characterized in that the method comprises:
    acquiring a sample pair and an actual information similarity of the sample pair, wherein the sample pair comprises a sample image and a sample object text identifier, and the actual information similarity of the sample pair is used to describe a degree of similarity between information actually carried by the sample image and information actually carried by the sample object text identifier;
    inputting the sample pair into a model to be trained to obtain extracted features of the sample pair output by the model to be trained, wherein the extracted features of the sample pair comprise extracted features of the sample image and extracted features of the sample object text identifier;
    determining a similarity between the extracted features of the sample image and the extracted features of the sample object text identifier as a predicted information similarity of the sample pair; and
    updating the model to be trained according to the actual information similarity of the sample pair and the predicted information similarity of the sample pair, and continuing to perform the step of inputting the sample pair into the model to be trained until, when a preset stop condition is reached, a feature extraction model is determined according to the model to be trained.
2. The method according to claim 1, characterized in that the model to be trained comprises a text feature extraction sub-model and an image feature extraction sub-model; and
    a process of determining the extracted features of the sample pair comprises:
    inputting the sample image into the image feature extraction sub-model to obtain the extracted features of the sample image output by the image feature extraction sub-model; and
    inputting the sample object text identifier into the text feature extraction sub-model to obtain the extracted features of the sample object text identifier output by the text feature extraction sub-model.
3. The method according to claim 2, characterized in that, before the inputting the sample pair into the model to be trained, the method further comprises:
    initializing the text feature extraction sub-model with preset prior knowledge, so that a similarity between text features output by the initialized text feature extraction sub-model for any two objects is positively correlated with a degree of association between the two objects, wherein the preset prior knowledge is used to describe degrees of association between different objects.
4. The method according to claim 1, characterized in that, if the extracted features of the sample image comprise a feature map of the sample image, a process of determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier comprises:
    determining, respectively, a similarity between each pixel-level extracted feature in the feature map of the sample image and the extracted features of the sample object text identifier; and
    determining the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier according to the similarities between the pixel-level extracted features in the feature map of the sample image and the extracted features of the sample object text identifier.
5. The method according to claim 1, characterized in that a process of determining the actual information similarity of the sample pair comprises:
    if the sample object text identifier uniquely identifies a sample object and the sample image comprises the sample object, determining the actual information similarity of the sample pair according to an actual position of the sample object in the sample image.
6. The method according to claim 5, characterized in that, if the actual information similarity of the sample pair comprises an actual information similarity corresponding to each pixel in the sample image, the determining the actual information similarity of the sample pair according to the actual position of the sample object in the sample image comprises:
    determining an image region of the sample object according to the actual position of the sample object in the sample image;
    determining the actual information similarity corresponding to each pixel within the image region of the sample object as a first preset similarity value; and
    determining the actual information similarity corresponding to each pixel in the sample image other than the image region of the sample object as a second preset similarity value.
7. A target detection method, characterized in that the method comprises:
    acquiring an image to be detected and a text identifier of an object to be detected;
    inputting the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model to obtain extracted features of the image to be detected and extracted features of the text identifier of the object to be detected output by the feature extraction model, wherein the feature extraction model is constructed using the feature extraction model construction method according to any one of claims 1-6; and
    determining a target detection result corresponding to the image to be detected according to a similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  8. A feature extraction model construction apparatus, comprising:
    a sample acquisition unit, configured to acquire a sample pair and an actual information similarity of the sample pair; wherein the sample pair includes a sample image and a sample object text identifier, and the actual information similarity of the sample pair describes the degree of similarity between the information actually carried by the sample image and the information actually carried by the sample object text identifier;
    a feature prediction unit, configured to input the sample pair into a model to be trained and obtain extracted features of the sample pair output by the model to be trained; wherein the extracted features of the sample pair include extracted features of the sample image and extracted features of the sample object text identifier;
    a model updating unit, configured to update the model to be trained according to the actual information similarity of the sample pair and the similarity between the extracted features of the sample image and the extracted features of the sample object text identifier, and to repeat the step of inputting the sample pair into the model to be trained until a preset stop condition is reached, whereupon a feature extraction model is determined according to the model to be trained.
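The training behaviour these units describe — predict a similarity from the two extracted features, compare it with the actual information similarity, update the model, and repeat until a stop condition — can be sketched with toy linear encoders. Everything below (the linear image/text branches, dot-product similarity, squared-error loss, and learning rate) is an illustrative assumption, not the patent's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoders standing in for the image and text branches
# of the model to be trained (assumed architecture for illustration).
W_img = rng.normal(size=(4, 3)) * 0.1
W_txt = rng.normal(size=(4, 3)) * 0.1

def train_step(x_img, x_txt, label_sim, lr=0.1):
    """One update: pull the predicted similarity (dot product of the
    two extracted features) toward the actual information similarity."""
    global W_img, W_txt
    f_img = W_img @ x_img          # extracted feature of the sample image
    f_txt = W_txt @ x_txt          # extracted feature of the text identifier
    pred = f_img @ f_txt           # predicted similarity of the sample pair
    err = pred - label_sim         # gradient factor of the squared error
    W_img -= lr * err * np.outer(f_txt, x_img)
    W_txt -= lr * err * np.outer(f_img, x_txt)
    return 0.5 * err ** 2

x_img = np.array([1.0, 0.0, 0.5])   # stand-in for a sample image
x_txt = np.array([0.2, 1.0, 0.0])   # stand-in for a sample object text identifier
losses = [train_step(x_img, x_txt, label_sim=1.0) for _ in range(200)]
print(losses[-1] < losses[0])  # loss shrinks as training proceeds
```

A fixed iteration count plays the role of the "preset stop condition" here; a real implementation would typically stop on loss convergence or a validation metric, and would train over batches of many sample pairs rather than a single one.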
  9. A target detection apparatus, comprising:
    an information acquisition unit, configured to acquire an image to be detected and a text identifier of an object to be detected;
    a feature extraction unit, configured to input the image to be detected and the text identifier of the object to be detected into a pre-built feature extraction model and obtain the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected output by the feature extraction model; wherein the feature extraction model is constructed using the feature extraction model construction method according to any one of claims 1-5;
    a result determination unit, configured to determine a target detection result corresponding to the image to be detected according to the degree of similarity between the extracted features of the image to be detected and the extracted features of the text identifier of the object to be detected.
  10. A device, wherein the device comprises a processor and a memory:
    the memory is configured to store a computer program;
    the processor is configured to execute, according to the computer program, the feature extraction model construction method according to any one of claims 1-6 or the target detection method according to claim 7.
  11. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and the computer program is used to execute the feature extraction model construction method according to any one of claims 1-6 or the target detection method according to claim 7.
  12. A computer program product, wherein, when the computer program product runs on a terminal device, the terminal device is caused to execute the feature extraction model construction method according to any one of claims 1-6 or the target detection method according to claim 7.
PCT/CN2022/089230 2021-06-28 2022-04-26 Feature extraction model construction method and target detection method, and device therefor WO2023273572A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110723063.XA CN113591839B (en) 2021-06-28 2021-06-28 Feature extraction model construction method, target detection method and device
CN202110723063.X 2021-06-28

Publications (1)

Publication Number Publication Date
WO2023273572A1 true WO2023273572A1 (en) 2023-01-05

Family

ID=78245050

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089230 WO2023273572A1 (en) 2021-06-28 2022-04-26 Feature extraction model construction method and target detection method, and device therefor

Country Status (2)

Country Link
CN (1) CN113591839B (en)
WO (1) WO2023273572A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591839B (en) * 2021-06-28 2023-05-09 北京有竹居网络技术有限公司 Feature extraction model construction method, target detection method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647350A (en) * 2018-05-16 2018-10-12 中国人民解放军陆军工程大学 A kind of picture and text associative search method based on binary channels network
CN110019889A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 Training characteristics extract model and calculate the method and relevant apparatus of picture and query word relative coefficient
CN111091597A (en) * 2019-11-18 2020-05-01 贝壳技术有限公司 Method, apparatus and storage medium for determining image pose transformation
US20200242197A1 (en) * 2019-01-30 2020-07-30 Adobe Inc. Generating summary content tuned to a target characteristic using a word generation model
CN111897950A (en) * 2020-07-29 2020-11-06 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN111985616A (en) * 2020-08-13 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Image feature extraction method, image retrieval method, device and equipment
CN113591839A (en) * 2021-06-28 2021-11-02 北京有竹居网络技术有限公司 Feature extraction model construction method, target detection method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020592B (en) * 2019-02-03 2024-04-09 平安科技(深圳)有限公司 Object detection model training method, device, computer equipment and storage medium
CN111782921A (en) * 2020-03-25 2020-10-16 北京沃东天骏信息技术有限公司 Method and device for searching target
CN112990297B (en) * 2021-03-10 2024-02-02 北京智源人工智能研究院 Training method, application method and device of multi-mode pre-training model
CN112990204B (en) * 2021-05-11 2021-08-24 北京世纪好未来教育科技有限公司 Target detection method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN113591839A (en) 2021-11-02
CN113591839B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN107239731B (en) Gesture detection and recognition method based on Faster R-CNN
WO2023087558A1 (en) Small sample remote sensing image scene classification method based on embedding smoothing graph neural network
Zhang et al. Real-time sow behavior detection based on deep learning
WO2020155518A1 (en) Object detection method and device, computer device and storage medium
CN109993102B (en) Similar face retrieval method, device and storage medium
CN109063719B (en) Image classification method combining structure similarity and class information
CN112668579A (en) Weak supervision semantic segmentation method based on self-adaptive affinity and class distribution
CN109165309B (en) Negative example training sample acquisition method and device and model training method and device
CN110766041A (en) Deep learning-based pest detection method
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN111931859B (en) Multi-label image recognition method and device
CN109977253B (en) Semantic and content-based rapid image retrieval method and device
CN112528022A (en) Method for extracting characteristic words corresponding to theme categories and identifying text theme categories
WO2023134402A1 (en) Calligraphy character recognition method based on siamese convolutional neural network
CN111523586B (en) Noise-aware-based full-network supervision target detection method
WO2023273572A1 (en) Feature extraction model construction method and target detection method, and device therefor
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
Wei et al. Food image classification and image retrieval based on visual features and machine learning
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN110083724A (en) A kind of method for retrieving similar images, apparatus and system
CN108428234B (en) Interactive segmentation performance optimization method based on image segmentation result evaluation
CN113723558A (en) Remote sensing image small sample ship detection method based on attention mechanism
CN105844299B (en) A kind of image classification method based on bag of words
CN115482436B (en) Training method and device for image screening model and image screening method

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE