WO2023273570A1 - Target detection model training method and target detection method, and related device therefor - Google Patents

Info

Publication number
WO2023273570A1
WO2023273570A1 (PCT/CN2022/089194, CN2022089194W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target
sample image
target detection
detection model
Prior art date
Application number
PCT/CN2022/089194
Other languages
French (fr)
Chinese (zh)
Inventor
江毅
杨朔
孙培泽
袁泽寰
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023273570A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present application relates to the technical field of image processing, and in particular to a target detection model training method, a target detection method and related equipment.
  • Target detection, also known as target extraction, is an image segmentation technology based on target geometric statistics and features. Target detection has a wide range of applications (for example, it can be applied to fields such as robotics or automatic driving).
  • the present application provides a target detection model training method, a target detection method and related equipment, which can effectively improve the accuracy of target detection.
  • An embodiment of the present application provides a method for training a target detection model, the method comprising:
  • the performing text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image includes:
  • the method further includes:
  • after acquiring the added image, the actual target text identifier of the added image, and the actual target position of the added image, perform text feature extraction on the actual target text identifier of the added image to obtain the target text feature of the added image; the actual target text identifier of the added image is different from the actual target text identifier of the sample image;
  • update the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image; and continue to execute the step of inputting the historical sample image and the newly added image into the target detection model until a second stop condition is reached.
  • the process of determining the historical sample image includes:
  • determine, from the training-used images corresponding to the target detection model, the training-used images belonging to each historical target category;
  • extract the historical sample images corresponding to each historical target category from the training-used images belonging to that category.
  • updating the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image includes:
  • weighting a historical image loss value and a newly added image loss value, wherein the weight corresponding to the historical image loss value is higher than the weight corresponding to the newly added image loss value;
  • the target detection model is updated according to the detection loss value of the target detection model.
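The weighting described above can be sketched as follows. The 0.7/0.3 split is an illustrative assumption; the text only requires that the historical-image loss carry the higher weight, which biases updates toward preserving previously learned categories:

```python
def detection_loss(historical_loss, new_image_loss, w_hist=0.7, w_new=0.3):
    """Combine the historical-image loss and the newly-added-image loss.

    Per the claim, the weight on the historical loss exceeds the weight on
    the newly added image loss (the 0.7/0.3 values are only an assumption).
    """
    assert w_hist > w_new, "historical weight must dominate"
    return w_hist * historical_loss + w_new * new_image_loss
```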
  • the inputting the sample image into a target detection model, and obtaining the image features of the sample image output by the target detection model and the predicted target position of the sample image include:
  • update the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image.
  • the embodiment of the present application also provides a target detection method, the method comprising:
  • wherein the target detection model is trained by using any implementation of the target detection model training method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a target detection model training device, the device comprising:
  • a first acquiring unit configured to acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
  • a first extraction unit configured to perform text feature extraction on the actual target text identifier of the sample image to obtain the target text features of the sample image;
  • a first prediction unit configured to input the sample image into a target detection model, and obtain the image features of the sample image output by the target detection model and the predicted target position of the sample image;
  • a first update unit configured to update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and to return to the first prediction unit to execute the inputting of the sample image into the target detection model until a first stop condition is reached.
  • the embodiment of the present application also provides a target detection device, the device comprising:
  • a second acquiring unit configured to acquire an image to be detected
  • a target detection unit configured to input the image to be detected into a pre-trained target detection model, and obtain the target detection result of the image to be detected output by the target detection model; wherein the target detection model is trained by using any implementation of the target detection model training method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store a computer program;
  • the processor is configured to execute any implementation of the target detection model training method provided in the embodiments of the present application according to the computer program, or execute any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium, which is used to store a computer program, and the computer program is used to execute any implementation of the target detection model training method provided in the embodiments of the present application, or to execute any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer program product.
  • the terminal device executes any implementation of the target detection model training method provided in the embodiments of the present application, or executes any implementation of the target detection method provided in the embodiments of this application.
  • the embodiment of the present application has at least the following advantages:
  • text feature extraction is first performed on the actual target text identifier of the sample image to obtain the target text feature of the sample image; then the sample image, the target text feature of the sample image, and the actual target position of the sample image are used to train the target detection model, so that the target detection model can perform target detection learning under the constraints of the target text features of the sample image and the actual target position of the sample image. The trained target detection model thereby has better target detection performance, and can be used to perform more accurate target detection on the image to be detected, obtaining and outputting a more accurate target detection result for the image to be detected, which is conducive to improving the accuracy of target detection.
  • FIG. 1 is a flow chart of a method for training a target detection model provided in an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a target detection model provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a target detection method provided in an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a target detection model training device provided in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
  • the following first introduces the training process of the target detection model (that is, the target detection model training method), and then introduces the application process of the target detection model (that is, the target detection method).
  • this figure is a flow chart of a method for training a target detection model provided by an embodiment of the present application.
  • the target detection model training method provided in the embodiment of the present application includes S101-S105:
  • S101 Acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image.
  • the sample image refers to the image used for training the target detection model.
  • the embodiment of the present application does not limit the number of sample images, for example, the number of sample images may be N (that is, use N sample images to train the target detection model).
  • the actual target text identifier of the sample image is used to uniquely represent the target object in the sample image.
  • the embodiment of the present application does not limit the actual target text identifier of the sample image; for example, it may be an object category (or an object name, etc.). For instance, if the sample image includes a cat, the actual target text identifier of the sample image may be "cat".
  • the actual target position of the sample image is used to represent the area actually occupied by the target object in the sample image in the sample image.
  • the present application does not limit the representation of the actual target position of the sample image, and any existing or future representation that can represent the area occupied by an object in the image can be used for implementation.
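The position representation is left open above. As an illustrative sketch (an assumption, not mandated by the text), an axis-aligned bounding box is a common choice, and intersection-over-union (IoU) is a common way to compare a predicted position against an actual one:

```python
def make_box(x_min, y_min, x_max, y_max):
    # Axis-aligned bounding box; a common (assumed) position representation.
    assert x_min < x_max and y_min < y_max
    return {"x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max}

def box_iou(a, b):
    # Intersection-over-union: 1.0 for identical boxes, 0.0 for disjoint ones.
    ix = max(0.0, min(a["x_max"], b["x_max"]) - max(a["x_min"], b["x_min"]))
    iy = max(0.0, min(a["y_max"], b["y_max"]) - max(a["y_min"], b["y_min"]))
    inter = ix * iy
    def area(t):
        return (t["x_max"] - t["x_min"]) * (t["y_max"] - t["y_min"])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```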
  • S102 Perform text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image.
  • the target text feature of the sample image is used to describe the text information (such as semantic information) carried by the actual target text identifier of the sample image, so that the target text feature of the sample image can represent the features that the target object in the sample image actually presents in the sample image.
  • the embodiment of the present application does not limit the method of extracting the target text features of the sample image (that is, the implementation of S102), and any existing or future method that can perform feature extraction on a text can be used for implementation.
  • the following description will be given in combination with examples.
  • S102 may specifically include: inputting the actual target text identifier of the sample image into a pre-trained language model, and obtaining the target text feature of the sample image output by the language model.
  • the language model is used for text feature extraction; and the embodiment of the present application does not limit the language model, and any existing or future language model can be used for implementation.
  • the language model can be trained in advance according to the sample text and the actual text features of the sample text.
  • the sample text refers to the text required for training the language model; and the actual text features of the sample text are used to describe the text information actually carried by the sample text (such as semantic information, etc.).
  • the embodiment of the present application does not limit the training process of the language model, and any existing or future method that can train the language model according to the sample text and the actual text features of the sample text can be used for implementation.
  • the pre-trained language model can be used to perform text feature extraction on the actual target text identifier of the i-th sample image, obtaining and outputting the target text feature of the i-th sample image, so that this target text feature can accurately represent the text information carried by the actual target text identifier of the i-th sample image and can then be used to constrain the training update process of the target detection model.
  • because the pre-trained language model can accurately extract the text information (especially semantic information) carried by a text, and the number of texts the language model can describe is unlimited, the text features the language model outputs for different texts are highly separable from one another. This effectively ensures that the text features of any two texts (for example, any two of the target text features of the N sample images) do not overlap, which can effectively improve the detection accuracy of the target detection model.
  • the language model can learn the semantic correlation between different texts during the training process (for example, the semantic correlation between "cat" and "tiger" is higher than that between "cat" and "car"), so that the trained language model can better extract text features, which can effectively improve the detection accuracy of the target detection model.
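This separability claim can be illustrated with cosine similarity over toy text features. The three vectors below are hypothetical stand-ins for language-model outputs, chosen only to mimic the "cat"/"tiger"/"car" example:

```python
import math

def cosine_similarity(u, v):
    # Standard cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings standing in for language-model text features.
features = {
    "cat":   [0.9, 0.8, 0.1],
    "tiger": [0.8, 0.9, 0.2],
    "car":   [0.1, 0.1, 0.95],
}
# Semantically related texts end up closer than unrelated ones.
cat_tiger = cosine_similarity(features["cat"], features["tiger"])
cat_car = cosine_similarity(features["cat"], features["car"])
```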
  • S103 Input the sample image into the target detection model, and obtain the image features of the sample image and the predicted target position of the sample image output by the target detection model.
  • the image feature of the sample image is used to represent the feature that the target object in the sample image is predicted to appear in the sample image.
  • the predicted target position of the sample image is used to represent the predicted area occupied by the target object in the sample image in the sample image.
  • the target detection model is used for target detection (for example, to detect the category of the target object and the image position of the target object).
  • the embodiment of the present application does not limit the target detection model. For example, as shown in FIG. 2, the target detection model 200 may include an image feature extraction layer 201, a target category prediction layer 202, and a target position prediction layer 203.
  • the input data of the target category prediction layer 202 includes the output data of the image feature extraction layer 201;
  • the input data of the target position prediction layer 203 includes the output data of the image feature extraction layer 201.
  • the working process of the target detection model 200 may include step 11-step 13:
  • Step 11 Input the sample image into the image feature extraction layer 201, and obtain the image features of the sample image output by the image feature extraction layer 201.
  • the image feature extraction layer 201 is used for performing image feature extraction on the input data of the image feature extraction layer 201 .
  • the embodiment of the present application does not limit the implementation manner of the image feature extraction layer 201, and any existing or future solution capable of image feature extraction can be used for implementation.
  • Step 12 Input the image features of the sample image into the target category prediction layer 202 to obtain the predicted target text identifier of the sample image output by the target category prediction layer 202 .
  • the object type prediction layer 202 is used for performing object type prediction on the input data of the object type prediction layer 202 .
  • the embodiment of the present application does not limit the implementation manner of the object category prediction layer 202, and any existing or future solution capable of performing object category prediction can be used for implementation.
  • the predicted target text identifier of the sample image is used to represent the predicted identifier (eg, predicted category) of the target object in the sample image.
  • Step 13 Input the image features of the sample image into the target position prediction layer 203 to obtain the predicted target position of the sample image output by the target position prediction layer 203 .
  • the target position prediction layer 203 is used for performing object position prediction on the input data of the target position prediction layer 203 .
  • the embodiment of the present application does not limit the implementation of the object position prediction layer 203, and any existing or future solution capable of predicting object positions can be used for implementation.
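Steps 11 to 13 can be sketched as a shared image feature extraction layer (201) whose output feeds both a category prediction head (202) and a position prediction head (203). The random linear maps and the dimensions below are illustrative assumptions; the real layers would be learned deep networks:

```python
import random

class TinyDetector:
    """Sketch of the FIG. 2 structure: shared feature layer, two heads."""

    def __init__(self, in_dim=8, feat_dim=4, n_categories=3, seed=0):
        rng = random.Random(seed)
        def mk(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
        self.w_feat = mk(feat_dim, in_dim)        # image feature extraction layer 201
        self.w_cat = mk(n_categories, feat_dim)   # target category prediction layer 202
        self.w_box = mk(4, feat_dim)              # target position prediction layer 203

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def forward(self, image_vec):
        feat = self._matvec(self.w_feat, image_vec)    # step 11: image features
        scores = self._matvec(self.w_cat, feat)        # step 12: category scores
        box = self._matvec(self.w_box, feat)           # step 13: predicted position
        return feat, scores, box
```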
  • based on the relevant content of steps 11 to 13 above, it can be seen that for the target detection model 200 shown in FIG. 2, after the sample image is input, the image feature extraction layer 201, the target category prediction layer 202, and the target position prediction layer 203 respectively generate and output the image features of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image, so that the target detection performance of the target detection model 200 can subsequently be determined based on this prediction information.
  • the data dimension of the image feature of the sample image output by the image feature extraction layer 201 may be inconsistent with the data dimension of the target text feature of the sample image. Therefore, to ensure that the similarity between the image features of the sample image and the target text features of the sample image can be successfully calculated, a data dimension transformation layer can be added to the target detection model 200 shown in FIG. 2.
  • the input data of the data dimension transformation layer includes the output data of the image feature extraction layer 201, so that the data dimension transformation layer can perform data dimension transformation on the output data of the image feature extraction layer 201 (such as the image features of the sample image). The output data of the data dimension transformation layer is then consistent with the data dimension of the target text feature of the sample image, which is beneficial to improving the accuracy of the similarity calculation between the image feature of the sample image and the target text feature of the sample image.
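A sketch of such a data dimension transformation layer. The dimensions (256-d image features projected into a 64-d text-feature space) are hypothetical; the point is only that a single linear map makes the two feature dimensions comparable:

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    # Random linear map standing in for a learned dimension transformation layer.
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    def project(x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return project

# Hypothetical dimensions: 256-d image feature -> 64-d text-feature space.
project = make_projection(256, 64)
image_feature = [0.5] * 256
text_aligned = project(image_feature)
```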
  • the i-th sample image can be input into the target detection model, so that the target detection model performs target detection processing on the i-th sample image, obtaining and outputting the image features of the i-th sample image and the predicted target position of the i-th sample image, so that the target detection performance of the target detection model can subsequently be determined based on the image features of each sample image and its predicted target position.
  • S104 Determine whether the first stop condition is met, if yes, perform a preset action; if not, perform S105.
  • the first stop condition may be preset, and the embodiment of the present application does not limit it. For example, the first stop condition may be that the predicted loss value of the target detection model is lower than a first preset loss threshold, that the rate of change of the predicted loss value of the target detection model is lower than a first rate-of-change threshold, or that the number of updates of the target detection model reaches a first threshold.
  • the predicted loss value of the target detection model is used to represent the target detection performance of the target detection model for the above N sample images; the embodiment of the present application does not limit its calculation method, and any existing or future model prediction loss value calculation method can be used for implementation.
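The three example stop conditions can be folded into one check. All threshold values below are illustrative assumptions, not values from the text:

```python
def first_stop_condition(loss, prev_loss, n_updates,
                         loss_threshold=0.01,
                         rate_threshold=1e-4,
                         max_updates=10000):
    """True if any of the three example conditions from the text holds:
    low loss, a stalled (slowly changing) loss, or an update budget hit."""
    if loss < loss_threshold:
        return True
    if prev_loss is not None:
        rate = abs(prev_loss - loss) / max(abs(prev_loss), 1e-12)
        if rate < rate_threshold:
            return True
    return n_updates >= max_updates
```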
  • Preset actions can be preset.
  • the preset action may be to end the training process of the target detection model (that is, to end the target detection learning process of the target detection model for N sample images).
  • the preset actions may include the following S106-S109.
  • for the target detection model of the current round, it can be judged whether the target detection model of the current round meets the first stop condition;
  • if the first stop condition is met, the target detection model of the current round has better target detection performance for the N sample images, so the target detection model of the current round can be saved, so that subsequent work can be performed using the saved target detection model (for example, performing target detection work, or adding a new object detection function to the target detection model);
  • if the first stop condition is not met, the target detection performance of the current round of the target detection model for the above N sample images is still relatively poor, so the target detection model can be updated according to the label information corresponding to the N sample images and the prediction information output by the current round of the target detection model for the N sample images.
  • S105 Update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and return to execute S103.
  • the similarity between the image feature of the sample image and the target text feature of the sample image is used to represent how close the image feature of the sample image is to the target text feature of the sample image.
  • the embodiment of the present application does not limit the calculation method of the similarity between the image feature of the sample image and the target text feature of the sample image, for example, the Euclidean distance may be used for calculation.
  • the training objectives of the target detection model may include that the predicted target position of the sample image is as close as possible to the actual target position of the sample image, and the image features of the sample image are as close as possible to the target text features of the sample image (also That is, the similarity between the image feature of the sample image and the target text feature of the sample image is as large as possible).
  • the target detection model of the current round can first be updated according to the gap between the predicted target position of the i-th sample image and the actual target position of the i-th sample image, and the similarity between the image features of the i-th sample image and the target text features of the i-th sample image, so that the updated target detection model has better target detection performance, after which S103 and its subsequent steps can continue to be performed.
  • wherein i is a positive integer, i ≤ N, and N is a positive integer.
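As a heavily simplified illustration of the S103-S105 loop, the sketch below treats the "model" as a single scalar weight and optimizes only the position term (the full method also maximizes the image-feature/text-feature similarity); it shows the predict / check-stop / update / return-to-S103 shape of the procedure:

```python
def train(samples, lr=0.1, loss_threshold=1e-6, max_rounds=1000):
    # samples: list of (image_value, actual_target_position) pairs.
    w = 0.0  # stand-in for all model parameters
    for _ in range(max_rounds):
        # S103: predict a target position for every sample image.
        preds = [w * x for x, _ in samples]
        loss = sum((p - t) ** 2 for p, (_, t) in zip(preds, samples))
        # S104: first stop condition (here: loss below a threshold).
        if loss < loss_threshold:
            break
        # S105: update the model from the position gap, then return to S103.
        grad = sum(2 * (w * x - t) * x for x, t in samples)
        w -= lr * grad / len(samples)
    return w
```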
  • text feature extraction can be performed on the actual target text identifier of the sample image to obtain the target text feature of the sample image; then the sample image, the target text feature of the sample image, and the actual target position of the sample image are used to train the target detection model to obtain a trained target detection model.
  • because the target text feature of the sample image can more accurately represent the actual target text identifier of the sample image, the target detection model trained under the constraints of the target text feature of the sample image has a better target detection function, which is beneficial to improving target detection performance.
  • the trained target detection model has better target detection performance for the target objects it has learned, so in order to further improve the prediction performance of the target detection model, the trained target detection model can further learn target objects beyond those it has already learned (that is, category incremental learning can be performed for the target detection model).
  • the embodiment of the present application also provides a possible implementation of the target detection model training method.
  • the target detection model training method includes S106-S109 in addition to the above S101-S105:
  • the newly-added image refers to the image required for category incremental learning for the trained target detection model.
  • the embodiment of the present application does not limit the number of added images, for example, the number of added images is M; wherein, M is a positive integer.
  • S106-S109 can be used so that the target detection model further learns how to perform target detection on the M newly added images on the premise of retaining the target objects it has already learned.
  • the actual target text identifier of the added image, the actual target position of the added image, and the target text feature of the added image are analogous to the actual target text identifier of the sample image and the actual target position of the sample image in S101 above, and to the target text feature of the sample image in S102 above; it is only necessary to replace "sample image" with "newly added image" in the relevant content.
  • the trained target detection model may be, for example, a target detection model trained by using the training process shown in S101-S105 above, or a target detection model obtained by carrying out category incremental learning at least once using the training process shown in S106-S109 after the training process shown in S101-S105 is completed.
  • after the newly added image, the actual target text identifier of the newly added image, and the actual target position of the newly added image are acquired, it can be determined that a category incremental learning is needed for the trained target detection model, so text feature extraction can be performed on the actual target text identifier of the newly added image to obtain the target text feature of the newly added image. The target text feature of the newly added image can then be used to constrain the incremental learning process of the target detection model, so that the retrained target detection model can further learn how to perform object detection on the newly added images on the premise of retaining the learned target objects.
  • S107 Input the historical sample image and the newly added image into the target detection model, and obtain the image features of the historical sample image output by the target detection model, the predicted target position of the historical sample image, the image features of the newly added image and The predicted target location for this added image.
  • the historical sample images may include all or part of the images used in the historical training process of the target detection model.
  • the historical training process of the target detection model refers to the category learning process that the target detection model has gone through before the current category incremental learning process. For example, if the trained target detection model has only gone through the category learning process shown in S101-S105 above, the historical training process of the target detection model refers to the training process shown in S101-S105 above. As another example, if the trained target detection model has gone through the category learning process shown in S101-S105 above and the category incremental learning process shown in S106-S109 Q times, then the historical training process of the target detection model may include the training process shown in S101-S105 above and the first through the Q-th training processes shown in S106-S109.
  • the determination process of the historical sample image may include Step 21-Step 24:
  • Step 21 According to the sample image, determine the image used for training corresponding to the target detection model.
  • the training used images corresponding to the target detection model refer to images that have been used in the historical training process of the target detection model.
  • two examples are used for description below.
  • Example 1: if the historical training process of the target detection model includes the training process shown in S101-S105 above, the training-used images corresponding to the target detection model may include the above N sample images.
  • Example 2: if the historical training process of the target detection model includes the training process shown in S101-S105 above and the training processes shown in the first round of S106-S109 through the Qth round of S106-S109, where the qth round of the training process shown in S106-S109 uses G q newly added images for category incremental learning (q is a positive integer, and q ≤ Q), then the training used images corresponding to the target detection model may include the above N sample images, the G 1 newly added images, the G 2 newly added images, ..., and the G Q newly added images.
  • when the trained target detection model needs category incremental learning, the training used images corresponding to the target detection model can first be determined based on the images involved in the historical training process of the target detection model, so that the training used images can accurately represent the images that have been used in the historical learning process of the target detection model.
  • Step 22 Determine at least one historical target category according to the actual target text identifiers of the images used for training.
  • the historical target category refers to the object category that the target detection model has learned during the historical training process of the target detection model.
  • two examples are used for description below.
  • Example 1: if the historical training process of the target detection model includes the training process shown in S101-S105 above, and the N sample images in that training process correspond to R 0 object categories, then the R 0 object categories are all determined as historical object categories.
  • Example 2: if the historical training process of the target detection model includes the training process shown in S101-S105 above and the training processes shown in the first round of S106-S109 through the Qth round of S106-S109, where the N sample images in the training process shown in S101-S105 correspond to R 0 object categories and the G q newly added images in the qth round of the training process shown in S106-S109 correspond to R q object categories (q is a positive integer, and q ≤ Q), then the R 0 object categories, R 1 object categories, R 2 object categories, ..., and R Q object categories can all be determined as historical object categories.
  • there are no repeated object categories among the R 0 object categories, R 1 object categories, R 2 object categories, ..., and R Q object categories; that is, any two of these object categories are different.
  • the actual target text identifiers of the training used images can be used to determine the historical object categories corresponding to the target detection model, so that the historical object categories can accurately represent the object categories that have been learned during the historical learning process of the target detection model.
  • Step 23 According to the actual target text identification of the training used images, determine the training used images belonging to each historical target category from the training used images corresponding to the target detection model.
  • step 23 may specifically include: determining the Y 1 images belonging to the first historical target category among the training used images corresponding to the target detection model as the training used images belonging to the first historical target category; determining the Y 2 images belonging to the second historical target category among the training used images corresponding to the target detection model as the training used images belonging to the second historical target category; ...; and, by analogy, determining the Y M images belonging to the Mth historical target category among the training used images corresponding to the target detection model as the training used images belonging to the Mth historical target category.
  • Step 24 Extract historical sample images corresponding to each historical object category from training images that belong to each historical object category.
  • the extraction may be performed with reference to a preset extraction ratio (or number of extractions, etc.).
  • step 24 may specifically include: randomly extracting, at an extraction ratio of 10%, from the training used images belonging to the first historical target category to obtain the historical sample images corresponding to the first historical target category, so that the actual target text identifiers of the historical sample images corresponding to the first historical target category are all the first historical target category; randomly extracting, at an extraction ratio of 10%, from the training used images belonging to the second historical target category to obtain the historical sample images corresponding to the second historical target category, so that the actual target text identifiers of the historical sample images corresponding to the second historical target category are all the second historical target category; ...; and randomly extracting, at an extraction ratio of 10%, from the training used images belonging to the Mth historical target category to obtain the historical sample images corresponding to the Mth historical target category, so that the actual target text identifiers of the historical sample images corresponding to the Mth historical target category are all the Mth historical target category.
  • in this way, some historical sample images can be extracted from the images involved in the historical training process of the target detection model, so that these historical sample images can represent the object categories that have been learned during the historical learning process of the target detection model.
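Step 21-Step 24 above can be sketched as follows. The data layout (a list of image/category pairs) and the default 10% extraction ratio are illustrative assumptions: the ratio follows the example in step 24, but the embodiment allows any preset ratio or count.

```python
import random
from collections import defaultdict

def select_historical_samples(used_images, extraction_ratio=0.1, seed=0):
    """Group the training used images by their actual target text identifier
    (historical target category), then randomly extract a fixed ratio from
    each category as historical sample images (steps 21-24)."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for image, category in used_images:           # steps 22-23: group by category
        by_category[category].append(image)
    historical_samples = []
    for category, images in by_category.items():  # step 24: per-category extraction
        k = max(1, round(len(images) * extraction_ratio))
        for image in rng.sample(images, k):
            historical_samples.append((image, category))
    return historical_samples
```

Extracting per category (rather than from the pooled set) keeps every historical object category represented, which is the point of steps 23-24.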
  • for the image features of the historical sample images and the predicted target positions of the historical sample images, please refer to the related content of "image features of the sample image" and "predicted target position of the sample image" in S103 above; it is only necessary to replace "sample image" with "historical sample image" in that related content.
  • for the image features of the newly added image and the predicted target position of the newly added image, please refer to the related content of "image features of the sample image" and "predicted target position of the sample image" in S103 above; it is only necessary to replace "sample image" with "newly added image" in that related content.
  • the historical sample images and the newly added images can be respectively input into the target detection model, so that the target detection model can perform target detection on the historical sample images and the newly added images, and obtain and output the image features and predicted target positions of the historical sample images and the image features and predicted target positions of the newly added images, so that the target detection model can subsequently be updated based on this predicted information.
  • the second stop condition may be preset, and this embodiment of the present application does not limit the second stop condition; for example, the second stop condition may be that the detection loss value of the target detection model is lower than a second preset loss threshold, that the rate of change of the detection loss value of the target detection model is lower than a second rate-of-change threshold, or that the number of updates of the target detection model reaches a second threshold.
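The three alternative forms of the second stop condition listed above can be sketched as a single check. All threshold values here are illustrative assumptions, since the embodiment explicitly does not limit them:

```python
def reached_second_stop_condition(loss_history, update_count,
                                  loss_threshold=0.05,
                                  change_rate_threshold=0.01,
                                  max_updates=10000):
    """Return True if any of the three example forms of the second stop
    condition holds: loss below a threshold, loss change rate below a
    threshold, or the update count reaching a cap."""
    if not loss_history:
        return False
    current = loss_history[-1]
    if current < loss_threshold:                      # detection loss low enough
        return True
    if len(loss_history) >= 2 and loss_history[-2] > 0:
        change_rate = abs(loss_history[-2] - current) / loss_history[-2]
        if change_rate < change_rate_threshold:       # loss has plateaued
            return True
    return update_count >= max_updates                # update budget exhausted
```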
  • the detection loss value of the target detection model is used to represent the target detection performance of the target detection model for the historical sample images and the newly added images; this embodiment of the present application does not limit the calculation method of the detection loss value, which can be implemented by using any existing or future model detection loss value calculation method.
  • in one possible implementation, the embodiment of the present application also provides a calculation method for the detection loss value of the target detection model, which may specifically include step 31-step 33:
  • Step 31 Determine the historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image feature of the historical sample image and the target text feature of the historical sample image.
  • the historical image loss value refers to the loss value generated when the target detection model performs target detection on the historical sample images, so that the historical image loss value is used to represent the target detection performance of the target detection model on the historical sample images.
  • the embodiment of the present application does not limit the calculation method of the historical image loss value, and any existing or future prediction loss value calculation method may be used for implementation.
  • Step 32 Determine the newly added image loss value according to the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image.
  • the newly added image loss value refers to the loss value generated when the target detection model performs target detection for the newly added image, so that the newly added image loss value is used to represent the target detection performance of the target detection model for the newly added image.
  • the embodiment of the present application does not limit the calculation method of the newly added image loss value, and any existing or future prediction loss value calculation method may be used for implementation.
  • Step 33 Perform a weighted summation of the historical image loss value and the newly added image loss value to obtain the detection loss value of the target detection model; wherein, the weighting weight corresponding to the historical image loss value is higher than the weighting weight corresponding to the newly added image loss value.
  • the weighted weight corresponding to the historical image loss value refers to the weight value to be multiplied by the historical image loss value in the "weighted summation" in step 33 .
  • the weighting weights corresponding to the historical image loss values may be preset.
  • the weighted weight corresponding to the newly added image loss value refers to the weight value to be multiplied by the newly added image loss value in the "weighted sum" in step 33 .
  • the weighting weights corresponding to the newly added image loss values may be preset.
  • in this way, the target detection model trained with a higher weighting weight for the historical image loss value can not only achieve accurate target detection for the newly added images corresponding to the target detection model, but also still achieve accurate target detection for the training used images corresponding to the target detection model, which is conducive to improving the accuracy of category incremental learning.
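Step 33 and the weighting constraint above can be sketched as follows. The concrete 0.7/0.3 split is an illustrative assumption: the text only requires that the historical weight exceed the newly added image weight, and the actual weights may be preset to any values satisfying that constraint.

```python
def detection_loss(historical_image_loss, new_image_loss,
                   historical_weight=0.7, new_weight=0.3):
    """Weighted summation of step 33: combine the historical image loss value
    and the newly added image loss value, with the historical weight set
    higher so the model retains its previously learned detection ability."""
    assert historical_weight > new_weight, \
        "the historical weight must exceed the newly added image weight"
    return historical_weight * historical_image_loss + new_weight * new_image_loss
```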
  • the preset steps can be preset.
  • the preset step may be to end the current category incremental learning process of the target detection model.
  • the preset steps may include the above S106-S109 .
  • for the target detection model of the current round, it can be judged whether the target detection model of the current round meets the second stop condition; if the second stop condition is reached, it means that the target detection model of the current round has good target detection performance for both the historical sample images and the newly added images, so the target detection model of the current round can be saved so that the saved target detection model can be used later to perform follow-up work (such as performing target detection or adding a new object detection function to the target detection model); if the second stop condition is not reached, it means that the target detection performance of the target detection model of the current round for the above historical sample images and newly added images is still relatively poor, so the target detection model can be updated based on the label information corresponding to the historical sample images, the label information corresponding to the newly added images, and the prediction information output by the target detection model of the current round for the historical sample images and the newly added images.
  • S109 According to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image, update the target detection model, and return to execute S107.
  • the training targets of the target detection model may include: the predicted target position of the historical sample image being as close as possible to the actual target position of the historical sample image; the image feature of the historical sample image being as close as possible to the target text feature of the historical sample image (that is, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image being as large as possible); the predicted target position of the newly added image being as close as possible to the actual target position of the newly added image; and the image feature of the newly added image being as close as possible to the target text feature of the newly added image (that is, the similarity between the image feature of the newly added image and the target text feature of the newly added image being as large as possible).
  • S109 may specifically include S1091-S1094:
  • S1091 Determine the historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image feature of the historical sample image and the target text feature of the historical sample image.
  • S1092 Determine the added image loss value according to the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image.
  • S1093 Perform a weighted summation of the historical image loss value and the newly added image loss value to obtain the detection loss value of the target detection model; wherein, the weighting weight corresponding to the historical image loss value is higher than the weighting weight corresponding to the newly added image loss value.
  • S1094 Update the target detection model according to the detection loss value of the target detection model.
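One hedged way to realize the per-image loss values of S1091/S1092 is an L1 position error plus one minus the cosine similarity between the image feature and the target text feature. The text does not fix the concrete loss terms (any existing or future prediction loss calculation may be used), so the function name and both terms below are assumptions for illustration:

```python
import math

def image_loss(predicted_box, actual_box, image_feature, text_feature):
    """Combine a position error (predicted vs. actual target box, L1) with a
    similarity term that shrinks as the image feature approaches the target
    text feature, mirroring the training targets described above."""
    position_loss = sum(abs(p - a) for p, a in zip(predicted_box, actual_box))
    dot = sum(x * y for x, y in zip(image_feature, text_feature))
    norm = math.sqrt(sum(x * x for x in image_feature)) * \
           math.sqrt(sum(y * y for y in text_feature))
    cosine = dot / norm if norm else 0.0
    similarity_loss = 1.0 - cosine  # smaller when features are more similar
    return position_loss + similarity_loss
```

The resulting historical and newly added image loss values would then be combined by the weighted summation of S1093.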
  • it can be seen that, with the target detection model training method provided in the embodiment of the present application, for a trained target detection model, if it is necessary to add a new object detection function to the target detection model, the newly added images and their label information can be used to carry out category incremental learning for the target detection model, so that the learned target detection model can add the target detection function for the newly added images while maintaining the original target detection function, which is conducive to continuously improving the target detection performance of the target detection model.
  • the embodiment of the present application also provides a possible implementation of the target detection model training method, which specifically includes steps 41-45:
  • Step 41 Obtain a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image.
  • Step 42 Perform text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image.
  • for the relevant content of step 41-step 42, please refer to S101-S102 above, respectively.
  • Step 43 Input the sample image into the target detection model, and obtain the image features of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image output by the target detection model.
  • the predicted target text identifier of the sample image is used to represent the predicted identifier (eg, predicted category) of the target object in the sample image.
  • step 43 can be implemented using any of the implementations of S103 above; it is only necessary to replace the output data of the target detection model in S103 above from "the image features of the sample image and the predicted target position of the sample image" with "the image features of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image".
  • Step 44 Judging whether the first stop condition is met, if yes, execute a preset action; if not, execute step 45.
  • for the relevant content of step 44, please refer to the relevant content of S104 above.
  • it should be noted that the "prediction loss value of the target detection model" in step 44 is calculated based on the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image.
  • Step 45 According to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image, update the target detection model, and return to step 43.
  • step 45 can be implemented using any of the implementations of S105 above; it is only necessary to replace "the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image" in any implementation of S105 above with "the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image".
  • that is, the update process of the target detection model in step 45 is based on the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image.
  • in this way, text feature extraction can first be performed on the actual target text identifier of the sample image to obtain the target text feature of the sample image; then the sample image, the target text feature of the sample image, the actual target text identifier of the sample image, and the actual target position of the sample image are used to train the target detection model to obtain the trained target detection model.
  • because the target detection model is trained under the constraints of the target text features, the actual target text identifier, and the actual target position of the sample image, the trained target detection model has a better target detection function, which is beneficial to improving the target detection performance.
  • the embodiment of the present application also provides a possible implementation of the target detection model training method.
  • on the basis that the target detection model training method includes the above steps 41-45, steps 46-49 are also included:
  • Step 46 After acquiring the newly added image, the actual target text identifier of the newly added image, and the actual target position of the newly added image, perform text feature extraction on the actual target text identifier of the newly added image to obtain the target text feature of the newly added image.
  • for the relevant content of step 46, please refer to S106 above.
  • Step 47 Input the historical sample image and the newly added image into the target detection model, and obtain the image features of the historical sample image, the predicted target text identifier of the historical sample image, the predicted target position of the historical sample image, the image features of the newly added image, the predicted target text identifier of the newly added image, and the predicted target position of the newly added image output by the target detection model.
  • the predicted target text identifier of the historical sample image is used to represent the predicted identifier (eg, predicted category) of the target object in the historical sample image.
  • the predicted target text identifier of the added image is used to represent the predicted identifier (eg, predicted category) of the target object in the added image.
  • step 47 can be implemented using any of the implementations of S107 above; it is only necessary to replace the output data of the target detection model in S107 above from "the image features of the historical sample image, the predicted target position of the historical sample image, the image features of the newly added image, and the predicted target position of the newly added image" with "the image features of the historical sample image, the predicted target text identifier of the historical sample image, the predicted target position of the historical sample image, the image features of the newly added image, the predicted target text identifier of the newly added image, and the predicted target position of the newly added image".
  • Step 48 Judging whether the second stop condition is met, if yes, execute the preset step; if not, execute step 49.
  • it should be noted that the "detection loss value of the target detection model" in step 48 is calculated based on the predicted target text identifier of the historical sample image, the actual target text identifier of the historical sample image, the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target text identifier of the newly added image, the actual target text identifier of the newly added image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image.
  • Step 49 According to the predicted target text identifier of the historical sample image, the actual target text identifier of the historical sample image, the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target text identifier of the newly added image, the actual target text identifier of the newly added image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image, update the target detection model, and return to step 47.
  • step 49 can be implemented using any of the implementations of S109 above; it is only necessary to replace "the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image" in any implementation of S109 above with the data listed in step 49.
  • it can be seen that, with the target detection model training method provided in the embodiment of the present application, for a trained target detection model, if it is necessary to add a new object detection function to the target detection model, the target detection model can be incrementally learned by using the newly added images and their three kinds of label information (that is, target text features, actual target text identifiers, and actual target positions), so that the learned target detection model can add the target detection function for the newly added images while maintaining the original target detection function, which is conducive to continuously improving the target detection performance of the target detection model.
  • after the target detection model is trained, the target detection model can be used for target detection. Based on this, an embodiment of the present application further provides a target detection method, which will be described below with reference to the accompanying drawings.
  • this figure is a flow chart of a target detection method provided by an embodiment of the present application.
  • the target detection method provided in the embodiment of this application includes S301-S302:
  • S301 Acquire an image to be detected.
  • the image to be detected refers to an image that needs to be subjected to target detection processing.
  • S302 Input the image to be detected into a pre-trained target detection model, and obtain a target detection result of the image to be detected output by the target detection model.
  • the target detection model is trained by using any implementation of the target detection model training method provided in the embodiment of the present application.
  • the object detection result of the image to be detected is obtained by the object detection model performing object detection on the image to be detected.
  • this embodiment of the present application does not limit the target detection result of the image to be detected.
  • for example, the target detection result of the image to be detected may include the predicted target text identifier (for example, the predicted target category) of the target object in the image to be detected and/or the area occupied by the target object in the image to be detected.
  • the target detection model that has been trained can be used to perform target detection on the image to be detected, and the target detection result of the image to be detected can be obtained and output, so that The target detection result of the image to be detected can accurately represent the relevant information of the target object in the image to be detected (eg, target category information and target position information, etc.).
  • the target detection result of the image to be detected determined by using the target detection model is more accurate, which is beneficial to improve the accuracy of target detection.
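S301-S302 can be sketched as a thin inference wrapper. The callable `model` interface and the result keys below are assumptions for illustration, since the embodiment does not limit the form of the target detection result:

```python
def detect(model, image_to_detect):
    """S301-S302: feed the image to be detected into the trained target
    detection model and return its target detection result, here assumed to
    contain a predicted target text identifier and a predicted target region."""
    result = model(image_to_detect)  # model is assumed callable on one image
    return {
        "predicted_target_text_identifier": result["category"],
        "predicted_target_region": result["box"],
    }
```

A caller would pass the trained model and the acquired image to be detected, then read the category and region from the returned result.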
  • the embodiment of the present application also provides a target detection model training device, which will be explained and described below with reference to the accompanying drawings.
  • this figure is a schematic structural diagram of a target detection model training device provided by an embodiment of the present application.
  • the target detection model training device 400 provided in the embodiment of the present application includes:
  • a first acquiring unit 401 configured to acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
  • the first extraction unit 402 is configured to perform text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image;
  • the first prediction unit 403 is configured to input the sample image into the target detection model, and obtain the image features of the sample image output by the target detection model and the predicted target position of the sample image;
  • the first updating unit 404 is configured to, according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, Update the target detection model, and return to the first prediction unit 403 to execute the input of the sample image into the target detection model until the first stop condition is reached.
  • the first extraction unit 402 is specifically configured to:
  • the target detection model training device 400 further includes:
  • the second extraction unit is configured to, after the first stop condition is reached and the newly added image, the actual target text identifier of the newly added image, and the actual target position of the newly added image are acquired, perform text feature extraction on the actual target text identifier of the newly added image to obtain the target text feature of the newly added image;
  • the second prediction unit is configured to input the historical sample image and the newly added image into the target detection model, and obtain the image features of the historical sample image, the predicted target position of the historical sample image, the image features of the newly added image, and the predicted target position of the newly added image output by the target detection model; wherein, the historical sample image is determined according to the sample image;
  • the second updating unit is configured to update the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image, and return to the second prediction unit to execute the inputting of the historical sample image and the newly added image into the target detection model until the second stop condition is reached.
  • the process of determining the historical sample image includes:
  • determining, according to the sample images, the training used images corresponding to the target detection model; determining at least one historical target category according to the actual target text identifiers of the training used images; determining, according to the actual target text identifiers of the training used images, the training used images belonging to each historical target category from the training used images corresponding to the target detection model;
  • the historical sample images corresponding to the respective historical object categories are respectively extracted from the training used images belonging to the various historical object categories.
  • the second updating unit includes:
  • the first determination subunit is configured to determine the historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image features of the historical sample image and the target text features of the historical sample image;
  • the second determination subunit is configured to determine the newly added image loss value according to the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image features of the newly added image and the target text features of the newly added image;
  • the third determining subunit is configured to perform weighted summation of the historical image loss value and the newly added image loss value to obtain the detection loss value of the target detection model; wherein, the weight corresponding to the historical image loss value The weight is higher than the weighted weight corresponding to the added image loss value;
  • the model update subunit is configured to update the target detection model according to the detection loss value of the target detection model.
  • the first prediction unit 403 is specifically configured to:
  • the first updating unit 404 is specifically used for:
  • updating the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image, and returning to the first prediction unit 403 to execute the inputting of the sample image into the target detection model until the first stop condition is reached.
  • text feature extraction is first performed on the actual target text identifier of the sample image to obtain the target text feature of the sample image; then the sample image, the target text feature of the sample image, and the actual target position of the sample image are used to train the target detection model, yielding a trained target detection model.
  • the target text feature of the sample image can more accurately represent the actual target text identifier of the sample image
  • the target detection model trained based on the target text feature of the sample image has a better target detection capability, which helps improve target detection performance.
  • the embodiment of the present application also provides a target detection device, which will be explained and described below with reference to the accompanying drawings.
  • this figure is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
  • the target detection device 500 provided in the embodiment of the present application includes:
  • a second acquiring unit 501 configured to acquire an image to be detected
  • the target detection unit 502 is configured to input the image to be detected into a pre-trained target detection model, and obtain the target detection result of the image to be detected output by the target detection model; wherein the target detection model is trained using any implementation of the target detection model training method provided in the embodiments of the present application.
  • after acquiring the image to be detected, the target detection device 500 can use the trained target detection model to perform target detection on it, obtaining and outputting the target detection result of the image to be detected, so that this result accurately represents the relevant information of the target object in the image to be detected (e.g., target category information and target position information). Since the trained target detection model has better target detection performance, the target detection result determined with it is more accurate, which helps improve target detection accuracy.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store a computer program;
  • the processor is configured to execute any implementation of the target detection model training method provided in the embodiments of the present application according to the computer program, or execute any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium for storing a computer program, the computer program being used to execute any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer program product which, when running on a terminal device, enables the terminal device to execute any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three kinds of relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates that the contextual objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • "At least one item (piece) of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are a target detection model training method, a target detection method, and related devices. First, text feature extraction is performed on an actual target text identifier of a sample image to obtain a target text feature of the sample image. Then, a target detection model is trained using the sample image, the target text feature of the sample image, and an actual target position of the sample image, so that the target detection model performs target detection learning under the constraints of the target text feature and the actual target position of the sample image. The trained target detection model therefore has better target detection performance, so that it can subsequently perform more accurate target detection on an image under test, obtaining and outputting a more accurate target detection result, thereby facilitating an improvement in target detection accuracy.

Description

A target detection model training method, target detection method and related equipment
This application claims priority to the Chinese patent application No. 202110723057.4, filed with the State Intellectual Property Office of China on June 28, 2021 and entitled "A target detection model training method, target detection method and related equipment", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to a target detection model training method, a target detection method, and related equipment.
Background
Target detection (also called target extraction) is an image segmentation technique based on target geometric statistics and features, and it has a wide range of applications (for example, in robotics and autonomous driving).
However, because existing target detection technology still has some defects, how to improve target detection accuracy remains an urgent technical problem.
Summary
To solve the above technical problems in the prior art, the present application provides a target detection model training method, a target detection method, and related equipment, which can effectively improve target detection accuracy.
To achieve the above objective, the technical solutions provided in the embodiments of the present application are as follows:
An embodiment of the present application provides a target detection model training method, the method comprising:
acquiring a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
performing text feature extraction on the actual target text identifier of the sample image to obtain a target text feature of the sample image;
inputting the sample image into a target detection model to obtain an image feature of the sample image and a predicted target position of the sample image output by the target detection model;
updating the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and continuing to execute the step of inputting the sample image into the target detection model until a first stop condition is reached.
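For illustration only (such a sketch is not part of the claimed subject matter), the update step above could look as follows in Python. The L1 position loss and the cosine-similarity alignment term are assumptions; the application only states that the predicted/actual positions and the image-text feature similarity jointly constrain the update.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def training_loss(predicted_position, actual_position, image_feature, target_text_feature):
    # Position term: L1 distance between predicted and actual box coordinates (assumed form).
    position_loss = sum(abs(p - a) for p, a in zip(predicted_position, actual_position))
    # Alignment term: shrinks as the image feature approaches the target text feature.
    alignment_loss = 1.0 - cosine_similarity(image_feature, target_text_feature)
    return position_loss + alignment_loss

# A perfect position prediction with perfectly aligned features gives zero loss.
print(training_loss([10, 10, 50, 50], [10, 10, 50, 50], [3.0, 4.0], [3.0, 4.0]))  # 0.0
```

Minimizing such a loss pushes the image features of the sample toward the target text features extracted in the previous step, which is how the text features constrain target detection learning.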
In a possible implementation, performing text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image includes:
inputting the actual target text identifier of the sample image into a pre-trained language model to obtain the target text feature of the sample image output by the language model; wherein the language model is trained on sample text and the actual text features of the sample text.
In a possible implementation, after the first stop condition is reached, the method further includes:
after acquiring a newly added image, an actual target text identifier of the newly added image, and an actual target position of the newly added image, performing text feature extraction on the actual target text identifier of the newly added image to obtain a target text feature of the newly added image; the actual target text identifier of the newly added image differs from the actual target text identifier of the sample image;
inputting a historical sample image and the newly added image into the target detection model to obtain the image feature of the historical sample image, the predicted target position of the historical sample image, the image feature of the newly added image, and the predicted target position of the newly added image output by the target detection model; wherein the historical sample image is determined according to the sample image;
updating the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image, and continuing to execute the step of inputting the historical sample image and the newly added image into the target detection model until a second stop condition is reached.
In a possible implementation, the process of determining the historical sample image includes:
determining, according to the sample image, the training-used images corresponding to the target detection model;
determining at least one historical target category according to the actual target text identifiers of the training-used images;
determining, according to the actual target text identifiers of the training-used images, the training-used images belonging to each historical target category from the training-used images corresponding to the target detection model;
extracting, from the training-used images belonging to each historical target category, the historical sample images corresponding to that category.
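As an illustrative sketch (not part of the claims), the per-category exemplar extraction above might look like this; the field name `target_text`, the per-category count, and random sampling are all assumptions:

```python
import random

def determine_historical_samples(training_used_images, per_category=2, seed=0):
    # Group the training-used images by historical target category, i.e. by
    # their actual target text identifier ("target_text" is an assumed field name).
    by_category = {}
    for image in training_used_images:
        by_category.setdefault(image["target_text"], []).append(image)
    # Extract a fixed number of exemplar images from each historical category.
    rng = random.Random(seed)
    historical_samples = []
    for category in sorted(by_category):
        candidates = by_category[category]
        historical_samples.extend(rng.sample(candidates, min(per_category, len(candidates))))
    return historical_samples

used = [{"id": i, "target_text": t} for i, t in enumerate(["cat", "cat", "cat", "dog"])]
print(len(determine_historical_samples(used)))  # 2 exemplars for "cat" + 1 for "dog" = 3
```

Sampling per category (rather than uniformly over all used images) ensures every previously learned category is represented when the model is later updated with newly added images.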
In a possible implementation, updating the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image includes:
determining a historical-image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image feature of the historical sample image and the target text feature of the historical sample image;
determining a newly-added-image loss value according to the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image feature of the newly added image and the target text feature of the newly added image;
performing a weighted summation of the historical-image loss value and the newly-added-image loss value to obtain a detection loss value of the target detection model, wherein the weight corresponding to the historical-image loss value is higher than the weight corresponding to the newly-added-image loss value;
updating the target detection model according to the detection loss value of the target detection model.
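A minimal sketch of the weighted summation follows. The concrete weights 0.7/0.3 are illustrative assumptions; the application only requires the historical weight to be the higher one:

```python
def detection_loss(historical_image_loss, new_image_loss, w_historical=0.7, w_new=0.3):
    # The weight on the historical-image loss must exceed the weight on the
    # newly-added-image loss, biasing the update toward retaining the
    # categories learned in earlier training rounds.
    if w_historical <= w_new:
        raise ValueError("historical weight must be higher than the new-image weight")
    return w_historical * historical_image_loss + w_new * new_image_loss

print(detection_loss(0.5, 1.0))  # 0.7 * 0.5 + 0.3 * 1.0 ≈ 0.65
```

Weighting the historical exemplars more heavily counteracts forgetting: even though the newly added images dominate the fresh training signal, the replayed historical images contribute more per-sample gradient.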
In a possible implementation, inputting the sample image into the target detection model to obtain the image feature of the sample image and the predicted target position of the sample image output by the target detection model includes:
inputting the sample image into the target detection model to obtain the image feature of the sample image, a predicted target text identifier of the sample image, and the predicted target position of the sample image output by the target detection model;
and updating the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image includes:
updating the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image.
An embodiment of the present application further provides a target detection method, the method comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained target detection model to obtain a target detection result of the image to be detected output by the target detection model; wherein the target detection model is trained using any implementation of the target detection model training method provided in the embodiments of the present application.
An embodiment of the present application further provides a target detection model training device, the device comprising:
a first acquisition unit, configured to acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
a first extraction unit, configured to perform text feature extraction on the actual target text identifier of the sample image to obtain a target text feature of the sample image;
a first prediction unit, configured to input the sample image into a target detection model and obtain the image feature of the sample image and the predicted target position of the sample image output by the target detection model;
a first updating unit, configured to update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and to return to the first prediction unit to execute the inputting of the sample image into the target detection model until a first stop condition is reached.
An embodiment of the present application further provides a target detection device, the device comprising:
a second acquisition unit, configured to acquire an image to be detected;
a target detection unit, configured to input the image to be detected into a pre-trained target detection model and obtain the target detection result of the image to be detected output by the target detection model; wherein the target detection model is trained using any implementation of the target detection model training method provided in the embodiments of the present application.
An embodiment of the present application further provides a device, the device comprising a processor and a memory:
the memory is configured to store a computer program;
the processor is configured to execute, according to the computer program, any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program, the computer program being used to execute any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
Compared with the prior art, the embodiments of the present application have at least the following advantages:
In the technical solutions provided by the embodiments of the present application, text feature extraction is first performed on the actual target text identifier of a sample image to obtain the target text feature of the sample image; then the sample image, the target text feature of the sample image, and the actual target position of the sample image are used to train a target detection model, so that the target detection model performs target detection learning under the constraints of the target text feature and the actual target position of the sample image. The trained target detection model therefore has better target detection performance, so that it can subsequently perform more accurate target detection on an image to be detected, obtaining and outputting a more accurate target detection result for that image, which helps improve target detection accuracy.
Description of Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of a target detection model training method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a target detection model provided by an embodiment of the present application;
FIG. 3 is a flow chart of a target detection method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a target detection model training device provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.
To facilitate understanding of the technical solutions of the present application, the training process of the target detection model (that is, the target detection model training method) is introduced first, followed by the application process of the target detection model (that is, the target detection method).
Method Embodiment One
Referring to FIG. 1, this figure is a flow chart of a target detection model training method provided by an embodiment of the present application.
The target detection model training method provided in the embodiment of the present application includes S101-S105:
S101: Acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image.
The sample image refers to an image used for training the target detection model. In addition, the embodiment of the present application does not limit the number of sample images; for example, the number of sample images may be N (that is, N sample images are used to train the target detection model).
The actual target text identifier of the sample image is used to uniquely represent the target object in the sample image. The embodiment of the present application does not limit the actual target text identifier; for example, it may be an object category (or an object name, etc.). For instance, if the sample image includes a cat, the actual target text identifier of the sample image may be "cat".
The actual target position of the sample image is used to represent the area actually occupied by the target object within the sample image. The present application does not limit how the actual target position is represented; any existing or future representation capable of expressing the area an object occupies in an image may be used.
S102:对样本图像的实际目标文本标识进行文本特征提取,得到该样本图像的目标文本特征。S102: Perform text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image.
其中,样本图像的目标文本特征用于描述该样本图像的实际目标文本标识所携带的文本信息(如,语义信息等),以使该样本图像的目标文本特征能够表示出该样本图像中目标物体在该样本图像中实际呈现的特征。Among them, the target text feature of the sample image is used to describe the text information (such as semantic information, etc.) carried by the actual target text mark of the sample image, so that the target text feature of the sample image can represent the target object in the sample image The features actually present in this sample image.
另外,本申请实施例不限定样本图像的目标文本特征的提取方式(也就是,S102的实施方式),可以采用现有的或者未来出现的任一种能够针对一个文本进行特征提取的方法进行实施。为了便于理解,下面结合示例进行说明。In addition, the embodiment of the present application does not limit the method of extracting the target text features of the sample image (that is, the implementation of S102), and any existing or future method that can perform feature extraction for a text can be used for implementation. . For ease of understanding, the following description will be given in combination with examples.
作为示例,S102具体可以包括:将样本图像的实际目标文本标识输入预先训练的语言模型,得到该语言模型输出的该样本图像的目标文本特征。As an example, S102 may specifically include: inputting the actual target text identifier of the sample image into a pre-trained language model, and obtaining the target text feature of the sample image output by the language model.
其中,语言模型用于进行文本特征提取;而且本申请实施例不限定语言模型,可以采用现有的或者未来出现的任一种语言模型进行实施。Wherein, the language model is used for text feature extraction; and the embodiment of the present application does not limit the language model, and any existing or future language model can be used for implementation.
另外,语言模型可以预先根据样本文本和该样本文本的实际文本特征进行训练。其中,样本文本是指训练语言模型所需使用的文本;而且该样本文本的实际文本特征用于描述该样本文本实际携带的文本信息(如,语义信息等)。In addition, the language model can be trained in advance according to the sample text and the actual text features of the sample text. Wherein, the sample text refers to the text required for training the language model; and the actual text features of the sample text are used to describe the text information actually carried by the sample text (such as semantic information, etc.).
此外,本申请实施例不限定语言模型的训练过程,可以采用现有的或者未来出现的任一种能够依据样本文本和该样本文本的实际文本特征对语言模型进行训练的方法进行实施。In addition, the embodiment of the present application does not limit the training process of the language model, and any existing or future method that can train the language model according to the sample text and the actual text features of the sample text can be used for implementation.
基于上述S102的相关内容可知,若样本图像的个数为N,则在获取到第i个样本图像的实际目标文本标识之后,可以利用预先训练的语言模型针对该第i个样本图像的实际目标文本标识进行文本特征提取,得到并输出该第i个样本图像的目标文本特征,以使该第i个样本图像的目标文本特征能够准确地表征出该第i个样本图像的实际目标文本标识所携带的文本信息,以便后续利用该第i个样本图像的目标文本特征约束目标检测模型的训练更新过程。其中,i为正整数,i≤N,N为正整数。Based on the relevant content of S102 above, if the number of sample images is N, after the actual target text identifier of the i-th sample image is obtained, the pre-trained language model can be used to target the actual target text of the i-th sample image The text mark is used for text feature extraction, and the target text feature of the i-th sample image is obtained and output, so that the target text feature of the i-th sample image can accurately represent the actual target text mark of the i-th sample image The text information carried by , so that the target text features of the i-th sample image can be used to constrain the training update process of the target detection model. Wherein, i is a positive integer, i≤N, and N is a positive integer.
It can be seen that, because a pre-trained language model can accurately extract the text information (especially the semantic information) carried by a text, the number of texts the model can describe is unbounded, and the text features it outputs for different texts are highly separable from one another. This effectively guarantees that the text features of any two texts (e.g., any two of the target text features of the N sample images) do not overlap, which effectively improves the detection accuracy of the target detection model. Moreover, because the language model learns semantic correlations between texts during training (for example, "cat" is semantically closer to "tiger" than to "car"), the trained language model extracts better text features, which further improves the detection accuracy of the target detection model.
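The text-feature pipeline of S102 can be sketched as follows. The function below is a deterministic stand-in for the pre-trained language model (whose choice the embodiments deliberately leave open); it is an assumption made only to illustrate the contract described above: each actual target text identifier maps to one fixed-dimension feature vector, with distinct labels yielding distinct, separable vectors.

```python
import hashlib
import math

def embed_label(text, dim=8):
    # Stand-in "language model": maps a class label to a fixed-length
    # unit vector. A real system would use a trained text encoder;
    # this hash-based stub only shows the interface (text in,
    # fixed-dimension target text feature out, deterministically).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Distinct identifiers produce distinct (separable) feature vectors.
cat_feat = embed_label("cat")
car_feat = embed_label("car")
```

A real encoder would additionally place semantically related labels ("cat", "tiger") closer together than unrelated ones; the stub cannot do that, which is exactly why the embodiments rely on a trained language model.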
S103: Input the sample image into the target detection model to obtain the image feature of the sample image and the predicted target position of the sample image output by the model.
The image feature of a sample image represents the features that the target object in the sample image is predicted to exhibit in that image.
The predicted target position of a sample image represents the region that the target object is predicted to occupy within the image.
The target detection model is used for target detection (e.g., detecting the category of the target object and the image position of the target object). The embodiments of the present application do not limit the target detection model. For example, as shown in Fig. 2, the target detection model 200 may include an image feature extraction layer 201, a target category prediction layer 202, and a target position prediction layer 203, where the input data of the target category prediction layer 202 includes the output data of the image feature extraction layer 201, and the input data of the target position prediction layer 203 likewise includes the output data of the image feature extraction layer 201.
To make the working principle of the target detection model 200 easier to understand, it is described below with reference to a sample image.
As an example, after a sample image is input into the target detection model 200, the working process of the model may include steps 11 to 13:
Step 11: Input the sample image into the image feature extraction layer 201 to obtain the image feature of the sample image output by that layer.
The image feature extraction layer 201 performs image feature extraction on its input data. The embodiments of the present application do not limit its implementation: any existing or future image feature extraction scheme may be used.
Step 12: Input the image feature of the sample image into the target category prediction layer 202 to obtain the predicted target text identifier of the sample image output by that layer.
The target category prediction layer 202 performs object category prediction on its input data. The embodiments of the present application do not limit its implementation: any existing or future object category prediction scheme may be used.
The predicted target text identifier of a sample image represents the predicted identifier (e.g., predicted category) of the target object in that image.
Step 13: Input the image feature of the sample image into the target position prediction layer 203 to obtain the predicted target position of the sample image output by that layer.
The target position prediction layer 203 performs object position prediction on its input data. The embodiments of the present application do not limit its implementation: any existing or future object position prediction scheme may be used.
Based on steps 11 to 13 above, for the target detection model 200 shown in Fig. 2, after a sample image is input into the model, the image feature extraction layer 201, the target category prediction layer 202, and the target position prediction layer 203 respectively generate and output the image feature of the sample image, its predicted target text identifier, and its predicted target position, so that the target detection performance of the model 200 can subsequently be determined from this prediction information.
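The three-layer structure of Fig. 2 and steps 11 to 13 can be sketched as follows. Each sub-network is reduced to a single randomly initialized linear map, which is purely an assumption for illustration: real layers 201 to 203 would be deep networks, and the dimensions chosen here are arbitrary.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

def linear(x, w):
    # One fully connected layer (no bias): y_j = sum_i x_i * w[i][j].
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

class ToyDetector:
    # Mirrors Fig. 2: a shared image-feature layer feeding a class
    # head and a box head, each standing in for layers 201/202/203.
    def __init__(self, in_dim=16, feat_dim=8, num_classes=3):
        self.w_feat = rand_matrix(in_dim, feat_dim)      # layer 201
        self.w_cls = rand_matrix(feat_dim, num_classes)  # layer 202
        self.w_box = rand_matrix(feat_dim, 4)            # layer 203

    def forward(self, pixels):
        feat = linear(pixels, self.w_feat)     # step 11: image feature
        cls_scores = linear(feat, self.w_cls)  # step 12: category scores
        box = linear(feat, self.w_box)         # step 13: (x, y, w, h)
        return feat, cls_scores, box

model = ToyDetector()
feat, cls_scores, box = model.forward([0.5] * 16)
```

Note that both prediction heads consume the same image feature, matching the statement that the inputs of layers 202 and 203 both include the output of layer 201.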
It should be noted that, for the target detection model 200 shown in Fig. 2, the data dimension of the image feature output by the image feature extraction layer 201 may in some cases differ from the data dimension of the target text feature of the sample image. To ensure that the similarity between the image feature of the sample image and its target text feature can subsequently be computed, a data dimension transformation layer can be added to the target detection model 200 shown in Fig. 2. The input data of this layer includes the output data of the image feature extraction layer 201, so that the layer performs a data dimension transformation on that output (e.g., the image feature of the sample image) to make its output match the data dimension of the target text feature of the sample image. This helps improve the accuracy of the similarity computation between the image feature of the sample image and the target text feature of the sample image.
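A minimal sketch of such a data dimension transformation layer: a linear map taking an image feature to the text-feature dimensionality so that the two can be compared. The concrete dimensions and weights below are illustrative assumptions.

```python
def project(feat, w):
    # Data dimension transformation layer: linear map from
    # len(feat) dimensions to the text-feature dimensionality.
    # w has len(feat) rows and target_dim columns.
    return [sum(f * wij for f, wij in zip(feat, col)) for col in zip(*w)]

img_feat = [0.2, -0.1, 0.4]    # image feature, dimension 3
w = [[1, 0],                   # maps dimension 3 -> dimension 2,
     [0, 1],                   # the assumed text-feature dimension
     [1, 1]]
aligned = project(img_feat, w)  # now comparable to a text feature
```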
Based on the description of S103 above, if there are N sample images, then after the i-th sample image is obtained (or after one update of the target detection model is completed), the i-th sample image can be input into the target detection model, which performs target detection on it and outputs the image feature and predicted target position of the i-th sample image, so that the target detection performance of the model can subsequently be determined from them. Here, i is a positive integer with i ≤ N, and N is a positive integer.
S104: Determine whether the first stop condition is met; if so, perform the preset action; if not, execute S105.
The first stop condition can be set in advance, and the embodiments of the present application do not limit it. For example, the first stop condition may be that the prediction loss value of the target detection model falls below a first preset loss threshold, that the rate of change of the prediction loss value falls below a first change-rate threshold, or that the number of updates of the target detection model reaches a first count threshold.
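The three example criteria can be combined as a simple check, any one of which suffices. The thresholds below are illustrative assumptions; the embodiments leave their values open.

```python
def reached_first_stop(loss, prev_loss, updates,
                       loss_thresh=0.01, rate_thresh=1e-4,
                       max_updates=10000):
    # First stop condition: loss below a threshold, OR loss
    # change-rate below a threshold, OR update count at the limit.
    rate = abs(prev_loss - loss) / max(abs(prev_loss), 1e-12)
    return loss < loss_thresh or rate < rate_thresh or updates >= max_updates
```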
It should be noted that the prediction loss value of the target detection model represents the model's target detection performance on the N sample images above. The embodiments of the present application do not limit how this loss value is computed: any existing or future method for computing a model's prediction loss value may be used.
The preset action can be set in advance. For example, the preset action may be to end the training process of the target detection model (that is, to end the model's target detection learning on the N sample images). As another example, when a new object detection capability needs to be added to an already trained target detection model (that is, incremental learning is performed on the model), the preset action may include S106-S109 below.
Based on the description of S104 above, for the current round of the target detection model, it can be determined whether the model meets the first stop condition. If the first stop condition is met, the current model has good target detection performance on the N sample images, so the current model can be saved for subsequent work (e.g., performing target detection, or adding a new object detection capability to the model). If the first stop condition is not met, the current model's detection performance on the N sample images is still poor, so the model is updated according to the label information of the N sample images and the prediction information output by the current model for those images.
S105: Update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, then return to S103.
The similarity between the image feature of a sample image and its target text feature represents how close the two are. The embodiments of the present application do not limit how this similarity is computed; for example, it may be computed using the Euclidean distance.
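A minimal sketch of a Euclidean-distance-based similarity, as the text suggests: smaller distance means larger similarity. Mapping the distance through 1/(1+d) is one possible choice (an assumption), keeping the similarity in (0, 1] with identical features scoring exactly 1.

```python
import math

def euclidean_similarity(img_feat, txt_feat):
    # Euclidean distance between the (dimension-aligned) image feature
    # and target text feature, converted to a similarity in (0, 1].
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(img_feat, txt_feat)))
    return 1.0 / (1.0 + d)

same = euclidean_similarity([1.0, 0.0], [1.0, 0.0])
near = euclidean_similarity([1.0, 0.0], [0.9, 0.1])
far = euclidean_similarity([1.0, 0.0], [0.0, 1.0])
```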
In addition, the training objectives of the target detection model may include making the predicted target position of the sample image as close as possible to its actual target position, and making the image feature of the sample image as close as possible to its target text feature (that is, making the similarity between the image feature of the sample image and its target text feature as large as possible).
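The two objectives can be combined into a single training loss. The sketch below uses an L1 box term plus a squared-Euclidean feature-alignment term with equal weighting; both the loss forms and the weighting are assumptions, since the embodiments leave the loss computation open.

```python
def detection_loss(pred_box, true_box, img_feat, txt_feat):
    # Objective 1: predicted position close to actual position (L1).
    box_loss = sum(abs(p - t) for p, t in zip(pred_box, true_box))
    # Objective 2: image feature close to target text feature
    # (squared Euclidean distance; minimizing it maximizes similarity).
    align_loss = sum((a - b) ** 2 for a, b in zip(img_feat, txt_feat))
    return box_loss + align_loss
```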
Based on the description of S105 above, if there are N sample images, then after determining that the current round of the target detection model does not meet the first stop condition, the model can be updated according to the gap between the predicted target position and the actual target position of the i-th sample image, and the similarity between the image feature and the target text feature of the i-th sample image, so that the updated model has better target detection performance; S103 and its subsequent steps are then executed again. Here, i is a positive integer with i ≤ N, and N is a positive integer.
Based on S101 to S105 above, in the target detection model training method provided by the embodiments of the present application, text feature extraction is first performed on the actual target text identifier of a sample image to obtain the target text feature of the sample image; the target detection model is then trained using the sample image, its target text feature, and its actual target position to obtain a trained model. Because the target text feature of the sample image represents its actual target text identifier more accurately, a target detection model trained under the constraint of this feature detects targets better, which helps improve target detection performance.
Method Embodiment Two
In practice, a trained target detection model has good detection performance on the target objects it has already learned. To further improve its prediction performance, the trained model can additionally learn target objects it has not yet learned (that is, class-incremental learning can be performed on the target detection model). Accordingly, the embodiments of the present application further provide a possible implementation of the target detection model training method that includes, in addition to S101-S105 above, S106-S109:
S106: After an added image, the actual target text identifier of the added image, and the actual target position of the added image are obtained, perform text feature extraction on the actual target text identifier of the added image to obtain the target text feature of the added image.
An added image is an image used for class-incremental learning on the already trained target detection model.
The embodiments of the present application do not limit the number of added images; for example, there may be M added images, where M is a positive integer. In this case, S106-S109 enable the target detection model to further learn how to perform target detection on the M added images while retaining the target objects it has already learned.
For the actual target text identifier of the added image, the actual target position of the added image, and the target text feature of the added image, refer respectively to the descriptions of the actual target text identifier and actual target position of the sample image in S101 above and the target text feature of the sample image in S102 above, replacing "sample image" with "added image" throughout.
Based on the description of S106 above, for an already trained target detection model (e.g., a model trained via the training process shown in S101-S105 above, or a model that, after such training, has undergone class-incremental learning via the training process shown in S106-S109 at least once), obtaining an added image together with its actual target text identifier and actual target position indicates that one round of class-incremental learning needs to be performed on the trained model. Text feature extraction can therefore be performed on the actual target text identifier of the added image to obtain its target text feature, which subsequently constrains the class-incremental learning process of the model, so that the retrained model learns how to perform target detection on the added images while retaining the target objects it has already learned.
S107: Input the historical sample images and the added images into the target detection model to obtain the image features and predicted target positions of the historical sample images and the image features and predicted target positions of the added images output by the model.
The historical sample images may include all or some of the images used in the historical training process of the target detection model.
The historical training process of the target detection model refers to the class learning process the model has already undergone before the current round of class-incremental learning. For example, if the trained model has only undergone the class learning process shown in S101-S105 above, its historical training process is the training process shown in S101-S105. As another example, if the trained model has undergone the class learning process shown in S101-S105 once and the class-incremental learning process shown in S106-S109 Q times, its historical training process may include the training process shown in S101-S105 and the first through Q-th rounds of the training process shown in S106-S109.
In addition, the embodiments of the present application do not limit how the historical sample images are determined. For example, in one possible implementation, the determination process may include steps 21 to 24:
Step 21: Determine, according to the sample images, the training-used images corresponding to the target detection model.
The training-used images corresponding to the target detection model are the images that have been used in the model's historical training process. For ease of understanding, two examples are described below.
Example 1: If the historical training process of the target detection model includes the training process shown in S101-S105 above, the training-used images corresponding to the model may include the N sample images above.
Example 2: If the historical training process of the target detection model includes the training process shown in S101-S105 above and the first through Q-th rounds of the training process shown in S106-S109, where the q-th round of S106-S109 used G_q added images for class-incremental learning (q is a positive integer, q ≤ Q), then the training-used images corresponding to the model may include the N sample images above, the G_1 added images, the G_2 added images, ..., and the G_Q added images.
Based on step 21 above, after it is determined that incremental learning needs to be performed on the trained target detection model, the training-used images corresponding to the model can first be determined from the images involved in its historical training process, so that the training-used images accurately represent the images that have already been used in the model's historical learning.
Step 22: Determine at least one historical target category according to the actual target text identifiers of the training-used images.
A historical target category is an object category that the target detection model has already learned during its historical training process. For ease of understanding, two examples are described below.
Example 1: If the historical training process of the target detection model includes the training process shown in S101-S105 above, and the N sample images in that process correspond to R_0 object categories, then all R_0 object categories can be determined as historical target categories.
Example 2: If the historical training process of the target detection model includes the training process shown in S101-S105 above and the first through Q-th rounds of the training process shown in S106-S109, where the N sample images of S101-S105 correspond to R_0 object categories and the G_q added images of the q-th round of S106-S109 correspond to R_q object categories (q is a positive integer, q ≤ Q), then the R_0 object categories, the R_1 object categories, the R_2 object categories, ..., and the R_Q object categories can all be determined as historical target categories.
It should be noted that no object category occurs more than once among the R_0, R_1, R_2, ..., R_Q object categories; that is, any two object categories among them are different.
Based on step 22 above, after the training-used images corresponding to the target detection model are obtained, the actual target text identifier of each training-used image can be used to determine the model's historical target categories, so that the historical target categories accurately represent the object categories the model has already learned during its historical learning.
Step 23: According to the actual target text identifiers of the training-used images, determine, from the training-used images corresponding to the target detection model, the training-used images belonging to each historical target category.
As an example, if there are M historical target categories, and among the training-used images corresponding to the target detection model Y_1 images belong to the first historical target category, Y_2 images belong to the second historical target category, ..., and Y_M images belong to the M-th historical target category, then step 23 may specifically include: determining the Y_1 images belonging to the first historical target category as the training-used images of the first historical target category; determining the Y_2 images belonging to the second historical target category as the training-used images of the second historical target category; and so on, up to determining the Y_M images belonging to the M-th historical target category as the training-used images of the M-th historical target category.
Step 24: Extract, from the training-used images belonging to each historical target category, the historical sample images corresponding to that category.
It should be noted that the embodiments of the present application do not limit how the "extraction" in step 24 is implemented; for example, it may be performed according to a preset extraction ratio (or a preset number of extractions, etc.).
For example, if the extraction ratio is 10% and there are M historical target categories, step 24 may specifically include: randomly extracting 10% of the training-used images belonging to the first historical target category to obtain the historical sample images corresponding to the first historical target category, so that the actual target text identifier of each of them is the first historical target category; randomly extracting 10% of the training-used images belonging to the second historical target category to obtain the historical sample images corresponding to the second historical target category, so that the actual target text identifier of each of them is the second historical target category; and so on, up to randomly extracting 10% of the training-used images belonging to the M-th historical target category to obtain the historical sample images corresponding to the M-th historical target category, so that the actual target text identifier of each of them is the M-th historical target category.
Based on steps 21 to 24 above, after it is determined that incremental learning needs to be performed on the trained target detection model, some historical sample images can be extracted from the images involved in the model's historical training process, so that these historical sample images represent the object categories the model has already learned during its historical learning.
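Steps 21 to 24 can be condensed into the following sketch: group the training-used images by their actual target text identifier (the historical target categories) and randomly keep a fixed ratio of each group as historical sample images. The 10% ratio matches the example in the text; keeping at least one image per category is an added assumption so that no learned category is lost.

```python
import random

def sample_exemplars(images, labels, ratio=0.1, seed=0):
    # Steps 21-22: the (image, label) pairs are the training-used
    # images with their actual target text identifiers.
    rng = random.Random(seed)
    # Step 23: group training-used images by historical target category.
    by_cls = {}
    for img, lab in zip(images, labels):
        by_cls.setdefault(lab, []).append(img)
    # Step 24: randomly extract `ratio` of each category's images
    # (at least one per category) as its historical sample images.
    exemplars = {}
    for lab, group in by_cls.items():
        k = max(1, int(len(group) * ratio))
        exemplars[lab] = rng.sample(group, k)
    return exemplars

used_images = [f"img{i}" for i in range(30)]
used_labels = ["cat"] * 20 + ["dog"] * 10
ex = sample_exemplars(used_images, used_labels)
```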
另外,历史样例图像的图像特征、历史样例图像的预测目标位置的相关内容请分别参见上文S103中“样本图像的图像特征”和“样本图像的预测目标位置”的相关内容,只需将上文S103中“样本图像的图像特征”和“样本图像的预测目标位置”的相关内容中“样本图像”替换为“历史样例图像”即可。In addition, for the image features of the historical sample images and the predicted target positions of the historical sample images, please refer to the related content of "Image Features of Sample Images" and "Predicted Target Positions of Sample Images" in S103 above. Just replace “sample image” with “historical sample image” in the related content of “image feature of sample image” and “predicted target position of sample image” in S103 above.
此外,新增图像的图像特征、新增图像的预测目标位置的相关内容请分别参见上文S103中“样本图像的图像特征”和“样本图像的预测目标位置”的相关内容,只需将上文S103中“样本图像的图像特征”和“样本图像的预测目标位置”的相关内容中“样本图像”替换为“新增图像”即可。In addition, for the image features of the newly added image and the predicted target position of the newly added image, please refer to the relevant content of the "image feature of the sample image" and "predicted target position of the sample image" in S103 above. In document S103, in the related content of "image features of sample image" and "predicted target position of sample image", "sample image" can be replaced with "new image".
基于上述S103的相关内容可知,在获取到历史样例图像和新增图像之后,可以将该历史样例图像和新增图像分别输入目标检测模型,以使该目标检测模型分别针对该历史样例图像和该新增图像进行目标检测,得到并输出该历史样例图像的图像特征以及预测目标位置、该新增图像的图像特征以及预测目标位置,以便后续能够基于这些预测信息确定目标检测模型的目标检测性能。Based on the relevant content of S103 above, after obtaining the historical sample image and the newly added image, the historical sample image and the newly added image can be respectively input into the target detection model, so that the target detection model can target the historical sample image and the newly added image for target detection, obtain and output the image features of the historical sample image and the predicted target position, the image features of the newly added image and the predicted target position, so that the target detection model can be determined based on these predicted information. Object detection performance.
S108: Determine whether the second stop condition is met; if so, perform the preset steps; if not, perform S109.
The second stop condition can be set in advance, and the embodiments of the present application do not limit it. For example, the second stop condition may be that the detection loss value of the target detection model falls below a second preset loss threshold, that the rate of change of the detection loss value falls below a second change-rate threshold, or that the number of updates of the target detection model reaches a second count threshold.
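The three example stop conditions above can be combined into one check, sketched below. The threshold values are illustrative assumptions, not values fixed by the embodiment, and the embodiment only requires one of the conditions, not all three.

```python
def second_stop_condition(loss_history, loss_threshold=0.05,
                          delta_threshold=1e-3, max_updates=1000):
    """Return True when any of the three example conditions holds:
    the latest loss is below a preset threshold, the loss change between
    consecutive updates is small, or the update count is exhausted."""
    if not loss_history:
        return False
    if loss_history[-1] < loss_threshold:
        return True
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < delta_threshold:
        return True
    return len(loss_history) >= max_updates
```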
It should be noted that the detection loss value of the target detection model represents the model's target detection performance on the historical sample images and the newly added images. The embodiments of the present application do not limit how the detection loss value is computed; any existing or future method for computing a model's detection loss value may be used.
In practice, because the number of historical sample images per historical target category is usually small, in order to increase the influence of these historical sample images on the target detection model, an embodiment of the present application also provides a way of computing the detection loss value of the target detection model, which may specifically include steps 31 to 33:
Step 31: Determine a historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image features of the historical sample image and the target text features of the historical sample image.
The historical image loss value is the loss produced when the target detection model performs target detection on the historical sample images, so it represents the model's target detection performance on the historical sample images.
The embodiments of the present application do not limit how the historical image loss value is computed; any existing or future prediction loss computation method may be used.
Step 32: Determine a new image loss value according to the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image features of the newly added image and the target text features of the newly added image.
The new image loss value is the loss produced when the target detection model performs target detection on the newly added images, so it represents the model's target detection performance on the newly added images.
The embodiments of the present application do not limit how the new image loss value is computed; any existing or future prediction loss computation method may be used.
Step 33: Compute a weighted sum of the historical image loss value and the new image loss value to obtain the detection loss value of the target detection model, where the weight assigned to the historical image loss value is higher than the weight assigned to the new image loss value.
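Steps 31 to 33 can be sketched as below. The embodiment does not fix the per-image loss, so an L1 box term and a cosine-similarity feature term are used here purely for illustration, and the weights 0.7/0.3 are assumed values that merely satisfy the stated constraint that the historical weight is higher.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def image_loss(pred_box, true_box, image_feat, text_feat):
    """Steps 31/32 (illustrative): a box term plus a term that drives the
    image features toward the target text features."""
    box_loss = sum(abs(p - t) for p, t in zip(pred_box, true_box)) / len(pred_box)
    return box_loss + (1.0 - cosine_similarity(image_feat, text_feat))

def detection_loss(history_terms, new_terms, w_history=0.7, w_new=0.3):
    """Step 33: weighted sum, with the historical weight set higher."""
    assert w_history > w_new
    return w_history * image_loss(*history_terms) + w_new * image_loss(*new_terms)
```

A perfect prediction (matching boxes, aligned features) gives a zero loss, and raising `w_history` amplifies any error on the historical sample images, which is exactly the effect the embodiment seeks.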
The weight corresponding to the historical image loss value is the weight by which the historical image loss value is multiplied in the weighted sum of step 33; it can be set in advance.
The weight corresponding to the new image loss value is the weight by which the new image loss value is multiplied in the weighted sum of step 33; it can likewise be set in advance.
Based on steps 31 to 33 above, in order to strengthen the constraint that the small number of historical sample images and their label information exert on the update process of the target detection model, the weight corresponding to the historical image loss value can be raised when computing the detection loss value. A target detection model trained with this higher weight can not only accurately detect targets in the newly added images but also continue to accurately detect targets in the already-used training images, which helps improve the accuracy of category incremental learning.
The preset steps can be set in advance. For example, a preset step may be to end the current round of category incremental learning of the target detection model. As another example, when a new object detection capability needs to be added to the trained target detection model again (that is, the next round of category incremental learning is performed on the model), the preset steps may include S106 to S109 above.
Based on S108 above, for the current round of the target detection model, it can be determined whether the model meets the second stop condition. If it does, the model has good target detection performance on both the historical sample images and the newly added images, so the current model can be saved for subsequent use (for example, performing target detection, or adding yet another new object detection capability to the model). If it does not, the model's target detection performance on the historical sample images and the newly added images is still poor, so the model can be updated according to the label information of the historical sample images, the label information of the newly added images, and the prediction information output by the current model for those images.
S109: Update the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image features of the newly added image and the target text features of the newly added image, and return to S107.
The training objectives of the target detection model may include: the predicted target position of the historical sample image should be as close as possible to its actual target position; the image features of the historical sample image should be as close as possible to its target text features (that is, the similarity between them should be as large as possible); the predicted target position of the newly added image should be as close as possible to its actual target position; and the image features of the newly added image should be as close as possible to its target text features (that is, the similarity between them should be as large as possible).
The embodiments of the present application do not limit how S109 is implemented. For example, S109 may specifically include S1091 to S1094:
S1091: Determine a historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image features of the historical sample image and the target text features of the historical sample image.
S1092: Determine a new image loss value according to the predicted target position of the newly added image, the actual target position of the newly added image, and the similarity between the image features of the newly added image and the target text features of the newly added image.
S1093: Compute a weighted sum of the historical image loss value and the new image loss value to obtain the detection loss value of the target detection model, where the weight assigned to the historical image loss value is higher than the weight assigned to the new image loss value.
It should be noted that, for S1091 to S1093, refer to steps 31 to 33 above.
S1094: Update the target detection model according to its detection loss value.
It should be noted that the embodiments of the present application do not limit how S1094 is implemented; any existing method of updating a model according to a loss value may be used.
Based on S106 to S109 above, in the target detection model training method provided by the embodiments of the present application, if a new object detection capability needs to be added to a trained target detection model, the newly added images and their label information can be used to perform category incremental learning on the model, so that the learned model gains the ability to detect targets in the newly added images while retaining its original target detection capabilities. This helps continually improve the target detection performance of the model.
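The S107–S109 loop can be sketched as follows. `DummyModel` is a hypothetical stand-in (its `forward`, `detection_loss`, and `update` methods are assumed interfaces, not part of the embodiment), and the simple loss threshold stands in for the second stop condition.

```python
class DummyModel:
    """Minimal stand-in for the target detection model (hypothetical)."""
    def __init__(self):
        self.loss = 1.0
        self.updates = 0

    def forward(self, images):
        return images  # placeholder predictions

    def detection_loss(self, preds_hist, preds_new):
        return self.loss

    def update(self, loss):
        self.loss *= 0.5  # pretend each update halves the loss
        self.updates += 1

def incremental_learning(model, exemplars, new_images,
                         max_updates=1000, loss_threshold=0.1):
    """S107-S109 sketch: forward passes on historical sample images and
    newly added images, stop check (S108), then an update (S109)."""
    for _ in range(max_updates):
        preds_hist = model.forward(exemplars)
        preds_new = model.forward(new_images)
        loss = model.detection_loss(preds_hist, preds_new)
        if loss < loss_threshold:  # second stop condition (S108)
            break
        model.update(loss)         # S109, then return to S107
    return model
```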
Method Embodiment Three
To further improve the target detection performance of the target detection model, an embodiment of the present application also provides a possible implementation of the target detection model training method, which specifically includes steps 41 to 45:
Step 41: Obtain a sample image, the actual target text identifier of the sample image, and the actual target position of the sample image.
Step 42: Perform text feature extraction on the actual target text identifier of the sample image to obtain the target text features of the sample image.
It should be noted that, for steps 41 and 42, refer to S101 and S102 above.
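The text feature extraction of step 42 maps an identifier string to a fixed-length vector. The embodiment does not specify the encoder, so the hash-based placeholder below only illustrates the interface; a real system would use a trained text encoder.

```python
import hashlib

def extract_text_feature(identifier: str, dim: int = 8):
    """Deterministic placeholder text encoder: maps a target text
    identifier to a fixed-length feature vector in [0, 1]. Only the
    interface (string in, vector out) reflects step 42; the hashing
    itself is an assumption made for this sketch."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]
```

The encoder is deterministic, so the same identifier always yields the same target text features, and distinct identifiers yield distinct features.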
Step 43: Input the sample image into the target detection model to obtain the image features of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image output by the model.
The predicted target text identifier of the sample image represents the predicted identifier (for example, the predicted category) of the target object in the sample image.
It should be noted that step 43 may be implemented using any implementation of S103 above, with the output data of the target detection model in S103 changed from "the image features of the sample image and the predicted target position of the sample image" to "the image features of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image".
Step 44: Determine whether the first stop condition is met; if so, perform the preset action; if not, perform step 45.
It should be noted that, for step 44, refer to S104 above. In addition, the "prediction loss value of the target detection model" in step 44 is computed from the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image.
Step 45: Update the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image, and return to step 43.
It should be noted that step 45 may be implemented using any implementation of S105 above, with "the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image" in that implementation replaced by "the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image".
That is, the update process of the target detection model in step 45 is performed according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image features of the sample image and the target text features of the sample image.
Based on steps 41 to 45 above, in the target detection model training method provided by this embodiment, text feature extraction can first be performed on the actual target text identifier of the sample image to obtain the target text features of the sample image; then the sample image, its target text features, its actual target text identifier, and its actual target position are used to train the target detection model. Because the model is trained under the constraints of three kinds of label information (the target text features, the actual target text identifier, and the actual target position), the trained model has a better target detection capability, which helps improve target detection performance.
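The three-term loss implied by steps 44 and 45 can be sketched as below. The embodiment does not fix the individual losses; softmax cross-entropy over candidate identifiers, an L1 box term, and a cosine-similarity feature term are illustrative choices made here.

```python
import math

def prediction_loss(pred_scores, true_idx, pred_box, true_box,
                    image_feat, text_feat):
    """Illustrative three-term loss: a classification term on the
    predicted target text identifier, a box-regression term, and an
    image-text similarity term (all choices are assumptions)."""
    # classification: negative log-likelihood of the actual identifier
    exp_scores = [math.exp(s) for s in pred_scores]
    cls_loss = -math.log(exp_scores[true_idx] / sum(exp_scores))
    # box term: mean absolute error between predicted and actual box
    box_loss = sum(abs(p - t) for p, t in zip(pred_box, true_box)) / len(pred_box)
    # feature term: 1 - cosine similarity of image and text features
    dot = sum(a * b for a, b in zip(image_feat, text_feat))
    norm = (math.sqrt(sum(a * a for a in image_feat))
            * math.sqrt(sum(b * b for b in text_feat)))
    sim_loss = 1.0 - dot / norm
    return cls_loss + box_loss + sim_loss
```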
Method Embodiment Four
To further improve the prediction performance of the target detection model, an embodiment of the present application also provides a possible implementation of the target detection model training method. In this implementation, the method includes, in addition to steps 41 to 45 above, steps 46 to 49:
Step 46: After obtaining a newly added image, the actual target text identifier of the newly added image, and the actual target position of the newly added image, perform text feature extraction on the actual target text identifier of the newly added image to obtain the target text features of the newly added image.
It should be noted that, for step 46, refer to S106 above.
Step 47: Input the historical sample image and the newly added image into the target detection model to obtain the image features, predicted target text identifier, and predicted target position of the historical sample image, and the image features, predicted target text identifier, and predicted target position of the newly added image, output by the model.
The predicted target text identifier of the historical sample image represents the predicted identifier (for example, the predicted category) of the target object in the historical sample image.
The predicted target text identifier of the newly added image represents the predicted identifier (for example, the predicted category) of the target object in the newly added image.
It should be noted that step 47 may be implemented using any implementation of S107 above, with the output data of the target detection model in S107 changed from "the image features and predicted target position of the historical sample image, and the image features and predicted target position of the newly added image" to "the image features, predicted target text identifier, and predicted target position of the historical sample image, and the image features, predicted target text identifier, and predicted target position of the newly added image".
Step 48: Determine whether the second stop condition is met; if so, perform the preset steps; if not, perform step 49.
It should be noted that, for step 48, refer to S108 above. In addition, the "detection loss value of the target detection model" in step 48 is computed from the predicted target text identifier of the historical sample image, the actual target text identifier of the historical sample image, the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target text identifier of the newly added image, the actual target text identifier of the newly added image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image.
Step 49: Update the target detection model according to the predicted target text identifier of the historical sample image, the actual target text identifier of the historical sample image, the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target text identifier of the newly added image, the actual target text identifier of the newly added image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image, and return to step 47.
It should be noted that step 49 may be implemented using any implementation of S109 above, with "the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image" in that implementation replaced by "the predicted target text identifier of the historical sample image, the actual target text identifier of the historical sample image, the predicted target position of the historical sample image, the actual target position of the historical sample image, the predicted target text identifier of the newly added image, the actual target text identifier of the newly added image, the predicted target position of the newly added image, the actual target position of the newly added image, the similarity between the image features of the historical sample image and the target text features of the historical sample image, and the similarity between the image features of the newly added image and the target text features of the newly added image".
Based on steps 46 to 49 above, in the target detection model training method provided by this embodiment, if a new object detection capability needs to be added to a trained target detection model, the newly added images and their three kinds of label information (that is, the target text features, the actual target text identifier, and the actual target position) can be used to perform incremental learning on the model, so that the learned model gains the ability to detect targets in the newly added images while retaining its original target detection capabilities. This helps continually improve the target detection performance of the model.
After the target detection model is trained, it can be used for target detection. Accordingly, an embodiment of the present application also provides a target detection method, which is described below with reference to the accompanying drawings.
Method Embodiment Five
Refer to Fig. 3, which is a flowchart of a target detection method provided by an embodiment of the present application.
The target detection method provided by the embodiment of the present application includes S301 and S302:
S301: Obtain an image to be detected.
The image to be detected is an image on which target detection processing needs to be performed.
S302: Input the image to be detected into a pre-trained target detection model to obtain the target detection result of the image to be detected output by the model.
The target detection model is trained using any implementation of the target detection model training method provided by the embodiments of the present application.
The target detection result of the image to be detected is obtained by the target detection model performing target detection on that image. The embodiments of the present application do not limit the form of the result; for example, it may include the predicted target text identifier (for example, the predicted target category) of the target object in the image and/or the region occupied by the target object within the image.
Based on S301 and S302 above, after the image to be detected is obtained, the trained target detection model can perform target detection on it and output a target detection result that accurately represents the relevant information of the target object in the image (for example, target category information and target position information). Because the trained target detection model has good target detection performance, the target detection result determined with it is more accurate, which helps improve target detection accuracy.
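The inference path of S301–S302 reduces to a single forward pass, sketched below. `TrainedDetector` and the result dictionary keys are hypothetical stand-ins for the pre-trained model and its output format, which the embodiment leaves open.

```python
class TrainedDetector:
    """Hypothetical stand-in for a pre-trained target detection model."""
    def forward(self, image):
        # pretend the model finds one object of category "cat"
        return ("cat", (10, 20, 50, 60))

def detect(model, image):
    """S301-S302: input the image to be detected into the pre-trained
    target detection model and return its target detection result
    (predicted target text identifier and/or occupied region)."""
    text_id, region = model.forward(image)
    return {"target_text_id": text_id, "target_region": region}
```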
基于上述方法实施例提供的目标检测模型训练方法,本申请实施例还提供了一种目标检测模型训练装置,下面结合附图进行解释和说明。Based on the target detection model training method provided by the above method embodiment, the embodiment of the present application also provides a target detection model training device, which will be explained and described below with reference to the accompanying drawings.
装置实施例一Device embodiment one
装置实施例一提供的目标检测模型训练装置的技术详情,请参照上述方法实施例。For the technical details of the target detection model training device provided in the first device embodiment, please refer to the above method embodiment.
参见图4,该图为本申请实施例提供的一种目标检测模型训练装置的结构示意图。Referring to FIG. 4 , this figure is a schematic structural diagram of a target detection model training device provided by an embodiment of the present application.
本申请实施例提供的目标检测模型训练装置400,包括:The target detection model training device 400 provided in the embodiment of the present application includes:
第一获取单元401,用于获取样本图像、所述样本图像的实际目标文本标识和所述样本图像的实际目标位置;A first acquiring unit 401, configured to acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
第一提取单元402,用于对所述样本图像的实际目标文本标识进行文本特征提取,得到所述样本图像的目标文本特征;The first extraction unit 402 is configured to perform text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image;
第一预测单元403,用于将所述样本图像输入目标检测模型,得到所述目标检测模型输出的所述样本图像的图像特征和所述样本图像的预测目标位置;The first prediction unit 403 is configured to input the sample image into the target detection model, and obtain the image features of the sample image output by the target detection model and the predicted target position of the sample image;
第一更新单元404，用于根据所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型，并返回所述第一预测单元403执行所述将所述样本图像输入目标检测模型，直至达到第一停止条件。The first updating unit 404 is configured to update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and to return to the first prediction unit 403 to execute the inputting of the sample image into the target detection model until a first stop condition is reached.
在一种可能的实施方式中,所述第一提取单元402,具体用于:In a possible implementation manner, the first extraction unit 402 is specifically configured to:
将所述样本图像的实际目标文本标识输入预先训练的语言模型，得到所述语言模型输出的所述样本图像的目标文本特征；其中，所述语言模型是根据样本文本和所述样本文本的实际文本特征进行训练的。Inputting the actual target text identifier of the sample image into a pre-trained language model to obtain the target text feature of the sample image output by the language model, wherein the language model is trained on sample text and the actual text features of the sample text.
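The text-feature extraction performed by the first extraction unit can be sketched in Python. This is a minimal illustration only: `extract_text_feature` below is a hypothetical stand-in (a deterministic hash embedding) for the pre-trained language model the embodiment describes; the patent does not specify the encoder or the feature dimension.

```python
import hashlib

def extract_text_feature(label, dim=8):
    """Hypothetical stand-in for the pre-trained language model: maps a
    target text identifier (e.g. a category name) to a fixed-length
    feature vector. A real implementation would run a trained text
    encoder; a hash is used here only to keep the sketch self-contained."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # Scale each byte into [0, 1) so the vector resembles a dense embedding.
    return [b / 256.0 for b in digest[:dim]]
```

Identical identifiers always map to identical features, which is the property the training step relies on when comparing image features against target text features.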
在一种可能的实施方式中,所述目标检测模型训练装置400还包括:In a possible implementation manner, the target detection model training device 400 further includes:
第二提取单元，用于在达到第一停止条件且获取到新增图像、所述新增图像的实际目标文本标识和所述新增图像的实际目标位置之后，对所述新增图像的实际目标文本标识进行文本特征提取，得到所述新增图像的目标文本特征；The second extraction unit is configured to, after the first stop condition is reached and the added image, the actual target text identifier of the added image, and the actual target position of the added image are acquired, perform text feature extraction on the actual target text identifier of the added image to obtain the target text feature of the added image;
第二预测单元，用于将历史样例图像和所述新增图像输入目标检测模型，得到所述目标检测模型输出的所述历史样例图像的图像特征、所述历史样例图像的预测目标位置、所述新增图像的图像特征和所述新增图像的预测目标位置；其中，所述历史样例图像是根据所述样本图像确定的；The second prediction unit is configured to input the historical sample image and the added image into the target detection model to obtain the image feature of the historical sample image, the predicted target position of the historical sample image, the image feature of the added image, and the predicted target position of the added image output by the target detection model, wherein the historical sample image is determined according to the sample image;
第二更新单元，用于根据所述历史样例图像的预测目标位置、所述历史样例图像的实际目标位置、所述历史样例图像的图像特征与所述历史样例图像的目标文本特征之间的相似度、所述新增图像的预测目标位置、所述新增图像的实际目标位置、以及所述新增图像的图像特征与所述新增图像的目标文本特征之间的相似度，更新所述目标检测模型，并返回所述第二预测单元执行所述将所述历史样例图像和所述新增图像输入目标检测模型，直至达到第二停止条件。The second updating unit is configured to update the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image, and to return to the second prediction unit to execute the inputting of the historical sample image and the added image into the target detection model until a second stop condition is reached.
在一种可能的实施方式中,所述历史样例图像的确定过程,包括:In a possible implementation manner, the process of determining the historical sample image includes:
根据所述样本图像,确定所述目标检测模型对应的训练已使用图像;According to the sample image, determine the training used image corresponding to the target detection model;
根据所述训练已使用图像的实际目标文本标识，确定至少一个历史目标类别；determining at least one historical target category according to the actual target text identifiers of the training used images;
根据所述训练已使用图像的实际目标文本标识,从所述目标检测模型对应的训练已使用图像中确定属于各个历史目标类别的训练已使用图像;According to the actual target text identification of the training used image, determine the training used image belonging to each historical target category from the training used image corresponding to the target detection model;
分别从所述属于各个历史目标类别的训练已使用图像中抽取所述各个历史目标类别对应的历史样例图像。The historical sample images corresponding to the respective historical object categories are respectively extracted from the training used images belonging to the various historical object categories.
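The four-step determination of historical sample images above can be sketched as follows. The names (`select_exemplars`, `per_class`) and the fixed random seed are illustrative assumptions; the embodiment does not specify how many exemplars are drawn per historical target category or how the sampling is performed.

```python
import random
from collections import defaultdict

def select_exemplars(used_images, labels, per_class=2, seed=0):
    """Sketch of the historical-sample determination: group the training-used
    images by their actual target text identifier (the historical target
    category), then draw a fixed number of exemplars from each category."""
    by_class = defaultdict(list)
    for image, label in zip(used_images, labels):
        by_class[label].append(image)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return {label: rng.sample(images, min(per_class, len(images)))
            for label, images in by_class.items()}
```

The returned mapping gives, for each historical target category, the exemplar images that are later replayed alongside the added images during the continued training.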
在一种可能的实施方式中,所述第二更新单元,包括:In a possible implementation manner, the second updating unit includes:
第一确定子单元，用于根据所述历史样例图像的预测目标位置、所述历史样例图像的实际目标位置、以及所述历史样例图像的图像特征与所述历史样例图像的目标文本特征之间的相似度，确定历史图像损失值；The first determining subunit is configured to determine a historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image feature of the historical sample image and the target text feature of the historical sample image;
第二确定子单元，用于根据所述新增图像的预测目标位置、所述新增图像的实际目标位置、以及所述新增图像的图像特征与所述新增图像的目标文本特征之间的相似度，确定新增图像损失值；The second determining subunit is configured to determine an added-image loss value according to the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image;
第三确定子单元，用于将所述历史图像损失值和所述新增图像损失值进行加权求和，得到所述目标检测模型的检测损失值；其中，所述历史图像损失值对应的加权权重高于所述新增图像损失值对应的加权权重；The third determining subunit is configured to perform a weighted summation of the historical image loss value and the added-image loss value to obtain the detection loss value of the target detection model, wherein the weight corresponding to the historical image loss value is higher than the weight corresponding to the added-image loss value;
模型更新子单元,用于根据所述目标检测模型的检测损失值,更新所述目标检测模型。The model update subunit is configured to update the target detection model according to the detection loss value of the target detection model.
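A minimal sketch of the weighted summation performed by the third determining subunit. The 0.7/0.3 split is an illustrative assumption; the embodiment only requires that the historical weight be the higher of the two, which biases each update toward retaining performance on previously learned categories.

```python
def detection_loss(historical_loss, added_loss, w_hist=0.7, w_new=0.3):
    """Weighted summation of the historical-image loss and the added-image
    loss; the historical weight must exceed the added-image weight per the
    embodiment. Returns the detection loss value used to update the model."""
    assert w_hist > w_new, "historical weight must be the higher one"
    return w_hist * historical_loss + w_new * added_loss
```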
在一种可能的实施方式中,所述第一预测单元403,具体用于:In a possible implementation manner, the first prediction unit 403 is specifically configured to:
将所述样本图像输入目标检测模型,得到所述目标检测模型输出的所述样本图像的图像特征、所述样本图像的预测目标文本标识和所述样本图像的预测目标位置;Inputting the sample image into the target detection model to obtain the image features of the sample image output by the target detection model, the predicted target text identifier of the sample image, and the predicted target position of the sample image;
所述第一更新单元404,具体用于:The first updating unit 404 is specifically used for:
根据所述样本图像的预测目标文本标识、所述样本图像的实际目标文本标识、所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型，并返回所述第一预测单元403执行所述将所述样本图像输入目标检测模型，直至达到第一停止条件。updating the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and returning to the first prediction unit 403 to execute the inputting of the sample image into the target detection model until the first stop condition is reached.
基于上述目标检测模型训练装置400的相关内容可知，对于目标检测模型训练装置400来说，先对样本图像的实际目标文本标识进行文本特征提取，得到该样本图像的目标文本特征；再利用该样本图像、该样本图像的目标文本特征和该样本图像的实际目标位置对目标检测模型进行训练，得到训练好的目标检测模型。其中，因样本图像的目标文本特征能够更准确地表示出该样本图像的实际目标文本标识，使得基于该样本图像的目标文本特征训练好的目标检测模型具有更好的目标检测功能，如此有利于提高目标检测性能。Based on the above content of the target detection model training device 400, the device first performs text feature extraction on the actual target text identifier of the sample image to obtain the target text feature of the sample image, and then trains the target detection model using the sample image, the target text feature of the sample image, and the actual target position of the sample image to obtain a trained target detection model. Because the target text feature of the sample image can more accurately represent the actual target text identifier of the sample image, a target detection model trained on that feature has a better target detection function, which helps improve target detection performance.
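The training objective described above combines a position term with an image-text similarity term. The sketch below illustrates one way this could look; the concrete forms (an L1 position error, a cosine similarity, and the `alpha`/`beta` weights) are assumptions for illustration, since the patent does not fix either term.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def training_loss(pred_box, true_box, image_feature, text_feature,
                  alpha=1.0, beta=1.0):
    """Combines a position term (assumed L1 here) with a term that grows as
    the image feature drifts from the target text feature, so minimising the
    loss both improves localisation and aligns the two modalities."""
    position_term = sum(abs(p - t) for p, t in zip(pred_box, true_box))
    alignment_term = 1.0 - cosine_similarity(image_feature, text_feature)
    return alpha * position_term + beta * alignment_term
```

When the predicted box matches the actual box and the image feature points in the same direction as the target text feature, the loss is zero; either a localisation error or an image-text mismatch raises it.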
基于上述方法实施例提供的目标检测方法,本申请实施例还提供了一种目标检测装置,下面结合附图进行解释和说明。Based on the target detection method provided by the above method embodiment, the embodiment of the present application also provides a target detection device, which will be explained and described below with reference to the accompanying drawings.
装置实施例二Device embodiment two
装置实施例二提供的目标检测装置的技术详情,请参照上述方法实施例。For the technical details of the target detection device provided in the second embodiment of the device, please refer to the above method embodiment.
参见图5,该图为本申请实施例提供的一种目标检测装置的结构示意图。Referring to FIG. 5 , this figure is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
本申请实施例提供的目标检测装置500,包括:The target detection device 500 provided in the embodiment of the present application includes:
第二获取单元501,用于获取待检测图像;A second acquiring unit 501, configured to acquire an image to be detected;
目标检测单元502，用于将所述待检测图像输入预先训练的目标检测模型，得到所述目标检测模型输出的所述待检测图像的目标检测结果；其中，所述目标检测模型是利用本申请实施例提供的目标检测模型训练方法的任一实施方式进行训练的。The target detection unit 502 is configured to input the image to be detected into a pre-trained target detection model to obtain the target detection result of the image to be detected output by the target detection model, wherein the target detection model is trained using any implementation of the target detection model training method provided in the embodiments of the present application.
基于上述目标检测装置500的相关内容可知，对于目标检测装置500来说，在获取到待检测图像之后，可以利用已训练好的目标检测模型针对该待检测图像进行目标检测，得到并输出该待检测图像的目标检测结果，以使该待检测图像的目标检测结果能够准确地表示出该待检测图像中目标物体的相关信息（如，目标类别信息以及目标位置信息等）。其中，因已训练好的目标检测模型具有较好的目标检测性能，使得利用该目标检测模型确定的待检测图像的目标检测结果更准确，如此有利于提高目标检测准确性。Based on the above content of the target detection device 500, after acquiring the image to be detected, the device can use the trained target detection model to perform target detection on it, and obtain and output the target detection result of the image, so that the result accurately represents the relevant information of the target object in the image (e.g., target category information and target position information). Since the trained target detection model has good detection performance, the target detection result determined with it is more accurate, which helps improve target detection accuracy.
进一步地,本申请实施例还提供了一种设备,所述设备包括处理器以及存储器:Further, the embodiment of the present application also provides a device, the device includes a processor and a memory:
所述存储器用于存储计算机程序;The memory is used to store computer programs;
所述处理器用于根据所述计算机程序执行本申请实施例提供的目标检测模型训练方法的任一实施方式,或者执行本申请实施例提供的目标检测方法的任一实施方式。The processor is configured to execute any implementation of the target detection model training method provided in the embodiments of the present application according to the computer program, or execute any implementation of the target detection method provided in the embodiments of the present application.
进一步地，本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储计算机程序，所述计算机程序用于执行本申请实施例提供的目标检测模型训练方法的任一实施方式，或者执行本申请实施例提供的目标检测方法的任一实施方式。Further, an embodiment of the present application also provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute any implementation of the target detection model training method provided in the embodiments of the present application, or any implementation of the target detection method provided in the embodiments of the present application.
进一步地,本申请实施例还提供了一种计算机程序产品,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行本申请实施例提供的目标检测模型训练方法的任一实施方式,或者执行本申请实施例提供的目标检测方法的任一实施方式。Furthermore, the embodiment of the present application also provides a computer program product, which, when running on the terminal device, enables the terminal device to execute any implementation manner of the target detection model training method provided in the embodiment of the present application , or execute any implementation of the target detection method provided in the embodiment of the present application.
应当理解，在本申请中，“至少一个(项)”是指一个或者多个，“多个”是指两个或两个以上。“和/或”，用于描述关联对象的关联关系，表示可以存在三种关系，例如，“A和/或B”可以表示：只存在A，只存在B以及同时存在A和B三种情况，其中A，B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达，是指这些项中的任意组合，包括单项(个)或复数项(个)的任意组合。例如，a，b或c中的至少一项(个)，可以表示：a，b，c，“a和b”，“a和c”，“b和c”，或“a和b和c”，其中a，b，c可以是单个，也可以是多个。It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or multiple.
以上所述，仅是本发明的较佳实施例而已，并非对本发明作任何形式上的限制。虽然本发明已以较佳实施例揭露如上，然而并非用以限定本发明。任何熟悉本领域的技术人员，在不脱离本发明技术方案范围情况下，都可利用上述揭示的方法和技术内容对本发明技术方案做出许多可能的变动和修饰，或修改为等同变化的等效实施例。因此，凡是未脱离本发明技术方案的内容，依据本发明的技术实质对以上实施例所做的任何简单修改、等同变化及修饰，均仍属于本发明技术方案保护的范围内。The above descriptions are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above with preferred embodiments, they are not intended to limit it. Without departing from the scope of the technical solution of the present invention, any person skilled in the art may use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments of equivalent changes. Therefore, any simple modification, equivalent change and modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (12)

  1. 一种目标检测模型训练方法,其特征在于,所述方法包括:A method for training a target detection model, characterized in that the method comprises:
    获取样本图像、所述样本图像的实际目标文本标识和所述样本图像的实际目标位置;acquiring a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
    对所述样本图像的实际目标文本标识进行文本特征提取,得到所述样本图像的目标文本特征;Carry out text feature extraction to the actual target text mark of described sample image, obtain the target text feature of described sample image;
    将所述样本图像输入目标检测模型,得到所述目标检测模型输出的所述样本图像的图像特征和所述样本图像的预测目标位置;Inputting the sample image into a target detection model to obtain the image features of the sample image output by the target detection model and the predicted target position of the sample image;
    根据所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型，并继续执行所述将所述样本图像输入目标检测模型的步骤，直至达到第一停止条件。updating the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and continuing to execute the step of inputting the sample image into the target detection model until a first stop condition is reached.
  2. 根据权利要求1所述的方法,其特征在于,所述对所述样本图像的实际目标文本标识进行文本特征提取,得到所述样本图像的目标文本特征,包括:The method according to claim 1, wherein said extracting the text features of the actual target text identifier of the sample image to obtain the target text features of the sample image comprises:
    将所述样本图像的实际目标文本标识输入预先训练的语言模型,得到所述语言模型输出的所述样本图像的目标文本特征。Inputting the actual target text identifier of the sample image into the pre-trained language model to obtain the target text features of the sample image output by the language model.
  3. 根据权利要求1所述的方法,其特征在于,在达到第一停止条件之后,所述方法还包括:The method according to claim 1, wherein after reaching the first stop condition, the method further comprises:
    在获取到新增图像、所述新增图像的实际目标文本标识和所述新增图像的实际目标位置之后，对所述新增图像的实际目标文本标识进行文本特征提取，得到所述新增图像的目标文本特征；所述新增图像的实际目标文本标识不同于所述样本图像的实际目标文本标识；after acquiring the added image, the actual target text identifier of the added image, and the actual target position of the added image, performing text feature extraction on the actual target text identifier of the added image to obtain the target text feature of the added image, where the actual target text identifier of the added image is different from that of the sample image;
    将历史样例图像和所述新增图像输入目标检测模型，得到所述目标检测模型输出的所述历史样例图像的图像特征、所述历史样例图像的预测目标位置、所述新增图像的图像特征和所述新增图像的预测目标位置；其中，所述历史样例图像是根据所述样本图像确定的；inputting the historical sample image and the added image into the target detection model to obtain the image feature of the historical sample image, the predicted target position of the historical sample image, the image feature of the added image, and the predicted target position of the added image output by the target detection model, wherein the historical sample image is determined according to the sample image;
    根据所述历史样例图像的预测目标位置、所述历史样例图像的实际目标位置、所述历史样例图像的图像特征与所述历史样例图像的目标文本特征之间的相似度、所述新增图像的预测目标位置、所述新增图像的实际目标位置、以及所述新增图像的图像特征与所述新增图像的目标文本特征之间的相似度，更新所述目标检测模型，并继续执行所述将所述历史样例图像和所述新增图像输入目标检测模型的步骤，直至达到第二停止条件。updating the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image, and continuing to execute the step of inputting the historical sample image and the added image into the target detection model until a second stop condition is reached.
  4. 根据权利要求3所述的方法,其特征在于,所述历史样例图像的确定过程,包括:The method according to claim 3, wherein the determination process of the historical sample image comprises:
    根据所述样本图像,确定所述目标检测模型对应的训练已使用图像;According to the sample image, determine the training used image corresponding to the target detection model;
    根据所述训练已使用图像的实际目标文本标识，确定至少一个历史目标类别；determining at least one historical target category according to the actual target text identifiers of the training used images;
    根据所述训练已使用图像的实际目标文本标识,从所述目标检测模型对应的训练已使用图像中确定属于各个历史目标类别的训练已使用图像;According to the actual target text identification of the training used image, determine the training used image belonging to each historical target category from the training used image corresponding to the target detection model;
    分别从所述属于各个历史目标类别的训练已使用图像中抽取所述各个历史目标类别对应的历史样例图像。The historical sample images corresponding to the respective historical object categories are respectively extracted from the training used images belonging to the various historical object categories.
  5. 根据权利要求3所述的方法，其特征在于，所述根据所述历史样例图像的预测目标位置、所述历史样例图像的实际目标位置、所述历史样例图像的图像特征与所述历史样例图像的目标文本特征之间的相似度、所述新增图像的预测目标位置、所述新增图像的实际目标位置、以及所述新增图像的图像特征与所述新增图像的目标文本特征之间的相似度，更新所述目标检测模型，包括：The method according to claim 3, wherein the updating the target detection model according to the predicted target position of the historical sample image, the actual target position of the historical sample image, the similarity between the image feature of the historical sample image and the target text feature of the historical sample image, the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image comprises:
    根据所述历史样例图像的预测目标位置、所述历史样例图像的实际目标位置、以及所述历史样例图像的图像特征与所述历史样例图像的目标文本特征之间的相似度，确定历史图像损失值；determining a historical image loss value according to the predicted target position of the historical sample image, the actual target position of the historical sample image, and the similarity between the image feature of the historical sample image and the target text feature of the historical sample image;
    根据所述新增图像的预测目标位置、所述新增图像的实际目标位置、以及所述新增图像的图像特征与所述新增图像的目标文本特征之间的相似度，确定新增图像损失值；determining an added-image loss value according to the predicted target position of the added image, the actual target position of the added image, and the similarity between the image feature of the added image and the target text feature of the added image;
    将所述历史图像损失值和所述新增图像损失值进行加权求和，得到所述目标检测模型的检测损失值；其中，所述历史图像损失值对应的加权权重高于所述新增图像损失值对应的加权权重；performing a weighted summation of the historical image loss value and the added-image loss value to obtain the detection loss value of the target detection model, wherein the weight corresponding to the historical image loss value is higher than the weight corresponding to the added-image loss value;
    根据所述目标检测模型的检测损失值,更新所述目标检测模型。The target detection model is updated according to the detection loss value of the target detection model.
  6. 根据权利要求1所述的方法，其特征在于，所述将所述样本图像输入目标检测模型，得到所述目标检测模型输出的所述样本图像的图像特征和所述样本图像的预测目标位置，包括：The method according to claim 1, wherein the inputting the sample image into the target detection model to obtain the image feature of the sample image and the predicted target position of the sample image output by the target detection model comprises:
    将所述样本图像输入目标检测模型，得到所述目标检测模型输出的所述样本图像的图像特征、所述样本图像的预测目标文本标识和所述样本图像的预测目标位置；inputting the sample image into the target detection model to obtain the image feature of the sample image, the predicted target text identifier of the sample image, and the predicted target position of the sample image output by the target detection model;
    所述根据所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型，包括：the updating the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image comprises:
    根据所述样本图像的预测目标文本标识、所述样本图像的实际目标文本标识、所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型。updating the target detection model according to the predicted target text identifier of the sample image, the actual target text identifier of the sample image, the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image.
  7. 一种目标检测方法,其特征在于,所述方法包括:A target detection method, characterized in that the method comprises:
    获取待检测图像;Obtain the image to be detected;
    将所述待检测图像输入预先训练的目标检测模型，得到所述目标检测模型输出的所述待检测图像的目标检测结果；其中，所述目标检测模型是利用权利要求1-6任一项所述的目标检测模型训练方法进行训练的。inputting the image to be detected into a pre-trained target detection model to obtain the target detection result of the image to be detected output by the target detection model, wherein the target detection model is trained using the target detection model training method according to any one of claims 1-6.
  8. 一种目标检测模型训练装置,其特征在于,所述装置包括:A target detection model training device, characterized in that the device comprises:
    第一获取单元,用于获取样本图像、所述样本图像的实际目标文本标识和所述样本图像的实际目标位置;A first acquiring unit, configured to acquire a sample image, an actual target text identifier of the sample image, and an actual target position of the sample image;
    第一提取单元,用于对所述样本图像的实际目标文本标识进行文本特征提取,得到所述样本图像的目标文本特征;The first extraction unit is used to extract the text features of the actual target text identifier of the sample image to obtain the target text features of the sample image;
    第一预测单元,用于将所述样本图像输入目标检测模型,得到所述目标检测模型输出的所述样本图像的图像特征和所述样本图像的预测目标位置;a first prediction unit, configured to input the sample image into a target detection model, and obtain the image features of the sample image output by the target detection model and the predicted target position of the sample image;
    第一更新单元，用于根据所述样本图像的预测目标位置、所述样本图像的实际目标位置、以及所述样本图像的图像特征与所述样本图像的目标文本特征之间的相似度，更新所述目标检测模型，并返回所述第一预测单元执行所述将所述样本图像输入目标检测模型，直至达到第一停止条件。a first updating unit, configured to update the target detection model according to the predicted target position of the sample image, the actual target position of the sample image, and the similarity between the image feature of the sample image and the target text feature of the sample image, and to return to the first prediction unit to execute the inputting of the sample image into the target detection model until a first stop condition is reached.
  9. 一种目标检测装置,其特征在于,所述装置包括:A target detection device, characterized in that the device comprises:
    第二获取单元,用于获取待检测图像;a second acquiring unit, configured to acquire an image to be detected;
    目标检测单元，用于将所述待检测图像输入预先训练的目标检测模型，得到所述目标检测模型输出的所述待检测图像的目标检测结果；其中，所述目标检测模型是利用权利要求1-6任一项所述的目标检测模型训练方法进行训练的。a target detection unit, configured to input the image to be detected into a pre-trained target detection model to obtain the target detection result of the image to be detected output by the target detection model, wherein the target detection model is trained using the target detection model training method according to any one of claims 1-6.
  10. 一种设备,其特征在于,所述设备包括处理器以及存储器:A device, characterized in that the device includes a processor and a memory:
    所述存储器用于存储计算机程序;The memory is used to store computer programs;
    所述处理器用于根据所述计算机程序执行权利要求1-6中任一项所述的目标检测模型训练方法,或者执行权利要求7所述的目标检测方法。The processor is configured to execute the target detection model training method according to any one of claims 1-6, or execute the target detection method according to claim 7 according to the computer program.
  11. 一种计算机可读存储介质，其特征在于，所述计算机可读存储介质用于存储计算机程序，所述计算机程序用于执行权利要求1-6中任一项所述的目标检测模型训练方法，或者执行权利要求7所述的目标检测方法。A computer-readable storage medium, wherein the computer-readable storage medium is configured to store a computer program, and the computer program is used to execute the target detection model training method according to any one of claims 1-6, or to execute the target detection method according to claim 7.
  12. 一种计算机程序产品，其特征在于，所述计算机程序产品在终端设备上运行时，使得所述终端设备执行权利要求1-6中任一项所述的目标检测模型训练方法，或者执行权利要求7所述的目标检测方法。A computer program product, wherein, when the computer program product runs on a terminal device, the terminal device is caused to execute the target detection model training method according to any one of claims 1-6, or to execute the target detection method according to claim 7.
PCT/CN2022/089194 2021-06-28 2022-04-26 Target detection model training method and target detection method, and related device therefor WO2023273570A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110723057.4 2021-06-28
CN202110723057.4A CN113469176B (en) 2021-06-28 2021-06-28 Target detection model training method, target detection method and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2023273570A1 true WO2023273570A1 (en) 2023-01-05

Family

ID=77873458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089194 WO2023273570A1 (en) 2021-06-28 2022-04-26 Target detection model training method and target detection method, and related device therefor

Country Status (2)

Country Link
CN (1) CN113469176B (en)
WO (1) WO2023273570A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469176B (en) * 2021-06-28 2023-06-02 北京有竹居网络技术有限公司 Target detection model training method, target detection method and related equipment thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109191453A (en) * 2018-09-14 2019-01-11 北京字节跳动网络技术有限公司 Method and apparatus for generating image category detection model
CN111860573A (en) * 2020-06-04 2020-10-30 北京迈格威科技有限公司 Model training method, image class detection method and device and electronic equipment
CN112560999A (en) * 2021-02-18 2021-03-26 成都睿沿科技有限公司 Target detection model training method and device, electronic equipment and storage medium
CN112926654A (en) * 2021-02-25 2021-06-08 平安银行股份有限公司 Pre-labeling model training and certificate pre-labeling method, device, equipment and medium
US20210192180A1 (en) * 2018-12-05 2021-06-24 Tencent Technology (Shenzhen) Company Limited Method for training object detection model and target object detection method
CN113469176A (en) * 2021-06-28 2021-10-01 北京有竹居网络技术有限公司 Target detection model training method, target detection method and related equipment thereof

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837856B (en) * 2019-10-31 2023-05-30 深圳市商汤科技有限公司 Neural network training and target detection method, device, equipment and storage medium
CN112861917B (en) * 2021-01-14 2021-12-28 西北工业大学 Weak supervision target detection method based on image attribute learning
CN113033660B (en) * 2021-03-24 2022-08-02 支付宝(杭州)信息技术有限公司 Universal language detection method, device and equipment

Also Published As

Publication number Publication date
CN113469176A (en) 2021-10-01
CN113469176B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN109582793B (en) Model training method, customer service system, data labeling system and readable storage medium
TWI752455B (en) Image classification model training method, image processing method, data classification model training method, data processing method, computer device, and storage medium
CN109189767B (en) Data processing method and device, electronic equipment and storage medium
CN110046706B (en) Model generation method and device and server
CN111079847B (en) Remote sensing image automatic labeling method based on deep learning
WO2023115761A1 (en) Event detection method and apparatus based on temporal knowledge graph
CN109165309B (en) Negative example training sample acquisition method and device and model training method and device
WO2022048194A1 (en) Method, apparatus and device for optimizing event subject identification model, and readable storage medium
JP6892606B2 (en) Positioning device, position identification method and computer program
CN112149420A (en) Entity recognition model training method, threat information entity extraction method and device
CN111160959B (en) User click conversion prediction method and device
CN110458022B (en) Autonomous learning target detection method based on domain adaptation
CN110909784A (en) Training method and device of image recognition model and electronic equipment
WO2023273570A1 (en) Target detection model training method and target detection method, and related device therefor
WO2023273572A1 (en) Feature extraction model construction method and target detection method, and device therefor
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
JP2019067299A (en) Label estimating apparatus and label estimating program
CN111539456A (en) Target identification method and device
CN108428234B (en) Interactive segmentation performance optimization method based on image segmentation result evaluation
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN111368792B (en) Feature point labeling model training method and device, electronic equipment and storage medium
CN115063858A (en) Video facial expression recognition model training method, device, equipment and storage medium
CN114021658A (en) Training method, application method and system of named entity recognition model
CN112990145B (en) Group-sparse-based age estimation method and electronic equipment
CN112069800A (en) Sentence tense recognition method and device based on dependency syntax and readable storage medium

Legal Events

Date Code Title Description

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 22831399
Country of ref document: EP
Kind code of ref document: A1