WO2020006961A1 - Image extraction method and device


Info

Publication number
WO2020006961A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
matched
feature vector
position information
training
Application number
PCT/CN2018/116334
Other languages
French (fr)
Chinese (zh)
Inventor
周恺卉
王长虎
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2020006961A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for extracting an image.
  • Using image recognition models to identify images is a common technique in image recognition technology.
  • An image recognition model is usually a model that is trained using a large number of training samples.
  • In order for an image recognition model to recognize a target image (such as a watermark image, a person image, or an object image) within an image, it usually needs to be trained on sample images that contain the target image.
  • the embodiments of the present application provide a method and a device for extracting an image.
  • an embodiment of the present application provides a method for extracting an image.
  • the method includes: obtaining a reference object image and a set of images to be matched; and inputting the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector.
  • For each to-be-matched image in the to-be-matched image set, the following extraction steps are performed: input the to-be-matched image into a second sub-network included in the image recognition model to obtain at least one piece of position information and a to-be-matched feature vector corresponding to each piece of position information, where the to-be-matched feature vector is the feature vector of a region image included in the to-be-matched image, and the position information is used to characterize the position of that region image in the to-be-matched image; determine the distance between each obtained to-be-matched feature vector and the reference feature vector; and, in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extract the to-be-matched image as an image matching the reference object image.
  • the extracting step further includes: determining position information of an area image corresponding to a distance less than or equal to a distance threshold, and outputting the determined position information.
  • the extracting step further includes: generating a matched image including a position marker based on the output position information and the to-be-matched image, where the position marker is used to mark the position, in the matched image, of the region image corresponding to the output position information.
  • the second sub-network includes a dimension transformation layer for transforming feature vectors to a target dimension; and inputting the to-be-matched image into the second sub-network included in the image recognition model to obtain at least one to-be-matched feature vector includes: inputting the to-be-matched image into the second sub-network included in the image recognition model to obtain at least one to-be-matched feature vector having the same dimension as the reference feature vector.
  • the image recognition model is obtained by training through the following steps: obtaining a training sample set, where each training sample includes a sample object image, a sample matching image, and labeled position information of the sample matching image, the labeled position information characterizing the position of a region image included in the sample matching image; selecting a training sample from the training sample set, and performing the following training steps: inputting the sample object image included in the selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing the target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as the target second feature vector; determining, based on a first loss value representing the error of the target position information and a second loss value representing the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete; and, in response to determining that training is complete, determining the initial model as the image recognition model.
  • determining whether training of the initial model is complete based on the first loss value representing the error of the target position information and the second loss value representing the distance between the target second feature vector and the first feature vector includes: using, according to preset weight values, the weighted summation of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining whether training of the initial model is complete according to the comparison result.
  • the steps for training the image recognition model further include: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting a training sample from the unselected training samples in the training sample set, using the adjusted initial model as the initial model, and continuing the training steps.
  • an embodiment of the present application provides an apparatus for extracting an image.
  • the apparatus includes: an acquiring unit configured to acquire a reference object image and a set of images to be matched; and a generating unit configured to input the reference object image into a first sub-network included in a pre-trained image recognition model and obtain a feature vector of the reference object image as a reference feature vector;
  • and an extraction unit configured to perform the following extraction steps on the to-be-matched images in the to-be-matched image set:
  • input the to-be-matched image into a second sub-network included in the image recognition model to obtain at least one piece of position information and a to-be-matched feature vector corresponding to the position information, where the to-be-matched feature vector is the feature vector of a region image included in the to-be-matched image, and the position information characterizes the position of that region image in the to-be-matched image; determine the distance between each obtained to-be-matched feature vector and the reference feature vector; and, in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extract the to-be-matched image as an image matching the reference object image.
  • the extraction unit includes: an output module configured to determine position information of a region image corresponding to a distance less than or equal to a distance threshold, and output the determined position information.
  • the extraction unit further includes a generating module configured to generate a matched image including a position marker based on the output position information and the to-be-matched image, where the position marker is used to mark the position, in the matched image, of the region image corresponding to the output position information.
  • the second sub-network includes a dimension transformation layer for transforming feature vectors to a target dimension; and the extraction unit is further configured to: input the to-be-matched image into the second sub-network included in the image recognition model to obtain at least one to-be-matched feature vector having the same dimension as the reference feature vector.
  • the image recognition model is obtained by training through the following steps: obtaining a training sample set, where each training sample includes a sample object image, a sample matching image, and labeled position information of the sample matching image, the labeled position information characterizing the position of a region image included in the sample matching image; selecting a training sample from the training sample set, and performing the following training steps: inputting the sample object image included in the selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing the target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as the target second feature vector; determining, based on a first loss value representing the error of the target position information and a second loss value representing the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete; and, in response to determining that training is complete, determining the initial model as the image recognition model.
  • determining whether training of the initial model is complete based on the first loss value representing the error of the target position information and the second loss value representing the distance between the target second feature vector and the first feature vector includes: using, according to preset weight values, the weighted summation of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining whether training of the initial model is complete according to the comparison result.
  • the steps for training the image recognition model further include: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting a training sample from the unselected training samples in the training sample set, using the adjusted initial model as the initial model, and continuing the training steps.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
  • an embodiment of the present application provides a computer-readable medium having stored thereon a computer program that, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
  • The method and apparatus for extracting an image provided by the embodiments of the present application use a pre-trained image recognition model to obtain a reference feature vector of a reference object image and at least one to-be-matched feature vector of an image to be matched, and then compare the distance between the reference feature vector and each to-be-matched feature vector to obtain an image matching the reference object image. In this way, even when the training samples used to train the image recognition model do not include the reference object image, the image recognition model can still be used to extract images that match it, which improves the flexibility of image recognition and enriches the means of image recognition.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for extracting an image according to the present application
  • FIG. 3 is a flowchart of the training steps for obtaining an image recognition model in the method for extracting an image according to the present application;
  • FIG. 4 is a schematic diagram of an application scenario of a method for extracting an image according to the present application
  • FIG. 5 is a flowchart of still another embodiment of a method for extracting an image according to the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for extracting an image according to the present application.
  • FIG. 7 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 to which a method for extracting an image or an apparatus for extracting an image of an embodiment of the present application can be applied.
  • the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • the network 104 is a medium for providing a communication link between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as image processing applications, shooting applications, social platform software, and the like.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • If the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with a display screen, including but not limited to smart phones, tablet computers, laptop computers, and desktop computers.
  • If the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services), or as a single piece of software or a single software module. This is not specifically limited here.
  • the server 105 may be a server that provides various services, such as a background server that provides support for various applications on the terminal devices 101, 102, and 103.
  • the background server may perform processing such as analysis on the acquired image, and output the processing result (for example, the extracted image matching the reference image).
  • the method for extracting an image provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103.
  • the device for extracting an image may be provided in the server 105 or in the terminal devices 101, 102, 103.
  • the server may be hardware or software.
  • If the server is hardware, it can be implemented as a distributed server cluster consisting of multiple servers or as a single server.
  • If the server is software, it can be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services), or as a single piece of software or a single software module. This is not specifically limited here.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • a flowchart 200 of one embodiment of a method for extracting an image according to the present application is shown.
  • the method for extracting an image includes the following steps:
  • Step 201 Obtain a reference object image and a set of images to be matched.
  • an execution subject of the method for extracting an image may obtain a reference object image and a set of images to be matched remotely or locally through a wired connection or a wireless connection.
  • the reference object image may be an image to be compared with other images, and the reference object image may be an image characterizing an object.
  • Objects can be various things, such as watermarks, signs, faces, objects, and so on.
  • the set of images to be matched may be a set of certain types of images (for example, images containing a trademark) stored in advance.
  • Step 202 Input a reference object image into a first sub-network included in a pre-trained image recognition model, and obtain a feature vector of the reference object image as a reference feature vector.
  • the execution subject may input the reference object image into a first sub-network included in a pre-trained image recognition model, and obtain a feature vector of the reference object image as the reference feature vector.
  • the first sub-network is used to characterize the correspondence between the image and the feature vector of the image.
  • the image recognition model may be various neural network models created based on machine learning technology.
  • the neural network model may have a structure of various neural networks (for example, DenseBox, VGGNet, ResNet, SegNet, etc.).
  • the above reference feature vector may be a vector, extracted by the first sub-network included in the neural network model (for example, a network composed of one or more convolutional layers included in the neural network model), that characterizes features of the image (for example, shape, color, texture, and other characteristics).
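  • As an illustrative aside (not part of the original disclosure), the sketch below shows one way such a first sub-network could be realized in Python: a convolutional backbone truncated before its classification head serves as the feature extractor. The choice of torchvision's ResNet-18 is an assumption for illustration only.

```python
# Hypothetical sketch of a "first sub-network": a convolutional backbone
# truncated before its classification head, used as a feature extractor.
# ResNet-18 is an illustrative assumption, not the patent's architecture.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)  # untrained backbone, for shape illustration
first_subnetwork = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer

reference_image = torch.randn(1, 3, 224, 224)  # stand-in for a reference object image
with torch.no_grad():
    reference_feature = first_subnetwork(reference_image).flatten(1)  # shape: (1, 512)
```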
  • Step 203 For the to-be-matched images in the to-be-matched image set, perform the following extraction steps: input the to-be-matched images into a second sub-network included in the image recognition model to obtain at least one position information and a feature vector to be matched corresponding to the location information; Determine the distance between the obtained feature vector to be matched and the reference feature vector; and in response to determining that there is a distance less than or equal to a preset distance threshold in the determined distance, extract the to-be-matched image as an image matching the reference object image.
  • the execution subject may perform the following extraction step on the to-be-matched image:
  • Step 2031 Input the image to be matched into the second sub-network included in the image recognition model, and obtain at least one location information and a feature vector to be matched corresponding to the location information.
  • the second sub-network is used to characterize the correspondence between the image and the position information of the image and the feature vector to be matched of the image.
  • the position information is used to characterize the position of the area image corresponding to the feature vector to be matched in the to-be-matched image.
  • the feature vector to be matched is a feature vector of an area image included in the image to be matched.
  • the second sub-network may determine, from the to-be-matched image and according to the determined at least one piece of position information, the region image characterized by each piece of position information, and then determine the feature vector of each region image.
  • the area image may be an image characterizing an object (eg, a watermark, a logo, etc.).
  • the location information may include coordinate information and identification information. Among them, the coordinate information (such as the coordinates of the corner points of the regional image, the size of the regional image, etc.) is used to indicate the position of the regional image in the image to be matched, and the identification information (such as the serial number of the regional image and the category of the regional image) is used to identify the area image.
  • As an example, if a to-be-matched image includes two watermark images, the position information determined by the second sub-network may be "(1, x1, y1, w1, h1)" and "(2, x2, y2, w2, h2)", where 1 and 2 are the serial numbers of the two watermark images, (x1, y1) and (x2, y2) are the coordinates of the upper-left corners of the two watermark images, w1 and w2 are their widths, and h1 and h2 are their heights, respectively.
  • Then, the execution subject can extract feature vectors of the to-be-matched image, and extract the feature vectors corresponding to the two pieces of position information from them as the to-be-matched feature vectors.
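  • A minimal sketch of this position-information format (serial number, upper-left corner, width, height) follows; the PositionInfo type and the concrete coordinate values are hypothetical placeholders, not identifiers from the disclosure.

```python
# Hypothetical container for the position information format described above:
# (serial number, x of upper-left corner, y of upper-left corner, width, height).
from typing import NamedTuple

class PositionInfo(NamedTuple):
    serial: int
    x: float
    y: float
    w: float
    h: float

# Two watermark regions in one to-be-matched image, as in the example above.
positions = [PositionInfo(1, 10.0, 12.0, 64.0, 32.0),
             PositionInfo(2, 150.0, 40.0, 80.0, 30.0)]
for p in positions:
    print(f"region {p.serial}: upper-left=({p.x}, {p.y}), size={p.w}x{p.h}")
```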
  • the second sub-network may be a neural network based on an existing target detection network (for example, SSD (Single Shot MultiBox Detector), R-CNN (Region-based Convolutional Neural Networks), Faster R-CNN, etc.).
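  • For concreteness, here is a hedged sketch of how an off-the-shelf detector could supply the boxes that play the role of the position information; the use of torchvision's Faster R-CNN is an assumption for illustration, not a statement about the patent's network.

```python
# Sketch: an existing detection network (torchvision's Faster R-CNN, chosen
# purely for illustration) returns boxes, labels, and scores per image that
# could serve as the position information described above.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights=None)  # untrained, for interface illustration
detector.eval()

image = torch.randn(3, 480, 640)  # stand-in for a to-be-matched image
with torch.no_grad():
    outputs = detector([image])   # list with one dict per input image
boxes = outputs[0]["boxes"]       # (N, 4) tensor of x1, y1, x2, y2 box corners
```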
  • the foregoing second sub-network includes a dimension transformation layer for transforming a feature vector to a target dimension.
  • the dimension transformation layer may process the feature vector (for example, the values of some dimensions included in the feature vector are combined by taking an average value); it may also be a pooling layer included in the second sub-network.
  • the pooling layer can be used to down-sample or up-sample the input data to compress or increase the amount of data.
  • the above target dimensions may be various dimensions set by a technician, for example, the same dimensions as those of the reference feature vector.
  • the execution subject may input the to-be-matched image into the second sub-network included in the image recognition model; the second sub-network extracts at least one feature vector of the to-be-matched image, and the dimension transformation layer included in the second sub-network then performs dimension transformation on each extracted feature vector to obtain at least one to-be-matched feature vector having the same dimension as the reference feature vector.
  • As an example, an ROI (Region of Interest) Pooling layer can be used, so that each to-be-matched feature vector has the same dimension as the reference feature vector.
  • the ROI Pooling layer is a well-known technology that is widely studied and applied at present, and will not be repeated here.
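  • As a hedged illustration of the dimension transformation, the sketch below uses torchvision's roi_align as one concrete stand-in for an ROI Pooling layer; the feature-map shape, box coordinates, and output size are placeholders.

```python
# Sketch of the dimension transformation: ROI pooling crops each region from
# a feature map and resizes it to a fixed target size, so every to-be-matched
# feature vector ends up with the same dimension.
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)  # backbone output for one image (placeholder)
# Each box is (batch index, x1, y1, x2, y2) in feature-map coordinates.
boxes = torch.tensor([[0.0, 4.0, 4.0, 20.0, 12.0],
                      [0.0, 30.0, 10.0, 45.0, 22.0]])
pooled = roi_align(feature_map, boxes, output_size=(7, 7))  # (2, 256, 7, 7)
vectors = pooled.flatten(1)  # two fixed-length vectors of dimension 256 * 7 * 7
```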
  • Step 2032 Determine the distance between the obtained feature vector to be matched and the reference feature vector.
  • the execution subject may determine the distance between each of the obtained at least one to-be-matched feature vector and the reference feature vector.
  • the distance may be any of the following: Euclidean distance, Mahalanobis distance, and the like.
  • the preset distance may be any value greater than or equal to 0.
  • the distance can represent the degree of similarity between two feature vectors, that is, the degree of similarity between two images. As an example, the larger the distance between two feature vectors, the less similar the images corresponding to those feature vectors.
  • Step 2033 In response to determining that a distance smaller than or equal to a preset distance threshold exists in the determined distance, extract the image to be matched as an image matching the reference object image.
  • the distance threshold may be a value set by a technician according to experience, or may be a value calculated by the execution subject based on historical data (for example, by averaging recorded historical distance thresholds). Specifically, if a distance smaller than or equal to the distance threshold exists among the determined distances, it indicates that a region image similar to the reference object image exists in the to-be-matched image, that is, the to-be-matched image matches the reference object image.
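  • Steps 2032 and 2033 together amount to a nearest-distance test. The following minimal sketch illustrates the idea; the vector values and the threshold are placeholders, and Euclidean distance is one of the options named above.

```python
# Sketch of steps 2032-2033: compute the Euclidean distance between each
# to-be-matched feature vector and the reference feature vector, then keep
# the image if any distance falls within the preset threshold.
import numpy as np

reference = np.array([0.2, 0.8, 0.5])       # reference feature vector (placeholder)
to_match = [np.array([0.21, 0.79, 0.52]),   # close to the reference
            np.array([0.90, 0.10, 0.30])]   # far from the reference
distance_threshold = 0.1                    # preset threshold (placeholder)

distances = [np.linalg.norm(v - reference) for v in to_match]
if any(d <= distance_threshold for d in distances):
    print("to-be-matched image extracted as a match for the reference object image")
```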
  • the image recognition model can be obtained by training in advance through the following steps:
  • Step 301 Obtain a training sample set.
  • the training samples include sample object images, sample matching images, and labeled position information of the sample matching images.
  • the labeled position information characterizes the position of a region image included in the sample matching image.
  • the sample object image may be an image characterizing an object (for example, a watermark, a logo, a human face, an object, etc.).
  • the number of pieces of labeled position information may be at least one, each piece of labeled position information may correspond to a region image, and each such region image characterizes the same object as the sample object image.
  • Step 302 Select training samples from the training sample set.
  • the selection method and the number of training samples are not limited in this application.
  • the training samples may be selected from the training sample set randomly, or in order of the serial numbers of the training samples.
  • Step 303 Input the sample object image included in the selected training sample into the first sub-network included in the initial model to obtain a first feature vector, and input the sample matching image into the second sub-network included in the initial model to obtain at least one piece of position information and the second feature vector corresponding to the position information.
  • the initial model may be various existing neural network models created based on machine learning technology.
  • the neural network model may have various existing neural network structures (for example, DenseBox, VGGNet, ResNet, SegNet, etc.).
  • Each of the above feature vectors may be a vector composed of data extracted from certain layers (such as a convolution layer) included in the neural network model.
  • the above-mentioned first sub-network and second sub-network are the same as the first sub-network and the second sub-network described in step 202 and step 203, respectively, and details are not described herein again.
  • Step 304 From the obtained at least one position information, determine position information representing the target region image in the sample matching image as the target position information, and determine a second feature vector corresponding to the target position information as the target second feature vector.
  • the above-mentioned target area image may be an area image in which the characterized object is the same as the object characterized by the sample object image.
  • the execution subject performing this step may use position information specified by a technician as the target position information, the region image characterized by that position information as the target region image, and the second feature vector of the target region image as the target second feature vector; alternatively, the execution subject may, according to the obtained position information, determine the similarity between the region image corresponding to each piece of position information and the sample object image, determine the region image with the greatest similarity to the sample object image as the target region image, determine the position information of the target region image as the target position information, and determine the second feature vector of the target region image as the target second feature vector.
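  • A brief sketch of the second option (picking the candidate region closest to the sample object image in feature space) follows; the vectors are placeholders, and distance is used as an inverse proxy for similarity.

```python
# Sketch of one option in step 304: among the candidate second feature
# vectors, take the one closest to the first feature vector as the target
# second feature vector. All values are illustrative placeholders.
import numpy as np

first_vector = np.array([0.30, 0.60, 0.10])
second_vectors = [np.array([0.80, 0.20, 0.50]),
                  np.array([0.31, 0.58, 0.12])]
target_index = int(np.argmin([np.linalg.norm(v - first_vector)
                              for v in second_vectors]))
target_second_vector = second_vectors[target_index]  # the most similar region
```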
  • Step 305 Determine whether training of the initial model is complete based on a first loss value representing the error of the target position information and a second loss value representing the distance between the target second feature vector and the first feature vector.
  • the first loss value may represent a gap between the target position information and the labeled position information corresponding to the target area image.
  • the smaller the first loss value, the smaller the difference between the target position information and the labeled position information corresponding to the target region image, that is, the closer the target position information is to the labeled position information.
  • the first loss value can be obtained according to any of the following loss functions: Softmax loss function, Smooth L1 (smooth L1 norm) loss function, and the like.
  • the second loss value may represent a distance between the target second feature vector and the first feature vector.
  • the larger the second loss value, the greater the distance between the target second feature vector and the first feature vector, that is, the less similar the target region image and the sample object image.
  • the second loss value may be a distance between the target second feature vector and the first feature vector (eg, Euclidean distance, Mahalanobis distance, etc.).
  • the second loss value can be obtained by the Triplet loss function, where the Triplet loss function is as follows:

    $$L = \sum_{i}\left[\,\left\|f(x_i^{a}) - f(x_i^{p})\right\|_2^2 - \left\|f(x_i^{a}) - f(x_i^{n})\right\|_2^2 + \mathit{threshold}\,\right]_{+}$$

  • where $L$ is the second loss value; $\sum$ is the summation sign; $i$ is the serial number of each training sample selected this time; $a$ denotes the sample object image; $p$ denotes the positive sample image (that is, the target region image); and $n$ denotes a negative sample image (that is, a region image in the sample matching image other than the target region image, or a preset image characterizing an object different from the object characterized by the sample object image).
  • $f(x_i^{a})$ denotes the feature vector of the sample object image included in the training sample with serial number $i$, $f(x_i^{p})$ denotes the feature vector of the positive sample image (for example, the target region image) corresponding to the training sample with serial number $i$, and $f(x_i^{n})$ denotes the feature vector of the negative sample image (for example, a region image other than the target region image in the sample matching image) corresponding to the training sample with serial number $i$.
  • $\mathit{threshold}$ denotes a preset distance; $\|f(x_i^{a}) - f(x_i^{p})\|_2^2$ is the first distance, that is, the distance between the first feature vector and the feature vector of the positive sample image, and $\|f(x_i^{a}) - f(x_i^{n})\|_2^2$ is the second distance, that is, the distance between the first feature vector and the feature vector of the negative sample image.
  • the "+" subscript at the lower right of the square brackets means taking the positive part: when the value of the expression inside the brackets is positive, that value is taken; when it is negative, 0 is taken.
  • During training, the parameters of the initial model can be adjusted according to the back-propagation algorithm so that the value of L is minimized or converges, indicating that training is complete.
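  • As a hedged illustration, PyTorch's built-in TripletMarginLoss implements a margin-based triplet objective of the same general shape as the formula above; note that it uses plain (non-squared) L2 distances by default, so it is a stand-in rather than an exact reproduction of the formula.

```python
# Sketch: a margin-based triplet loss over batches of feature vectors.
# The margin plays the role of "threshold" in the formula above.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)
anchor   = torch.randn(8, 128, requires_grad=True)  # first feature vectors (sample objects)
positive = torch.randn(8, 128)                      # target region feature vectors
negative = torch.randn(8, 128)                      # non-target region feature vectors

second_loss = triplet_loss(anchor, positive, negative)
second_loss.backward()  # gradients used by back-propagation to adjust parameters
```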
  • the execution subject performing this step may obtain a total loss value based on the first loss value and the second loss value, compare the total loss value with a target value, and determine whether training of the initial model is complete according to the comparison result.
  • the target value may be a preset loss value threshold. When the total loss value is less than or equal to the target value, it is determined that training is complete.
  • Specifically, the execution subject performing this step may, according to preset weight values, use the weighted summation of the first loss value and the second loss value as the total loss value, compare the total loss value with the target value, and determine whether training of the initial model is complete based on the comparison result.
  • the above weight values can adjust the proportion of the first loss value and the second loss value in the total loss value, so that the image recognition model can serve different purposes in different application scenarios (for example, some scenarios focus on extracting position information, while others focus on comparing distances between feature vectors).
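  • A minimal sketch of this weighted-sum criterion follows; the weights, loss values, and target value are placeholders rather than values from the disclosure.

```python
# Sketch of step 305's criterion: combine the two loss values with preset
# weights and compare the total against a target value.
w_position, w_distance = 1.0, 0.5     # preset weight values (placeholders)
first_loss, second_loss = 0.12, 0.30  # position error and triplet loss (placeholders)
target_value = 0.20

total_loss = w_position * first_loss + w_distance * second_loss
training_complete = total_loss <= target_value
print(f"total loss {total_loss:.3f}, training complete: {training_complete}")
```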
  • Step 306 In response to determining that the training is completed, determine the initial model as an image recognition model.
  • In addition, the execution subject that trains the image recognition model may, in response to determining that training of the initial model is not complete, adjust the parameters of the initial model, select a training sample from the unselected training samples in the training sample set, use the parameter-adjusted initial model as the initial model, and continue the training steps.
  • As an example, if the initial model is a convolutional neural network, the back-propagation algorithm can be used to adjust the weights in each convolutional layer in the initial model.
  • a training sample may be selected from the unselected training samples in the training sample set, and the initial model adjusted by parameters is used as the initial model, and steps 303 to 306 are continuously performed.
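  • Pulling steps 302 through 306 together, the following high-level training-loop sketch is offered purely as an illustration; the model interface (first_subnetwork, second_subnetwork, pick_target, position_loss, distance_loss) and the weighting are hypothetical placeholders, not components named in the disclosure.

```python
# Hypothetical high-level sketch of the training loop (steps 302-306).
# Every attribute of `model` below is a placeholder for the components
# described in the text, not an API from the patent.
import torch

def train(model, sample_batches, optimizer, target_value, w_distance=0.5):
    for object_img, match_img, labeled_pos in sample_batches:
        first_vec = model.first_subnetwork(object_img)                # step 303
        positions, second_vecs = model.second_subnetwork(match_img)
        tgt_pos, tgt_vec = model.pick_target(positions, second_vecs,
                                             first_vec)               # step 304
        loss1 = model.position_loss(tgt_pos, labeled_pos)             # first loss value
        loss2 = model.distance_loss(tgt_vec, first_vec)               # second loss value
        total = loss1 + w_distance * loss2                            # step 305
        if total.item() <= target_value:
            return model                                              # step 306: done
        optimizer.zero_grad()
        total.backward()              # adjust parameters via back-propagation
        optimizer.step()
    return model
```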
  • the execution subject that trains the image recognition model may be the same as or different from the execution subject of the method for extracting an image. If they are the same, the execution subject may store the structure information and parameter values of the trained image recognition model locally after training. If they are different, the execution subject that trains the image recognition model may send the structure information and parameter values of the trained model to the execution subject of the method for extracting an image after training is complete.
  • FIG. 4 is a schematic diagram of an application scenario of a method for extracting an image according to this embodiment.
  • the server 401 first obtains a watermark image 402 (ie, a reference object image) uploaded by the terminal device 408, and obtains a set of images 403 to be matched locally.
  • the server 401 inputs the watermark image 402 into the first sub-network 4041 included in the pre-trained image recognition model 404, and obtains a feature vector of the watermark image 402 as a reference feature vector 405.
  • Then, the server 401 selects a to-be-matched image 4031 from the to-be-matched image set 403, inputs the to-be-matched image 4031 into the second sub-network 4042 included in the image recognition model 404, and obtains position information 4061, 4062, and 4063 and the corresponding to-be-matched feature vectors 4071, 4072, and 4073, which are the feature vectors of the watermark images 40311, 40312, and 40313 included in the to-be-matched image 4031, respectively.
  • the server 401 determines that the distance between the feature vector to be matched 4071 and the reference feature vector 405 is less than or equal to a preset distance threshold, extracts the to-be-matched image 4031 as an image matching the reference object image, and sends the matched image to the terminal device 408.
  • the server 401 repeatedly selects the image to be matched from the set of images to be matched 403 and the watermarked image 402 to match, thereby extracting multiple images from the set of images to be matched 403 that match the watermarked image 402.
  • In the method provided by the above embodiment of the present application, a pre-trained image recognition model is used to obtain a reference feature vector of a reference object image and at least one to-be-matched feature vector of an image to be matched, and the distance between the reference feature vector and each to-be-matched feature vector is then compared to obtain an image matching the reference object image. This improves the pertinence of matching against the reference image and makes it possible, even when the training samples used to train the image recognition model do not include the reference image, to use the model to extract images that match the reference image, which improves the flexibility of image recognition and enriches the means of image recognition.
  • a flowchart 500 of still another embodiment of a method for extracting an image is shown.
  • the process 500 of the method for extracting an image includes the following steps:
  • Step 501 Obtain a reference object image and a set of images to be matched.
  • step 501 is substantially the same as step 201 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 502 Input a reference object image into a first sub-network included in a pre-trained image recognition model, and obtain a feature vector of the reference object image as a reference feature vector.
  • step 502 is substantially the same as step 202 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 503 For the to-be-matched images in the to-be-matched image set, perform the following extraction steps: input the to-be-matched image into a second sub-network included in the image recognition model to obtain at least one piece of position information and a to-be-matched feature vector corresponding to the position information; determine the distance between each obtained to-be-matched feature vector and the reference feature vector; in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extract the to-be-matched image as an image matching the reference object image; determine the position information of the region image corresponding to a distance less than or equal to the distance threshold, and output the determined position information.
  • the execution subject may perform the following extraction step on the to-be-matched image:
  • Step 5031 Input the image to be matched into the second sub-network included in the image recognition model to obtain at least one position information and a feature vector to be matched corresponding to the position information.
  • Step 5031 is basically the same as step 2031 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 5032 Determine the distance between the obtained feature vector to be matched and the reference feature vector. Step 5032 is basically the same as step 2032 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 5033 In response to determining that a distance smaller than or equal to a preset distance threshold exists in the determined distance, extract the image to be matched as an image matching the reference object image. Step 5033 is basically the same as step 2033 in the embodiment corresponding to FIG. 2, and details are not described herein again.
  • Step 5034 Determine the position information of the area image corresponding to the distance less than or equal to the distance threshold, and output the determined position information.
  • Based on the distances determined in step 5032, the execution subject of the method for extracting an image may determine, from the position information obtained in step 5031, the position information corresponding to a distance less than or equal to the preset distance threshold, and output that position information.
  • the execution subject may output position information in various ways.
  • As an example, a display connected to the execution subject may show information included in the position information, such as the identification information and coordinate information of a region image.
  • the above-mentioned execution subject may generate a matched image including a position mark based on the output location information and the image to be matched after outputting the location information.
  • the position marker is used to mark the position, in the matched image, of the region image corresponding to the output position information.
  • the execution subject may draw a frame of a preset shape in the image to be matched according to the output position information, use the drawn frame as a position mark, and use the to-be-matched image including the position mark as a matched image.
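  • For illustration only, the sketch below draws such a rectangular frame with Pillow; the image, coordinates, and file name are placeholders in the (x, y, w, h) format described earlier.

```python
# Sketch of the position marker: draw a rectangular frame on the matched
# image at the output position information.
from PIL import Image, ImageDraw

matched = Image.new("RGB", (640, 480), "white")  # stand-in for a matched image
x, y, w, h = 150, 40, 80, 30                     # output position information (placeholder)
draw = ImageDraw.Draw(matched)
draw.rectangle([x, y, x + w, y + h], outline="red", width=3)
matched.save("matched_with_marker.png")
```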
  • the process 500 of the method for extracting an image in this embodiment highlights the steps of outputting position information. Therefore, the solution described in this embodiment can further determine the position of the target region image included in the image to be matched, and improve the specificity of image recognition.
  • this application provides an embodiment of an apparatus for extracting an image.
  • the apparatus embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 600 for extracting an image in this embodiment includes: an acquiring unit 601 configured to acquire a reference object image and a set of images to be matched; a generating unit 602 configured to input the reference object image into a first sub-network included in a pre-trained image recognition model and obtain a feature vector of the reference object image as a reference feature vector;
  • an extraction unit 603 is configured to perform the following extraction step on the to-be-matched images in the to-be-matched image set:
  • the to-be-matched image is input into a second sub-network included in the image recognition model to obtain at least one piece of position information and a to-be-matched feature vector corresponding to the position information, where the to-be-matched feature vector is the feature vector of a region image included in the to-be-matched image, and the position information is used to characterize the position of the region image in the to-be-matched image; the distance between each obtained to-be-matched feature vector and the reference feature vector is determined; and, in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, the to-be-matched image is extracted as an image matching the reference object image.
  • the obtaining unit 601 may obtain the reference object image and the set of images to be matched from a remote or local source through a wired connection method or a wireless connection method.
  • the reference object image may be an image to be compared with other images, and the reference object image is an image representing an object.
  • Objects can be various things, such as watermarks, signs, faces, objects, and so on.
  • the set of images to be matched may be a set of certain types of images (for example, images containing a trademark) stored in advance.
  • the generating unit 602 may input the reference object image into a first sub-network included in a pre-trained image recognition model, and obtain a feature vector of the reference object image as a reference feature vector.
  • the first sub-network is used to characterize the correspondence between the image and the feature vector of the image.
  • the image recognition model may be various neural network models created based on machine learning technology.
  • the neural network model may have a structure of various neural networks (for example, DenseBox, VGGNet, ResNet, SegNet, etc.).
  • the above reference feature vector may be a vector, extracted by the first sub-network included in the neural network model (for example, a network composed of one or more convolutional layers included in the neural network model), that characterizes features of the image (for example, shape, color, texture, and other characteristics).
  • the extraction unit 603 may perform the following steps on the image to be matched:
  • the to-be-matched image is input into a second sub-network included in the image recognition model, and at least one piece of position information and a to-be-matched feature vector corresponding to the position information are obtained.
  • the second sub-network is used to characterize the correspondence between the image and the position information of the image and the feature vector to be matched of the image.
  • the position information is used to characterize the position of the area image corresponding to the feature vector to be matched in the to-be-matched image.
  • the feature vector to be matched is a feature vector of an area image included in the image to be matched.
  • a distance between the obtained feature vector to be matched and the reference feature vector is determined.
  • the above-mentioned extraction unit 603 may determine a distance between each of the obtained at least one feature vector to be matched and the reference feature vector.
  • the distance may be any of the following: Euclidean distance, Mahalanobis distance, and the like.
  • Then, in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, the to-be-matched image is extracted as an image matching the reference object image.
  • the distance threshold may be a value set by a technician based on experience, or may be a value calculated by the extraction unit 603 according to historical data (for example, by averaging recorded historical distance thresholds).
  • the extraction unit 603 may include an output module configured to determine position information of a region image corresponding to a distance less than or equal to a distance threshold, and output the determined position information.
  • the extraction unit 603 may further include: a generating module configured to generate a matched image including a position marker based on the output position information and the to-be-matched image, where the position marker is used to mark the position, in the matched image, of the region image corresponding to the output position information.
  • the second sub-network may include a dimension transformation layer for transforming feature vectors to a target dimension; and the extraction unit 603 may be further configured to: input the to-be-matched image into the second sub-network included in the image recognition model to obtain at least one to-be-matched feature vector having the same dimension as the reference feature vector.
  • the image recognition model is obtained by training through the following steps: obtaining a training sample set, where each training sample includes a sample object image, a sample matching image, and labeled position information of the sample matching image, the labeled position information characterizing the position of a region image included in the sample matching image; selecting a training sample from the training sample set, and performing the following training steps: inputting the sample object image included in the selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing the target region image in the sample matching image as the target position information, and determining the second feature vector corresponding to the target position information as the target second feature vector; and determining, based on a first loss value representing the error of the target position information and a second loss value representing the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete.
  • In some optional implementations, the execution subject that trains the image recognition model may, according to preset weight values, use the weighted summation of the first loss value and the second loss value as the total loss value, compare the total loss value with the target value, and determine whether training of the initial model is complete based on the comparison result.
  • the steps for training the image recognition model may further include: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting a training sample from the unselected training samples in the training sample set, using the parameter-adjusted initial model as the initial model, and continuing the training steps.
  • The apparatus provided by the above embodiment of the present application obtains a reference feature vector of a reference object image and at least one to-be-matched feature vector of an image to be matched by using a pre-trained image recognition model, and then compares the distance between the reference feature vector and each to-be-matched feature vector to obtain an image matching the reference object image. This improves the pertinence of matching against the reference image and makes it possible, even when the training samples used to train the image recognition model do not include the reference image, to use the model to extract images matching the reference image, which improves the flexibility of image recognition and enriches the means of image recognition.
  • FIG. 7 illustrates a schematic structural diagram of a computer system 700 suitable for implementing an electronic device (such as a server or a terminal device shown in FIG. 1) in the embodiment of the present application.
  • the electronic device shown in FIG. 7 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
  • the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703.
  • In the RAM 703, various programs and data required for the operation of the system 700 are also stored.
  • the CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 708; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • The drive 710 is also connected to the I/O interface 705 as needed.
  • a removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 710 as needed, so that a computer program read out therefrom is installed into the storage section 708 as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable medium, the computer program containing program code for performing a method shown in a flowchart.
  • the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
  • the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • the computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of this application may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer, partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through an Internet connection provided by an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing a specified logical function.
  • the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present application may be implemented by software or hardware.
  • the described units may also be provided in a processor, which may, for example, be described as: a processor including an acquiring unit, a generating unit, and an extraction unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for obtaining a reference object image and a set of images to be matched".
  • The present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, they cause the electronic device to: obtain a reference object image and a set of images to be matched; input the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and, for each image to be matched in the set, perform the following extraction steps: input the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information, where the feature vector to be matched is a feature vector of a region image included in the image to be matched and the position information characterizes the position of the region image in the image to be matched; determine the distance between each obtained feature vector to be matched and the reference feature vector; and, in response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extract the image to be matched as an image matching the reference object image.

Abstract

Embodiments of the present application disclose an image extraction method and device. A specific embodiment of the method comprises: acquiring a reference object image and a set of images to be matched; and inputting the reference object image into a first sub network comprised in a pre-trained image recognition model, and acquiring a feature vector of the reference object image as a reference feature vector, wherein the following extraction steps are executed on images to be matched in the set of images to be matched: inputting the image to be matched into a second sub network comprised in the image recognition model, so as to acquire at least one piece of position information and a feature vector to be matched corresponding to the position information; determining the distance between the acquired feature vector to be matched and the reference feature vector; and in response to determining that the determined distances comprise a distance less than or equal to a preset distance threshold, extracting the image to be matched as an image matching the reference object image. The embodiments improve the flexibility of image recognition, and enrich the means of image recognition.

Description

Method and Device for Extracting Images

This patent application claims priority to Chinese patent application No. 201810715195.6, filed on July 3, 2018, in the name of Beijing ByteDance Network Technology Co., Ltd. and entitled "Method and Device for Extracting Images", which is incorporated herein by reference in its entirety.

Technical Field

Embodiments of the present application relate to the field of computer technology, and in particular, to a method and an apparatus for extracting an image.

Background

At present, image recognition technology is applied in more and more fields, and using an image recognition model to recognize images is a common technique in image recognition. An image recognition model is usually a model trained with a large number of training samples. For an image recognition model to be able to recognize a target image (such as a watermark image, a person image, or an object image) within an image, it is usually necessary to train it with sample images that contain the target image.
Summary of the Invention

The embodiments of the present application provide a method and an apparatus for extracting an image.

In a first aspect, an embodiment of the present application provides a method for extracting an image. The method includes: obtaining a reference object image and a set of images to be matched; inputting the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and, for each image to be matched in the set of images to be matched, performing the following extraction steps: inputting the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information, where the feature vector to be matched is a feature vector of a region image included in the image to be matched and the position information is used to characterize the position of the region image in the image to be matched; determining the distance between each obtained feature vector to be matched and the reference feature vector; and, in response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extracting the image to be matched as an image matching the reference object image.

In some embodiments, the extraction steps further include: determining the position information of the region image corresponding to a distance less than or equal to the distance threshold, and outputting the determined position information.

In some embodiments, the extraction steps further include: generating, based on the output position information and the image to be matched, a matched image including a position marker, where the position marker is used to mark the position, in the matched image, of the to-be-matched region image corresponding to the output position information.

In some embodiments, the second sub-network includes a dimension transformation layer for transforming feature vectors to a target dimension; and inputting the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched includes: inputting the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched having the same dimension as the reference feature vector.
In some embodiments, the image recognition model is trained through the following steps: obtaining a training sample set, where each training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, the annotated position information characterizing the position of a region image included in the sample matching image; selecting training samples from the training sample set, and performing the following training steps: inputting the sample object image included in a selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing a target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as a target second feature vector; determining, based on a first loss value characterizing the error of the target position information and a second loss value characterizing the gap between the target second feature vector and the first feature vector, whether training of the initial model is complete; and, in response to determining that training is complete, determining the initial model as the image recognition model.

In some embodiments, determining whether training of the initial model is complete based on the first loss value characterizing the error of the target position information and the second loss value characterizing the gap between the target second feature vector and the first feature vector includes: taking, according to preset weight values, the weighted sum of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining, according to the comparison result, whether training of the initial model is complete.

In some embodiments, the step of training the image recognition model further includes: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting training samples from the unselected training samples in the training sample set, and continuing the training steps using the parameter-adjusted initial model as the initial model.
In a second aspect, an embodiment of the present application provides an apparatus for extracting an image. The apparatus includes: an acquisition unit configured to obtain a reference object image and a set of images to be matched; a generation unit configured to input the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and an extraction unit configured to perform, for each image to be matched in the set of images to be matched, the following extraction steps: inputting the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information, where the feature vector to be matched is a feature vector of a region image included in the image to be matched and the position information is used to characterize the position of the region image in the image to be matched; determining the distance between each obtained feature vector to be matched and the reference feature vector; and, in response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extracting the image to be matched as an image matching the reference object image.

In some embodiments, the extraction unit includes: an output module configured to determine the position information of the region image corresponding to a distance less than or equal to the distance threshold, and to output the determined position information.

In some embodiments, the extraction unit further includes: a generation module configured to generate, based on the output position information and the image to be matched, a matched image including a position marker, where the position marker is used to mark the position, in the matched image, of the to-be-matched region image corresponding to the output position information.

In some embodiments, the second sub-network includes a dimension transformation layer for transforming feature vectors to a target dimension; and the extraction unit is further configured to: input the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched having the same dimension as the reference feature vector.

In some embodiments, the image recognition model is trained through the following steps: obtaining a training sample set, where each training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, the annotated position information characterizing the position of a region image included in the sample matching image; selecting training samples from the training sample set, and performing the following training steps: inputting the sample object image included in a selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing a target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as a target second feature vector; determining, based on a first loss value characterizing the error of the target position information and a second loss value characterizing the gap between the target second feature vector and the first feature vector, whether training of the initial model is complete; and, in response to determining that training is complete, determining the initial model as the image recognition model.

In some embodiments, determining whether training of the initial model is complete based on the first loss value characterizing the error of the target position information and the second loss value characterizing the gap between the target second feature vector and the first feature vector includes: taking, according to preset weight values, the weighted sum of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining, according to the comparison result, whether training of the initial model is complete.

In some embodiments, the step of training the image recognition model further includes: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting training samples from the unselected training samples in the training sample set, and continuing the training steps using the parameter-adjusted initial model as the initial model.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation of the first aspect.

The method and apparatus for extracting an image provided by the embodiments of the present application obtain, by using a pre-trained image recognition model, a reference feature vector of a reference image and at least one feature vector to be matched of an image to be matched, and then obtain images matching the reference image by comparing the distances between the reference feature vector and the feature vectors to be matched. In this way, even when the training samples used to train the image recognition model do not include the reference image, the image recognition model can be used to extract images matching the reference image, which improves the flexibility of image recognition and enriches the means of image recognition.
Brief Description of the Drawings

Other features, objects, and advantages of the present application will become more apparent by reading the detailed description of the non-limiting embodiments made with reference to the following drawings:

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present application can be applied;

FIG. 2 is a flowchart of an embodiment of a method for extracting an image according to the present application;

FIG. 3 is a flowchart of training an image recognition model for a method for extracting an image according to the present application;

FIG. 4 is a schematic diagram of an application scenario of a method for extracting an image according to the present application;

FIG. 5 is a flowchart of still another embodiment of a method for extracting an image according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for extracting an image according to the present application;

FIG. 7 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
Detailed Description

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.

It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which a method for extracting an image or an apparatus for extracting an image according to an embodiment of the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications, such as image processing applications, photographing applications, and social platform software, may be installed on the terminal devices 101, 102, and 103.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server providing various services, for example, a background server providing support for the various applications on the terminal devices 101, 102, and 103. The background server may perform processing such as analysis on an acquired image and output the processing result (for example, an extracted image matching the reference image).

It should be noted that the method for extracting an image provided in the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, and 103. Correspondingly, the apparatus for extracting an image may be provided in the server 105 or in the terminal devices 101, 102, and 103.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Depending on implementation needs, there may be any number of terminal devices, networks, and servers.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for extracting an image according to the present application is shown. The method for extracting an image includes the following steps:

Step 201: Obtain a reference object image and a set of images to be matched.

In this embodiment, the execution subject of the method for extracting an image (for example, the server or terminal device shown in FIG. 1) may obtain the reference object image and the set of images to be matched remotely or locally through a wired or wireless connection. The reference object image may be an image to be compared with other images, and it may be an image characterizing some object. The object may be any of various things, such as a watermark, a logo, a human face, or a physical object. The set of images to be matched may be a pre-stored set of images of a certain type (for example, images containing trademarks).
Step 202: Input the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector.

In this embodiment, the above execution subject may input the reference object image into the first sub-network included in the pre-trained image recognition model to obtain the feature vector of the reference object image as the reference feature vector, where the first sub-network is used to characterize the correspondence between an image and its feature vector. In this embodiment, the image recognition model may be any of various neural network models created based on machine learning techniques, and may have the structure of various neural networks (for example, DenseBox, VGGNet, ResNet, or SegNet). The reference feature vector may be a vector composed of data extracted by the first sub-network included in the neural network model (for example, a network composed of one or more convolutional layers of the neural network model) that characterizes features of the image (for example, shape, color, and texture features).
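The patent does not prescribe a concrete implementation, but as a rough illustration, a first sub-network of this kind might look like the following PyTorch-style sketch; the ResNet-18 backbone, the 224x224 input size, and the 512-dimensional output are illustrative assumptions, not part of the disclosure:

```python
import torch
import torchvision.models as models

class FirstSubNetwork(torch.nn.Module):
    """Hypothetical first sub-network: a CNN backbone whose globally
    pooled activations serve as the image-level feature vector."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Keep everything up to and including global average pooling,
        # dropping the classification head.
        self.features = torch.nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, image):            # image: (N, 3, H, W)
        feat = self.features(image)      # (N, 512, 1, 1)
        return feat.flatten(1)           # (N, 512) feature vectors

first_net = FirstSubNetwork()
reference_image = torch.randn(1, 3, 224, 224)  # stand-in for the reference object image
reference_vector = first_net(reference_image)  # reference feature vector, shape (1, 512)
```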
Step 203: For each image to be matched in the set of images to be matched, perform the following extraction steps: input the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information; determine the distance between each obtained feature vector to be matched and the reference feature vector; and, in response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extract the image to be matched as an image matching the reference object image.

In this embodiment, for each image to be matched in the set of images to be matched, the above execution subject may perform the following extraction steps on the image:

Step 2031: Input the image to be matched into the second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information. The second sub-network is used to characterize the correspondence between an image, the position information of the image, and the feature vectors to be matched of the image. The position information is used to characterize the position, in the image to be matched, of the region image corresponding to a feature vector to be matched; a feature vector to be matched is the feature vector of a region image included in the image to be matched. In this embodiment, the second sub-network (for example, a network composed of one or more convolutional layers of the neural network model) may determine, from the image to be matched and according to the determined at least one piece of position information, the region image characterized by each piece of position information, and determine the feature vector of each region image. A region image may be an image characterizing some object (for example, a watermark or a logo). Optionally, the position information may include coordinate information and identification information, where the coordinate information (for example, the corner-point coordinates and the size of the region image) indicates the position of the region image in the image to be matched, and the identification information (for example, the serial number or category of the region image) identifies the region image. As an example, suppose an image to be matched includes two watermark images, and the position information determined by the second sub-network is (1, x1, y1, w1, h1) and (2, x2, y2, w2, h2), where 1 and 2 are the serial numbers of the two watermark images, (x1, y1) and (x2, y2) are the coordinates of their upper-left corner points, w1 and w2 are their widths, and h1 and h2 are their heights. Using the second sub-network, the above execution subject can extract the feature vectors of the image to be matched and, from these, extract the feature vectors corresponding to the two pieces of position information as the feature vectors to be matched.
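As a small illustration of the (serial number, x, y, w, h) format in the example above, the following hypothetical helper converts such position information into the corner-coordinate form used by many detection toolkits:

```python
def position_info_to_corners(position_info):
    """Convert (serial_number, x, y, w, h), where (x, y) is the upper-left
    corner point and w, h are the width and height, into the corner form
    (serial_number, x1, y1, x2, y2)."""
    serial_number, x, y, w, h = position_info
    return serial_number, x, y, x + w, y + h

# For the first watermark in the example, (1, x1, y1, w1, h1):
print(position_info_to_corners((1, 10, 20, 50, 30)))  # (1, 10, 20, 60, 50)
```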
In practice, the second sub-network may be a neural network built on an existing object detection network (for example, SSD (Single Shot MultiBox Detector), R-CNN (Region-based Convolutional Neural Networks), or Faster R-CNN). Using the second sub-network, the feature vectors of the to-be-matched region images can be extracted from the image to be matched, which makes the matching between images more targeted and helps improve the efficiency and accuracy of image recognition.

In some optional implementations of this embodiment, the second sub-network includes a dimension transformation layer for transforming feature vectors to a target dimension. The dimension transformation layer may be a formula that processes feature vectors (for example, merging the values of certain dimensions of a feature vector by averaging them), or it may be a pooling layer included in the second sub-network. A pooling layer can be used to down-sample or up-sample the input data, so as to compress or increase the amount of data. The target dimension may be any dimension set by a technician, for example, the same dimension as that of the reference feature vector. The above execution subject may input the image to be matched into the second sub-network included in the image recognition model; the second sub-network extracts at least one feature vector of the image to be matched, and the dimension transformation layer included in the second sub-network then performs a dimension transformation on each extracted feature vector to obtain at least one feature vector to be matched with the same dimension as the reference feature vector. In practice, an ROI Pooling (Region Of Interest Pooling) layer can be used so that every feature vector to be matched has the same dimension as the reference feature vector. The ROI Pooling layer is a well-known technique that is widely studied and applied, and is not described in detail here.
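For instance, torchvision's ROI pooling operator can play the role of such a dimension transformation layer; the feature-map size and the two candidate regions below are made-up values for illustration only:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 38, 38)  # backbone output for one image to be matched
# Two candidate regions as (batch_index, x1, y1, x2, y2) in feature-map coordinates.
rois = torch.tensor([[0.0, 4.0, 4.0, 20.0, 20.0],
                     [0.0, 10.0, 8.0, 30.0, 24.0]])
pooled = roi_pool(feature_map, rois, output_size=(1, 1))  # (2, 512, 1, 1)
vectors_to_match = pooled.flatten(1)                      # (2, 512), fixed dimension
```

Because output_size is fixed, every region, regardless of its size, yields a vector of the same dimension, which is exactly what comparing against the reference feature vector requires.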
Step 2032: Determine the distance between each obtained feature vector to be matched and the reference feature vector. Specifically, the above execution subject may determine the distance between each of the obtained at least one feature vector to be matched and the reference feature vector. The distance may be any of the following: Euclidean distance, Mahalanobis distance, and the like. The preset distance may be any value greater than or equal to 0. The distance can characterize the degree of similarity between two feature vectors, and thus between two images. As an example, the larger the distance between two feature vectors, the less similar the images corresponding to those feature vectors.

Step 2033: In response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extract the image to be matched as an image matching the reference object image. The distance threshold may be a value set by a technician based on experience, or a value calculated by the above execution subject (for example, as an average) from historical data (for example, recorded historical distance thresholds). Specifically, if any of the determined distances is less than or equal to the distance threshold, this indicates that a region image similar to the reference object image exists in the image to be matched, that is, that the image to be matched matches the reference object image.
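Steps 2032 and 2033 together amount to a nearest-distance test. A minimal sketch, assuming Euclidean distance, a placeholder threshold value, and the tensors from the sketches above, might be:

```python
import torch

def image_matches_reference(vectors_to_match, reference_vector, distance_threshold=0.8):
    """vectors_to_match: (K, D) feature vectors of the K region images;
    reference_vector: (1, D). Returns whether any region lies within the
    distance threshold of the reference, plus the distances themselves."""
    # Euclidean distance between each to-be-matched vector and the reference.
    distances = torch.cdist(vectors_to_match, reference_vector).squeeze(1)  # (K,)
    return bool((distances <= distance_threshold).any()), distances

matched, distances = image_matches_reference(vectors_to_match, reference_vector)
# If matched is True, the image to be matched is extracted as a matching image.
```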
By performing this step, images matching the reference object image can be extracted from the set of images to be matched even when the training samples used to train the image recognition model do not include the reference object image. Moreover, comparing the region images included in the image to be matched with the reference object makes the matching more targeted, thereby improving the accuracy of image recognition.

In some optional implementations of this embodiment, as shown in FIG. 3, the image recognition model may be trained in advance through the following steps:

Step 301: Obtain a training sample set. Each training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, where the annotated position information characterizes the position of a region image included in the sample matching image. The sample object image may be an image characterizing some object (for example, a watermark, a logo, a human face, or a physical object). There may be at least one piece of annotated position information, and each piece may correspond to one region image; among these region images is a region image whose characterized object is the same as the object characterized by the sample object image.

Step 302: Select training samples from the training sample set. The manner of selecting training samples and the number selected are not limited in this application. For example, training samples may be selected from the training sample set at random or in the order of their serial numbers.
Step 303: Input the sample object image included in a selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and input the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and a second feature vector corresponding to the position information.

The initial model may be any of various existing neural network models created based on machine learning techniques, and may have the structure of various existing neural networks (for example, DenseBox, VGGNet, ResNet, or SegNet). Each of the above feature vectors may be a vector composed of data extracted from certain layers (for example, convolutional layers) included in the neural network model. The first sub-network and the second sub-network here are the same as those described in step 202 and step 203, respectively, and are not described again.

Step 304: From the obtained at least one piece of position information, determine the position information characterizing a target region image in the sample matching image as target position information, and determine the second feature vector corresponding to the target position information as a target second feature vector.

Specifically, the target region image may be a region image whose characterized object is the same as the object characterized by the sample object image. The execution subject of this step may take position information designated by a technician as the target position information, take the region image characterized by the target position information as the target region image, and take the second feature vector of the target region image as the target second feature vector. Alternatively, the execution subject of this step may, based on the obtained position information, determine the similarity between the region image corresponding to each piece of position information and the sample object image, determine the region image with the greatest similarity to the sample object image as the target region image, determine its position information as the target position information, and determine its second feature vector as the target second feature vector.
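The second branch described above (picking the region most similar to the sample object image) could be sketched as follows, treating a smaller feature-vector distance as greater similarity; all tensor names are hypothetical:

```python
import torch

# second_feature_vectors: (K, D), one vector per candidate region image;
# first_feature_vector: (1, D); position_infos: list of K position tuples.
distances = torch.cdist(second_feature_vectors, first_feature_vector).squeeze(1)
target_index = int(distances.argmin())            # most similar region image
target_position_info = position_infos[target_index]
target_second_vector = second_feature_vectors[target_index]
```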
Step 305: Determine, based on a first loss value characterizing the error of the target position information and a second loss value characterizing the gap between the target second feature vector and the first feature vector, whether training of the initial model is complete.

The first loss value can characterize the gap between the target position information and the annotated position information corresponding to the target region image. In general, the smaller the first loss value, the smaller the gap between the target position information and the annotated position information, that is, the closer the target position information is to the annotated position information. In practice, the first loss value may be obtained from any of the following loss functions: a Softmax loss function, a Smooth L1 (smooth L1 norm) loss function, and the like.
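As an illustration of the first loss value, a Smooth L1 loss between the predicted target position information and the annotated position information might be computed as below; the (x, y, w, h) box encoding and the numbers are assumptions carried over from the earlier example:

```python
import torch
import torch.nn.functional as F

predicted_position = torch.tensor([12.0, 18.0, 48.0, 33.0])  # model's target position info
annotated_position = torch.tensor([10.0, 20.0, 50.0, 30.0])  # labeled position info
first_loss = F.smooth_l1_loss(predicted_position, annotated_position)
```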
The second loss value can characterize the distance between the target second feature vector and the first feature vector. In general, the larger the second loss value, the greater the distance between the target second feature vector and the first feature vector, that is, the less similar the target region image and the sample object image. As an example, the second loss value may be the distance between the target second feature vector and the first feature vector (for example, the Euclidean distance or the Mahalanobis distance).
As another example, the second loss value may be obtained from a Triplet loss function, which can be written as:

L = \sum_i \left[ \lVert f(x_i^{a}) - f(x_i^{p}) \rVert_2^2 - \lVert f(x_i^{a}) - f(x_i^{n}) \rVert_2^2 + \mathrm{threshold} \right]_+

Here L is the second loss value, the sum runs over the training samples selected this round, and i is the serial number of each selected training sample. The superscript a denotes the sample object image, p denotes the positive sample image (that is, the target region image), and n denotes the negative sample image (that is, a region image in the sample matching image other than the target region image, or a preset image whose characterized object differs from the object characterized by the sample object image). f(x_i^{a}) is the feature vector of the sample object image included in the training sample with serial number i, f(x_i^{p}) is the feature vector of the corresponding positive sample image (for example, the target region image), and f(x_i^{n}) is the feature vector of the corresponding negative sample image (for example, a region image in the sample matching image other than the target region image). threshold denotes the preset distance. The term \lVert f(x_i^{a}) - f(x_i^{p}) \rVert_2^2 is the first distance (that is, the distance between the first feature vector and the feature vector of the positive sample image), and \lVert f(x_i^{a}) - f(x_i^{n}) \rVert_2^2 is the second distance (that is, the distance between the first feature vector and the feature vector of the negative sample image). The subscript "+" on the square brackets denotes taking the positive part: when the expression inside the brackets evaluates to a positive value, that value is taken, and when it is negative, 0 is taken. In practice, during training, the parameters of the initial model can be adjusted according to the back-propagation algorithm so that the value of L is minimized or converges, which indicates that training is complete.
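The formula above translates almost directly into code; a minimal sketch using squared Euclidean distances is given below, with the batch shapes being assumptions:

```python
import torch
import torch.nn.functional as F

def triplet_second_loss(anchor, positive, negative, threshold=0.2):
    """anchor: first feature vectors f(x_i^a); positive: target-region
    vectors f(x_i^p); negative: non-target vectors f(x_i^n); all of
    shape (batch, dim). threshold is the preset distance (margin)."""
    first_distance = (anchor - positive).pow(2).sum(dim=1)    # ||f(x^a) - f(x^p)||^2
    second_distance = (anchor - negative).pow(2).sum(dim=1)   # ||f(x^a) - f(x^n)||^2
    return F.relu(first_distance - second_distance + threshold).sum()  # [.]_+ summed over i
```

PyTorch also ships a built-in torch.nn.TripletMarginLoss, though it uses the (non-squared) p-norm distance by default, so it is not numerically identical to the squared-distance form written here.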
In this embodiment, the execution subject of this step may obtain a total loss value based on the first loss value and the second loss value, compare the total loss value with a target value, and determine, according to the comparison result, whether training of the initial model is complete. The target value may be a preset loss-value threshold; when the difference between the total loss value and the target value is less than or equal to the loss-value threshold, it is determined that training is complete.

In some optional implementations of this embodiment, the execution subject of this step may take, according to preset weight values, the weighted sum of the first loss value and the second loss value as the total loss value, compare the total loss value with the target value, and determine, according to the comparison result, whether training of the initial model is complete. The weight values adjust the proportions of the first loss value and the second loss value in the total loss value, so that the image recognition model can serve different functions in different application scenarios (for example, some scenarios emphasize extracting position information, while others emphasize comparing the distances between feature vectors).
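Continuing the earlier sketches, the weighted combination could look like the following; the loss values, weight values, and target value are all placeholders a practitioner would tune per scenario:

```python
import torch

first_loss = torch.tensor(0.4)    # placeholder: first loss value (position error)
second_loss = torch.tensor(0.1)   # placeholder: second loss value (feature-distance gap)

position_weight, distance_weight = 1.0, 0.5   # preset weight values (assumed)
target_value = 0.05                           # preset loss-value threshold (assumed)

total_loss = position_weight * first_loss + distance_weight * second_loss
training_complete = bool(total_loss.item() <= target_value)
```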
Step 306: In response to determining that training is complete, determine the initial model as the image recognition model.

In some optional implementations of this embodiment, the execution subject that trains the image recognition model may, in response to determining that training of the initial model is not complete, adjust the parameters of the initial model, select training samples from the unselected training samples in the training sample set, and continue the training steps using the parameter-adjusted initial model as the initial model. For example, assuming the initial model is a convolutional neural network, the back-propagation algorithm may be used to adjust the weights in each convolutional layer of the initial model. Then, training samples may be selected from the unselected training samples in the training sample set, and steps 303 to 306 may be continued with the parameter-adjusted initial model as the initial model.
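A rough sketch of this adjust-and-continue loop, with an assumed SGD optimizer and hypothetical helper names (initial_model, sample_batches, compute_total_loss, target_value), might be:

```python
import torch

optimizer = torch.optim.SGD(initial_model.parameters(), lr=1e-3)

for sample_batch in sample_batches:                # unselected training samples
    optimizer.zero_grad()
    total_loss = compute_total_loss(initial_model, sample_batch)  # as sketched above
    total_loss.backward()                          # back-propagation
    optimizer.step()                               # adjust the model parameters
    if total_loss.item() <= target_value:          # training-completion check
        break   # the initial model is then taken as the image recognition model
```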
It should be noted that the execution subject that trains the image recognition model may be the same as or different from the execution subject of the method for extracting an image. If they are the same, the execution subject that trains the image recognition model may, after training, store the structure information and parameter values of the trained image recognition model locally. If they are different, the execution subject that trains the image recognition model may, after training, send the structure information and parameter values of the trained image recognition model to the execution subject of the method for extracting an image.
With continued reference to FIG. 4, FIG. 4 is a schematic diagram of an application scenario of the method for extracting an image according to this embodiment. In the application scenario of FIG. 4, the server 401 first obtains a watermark image 402 (that is, the reference object image) uploaded by the terminal device 408, and obtains a set of images to be matched 403 locally. The server 401 inputs the watermark image 402 into the first sub-network 4041 included in a pre-trained image recognition model 404, and obtains the feature vector of the watermark image 402 as a reference feature vector 405.

Then, the server 401 selects an image to be matched 4031 from the set of images to be matched 403 and inputs it into the second sub-network 4042 included in the image recognition model 404, obtaining position information 4061, 4062, and 4063 and the corresponding feature vectors to be matched 4071, 4072, and 4073, which are the feature vectors of the watermark images 40311, 40312, and 40313 included in the image to be matched 4031.

Finally, the server 401 determines that the distance between the feature vector to be matched 4071 and the reference feature vector 405 is less than or equal to a preset distance threshold, extracts the image to be matched 4031 as an image matching the reference object image, and sends the matched image to the terminal device 408. By repeatedly selecting images from the set of images to be matched 403 and matching them against the watermark image 402, the server 401 extracts from the set 403 multiple images matching the watermark image 402.

The method provided by the above embodiments of the present application obtains, by using a pre-trained image recognition model, the reference feature vector of a reference image and at least one feature vector to be matched of an image to be matched, and then obtains images matching the reference image by comparing the distances between the reference feature vector and the feature vectors to be matched. This makes the matching against the reference image more targeted and makes it possible, even when the training samples required for training the image recognition model do not include the reference image, to use the image recognition model to extract images matching the reference image, improving the flexibility of image recognition and enriching the means of image recognition.
With further reference to FIG. 5, a flow 500 of still another embodiment of the method for extracting an image is shown. The flow 500 of the method for extracting an image includes the following steps:

Step 501: Obtain a reference object image and a set of images to be matched.

In this embodiment, step 501 is substantially the same as step 201 in the embodiment corresponding to FIG. 2, and is not described again here.

Step 502: Input the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector.

In this embodiment, step 502 is substantially the same as step 202 in the embodiment corresponding to FIG. 2, and is not described again here.

Step 503: For each image to be matched in the set of images to be matched, perform the following extraction steps: input the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information; determine the distance between each obtained feature vector to be matched and the reference feature vector; in response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extract the image to be matched as an image matching the reference object image; and determine the position information of the region image corresponding to a distance less than or equal to the distance threshold, and output the determined position information.
In this embodiment, for each image to be matched in the set of images to be matched, the above execution subject may perform the following extraction steps on the image:

Step 5031: Input the image to be matched into the second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to the position information. Step 5031 is substantially the same as step 2031 in the embodiment corresponding to FIG. 2, and is not described again here.

Step 5032: Determine the distance between each obtained feature vector to be matched and the reference feature vector. Step 5032 is substantially the same as step 2032 in the embodiment corresponding to FIG. 2, and is not described again here.

Step 5033: In response to determining that the determined distances include a distance less than or equal to a preset distance threshold, extract the image to be matched as an image matching the reference object image. Step 5033 is substantially the same as step 2033 in the embodiment corresponding to FIG. 2, and is not described again here.

Step 5034: Determine the position information of the region image corresponding to a distance less than or equal to the distance threshold, and output the determined position information.

In this embodiment, the execution subject of the method for extracting an image (for example, the server or terminal device shown in FIG. 1) may, based on the distances determined in step 5032 that are less than or equal to the preset distance threshold, determine from the at least one piece of position information obtained in step 5031 the position information corresponding to those distances, and output it. The execution subject may output the position information in various ways; for example, the identification information, coordinate information, and other information of the region images included in the position information may be displayed on a display connected to the execution subject.
In some optional implementations of this embodiment, after outputting the position information, the execution subject may generate, based on the output position information and the image to be matched, a matched image including a position marker. The position marker is used to mark, in the matched image, the position of the to-be-matched region image corresponding to the output position information. Specifically, the execution subject may draw a frame of a preset shape in the image to be matched according to the output position information, use the drawn frame as the position marker, and use the image to be matched that includes the position marker as the matched image.
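The position marker could be rendered, for example, with the Pillow library; this is a minimal sketch of one possible rendering, again assuming rectangular position information, and is not the only marker shape the embodiment allows:

    from PIL import Image, ImageDraw

    def draw_position_markers(image_path, matched_positions, out_path):
        """Draw a frame of a preset shape (here, a rectangle) at each output
        position, yielding the matched image that carries position markers."""
        image = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for x1, y1, x2, y2 in matched_positions:
            draw.rectangle([x1, y1, x2, y2], outline=(255, 0, 0), width=3)
        image.save(out_path)
        return image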
As can be seen from FIG. 5, compared with the embodiment corresponding to FIG. 2, the flow 500 of the method for extracting an image in this embodiment highlights the step of outputting position information. Accordingly, the solution described in this embodiment can further determine the position of the target region image included in the image to be matched, improving the pertinence of image recognition.
With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for extracting an image. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in FIG. 6, the apparatus 600 for extracting an image in this embodiment includes: an acquisition unit 601 configured to acquire a reference object image and a set of images to be matched; a generation unit 602 configured to input the reference object image into the first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and an extraction unit 603 configured to perform, for each image to be matched in the set, the following extraction step: input the image to be matched into the second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to each piece of position information, where the feature vector to be matched is the feature vector of a region image included in the image to be matched, and the position information is used to characterize the position of the region image in the image to be matched; determine the distance between each obtained feature vector to be matched and the reference feature vector; and in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extract the image to be matched as an image matching the reference object image.
In this embodiment, the acquisition unit 601 may acquire the reference object image and the set of images to be matched remotely or locally through a wired or wireless connection. The reference object image is an image to be compared with other images and represents a certain object. The object may be any of various things, such as a watermark, a logo, a human face, or a physical object. The set of images to be matched may be a pre-stored set of images of a certain type (for example, images containing a trademark).
In this embodiment, the generation unit 602 may input the reference object image into the first sub-network included in the pre-trained image recognition model to obtain the feature vector of the reference object image as the reference feature vector. The first sub-network is used to characterize the correspondence between an image and its feature vector. The image recognition model may be any of various neural network models created based on machine learning techniques, and may have the structure of various neural networks (for example, DenseBox, VGGNet, ResNet, or SegNet). The reference feature vector may be a vector of data extracted by the first sub-network (for example, a network composed of one or more convolutional layers of the neural network model) that characterizes features of the image, such as shape, color, and texture.
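The specification does not fix a network architecture or framework. Purely as an illustrative sketch, the first sub-network could be realized as a small convolutional backbone followed by a projection, written here in PyTorch with all layer choices assumed:

    import torch.nn as nn

    class FirstSubNetwork(nn.Module):
        """Maps an image batch to one feature vector per image; applied to the
        reference object image, the output serves as the reference feature vector."""
        def __init__(self, feature_dim=256):
            super().__init__()
            self.backbone = nn.Sequential(                      # stand-in convolutional stack
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),                        # global average pooling to 1x1
            )
            self.projection = nn.Linear(128, feature_dim)

        def forward(self, images):                              # images: (N, 3, H, W)
            pooled = self.backbone(images).flatten(1)           # (N, 128)
            return self.projection(pooled)                      # (N, feature_dim)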
In this embodiment, the extraction unit 603 may perform the following steps on the image to be matched:
First, the image to be matched is input into the second sub-network included in the image recognition model to obtain at least one piece of position information and the feature vector to be matched corresponding to each piece of position information. The second sub-network is used to characterize the correspondence between an image and both the position information of the image and the feature vectors to be matched of the image. The position information is used to characterize the position, in the image to be matched, of the region image corresponding to the feature vector to be matched. The feature vector to be matched is the feature vector of a region image included in the image to be matched.
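Purely as an illustration — the specification leaves the detection mechanism of the second sub-network open — a deliberately simplified PyTorch sketch that pairs each candidate region with both a box and an embedding might look as follows; the region count, grid size, and head structure are all assumptions:

    import torch.nn as nn

    class SecondSubNetwork(nn.Module):
        """Maps an image to paired (position information, feature vectors to be
        matched): one box and one embedding per candidate region image."""
        def __init__(self, num_regions=16, feature_dim=256):
            super().__init__()
            self.num_regions, self.feature_dim = num_regions, feature_dim
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4),                        # coarse 4x4 grid of cells
            )
            shared_dim = 64 * 4 * 4
            self.box_head = nn.Linear(shared_dim, num_regions * 4)            # (x1, y1, x2, y2)
            self.embed_head = nn.Linear(shared_dim, num_regions * feature_dim)

        def forward(self, images):                              # images: (N, 3, H, W)
            shared = self.backbone(images).flatten(1)           # (N, shared_dim)
            boxes = self.box_head(shared).view(-1, self.num_regions, 4)
            vectors = self.embed_head(shared).view(-1, self.num_regions, self.feature_dim)
            return boxes, vectors                               # position info, feature vectors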
Then, the distance between each obtained feature vector to be matched and the reference feature vector is determined. Specifically, the extraction unit 603 may determine the distance between each of the obtained at least one feature vector to be matched and the reference feature vector. The distance may be any of the following: a Euclidean distance, a Mahalanobis distance, or the like.
Finally, in response to determining that a distance less than or equal to the preset distance threshold exists among the determined distances, the image to be matched is extracted as an image matching the reference object image. The distance threshold may be a value set empirically by a technician, or a value calculated by the extraction unit 603 (for example, an average) from historical data (for example, recorded historical distance thresholds).
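Taken together, the distance computation and the threshold test amount to a few lines. A sketch using NumPy and the Euclidean distance (one of the distances named above), with illustrative names:

    import numpy as np

    def match_image(reference_vector, candidate_vectors, positions, distance_threshold):
        """Return (is_match, matched_positions) for one image to be matched: the
        image matches when any region's distance is at or below the threshold."""
        distances = np.linalg.norm(candidate_vectors - reference_vector, axis=1)
        keep = distances <= distance_threshold
        return bool(keep.any()), [pos for pos, k in zip(positions, keep) if k]

An image is extracted as a match as soon as any one of its region images is close enough to the reference feature vector, which is what the any() test expresses.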
In some optional implementations of this embodiment, the extraction unit 603 may include an output module configured to determine the position information of the region image corresponding to the distance less than or equal to the distance threshold, and to output the determined position information.
In some optional implementations of this embodiment, the extraction unit 603 may further include a generation module configured to generate, based on the output position information and the image to be matched, a matched image including a position marker, where the position marker is used to mark, in the matched image, the position of the to-be-matched region image corresponding to the output position information.
In some optional implementations of this embodiment, the second sub-network may include a dimension transformation layer for transforming feature vectors to a target dimension; and the extraction unit 603 may be further configured to input the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched having the same dimension as the reference feature vector.
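For illustration, such a dimension transformation layer could be as simple as a fully connected projection onto the reference vector's dimension; a minimal PyTorch sketch, with the layer form assumed:

    import torch.nn as nn

    class DimensionTransform(nn.Module):
        """Projects feature vectors to be matched onto the target dimension so
        they are directly comparable with the reference feature vector."""
        def __init__(self, in_dim, target_dim):
            super().__init__()
            self.project = nn.Linear(in_dim, target_dim)

        def forward(self, vectors):          # vectors: (..., in_dim)
            return self.project(vectors)     # (..., target_dim)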
In some optional implementations of this embodiment, the image recognition model is obtained by training through the following steps: acquiring a training sample set, where a training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, the annotated position information characterizing the position of a region image included in the sample matching image; and selecting a training sample from the training sample set and performing the following training step: inputting the sample object image included in the selected training sample into the first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into the second sub-network included in the initial model to obtain at least one piece of position information and second feature vectors corresponding to the position information; determining, from the obtained at least one piece of position information, the position information characterizing the target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as a target second feature vector; determining, based on a first loss value characterizing the error of the target position information and a second loss value characterizing the gap in the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete; and in response to determining that training is complete, determining the initial model as the image recognition model.
In some optional implementations of this embodiment, the execution subject that trains the image recognition model may take, according to preset weight values, the weighted sum of the first loss value and the second loss value as a total loss value, compare the total loss value with a target value, and determine, according to the comparison result, whether training of the initial model is complete.
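A sketch of this loss combination, with illustrative concrete choices that the specification does not mandate (smooth L1 for the position error, the mean embedding distance for the second loss):

    import torch.nn.functional as F

    def total_loss(predicted_boxes, annotated_boxes, first_vector, target_second_vector,
                   w1=1.0, w2=1.0):
        """Weighted sum of the position-error loss and the feature-distance loss;
        training is deemed complete when this falls to or below the target value."""
        first_loss = F.smooth_l1_loss(predicted_boxes, annotated_boxes)           # position error
        second_loss = F.pairwise_distance(first_vector, target_second_vector).mean()
        return w1 * first_loss + w2 * second_loss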
In some optional implementations of this embodiment, the step of training the image recognition model may further include: in response to determining that training of the initial model is not complete, adjusting the parameters of the initial model, selecting a training sample from the training samples in the training sample set that have not been selected, and continuing the training step with the parameter-adjusted initial model as the initial model.
The apparatus provided by the above embodiment of the present application obtains the reference feature vector of the reference image and at least one feature vector to be matched of an image to be matched by using a pre-trained image recognition model, and then obtains images matching the reference image by comparing the distances between the reference feature vector and the feature vectors to be matched. This improves the pertinence of matching against the reference image, and makes it possible to use the image recognition model to extract images matching the reference image even when the training samples used to train the model do not include the reference image, thereby improving the flexibility of image recognition and enriching the means of image recognition.
Reference is now made to FIG. 7, which shows a schematic structural diagram of a computer system 700 suitable for implementing an electronic device (for example, the server or terminal device shown in FIG. 1) of the embodiments of the present application. The electronic device shown in FIG. 7 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage portion 708 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-mentioned functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a generation unit, and an extraction unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring a reference object image and a set of images to be matched".
As another aspect, the present application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire a reference object image and a set of images to be matched; input the reference object image into the first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and, for an image to be matched in the set of images to be matched, perform the following extraction step: input the image to be matched into the second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to each piece of position information, where the feature vector to be matched is the feature vector of a region image included in the image to be matched, and the position information is used to characterize the position of the region image in the image to be matched; determine the distance between each obtained feature vector to be matched and the reference feature vector; and in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extract the image to be matched as an image matching the reference object image.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (16)

  1. A method for extracting an image, comprising:
    acquiring a reference object image and a set of images to be matched;
    inputting the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and
    for an image to be matched in the set of images to be matched, performing the following extraction step: inputting the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to each piece of position information, wherein the feature vector to be matched is a feature vector of a region image included in the image to be matched, and the position information is used to characterize a position of the region image in the image to be matched; determining a distance between each obtained feature vector to be matched and the reference feature vector; and in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extracting the image to be matched as an image matching the reference object image.
  2. The method according to claim 1, wherein the extraction step further comprises:
    determining position information of a region image corresponding to a distance less than or equal to the distance threshold, and outputting the determined position information.
  3. The method according to claim 2, wherein the extraction step further comprises:
    generating, based on the output position information and the image to be matched, a matched image including a position marker, wherein the position marker is used to mark a position, in the matched image, of the to-be-matched region image corresponding to the output position information.
  4. The method according to claim 1, wherein the second sub-network includes a dimension transformation layer for transforming a feature vector to a target dimension; and
    the inputting the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched comprises:
    inputting the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched having the same dimension as the reference feature vector.
  5. The method according to any one of claims 1-4, wherein the image recognition model is trained through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, the annotated position information characterizing a position of a region image included in the sample matching image; and
    selecting a training sample from the training sample set, and performing the following training step: inputting the sample object image included in the selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and second feature vectors corresponding to the position information; determining, from the obtained at least one piece of position information, position information characterizing a target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as a target second feature vector; determining, based on a first loss value characterizing an error of the target position information and a second loss value characterizing a gap in the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete; and in response to determining that training is complete, determining the initial model as the image recognition model.
  6. The method according to claim 5, wherein the determining, based on a first loss value characterizing an error of the target position information and a second loss value characterizing a gap in the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete comprises:
    taking, according to preset weight values, a weighted sum of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining, according to the comparison result, whether training of the initial model is complete.
  7. The method according to claim 5, wherein the step of training the image recognition model further comprises:
    in response to determining that training of the initial model is not complete, adjusting parameters of the initial model, selecting a training sample from training samples in the training sample set that have not been selected, and continuing the training step with the parameter-adjusted initial model as the initial model.
  8. An apparatus for extracting an image, comprising:
    an acquisition unit configured to acquire a reference object image and a set of images to be matched;
    a generation unit configured to input the reference object image into a first sub-network included in a pre-trained image recognition model to obtain a feature vector of the reference object image as a reference feature vector; and
    an extraction unit configured to perform, for an image to be matched in the set of images to be matched, the following extraction step: inputting the image to be matched into a second sub-network included in the image recognition model to obtain at least one piece of position information and a feature vector to be matched corresponding to each piece of position information, wherein the feature vector to be matched is a feature vector of a region image included in the image to be matched, and the position information is used to characterize a position of the region image in the image to be matched; determining a distance between each obtained feature vector to be matched and the reference feature vector; and in response to determining that a distance less than or equal to a preset distance threshold exists among the determined distances, extracting the image to be matched as an image matching the reference object image.
  9. The apparatus according to claim 8, wherein the extraction unit comprises:
    an output module configured to determine position information of a region image corresponding to a distance less than or equal to the distance threshold, and to output the determined position information.
  10. The apparatus according to claim 9, wherein the extraction unit further comprises:
    a generation module configured to generate, based on the output position information and the image to be matched, a matched image including a position marker, wherein the position marker is used to mark a position, in the matched image, of the to-be-matched region image corresponding to the output position information.
  11. The apparatus according to claim 8, wherein the second sub-network includes a dimension transformation layer for transforming a feature vector to a target dimension; and
    the extraction unit is further configured to:
    input the image to be matched into the second sub-network included in the image recognition model to obtain at least one feature vector to be matched having the same dimension as the reference feature vector.
  12. The apparatus according to any one of claims 8-11, wherein the image recognition model is trained through the following steps:
    acquiring a training sample set, wherein a training sample includes a sample object image, a sample matching image, and annotated position information of the sample matching image, the annotated position information characterizing a position of a region image included in the sample matching image; and
    selecting a training sample from the training sample set, and performing the following training step: inputting the sample object image included in the selected training sample into a first sub-network included in an initial model to obtain a first feature vector, and inputting the sample matching image into a second sub-network included in the initial model to obtain at least one piece of position information and second feature vectors corresponding to the position information; determining, from the obtained at least one piece of position information, position information characterizing a target region image in the sample matching image as target position information, and determining the second feature vector corresponding to the target position information as a target second feature vector; determining, based on a first loss value characterizing an error of the target position information and a second loss value characterizing a gap in the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete; and in response to determining that training is complete, determining the initial model as the image recognition model.
  13. The apparatus according to claim 12, wherein the determining, based on a first loss value characterizing an error of the target position information and a second loss value characterizing a gap in the distance between the target second feature vector and the first feature vector, whether training of the initial model is complete comprises:
    taking, according to preset weight values, a weighted sum of the first loss value and the second loss value as a total loss value, comparing the total loss value with a target value, and determining, according to the comparison result, whether training of the initial model is complete.
  14. The apparatus according to claim 12, wherein the step of training the image recognition model further comprises:
    in response to determining that training of the initial model is not complete, adjusting parameters of the initial model, selecting a training sample from training samples in the training sample set that have not been selected, and continuing the training step with the parameter-adjusted initial model as the initial model.
  15. An electronic device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-7.
  16. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2018/116334 2018-07-03 2018-11-20 Image extraction method and device WO2020006961A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810715195.6 2018-07-03
CN201810715195.6A CN108898186B (en) 2018-07-03 2018-07-03 Method and device for extracting image

Publications (1)

Publication Number Publication Date
WO2020006961A1 true WO2020006961A1 (en) 2020-01-09

Family

ID=64347534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116334 WO2020006961A1 (en) 2018-07-03 2018-11-20 Image extraction method and device

Country Status (2)

Country Link
CN (1) CN108898186B (en)
WO (1) WO2020006961A1 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109939439B (en) * 2019-03-01 2022-04-05 腾讯科技(深圳)有限公司 Virtual character blocking detection method, model training method, device and equipment
CN111723926B (en) * 2019-03-22 2023-09-12 北京地平线机器人技术研发有限公司 Training method and training device for neural network model for determining image parallax
CN110021052B (en) * 2019-04-11 2023-05-30 北京百度网讯科技有限公司 Method and apparatus for generating fundus image generation model
CN112036421A (en) * 2019-05-16 2020-12-04 搜狗(杭州)智能科技有限公司 Image processing method and device and electronic equipment
CN110660103B (en) * 2019-09-17 2020-12-25 北京三快在线科技有限公司 Unmanned vehicle positioning method and device
CN110969183B (en) * 2019-09-20 2023-11-21 北京方位捷讯科技有限公司 Method and system for determining damage degree of target object according to image data
CN110766081B (en) * 2019-10-24 2022-09-13 腾讯科技(深圳)有限公司 Interface image detection method, model training method and related device
CN110825904B (en) * 2019-10-24 2022-05-06 腾讯科技(深圳)有限公司 Image matching method and device, electronic equipment and storage medium
CN111353526A (en) * 2020-02-19 2020-06-30 上海小萌科技有限公司 Image matching method and device and related equipment
CN111597993B (en) * 2020-05-15 2023-09-05 北京百度网讯科技有限公司 Data processing method and device
CN111797790B (en) * 2020-07-10 2021-11-05 北京字节跳动网络技术有限公司 Image processing method and apparatus, storage medium, and electronic device
CN113590857A (en) * 2021-08-10 2021-11-02 北京有竹居网络技术有限公司 Key value matching method and device, readable medium and electronic equipment


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780612B (en) * 2016-12-29 2019-09-17 浙江大华技术股份有限公司 Object detecting method and device in a kind of image
CN106951484B (en) * 2017-03-10 2020-10-30 百度在线网络技术(北京)有限公司 Picture retrieval method and device, computer equipment and computer readable medium
CN107944395B (en) * 2017-11-27 2020-08-18 浙江大学 Method and system for verifying and authenticating integration based on neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140105509A1 (en) * 2012-10-15 2014-04-17 Canon Kabushiki Kaisha Systems and methods for comparing images
CN104376326A (en) * 2014-11-02 2015-02-25 吉林大学 Feature extraction method for image scene recognition
CN107239535A (en) * 2017-05-31 2017-10-10 北京小米移动软件有限公司 Similar pictures search method and device
CN107679466A (en) * 2017-09-21 2018-02-09 百度在线网络技术(北京)有限公司 Information output method and device
CN108038880A (en) * 2017-12-20 2018-05-15 百度在线网络技术(北京)有限公司 Method and apparatus for handling image
CN108154196A (en) * 2018-01-19 2018-06-12 百度在线网络技术(北京)有限公司 For exporting the method and apparatus of image

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507289A (en) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 Video matching method, computer device and storage medium
CN111783872A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Method and device for training model, electronic equipment and computer readable storage medium
CN111783872B (en) * 2020-06-30 2024-02-02 北京百度网讯科技有限公司 Method, device, electronic equipment and computer readable storage medium for training model
CN111984814A (en) * 2020-08-10 2020-11-24 广联达科技股份有限公司 Stirrup matching method and device in construction drawing
CN111984814B (en) * 2020-08-10 2024-04-12 广联达科技股份有限公司 Stirrup matching method and device in building drawing
CN112183627A (en) * 2020-09-28 2021-01-05 中星技术股份有限公司 Method for generating predicted density map network and vehicle annual inspection mark number detection method
CN112488943B (en) * 2020-12-02 2024-02-02 北京字跳网络技术有限公司 Model training and image defogging method, device and equipment
CN112488943A (en) * 2020-12-02 2021-03-12 北京字跳网络技术有限公司 Model training and image defogging method, device and equipment
CN112560958A (en) * 2020-12-17 2021-03-26 北京赢识科技有限公司 Person reception method and device based on portrait recognition and electronic equipment
CN112613386B (en) * 2020-12-18 2023-12-19 宁波大学科学技术学院 Brain wave-based monitoring method and device
CN112613386A (en) * 2020-12-18 2021-04-06 宁波大学科学技术学院 Brain wave-based monitoring method and device
CN112950563A (en) * 2021-02-22 2021-06-11 深圳中科飞测科技股份有限公司 Detection method and device, detection equipment and storage medium
CN113095129A (en) * 2021-03-01 2021-07-09 北京迈格威科技有限公司 Attitude estimation model training method, attitude estimation device and electronic equipment
CN113095129B (en) * 2021-03-01 2024-04-26 北京迈格威科技有限公司 Gesture estimation model training method, gesture estimation device and electronic equipment
CN113033557A (en) * 2021-04-16 2021-06-25 北京百度网讯科技有限公司 Method and device for training image processing model and detecting image
CN113537309A (en) * 2021-06-30 2021-10-22 北京百度网讯科技有限公司 Object identification method and device and electronic equipment
CN113537309B (en) * 2021-06-30 2023-07-28 北京百度网讯科技有限公司 Object identification method and device and electronic equipment
CN113657406A (en) * 2021-07-13 2021-11-16 北京旷视科技有限公司 Model training and feature extraction method and device, electronic equipment and storage medium
CN113657406B (en) * 2021-07-13 2024-04-23 北京旷视科技有限公司 Model training and feature extraction method and device, electronic equipment and storage medium
WO2023213233A1 (en) * 2022-05-06 2023-11-09 墨奇科技(北京)有限公司 Task processing method, neural network training method, apparatus, device, and medium

Also Published As

Publication number Publication date
CN108898186B (en) 2020-03-06
CN108898186A (en) 2018-11-27

Similar Documents

Publication Publication Date Title
WO2020006961A1 (en) Image extraction method and device
CN108509915B (en) Method and device for generating face recognition model
US10902245B2 (en) Method and apparatus for facial recognition
CN109214343B (en) Method and device for generating face key point detection model
US10853623B2 (en) Method and apparatus for generating information
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN108229296B (en) Face skin attribute identification method and device, electronic equipment and storage medium
CN108197618B (en) Method and device for generating human face detection model
US20190080148A1 (en) Method and apparatus for generating image
WO2020000879A1 (en) Image recognition method and apparatus
WO2021190115A1 (en) Method and apparatus for searching for target
CN107679466B (en) Information output method and device
CN109101919B (en) Method and apparatus for generating information
WO2019242222A1 (en) Method and device for use in generating information
WO2020024484A1 (en) Method and device for outputting data
CN109993150B (en) Method and device for identifying age
WO2020019591A1 (en) Method and device used for generating information
CN107507153B (en) Image denoising method and device
WO2020062493A1 (en) Image processing method and apparatus
CN111275784B (en) Method and device for generating image
CN108388889B (en) Method and device for analyzing face image
WO2021083069A1 (en) Method and device for training face swapping model
CN108229375B (en) Method and device for detecting face image
CN108509921B (en) Method and apparatus for generating information
CN110490959B (en) Three-dimensional image processing method and device, virtual image generating method and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18925299

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/04/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18925299

Country of ref document: EP

Kind code of ref document: A1