WO2020000961A1 - Image Label Recognition Method, Device and Server (图像标签识别方法、装置及服务器) - Google Patents


Info

Publication number
WO2020000961A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
label
sample image
sample
classification model
Application number
PCT/CN2018/123959
Other languages
English (en)
French (fr)
Inventor
张志伟 (Zhang Zhiwei)
李岩 (Li Yan)
吴丽军 (Wu Lijun)
Original Assignee
北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Application filed by 北京达佳互联信息技术有限公司 (Beijing Dajia Internet Information Technology Co., Ltd.)
Publication of WO2020000961A1 (Chinese)
Priority to US17/137,282 (US20210117726A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The present disclosure relates to the field of image processing technology, and in particular to an image label recognition method, device, and server.
  • Deep learning has been widely applied in fields such as video and image processing, speech recognition, and natural language processing.
  • Owing to their strong fitting capability and end-to-end global optimization, convolutional neural networks (CNNs) have greatly improved prediction accuracy in computer vision tasks such as object detection and classification.
  • The intermediate results produced as multimedia data such as video frames and images propagate layer by layer through a CNN are also extracted from the model and used as features describing the input data.
  • Existing feature extraction approaches have two disadvantages. Disadvantage 1: the extracted features are coarse-grained; they can distinguish between inputs, but their discriminative power is poor.
  • Disadvantage 2: the feature extraction method selects the most difficult sample in each batch to participate in the loss calculation, so an image classification model trained with the extracted features converges slowly. Together, these two shortcomings lead to low label recognition accuracy and difficult training.
  • To address these problems, the present disclosure provides an image label recognition method, device, and server.
  • An image label recognition method is provided, which includes: constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model; selecting a plurality of sample images from the pre-labeled sample images; determining, through the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its closest sample image, and its most difficult sample image form an image pair; constructing a target loss function according to the image pairs, and training according to the target loss function to obtain a second image classification model; and performing label recognition on an image to be recognized through the second image classification model.
  • An image label recognition device is provided, which includes: a construction module configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; a selection module configured to select a plurality of sample images from the pre-labeled sample images; a determination module configured to determine, through the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its closest sample image, and its most difficult sample image form an image pair; a training module configured to construct a target loss function based on the image pairs and train according to the target loss function to obtain a second image classification model; and a recognition module configured to perform label recognition on an image to be recognized through the second image classification model.
  • An image label recognition device is also provided, which includes a processor and a memory for storing processor-executable instructions, where the processor is configured to: construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; select a plurality of sample images from the pre-labeled sample images; determine, through the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its closest sample image, and its most difficult sample image form an image pair; construct a target loss function based on the image pairs and train according to the target loss function to obtain a second image classification model; and perform label recognition on an image to be recognized through the second image classification model.
  • A server is provided, which includes a memory, a processor, and an image label recognition program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of any image label recognition method described in the present disclosure.
  • A computer-readable storage medium is provided, which stores an image label recognition program; when executed by a processor, the program implements the steps of any of the foregoing image label recognition methods.
  • A computer program product is provided, which includes a computer program stored on a computer-readable storage medium; the computer program includes program instructions that, when executed by a processor, implement the steps of any of the foregoing image label recognition methods.
  • The image label recognition scheme constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model, and uses the label routing map to determine the closest sample image and the most difficult sample image of each sample image.
  • Each sample image forms an image pair with its closest and most difficult sample images; the target loss function is constructed according to the image pairs, and the second image classification model is trained according to the target loss function. Training the target classification model in this way makes the model converge quickly, yields finer-grained label classification, and achieves high label recognition accuracy.
  • Fig. 1 is a flowchart of the steps of an image label recognition method according to an exemplary embodiment.
  • Fig. 2 is a flowchart of the steps of an image label recognition method according to an exemplary embodiment.
  • Fig. 3 is a block diagram of an image label recognition device according to an exemplary embodiment.
  • Fig. 4 is a block diagram of an image label recognition device according to an exemplary embodiment.
  • Fig. 5 is a block diagram of a server according to an exemplary embodiment.
  • Fig. 1 is a flowchart illustrating an image label recognition method according to an exemplary embodiment.
  • The image label recognition method shown in Fig. 1 is used in a terminal and includes the following steps:
  • Step 101: Construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model.
  • The first image classification model can be trained in any existing manner; the embodiments of the present disclosure do not limit how it is trained.
  • The label routing map contains multiple labels and the routing ratio from each label to the other labels.
  • Specifically, the first image classification model performs label prediction on the pre-labeled sample images to obtain the target labels of each sample image; the routing ratios between labels are then determined from these target labels, and the label routing map is finally drawn from the routing ratios between the labels.
  • Step 102: Select a plurality of sample images from the pre-labeled sample images.
  • The number of sample images to select can be set by those skilled in the art according to actual needs; this embodiment does not specifically limit it.
  • Step 103: Determine, through the label routing map, the closest sample image and the most difficult sample image of each of the selected sample images.
  • A sample image, its closest sample image, and its most difficult sample image together constitute an image pair.
  • Step 104: Construct a target loss function according to the image pairs, and train according to the target loss function to obtain a second image classification model.
  • An image pair average loss function can be constructed; its weighted sum with a preset classification loss function is the target loss function.
  • The weights of the image pair average loss function and the preset classification loss function can be set by those skilled in the art according to actual needs.
  • Training the second image classification model is essentially a continuous update of the model parameters; image label prediction can be performed once the second image classification model has converged to a preset standard. If the average loss value is less than a preset loss value, the second image classification model can be determined to have converged to the preset standard.
  • The preset loss value can be set by those skilled in the art according to actual needs. The smaller the preset loss value, the better the trained second image classification model converges; the larger the preset loss value, the easier the second image classification model is to train.
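The convergence check can be sketched as a training loop that stops once the average loss drops below the preset loss value. This is a minimal sketch under stated assumptions: `train_step` is a hypothetical callable that performs one parameter update and returns that step's loss, and the "average loss value" is interpreted as the mean over a sliding window of recent steps (the `window` parameter is an assumption; the text does not say what the average is taken over).

```python
from typing import Callable, List


def train_until_converged(train_step: Callable[[], float],
                          preset_loss: float,
                          window: int = 10,
                          max_steps: int = 10_000) -> List[float]:
    """Repeatedly call train_step() until the mean of the last `window` losses
    is below preset_loss, or until max_steps is reached.

    train_step is hypothetical: it should update the model parameters once
    and return the loss of that step. Returns the loss history.
    """
    history: List[float] = []
    for _ in range(max_steps):
        history.append(train_step())
        recent = history[-window:]
        # Assumed convergence criterion: sliding-window average < preset loss.
        if len(recent) == window and sum(recent) / window < preset_loss:
            break
    return history
```

A smaller `preset_loss` keeps the loop running longer, which mirrors the trade-off described above between convergence quality and ease of training.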
  • Step 105: Perform label recognition on the image to be recognized through the second image classification model.
  • The image to be recognized may be a single frame of a video or a standalone multimedia image.
  • The image to be recognized is input into the second image classification model, which outputs the label recognition result after prediction.
  • The image label recognition method of this exemplary embodiment constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model; uses the label routing map to determine the closest and the most difficult sample image of each sample image, which together form an image pair; constructs the target loss function based on the image pairs; and trains the second image classification model according to the target loss function. Training the target classification model in this way makes the model converge quickly, yields finer-grained label classification, and achieves high label recognition accuracy.
  • Fig. 2 is a flowchart illustrating an image label recognition method according to an exemplary embodiment.
  • The image label recognition method shown in Fig. 2 is used in a terminal and includes the following steps.
  • Step 201: Construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model.
  • First, the pre-trained first image classification model performs label prediction on each pre-labeled sample image to obtain the target labels corresponding to each sample image.
  • Each sample image corresponds to a preset number of target labels; the preset number can be set by those skilled in the art according to actual needs, for example 2, 3, or 4.
  • The target labels of a sample image may be determined as follows: the pre-trained first image classification model performs label prediction on each pre-labeled sample image to obtain a prediction vector for each sample image.
  • The prediction vector contains multiple points, each corresponding to a label and a probability value. For each prediction vector, the probability values of its points are sorted from largest to smallest, and the labels corresponding to the first preset number of probability values are determined as the target labels of the sample image corresponding to that prediction vector.
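Selecting the target labels from a prediction vector can be sketched in Python. The function name `target_labels`, the representation of a prediction vector as a list of probabilities aligned with a list of label names, and the example labels are illustrative assumptions; `k` plays the role of the preset number.

```python
from typing import List, Sequence


def target_labels(prediction: Sequence[float], labels: Sequence[str], k: int) -> List[str]:
    """Return the labels of the k highest-probability points of one prediction vector."""
    if len(prediction) != len(labels):
        raise ValueError("prediction vector and label list must have the same length")
    # Sort point indices by probability value, from largest to smallest.
    order = sorted(range(len(prediction)), key=lambda idx: prediction[idx], reverse=True)
    # The labels of the first k points become the sample image's target labels.
    return [labels[idx] for idx in order[:k]]
```

For example, `target_labels([0.1, 0.6, 0.05, 0.25], ["cat", "dog", "car", "tree"], 2)` returns `["dog", "tree"]`.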
  • Next, the pre-labeled sample images are grouped by label.
  • The labels are preset, and each sample image is annotated with one label in advance.
  • Each label corresponds to one group, so the group corresponding to a label contains one or more pre-labeled sample images.
  • For each label, the number of times it appears among the target labels is determined; for each group, the quotient of that number and the number of sample images in the group is determined as the routing ratio between that label and the label corresponding to the group.
  • Because each pre-labeled sample image corresponds to a preset number of target labels, the same label may appear in the target labels of several sample images. A given label may therefore occur more than once among the target labels, so its number of occurrences (the first number) is counted.
  • Each group contains at least one sample image, so the number of sample images in the group (the second number) is counted; the quotient of the first number and the second number is then calculated and determined as the routing ratio between the label and the label corresponding to the group.
  • r_j is the routing ratio
  • n is the number of labels
  • i is the sample image identifier
  • j is the label identifier
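One reading of the routing ratio computation, sketched in Python: for each group (the images pre-labeled with label g), count how often each label j occurs among the target labels of that group's images (the first number), then divide by the group size (the second number). Counting per group and indexing the result as `ratios[g][j]` are interpretive assumptions; the extracted text does not preserve the exact formula.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def routing_ratios(samples: Sequence[Tuple[str, Sequence[str]]]) -> Dict[str, Dict[str, float]]:
    """samples: one (pre-assigned label, predicted target labels) tuple per image.

    Returns ratios[g][j] = occurrences of label j among the target labels of
    the images in group g, divided by the number of images in group g.
    """
    group_sizes: Counter = Counter()
    label_counts: Dict[str, Counter] = defaultdict(Counter)
    for group_label, targets in samples:
        group_sizes[group_label] += 1              # second number: images in the group
        label_counts[group_label].update(targets)  # first number: label occurrences
    return {
        g: {j: count / group_sizes[g] for j, count in label_counts[g].items()}
        for g in group_sizes
    }
```

Labels that never appear in a group's target labels are simply absent from that group's inner dictionary, which callers can treat as a ratio of 0.0.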
  • Step 202: Select a plurality of sample images from the pre-labeled sample images.
  • The number of sample images to select can be set by those skilled in the art according to actual needs; this embodiment does not specifically limit it.
  • Step 203: For each of the selected sample images, determine the first label to which the sample image belongs.
  • Each sample image belongs to one group and each group corresponds to one label; the label of the group to which a sample image belongs is the first label of that sample image.
  • Step 204: Determine the second label, which has the smallest routing ratio to the first label, and randomly extract a sample image from the group corresponding to the second label as the closest sample image of the sample image.
  • For example, if the group corresponding to the second label contains 10 sample images, one of them is randomly extracted and used as the closest sample image of the sample image.
  • Step 205: Determine the third label, which has the largest routing ratio to the first label, and randomly extract a sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
  • Steps 203 to 205 determine the closest sample image and the most difficult sample image of one sample image and combine the three into an image pair.
  • The above process may be repeated to determine the image pair corresponding to each sample image.
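Steps 203 to 205 can be sketched as follows. This assumes the routing ratios are stored as a nested dictionary `ratios[first_label][other_label]`, that a missing entry means a ratio of 0.0, that the first label itself is excluded from the candidates, and that images are identified by strings; `build_image_pair` and these representations are hypothetical. Ties are broken by label name so the result is deterministic.

```python
import random
from typing import Dict, List, Tuple


def build_image_pair(anchor: str, first_label: str,
                     groups: Dict[str, List[str]],
                     ratios: Dict[str, Dict[str, float]],
                     rng: random.Random) -> Tuple[str, str, str]:
    """Return (sample image, closest sample image, most difficult sample image)."""
    # Candidate labels: every other label whose group is non-empty.
    others = sorted(lbl for lbl in groups if lbl != first_label and groups[lbl])
    if not others:
        raise ValueError("need at least one other non-empty group")
    row = ratios.get(first_label, {})
    # Step 204: second label = smallest routing ratio to the first label.
    second = min(others, key=lambda lbl: row.get(lbl, 0.0))
    # Step 205: third label = largest routing ratio to the first label.
    third = max(others, key=lambda lbl: row.get(lbl, 0.0))
    # Randomly extract one image from each chosen group.
    closest = rng.choice(groups[second])
    hardest = rng.choice(groups[third])
    return anchor, closest, hardest
```

Passing an explicit `random.Random` instance keeps the random extraction reproducible across runs.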
  • Step 206: Construct a target loss function according to the image pairs, and train according to the target loss function to obtain a second image classification model.
  • By calculating the label routing ratios between the sample image, the closest sample image, and the most difficult sample image in each image pair, an image pair average loss function can be constructed. The image pair average loss function is:
  • $\mathrm{tripletloss} = \mathrm{dis}(x_a, x_p) - \mathrm{dis}(x_a, x_n) + \alpha$
  • where $\mathrm{dis}()$ is a distance measurement function, namely a measurement of the routing ratio between labels; $x_a$, $x_p$, and $x_n$ are the sample image, the closest sample image, and the most difficult sample image, respectively; and $\alpha$ is a minimum distance (margin).
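The image pair average loss can be sketched as the mean of the per-pair expression over all image pairs. `dis` is left as a caller-supplied callable because the text only describes it as a routing-ratio measurement between labels, and `alpha` is the margin. The `clamp` flag applies the hinge max(0, ·) used in the conventional triplet loss; that clamp is an assumption, since it does not appear in the extracted formula, so pass `clamp=False` to use the bare expression.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def image_pair_average_loss(pairs: Sequence[Tuple[T, T, T]],
                            dis: Callable[[T, T], float],
                            alpha: float,
                            clamp: bool = True) -> float:
    """Average of dis(x_a, x_p) - dis(x_a, x_n) + alpha over all image pairs."""
    if not pairs:
        raise ValueError("need at least one image pair")
    total = 0.0
    for x_a, x_p, x_n in pairs:
        term = dis(x_a, x_p) - dis(x_a, x_n) + alpha
        # Assumption: clamp at zero, as in the conventional triplet loss.
        total += max(term, 0.0) if clamp else term
    return total / len(pairs)
```

With scalar stand-ins for images and `dis = lambda a, b: abs(a - b)`, the pairs `[(0.0, 0.1, 1.0), (0.0, 0.5, 0.2)]` and `alpha = 0.2` give per-pair terms of -0.7 and 0.5, so the clamped average is 0.25.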
  • The weighted sum of the image pair average loss function and the preset classification loss function is the constructed target loss function.
  • The target loss function can be expressed by the following formula:
  • $\mathrm{Loss} = \lambda_{triplet} \cdot \mathrm{tripletloss} + \lambda_{clf} \cdot \mathrm{loss}_{clf}$
  • where $\mathrm{Loss}$ represents the target loss function; $\mathrm{tripletloss}$ represents the image pair average loss function; $\mathrm{loss}_{clf}$ is the preset classification loss function; $\lambda_{triplet}$ is the weight of $\mathrm{tripletloss}$; and $\lambda_{clf}$ is the weight of $\mathrm{loss}_{clf}$.
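The target loss is a weighted sum of the image pair average loss and the preset classification loss, sketched below. Cross-entropy is used here only as a stand-in for the preset classification loss function, which the text does not specify; the weight names `lambda_triplet` and `lambda_clf` are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], true_index: int, eps: float = 1e-12) -> float:
    """Assumed stand-in classification loss: negative log-probability of the true label."""
    return -math.log(max(probs[true_index], eps))


def target_loss(triplet_loss: float, clf_loss: float,
                lambda_triplet: float, lambda_clf: float) -> float:
    """Weighted sum: lambda_triplet * tripletloss + lambda_clf * loss_clf."""
    return lambda_triplet * triplet_loss + lambda_clf * clf_loss
```

For example, `target_loss(0.25, 1.0, 0.5, 1.0)` evaluates to 1.125; the relative weights decide how strongly the image pairs shape training compared with plain classification.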
  • Step 207: Perform label recognition on the image to be recognized through the second image classification model.
  • The image to be recognized may be a single frame of a video or a standalone multimedia image.
  • The image to be recognized is input into the second image classification model, which outputs the label recognition result after prediction.
  • The image label recognition method of this exemplary embodiment constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model; uses the label routing map to form, for each sample image, an image pair with its closest and most difficult sample images; and constructs the target loss function based on the image pairs.
  • The second image classification model is then trained based on the target loss function. Training the target classification model in this way makes the model converge quickly, yields finer-grained label classification, and achieves high label recognition accuracy.
  • Fig. 3 is a block diagram of an image label recognition device according to an exemplary embodiment.
  • The device includes a construction module 301, a selection module 302, a determination module 303, a training module 304, and a recognition module 305.
  • The construction module 301 is configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; the selection module 302 is configured to select a plurality of sample images from the pre-labeled sample images; the determination module 303 is configured to determine, through the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where the sample image, its closest sample image, and its most difficult sample image form an image pair.
  • The training module 304 is configured to construct a target loss function according to the image pairs and train according to the target loss function to obtain a second image classification model.
  • The recognition module 305 is configured to perform label recognition on the image to be recognized through the second image classification model.
  • The construction module 301 may include a label prediction sub-module 3011 configured to perform label prediction on the pre-labeled sample images through the pre-trained first image classification model to obtain the target labels corresponding to each sample image, where each sample image corresponds to a preset number of target labels;
  • a grouping sub-module 3012 configured to group the pre-labeled sample images by label, where each label corresponds to one group;
  • a determination sub-module 3013 configured to determine, for each label, the number of times it appears among the target labels;
  • a routing ratio determination sub-module 3014 configured to determine, for each group, the quotient of each label's number and the number of sample images in the group as the routing ratio between that label and the label corresponding to the group;
  • and a drawing sub-module 3015 configured to draw the label routing map according to the routing ratios between the labels.
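The drawing step can be sketched by rendering the routing ratios as a Graphviz DOT directed graph, with one edge per (group label, label) pair annotated with its ratio. The nested-dictionary input `ratios[g][j]`, the DOT output format, the omission of self-loops, and the `min_ratio` filter are illustrative choices, not details from the text.

```python
from typing import Dict


def routing_map_dot(ratios: Dict[str, Dict[str, float]], min_ratio: float = 0.0) -> str:
    """Render routing ratios as a DOT digraph: one labeled edge per group label -> label."""
    lines = ["digraph label_routing_map {"]
    for src in sorted(ratios):
        for dst in sorted(ratios[src]):
            ratio = ratios[src][dst]
            # Skip self-loops and ratios at or below the threshold.
            if src != dst and ratio > min_ratio:
                lines.append(f'  "{src}" -> "{dst}" [label="{ratio:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be rendered with any Graphviz tool; raising `min_ratio` keeps only the strongest routes, which makes large label sets easier to inspect.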
  • The label prediction sub-module may include a vector prediction unit configured to perform label prediction on each pre-labeled sample image through the pre-trained first image classification model to obtain a prediction vector for each sample image,
  • where the prediction vector includes multiple points, each corresponding to a label and a probability value;
  • a sorting unit configured to sort, for each prediction vector, the probability values of its points from largest to smallest;
  • and a label determination unit configured to determine the labels corresponding to the first preset number of probability values as the target labels of the sample image corresponding to that prediction vector.
  • The determination module 303 may include: a label determination sub-module 3031 configured to determine, for each sample image in the batch of sample images, the first label to which the sample image belongs; a first extraction sub-module 3032 configured to determine the second label, which has the smallest routing ratio to the first label, and randomly extract a sample image from the group corresponding to the second label as the closest sample image of the sample image; and a second extraction sub-module 3033 configured to determine the third label, which has the largest routing ratio to the first label, and randomly extract a sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
  • The weighted sum of the image pair average loss function and the preset classification loss function is the target loss function.
  • Fig. 5 is a block diagram of an image label recognition terminal 600 according to an exemplary embodiment.
  • The device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • The device 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
  • The processing component 602 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • The processing component 602 may include one or more processors 620 to execute instructions so as to complete all or part of the steps of the method described above.
  • The processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components.
  • The processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
  • The memory 604 is configured to store various types of data to support operation of the device 600. Examples of such data include instructions for any application or method operating on the device 600, contact data, phone book data, messages, pictures, videos, and the like.
  • The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • The power component 606 provides power to the various components of the device 600.
  • The power component 606 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 600.
  • The multimedia component 608 includes a screen that provides an output interface between the device 600 and a user.
  • The screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user.
  • The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors can sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation.
  • The multimedia component 608 includes a front camera and/or a rear camera. When the device 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • The audio component 610 is configured to output and/or input audio signals.
  • The audio component 610 includes a microphone (MIC) configured to receive external audio signals when the device 600 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode.
  • The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616.
  • The audio component 610 further includes a speaker for outputting audio signals.
  • The I/O interface 612 provides an interface between the processing component 602 and a peripheral interface module.
  • The peripheral interface module may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the device 600.
  • The sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, such as the display and keypad of the device 600.
  • The sensor component 614 can also detect a change in the position of the device 600 or of a component of the device 600, the presence or absence of contact with the device 600, the orientation or acceleration/deceleration of the device 600, and a change in the temperature of the device 600.
  • The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • The sensor component 614 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • The communication component 616 is configured to facilitate wired or wireless communication between the device 600 and other devices.
  • The device 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • The communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • The communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication.
  • The NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • The device 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • a non-transitory computer-readable storage medium including instructions such as a memory 604 including instructions, may be provided, which may be executed by the processor 620 of the device 600 to complete the above method.
  • the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • Fig. 5 is a block diagram of a device 1900 for image label recognition according to an exemplary embodiment.
  • the device 1900 may be provided as a server.
  • the device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932, for storing instructions executable by the processing component 1922, such as an application program.
  • the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above method, and specifically includes:
  • a label routing map is constructed based on the pre-labeled sample images and a pre-trained first image classification model; a batch of sample images is selected from the pre-labeled sample images; the label routing map is used to determine the closest sample image and the most difficult sample image of each sample image in the batch, where the sample image, its closest sample image, and its most difficult sample image form an image pair; a target loss function is constructed based on each image pair, and training is performed based on the target loss function to obtain a second image classification model.
  • label recognition is performed on the image to be identified through the second image classification model.
  • the step of constructing a label routing map based on the pre-labeled sample images and a pre-trained first image classification model includes:
  • performing label prediction on each pre-labeled sample image with the pre-trained first image classification model to obtain the target labels corresponding to each sample image, where each sample image corresponds to a preset number of target labels; grouping the pre-labeled sample images by label, where each label corresponds to one group; for each label, determining the number of occurrences of that label among the target labels; for each group, determining the quotient of that number and the number of sample images in the group as the routing ratio from the label to the label corresponding to the group; and drawing the label routing map according to the routing ratios between labels.
  • the step of performing label prediction on each pre-labeled sample image with the pre-trained first image classification model to obtain the target labels corresponding to each sample image includes: performing label prediction on each pre-labeled sample image with the pre-trained first image classification model to obtain a prediction vector for each sample image.
  • the prediction vector contains multiple points, each point corresponding to one label and one probability value; for each prediction vector, the probability values of its points are sorted in descending order, and the labels corresponding to the top preset number of probability values are determined as the target labels of the sample image corresponding to that prediction vector.
  • the step of determining, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image in the batch includes: for each sample image in the batch, determining the first label to which the sample image belongs; determining the second label having the smallest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the second label as the closest sample image of the sample image; and determining the third label having the largest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
  • in some embodiments, the weighted sum of the image-pair average loss function and the preset classification loss function is the target loss function.
  • the device 1900 may further include a power supply component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958.
  • the device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • the present disclosure also provides a computer-readable storage medium on which an image label recognition program is stored.
  • when the image label recognition program is executed by a processor, the steps of any of the image label recognition methods described above are implemented.
  • the present disclosure also provides a computer program product.
  • the computer program product includes a computer program.
  • the computer program includes program instructions and is stored on a computer-readable storage medium; when the program instructions are executed by a processor, the steps of any of the image label recognition methods described above are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

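An image label recognition method, apparatus, and server. The method includes: constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model (101); selecting a plurality of sample images from the pre-labeled sample images (102); determining, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images (103), where each sample image, its closest sample image, and its most difficult sample image form an image pair; constructing a target loss function according to the image pairs and training according to the target loss function to obtain a second image classification model (104); and performing label recognition on an image to be recognized by means of the second image classification model (105). This image label recognition method makes label classification more fine-grained and improves the label recognition accuracy of the target classification model.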

Description

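Image Label Recognition Method, Apparatus, and Server
This application claims priority to Chinese Patent Application No. 201810712097.7, filed on June 29, 2018 and entitled "Image Label Recognition Method, Apparatus, and Server", the entire contents of which are incorporated herein by reference.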
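Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image label recognition method, apparatus, and server.
Background
Deep learning has been widely applied in fields such as video and image processing, speech recognition, and natural language processing. As an important branch of deep learning, the convolutional neural network, owing to its strong fitting capability and end-to-end global optimization, has greatly improved the accuracy of predictions in computer vision tasks such as object detection and classification. The intermediate results produced as multimedia data such as video frames and images propagate layer by layer through a convolutional neural network are also extracted from the model and used as features describing the input data. These features are likewise widely used in fields such as similar-face detection and video/image retrieval.
Although the intermediate results of a convolutional neural network can be extracted and used directly as features in fields such as similar-face detection, features obtained directly from a convolutional neural network have the following drawbacks:
Drawback 1: the extracted features are coarse-grained; that is, they can distinguish between inputs, but they discriminate poorly. Drawback 2: this feature extraction approach selects the most difficult sample within the same batch to participate in the loss computation, so an image classification model trained with the features it extracts converges slowly. Together, these two drawbacks result in image classification models with low label recognition accuracy that are difficult to train.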
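Summary
To overcome the problems in the related art, the present disclosure provides an image label recognition method, apparatus, and server.
According to a first aspect of embodiments of the present disclosure, an image label recognition method is provided, including: constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model; selecting a plurality of sample images from the pre-labeled sample images; determining, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its corresponding closest sample image, and its corresponding most difficult sample image form an image pair; constructing a target loss function according to the image pairs and training according to the target loss function to obtain a second image classification model; and performing label recognition on an image to be recognized by means of the second image classification model.
According to a second aspect of embodiments of the present disclosure, an image label recognition apparatus is provided, including: a construction module configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; a selection module configured to select a plurality of sample images from the pre-labeled sample images; a determination module configured to determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its corresponding closest sample image, and its corresponding most difficult sample image form an image pair; a training module configured to construct a target loss function according to the image pairs and train according to the target loss function to obtain a second image classification model; and a recognition module configured to perform label recognition on an image to be recognized by means of the second image classification model.
According to a third aspect of embodiments of the present disclosure, an image label recognition apparatus is provided, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to: construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; select a plurality of sample images from the pre-labeled sample images; determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where each sample image, its corresponding closest sample image, and its corresponding most difficult sample image form an image pair; construct a target loss function according to the image pairs and train according to the target loss function to obtain a second image classification model; and perform label recognition on an image to be recognized by means of the second image classification model.
According to a fourth aspect of embodiments of the present disclosure, a server is provided, including a memory, a processor, and an image label recognition program stored on the memory and executable on the processor, where the image label recognition program, when executed by the processor, implements the steps of any image label recognition method described in the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, a computer-readable storage medium is provided, on which an image label recognition program is stored, where the image label recognition program, when executed by a processor, implements the steps of any of the image label recognition methods described above.
According to a sixth aspect of embodiments of the present disclosure, a computer program product is further provided, including a computer program, the computer program including program instructions and being stored on a computer-readable storage medium, where the program instructions, when executed by a processor, implement the steps of any of the image label recognition methods described above.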
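The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
The image label recognition solution provided by the embodiments of the present disclosure constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model; determines, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image to form image pairs; constructs a target loss function according to the image pairs; and trains a second image classification model according to the target loss function. With this training method the model converges quickly, label classification is more fine-grained, and the resulting target classification model recognizes labels with high accuracy.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.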
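Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of the steps of an image label recognition method according to an exemplary embodiment;
Fig. 2 is a flowchart of the steps of an image label recognition method according to an exemplary embodiment;
Fig. 3 is a block diagram of an image label recognition apparatus according to an exemplary embodiment;
Fig. 4 is a block diagram of an image label recognition apparatus according to an exemplary embodiment;
Fig. 5 is a block diagram of a server according to an exemplary embodiment.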
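Detailed Description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. In the following description referring to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of an image label recognition method according to an exemplary embodiment. As shown in Fig. 1, the image label recognition method is used in a terminal and includes the following steps:
Step 101: construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model.
The first image classification model can be trained in any existing manner; the embodiments of the present disclosure do not limit how the first image classification model is trained. The label routing map contains multiple labels and the routing ratio from each label to every other label.
To construct the label routing map, the first image classification model can be used to predict labels for each pre-labeled sample image to obtain the target labels corresponding to each sample image; the routing ratios between labels are then determined, and the label routing map is finally drawn based on these routing ratios.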
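Step 102: select a plurality of sample images from the pre-labeled sample images.
The specific number of sample images can be set by those skilled in the art according to actual needs, and is not specifically limited in the embodiments of the present disclosure.
Step 103: determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images.
A sample image, its closest sample image, and its most difficult sample image together form an image pair.
Step 104: construct a target loss function according to the image pairs, and train according to the target loss function to obtain a second image classification model.
From the label routing ratios among the sample image, the closest sample image, and the most difficult sample image in each image pair, an image-pair average loss function can be constructed. The weighted sum of the image-pair average loss function and a preset classification loss function is the constructed target loss function.
The weights of the image-pair average loss function and the preset classification loss function can be set by those skilled in the art according to actual needs.
Training the second image classification model essentially consists of repeatedly updating the model parameters until the second image classification model converges to a preset standard, after which it can be used for image label prediction. When the average loss value is less than a preset loss value, the second image classification model can be determined to have converged to the preset standard. The preset loss value can be set by those skilled in the art according to actual needs: the smaller the preset loss value, the better the convergence of the trained second image classification model; the larger the preset loss value, the easier the second image classification model is to train.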
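The stopping rule in Step 104 can be sketched as a loop that keeps updating parameters until the average loss falls below the preset loss value. This is a minimal sketch, assuming a hypothetical `train_step` callable that runs one epoch of parameter updates and returns the loss values it observed; the `max_epochs` cap is an added safeguard not mentioned in the text.

```python
from typing import Callable, List, Tuple


def train_until_converged(
    train_step: Callable[[], List[float]],
    preset_loss: float,
    max_epochs: int = 100,
) -> Tuple[int, bool]:
    """Run epochs until the average loss is below preset_loss.

    train_step is hypothetical: it should update the model parameters once
    over the training data and return the loss values it observed.
    Returns (epochs_run, converged).
    """
    for epoch in range(1, max_epochs + 1):
        losses = train_step()
        average_loss = sum(losses) / len(losses)
        if average_loss < preset_loss:
            return epoch, True  # converged to the preset standard
    return max_epochs, False  # epoch budget exhausted without converging
```

A smaller `preset_loss` demands a better-converged model but typically needs more epochs, which is the trade-off the paragraph above describes.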
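Step 105: perform label recognition on the image to be recognized by means of the second image classification model.
The image to be recognized may be a single frame from a video or a standalone multimedia image. The image to be recognized is input into the second image classification model, which outputs a label recognition result after prediction.
The image label recognition method of this exemplary embodiment constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model; determines, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image to form image pairs; constructs a target loss function according to the image pairs; and trains a second image classification model according to the target loss function. With this training method the model converges quickly, label classification is more fine-grained, and the resulting target classification model recognizes labels with high accuracy.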
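Fig. 2 is a flowchart of an image label recognition method according to an exemplary embodiment. As shown in Fig. 2, the image label recognition method is used in a terminal and includes the following steps.
Step 201: construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model.
A preferred way to construct the label routing map is as follows:
First, label prediction is performed on each pre-labeled sample image by the pre-trained first image classification model to obtain the target labels corresponding to each sample image.
Each sample image corresponds to a preset number of target labels; the preset number can be set by those skilled in the art according to actual needs, for example 2, 3, or 4.
In a specific implementation, the target labels of a sample image can be determined as follows: label prediction is performed on each pre-labeled sample image by the pre-trained first image classification model to obtain a prediction vector for each sample image, where the prediction vector contains multiple points, each corresponding to one label and one probability value; for each prediction vector, the probability values of its points are sorted in descending order; and the labels corresponding to the top preset number of probability values are determined as the target labels of the sample image corresponding to that prediction vector.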
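The top-k selection described above can be sketched as follows. This is a minimal sketch, assuming the prediction vector is represented as a dict from label name to probability (a hypothetical stand-in; a real model would emit an array of scores indexed by label id), with `k` playing the role of the preset number of target labels.

```python
from typing import Dict, List


def top_k_target_labels(prediction: Dict[str, float], k: int) -> List[str]:
    """Return the labels of the k highest-probability points in a prediction vector.

    A dict mapping label -> probability stands in for the prediction vector.
    """
    # Sort the (label, probability) points from largest to smallest probability.
    ranked = sorted(prediction.items(), key=lambda point: point[1], reverse=True)
    # Keep the labels of the first k points as the image's target labels.
    return [label for label, _ in ranked[:k]]


print(top_k_target_labels({"cat": 0.6, "dog": 0.3, "car": 0.1}, k=2))
# prints ['cat', 'dog']
```

Because `sorted` is stable, labels with equal probability keep their original order, which makes the selection deterministic.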
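Second, the pre-labeled sample images are grouped by label.
The labels are the preset labels, and each sample image is pre-labeled with a label. Each label corresponds to one group, so the group for a label contains one or more pre-labeled sample images.
Third, for each label, the number of occurrences of that label among the target labels is determined; for each group, the quotient of that number and the number of sample images in the group is determined as the routing ratio from the label to the label corresponding to the group.
Because there are multiple pre-labeled sample images, each corresponding to a preset number of target labels, the target labels of different sample images may contain the same label. For each label, that label may therefore occur more than once among the target labels, so a first number, the count of that label among the target labels, can be tallied.
Each group contains at least one sample image, so a second number, the count of sample images in the group, can be tallied. Finally, the quotient of the first number and the second number is computed and taken as the routing ratio from the label to the label corresponding to the group.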
[Formula image PCTCN2018123959-appb-000001: routing ratio r_j; not reproduced in this text]
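where r_j is the routing ratio, n is the number of labels, i is the sample image index, and j is the label index.
By repeating this step, the routing ratio from each label to the label of every group can be determined, that is, the routing ratios between labels.
Finally, the label routing map is drawn according to the routing ratios between labels.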
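Because the routing-ratio formula survives only as an image reference, the following is a minimal sketch that follows the prose definition instead: the first number (occurrences of a label among target labels) divided by the second number (images in the group). Two points are assumptions, not confirmed by the text: the occurrences are counted only among the target labels of the images in that group, and the map is stored as a nested dict `routing[group_label][label]`. The input format of (assigned label, top-k target labels) tuples is also hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One entry per pre-labeled sample image: (assigned label, top-k target labels).
Sample = Tuple[str, List[str]]


def build_label_routing_map(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Return routing[g][l]: occurrences of label l among the target labels
    of group g's images, divided by the number of images in group g."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for assigned_label, target_labels in samples:
        groups[assigned_label].append(target_labels)  # one group per label

    routing: Dict[str, Dict[str, float]] = {}
    for group_label, target_lists in groups.items():
        # "First number": occurrences of each label among this group's target labels.
        counts = Counter(label for targets in target_lists for label in targets)
        # "Second number": sample images in the group.
        group_size = len(target_lists)
        routing[group_label] = {label: n / group_size for label, n in counts.items()}
    return routing


samples = [("cat", ["cat", "dog"]), ("cat", ["cat", "fox"]), ("dog", ["dog", "cat"])]
print(build_label_routing_map(samples))
# prints {'cat': {'cat': 1.0, 'dog': 0.5, 'fox': 0.5}, 'dog': {'dog': 1.0, 'cat': 1.0}}
```

The nested dict doubles as the "drawn" routing map: each outer key is a node and each inner entry is a weighted edge, so it can be handed to any graph library for visualization.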
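Step 202: select a plurality of sample images from the pre-labeled sample images.
The specific number of sample images can be set by those skilled in the art according to actual needs, and is not specifically limited in the embodiments of the present disclosure.
Step 203: for each of the plurality of sample images, determine the first label to which the sample image belongs.
Each sample image belongs to one group and each group corresponds to one label, so the label of the group to which a sample image belongs is the first label of that sample image.
Step 204: determine the second label having the smallest routing ratio with the first label, and randomly extract one sample image from the group corresponding to the second label as the closest sample image of the sample image.
For example, if the group corresponding to the second label contains 10 sample images, one of these 10 sample images is randomly extracted and used as the closest sample image of the sample image.
Step 205: determine the third label having the largest routing ratio with the first label, and randomly extract one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
The sample image, its closest sample image, and its most difficult sample image form an image pair. Steps 203 to 205 determine the most difficult sample image and the closest sample image of one sample image and combine the three into one image pair. In a specific implementation, this procedure can be repeated to determine the image pair corresponding to each sample image.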
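Steps 203 to 205 can be sketched as a single selection function. This is a minimal sketch that follows the text literally (smallest routing ratio gives the closest sample, largest gives the most difficult sample), under several assumptions not stated in the source: the first label itself is excluded from the candidates, labels that never appear in the routing map count as ratio 0.0, the ratios are read from `routing[first_label]` as built by a nested-dict routing map, and `groups` is a hypothetical mapping from label to image ids.

```python
import random
from typing import Dict, List, Tuple


def select_image_pair(
    anchor_id: str,
    first_label: str,
    routing: Dict[str, Dict[str, float]],
    groups: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[str, str, str]:
    """Return (sample, closest, most_difficult) image ids following Steps 203-205."""
    ratios = routing.get(first_label, {})
    # Candidates: every other label whose group is non-empty; unseen labels count as 0.0.
    candidates = sorted(
        label for label, ids in groups.items() if label != first_label and ids
    )
    if not candidates:
        raise ValueError("need at least one other non-empty label group")
    second_label = min(candidates, key=lambda label: ratios.get(label, 0.0))  # smallest ratio
    third_label = max(candidates, key=lambda label: ratios.get(label, 0.0))   # largest ratio
    closest = rng.choice(groups[second_label])
    most_difficult = rng.choice(groups[third_label])
    return anchor_id, closest, most_difficult


routing = {"cat": {"cat": 1.0, "dog": 0.5, "fox": 0.25}}
groups = {"cat": ["c1", "c2"], "dog": ["d1"], "fox": ["f1"]}
print(select_image_pair("c1", "cat", routing, groups, random.Random(0)))
# prints ('c1', 'f1', 'd1')
```

Passing an explicit `random.Random` instance keeps the random extraction reproducible across runs, which helps when comparing training runs.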
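Step 206: construct a target loss function according to the image pairs, and train according to the target loss function to obtain a second image classification model.
From the label routing ratios among the sample image, the closest sample image, and the most difficult sample image in each image pair, an image-pair average loss function can be constructed. The image-pair average loss function is:
$$\mathrm{tripletloss} = \mathrm{dis}(x_a, x_p) - \mathrm{dis}(x_a, x_n) + \alpha$$
where dis() is a distance measure function, that is, a function measuring the routing ratio between labels; x_a, x_p, and x_n are the sample image, its closest sample image, and its most difficult sample image, respectively; and α is the minimum distance.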
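The image-pair loss can be sketched as below. The text does not give a concrete form for dis(), so this minimal sketch swaps in plain Euclidean distance (`math.dist`) between feature vectors, for example embeddings produced by the model being trained; that choice is an assumption. The formula as written has no lower clamp, while the commonly used triplet loss applies max(0, ·); the `hinge` flag is an added option that switches between the two.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def triplet_loss(x_a: Vector, x_p: Vector, x_n: Vector,
                 alpha: float, hinge: bool = True) -> float:
    """dis(x_a, x_p) - dis(x_a, x_n) + alpha, with Euclidean distance standing in for dis()."""
    value = math.dist(x_a, x_p) - math.dist(x_a, x_n) + alpha
    # The formula above has no clamp; hinge=True applies the common max(0, .) form.
    return max(0.0, value) if hinge else value


def image_pair_average_loss(pairs: List[Tuple[Vector, Vector, Vector]],
                            alpha: float) -> float:
    """Average the (hinged) triplet loss over all image pairs in a batch."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in pairs) / len(pairs)
```

For example, with x_a = (0, 0), x_p = (3, 4), x_n = (0, 1) and α = 0.5, the unclamped value is 5 - 1 + 0.5 = 4.5.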
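The weighted sum of the image-pair average loss function and the preset classification loss function is the constructed target loss function, which can be expressed as:
$$\mathrm{loss} = \lambda_{\mathrm{triplet}}\,\mathrm{tripletloss} + \lambda_{\mathrm{clf}}\,\mathrm{loss}_{\mathrm{clf}}$$
where loss denotes the target loss function, tripletloss denotes the image-pair average loss function, loss_clf is the preset classification loss function, λ_triplet is the weight of tripletloss, and λ_clf is the weight of loss_clf.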
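The weighted target loss is a direct linear combination. The text does not specify the preset classification loss, so this minimal sketch assumes single-label cross-entropy (the negative log-probability of the true label); the weights are free parameters, as stated above.

```python
import math
from typing import Sequence


def cross_entropy(probabilities: Sequence[float], true_index: int) -> float:
    """Assumed preset classification loss: -log p(true label)."""
    return -math.log(probabilities[true_index])


def target_loss(triplet_value: float, clf_value: float,
                lambda_triplet: float, lambda_clf: float) -> float:
    """loss = lambda_triplet * tripletloss + lambda_clf * loss_clf."""
    return lambda_triplet * triplet_value + lambda_clf * clf_value


clf = cross_entropy([0.25, 0.75], true_index=1)
print(target_loss(2.0, clf, lambda_triplet=0.5, lambda_clf=1.0))
```

Raising `lambda_triplet` relative to `lambda_clf` puts more emphasis on separating confusable labels; raising `lambda_clf` keeps the model closer to plain classification training.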
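Step 207: perform label recognition on the image to be recognized by means of the second image classification model.
The image to be recognized may be a single frame from a video or a standalone multimedia image. The image to be recognized is input into the second image classification model, which outputs a label recognition result after prediction.
The image label recognition method of this exemplary embodiment constructs a label routing map based on pre-labeled sample images and a pre-trained first image classification model; determines, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image to form image pairs; constructs a target loss function according to the image pairs; and trains a second image classification model according to the target loss function. With this training method the model converges quickly, label classification is more fine-grained, and the resulting target classification model recognizes labels with high accuracy.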
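Fig. 3 is a block diagram of an image label recognition apparatus according to an exemplary embodiment. Referring to Fig. 3, the apparatus includes a construction module 301, a selection module 302, a determination module 303, a training module 304, and a recognition module 305.
The construction module 301 is configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model; the selection module 302 is configured to select a plurality of sample images from the pre-labeled sample images; the determination module 303 is configured to determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, where a sample image, its closest sample image, and its most difficult sample image form an image pair; the training module 304 is configured to construct a target loss function according to the image pairs and train according to the target loss function to obtain a second image classification model; and the recognition module 305 is configured to perform label recognition on an image to be recognized by means of the second image classification model.
In some embodiments, the construction module 301 may include: a label prediction submodule 3011 configured to perform label prediction on the pre-labeled sample images by the pre-trained first image classification model to obtain the target labels corresponding to each sample image, where each sample image corresponds to a preset number of target labels; a grouping submodule 3012 configured to group the pre-labeled sample images by label, where each label corresponds to one group; a determination submodule 3013 configured to determine, for each label, the number of occurrences of that label among the target labels; a routing ratio determination submodule 3014 configured to determine, for each group, the quotient of the count of each label and the number of sample images in the group as the routing ratio from that label to the label corresponding to the group; and a drawing submodule 3015 configured to draw the label routing map according to the routing ratios between labels.
In some embodiments, the label prediction submodule may include: a vector prediction unit configured to perform label prediction on each pre-labeled sample image by the pre-trained first image classification model to obtain a prediction vector for each sample image, where the prediction vector contains multiple points, each corresponding to one label and one probability value; a sorting unit configured to sort, for each prediction vector, the probability values of its points in descending order; and a target label determination unit configured to determine the labels corresponding to the top preset number of probability values as the target labels of the sample image corresponding to the prediction vector.
In some embodiments, the determination module 303 may include: a label determination submodule 3031 configured to determine, for each sample image in the batch of sample images, the first label to which the sample image belongs; a first extraction submodule 3032 configured to determine the second label having the smallest routing ratio with the first label and randomly extract one sample image from the group corresponding to the second label as the closest sample image of the sample image; and a second extraction submodule 3033 configured to determine the third label having the largest routing ratio with the first label and randomly extract one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
In some embodiments, the weighted sum of the image-pair average loss function and the preset classification loss function is the target loss function.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the method and will not be elaborated here.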
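Fig. 4 is a block diagram of a terminal 600 for image label recognition according to an exemplary embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 4, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.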
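The processing component 602 generally controls the overall operation of the apparatus 600, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions to complete all or some of the steps of the above method. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation of the apparatus 600. Examples of such data include instructions for any application or method operated on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 606 supplies power to the various components of the apparatus 600. The power component 606 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the apparatus 600 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.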
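The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC) configured to receive external audio signals when the apparatus 600 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 can detect the on/off state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600; the sensor component 614 can also detect a change in position of the apparatus 600 or of a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and temperature changes of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 616 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.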
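In exemplary embodiments, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In exemplary embodiments, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 604 including instructions, which can be executed by the processor 620 of the apparatus 600 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.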
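Fig. 5 is a block diagram of an apparatus 1900 for image label recognition according to an exemplary embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 5, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. The applications stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method, which specifically includes:
constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model; selecting a batch of sample images from the pre-labeled sample images; determining, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image in the batch, where a sample image, its closest sample image, and its most difficult sample image form an image pair; constructing a target loss function according to the image pairs and training according to the target loss function to obtain a second image classification model; and performing label recognition on the image to be recognized by means of the second image classification model.
In some embodiments, the step of constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model includes:
performing label prediction on each pre-labeled sample image by the pre-trained first image classification model to obtain the target labels corresponding to each sample image, where each sample image corresponds to a preset number of target labels; grouping the pre-labeled sample images by label, where each label corresponds to one group; for each label, determining the number of occurrences of that label among the target labels; for each group, determining the quotient of that number and the number of sample images in the group as the routing ratio from the label to the label corresponding to the group; and drawing the label routing map according to the routing ratios between labels.
In some embodiments, the step of performing label prediction on each pre-labeled sample image by the pre-trained first image classification model to obtain the target labels corresponding to each sample image includes: performing label prediction on each pre-labeled sample image by the pre-trained first image classification model to obtain a prediction vector for each sample image, where the prediction vector contains multiple points, each corresponding to one label and one probability value; for each prediction vector, sorting the probability values of its points in descending order; and determining the labels corresponding to the top preset number of probability values as the target labels of the sample image corresponding to the prediction vector.
In some embodiments, the step of determining, by means of the label routing map, the closest sample image and the most difficult sample image of each sample image in the batch includes: for each sample image in the batch, determining the first label to which the sample image belongs; determining the second label having the smallest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the second label as the closest sample image of the sample image; and determining the third label having the largest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
In some embodiments, the weighted sum of the image-pair average loss function and the preset classification loss function is the target loss function.
The apparatus 1900 may further include a power component 1926 configured to perform power management of the apparatus 1900, a wired or wireless network interface 1950 configured to connect the apparatus 1900 to a network, and an input/output (I/O) interface 1958. The apparatus 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.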
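The present disclosure further provides a computer-readable storage medium on which an image label recognition program is stored, where the image label recognition program, when executed by a processor, implements the steps of any of the image label recognition methods described above.
The present disclosure further provides a computer program product including a computer program, the computer program including program instructions and being stored on a computer-readable storage medium, where the program instructions, when executed by a processor, implement the steps of any of the image label recognition methods described above.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.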

Claims (14)

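  1. An image label recognition method, the method comprising:
    constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model;
    selecting a plurality of sample images from the pre-labeled sample images;
    determining, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, wherein each sample image, its closest sample image, and its most difficult sample image form an image pair;
    constructing a target loss function according to the image pairs, and training according to the target loss function to obtain a second image classification model; and
    performing label recognition on an image to be recognized by means of the second image classification model.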
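  2. The method according to claim 1, wherein the step of constructing a label routing map based on pre-labeled sample images and a pre-trained first image classification model comprises:
    performing label prediction on the pre-labeled sample images by the pre-trained first image classification model to obtain target labels corresponding to each sample image, wherein each sample image corresponds to a preset number of target labels;
    grouping the pre-labeled sample images by label, wherein each label corresponds to one group;
    counting the number of occurrences of each label among the target labels;
    for each group, determining the quotient of the count of each label and the number of sample images in the group as the routing ratio from that label to the label corresponding to the group;
    drawing the label routing map according to the routing ratios between labels.
  3. The method according to claim 2, wherein the step of performing label prediction on the pre-labeled sample images by the pre-trained first image classification model to obtain target labels corresponding to each sample image comprises:
    performing label prediction on the pre-labeled sample images by the pre-trained first image classification model to obtain a prediction vector for each sample image, wherein the prediction vector contains multiple points, each point corresponding to one label and one probability value;
    for each prediction vector, sorting the probability values of the points in the prediction vector in descending order;
    determining the labels corresponding to the top preset number of probability values as the target labels of the sample image corresponding to the prediction vector.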
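  4. The method according to claim 1, wherein the step of determining, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images comprises:
    for each of the plurality of sample images, determining the first label to which the sample image belongs;
    determining the second label having the smallest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the second label as the closest sample image of the sample image;
    determining the third label having the largest routing ratio with the first label, and randomly extracting one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
  5. The method according to claim 1, wherein a weighted sum of an image-pair average loss function and a preset classification loss function is the target loss function.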
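  6. An image label recognition apparatus, wherein the apparatus comprises:
    a construction module configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model;
    a selection module configured to select a plurality of sample images from the pre-labeled sample images;
    a determination module configured to determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, wherein each sample image, its corresponding closest sample image, and its corresponding most difficult sample image form an image pair;
    a training module configured to construct a target loss function according to the image pairs and train according to the target loss function to obtain a second image classification model;
    a recognition module configured to perform label recognition on an image to be recognized by means of the second image classification model.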
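  7. The apparatus according to claim 6, wherein the construction module comprises:
    a label prediction submodule configured to perform label prediction on the pre-labeled sample images by the pre-trained first image classification model to obtain target labels corresponding to each sample image, wherein each sample image corresponds to a preset number of target labels;
    a grouping submodule configured to group the pre-labeled sample images by label, wherein each label corresponds to one group;
    a determination submodule configured to count the number of occurrences of each label among the target labels;
    a routing ratio determination submodule configured to determine, for each group, the quotient of the count of each label and the number of sample images in the group as the routing ratio from that label to the label corresponding to the group;
    a drawing submodule configured to draw the label routing map according to the routing ratios between labels.
  8. The apparatus according to claim 7, wherein the label prediction submodule comprises:
    a vector prediction unit configured to perform label prediction on each pre-labeled sample image by the pre-trained first image classification model to obtain a prediction vector for each sample image, wherein the prediction vector contains multiple points, each point corresponding to one label and one probability value;
    a sorting unit configured to sort, for each prediction vector, the probability values of the points in the prediction vector in descending order;
    a target label determination unit configured to determine the labels corresponding to the top preset number of probability values as the target labels of the sample image corresponding to the prediction vector.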
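  9. The apparatus according to claim 6, wherein the determination module comprises:
    a label determination submodule configured to determine, for each of the plurality of sample images, the first label to which it belongs;
    a first extraction submodule configured to determine the second label having the smallest routing ratio with the first label, and randomly extract one sample image from the group corresponding to the second label as the closest sample image of the sample image;
    a second extraction submodule configured to determine the third label having the largest routing ratio with the first label, and randomly extract one sample image from the group corresponding to the third label as the most difficult sample image of the sample image.
  10. The apparatus according to claim 6, wherein:
    a weighted sum of an image-pair average loss function and a preset classification loss function is the target loss function.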
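  11. An image label recognition apparatus, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to construct a label routing map based on pre-labeled sample images and a pre-trained first image classification model;
    select a plurality of sample images from the pre-labeled sample images;
    determine, by means of the label routing map, the closest sample image and the most difficult sample image of each of the plurality of sample images, wherein each sample image, its corresponding closest sample image, and its corresponding most difficult sample image form an image pair;
    construct a target loss function according to the image pairs, and train according to the target loss function to obtain a second image classification model; and
    perform label recognition on an image to be recognized by means of the second image classification model.
  12. A server, comprising: a memory, a processor, and an image label recognition program stored on the memory and executable on the processor, wherein the image label recognition program, when executed by the processor, implements the steps of the image label recognition method according to any one of claims 1 to 5.
  13. A computer-readable storage medium on which an image label recognition program is stored, wherein the image label recognition program, when executed by a processor, implements the steps of the image label recognition method according to any one of claims 1 to 5.
  14. A computer program product, comprising a computer program, the computer program comprising program instructions and being stored on a computer-readable storage medium, wherein the program instructions, when executed by a processor, implement the steps of the image label recognition method according to any one of claims 1 to 5.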
PCT/CN2018/123959 2018-06-29 2018-12-26 Image label recognition method, apparatus and server WO2020000961A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/137,282 US20210117726A1 (en) 2018-06-29 2020-12-29 Method for training image classifying model, server and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810712097.7 2018-06-29
CN201810712097.7A CN109117862B (zh) 2018-06-29 2018-06-29 Image label recognition method, apparatus and server

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/137,282 Continuation US20210117726A1 (en) 2018-06-29 2020-12-29 Method for training image classifying model, server and storage medium

Publications (1)

Publication Number Publication Date
WO2020000961A1 true WO2020000961A1 (zh) 2020-01-02

Family

ID=64822539

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/123959 WO2020000961A1 (zh) 2018-06-29 2018-12-26 Image label recognition method, apparatus and server

Country Status (3)

Country Link
US (1) US20210117726A1 (zh)
CN (1) CN109117862B (zh)
WO (1) WO2020000961A1 (zh)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3834135A4 (en) * 2018-08-07 2022-05-04 BlinkAI Technologies, Inc. IMAGE ENHANCEMENT ARTIFICIAL INTELLIGENCE TECHNIQUES
EP3608701A1 (de) * 2018-08-09 2020-02-12 Olympus Soft Imaging Solutions GmbH Verfahren zur bereitstellung wenigstens einer auswertemethode für proben
CN110059724A (zh) * 2019-03-20 2019-07-26 东软睿驰汽车技术(沈阳)有限公司 一种视觉图片样本的获取方法及装置
CN109948577B (zh) * 2019-03-27 2020-08-04 无锡雪浪数制科技有限公司 一种布料识别方法、装置及存储介质
CN110442722B (zh) * 2019-08-13 2022-05-13 北京金山数字娱乐科技有限公司 分类模型训练的方法及装置、数据分类的方法及装置
CN110738267B (zh) * 2019-10-18 2023-08-22 北京达佳互联信息技术有限公司 图像分类方法、装置、电子设备及存储介质
CN110827247B (zh) * 2019-10-28 2024-03-15 上海万物新生环保科技集团有限公司 一种识别标签的方法及设备
CN111414921B (zh) * 2020-03-25 2024-03-15 抖音视界有限公司 样本图像处理方法、装置、电子设备及计算机存储介质
CN113221875B (zh) * 2021-07-08 2021-09-21 北京文安智能技术股份有限公司 基于主动学习的目标检测模型训练方法
CN114495228A (zh) * 2022-01-26 2022-05-13 北京百度网讯科技有限公司 人脸检测器的训练方法及装置、设备、介质和产品
CN114445811A (zh) * 2022-01-30 2022-05-06 北京百度网讯科技有限公司 一种图像处理方法、装置及电子设备
CN115359308B (zh) * 2022-04-06 2024-02-13 北京百度网讯科技有限公司 模型训练、难例识别方法、装置、设备、存储介质及程序
CN117036670B (zh) * 2022-10-20 2024-06-07 腾讯科技(深圳)有限公司 质量检测模型的训练方法、装置、设备、介质及程序产品

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809146A (zh) * 2016-03-28 2016-07-27 北京奇艺世纪科技有限公司 一种图像场景识别方法和装置
CN107087016A (zh) * 2017-03-06 2017-08-22 清华大学 基于视频监控网络的楼宇内移动物体的导航方法及系统
CN107688823A (zh) * 2017-07-20 2018-02-13 北京三快在线科技有限公司 一种图像特征获取方法及装置,电子设备
US9965717B2 (en) * 2015-11-13 2018-05-08 Adobe Systems Incorporated Learning image representation by distilling from multi-task networks

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024145B (zh) * 2010-12-01 2012-11-21 五邑大学 一种伪装人脸分层识别方法及系统
EP3371749A1 (en) * 2015-11-06 2018-09-12 Google LLC Regularizing machine learning models
EP3411828A4 (en) * 2016-02-01 2019-09-25 See-Out Pty Ltd. CLASSIFICATION AND LABELING OF IMAGES
CN105808709B (zh) * 2016-03-04 2019-10-29 智慧眼科技股份有限公司 人脸识别快速检索方法及装置
CN106372663B (zh) * 2016-08-30 2019-09-10 北京小米移动软件有限公司 构建分类模型的方法及装置
CN107563444A (zh) * 2017-09-05 2018-01-09 浙江大学 一种零样本图像分类方法及系统
CN107679507B (zh) * 2017-10-17 2019-12-24 北京大学第三医院 面部毛孔检测系统及方法
CN108171254A (zh) * 2017-11-22 2018-06-15 北京达佳互联信息技术有限公司 图像标签确定方法、装置及终端

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460150A (zh) * 2020-03-27 2020-07-28 北京松果电子有限公司 一种分类模型的训练方法、分类方法、装置及存储介质
CN111460150B (zh) * 2020-03-27 2023-11-10 北京小米松果电子有限公司 一种分类模型的训练方法、分类方法、装置及存储介质
CN111858999A (zh) * 2020-06-24 2020-10-30 北京邮电大学 一种基于分段困难样本生成的检索方法及装置
CN111858999B (zh) * 2020-06-24 2022-10-25 北京邮电大学 一种基于分段困难样本生成的检索方法及装置
CN112966754A (zh) * 2021-03-10 2021-06-15 中国平安人寿保险股份有限公司 样本筛选方法、样本筛选装置及终端设备
CN112966754B (zh) * 2021-03-10 2023-11-07 中国平安人寿保险股份有限公司 样本筛选方法、样本筛选装置及终端设备
CN113705716A (zh) * 2021-09-03 2021-11-26 北京百度网讯科技有限公司 图像识别模型训练方法、设备、云控平台及自动驾驶车辆
CN113705716B (zh) * 2021-09-03 2023-10-10 北京百度网讯科技有限公司 图像识别模型训练方法、设备、云控平台及自动驾驶车辆
CN115512116A (zh) * 2022-11-01 2022-12-23 北京安德医智科技有限公司 图像分割模型优化方法、装置、电子设备及可读存储介质
CN115512116B (zh) * 2022-11-01 2023-06-30 北京安德医智科技有限公司 图像分割模型优化方法、装置、电子设备及可读存储介质

Also Published As

Publication number Publication date
US20210117726A1 (en) 2021-04-22
CN109117862B (zh) 2019-06-21
CN109117862A (zh) 2019-01-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18924849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18924849

Country of ref document: EP

Kind code of ref document: A1